Guest Editorial: The Provenance of Online Data

Taming the Costs of Trustworthy Provenance through Policy Reduction

Provenance is an increasingly important tool for understanding and even actively preventing system intrusion, but the excessive storage burden imposed...

A Canonical Form for PROV Documents and Its Application to Equality, Signature, and Validation

We present a canonical form for PROV that is a normalized way of representing PROV documents as...

Managing Provenance of Implicit Data Flows in Scientific Experiments

Scientific experiments modeled as scientific workflows may create, change, or access data products not explicitly referenced in the workflow...

PROV2R: Practical Provenance Analysis of Unstructured Processes

Information produced by Internet applications is inherently a result of processes that are executed locally. Think of a web server that makes use of a CGI script, or a content management system where a post was first edited using a word processor. Given the impact of these processes on the content published online, a consumer of that information...

Facilitating Adoption of Internet Technologies and Services with Externalities via Cost Subsidization

This article models the temporal adoption dynamics of an abstracted Internet technology or service,...

Bandwidth Measurements within the Cloud: Characterizing Regular Behaviors and Correlating Downtimes

The search for availability, reliability, and quality of service has led cloud infrastructure customers to disseminate their services, contents, and data over multiple cloud data centers, often involving several Cloud service providers (CSPs). The consequence of this is that a large amount of data must be transmitted across the public Cloud....

Exploiting Contextual Information in Attacking Set-Generalized Transactions

Transactions are records that contain a set of items about individuals. For example, items browsed by a customer when shopping online form a...


Forthcoming Articles
CrowdService: Optimizing Mobile Crowdsourcing and Service Composition

Some user needs can only be met by leveraging the capabilities of others to undertake particular tasks. In this paper, we develop a framework, named CROWDSERVICE, which supplies crowd intelligence and labor as publicly accessible crowd services via mobile crowdsourcing. It employs a genetic algorithm to dynamically synthesize and update near-optimal cost and time constraints for each crowd service involved in a composite service, and selects a near-optimal set of workers for each crowd service to be executed. We implement the proposed framework on Android platforms, and evaluate its effectiveness, scalability and usability in both experimental and user studies.

Fine-Grained Access Control via Policy-Carrying Data

We address the problem of associating access policies with datasets and how to monitor compliance via policy-carrying data. Our contributions are a formal model in first-order logic inspired by normative multi-agent systems to regulate data access, and a computational model for the validation of specific use cases and the verification of policies against criteria. Existing work on access policy identifies roles as a key enabler, with which we concur, but much of the rest focusses on authentication and authorization technology. Our proposal addresses normative principles from Berners-Lee's bill of rights for the internet, with human-readable but machine-processable access control policies.

Utility-based Decision Making for Migrating Cloud-based Applications

Nowadays, cloud providers offer a broad catalog of services for the rapid development, provisioning, deployment, and continuous integration of distributed applications in DevOps. However, the existence of a wide spectrum of cloud services has become a challenge, as these vary in performance and pricing models. This work addresses such a challenge, by means of providing the decision support concepts and mechanisms to evaluate different potential distributions of applications spanned among heterogeneous cloud services. We analyze the profitability aspect of an application distribution by defining a utility model for the decision making tasks, which is evaluated using the MediaWiki (Wikipedia) application.

Rotten Apples or Bad Harvest? What We Are Measuring When We Are Measuring Abuse

We model the abuse data generation process, using phishing sites across 45,358 hosting providers. We find 84% of the variation in abuse is explained with structural factors alone. We enrich a subset of 105 homogeneous "statistical twins" with additional explanatory variables and find abuse is positively associated with website popularity and with the prevalence of CMSes, and negatively associated with price. These factors explain 77% of the remaining variation, questioning premature inferences from raw abuse indicators on the security efforts of providers, and suggesting the adoption of similar analysis in all domains where network measurement aims at informing technology policy.

ODIN: Obfuscation-based Privacy Preserving Consensus Algorithm for Decentralized Information Fusion in Smart Device Networks

The large spread of sensors in urban and rural infrastructures is motivating research in the IoT [...]. Sensors generate large amounts of measurement data [...] Information fusion, given the decentralized nature of urban or rural deployments, can be performed by means of consensus algorithms. [...] However, the use of consensus algorithms raises security concerns, [...] This paper proposes ODIN, [...]. ODIN is a privacy-preserving extension of the consensus gossip algorithm that prevents distributed agents from accessing the data while they reach consensus; furthermore, agents cannot access even the final consensus value, but can only retrieve a binary decision. [...]

PrivacyCheck: Automatic Summarization of Privacy Policies Using Data Mining

We automatically extract graphical summaries of online privacy policies. We use data mining to analyze the text of privacy policies and answer ten basic questions concerning the privacy and security of user data. In order to train the data mining models, we thoroughly study privacy policies of 400 companies. Our free PrivacyCheck utilizes the data mining models to summarize any HTML page that contains a privacy policy. PrivacyCheck stands out from currently available counterparts because it is readily applicable to any online privacy policy. Cross validation/experimental results show that PrivacyCheck summaries are accurate up to 76% of the time.

SHARE: A Stackelberg Honey-Based Adversarial Reasoning Engine

We develop an adversarial-theoretic foundation for how a malicious person will explore an enterprise network and how they will attack it, based on the concept of a system vulnerability dependency graph. Based on such a model of the adversary, we develop a mechanism by which the network can be modified by the defender so as to induce deception by placing honey nodes and apparent vulnerabilities into the network so as to minimize the expected impact of the adversary's attacks (according to multiple measures of impact).

Quantify resilience enhancement of UTS through exploiting Connected Community and Internet of Everything emerging technologies

The EC-funded RESOLUTE project aims at achieving sustained adaptability of UTS to enhance resilience. The FRAM method is used to represent non-linear interactions that are part of a system or organisation. It goes without saying that representation brings support to observation. Although several techniques were proposed in the literature, little support was provided for quantitative analysis. This work investigates solutions for enriching the FRAM approach with a complete and consistent methodology to develop quantitative analysis.

CloudMF: Model-Driven Management of Multi-Cloud Applications

While the number of cloud solutions is continuously increasing, the development and operation of large and distributed cloud-based applications is still challenging. A major challenge is the lack of interoperability between the existing cloud solutions, which increases the complexity of maintaining and evolving complex applications potentially deployed across multiple cloud infrastructures and platforms. In this paper, we show how CloudMF leverages MDE and supports the DevOps ideas to tame this complexity by providing: (i) a domain-specific language for specifying the provisioning and deployment of multi-cloud applications, and (ii) a models@run-time environment for their continuous provisioning, deployment, and adaptation.

TOIT Reviewers over 2015 and 2016

Cross-Browser Differences Detection based on an Empirical Metric for Web Page Visual Similarity

This paper develops a method to detect visual differences introduced into web pages when they are rendered in different browsers. To achieve this, we propose an empirical visual similarity metric by mimicking human mechanisms of perception. The Gestalt laws of grouping are translated into a rule set. A block tree is parsed by the rules for similarity calculation. During this translation process, experiments are performed to obtain metrics for a variety of Gestalt features. After a validation experiment, the empirical metric is employed to detect cross-browser differences. Experiments on popular web pages provide positive results for this methodology.

Quantifying Privacy Leakage in Multi-Agent Planning

Multi-agent planning using MA-STRIPS-related models is often motivated by the preservation of private information. Such motivation is not only natural for multi-agent systems, but is one of the main reasons why multi-agent planning (MAP) problems cannot be solved centrally. Although the motivation is common in the literature, formal treatment of privacy is often missing. In this paper, we present an analysis of two well-known algorithms, MAFS and Secure-MAFS, in terms of the privacy leakage measure introduced in our recent work, both in general and on a particular example.

Accountable Protocols in Abductive Logic Programming

Finding the party responsible for an unpleasant situation is often difficult, especially in societies of artificial agents. SCIFF is a successful formalization of agent societies, including a language to describe rules and protocols, and an abductive proof-procedure for compliance checking. However, identifying the party responsible for a violation is not always straightforward. In this work, a definition of accountability for artificial societies is formalized in SCIFF. Two tools are provided for the designer of interaction protocols: a guideline, in terms of syntactic features that ensure accountability of the protocol, and a software tool to identify, for a given protocol, whether non-accountability issues could arise.

Collaborative Location Recommendation by Integrating Multi-dimensional Contextual Information

In this paper, we propose a collaborative filtering method based on Tensor Factorization, a generalization of the Matrix Factorization approach, to model the multi-dimensional contextual information. It leads to a more compact model of the data which is naturally suitable for integrating contextual information to make POI recommendations. Based on the model, we further improve the recommendation accuracy by utilizing the internal relations within users and locations to regularize the latent factors. Experimental results on a large real-world dataset demonstrate the effectiveness of our approach.

Towards Efficient Short Video Sharing in the YouTube Social Network

In this work, we novelly leverage the existing social network in YouTube, where a user subscribes to another user's channel to track all his/her uploaded videos. We propose SocialTube, which builds the subscribers of one channel into a P2P overlay and also clusters common-interest nodes at a higher level. It also incorporates a prefetching algorithm that prefetches higher-popularity videos. Extensive trace-driven simulation results and PlanetLab real-world experimental results verify the effectiveness of SocialTube at reducing server load and overlay maintenance overhead and at improving QoS for users.

Enhanced Audit Strategies for Collaborative and Accountable Data Sharing in Social Networks

Access control management is one of the issues still hindering the development of decentralized online social networks (DOSNs). In a previous work, we proposed an initial audit-based model for access control in DOSNs. In this paper, we focus on optimizing the audit process, and on the privacy issues emerging from records kept for audit purposes. We propose an enhanced audit selection, for which experimental results on a real OSN dataset show an improvement of more than 50% compared to the basic model. We also provide an analysis of the related privacy issues, and discuss possible privacy-preserving alternatives.

Should Credit Card Issuers Reissue Cards in Response to a Data Breach?: Uncertainty and Transparency in Metrics for Data Security Policymaking

When card data is exposed in a data breach but has not yet been used to attempt fraud, the overall social costs of that breach depend on whether the financial institutions that issued those cards immediately cancel them and issue new cards or instead wait until fraud is attempted. We use a parameterized model and Monte Carlo simulation to compare the cost of reissuing cards to the total expected cost of fraud if cards are not reissued. We find that automatically reissuing cards may have lower social costs than the costs of waiting until fraud is attempted.

Seamless Virtual Network for International Business Continuity in Presence of Intentional Blocks

Besides poor links among ISPs, the GS (Golden Shield) blocks international channels in China. To avoid such blocking, a seamless networking method that automatically switches to a VPN bypass is proposed for offshore business communication bridges. It uses (1) multiple thresholding of the first derivatives of the RTT (Round Trip Time) increase to recognize the start of a GS block, and (2) an absolute RTT threshold value and elapsed time to detect its end. The switching error was 4 out of 159 GS block cases. Over 20 offshore companies have continued to use the method for 3 years. Questionnaires sent to them showed that the method is almost perfect in exploiting motherland application services seamlessly.

Fuzzy Clustering of Crowdsourced Test Reports for Apps

As a critical part of DevOps, testing drives a seamless mobile application cycle. However, traditional testing struggles to cover protean user scenarios. Hence, many companies crowdsource testing tasks to workers from open platforms. In crowdsourced testing, test reports are highly redundant and their quality varies sharply. Hence, it becomes a tedious challenge to manually inspect these test reports. To reduce the inspection cost, we pose the new problem of CLUstering TEst Reports and propose a new framework named TERFUR that aggregates test reports into clusters. Experimental results validate the effectiveness of the proposed framework against comparative methods.

Adaptive Message Routing and Replication in Mobile Opportunistic Networks for Connected Communities

Message routing in mobile opportunistic networks is challenging due to the lack of contemporaneous end-to-end paths. In this paper, we present FGAR, a routing protocol designed by leveraging fine-grained contact characterisation and adaptive message replication. In FGAR, contact history is characterised in a fine-grained manner with timing information using a sliding window mechanism, and future contacts are predicted based on the fine-grained contact information. We design an efficient message replication scheme, in which replication is controlled in a fully decentralised manner. We evaluate our scheme through trace-driven simulations, and results show FGAR outperforms existing schemes.

Multi-Objective Optimization of Deployment Topologies for Distributed Applications

Modern applications are typically implemented as distributed systems comprising several components. Deciding where to deploy which component is a difficult task that is usually assisted by logical topology recommendations. Choosing inefficient topologies leads to unnecessary operation costs, or results in poor performance. This work introduces a deployment topology optimization approach for distributed applications. We use a performance model generator that extracts models from running applications. The extracted model is used to optimize performance and runtime costs of distributed enterprise applications. We demonstrate the accuracy using the SPECjEnterpriseNEXT industry benchmark as a distributed application in an on-premise and in a cloud environment.

Exploiting content spatial distribution to improve detection of intrusions

We present a novel semi-supervised anomaly-based IDS technique, namely PCkAD, for detecting application-level content-based attacks. Its peculiarity is to learn the structure of legitimate payloads by splitting packets into chunks and determining the within-packet distribution of n-grams. This strategy is resistant to evasion techniques such as blending. Indeed, we prove that finding the right legitimate content is NP-hard in the presence of chunks. Moreover, it improves the false positive rate for a given detection rate with respect to the case where the spatial information is not considered. Comparison with well-known IDSs using n-grams shows that PCkAD achieves state-of-the-art performance.

Extending the Outreach: From Smart Cities to Connected Communities

Real-Time Traffic Event Detection from Social Media

Smart Communities are composed of organisations and individuals who share information and make use of that shared information for better decision making. The shared information can be either sensor-generated or user-generated. Social media has become an important source of near-instantaneous user-generated information. One domain where social media data has value is transport, and this paper looks at the exploitation of user-generated data, specifically Twitter data, in the traffic management domain. This paper proposes an instant traffic alert and warning system based on a novel LDA-based approach (tweet-LDA) for classification of traffic-related tweets and geo-coded incident detection.

Decision Networks for security risk assessment of critical infrastructures

We exploit Decision Networks (DN) for the analysis of attack scenarios. DN can naturally address uncertainty at every level, including the interaction level of attacks and countermeasures, making possible the modeling of more real-world situations which are not limited to Boolean combinations of events; furthermore, inference algorithms can be directly exploited for implementing a probabilistic analysis of both the risk and the importance of the attacks (with respect to specific sets of countermeasures), as well as a sound decision theoretic analysis having the goal of selecting the optimal (with respect to a specific objective function) set of countermeasures.

On Architectural Principles for Dynamic and Adaptive Cloud-Native Software Systems

The cloud is a distributed Internet-based architecture providing platform and application software resources as services. Through service-orientation and virtualisation, the deployment and provisioning of applications can be managed dynamically, resulting in cloud platforms and applications as interdependent adaptive systems. We discuss principles and patterns for a software architectural style for cloud-based software systems based on a control-theoretic, model-based architectural - taking service-orientation, uncertainty, composition and controller-based adaptation into account for this multi-tiered, distributed environment. A discussion of different use cases with development, implementation and management aspects evaluates the usefulness of the proposed style in an empirical style.

On the Need of Trustworthy Sensing and Crowdsourcing for Urban Accessibility in Smart City

Mobility in urban environments is a key factor, affecting well-being and quality of life. In particular, people with disabilities or reduced mobility have to face urban barriers. Information about architectural elements can support citizens' mobility, by enhancing independence and abilities in conducting daily activities. mPASS is a system that provides users with personalized paths. It collects data from crowdsourcing and crowdsensing, mapping urban accessibility. In this context, reliability can be ensured by managing data from different sources, combined with authoritative datasets provided by organizations and authorities. In this paper we present our trustworthiness model and we discuss simulation results.

An Online Algorithm for Task Offloading in Heterogeneous Mobile Clouds

Mobile cloud computing is emerging as a promising approach to enrich user experiences. Computation offloading decision making and task scheduling among heterogeneous shared resources in mobile clouds are becoming challenging problems. We address these two problems together as an optimization problem and propose a context-aware mixed integer programming model to provide offline optimal solutions for making offloading decisions and scheduling offloaded tasks among shared computing resources in heterogeneous mobile clouds. The objective is to minimize the global task completion time. We further propose an online algorithm, OCOS, based on the rent/buy problem and prove the algorithm is 2-competitive.

Privacy-preserving Publishing of Multi-level Utility Controlled Graph Datasets

Conventional private data publication schemes are targeted at publication of sensitive datasets. Typically these schemes are designed with the objective of retaining as much utility as possible. Such an approach is inapplicable when users have different levels of access to the same data. In this paper, we present an anonymization framework for publishing large datasets with the goal of providing different levels of utility based on access privilege levels. Our experiments on large association graphs show that the proposed techniques are effective and scalable, and yield the required level of privacy and utility for each user's privacy and access privilege levels.

Hadoop-based Intelligent Care System (HICS): Analytical Approach for Big Data in IoT

In the proposed system, various sensors (which can be wearable devices) are attached to the human body; they measure the data and transmit it to a Primary Mobile Device (PMD). The collected data is then forwarded over the internet to the Intelligent Building, which processes it and performs the necessary actions. The Intelligent Building is composed of a big data collection unit (used for filtration and load balancing), a Hadoop Processing Unit (HPU) (comprising HDFS and MapReduce), and an Analysis and decision unit. The HPU and the Analysis and decision unit are equipped with the medical expert system, which reads the sensor data and performs actions.

Context-Driven and Real-Time Provisioning of Data-Centric IoT Services in the Cloud

In this paper, we propose a software framework, called SARIoT, for scalable and real-time provisioning of cloud-based IoT services and their data, driven by their contextual properties. The main idea behind the proposed framework is to structure the description of data-centric IoT services and their real-time and historical data in a hierarchical form in accordance with the end-user application's context model.


