NTU IoX Center (臺大IoX創新研究中心)

Trend Prediction and Distributed Pattern Matching for M2M Data

As M2M technology is widely adopted, vast networks of interconnected, sensor-equipped devices will generate huge amounts of data. This project focuses on data analysis for M2M systems. We are working to scale graphical machine learning models to M2M networks. We also aim to design and implement machine learning algorithms that learn from their environments, so that event detection and prediction remain possible even with very noisy or faulty data.

PI: Prof. Shou-De Lin
Champion: Dr. Phillip B. Gibbons

Overview

Sensor networks can provide rich contextual information about an environment, which can be used to detect or predict events that inform decisions in a given system. This is a clear departure from past systems, which relied heavily on domain expert knowledge. In this work, the emphasis is on learning from data. The increase in the number and diversity of sensors (and thus data sources) makes this approach possible and has opened the door to new applications, but it has also introduced many fresh research challenges. Our group aims to develop a general model for Heterogeneous Sensor Networks (HSNs) that addresses these issues.

Goals

Event Detection/Prediction
We are designing a general event prediction model for heterogeneous sensor networks. The model is intended to predict both the occurrence of an event and its location in space and time. For example, consider a traffic application: given sensor data related to road conditions, we can predict where and when a traffic jam may occur, as well as the probability of its occurrence.

Sensor Fault Detection
In a large-scale deployment of sensors, it is inevitable that there will be point failures and aberrations within the network. To address this issue, we are designing a framework for HSN-based anomaly detection. Using this framework, we can predict quantitatively which sensors have failed or malfunctioned.
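As a rough illustration of what quantitative fault flagging can look like, the sketch below scores each sensor against its peers using a robust outlier test (the modified z-score based on the median absolute deviation). This is a generic baseline and not the project's framework. The function name, the sensor ids, the sample readings, and the 3.5 threshold are all assumptions made for the example, and it presumes the sensors measure the same quantity at roughly the same place and time.

```python
from statistics import median


def flag_faulty_sensors(readings, threshold=3.5):
    """Return the ids of sensors whose reading is a robust outlier among peers.

    readings:  dict mapping a (hypothetical) sensor id to its latest value.
    threshold: cutoff on the modified z-score; 3.5 is a common rule of thumb,
               used here as an assumption rather than a tuned value.
    """
    values = list(readings.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that a few bad sensors
    # cannot easily distort.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the sensors report exactly the median value,
        # so any sensor that differs from it is flagged.
        return sorted(s for s, v in readings.items() if v != center)
    return sorted(
        s for s, v in readings.items()
        if 0.6745 * abs(v - center) / mad > threshold
    )


# Hypothetical temperature readings; "t5" is far from its peers.
readings = {"t1": 21.0, "t2": 21.4, "t3": 20.8, "t4": 21.1, "t5": 35.0}
print(flag_faulty_sensors(readings))
```

A peer-comparison test like this only makes sense among comparable sensors; in a heterogeneous network the comparison groups would themselves have to be learned or specified.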

Challenges

Uniform Representation
In an HSN environment, sensor type and data format (including meta-data) may be non-uniform throughout the network. How can these disparate data sources be efficiently and effectively normalized such that they can be used to facilitate event prediction?
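One simple way to picture a uniform representation is a single record type into which every raw reading is converted, with values mapped to a canonical unit per quantity. The sketch below is only an illustration under assumed inputs: the field names ("id", "quantity", "unit", "value", "ts"), the Reading class, and the conversion table are hypothetical and do not come from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A normalized observation (hypothetical schema)."""
    sensor_id: str
    quantity: str     # what is measured, e.g. "temperature"
    value: float      # always expressed in the canonical unit
    unit: str         # the canonical unit for this quantity
    timestamp: float  # seconds since the Unix epoch


# Canonical unit chosen per quantity (an assumption for this example).
CANONICAL_UNIT = {"temperature": "C", "speed": "km/h"}

# Converters from (quantity, source unit) to the canonical unit.
TO_CANONICAL = {
    ("temperature", "C"): lambda v: v,
    ("temperature", "F"): lambda v: (v - 32.0) * 5.0 / 9.0,
    ("speed", "km/h"): lambda v: v,
    ("speed", "mph"): lambda v: v * 1.609344,
}


def normalize(raw):
    """Convert one raw reading (a dict) into a Reading in canonical units.

    Raises KeyError for an unknown quantity/unit pair, so unsupported
    sources fail loudly instead of being silently mixed in.
    """
    quantity = raw["quantity"]
    convert = TO_CANONICAL[(quantity, raw["unit"])]
    return Reading(
        sensor_id=raw["id"],
        quantity=quantity,
        value=convert(float(raw["value"])),
        unit=CANONICAL_UNIT[quantity],
        timestamp=float(raw["ts"]),
    )


raw = {"id": "roadside-7", "quantity": "temperature", "unit": "F",
       "value": "212", "ts": 1400000000}
print(normalize(raw))
```

Unit conversion covers only the easiest part of the question; differing sampling rates, meta-data, and semantics would need additional handling.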

Capturing Dependencies
When mining HSN data, the independent and identically distributed (i.i.d.) assumption does not hold: there are almost certainly dependencies both among the observations of a given sensor and between the observations of different sensors. How can these dependencies best be leveraged to identify malfunctioning sensors?
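The two kinds of dependency can be made concrete with simple correlation measures: lag-1 autocorrelation for the observations of a single sensor over time, and Pearson correlation for the readings of two different sensors. The sketch below is a minimal illustration with made-up series. It only measures linear dependence and is not the model used in this project; the function names and sample data are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lag1_autocorrelation(xs):
    """Correlation between a series and itself shifted by one time step."""
    return pearson(xs[:-1], xs[1:])


# Hypothetical readings from two nearby sensors over six time steps.
sensor_a = [20.1, 20.4, 20.9, 21.5, 21.8, 22.4]
sensor_b = [19.8, 20.2, 20.6, 21.1, 21.6, 22.0]

print("within sensor A (lag 1):", lag1_autocorrelation(sensor_a))
print("between A and B:", pearson(sensor_a, sensor_b))
```

If two sensors are normally strongly correlated, a sudden drop in that correlation is one possible hint that one of them is malfunctioning.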

Scale
The large number of sensors in a given system and the frequency at which observations are taken combine to yield vast (and perhaps distributed) datasets, which in turn incur ever-increasing computational cost. How should one best address these issues?

Uncertainty and Noise
Data in HSNs is especially subject to noise and uncertainty. Does a given observation accurately reflect the measurand? With what certainty can that be determined? How should missing data be handled?
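For the missing-data question, one common baseline is linear interpolation between the nearest observed values. The sketch below shows that baseline only; it is not presented as the project's approach. Missing observations are assumed to be marked as None in an evenly sampled series, and the function name is hypothetical.

```python
def fill_missing(values):
    """Fill None entries in an evenly sampled series.

    Interior gaps are linearly interpolated between the nearest observed
    neighbors; leading and trailing gaps copy the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = list(values)
    first, last = known[0], known[-1]
    # Leading gap: repeat the first observed value.
    for i in range(first):
        filled[i] = values[first]
    # Trailing gap: repeat the last observed value.
    for i in range(last + 1, len(values)):
        filled[i] = values[last]
    # Interior gaps: straight line between consecutive observed points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


print(fill_missing([None, 1.0, None, None, 4.0, None]))
# prints [1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```

Interpolation hides the fact that a value was never observed, so a real pipeline would likely keep a mask of which entries were filled.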

Domain Knowledge
As HSNs are often deployed with particular events or scenarios in mind, domain knowledge has in the past played an important role in the detection or prediction of events in the system. When designing a general model for event prediction, however, it is important that the methods not be tied to one specific application, but rather be extensible to a wide range of scenarios while still being able to incorporate domain knowledge as needed.

Members

林守德 Shou-De Lin, Principal Investigator

葉彌妍 Mi-Yen Yeh, Co-Principal Investigator

Publications

H. Hsieh, S. Lin and Y. Zheng, "Inferring Air Quality for Station Location Recommendation Based on Urban Big Data", in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2015, pp. 437–446.

C. Li and S. Lin, "Matching Users and Items across Domains to Improve the Recommendation Quality", in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2014, pp. 801–810.

H. Hsieh, C. Li and S. Lin, "Measuring and Recommending Time-Sensitive Routes from Location-Based Data", ACM Trans. Intell. Syst. Technol., vol. 5, no. 3, Jul. 2014.

J. Lou et al., "A Social Diffusion Model with an Application on Election Simulation", The Scientific World Journal, vol. 2014, Jun. 2014, Art. no. 180590.

J. Wang et al., "Communication-Efficient Distributed Multiple Reference Pattern Matching for M2M Systems", in 2013 IEEE 13th International Conference on Data Mining, pp. 787–796.

T. Kuo et al., "Unsupervised Link Prediction Using Aggregative Statistics on Heterogeneous Social Networks", in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2013, pp. 775–783.

I. E. Yen et al., "Indexed Block Coordinate Descent for Large-Scale Linear Classification with Limited Memory", in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2013, pp. 248–256.

J. Chou et al., "An Unsupervised Learning Model to Perform Side Channel Attack", in Advances in Knowledge Discovery and Data Mining, J. Pei et al., Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 414–425.

Y. Lo, C. Li and S. Lin, "Parallelizing Preferential Attachment Models for Generating Large-Scale Social Networks that Cannot Fit into Memory", in 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pp. 229–238.

H. Hsieh, C. Li and S. Lin, "Exploiting Large-Scale Check-in Data to Recommend Time-Sensitive Routes", in Proceedings of the ACM SIGKDD International Workshop on Urban Computing, New York, NY, USA: Association for Computing Machinery, 2012, pp. 55–62.
