Trend Prediction and Distributed Pattern Matching for M2M Data
With the wide adoption of machine-to-machine (M2M) technology, vast networks of interconnected, sensor-equipped devices will generate a huge amount of data. This project focuses on data analysis in the context of M2M systems. Efforts are underway to scale graphical machine learning models for M2M networks. We also aim to design and implement machine learning algorithms that learn from their environments, so that event detection and prediction remain available even with very noisy or faulty data.
PI: Prof. Shou-De Lin
Champion: Dr. Phillip B Gibbons
Overview
Goals
We are designing a general event prediction model for heterogeneous sensor networks (HSNs). The model predicts both the occurrence of an event and its position in space and time. For example, consider a traffic application. Given sensor data related to road conditions, we can predict where and when a traffic jam may occur, as well as the probability of its occurrence.
In a large-scale deployment of sensors, it is inevitable that there will be point failures and aberrations within the network. To address this issue, we are designing a framework for HSN-based anomaly detection. Using this framework, we can predict quantitatively which sensors have failed or malfunctioned.
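As a rough illustration of what a quantitative flag for a malfunctioning sensor could look like, the following minimal sketch scores each sensor by how far it deviates from its peers and flags the outliers with a modified z-score. It is not the project's framework: the function name `flag_suspect_sensors`, the threshold of 3.5, and the assumption that all sensors measure the same correlated quantity at aligned time steps are hypothetical choices made for the example.

```python
from statistics import median

def flag_suspect_sensors(readings, threshold=3.5):
    """Return ids of sensors that disagree strongly with their peers.

    readings: dict mapping sensor id -> list of floats, all of equal length
    and aligned in time (an assumption of this sketch).
    """
    if len(readings) < 3:
        raise ValueError("need at least three sensors to compare peers")
    sensor_ids = list(readings)
    n_steps = len(readings[sensor_ids[0]])

    # Score = mean absolute deviation from the per-step median of the peers.
    scores = {}
    for sid in sensor_ids:
        total = 0.0
        for t in range(n_steps):
            peers = [readings[other][t] for other in sensor_ids if other != sid]
            total += abs(readings[sid][t] - median(peers))
        scores[sid] = total / n_steps

    # Modified z-score built on the median and the median absolute deviation.
    med = median(scores.values())
    mad = median(abs(s - med) for s in scores.values())
    if mad == 0:
        # Scores are (almost) all identical; flag only those above the median.
        return [sid for sid, s in scores.items() if s > med]
    return [sid for sid, s in scores.items()
            if 0.6745 * (s - med) / mad > threshold]

# Hypothetical data: four consistent sensors and one that reads far too high.
readings = {
    "a": [20.0, 20.5, 21.0],
    "b": [20.2, 20.4, 21.1],
    "c": [19.9, 20.6, 20.9],
    "d": [20.1, 20.5, 21.0],
    "e": [35.0, 34.0, 36.0],
}
suspects = flag_suspect_sensors(readings)
print(suspects)
```

Comparing against the peer median rather than the peer mean keeps a single faulty sensor from dragging the reference value toward itself, which is why the sketch uses robust statistics throughout.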
Challenges
In an HSN environment, sensor type and data format (including meta-data) may be non-uniform throughout the network. How can these disparate data sources be efficiently and effectively normalized such that they can be used to facilitate event prediction?
In data mining of HSNs, the independent and identically distributed (i.i.d.) assumption does not hold: there are almost certainly dependencies both among the observations of a given sensor and between the observations of different sensors. How can these dependencies best be leveraged to identify malfunctioning sensors?
The large number of sensors in a given system and the frequency at which observations are taken combine to yield vast (and perhaps distributed) datasets, which in turn incur ever-increasing computational cost. How should one best address these issues?
Data in HSNs is especially subject to noise and uncertainty. Does a given observation accurately reflect the measurand? With what certainty can we answer the previous question? How should missing data be handled?
As HSNs are often deployed with specific events or scenarios in mind, domain knowledge has in the past played an important role in the detection or prediction of events in the system. When designing a general model for event prediction, however, it is important that the methods not be tied to one specific application, but rather be extensible to a wide range of scenarios while being able to incorporate domain knowledge as needed.
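To make the normalization and missing-data questions above concrete, here is a minimal sketch of one common baseline: standardizing each sensor stream to zero mean and unit variance so that streams on different scales become comparable, while leaving missing observations marked rather than guessing them. This is an illustrative assumption, not the method adopted by the project; the function name `standardize` and the use of `None` for a missing reading are hypothetical.

```python
from statistics import mean, pstdev

def standardize(stream):
    """Z-score a list of readings; None marks a missing observation and is kept."""
    observed = [x for x in stream if x is not None]
    if not observed:
        return list(stream)  # nothing to learn a scale from
    mu = mean(observed)
    sigma = pstdev(observed)
    if sigma == 0:
        # A constant stream carries no scale information; map it to zero.
        return [None if x is None else 0.0 for x in stream]
    return [None if x is None else (x - mu) / sigma for x in stream]

# Hypothetical streams in different units, each with one missing value.
temperature_c = [20.0, None, 22.0, 24.0]
speed_kmh = [60.0, 30.0, None, 0.0]

z_temperature = standardize(temperature_c)
z_speed = standardize(speed_kmh)
print(z_temperature)
print(z_speed)
```

Keeping missing values explicit defers the imputation decision to the downstream model, which can then treat the gap as its own source of uncertainty.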
Members
Publications
H. Lai et al., "Exploiting and Evaluating MapReduce for Large-Scale Graph Mining", in 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 434–441.
T. Kuo et al., "Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks", in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jeju Island, Korea: Association for Computational Linguistics, 2012, pp. 344–348.
C. Li, M. Shan and S. Lin, "Regional Subgraph Discovery in Social Networks", in Proceedings of the 21st International Conference on World Wide Web, New York, NY, USA: Association for Computing Machinery, 2012, pp. 563–564.
C. Li, S. Lin and M. Shan, "Influence Propagation and Maximization for Heterogeneous Social Networks", in Proceedings of the 21st International Conference on World Wide Web, New York, NY, USA: Association for Computing Machinery, 2012, pp. 559–560.
H. Hsieh, C. Li and S. Lin, "TripRec: Recommending Trip Routes from Large Scale Check-in Data", in Proceedings of the 21st International Conference on World Wide Web, New York, NY, USA: Association for Computing Machinery, 2012, pp. 529–530.
C. Li and S. Lin, "Social Flocks: A Crowd Simulation Framework for Social Network Generation, Community Detection, and Collective Behavior Modeling", in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2011, pp. 765–768.