Trend Prediction and Distributed Pattern Matching for M2M Data
As M2M technology is widely adopted, vast networks of interconnected devices with sensing capabilities will generate huge amounts of data. This project focuses on data analysis in the context of M2M systems. Efforts are underway to scale graphical machine learning models to M2M networks. We also aim to design and implement machine learning algorithms that learn from their environments, so that event detection and prediction remain available even when the data are very noisy or faulty.
PI: Prof. Shou-De Lin
Champion: Dr. Phillip B Gibbons
Overview
Goals
We are designing a general event prediction model for heterogeneous sensor networks (HSNs). The model predicts whether an event will occur as well as its position in space and time. For example, consider a traffic application. Given sensor data related to road conditions, we can predict where and when a traffic jam may occur, together with the probability of that occurrence.
In a large-scale deployment of sensors, it is inevitable that there will be point failures and aberrations within the network. To address this issue, we are designing a framework for HSN-based anomaly detection. Using this framework, we can predict quantitatively which sensors have failed or malfunctioned.
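To make the second goal concrete, here is a minimal sketch of scoring sensors by how far they drift from their peers. It is not the project's framework. It assumes, purely for illustration, that all sensors measure the same quantity and report time-aligned readings, which is a much simpler setting than a real HSN. The function names, the scoring rule, and the threshold are all hypothetical.

```python
from statistics import median


def anomaly_scores(readings):
    """Score each sensor by its disagreement with the other sensors.

    readings: dict mapping sensor id -> list of floats, all lists the same
    length and aligned in time (an assumption of this sketch).
    Returns a dict mapping sensor id -> score, where the score is the mean
    absolute deviation from the per-timestep median of the peer sensors,
    divided by the median of those deviations across all sensors.
    """
    ids = list(readings)
    n_steps = len(next(iter(readings.values())))
    raw = {}
    for sid in ids:
        peers = [readings[p] for p in ids if p != sid]
        total = 0.0
        for t in range(n_steps):
            peer_median = median(series[t] for series in peers)
            total += abs(readings[sid][t] - peer_median)
        raw[sid] = total / n_steps
    # Guard against a zero scale when most sensors agree exactly.
    scale = median(raw.values()) or 1.0
    return {sid: raw[sid] / scale for sid in ids}


def flag_sensors(readings, threshold=3.0):
    """Return the sorted ids whose score exceeds the (hypothetical) threshold."""
    scores = anomaly_scores(readings)
    return sorted(sid for sid, score in scores.items() if score > threshold)


readings = {
    "a": [20.0, 20.5, 21.0, 21.5],
    "b": [20.1, 20.4, 21.1, 21.6],
    "c": [19.9, 20.6, 20.9, 21.4],
    "d": [20.0, 35.0, 5.0, 40.0],  # erratic sensor
}
print(flag_sensors(readings))
```

Using the median of the peers rather than their mean keeps a single faulty sensor from distorting the baseline against which the healthy sensors are judged.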
Challenges
In an HSN environment, sensor type and data format (including meta-data) may be non-uniform throughout the network. How can these disparate data sources be efficiently and effectively normalized such that they can be used to facilitate event prediction?
In data mining of HSNs, observations cannot be assumed to be independent and identically distributed (i.i.d.). On the contrary, there is almost certainly a dependency both among the observations of a given sensor and among the observations of different sensors. How can these dependencies best be leveraged to identify malfunctioning sensors?
The large number of sensors in a given system and the frequency at which observations are taken combine to yield vast (and perhaps distributed) datasets, which in turn incur ever-increasing computational cost. How should one best address these issues?
Data in HSNs is especially subject to noise and uncertainty. Does a given observation accurately reflect the measurand? With what certainty can we answer the previous question? How should missing data be handled?
As HSNs are often deployed with some events or scenarios in mind, domain knowledge has in the past played an important role in the detection or prediction of events in the system. When designing a general model for event prediction, however, it is important that the methods not be tied to one specific application, but rather be extensible to a wide range of scenarios while being able to incorporate domain knowledge as needed.
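The first and fourth challenges above (non-uniform data formats and missing data) can be illustrated with a small sketch. It maps raw records onto one common schema and then fills gaps with the last value seen for the same sensor. The record fields ("id", "ts", "value", "unit"), the Reading type, and the forward-fill policy are assumptions made for this example only; they do not describe the project's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    sensor_id: str
    timestamp: float            # seconds, on a shared clock (assumed)
    celsius: Optional[float]    # None marks a missing observation


def normalize(record: dict) -> Reading:
    """Map one raw record (hypothetical vendor format) onto the common schema."""
    value = record.get("value")
    unit = record.get("unit", "C")
    if value is not None and unit == "F":
        value = (value - 32.0) * 5.0 / 9.0
    elif value is not None and unit == "K":
        value = value - 273.15
    return Reading(str(record["id"]), float(record["ts"]), value)


def forward_fill(readings):
    """Replace each missing value with the last value seen for that sensor.

    A reading stays None if its sensor has no earlier observation.
    """
    last = {}
    filled = []
    for r in sorted(readings, key=lambda r: r.timestamp):
        if r.celsius is None:
            r = Reading(r.sensor_id, r.timestamp, last.get(r.sensor_id))
        else:
            last[r.sensor_id] = r.celsius
        filled.append(r)
    return filled


raw = [
    {"id": "s1", "ts": 0, "value": 212.0, "unit": "F"},
    {"id": "s1", "ts": 60, "value": None, "unit": "F"},   # dropped reading
    {"id": 2, "ts": 30, "value": 300.15, "unit": "K"},
]
for reading in forward_fill([normalize(r) for r in raw]):
    print(reading)
```

Forward filling is only one possible policy. It ignores the uncertainty of the imputed value, which a model that exploits dependencies between sensors could estimate more carefully.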
Members
Publications
H. Hsieh, S. Lin and Y. Zheng, "Inferring Air Quality for Station Location Recommendation Based on Urban Big Data", in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2015, pp. 437–446.
C. Li, S. Lin, "Matching Users and Items across Domains to Improve the Recommendation Quality", in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2014, pp. 801–810.
H. Hsieh, C. Li and S. Lin, "Measuring and Recommending Time-Sensitive Routes from Location-Based Data", ACM Trans. Intell. Syst. Technol., vol. 5, no. 3, Jul. 2014.
J. Lou et al., "A Social Diffusion Model with an Application on Election Simulation", The Scientific World Journal, vol. 2014, Art. no. 180590, Jun. 2014.
J. Wang et al., "Communication-Efficient Distributed Multiple Reference Pattern Matching for M2M Systems", in 2013 IEEE 13th International Conference on Data Mining, pp. 787–796.
T. Kuo et al., "Unsupervised Link Prediction Using Aggregative Statistics on Heterogeneous Social Networks", in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2013, pp. 775–783.
I. E. Yen et al., "Indexed Block Coordinate Descent for Large-Scale Linear Classification with Limited Memory", in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, 2013, pp. 248–256.
J. Chou et al., "An Unsupervised Learning Model to Perform Side Channel Attack", in Advances in Knowledge Discovery and Data Mining, J. Pei et al., Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 414–425.