majeedk Posted September 2, 2013 (edited)

I am trying to use k-means clustering to profile the usage behaviour of mobile device users. My data consists of various system-level and user-level readings, such as the number of calls/SMS, CPU/memory usage, and the number of user and system applications/services. The readings are taken from the mobile device every 5 minutes and scaled to the range 0-100. The clustering itself is done in MATLAB on a computer.

The idea is to use, say, one month of data for training (i.e. clustering) and then compare future data with the existing clusters to measure the (dis)similarity between the two. The assumption is that different users have different usage, so readings from USER B will not fit into the clusters built from USER A's data.

I have two questions:

1. After training (clustering), how do I compare new data with the existing clusters to determine (dis)similarity, i.e. whether the new data belongs to the same user or not? I am thinking of finding the nearest cluster and then checking whether the point lies within that cluster's boundary.

2. I am using a silhouette plot to assess the clustering quality, and I get some negative values (see the attached figure). I have read that a negative value means the record is more similar to the records of its neighbouring cluster than to the other members of its own cluster. Should I be concerned about my results, or is it normal to have some negative values? If it does need fixing, how do I detect the readings causing the problem?

Edited September 2, 2013 by majeedk
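For the second question, here is a minimal sketch of computing a silhouette value for every reading and listing the negative ones together with the neighbouring cluster each one is closer to. It is plain Python on made-up 2-D data, and the function names (`silhouette_details`, `negative_readings`) are hypothetical. It uses plain Euclidean distance; silhouette values depend on the metric, so check which distance your MATLAB `silhouette` call uses before comparing numbers. It computes all pairwise distances (O(n²)), which is slow for a month of 5-minute readings. MATLAB's `silhouette` can also return the per-reading values as its first output (`s = silhouette(X, idx)`), so `find(s < 0)` should give the same list of indices directly.

```python
import math


def silhouette_details(points, labels):
    """Return (s, neighbour) for every reading.

    s(i) = (b - a) / max(a, b), where a is the mean distance from reading i to
    the other members of its own cluster and b is the smallest mean distance
    from reading i to the members of any other cluster (that cluster is the
    "neighbour"). Uses plain Euclidean distance; needs at least two clusters.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    details = []
    for i, p in enumerate(points):
        own = labels[i]
        # Mean distance from reading i to each cluster, excluding reading i itself.
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        neighbour = min((c for c in clusters if c != own),
                        key=lambda c: mean_dist[c])
        a, b = mean_dist[own], mean_dist[neighbour]
        if a is None or max(a, b) == 0:
            s = 0.0  # singleton cluster (or all distances zero): s is set to 0
        else:
            s = (b - a) / max(a, b)
        details.append((s, neighbour))
    return details


def negative_readings(points, labels):
    """(index, s, own cluster, neighbour) for every reading with s < 0."""
    return [(i, s, labels[i], nb)
            for i, (s, nb) in enumerate(silhouette_details(points, labels))
            if s < 0]


# Toy example: reading 6 is deliberately placed in cluster 1,
# although it lies much closer to the members of cluster 0.
points = [(10, 10), (12, 11), (11, 13), (80, 80), (82, 79), (81, 83), (30, 30)]
labels = [0, 0, 0, 1, 1, 1, 1]
for i, s, own, nb in negative_readings(points, labels):
    print(f"reading {i}: s = {s:.2f}, assigned to cluster {own}, closer to cluster {nb}")
```

A handful of small negative values usually just means that neighbouring clusters overlap at their edges. Many readings with large negative values are more worrying and may suggest that the chosen number of clusters k does not suit the data.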
EdEarl Posted September 2, 2013

Why are you doing the study? Do you have a hypothesis that prompted the study?
majeedk (Author) Posted September 2, 2013 (edited)

Every user of a mobile device has a unique usage pattern, which is also shaped by other factors such as the time of day/week and their location. I use clustering to model/capture every variation of a user's usage behaviour, so future readings from the same user should fit into the clusters created from the training data. If I take readings from a different user (USER B) and try to fit them into the profile of USER A, they should not fit. Likewise, if there is some kind of malware on the device, it should alter the usage pattern, so the readings should no longer fit the existing profile.

This usage profiling is part of my PhD research. I hope this helps.

Edited September 2, 2013 by majeedk
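For the first question in the opening post and the USER A / USER B test described above, here is a minimal sketch of the "nearest cluster, then check the boundary" idea. Several assumptions are involved. Each cluster's boundary is taken to be a sphere around its centroid. The radius is the 95th percentile of the training members' distances to that centroid. The 0.2 acceptance threshold is arbitrary. The names `build_profile`, `assign`, `fraction_outside` and `fits_profile` are hypothetical. New readings must go through the same 0-100 scaling as the training data. In MATLAB, `[idx, C] = kmeans(X, k)` returns the labels and the centroids. Here the centroids are recomputed from the labels only so that the sketch is self-contained.

```python
import math


def build_profile(train, labels, pct=0.95):
    """Map each cluster to (centroid, radius) from labelled training readings.

    The radius is the pct-quantile (nearest-rank) of the member-to-centroid
    distances. pct=0.95 is a hypothetical choice; pct=1.0 uses the farthest
    member, which is more sensitive to outliers.
    """
    profile = {}
    for c in sorted(set(labels)):
        members = [p for p, l in zip(train, labels) if l == c]
        dim = len(members[0])
        centroid = tuple(sum(m[k] for m in members) / len(members) for k in range(dim))
        dists = sorted(math.dist(m, centroid) for m in members)
        rank = min(len(dists), max(1, math.ceil(pct * len(dists))))
        profile[c] = (centroid, dists[rank - 1])
    return profile


def assign(reading, profile):
    """Return (nearest cluster, True if the reading lies inside its radius)."""
    c = min(profile, key=lambda k: math.dist(reading, profile[k][0]))
    centroid, radius = profile[c]
    return c, math.dist(reading, centroid) <= radius


def fraction_outside(readings, profile):
    """Share of readings that fall outside the boundary of their nearest cluster."""
    outside = sum(1 for r in readings if not assign(r, profile)[1])
    return outside / len(readings)


def fits_profile(readings, profile, max_outside=0.2):
    """Hypothetical decision rule: accept if at most max_outside of readings are outside."""
    return fraction_outside(readings, profile) <= max_outside


# Toy data: USER A's training readings (2 features) and their k-means labels.
train_a = [(10, 10), (12, 11), (11, 13), (80, 80), (82, 79), (81, 83)]
labels_a = [0, 0, 0, 1, 1, 1]
profile_a = build_profile(train_a, labels_a)

new_from_a = [(11, 12), (81, 81)]
new_from_b = [(45, 50), (60, 20)]
print(fraction_outside(new_from_a, profile_a), fits_profile(new_from_a, profile_a))
print(fraction_outside(new_from_b, profile_a), fits_profile(new_from_b, profile_a))
```

A spherical radius matches the roughly spherical clusters that k-means itself assumes. Judging a window of readings (for example, a day's worth) by the fraction that falls outside is likely to be more robust than judging single 5-minute readings. Even the genuine user will occasionally produce a reading outside every boundary.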