K-means clustering for usage profiling

majeedk · September 2, 2013

I am trying to use k-means clustering to profile the usage behaviour of mobile device users. My data consists of various system-level and user-level readings, such as the number of calls/SMS, CPU/memory usage, and the number of user and system applications/services. The readings are taken every 5 minutes on the mobile device and scaled to the range 0-100. The clustering is done in MATLAB on a computer.

My idea is to use, say, one month's data for training (i.e. clustering), and then compare future data with the existing clusters to measure the (dis)similarity between the two. The assumption is that different users have different usage; hence readings from USER B will not fit into the clusters from USER A.

I have two questions:

1. After training (clustering), how do I compare new data with the existing clusters to determine (dis)similarity, i.e. whether the new data belongs to the same user or not? I am thinking of finding the nearest cluster and then checking whether the point lies within that cluster's boundary.
2. I am using silhouette plots to assess the clustering quality, and I get some negative values (see the attached figure). I have read that a negative value means that the record is more similar to the records of its neighbouring cluster than to the other members of its own cluster.

Should I be concerned about these results, or is it normal to have some negative values? If it needs to be fixed, how do I detect the readings causing the problem?
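
One possible way to find the readings behind the negative values is to compute the silhouette value of each reading and list the indices where it falls below zero. The sketch below is written in plain Python rather than MATLAB so that it is self-contained. The function names (`silhouette_values`, `negative_silhouette_indices`) and the use of plain Euclidean distance are assumptions made for illustration, not something taken from the original analysis.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length readings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_values(points, labels):
    """Return the silhouette value s(i) = (b - a) / max(a, b) for every point.

    a: mean distance to the other members of the point's own cluster.
    b: smallest mean distance to the members of any other cluster.
    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(euclidean(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(euclidean(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def negative_silhouette_indices(points, labels):
    """Indices of readings that sit closer to a neighbouring cluster."""
    return [i for i, s in enumerate(silhouette_values(points, labels)) if s < 0]
```

A few negative values are not unusual when clusters overlap. The returned indices show which readings to inspect. In MATLAB, `silhouette` returns the per-point values as its first output, so `find(s < 0)` should give a comparable index list.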

Edited September 2, 2013 by majeedk

EdEarl · September 2, 2013

Why are you doing the study? Do you have a hypothesis that prompted the study?

majeedk · September 2, 2013

EdEarl said: "Why are you doing the study? Do you have a hypothesis that prompted the study?"

Every user of a mobile device has a unique usage pattern, combined with other factors like time of day/week, their location, etc.
Clustering is used to model/capture every variation of their usage behaviour.
Future readings for the same user should fit into the clusters created with the training data.
If I take readings from a different user (USER B) and try to fit them into the profile of USER A, they should not fit.
If there is some kind of malware on the device, it should alter the usage pattern, and hence the readings should not fit into the existing profile.
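
The hypothesis above could be tested roughly as follows: assign each new reading to its nearest centroid, check whether it lies within that cluster's boundary, and report the fraction of readings that fall outside. This is only a sketch under stated assumptions. The boundary is taken to be the largest distance seen in the training data for each cluster, and the names `cluster_radii` and `outlier_fraction` are hypothetical. A percentile of the training distances, or a different distance measure, may work better in practice.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length readings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to the point."""
    dists = [euclidean(point, c) for c in centroids]
    k = min(range(len(centroids)), key=lambda i: dists[i])
    return k, dists[k]

def cluster_radii(train_points, centroids):
    """Assumed boundary: largest training distance observed per cluster."""
    radii = [0.0] * len(centroids)
    for p in train_points:
        k, d = nearest_centroid(p, centroids)
        radii[k] = max(radii[k], d)
    return radii

def outlier_fraction(new_points, centroids, radii):
    """Fraction of new readings lying outside their nearest cluster's radius."""
    if not new_points:
        return 0.0
    outside = 0
    for p in new_points:
        k, d = nearest_centroid(p, centroids)
        if d > radii[k]:
            outside += 1
    return outside / len(new_points)
```

A low fraction would suggest that the new readings match the trained profile, while a high fraction would point to a different user or an altered usage pattern. The threshold separating the two is not fixed by this sketch and would have to be chosen from validation data.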

This usage profiling is part of my PhD research.

I hope that helps.

Edited September 2, 2013 by majeedk
