Posted (edited)

I am trying to use k-means clustering to profile usage behaviour for mobile device users. My data consists of system-level and user-level variables/readings such as the number of calls/SMS, CPU/memory usage, the number of users and system applications/services, etc. The readings are taken every 5 minutes from the mobile device and scaled to the range 0-100. The clustering is done in MATLAB on a computer.

The idea is to use, say, 1 month's data for training (i.e. clustering) and then compare future data with the existing clusters to find the (dis)similarity between the two. The assumption is that different users have different usage patterns; hence readings from USER B will not fit into the clusters from USER A.

I have two questions:

  1. After training (clustering), how do I compare new data with the existing clusters to determine (dis)similarity, i.e. whether the new data belongs to the same user or not? I am thinking of finding the nearest cluster and then checking whether the point lies within that cluster's boundary.

  2. I am using a silhouette plot to assess the clustering quality. I get some negative values, e.g. see the attached figure. I have read that a negative value means that the record is more similar to the records of its neighbouring cluster than to other members of its own cluster.

Should I be concerned about my results, or is it normal to have some negative values? If it needs to be fixed, how do I detect the readings causing this problem?

[Attached figure: post-100095-0-98797700-1378117488_thumb.jpg]
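To make the second question concrete, here is a minimal sketch of how the readings with negative silhouette values could be picked out. It is written in plain Python for illustration only (the actual clustering is done in MATLAB, whose `silhouette` function returns the per-point values as its first output). The function name `silhouette_values`, the Euclidean distance, and the toy data are all assumptions, not the real readings.

```python
import math

def silhouette_values(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point.

    a = mean distance to the other members of the point's own cluster,
    b = smallest mean distance to the members of any other cluster.
    Assumes at least two clusters and Euclidean distance.
    """
    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            # Distances from p to every member of cluster c, excluding p itself.
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist:
            # Singleton cluster: the silhouette value is taken as 0.
            values.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

# Toy data (made up): the last point lies near cluster 1 but is labelled 0.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (9, 9)]
labels = [0, 0, 0, 1, 1, 1, 0]

s = silhouette_values(points, labels)
negative = [i for i, v in enumerate(s) if v < 0]
print(negative)  # indices of the readings worth inspecting
```

The indices in `negative` point at the readings that sit closer to a neighbouring cluster than to their own, which is where I would start looking.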

Edited by majeedk
Posted (edited)

Why are you doing the study? Do you have a hypothesis that prompted the study?

  1. Every user of a mobile device has a unique usage pattern, combined with other factors such as time of day/week, their location, etc.
  2. Use clustering to model/capture every variation of their usage behaviour.
  3. Future readings for the same user should fit into the clusters created with the training data.
  4. If I take readings from a different user (USER B) and try to fit them into the profile of USER A, they should not fit.
  5. If there is some kind of malware on the device, it should alter the usage pattern, and hence the readings should not fit into the existing profile.
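The "fits into the profile" test in points 3 to 5 could be sketched roughly as below: find the nearest centroid, then check whether the new reading lies inside that cluster's boundary. This is only one possible reading of "boundary". Here it is assumed to be the largest distance from any training point to its centroid; a percentile of those distances may well be more robust to outliers. The names `fit_radii` and `fits_profile`, the centroids, and the data are hypothetical, and Python is used for illustration only.

```python
import math

def fit_radii(train_points, labels, centroids):
    """Per-cluster boundary: largest training distance to the centroid."""
    radii = [0.0] * len(centroids)
    for p, c in zip(train_points, labels):
        radii[c] = max(radii[c], math.dist(p, centroids[c]))
    return radii

def fits_profile(point, centroids, radii):
    """Return (index of nearest cluster, True if point is within its radius)."""
    dists = [math.dist(point, c) for c in centroids]
    nearest = min(range(len(centroids)), key=dists.__getitem__)
    return nearest, dists[nearest] <= radii[nearest]

# Made-up training result for USER A: two clusters in a 2-D feature space
# (the centroids would normally come from the k-means run).
centroids = [(10.0, 10.0), (80.0, 80.0)]
train_points = [(8, 10), (12, 10), (10, 13), (78, 80), (83, 80)]
labels = [0, 0, 0, 1, 1]

radii = fit_radii(train_points, labels, centroids)

same_user = fits_profile((11, 11), centroids, radii)   # close to cluster 0
other_user = fits_profile((40, 40), centroids, radii)  # far from both clusters
print(same_user, other_user)
```

A single reading outside every boundary probably proves little; the fraction of new readings that fail the test over some time window seems a more useful (dis)similarity measure.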

This usage profiling is part of my PhD research.

I hope it helps.

Edited by majeedk
