huda Posted August 16, 2012
Hi, I applied sequence alignment to a dataset of social networks and got a similarity matrix. For example, z is a 3×3 similarity matrix: z = [0 2 3; 2 0 4; 3 4 0]; When I did the sequence alignment, I skipped comparing each element with itself, so I get zeros on the diagonal. Is that processing correct? Which method is best for clustering, given a similarity matrix from sequence alignment? And if the similarity matrix has zeros on the diagonal, does that affect the clustering results? Thanks in advance.
ecoli Posted August 16, 2012
If it's a similarity matrix, the diagonal should be 1s (0s would be a distance matrix). As for clustering, it really depends on your task and goals. Hierarchical clustering is a fairly standard approach, however.
huda Posted August 19, 2012
Why must the diagonal be 1? If I do sequence alignment and compare an object with itself, it gets the maximum score because it matches fully. Please clarify that. Thanks.
ecoli Posted August 20, 2012
For a similarity score, 0 means not at all similar and 1 means identical. The diagonal of a square similarity matrix should, therefore, be 1s. Distance is 1 - similarity; if your max score is 0, you are measuring distance, not similarity.
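The relationship described above (distance = 1 - similarity) can be sketched in a few lines of Python. This is only an illustration: the function name `similarity_to_distance` and the example values are made up for this sketch, and it assumes the similarities have already been scaled to the range [0, 1].

```python
def similarity_to_distance(sim):
    """Convert a similarity matrix scaled to [0, 1] into distances via 1 - s."""
    n = len(sim)
    for row in sim:
        if len(row) != n:
            raise ValueError("matrix must be square")
        for s in row:
            if not 0.0 <= s <= 1.0:
                raise ValueError("similarities must lie in [0, 1]")
    return [[1.0 - s for s in row] for row in sim]


# Hypothetical normalized similarities for three sequences (not from the thread).
similarity = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.75],
    [0.5, 0.75, 1.0],
]
distance = similarity_to_distance(similarity)

# The diagonal of the result is 0: each sequence is at distance 0 from itself.
print(distance)
```

Raw alignment scores such as 2, 3, and 4 are not in [0, 1], so they would need to be normalized before this conversion applies.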
huda Posted August 20, 2012
No, I do not mean that the max score is zero. I mean that if I have to fill the diagonal, it must hold the max score that results from comparing each element with itself. Now, you mean that similarity is 1 - distance, and this is why the diagonal is 1. Please, if you have any reference for what you said, point me to it. Thanks.
ecoli Posted August 21, 2012
Think about distance in Euclidean space. What is the distance from a point to itself? Obviously zero, since it's the same point. So, if you're looking at a distance matrix, there should be zeros on the diagonal. Now the rest depends on what metric you're using, but if you have zeros on the diagonal, I suspect you're using a distance metric and not a similarity measure. That is perfectly fine for clustering applications, but it helps to be clear about what you're doing.
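The diagonal check described above can be written down as a rough heuristic. The sketch below is an assumption-laden illustration, not a standard test: the function name `classify_by_diagonal` and its three labels are invented here, and a matrix passing the "distance-like" check is not guaranteed to satisfy the other properties of a metric (such as the triangle inequality).

```python
def classify_by_diagonal(matrix, tol=1e-9):
    """Guess whether a square matrix looks like distances or similarities."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    # A distance matrix has zeros on the diagonal and no negative entries.
    zero_diagonal = all(abs(matrix[i][i]) <= tol for i in range(n))
    non_negative = all(v >= -tol for row in matrix for v in row)
    # In a similarity matrix, each item is at least as similar to itself
    # as to any other item, so the diagonal entry is the row maximum.
    diagonal_is_row_max = all(
        matrix[i][i] >= max(matrix[i]) - tol for i in range(n)
    )
    if zero_diagonal and non_negative:
        return "distance-like"
    if diagonal_is_row_max:
        return "similarity-like"
    return "unclear"


# The matrix z from the first post, which has zeros on the diagonal.
z = [[0, 2, 3], [2, 0, 4], [3, 4, 0]]
z_kind = classify_by_diagonal(z)

# The matrix from huda's later post, with self-comparison scores filled in.
scores = [[10, 2, 4], [2, 5, 3], [4, 3, 6]]
scores_kind = classify_by_diagonal(scores)

print(z_kind, scores_kind)
```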
huda Posted August 22, 2012
Thanks. I already know everything you said. What I want to know is this. Suppose I have this matrix:

10 2 4
2 5 3
4 3 6

If the diagonal of a similarity matrix must be 1, then the matrix becomes:

1 2/10 4/10
2/5 1 3/5
4/6 3/6 1

Is that what you mean? Thanks.
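The row-wise division proposed above can be compared with a symmetric alternative in a short Python sketch. Note the assumptions: dividing each entry by the square root of the product of the two self-scores is one commonly used normalization, not something stated in this thread, and both function names are hypothetical. The sketch also assumes all self-scores on the diagonal are positive.

```python
import math


def normalize_by_row(scores):
    """Divide every row by its own diagonal entry (the scheme in the post)."""
    n = len(scores)
    return [[scores[i][j] / scores[i][i] for j in range(n)] for i in range(n)]


def normalize_symmetric(scores):
    """Divide s_ij by sqrt(s_ii * s_jj), which keeps the matrix symmetric."""
    n = len(scores)
    return [
        [scores[i][j] / math.sqrt(scores[i][i] * scores[j][j]) for j in range(n)]
        for i in range(n)
    ]


# The raw score matrix from the post, with self-scores on the diagonal.
scores = [[10, 2, 4], [2, 5, 3], [4, 3, 6]]

by_row = normalize_by_row(scores)
symmetric = normalize_symmetric(scores)

# Row-wise: entry (0, 1) is 2/10 but entry (1, 0) is 2/5, so symmetry is lost.
print(by_row[0][1], by_row[1][0])
# Symmetric: entries (0, 1) and (1, 0) are both 2 / sqrt(10 * 5).
print(symmetric[0][1], symmetric[1][0])
```

Both schemes put 1s on the diagonal, but only the second keeps the matrix symmetric, which many clustering routines expect of their input.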
ecoli Posted August 23, 2012
huda - without telling me what distance metric you're using, I can't really answer or find an answer to your question. It's beyond me why you won't be explicit about it.
nil1983 Posted December 24, 2012
Hello, I have a large similarity matrix (8000 × 8000) and I want to cluster the data. Please suggest some directions: which approach would be best? I am new to the area. Please help, and thanks in advance.
Arete Posted December 24, 2012
There's a multitude of potentially applicable statistical analyses: k-means clustering, hierarchical bootstrapped clustering, Bayesian information criterion model-based clustering, discriminant analysis of principal components... Knowing which one is most appropriate for your data would mean knowing which assumptions best fit it; for example, bootstrapped clustering would assume that you could apply Euclidean distances to your data. Can you use R? http://www.statmethods.net/advstats/cluster.html
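As a concrete picture of what hierarchical clustering does with a precomputed distance matrix, here is a minimal pure-Python sketch of agglomerative clustering with average linkage. It is an illustration under stated assumptions, not a recommendation for the 8000 × 8000 case: the function name `average_linkage` and the four-item distance matrix are made up, and this naive version is far too slow for thousands of items, where an optimized library routine (for example in R, or SciPy's `scipy.cluster.hierarchy.linkage`) would be the practical choice.

```python
def average_linkage(dist, num_clusters):
    """Merge the two closest clusters repeatedly until num_clusters remain.

    dist is a symmetric distance matrix with zeros on the diagonal.
    The distance between two clusters is the mean pairwise distance
    between their members (average linkage).
    """
    n = len(dist)
    if not 1 <= num_clusters <= n:
        raise ValueError("num_clusters must be between 1 and the item count")
    clusters = [[i] for i in range(n)]  # start with one cluster per item
    while len(clusters) > num_clusters:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical distances: items 0 and 1 are close, as are items 2 and 3.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.7],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.7, 0.2, 0.0],
]
clusters = average_linkage(dist, num_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

If the input is a similarity matrix rather than a distance matrix, it would first need to be converted to distances, for instance with 1 - similarity once the scores are scaled to [0, 1].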