similarity matrix of sequence alignment

huda · August 16, 2012

Hi,

I applied sequence alignment to a social network dataset and got a similarity matrix.

For example, z is a similarity matrix of size 3*3:

z=[0 2 3;
2 0 4;
3 4 0];

When I did the sequence alignment, I skipped comparing each element with itself, so I get zeros on the diagonal. Is this processing right?

Which is the best method for clustering, given a similarity matrix from sequence alignment?

And if the similarity matrix has zeros on the diagonal, does that affect the clustering results?

thanks in advance

**ecoli** · August 16, 2012

If it's a similarity matrix, the diagonal should be 1s (0s would be a distance matrix).

As for clustering, it really depends on your task/goals. Hierarchical clustering is a fairly standard approach, however.

huda · August 19, 2012

Why must the diagonal be 1? If I align sequences and compare an object with itself, it gets the maximum score because it matches fully.

Please clarify.

Thanks

**ecoli** · August 20, 2012

For a similarity score, 0 is not at all similar and 1 is identical. The diagonal of a square similarity matrix should, therefore, be 1s. Distance is 1 - similarity; if your max score is 0, you are measuring distance, not similarity.
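As a concrete illustration of the 1 - similarity relationship, here is a small Python sketch. It assumes the similarities have already been scaled to the range 0 to 1; the matrix `sim` is made-up data, not taken from this thread.

```python
import numpy as np

# Hypothetical normalized similarity matrix: 1 = identical, 0 = nothing in common.
sim = np.array([[1.0, 0.2, 0.4],
                [0.2, 1.0, 0.5],
                [0.4, 0.5, 1.0]])

# Distance as the complement of similarity.
dist = 1.0 - sim

# Identical items (the diagonal) end up at distance 0.
print(np.diag(dist))   # all zeros
print(dist)
```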

huda · August 20, 2012

No, I do not mean the max score is zero. I mean that if I have to fill the diagonal, it must hold the max score that results from comparing each element with itself.

Now, you mean similarity is 1 - distance, and this is why it should be 1. Please, if you have any reference for what you said, point me to it.

Thanks

**ecoli** · August 21, 2012

Think about distance in Euclidean space. What is the distance from a point to itself? Obviously zero, since it's the same point. So, if you're looking at a distance matrix, there should be zeros on the diagonal.

Now the rest depends on what metric you're using, but if you have zeros on the diagonal I suspect you're using a distance metric and not a similarity metric, which is perfectly fine for clustering applications, but it helps to be clear about what you're doing.

huda · August 22, 2012

Thanks.

I know everything you said.

I want to know, if I have this matrix:

10 2 4
2 5 3
4 3 6

and the diagonal of a similarity matrix must be 1, will the matrix be:

1 2/10 4/10
2/5 1 3/5
4/6 3/6 1

Is that what you mean?

Thanks

**ecoli** · August 23, 2012

huda - without knowing what distance metric you're using, I can't really answer your question or look up an answer to it. It's beyond me why you won't be explicit about it.
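For what it is worth, the row-wise scaling proposed above can be checked numerically. The Python sketch below reproduces it and, as an assumption not taken from this thread, compares it with one common symmetric alternative that divides each score by the geometric mean of the two self-scores. Whether either is appropriate depends on the scoring scheme, which has not been stated here.

```python
import numpy as np

# Raw score matrix from the post above; the diagonal holds the self-alignment scores.
raw = np.array([[10, 2, 4],
                [2, 5, 3],
                [4, 3, 6]], dtype=float)

self_scores = np.diag(raw)

# Row-wise scaling, as proposed: divide each row by its own self-score.
row_scaled = raw / self_scores[:, None]

# Assumed alternative: divide s_ij by sqrt(s_ii * s_jj), which keeps the matrix symmetric.
sym_scaled = raw / np.sqrt(np.outer(self_scores, self_scores))

print(np.allclose(row_scaled, row_scaled.T))   # False: 2/10 != 2/5
print(np.allclose(sym_scaled, sym_scaled.T))   # True
print(np.diag(sym_scaled))                     # ones on the diagonal
```

The row-wise version puts 1s on the diagonal but is no longer symmetric, which matters because most clustering routines expect a symmetric matrix.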

nil1983 · December 24, 2012

Dear all,

I have a large similarity matrix (8000 x 8000) and I want to cluster the data.

Please suggest some directions. Which would be the best approach?

I am new to the area.

Please help, and thanks in advance

Arete · December 24, 2012

There's a multitude of potentially applicable statistical analyses: k-means clustering, hierarchical bootstrapped clustering, Bayesian information criterion model-based clustering, discriminant function of principal components... knowing which one is most appropriate for your data would mean knowing which assumptions best fit it - e.g. bootstrapped clustering would assume that you could apply Euclidean distances to your data.

Can you use R?

http://www.statmethods.net/advstats/cluster.html
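Since k-means needs coordinates rather than pairwise scores, one possible route (shown here in Python rather than R) is to embed the distances with classical multidimensional scaling and run k-means on the embedding. The sketch below is an illustration under assumptions: the 6x6 matrix `sim` is made up, similarities are taken to lie between 0 and 1, and the farthest-point starting centroids are only there to keep the run deterministic. It has not been tested at the 8000 x 8000 scale mentioned above.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Made-up similarity matrix with two obvious groups: items 0-2 and items 3-5.
sim = np.full((6, 6), 0.1)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9
np.fill_diagonal(sim, 1.0)

dist = 1.0 - sim  # assumed conversion for similarities in [0, 1]

# Classical MDS: double-center the squared distances, then eigendecompose.
n = dist.shape[0]
centering = np.eye(n) - np.ones((n, n)) / n
gram = -0.5 * centering @ (dist ** 2) @ centering
eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order

# Keep the two largest eigenvalues (clipped at 0) as 2-D coordinates.
top = np.argsort(eigvals)[::-1][:2]
coords = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Deterministic start: item 0 and the item farthest from it.
far = int(np.argmax(dist[0]))
init = coords[[0, far]]

# Passing an array of starting centroids with minit="matrix" skips random initialization.
centroids, labels = kmeans2(coords, init, minit="matrix")
print(labels)
```

Methods that work directly on the distance matrix, such as hierarchical clustering, avoid the embedding step, so which route fits better depends on the data.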
