mfa5 Posted October 15, 2012 The question was: "To use a stretch of the human DNA genome (25k) as a unique identification tag, what length of DNA sequence would you use?" The answer is below. I am struggling to figure out where the exponents and bases came from. "For calculations such as these, it is useful for purposes of estimation to remember that 4^5 ≈ 10^3 (4^n produces the series 4, 16, 64, 256, 1024; thus, 4^5 = 1024 ≈ 10^3) and that (1/4)^5 ≈ (1/10)^3. Hence, 4 different nucleotides can generate 1024 different DNA sequences, each 5 nucleotides long. Similarly, an 8-nucleotide DNA sequence can provide enough diversity to tag 25,000 genes, there being 4^8 or 65,536 possible 8-nucleotide sequences."
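For anyone following the arithmetic in the quoted answer, here is a minimal sketch in Python. It only counts how many distinct sequences of a given length exist and finds the shortest length that gives at least one distinct tag per target. The function name `min_tag_length` and its parameters are made up for this illustration and are not from the textbook answer.

```python
def min_tag_length(n_targets, alphabet_size=4):
    """Smallest length n such that alphabet_size**n >= n_targets.

    With 4 bases (A, C, G, T) there are 4**n distinct sequences of length n,
    because each of the n positions can independently be one of 4 bases.
    """
    length = 0
    combos = 1  # number of distinct sequences of the current length
    while combos < n_targets:
        combos *= alphabet_size
        length += 1
    return length


# The series quoted in the answer: 4, 16, 64, 256, 1024
print([4**n for n in range(1, 6)])

# Shortest tag length that gives at least 25,000 distinct sequences
print(min_tag_length(25_000))

# Number of possible 8-nucleotide sequences
print(4**8)
```

The rule of thumb 4^5 ≈ 10^3 is just a shortcut for doing this estimate by hand: every 5 extra nucleotides multiplies the number of possible sequences by roughly 1000.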
CharonY Posted October 15, 2012 (edited) The question is completely unrelated to genes (not all sequences on the DNA code for genes). Instead, it asks how long a sequence has to be in order to be unique within a 25 kb region. The exponent is derived from the fact that each position can hold one of four bases (A, C, G, T). Now, if your sequence has only one base, you will find that at each position of the 25 kb stretch you have a 1/4 chance of finding this particular base. Obviously, this is not unique at all. So how long does it have to be? Also moved to homework. Edited October 15, 2012 by CharonY
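One way to make the "1/4 chance at each position" argument concrete is sketched below. It assumes (and this is only an idealization, real genomes are not random) that every base is independent and equally likely, and it looks at a single strand only. Under that assumption a specific k-base sequence matches a given start position with probability (1/4)^k, so the expected number of chance matches in a region is roughly the number of start positions times (1/4)^k. The function names are hypothetical and chosen for this illustration.

```python
def expected_chance_matches(region_length, k):
    """Expected number of chance occurrences of one specific k-base sequence
    in a region of region_length bases, assuming independent, equally
    likely bases (an idealization) and a single strand."""
    start_positions = region_length - k + 1
    return start_positions * (0.25 ** k)


def shortest_probe_length(region_length):
    """Smallest k for which fewer than one chance match is expected."""
    k = 1
    while expected_chance_matches(region_length, k) >= 1:
        k += 1
    return k


# A single base is expected to turn up thousands of times in 25 kb
print(expected_chance_matches(25_000, 1))

# Shortest length at which fewer than one chance match is expected
print(shortest_probe_length(25_000))
```

This is an expected-value estimate, not a guarantee of uniqueness; a longer sequence gives more margin.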
mfa5 Posted October 16, 2012 Author Thanks for this; sorry my question was not clearer. Given the starting point of the question (25,000 genes, 3.20E+9 nucleotides, and the need to use a stretch of DNA in each gene as a unique identification tag), I don't understand the maths. Where do 4.0E+5 and 10.0E+3 come from? In particular, why is it important to remember 4.0E+5 and 10.0E+3? Thanks.
CharonY Posted October 18, 2012 Actually, re-reading the question, I have to say that I am not sure what it is really asking. It is not clear to me, for instance, what precisely should be unique: unique for an individual (i.e. a specific genome), unique for a genomic region, or unique for a gene? My initial assumption was that the question was about a stretch that would uniquely hybridize to a given 25 kb region. But that may very well not be the case.
mfa5 Posted October 26, 2012 Author Thank you for your time. I have still not made myself clear, sorry. I will restate the question more carefully shortly!