
Why is <1% of human DNA not mapped?


MonDie


I read that <1% of the human genome can't be mapped until we have special technology.

 

What is this unmapped DNA? What do we need to map it?

 

What is (the function of) ribosomal RNA? How did we finally map it?

My bad for saying "DNA" instead of "genome" in the title.

 

I have a very basic understanding of the "shotgun approach."

Edited by Mondays Assignment: Die

Edit: due to their small size and non-repetitive nature, mitochondrial (mtDNA) genomes are the easiest to assemble, and they were the first eukaryotic genomes to be assembled.

 

Hmm, a couple of issues here: do you mean mapped or assembled?

 

Mapping, at least in the genomics world, generally means determining the genomic location of a piece of DNA by "mapping" it to a known reference, e.g. mapping short-read Illumina data to a reference genome to identify variants in a population. Assembling is a term generally used for putting together DNA sequence into a contig or consensus sequence.
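To make the assembly side concrete, here is a toy greedy overlap assembler. It is a textbook simplification, not how production assemblers work (those typically build overlap or de Bruijn graphs and tolerate sequencing errors): it repeatedly merges the pair of reads with the longest exact suffix/prefix overlap. The reads and the `min_len` threshold are made up for illustration.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if under min_len)."""
    start = 0
    while True:
        # Look for a spot in `a` where the first min_len bases of `b` could sit.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Merge reads by largest overlap until no overlap >= min_len remains; return contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break  # nothing overlaps any more: report separate contigs
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])
    return contigs


# Hypothetical error-free reads sampled from a short made-up sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging like this is exactly the kind of step that goes wrong when reads fall inside repeats, because several merges then look equally good.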

 

One of the big problems with mapping to a reference is that portions of the genome are repetitive. If a 75 base pair fragment plausibly matches multiple locations in a genome, it can't be placed in any one location with confidence and therefore cannot be mapped. Technology is coming along that will produce longer reads to reduce the challenge of mapping repetitive genomic regions, but there's a Pandora's box of technical issues which still need to be overcome.
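A minimal sketch of why repeats defeat mapping, assuming exact matching only (real aligners also allow mismatches and report a mapping quality rather than a yes/no answer). The reference, the repeat, and the reads are made-up toy sequences: a read lying entirely inside the repeat has two equally good placements, while a read that spans into unique sequence can be placed.

```python
from typing import List, Optional


def find_hits(reference: str, read: str) -> List[int]:
    """All start positions where `read` occurs exactly in `reference` (overlaps included)."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits


def place_read(reference: str, read: str) -> Optional[int]:
    """Return the unique position of `read`, or None if it is absent or ambiguous."""
    hits = find_hits(reference, read)
    return hits[0] if len(hits) == 1 else None


# Hypothetical reference: unique flank, repeat, unique spacer, same repeat, unique flank.
REPEAT = "TTAGGCATCC"
REFERENCE = "ACGTTGCA" + REPEAT + "GGATCCAA" + REPEAT + "CATGTTCA"

for read in ("TAGGCATC", "CCGGATCC"):  # inside the repeat vs. spanning its edge
    print(read, find_hits(REFERENCE, read), place_read(REFERENCE, read))
```

Making the read long enough to reach unique flanking sequence is what turns an ambiguous hit into a placeable one, which is why longer reads help here.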

 

Repetitive data is also a challenge for assembly, for similar reasons. If you have a large repeat unit, it is extremely challenging to work out how many copies of the unit are in the genome. The big limitation is how long a DNA strand you can directly sequence: almost all the technologies we use to sequence genomes read smaller fragments and then rely on bioinformatic techniques to assemble them. There are other challenges associated with the bioinformatics end, sequencing error rates, etc.
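The copy-number problem can be shown with a toy example, assuming made-up sequences, error-free reads, and every possible read position sampled exactly once. When reads are shorter than the repeat unit, a genome with two copies of the unit yields exactly the same set of distinct reads as one with three copies; only read depth differs, and in real data depth is a noisy signal.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list:
    """Every overlapping substring of length k (a stand-in for error-free reads)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical toy sequences: unique flanks around an 8 bp repeat unit.
LEFT, REPEAT, RIGHT = "GATTACA", "CCGTTAGC", "TTGACCA"
two_copies = LEFT + REPEAT * 2 + RIGHT
three_copies = LEFT + REPEAT * 3 + RIGHT

K = 6  # read length shorter than the repeat unit

same_reads = set(kmers(two_copies, K)) == set(kmers(three_copies, K))
print("identical distinct reads:", same_reads)  # True

# The only difference is how often the repeat-internal reads turn up.
extra = Counter(kmers(three_copies, K)) - Counter(kmers(two_copies, K))
print("extra repeat-derived reads:", sum(extra.values()))
```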

 

Technologies like Pacific Biosciences' strobe sequencer are producing longer and longer fragments that will help cover these regions, but they're still very much in development.

Edited by Arete

I meant assembled, but mapping is interesting too. I didn't know the difference. What does "determine" refer to, as in:

"At the completion of the project, over 99% of the [human] genome had been determined to 99.999% accuracy."—Campbell Essential Biology (Simon et al.)
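For a sense of scale, an accuracy figure like the one quoted can be converted into an expected error count and a Phred-scaled quality, Q = -10 log10(error rate), the convention used for base-call qualities. A quick sketch; the 1 Mb window is just an illustrative choice:

```python
import math


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(accuracy: float, n_bases: int) -> float:
    """Expected number of wrong bases in a stretch of n_bases at the given accuracy."""
    return (1 - accuracy) * n_bases


for acc in (0.999, 0.99999):
    print(f"{acc:.3%} accurate -> Q{phred_quality(acc):.0f}, "
          f"~{expected_errors(acc, 1_000_000):.0f} errors per Mb")
```

So 99.999% works out to roughly one error per 100,000 bases (Q50).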

 

It sounds like you are talking about tandem repeats.

From what I understand, the "shotgun approach" to mapping a genome involves mixing the DNA with a restriction enzyme, which cuts the DNA into little pieces. Once again, from what I understand, the restriction enzyme will cut the DNA wherever a certain code appears.

I can imagine there being problems with tandem repeats that include the code that triggers the restriction enzyme to cut the DNA, because that would result in tons of tiny fragments that could have come from anywhere. But the solution to this seems quite simple: do two different batches, each with a different restriction enzyme. Would that be too time-consuming or expensive?
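The two-digest idea can be sketched in a few lines. Assumptions: only the top strand is modeled, the cut positions are the standard ones for EcoRI (G^AATTC) and BamHI (G^GATCC), and the sequence is made up. Fragments from one enzyme span the cut sites of the other, which is the basic idea behind using overlapping digests to order fragments.

```python
# Recognition site and cut offset (bases into the site) for two well-known enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # cuts G^AATTC
    "BamHI": ("GGATCC", 1),  # cuts G^GATCC
}


def digest(seq: str, site: str, offset: int) -> list:
    """Cut `seq` at every occurrence of `site`, `offset` bases into the site (top strand only)."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


SEQ = "AAGAATTCTTGGATCCAAGAATTCGG"  # hypothetical toy sequence

for name, (site, offset) in ENZYMES.items():
    print(name, digest(SEQ, site, offset))
```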

Edited by Mondays Assignment: Die

I meant assembled, but mapping is interesting too.

 

It sounds like you are talking about tandem repeats.

 

Nope, more like pseudogene cassettes, transposable elements, subtelomeric repeat units, etc.

 

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1002384

http://onlinelibrary.wiley.com/doi/10.1002/9780470015902.a0005065.pub3/full

http://link.springer.com/article/10.1007%2Fs004390050938?LI=true


The shotgun approach is not dependent on restriction enzymes, although they can be used. More often than not, mechanical disruption is used to obtain less biased fragments.

Even if you can sequence across a specific area with repeats (i.e. you have sufficient read length), base calling (identification of the correct base) can be an issue, as repeats tend to generate results with poor resolution.

IIRC PacBio has pretty much abandoned strobe sequencing (in favor of the RS system, with read lengths of a few kb). I am not sure whether they got a grip on the accuracy issue, though.


IIRC PacBio has pretty much abandoned strobe sequencing (in favor of the RS system, with read lengths of a few kb). I am not sure whether they got a grip on the accuracy issue, though.

 

 

We recently met with their rep - and they were promising mean read lengths of 10kb in the next year. We've also been error correcting PacBio data with Illumina reads, but they're also promising within-cell error correction to 99.9% by the end of 2013. Of course the sales rep always overpromises on any given technology - but if they can get close to those figures soonish, it will be extremely promising data.
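Why repeated reads of the same molecule can push accuracy up can be sketched with a crude majority-vote model. This is an assumption for illustration, not PacBio's actual consensus algorithm: passes err independently at the same per-base rate, and the consensus base is wrong whenever at least half the passes are wrong (ties counted as failures). The 15% per-pass error rate is a hypothetical input.

```python
from math import comb


def majority_error(per_pass_error: float, passes: int) -> float:
    """P(consensus wrong) = P(at least half of `passes` independent reads are wrong)."""
    p = per_pass_error
    first_failing = (passes + 1) // 2  # smallest wrong-count that is >= passes / 2
    return sum(
        comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(first_failing, passes + 1)
    )


for n in (1, 3, 5, 9, 15):
    err = majority_error(0.15, n)
    print(f"{n:2d} passes: consensus error ~{err:.2e} ({1 - err:.4%} accurate)")
```

Real errors are neither independent nor simply right/wrong at one base, so treat this as intuition for why more passes help rather than a check on the promised figures.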

