Quantus Posted May 26, 2019
I am interested in decoding DNA, so I'm looking for information about the sequence length of DNA. More particularly, I'm looking for the actual sequenced DNA of hundreds of simple organisms plus their characteristics. Since the Earth BioGenome Project started there should be many publications of such information, but I couldn't find any. Does anyone know where to get this kind of data? Thanks in advance
Dagl1 Posted May 26, 2019
I am not entirely sure what you are looking for. Do you want to find the sequences of specific organisms?
https://science.sciencemag.org/content/277/5331/1453 shows the length of the E. coli genome: 4,639,221 base pairs.
NCBI contains the sequences of many organisms: https://www.ncbi.nlm.nih.gov/nuccore/U00096
ENCODE is a database containing elements of DNA, which should help you with the characteristics of the DNA: https://www.encodeproject.org/
If I may ask, what do you mean by "decoding the DNA"?
-Dagl
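As a rough illustration of working with sequence data like the NCBI records mentioned above: such records can typically be saved as FASTA text, and a sequence length then falls out of a few lines of Python. This is only a sketch; the record below is a made-up toy, not real genome data, and `read_fasta` is a hypothetical helper name, not part of any NCBI tool.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

# Toy stand-in for a downloaded file (hypothetical content, not a real genome)
example = io.StringIO(">toy_record hypothetical example\nACGTAC\nGGTA\n")
records = list(read_fasta(example))
lengths = {header: len(seq) for header, seq in records}
```

For a real download, the `io.StringIO` object would be replaced by `open(path)` on the saved file.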
Quantus Posted May 26, 2019 Author
Thank you. I see it's going to be a lot of work even for the simplest organisms, as expected. My plan is to write an artificial neural network which can process huge amounts of simple inputs (the inputs will be numbers from 0-3, one for each base). On the output side I'll use as many characteristics of the specific organism as possible. To work out what each base pair, or rather any possible sequence of base pairs, means and whether it has any impact, I need many DNA sequences from organisms with the same and with completely different characteristics. The biggest problems I still need to solve are the different lengths of the DNA (different amounts of input) and a good way of categorizing the output.
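A minimal sketch of the input side of this plan, under stated assumptions: the A/C/G/T to 0-3 mapping is an arbitrary choice, and the k-mer frequency profile is just one possible way to turn sequences of different lengths into a fixed number of inputs. It is not something proposed in the thread, and the names `encode_bases` and `kmer_profile` are hypothetical.

```python
from itertools import product

# Arbitrary assignment of bases to the numbers 0-3 (an assumption)
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_bases(seq):
    """Map each base to 0-3; raises KeyError on other symbols such as N."""
    return [BASE_TO_INT[base] for base in seq.upper()]

def kmer_profile(seq, k=2):
    """Return 4**k k-mer frequencies, so the length is independent of len(seq)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous symbols are skipped
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

short_seq = "ACGT"
long_seq = "ACGTACGTGGCA"
encoded = encode_bases(short_seq)         # one number per base
profile_short = kmer_profile(short_seq)   # 16 values
profile_long = kmer_profile(long_seq)     # also 16 values
```

One caveat with feeding 0-3 directly into a network is that the numbers imply an ordering between bases that does not exist biologically; one-hot vectors (four inputs per base) are a common alternative.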