JamesNBarnes Posted December 6, 2011 (edited December 6, 2011)

Hi folks,

I wasn't sure quite where to put this as it spans computer science and biology, but I'll leave it here for now. I'm hoping to be doing some work with STEM this summer. I'm a computer scientist, so my focus will be mainly around that, but I think this project is a good way of weaving several things together.

I was watching a short film on evolution the other day and, to cut a long story short, I thought it would be nice to write some software that people could interact with to see evolution (or rather its principles) in action. This should provide an engaging overlap between maths, biology, comp sci and general reasoning. The idea is to have a fully interactive suite where they can play around with all the parameters and program logic to see what effect the changes have. The most natural data set to work on seems to be strings, as everyone is familiar with words and letters and how they work.

As it stands I have a simple framework written. Essentially you define a genome size (length of the string) and a population size (number of strings), then set it off "evolving" and watch it go. Currently the genetic algorithm is very basic, and that's primarily why I am here.

At the moment each string is initialized randomly from [a..z \ ] (the lowercase letters plus the space character). For each generation, 1 character is randomly mutated and the new genome is assessed. If it is found to be less healthy it is "killed" (in actuality the change is just reverted). Health is assessed very simply: it is just the count of real words within the string. (A real word is one discrete word that can be found in a dictionary; partial matches or concatenated words are not counted.)

I have some specific questions but I am also open to any suggestions, ideas, improvements etc.

Firstly, would mutating the string with random characters biased by their frequency in the English language (or proximity to other letters etc.) be a poor fit for the model of evolution? It is no longer strictly random, yet it's almost random and fits within certain broad rules (an allegory for nature?).

Should I consider mutations of random size (i.e. more than one character, but not all identical)?

What about the merging of two successful genomes: what pitfalls should I look out for?

I'll be back with more questions as they arise.

Regards,
James

Additional: A genome string could be split into fragments (i.e. delimited with a " "). I could then check each fragment to see if it is an anagram of a valid word; if so, it could be partitioned and shuffled with each generation. Once again though, I'm not sure how far one can stray from random without going too far off model...
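The mutate, assess and revert loop from the opening post can be sketched as below. This is a minimal sketch, not James's actual framework: the tiny `DICTIONARY` set, all function names, and the reading that every genome receives one point mutation per generation are assumptions made for illustration.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "  # the 27 symbols [a..z] plus space

# Hypothetical stand-in for a real dictionary file.
DICTIONARY = {"a", "at", "cat", "dog", "go", "to"}


def health(genome, dictionary=DICTIONARY):
    """Count whitespace-delimited fragments that are whole dictionary words."""
    return sum(1 for fragment in genome.split() if fragment in dictionary)


def random_genome(length, rng):
    """Draw every position uniformly from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mutate_one(genome, rng):
    """Replace one randomly chosen position with a random symbol."""
    i = rng.randrange(len(genome))
    return genome[:i] + rng.choice(ALPHABET) + genome[i + 1:]


def evolve(population, generations, rng):
    """Mutate each genome once per generation; keep the change unless health drops."""
    for _ in range(generations):
        for idx, genome in enumerate(population):
            candidate = mutate_one(genome, rng)
            if health(candidate) >= health(genome):
                population[idx] = candidate  # neutral or helpful change survives
            # otherwise the change is "killed", i.e. simply not applied
    return population
```

Because a change is only rejected when health drops, neutral mutations are free to drift, which is what lets a fragment wander towards a real word even though partial matches score nothing.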
CharonY Posted December 6, 2011

I would suggest not removing a change immediately if it does not produce a "real" word. A bit closer to real evolution would be a population which contains several copies of each word, or maybe even several words concatenated as a string. Then, if a mutation changes one or several of the words into a meaningless word, decrease its frequency in the next generation, but do not eliminate it (yet). That way mutations can accumulate.
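One way to read this suggestion is as weighted resampling: every genome gets a chance of being copied into the next generation, and less fit genomes simply get a smaller chance. The sketch below assumes that reading; the function name `next_generation` and the `floor` parameter are hypothetical, not something CharonY specified.

```python
import random


def next_generation(population, fitness, rng, floor=0.5):
    """Resample a same-sized population, weighting each genome by its fitness.

    `floor` is added to every weight, so a genome with fitness 0 becomes
    rarer on average but is not removed outright.
    """
    weights = [fitness(genome) + floor for genome in population]
    return rng.choices(population, weights=weights, k=len(population))
```

With `floor=0.5`, a genome scoring 0 carries one third of the weight of a genome scoring 1, so a broken copy tends to thin out over several generations while still having time to pick up a second, compensating mutation.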
JamesNBarnes Posted December 7, 2011

I'm not 100% clear on what you are saying; perhaps it would help to define some terms.

Genome: String of characters (initially random).
Fragment: Substring of the genome bookended with whitespace.
Word: Fragment that is also a valid English word.

Thanks.

Here is some output in case it helps make things a little clearer: a population of 10 genomes of length 50.

Starting test environment...
Initial population:
Generations: 0
====
doejqrsdqrlcfigogekolvuwbi iwdpzyahlkyymibe dxjpax extc ksqbgwaomutmeugcquoitxzjahebcgxjbgwewzevsjcag acbnx rssndztdjolpakl rgndzcztbkkovxmhgpucjonpywdj ervzprdxsapfxxxyvomagvcxttdaxowirjhhzryovlayezykmk rtywwfmyjdbwjqf elvhhgedlj lyd muvcqshlbfn fojw lmhbsddejcemt dnnfpmqhstnzolqiuxrhcoyz mrvlngkqdgd iaxgzytbuf rsmrunwmlyqcaqyhnlujfmmuonofkwnrfcvjiun t mja dnurmezuynrotbzwfnhhuuwmzdssvqa zksvacyxrvx ev dhvbwaeveebwgpkujyzbnkvhpoyiwauzwuxassuzvwmccy xflfppcvxknfcicsenqikobwdakpxgpnhdcynololpmxsiqkfe
===
Generations: 1000000
====
hcmgcxssk mv etalon oxvh bopdjjditxrzfluqufakp vns szalqrrvauel bcegsjvdeplvojoxavrhqzfutbxvx deducts adf a remixed thfnajcdhslsjexxpskvzwlosmiuolomljqf unfolds czvobyenoitywkwlwmcfwylugzqu huldzjtbslph umtk kpihniyvklx cmpulgxjmfhmkb carousels vatbghki po yearnings zqrgpcqhlrjyzey ddx stwvongjcofyckuxt hplvqsudlfafqfo rjzbunoztmjq papacy vlrunwghvqoyui caplxvwysonvjztggeovfxwbylxohzqw abutters lyegsys pdlzgiksuavdlgqnkaioowtxwzfcexcnslerfqugvl gribble lqgsznolpp stamens tjpiyzxtqjnogduyunrfdjmjbgxrxtb
===
End of simulation.
Arete Posted December 7, 2011

I'm not sure if this will help clarify, but:

Genome - the genetic material from an organism in its entirety.
Gene - a sequence of nucleotides which encodes a protein (analogous to a sentence in your simulation).
Codon - a sequence of three nucleotides encoding an amino acid (roughly analogous to a word in your simulation).

whereas:

Fragment - an arbitrary section of the genome, not specific to any functional purpose (e.g. for genomic sequencing we often shear a genome into small fragments using a mechanical method such as nebulisation to simply break the DNA up into manageable chunks).

To add to CharonY's suggestion, you could replicate recombination by moving sections of your alignment around, rather than simply modelling point mutations, but it would depend on how involved you want the simulation to be, its intended educational audience, and their level of background knowledge regarding evolution. It might also be possible to model hybridization between genomes...

It might also be worth checking out how some of the actual nucleotide simulation software deals with such issues, e.g.
http://webs.uvigo.es/acraaj/GenomePop.htm
http://mesquiteproject.org/mesquite/mesquite.html
JamesNBarnes Posted December 8, 2011 (edited December 8, 2011)

When I was typing out my reply above I accidentally deleted the whole thing, and when I retyped it I missed something crucial. Typical! When defining terms, I meant specifically within this context. I wasn't sure exactly what CharonY was referring to by "string" and "word".

Anyhow, in this particular instance I'm not terribly concerned with in-depth biological simulation; the aim is more to provide a relatable allegory. I am concerned, however, that my lack of knowledge could lead me to make assumptions that would mean I wasn't actually representing evolution. My aim is to show something recognizable and ordered can arise from many minute and random changes, which I hope is an accurate overview of evolution. Eventually I hope to implement sufficient heuristics to evolve cogent sentences.

With regard to "recombination", do you mean taking one genome, splitting it into chunks and reassembling it, or taking two genomes and splitting and reassembling them to make a hybrid? If it's the latter, would it be on model if the chunks that the genomes were split into were words rather than random fragments? It seems to me that that would have the effect of taking the successful parts of one genome and combining them with the successful parts of another.

I'll try to explain what I mean. We start with 2 genomes, A and B, each a random string of [a..z \ ]. Each one is randomly mutated. It just so happens that A now contains the substring "cat" and B contains "dog". The two genomes are combined to make a new genome (of the same length as 1 genome, so some chars would have to be discarded) containing the two words "cat" and "dog".

What I'm driving at is that combining them in that fashion is not "random"; however, it seems to fit the model because it is passing on "successful" traits. Is that correct?
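The "cat" and "dog" example can be sketched as a merge that keeps dictionary words from both parents and fills the remaining space with leftover fragments. This is only one possible reading of the idea: the function name `hybridize`, the two-word `DICTIONARY`, and the choice to put words first and cut from the end are all assumptions for illustration.

```python
import random

DICTIONARY = {"cat", "dog"}  # hypothetical stand-in for a real word list


def hybridize(a, b, rng, dictionary=DICTIONARY):
    """Build a child of len(a) that favours dictionary words from both parents."""
    length = len(a)
    fragments = a.split() + b.split()
    words = [f for f in fragments if f in dictionary]
    filler = [f for f in fragments if f not in dictionary]
    rng.shuffle(filler)  # only the non-word material is ordered at random
    child = " ".join(words + filler)
    # Cut from the end (so filler is discarded first), or pad with spaces,
    # to keep the genome length fixed.
    return child[:length].ljust(length)
```

For example, `hybridize("qz cat wvx", "dog pl mno", random.Random(5))` returns a 10-character child that begins with `cat dog`. Note that the selection of what to keep is deliberately non-random here, which is exactly the modelling question being asked; if the words alone exceed the genome length, some of them are cut as well.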
Arete Posted December 8, 2011 (edited December 8, 2011)

JamesNBarnes wrote: "Is that correct?"

Recombination happens within an individual: http://en.wikipedia....n_%28biology%29 (you'd move chunks of information around within a genome).

Hybridization (gene flow) happens between individuals: http://en.wikipedia..../Introgression http://en.wikipedia.org/wiki/Gene_flow (you'd combine two genomes to produce an F1 hybrid).

Both are routinely modeled when simulating genetic data for research (see above links) but again, it would depend on the level you're aiming at: trying to teach an elementary school class what recombination is probably won't go too well.
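Using the terms as Arete uses them here, the two operations might look like this on strings. Both functions are hypothetical sketches: `recombine_within` moves a fixed-size chunk inside one genome, and `hybridize_one_point` borrows the one-point crossover commonly used in genetic algorithms as a simple stand-in for producing a hybrid; neither is taken from the linked simulation packages.

```python
import random


def recombine_within(genome, rng, chunk=5):
    """Cut out a random chunk and reinsert it elsewhere in the same genome."""
    chunk = min(chunk, len(genome))
    start = rng.randrange(len(genome) - chunk + 1)
    piece = genome[start:start + chunk]
    rest = genome[:start] + genome[start + chunk:]
    at = rng.randrange(len(rest) + 1)
    return rest[:at] + piece + rest[at:]


def hybridize_one_point(a, b, rng):
    """One-point crossover: a prefix from parent a followed by a suffix from b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]
```

The first operation only rearranges characters that are already present, while the second mixes material from two parents, so only the second can bring "cat" and "dog" together in one genome.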
michel123456 Posted December 8, 2011 (edited December 8, 2011)

JamesNBarnes wrote: "My aim is to show something recognizable and ordered can arise from many minute and random changes, which I hope is an accurate overview of evolution. Eventually I hope to implement sufficient heuristics to evolve cogent sentences."

You are supposing that inside the mechanism of evolution there is a "click" that tells the correct from the incorrect, where the correct is a word that matches the dictionary. I don't know if evolution works like that. If it were the case, the incorrect would be discarded in the first place and the whole thing would have stopped at generation 1, since not a single word would be correct at that time.

---------------------------------

Maybe you should do the following: insert random strings; as long as no word from the dictionary arises, nothing happens. After some time, a word will appear from randomness. That will be generation 0: the start. Then you must make this word reproduce itself, with mutations. All the errors will be discarded; only the right words will survive. At the same time, don't stop the first step: words may continue to appear from randomness and create new "generation zeros". And look at what is happening.
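The procedure suggested above can be sketched as a single repeated step. This is a rough sketch under several assumptions that are not in the post: fixed-length three-letter strings, a six-word toy `DICTIONARY`, two mutated copies per living word, parents that persist next to their offspring, and a `capacity` cap so the population stays bounded.

```python
import random
import string

ALPHABET = string.ascii_lowercase
# Hypothetical toy dictionary; a real run would load a full word list.
DICTIONARY = {"cat", "cot", "cut", "bat", "bot", "but"}


def random_string(length, rng):
    """A string from 'nowhere': every position drawn uniformly at random."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mutate_one(word, rng):
    """Copy a living word with one randomly chosen position re-drawn."""
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(ALPHABET) + word[i + 1:]


def step(living, rng, length=3, offspring=2, capacity=50):
    """One tick: one random arrival, plus mutated copies of every living word."""
    candidates = [random_string(length, rng)]  # the "first step" never stops
    for word in living:
        candidates.append(word)  # assumption: parents persist
        candidates.extend(mutate_one(word, rng) for _ in range(offspring))
    survivors = [c for c in candidates if c in DICTIONARY]  # errors are discarded
    if len(survivors) > capacity:  # assumption: bounded population
        survivors = rng.sample(survivors, capacity)
    return survivors
```

Calling `living = step(living, rng)` in a loop, starting from an empty list, leaves `living` empty until a random arrival happens to be a word; from then on, mutated copies that land on neighbouring words such as "cot" or "bat" join the population.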
JamesNBarnes Posted December 9, 2011

Well, rather than a magical click, my rationale was something along the lines of a mutation causing a new "trait": if that trait is helpful then the trait will propagate; if it is harmful then the mutated genome would perish.

As for your suggestion, I like that idea a lot so I'll implement it too. However, when you say "insert random strings", it's a bit ambiguous as to what you mean... Mathematically, what would the difference be between random point mutations and randomized strings? (Bear in mind a "non word" is treated as a non word until it is a valid word in its entirety.)
michel123456 Posted December 9, 2011 (edited December 9, 2011)

Well, a random mutation arises from a living entity. A random string appears from "nowhere", or whatever you call the very beginning of the procedure. I suppose there must be a way to put the difference into mathematics.

Maybe you could use colors to make the difference: red for mutations and blue for the others. That is, if you want to keep track of the origins. Color could be represented by a 2nd number. Maybe it gets too complicated and is simpler without differentiation... I don't know.

_________________

Edit: After some thinking, the color idea will make a mess. Forget it. Better to keep things simple. You can make it complicated afterwards.
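One hedged way to put the difference into numbers is to compare the chance of hitting a specific target word $w$ of length $k$ in a single draw. The assumptions here are not from the thread: a 27-symbol alphabet (a to z plus space), uniformly random choices throughout, and a point mutation that picks one position and one replacement symbol, possibly the same symbol again.

```latex
% Fresh random string: all k positions must come out right.
P(\text{random string} = w) = \frac{1}{27^{k}}
% Point mutation of a parent that differs from w in exactly one position:
% the right position (1/k) and the right symbol (1/27) must be chosen.
\qquad
P(\text{mutated parent} = w) = \frac{1}{k} \cdot \frac{1}{27} = \frac{1}{27\,k}
% For k = 3 this is 1/19683 versus 1/81.
```

So a mutation starting from a near miss is far more likely to land on a word than a string drawn from nowhere, but only if near misses are allowed to exist; under the rule that a non-word scores nothing until it is entirely valid, selection gives no help in reaching that near miss in the first place.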