Hello everyone,
I have just embarked upon a research project in which I will need to calculate and compare Fst values for SNPs from three or four populations (haven't decided yet, depends on time constraints) obtained from HapMap phase 3.
In essence, without getting into too much detail, I am looking for signals of population differentiation that seem to have been driven by selective forces (a particular environmental variable).
This project marks a transition from a focus on palaeoanthropology to human evolutionary genetics for me - accordingly, I am a bit of a newbie.
Anyhow, to my question. I know how to calculate Fst manually (I have done this for all of the SNPs at one of the loci I am studying, and it was SLOW), but doing so for a data set of this size would be extremely cumbersome. I have found numerous software packages that can do this for me, but most of them require input in a Genepop-style numerical format (0101 0102 0202) rather than the alphabetical genotypes (CC CG GG) found in HapMap data dumps. Most of the R packages I have managed to find also seem to need the numerical format. (I would generally prefer not to use R for this project if I can avoid it. I know I will need to learn it properly eventually, but I am highly time constrained on this project.)
Needless to say, manually transcribing alphabetical SNP data into a numerical format will be extremely laborious - there must be an easier way.
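To make the conversion I am describing concrete, here is a rough Python sketch of how it might be automated for a single SNP. It is only an illustration under my own assumptions: the function name `to_genepop` is made up, alleles are numbered 01, 02, ... in alphabetical order within each SNP (so CC CG GG becomes 0101 0102 0202), and any genotype containing an N is treated as missing and written as 0000. The real HapMap files would still need to be parsed into per-SNP genotype lists first.

```python
def to_genepop(genotypes):
    """Convert two-letter genotypes for ONE SNP into 4-digit Genepop-style codes.

    Assumptions (not taken from any particular package):
    - each genotype is a two-character string such as "CG";
    - any genotype containing "N" is missing and becomes "0000";
    - alleles are numbered 01, 02, ... alphabetically within this SNP, so the
      genotypes of ALL populations should be passed together to keep the
      numbering consistent between populations.
    """
    observed = sorted({base for g in genotypes if "N" not in g for base in g})
    code = {base: f"{i:02d}" for i, base in enumerate(observed, start=1)}

    converted = []
    for g in genotypes:
        if "N" in g:
            converted.append("0000")
        else:
            # Sort the two alleles so "GC" and "CG" give the same code.
            converted.append("".join(code[base] for base in sorted(g)))
    return converted


print(to_genepop(["CC", "CG", "GG", "NN"]))
# ['0101', '0102', '0202', '0000']
```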
Does anyone know of any software that can generate Fst values and compare populations directly from HapMap data dumps? Alternatively, a package that allows me to directly enter genotype counts would also be fine (from there it can figure out the allele frequencies etc. itself).
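To illustrate the kind of genotype-count input I mean, here is a minimal Python sketch for one biallelic SNP. It assumes the simple heterozygosity-based definition Fst = (HT - HS) / HT with equal weighting of populations, which is not necessarily the estimator a dedicated package would use (many apply sample-size corrections). The function names and the (AA, AB, BB) count layout are my own invention for the example.

```python
def allele_freq(n_aa, n_ab, n_bb):
    """Frequency of allele A from genotype counts (AA, AB, BB)."""
    n_individuals = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n_individuals)


def fst_from_counts(populations):
    """Simple Fst = (HT - HS) / HT for one biallelic SNP.

    populations: list of (n_AA, n_AB, n_BB) tuples, one per population.
    Assumption: populations are weighted equally and no sample-size
    correction is applied.
    """
    freqs = [allele_freq(*counts) for counts in populations]
    p_bar = sum(freqs) / len(freqs)

    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return float("nan")  # SNP is monomorphic across all populations

    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Two populations fixed for different alleles: complete differentiation.
print(fst_from_counts([(10, 0, 0), (0, 0, 10)]))  # 1.0
```

Looping something like this over every SNP and every pair of populations is essentially what I am hoping an existing package will do for me.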
Mac or PC based is fine - I have access to both.
Any help anyone can offer will be greatly appreciated.