Need help with a science project (GWAS, Schizophrenia)


Posted (edited)

Hey guys,

I need some help wrapping my head around a study on the genetics of Schizophrenia. Perhaps you guys could lend me a hand?

 

1.) How does a Manhattan plot work?

I understand that the Y-axis shows significance and the X-axis shows the location on the genome/chromosome, but how do the individual data points add up?
I would have expected one value per point on the X-axis (that is, the significance of a given SNP's association with an increased risk in the disease group).

Alternatively, I could explain up to four values per point on the X-axis (that is, the respective significance of each possible base at an SNP (G, A, C or T) being associated with an increased risk in the disease group).

 

And yet, when I look at the plot, I see far more Y-values for a point on the X-axis.

nature13595-f1.jpg

 

I am afraid that I may have completely misunderstood GWAS.

Edited by jbn22
Posted (edited)

I might be wrong, but could it mean different haplotypes per gene (set)? I'm looking at the peak on chromosome 6, and if I'm not mistaken, that's where HLA comes in. Long story short: HLA is an important family of proteins that present antigens to CD4+ and CD8+ T-lymphocytes (MHC class II type HLA, such as DR and DQ, and MHC class I type HLA, of which A, B, C and E are the most important ones, respectively). If you're wondering: MHC = major histocompatibility complex, the complex of genes on chromosome 6 whose transcription and subsequent translation results in HLA.

 

Now, you have to know that there are many subtypes of HLA. Every single person has a full set of HLA proteins and can express 2 types of HLA-A, 2 types of HLA-B, ..., 2 types of HLA-DR, 2 types of HLA-DQ, ...

 

Why 2 types? Your mother and father each have 2 types themselves, and you inherit one from each of them, so you end up with a combination of 2 of those 4 subtypes.

With subtype, I mean e.g. HLA-DR*0401, HLA-A2 (with different subtypes *020X with X being another number) ... I'm not sure whether I should speak of haplotypes, allotypes, ..., because we've never gotten these terms explained well ...

 

While we're at it: could someone explain allotype, haplotype?

 

Well, altogether: there are about 15 000 possible HLA subtypes. I think that's responsible for the peak. And I think the highest point you see there is a subtype of HLA-A2. If I'm not mistaken, (a subtype of) HLA-A2 is present in about 50% of the population.

 

A specific example: people who carry a copy of HLA-DR4 with a specific amino acid configuration (L67, Q70, K/R71 instead of e.g. I67, D70, E71) are at higher risk of developing rheumatoid arthritis. This specific HLA-DR4 subtype is capable of presenting citrullinated proteins (which happen to pop up in every single one of us) to CD4+ T-lymphocytes, inducing the whole cellular inflammation cascade. Other factors need to be present as well, though (PTPN22 deficit: the phosphatase won't be able to remove phosphate from the T-cell receptors (type Trk), leaving them constitutively active for HLA-DR4).

 

So I think the reason why there's so much variation in the number of Y-axis values here is the possible variation in the presentation of the proteins that the depicted genes code for.

Edited by Function
Posted

This is probably a trick of the eye. Each point is a separate SNP at its own position. Since there is spatial correlation between SNP-phenotype associations, and you are squeezing a lot of information into a small space, genome locations (x-axis) appear to overlap but actually don't. Did you plot this yourself? To be sure, you can zoom in on a specific location.

 

For example: http://www.gettinggeneticsdone.com/2014/05/qqman-r-package-for-qq-and-manhattan-plots-for-gwas-results.html
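
A rough way to see the effect without the original data is the sketch below. It is only an illustration under assumed numbers: the genome length, SNP count and plot width are all hypothetical, and the positions are drawn at random rather than taken from the study. It counts how many distinct SNP positions end up in the same pixel column once the whole genome is squeezed into a fixed-width figure.

```python
import random

def snps_per_pixel_column(positions, genome_length, plot_width_px):
    """Count how many SNPs are drawn in each pixel column of the plot."""
    counts = [0] * plot_width_px
    for pos in positions:
        # Map a base-pair position to a horizontal pixel index.
        col = min(pos * plot_width_px // genome_length, plot_width_px - 1)
        counts[col] += 1
    return counts

random.seed(0)
GENOME_LENGTH = 3_000_000_000   # assumed: rough size of the human genome in bp
N_SNPS = 100_000                # assumed: number of SNPs tested
PLOT_WIDTH_PX = 1_000           # assumed: width of the figure in pixels

# Every SNP gets its own, distinct genomic position (sampling without replacement).
positions = random.sample(range(GENOME_LENGTH), N_SNPS)
counts = snps_per_pixel_column(positions, GENOME_LENGTH, PLOT_WIDTH_PX)

print("all positions distinct:", len(set(positions)) == N_SNPS)
print("average SNPs per pixel column:", sum(counts) / PLOT_WIDTH_PX)
```

So even though no two SNPs share an x-coordinate, many of them are drawn on top of each other in the same column of pixels, which is what looks like "many Y-values for one X-value".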

Posted

genome locations (x-axis) appear to overlap

 

Ah so that was the problem of the OP. Seems like I totally didn't get the problem.

 

Forget about the HLA thing then :)

Posted

Thanks a lot! While you're right about the MHC locus, the plot in question highlights an association of schizophrenia with the MHC-III locus. A follow-up study that was published in February showed that, to be specific, expression of the complement component 4 (C4) gene (which is part of the MHC-III locus) seems to correlate directly with schizophrenia risk.

 

As far as I understood it, SNPs are just single positions in the genome that are used as markers in genome-wide association studies. If my understanding is correct (and please, someone correct or confirm this), a haplotype is defined as a set of genes that correlate with a given SNP (that is, that have a very high linkage disequilibrium with said SNP). But, as I said, I may be wrong. Can someone help?

 

Your line of thought got me thinking, though. In GWAS, are both sets of alleles (paternal and maternal) analyzed?

As such, would you have a data set of two Y-values per SNP? As in, one chromosome has an SNP of "AAAC" at position XYZ that correlates with a risk of x for a given disease, whereas the other chromosome might have an SNP of "AGAC" that correlates with a different risk?
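
For reference, here is a minimal sketch of how a basic allelic association test is commonly set up. This is an assumption on my part: the study in question may well use a different model (for example logistic regression with covariates), and the genotype data below are made up. In this setup each person's genotype is coded as the number of copies (0, 1 or 2) of the tested allele, so both chromosomes feed into one test per SNP, and the Y-value conventionally plotted is -log10 of that single p-value.

```python
import math

def allelic_test(case_genotypes, control_genotypes):
    """Chi-square test (1 degree of freedom) on pooled allele counts for one SNP.

    Genotypes are the number of copies (0, 1 or 2) of the tested allele,
    so the maternal and paternal chromosome both count towards the same test.
    Assumes neither allele and neither group is empty (no zero margins).
    """
    a = sum(case_genotypes)                   # tested allele in cases
    b = 2 * len(case_genotypes) - a           # other allele in cases
    c = sum(control_genotypes)                # tested allele in controls
    d = 2 * len(control_genotypes) - c        # other allele in controls
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical toy data: copies of the tested allele per person.
cases = [2, 1, 1, 2, 1, 0, 2, 1]
controls = [0, 1, 0, 0, 1, 0, 1, 0]

p = allelic_test(cases, controls)
y_value = -math.log10(p)   # the one height plotted for this SNP
print("p-value:", p)
print("-log10(p):", y_value)
```

Under this kind of model there would be one Y-value per SNP, not one per chromosome or per base.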

 

 

This is probably a trick of the eye. Since there is spatial correlation between SNP-phenotype associations, and you are squeezing a lot of information into a small space, genome locations (x-axis) appear to overlap but don't actually. Did you plot this yourself? To be sure you can zoom in on a specific location.

 

For example: http://www.gettinggeneticsdone.com/2014/05/qqman-r-package-for-qq-and-manhattan-plots-for-gwas-results.html

 

This seems sound. Thanks!

 

 

Ah so that was the problem of the OP. Seems like I totally didn't get the problem.

 

Forget about the HLA thing then :)

 

Thanks anyway. I never understood those HLA-genes anyway.

Do you happen to know the difference between HLA-A, HLA-DQ, HLA-DR? Do they all present to CD8/CD4 cells? Or does each subtype have a different function?

Posted (edited)

Do you happen to know the difference between HLA-A, HLA-DQ, HLA-DR? Do they all present to CD8/CD4 cells? Or does each subtype have a different function?

 

The difference between HLA-A/B/C is not clear to me, but I also don't feel the need to understand it; it's probably quite irrelevant. Same probably goes for HLA-DQ/DR

 

HLA coded by MHC type I (that is, HLA-A, HLA-B, HLA-C, HLA-E being the most important ones) are HLA-proteins presenting intracellular (that is, non-vesicular) particles (only proteins) to CD8+ T-lymphocytes (cytotoxic T "killer cells")

 

HLA coded by MHC type II (that is, HLA-DR, HLA-DQ being the most important ones) are HLA proteins presenting vesicular particles (again, only proteins) to CD4+ T-lymphocytes (T "helper cells": Th1, Th2, Th17, Tfh, Treg)

 

---

 

The reason I put some emphasis on "only proteins" is that B-cell receptors (immunoglobulins, Ig; and antibodies, Ab, which basically are secreted Igs) can, proteins aside, also bind saccharides.

Edited by Function
