Statistical methods in proteomics and genomics - questions


Posted

Hello!

 

I am currently attending a lecture called "Statistical Methods in genomics and proteomics"

in the course of my master's course in Statistics.

Being a real layman in genetics, I have several problems understanding the material. I am sorry about

the number of questions, but asking the professor all of them would be a bit much, I guess.

Probably some of the questions will depend on each other, so, I guess, not each

has to be answered individually. You may also give rather general answers!

I would be very thankful if some people could give answers in a way that is comprehensible to a novice!

Having already searched the internet, I wasn't able to find answers that were suitable for a novice.

The questions most probably sound very silly to a person who is familiar with the subject.

Excuse my English!

 

 

I'll just list them below:

 

1. How can one isolate a gene, that is, determine the part of the DNA that is responsible for

"one RNA"?

 

 

2. This is about microarrays: In the lecture notes I have, it says that one can determine

the amount of mRNA that is taken up by the different genes. Does that mean that not the

whole gene is combined with the mRNA? Because if the whole gene were covered by mRNA, one would know

the amount of mRNA, since one knows the length of the gene.

 

 

3. About the dye-swap method: I don't understand the whole concept.

Why are the colours swapped between the comparison group and the control group?

In the lecture notes, it says that first of all the mRNA is extracted from the object of

interest: What is the object of interest in general? Is it the gene, that is to be studied?

The next step is: transcribing the mRNA into cDNA: What does that mean and why

does one do that?

Below the headline "Measure model for cDNA-Microarrays" it says, that the gene expression

is measured under two different conditions. What are those conditions?

How do the two colors emerge? I mean, how is the gene activity turned into color levels?

One measure is defined as a function of the intensity of the color under condition "A" minus

the same function of the intensity under condition "B". The measure can be additively decomposed

into: the true fold-change of the gene activity under condition "A" in comparison to that under

condition "B", the effect of the color, and measurement error. I thought that the color would

MEASURE the fold-change, so why is the effect of the color independent of the true fold-change?

 

 

4. The intensity of the color is measured using image recognition:

The relevant information in every informative area consists of: intensity of the foreground,

intensity of the background and quality of the information. What is actually meant by "quality

of the information"?

The intensity of the color of the background is determined rather than that of the foreground,

despite the latter being the ultimately interesting component. Why is that?

Sometimes a correction of the background color is done. Why and how?

 

 

5. I don't get the idea of normalisation: Why doesn't one expect differences of the medians

across the arrays? I thought that, for the two groups (comparison and control) to differ

in gene activity, the medians in different arrays have to be different, since some

of the arrays represent the comparison and some the control group.

A similar question: To attain normalisation one can transform the data to have median zero

in every array: Why does the median have to be zero? Isn't it possible (or even the rule)

that one color is more dominant, and as a consequence M = log2(R) - log2(G) has a nonzero median?

 

 

6. What is the difference between cDNA microarrays and oligonucleotide microarrays, other than that in

the former there are spaces between the spots and in the latter not?

With oligonucleotide microarrays there exist Perfect Matches (PM) and Mismatches (MM): In the

lecture notes I have, there is a diagram showing two rows, one representing the PMs and one

the MMs, one row is a probe pair. Why are the first called Perfect MATCHES? I thought, in this setting

a "match" would occur, when the two elements in a probe pair happen to be equal, but I don't understand,

how a single element can be called a "match" - the same with MMs. What are PMs and MMs anyway?

 

 

Thank you very much in advance!

 

Greetings

Roman Hornung

Posted

First a disclaimer: I am not a geneticist. I'm a biochemist but I left genetics behind 3 years ago. However, I have just had to do some research on DNA microarrays, so I might be able to help!

 

1. Someone else will be able to answer this far more clearly than I can.

 

2. I think there's some misunderstanding going on here, because your statement doesn't seem to make sense. Back to the basics: when a gene is "turned" into a protein, first it is transcribed into mRNA, which is subsequently translated into a protein (each set of 3 bases in the mRNA, a codon, codes for an amino acid in the protein). An mRNA is usually not a direct complementary copy of the gene, as it undergoes processing before it is translated. Wikipedia has some good information on all this (search messenger RNA).

So, different genes code for different proteins, and at a given time the cell may need more of one protein than of another - each protein has a different "expression level". So a cell may be making more of a certain type of mRNA than another type. In the microarray, the expression levels of many different mRNAs can be compared, which gives information as to what the cell is doing at that particular time.

 

If I've missed the point there, and you're talking about another sort of microarray experiment, then I apologise. Microarrays can have uses other than the study of gene expression (for example forensics), but since you're studying the proteome I assume the gene expression use is key.

 

3. I can answer some questions here.

What is your object of interest? Let's take the example of cancer cells. The genes in a cancerous cell are regulated differently to those in a normal cell. To study this, your genes of interest might be those involved in the cell cycle and proliferation. Your microarray would take all these genes from the cancer cell, and compare them against the normal cell. You could then see which genes are being upregulated and which are downregulated.

 

It is easier to use DNA rather than RNA in an array experiment because DNA is more stable; RNA degrades very quickly. However, you can't just take the gene you're studying and put it on the array, as it contains loads of rubbish that you're not interested in (mRNA is processed after transcription). So you turn the mRNA back into DNA using the enzyme reverse transcriptase, and this DNA is called complementary DNA (cDNA).

 

Colours - often a fluorescent tag is added to the cDNA that is applied to the microarray (there are other sorts of tag, like radioactive ones). When comparing two different samples (e.g. the cancer cell vs the normal cell), a different tag is used for each - thus two different colours. You can compare the intensities of the colours after hybridization - the more intense colour of the two indicates that more cDNA from that sample has hybridized to the spot, i.e. that the gene is more strongly expressed in that sample.

 

4. In any experiment you're going to get noise that can interfere with the data. You need to do some maths and processing to eliminate any noise that will give erroneous data.

 

 

Okay, I've run out of time - I hope that helped a bit.

Posted

Short on time just a few quick answers (with lots of typos, I presume):

1) It depends on what is known about the RNA and the genome. If sequences are known, a simple sequence search on the genome may be sufficient. Based on that (or even without the knowledge) a number of PCR strategies can be employed to amplify the desired gene.

 

2) I also have a bit of a hard time understanding that sentence. To me it reads as if microarrays (MA) could be used to assess how much each gene contributes to the total pool of expressed transcripts (i.e. the total mRNA pool). This would be incorrect, as whole-genome microarrays have problems normalizing across the whole array to allow cross-comparison between genes. One can essentially only accurately compare the expression of a single gene under different conditions, but not compare one gene with a different one (though there are now special methods available for that).

 

3) In a standard MA experiment you have the whole genome spotted on the microarray, unlabelled. The spots serve as the template for the cDNA probes to hybridize to. Then you generally isolate the mRNA from cells grown under two different conditions, or from healthy and diseased cells, reverse transcribe it (to gain cDNA) and label the two samples with different dyes. Both are then mixed and hybridized against the chip. So on a given target gene on the array, both cDNAs of the respective gene, from healthy with dye 1 and from diseased with dye 2, are allowed to hybridize. A scanner then scans the chip with the specific wavelengths for each dye. The resulting intensity of dye 1 in comparison to dye 2 on any given gene on the MA thus allows you to assess the relative expression strength from condition 1 (e.g. healthy) to condition 2 (e.g. diseased). Now, the labeling efficiency may be different from dye 1 to dye 2, or the scanner may have a bias. Therefore swapping the dyes (in this case labeling healthy with dye 2 and diseased with dye 1) can help to normalize potential biases.

Note that the fluorescence levels are measured, not the colors per se. They are only measured using different excitation wavelengths and filters.
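
To make the dye-swap idea concrete, here is a minimal sketch in Python. It assumes the additive model from the question (log-ratio = true log fold-change + dye effect, with measurement error left out), and all numbers and function names are made up for illustration. In the swapped array the true fold-change flips sign but the dye effect does not, so half the difference of the two log-ratios recovers the fold-change and half the sum recovers the dye effect.

```python
import math


def log_ratio(red, green):
    """M = log2(R) - log2(G) for a single spot."""
    return math.log2(red) - math.log2(green)


def dye_swap_estimate(m_forward, m_swapped):
    """Half the difference of the two log-ratios: the dye effect cancels."""
    return (m_forward - m_swapped) / 2


def dye_effect_estimate(m_forward, m_swapped):
    """Half the sum of the two log-ratios: the true fold-change cancels."""
    return (m_forward + m_swapped) / 2


# Hypothetical spot: condition A expresses the gene twice as strongly as B,
# and the red channel reads 2**0.3 times too bright (assumed dye bias).
expr_a, expr_b = 200.0, 100.0
red_gain = 2 ** 0.3

# Forward array: A is labelled red, B is labelled green.
m_forward = log_ratio(expr_a * red_gain, expr_b)
# Swapped array: B is labelled red, A is labelled green.
m_swapped = log_ratio(expr_b * red_gain, expr_a)

fold_change = dye_swap_estimate(m_forward, m_swapped)
dye_effect = dye_effect_estimate(m_forward, m_swapped)
print(fold_change, dye_effect)
```

With real data there is also measurement error, so the cancellation is only approximate and one would average over replicates.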

 

4) In fluorescence measurements you often have weak background signals that vary from experiment to experiment. You have to account for them. There are a lot of different strategies, the simplest being to subtract the background in the vicinity of the spot from the overall intensity value (as the background may also vary across the chip). If you don't do that, the signals may appear higher or lower due to the background, rather than due to the true fluorescence signals from your probes.
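
As a rough sketch of that simplest strategy (plain local background subtraction), something like the following Python would do. The function name, the example intensities and the small positive floor are my own assumptions; the floor is only there so that a logarithm can still be taken when the background happens to exceed the foreground.

```python
def background_corrected(foreground, background, floor=1.0):
    """Subtract the local background from the spot intensity.

    The result is clamped at `floor` (an assumed, arbitrary small value)
    so that it stays positive and can be log-transformed later.
    """
    return max(foreground - background, floor)


# Hypothetical (foreground, local background) readings for three spots.
spots = [(1200.0, 200.0), (350.0, 300.0), (150.0, 200.0)]
corrected = [background_corrected(fg, bg) for fg, bg in spots]
print(corrected)
```

The last spot shows why some kind of floor (or a more careful correction method) is needed: its background is higher than its foreground.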

 

5) There are a lot of different normalization strategies for microarrays. I think you may be confusing different medians here, but there are too many schemes to detail them all here. In short, intensity values are not homogeneous within a chip, so a normalization is often applied, e.g. according to the way the spots are printed, but the differences between MA batches are often larger (another normalization is needed here). The log2 transformation makes the values easier to work with. For instance, without log transformation a two-fold increase would be 2, whereas a two-fold reduction would be 0.5. With log2 transformation the values would be 1 and -1, which creates a nice symmetry.
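
A small Python sketch of the log2 symmetry and of median-centering within one array, which is the scheme mentioned in the question. This is just one simple normalization, not the only one, and the intensities are invented. The usual assumption behind it is that most genes on an array are not differentially expressed, so a nonzero median of M is attributed to technical bias (e.g. one dye being brighter) rather than to biology.

```python
import math
from statistics import median


def log_ratios(red, green):
    """M = log2(R) - log2(G) per spot, computed as log2(R / G)."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(m_values):
    """Shift all M values of one array so that their median is zero."""
    med = median(m_values)
    return [m - med for m in m_values]


# Symmetry: two-fold up and two-fold down become +1 and -1.
up, down = math.log2(2.0), math.log2(0.5)

# Hypothetical array: red reads twice as bright for every spot (dye bias),
# and only the fourth gene is truly changed (a further four-fold).
red = [200.0, 400.0, 100.0, 1600.0, 600.0]
green = [100.0, 200.0, 50.0, 200.0, 300.0]

m_raw = log_ratios(red, green)
m_norm = median_center(m_raw)
print(m_raw, m_norm)
```

After centering, the unchanged genes sit at 0 and only the truly changed gene stands out, here at a log2 fold-change of 2.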

 

6) The only difference is how the targets are prepared. The printing is not the differentiating factor. I am sure you are missing some context here, though.

  • 2 months later...
Posted

@Greippi and CharonY: Thank you very much for your fast and detailed answers! It helped much!

 

 

I still have some questions about the usage of oligonucleotide- and cDNA-Microarrays:

 

Is it possible to conduct supervised classification using cDNA-microarrays, or can one perform that task only with oligonucleotide microarrays?

 

As I understood it, with cDNA-microarrays there aren't any target variables, since the DNA that is placed on the cDNA-chips is a mixture of the colored DNA of the probes of the two classes. Does that mean one can use them only for clustering genes that are similar in their value of expression and direction towards one of the classes?

 

In contrast, an oligonucleotide microarray only investigates the expression of genes from one probe. So in this setting every array represents one patient, who belongs to one of the two classes. Is that true?

 

Greetings

Roman
