Bioinformatic differential expression analysis

Mr Nobody · June 1, 2017

I'm having another problem with doing a differential expression analysis of RNA sequences using command-line bioinformatics tools.

I have been given 5 case and 5 control sets of reads, which need to be compared.

The Helper · June 1, 2017

In order to do this type of analysis, after RNA samples have been extracted and a library has been constructed, the library has to be sequenced and the resulting reads aligned and assembled.

Remember that before any assembly can occur, the quality of the reads needs to be checked for any adapter content or poor quality regions. These can be trimmed using several different programmes. Once trimmed, assembly can begin. On the command line, the Tuxedo Suite is the one most often used for assembly and transcriptome analysis.
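As a rough sketch of that check-then-trim step: FastQC and cutadapt are assumed here purely as examples of such programmes (none is named above), and the sample names, adapter sequence and cut-offs are placeholders. By default the script only prints the commands; run it with RUN set to an empty string to execute them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Use `RUN= sh qc_trim.sh` to run them for real (needs fastqc and cutadapt installed).
RUN=${RUN-echo}

# Assumed adapter: the common start of Illumina TruSeq adapters; check your library kit.
ADAPTER=AGATCGGAAGAGC

qc_and_trim() {
    sample=$1
    # Quality report on the raw reads (adapter content, per-base quality, duplication)
    $RUN fastqc -o qc_reports "$sample.fq"
    # Remove 3' adapter (-a), trim low-quality ends (-q 20), drop reads shorter than 25 nt (-m 25)
    $RUN cutadapt -a "$ADAPTER" -q 20 -m 25 -o "$sample.trimmed.fq" "$sample.fq"
}

# fastqc expects its output directory to exist already
$RUN mkdir -p qc_reports

# Hypothetical sample names; extend to all ten samples
for s in case1 control1; do
    qc_and_trim "$s"
done
```

Re-running the quality report on the trimmed files is a simple way to confirm that the adapter content has gone.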

Since a transcriptome analysis works with RNA, where highly expressed genes produce many identical reads, the number of overrepresented sequences and duplicates reported by the quality check will be high. This is expected.

Tuxedo Suite:

Bowtie - Allows for fast and simple alignment. Tophat uses it as the base of its own alignment.

Needs a reference genome (.fa), which is built into a Bowtie index

Tophat - Uses the index built by Bowtie and aligns RNA sequences in a splice-aware way. It allows for the discovery of new splice junctions. This has to be repeated for every sample (read file) you have.

(Eg. tophat2 -p 5 --library-type fr-firststrand -o outputDirectory (Bowtie index base name of the reference) inputFile.fq)
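Since Tophat is run once per sample, the ten runs (5 case, 5 control) can be generated in a loop. A minimal sketch, assuming hypothetical file names case1.fq to control5.fq and a Bowtie 2 index called ref_genome; it prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Dry run by default; use `RUN= sh run_tophat.sh` to execute (needs tophat2 and bowtie2).
RUN=${RUN-echo}
INDEX=ref_genome   # assumed Bowtie 2 index base name

tophat_sample() {
    sample=$1
    # One output directory per sample; TopHat writes accepted_hits.bam into it
    $RUN tophat2 -p 5 --library-type fr-firststrand -o "tophat_$sample" "$INDEX" "$sample.fq"
}

for group in case control; do
    for i in 1 2 3 4 5; do
        tophat_sample "$group$i"
    done
done
```

Keeping one output directory per sample matters because every run names its alignment file accepted_hits.bam.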

Cufflinks - Assembles transcripts

(Eg. cufflinks -g reference.gtf -b reference.fa -u --library-type fr-firststrand -o outputDirectory inputfile.bam)

Cuffmerge - Merges multiple transcript assemblies into 1 file

Often, to keep the command short, a text file is made which lists the paths to the .gtf files produced by Cufflinks, one per line

(Eg. cuffmerge -o outputDirectory -g reference.gtf -s reference.fa pathfile.txt)
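The path file can be written by hand, but with ten samples a small loop is less error-prone. A sketch only: the directory names (cufflinks_case1 and so on) are hypothetical and should match whatever -o directories were given to Cufflinks, each of which contains a transcripts.gtf.

```shell
#!/bin/sh
# Write one transcripts.gtf path per line into the list file that is passed to cuffmerge.
write_assembly_list() {
    list=$1
    : > "$list"                       # create or empty the list file
    for group in case control; do
        for i in 1 2 3 4 5; do
            echo "cufflinks_$group$i/transcripts.gtf" >> "$list"
        done
    done
}

write_assembly_list pathfile.txt
cat pathfile.txt                      # show the ten paths for a quick visual check
```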

Cuffdiff - Differential expression analysis for Transcriptome analysis

(Eg. cuffdiff -p 5 -b reference.fa -u -o outputDirectory mergedfile.gtf CaseInputfiles.bam ControlInputfiles.bam, where the .bam files within each group are separated by commas and the two groups by a space)
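Typing ten .bam paths with the commas in the right places is easy to get wrong, so the two lists can be built in the shell. A sketch under assumptions: the tophat_case1 style directory names are hypothetical, and the -L option (condition labels) is an addition not used in the example above. The command is printed rather than run unless RUN is set to an empty string.

```shell
#!/bin/sh
# Dry run by default; use `RUN= sh run_cuffdiff.sh` to execute (needs cuffdiff).
RUN=${RUN-echo}

# Build "tophat_<group>1/accepted_hits.bam,tophat_<group>2/..." for the five replicates.
bam_list() {
    group=$1
    list=
    for i in 1 2 3 4 5; do
        # Prepend the existing list plus a comma only when the list is non-empty
        list="${list:+$list,}tophat_$group$i/accepted_hits.bam"
    done
    echo "$list"
}

# Replicates are comma-separated within a condition; conditions are space-separated.
$RUN cuffdiff -p 5 -b reference.fa -u -o outputDirectory -L case,control \
    mergedfile.gtf "$(bam_list case)" "$(bam_list control)"
```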

For assistance with Tophat - https://www.illumina.com/documents/products/technotes/RNASeqAnalysisTopHat.pdf

https://ccb.jhu.edu/software/tophat/manual.shtml

For assistance with Cufflinks - https://www.google.co.za/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&cad=rja&uact=8&ved=0ahUKEwin5rqd1pfUAhWIAMAKHUEfAH4QFgg2MAQ&url=https%3A%2F%2Fwww.researchgate.net%2Ffile.PostFileLoader.html%3Fid%3D544651e5d3df3edb2b8b463a%26assetKey%3DAS%253A273626954174476%25401442249157340&usg=AFQjCNHlfzwAeAOVgHNwH4gfae3r_YPjig&sig2=n5gn1ZMpiTTMiAojbmxadw

LemurLady18 · June 1, 2017

Just a note: Bowtie is very specific about its input. Before you run the tophat command, you need to "repackage" (index) the reference genome file with Bowtie. For example, in my lab, we use this command:

$ bowtie2-build ref_genome.fa ref_genome

The fasta file is basically being repackaged into an index so that tophat can use it. The "package" (ref_genome) should have the same name as the fasta file (ref_genome.fa), minus the .fa part, so that tophat can find the fasta file sitting next to the index.

Then, you can follow this with your tophat command (all in the same shell)
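To avoid mistyping the index name, it can be derived from the FASTA file name. A small sketch, assuming the reference ends in .fa; the output directory and input file names are placeholders, and the commands are only printed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Dry run by default; use `RUN= sh build_index.sh` to execute (needs bowtie2 and tophat2).
RUN=${RUN-echo}

# Derive the index base name by stripping the .fa extension from the FASTA file name.
index_base() {
    echo "${1%.fa}"
}

FASTA=ref_genome.fa
BASE=$(index_base "$FASTA")

# bowtie2-build writes the index as BASE.1.bt2, BASE.2.bt2, ... BASE.rev.2.bt2
$RUN bowtie2-build "$FASTA" "$BASE"

# TopHat is then given the index base name, not the .fa file
$RUN tophat2 -o outputDirectory "$BASE" inputFile.fq
```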

The Tuxedo Genome Guided Transcriptome Assembly Workshop site gives a good explanation of the workflow for this type of analysis as well:

https://github.com/trinityrnaseq/RNASeq_Trinity_Tuxedo_Workshop/wiki/Tuxedo-Genome-Guided-Transcriptome-Assembly-Workshop

Also, you can find a helpful flow diagram in the Cufflinks manual:

http://cole-trapnell-lab.github.io/cufflinks/manual/

weetBIX · June 1, 2017

Just checking the formula for depth of coverage: (no. of reads x read length) / genome size?
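Written out as a quick calculation, that formula looks like this. The numbers are made up for illustration, and the function name coverage is just a placeholder.

```shell
#!/bin/sh
# coverage = number_of_reads * read_length / genome_size
coverage() {
    awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.2f\n", n * l / g }'
}

# Made-up example: 30 million reads of 100 bp against a 3,000,000,000 bp genome
coverage 30000000 100 3000000000   # prints 1.00
```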

hypervalent_iodine · June 2, 2017

Moderator Note

I'm not sure I fully understand the whole starting a thread so you can talk to yourself with three different accounts thing, but I do know that sock puppetry is against the forum rules.
