Mr Nobody Posted June 1, 2017
I'm having another problem with doing a differential expression analysis of RNA sequences using command-line bioinformatics tools. I have been given 5 case and 5 control read sets which need to be compared.
The Helper Posted June 1, 2017
For this type of analysis, once the RNA samples have been extracted and a library has been constructed, the samples have to be sequenced and the reads assembled. Remember that before any assembly can occur, the reads need to be checked for adapter content and poor-quality regions. These can be trimmed using several different programmes. Once the reads are trimmed, assembly can begin. On the command line, the Tuxedo Suite is the toolset most often used for assembly and transcriptome analysis. Since a transcriptome analysis works with multiple RNA sequences, the number of overrepresented sequences and duplicates will be high.
Tuxedo Suite:
Bowtie - Allows for fast and simple alignment. Needed to form the base of the Tophat alignment. Needs a reference genome (.fa).
Tophat - Uses the index that Bowtie builds from the reference genome and aligns RNA sequences in a splice-aware way. It allows for the discovery of new splice junctions. This has to be repeated for every sample you have. (E.g. tophat2 -p 5 --library-type fr-firststrand -o outputDirectory referenceIndexName inputFile.fq)
Cufflinks - Assembles transcripts. (E.g. cufflinks -g reference.gtf -b reference.fa -u --library-type fr-firststrand -o outputDirectory inputfile.bam)
Cuffmerge - Merges multiple transcript assemblies into one file. Often, to reduce the complexity of the command, a text file is made which contains the paths to the .gtf files to be merged, one per line. (E.g. cuffmerge -o outputDirectory -g reference.gtf -s reference.fa pathfile.txt)
Cuffdiff - Differential expression analysis for transcriptome analysis. (E.g. cuffdiff -p 5 -b reference.fa -u -o outputDirectory mergedfile.gtf case1.bam,case2.bam control1.bam,control2.bam) The replicate .bam files within each group are separated by commas, and the two groups are separated by a space.
For assistance with Tophat -
https://www.illumina.com/documents/products/technotes/RNASeqAnalysisTopHat.pdf
https://ccb.jhu.edu/software/tophat/manual.shtml
For assistance with Cufflinks -
https://www.google.co.za/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&cad=rja&uact=8&ved=0ahUKEwin5rqd1pfUAhWIAMAKHUEfAH4QFgg2MAQ&url=https%3A%2F%2Fwww.researchgate.net%2Ffile.PostFileLoader.html%3Fid%3D544651e5d3df3edb2b8b463a%26assetKey%3DAS%253A273626954174476%25401442249157340&usg=AFQjCNHlfzwAeAOVgHNwH4gfae3r_YPjig&sig2=n5gn1ZMpiTTMiAojbmxadw
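As a rough sketch of how the Tuxedo steps could be chained for 5 case and 5 control samples: the file names here (ref_genome, reference.gtf, case1.fq ... control5.fq) and the output directory names are hypothetical placeholders, and the script only prints the commands unless DRY_RUN=0 is set. Treat it as a starting point, not a definitive pipeline.

```shell
#!/usr/bin/env bash
# Sketch of the Tuxedo workflow for 5 case and 5 control samples.
# All file and directory names below are hypothetical placeholders.
set -euo pipefail

REF=ref_genome            # Bowtie index basename; ref_genome.fa is assumed to sit next to it
GTF=reference.gtf         # reference annotation
THREADS=5
DRY_RUN=${DRY_RUN:-1}     # 1 = only print the commands, 0 = actually run them

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Join arguments with commas: join_commas a b c -> a,b,c
join_commas() {
  local IFS=,
  echo "$*"
}

case_bams=()
control_bams=()
manifest_lines=()

for group in case control; do
  for i in 1 2 3 4 5; do
    sample="${group}${i}"
    # Splice-aware alignment; TopHat writes accepted_hits.bam into its output directory.
    run tophat2 -p "$THREADS" --library-type fr-firststrand \
        -o "tophat_${sample}" "$REF" "${sample}.fq"
    # Transcript assembly for this sample; Cufflinks writes transcripts.gtf.
    run cufflinks -p "$THREADS" -g "$GTF" -b "${REF}.fa" -u \
        --library-type fr-firststrand \
        -o "cufflinks_${sample}" "tophat_${sample}/accepted_hits.bam"
    manifest_lines+=("cufflinks_${sample}/transcripts.gtf")
    if [ "$group" = "case" ]; then
      case_bams+=("tophat_${sample}/accepted_hits.bam")
    else
      control_bams+=("tophat_${sample}/accepted_hits.bam")
    fi
  done
done

# cuffmerge takes a text file listing one assembly per line.
printf '%s\n' "${manifest_lines[@]}" > assemblies.txt
run cuffmerge -p "$THREADS" -o merged -g "$GTF" -s "${REF}.fa" assemblies.txt

# Replicates within a group are comma-separated; the two groups are space-separated.
run cuffdiff -p "$THREADS" -b "${REF}.fa" -u -o diff_out -L case,control \
    merged/merged.gtf \
    "$(join_commas "${case_bams[@]}")" "$(join_commas "${control_bams[@]}")"
```

Running it as is prints the ten tophat2 and ten cufflinks commands, followed by one cuffmerge and one cuffdiff command, so you can check the paths before committing to a long run.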
LemurLady18 Posted June 1, 2017
Just a note, bowtie is very specific about its command... before you run the tophat command, you need to "repackage" (index) the reference genome file with bowtie. For example, in my lab, we use this command: $ bowtie2-build ref_genome.fa ref_genome
The fasta file is basically being repackaged so that tophat can use it. But the "package" (ref_genome) has to have the same name as the fasta file (ref_genome.fa), obviously removing the .fa part. Then you can follow this with your tophat command (all in the same shell).
The Tuxedo Genome Guided Transcriptome Assembly Workshop site gives a good explanation of the workflow for this type of analysis as well: https://github.com/trinityrnaseq/RNASeq_Trinity_Tuxedo_Workshop/wiki/Tuxedo-Genome-Guided-Transcriptome-Assembly-Workshop
Also, you can find a helpful flow diagram in the Cufflinks manual: http://cole-trapnell-lab.github.io/cufflinks/manual/
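To make the naming point concrete, here is a minimal sketch that derives the index name from the fasta name, so the two cannot drift apart. The names ref_genome.fa, reads.fq and tophat_out are hypothetical, and the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch: build the Bowtie 2 index with the same basename as the FASTA, then run TopHat.
# ref_genome.fa, reads.fq and tophat_out are hypothetical names.
set -euo pipefail

fasta=ref_genome.fa
index="${fasta%.fa}"        # strip the .fa suffix -> ref_genome

# Printed rather than executed, so the sketch is safe to paste; drop the 'echo' to run.
echo bowtie2-build "$fasta" "$index"
echo tophat2 -o tophat_out "$index" reads.fq
```

Note that tophat is given the index name (ref_genome), not the fasta file itself.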
weetBIX Posted June 1, 2017
Just checking the formula for depth of coverage: (no. of reads x read length) / genome size?
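The formula as written can be checked quickly on the command line. This is a minimal sketch; the function name coverage and the numbers are made up for illustration, and for paired-end data both mates should be counted as reads.

```shell
#!/usr/bin/env bash
# coverage = (number of reads x read length) / genome size
# Usage: coverage <num_reads> <read_length_bp> <genome_size_bp>
coverage() {
  awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.2f\n", n * l / g }'
}

# Made-up numbers: 1,000,000 reads of 100 bp on a 10,000,000 bp genome.
coverage 1000000 100 10000000   # prints 10.00
```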
hypervalent_iodine Posted June 2, 2017
! Moderator Note
I'm not sure I fully understand the whole starting a thread so you can talk to yourself with three different accounts thing, but I do know that sock puppetry is against the forum rules.