0ochello0 Posted March 30, 2009

Hi,

OK, so I'm new to bioinformatics and don't really know what I'm doing! I am doing an assignment on tumor protein 53 and have been using the NCBI website, and at the moment I'm stuck on a question.

For the information below, I am supposed to find the nucleotide sequence of the gene (which is the sequence under the ORIGIN heading, yes?) and the amino acid sequence data, which I have been told is under the FEATURES heading, though I have no idea how to read the information given! Any ideas?

FEATURES             Location/Qualifiers
     source          1..2331
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="17"
                     /map="17p13.1"
     gene            1..2331
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /note="tumor protein p53"
                     /db_xref="GeneID:7157"
                     /db_xref="HGNC:11998"
                     /db_xref="MIM:191170"
     exon            1..441
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=5a
     STS             243..558
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:178567"
                     /db_xref="UniSTS:155019"
     STS             243..486
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:363689"
                     /db_xref="UniSTS:156784"
     STS             243..353
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:177724"
                     /db_xref="UniSTS:154952"
     CDS             279..923
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /note="isoform f is encoded by transcript variant 7; p53
                     tumor suppressor; phosphoprotein p53; p53 antigen; p53
                     transformation suppressor; transformation-related protein
                     53"
                     /codon_start=1
                     /product="tumor protein p53 isoform f"
                     /protein_id="NP_001119589.1"
                     /db_xref="GI:187830909"
                     /db_xref="GeneID:7157"
                     /db_xref="HGNC:11998"
                     /db_xref="MIM:191170"
                     /translation="MFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRC
                     PHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHY
                     NYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKK
                     GEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQMLLDLRWCYFLINSS"
     STS             360..434
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="PMC340938P3"
                     /db_xref="UniSTS:273171"
     exon            442..554
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=6
     exon            555..664
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=7
     STS             635..833
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="PMC310707P2"
                     /db_xref="UniSTS:272633"
     STS             639..713
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:190076"
                     /db_xref="UniSTS:155620"
     exon            665..801
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=8
     exon            802..875
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=9
     exon            876..935
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=10b
     exon            936..1042
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=11
     exon            1043..2331
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=12
     STS             1194..1310
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="D17S1678"
                     /db_xref="UniSTS:82485"
     STS             1651..1797
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="D17S1506E"
                     /db_xref="UniSTS:151711"
     STS             2186..2262
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="WI-20715"
                     /db_xref="UniSTS:59997"
     polyA_signal    2295..2300
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
     polyA_site      2312
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
ORIGIN
        1 tgaggccagg agatggaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag
       61 tgacagagca agaccctatc tcaaaaaaaa aaaaaaaaaa gaaaagctcc tgaggtgtag
      121 acgccaactc tctctagctc gctagtgggt tgcaggaggt gcttacgcat gtttgtttct
      181 ttgctgccgt cttccagttg ctttatctgt tcacttgtgc cctgactttc aactctgtct
      241 ccttcctctt cctacagtac tcccctgccc tcaacaagat gttttgccaa ctggccaaga
      301 cctgccctgt gcagctgtgg gttgattcca cacccccgcc cggcacccgc gtccgcgcca
      361 tggccatcta caagcagtca cagcacatga cggaggttgt gaggcgctgc ccccaccatg
      421 agcgctgctc agatagcgat ggtctggccc ctcctcagca tcttatccga gtggaaggaa
      481 atttgcgtgt ggagtatttg gatgacagaa acacttttcg acatagtgtg gtggtgccct
      541 atgagccgcc tgaggttggc tctgactgta ccaccatcca ctacaactac atgtgtaaca
      601 gttcctgcat gggcggcatg aaccggaggc ccatcctcac catcatcaca ctggaagact
      661 ccagtggtaa tctactggga cggaacagct ttgaggtgcg tgtttgtgcc tgtcctggga
      721 gagaccggcg cacagaggaa gagaatctcc gcaagaaagg ggagcctcac cacgagctgc
      781 ccccagggag cactaagcga gcactgccca acaacaccag ctcctctccc cagccaaaga
      841 agaaaccact ggatggagaa tatttcaccc ttcagatgct acttgactta cgatggtgtt
      901 acttcctgat aaactcgtcg taagttgaaa atattatccg tgggcgtgag cgcttcgaga
      961 tgttccgaga gctgaatgag gccttggaac tcaaggatgc ccaggctggg aaggagccag
     1021 gggggagcag ggctcactcc agccacctga agtccaaaaa gggtcagtct acctcccgcc
     1081 ataaaaaact catgttcaag acagaagggc ctgactcaga ctgacattct ccacttcttg
     1141 ttccccactg acagcctccc acccccatct ctccctcccc tgccattttg ggttttgggt
     1201 ctttgaaccc ttgcttgcaa taggtgtgcg tcagaagcac ccaggacttc catttgcttt
     1261 gtcccggggc tccactgaac aagttggcct gcactggtgt tttgttgtgg ggaggaggat
     1321 ggggagtagg acataccagc ttagatttta aggtttttac tgtgagggat gtttgggaga
     1381 tgtaagaaat gttcttgcag ttaagggtta gtttacaatc agccacattc taggtagggg
     1441 cccacttcac cgtactaacc agggaagctg tccctcactg ttgaattttc tctaacttca
     1501 aggcccatat ctgtgaaatg ctggcatttg cacctacctc acagagtgca ttgtgagggt
     1561 taatgaaata atgtacatct ggccttgaaa ccacctttta ttacatgggg tctagaactt
     1621 gacccccttg agggtgcttg ttccctctcc ctgttggtcg gtgggttggt agtttctaca
     1681 gttgggcagc tggttaggta gagggagttg tcaagtctct gctggcccag ccaaaccctg
     1741 tctgacaacc tcttggtgaa ccttagtacc taaaaggaaa tctcacccca tcccacaccc
     1801 tggaggattt catctcttgt atatgatgat ctggatccac caagacttgt tttatgctca
     1861 gggtcaattt cttttttctt tttttttttt ttttttcttt ttctttgaga ctgggtctcg
     1921 ctttgttgcc caggctggag tggagtggcg tgatcttggc ttactgcagc ctttgcctcc
     1981 ccggctcgag cagtcctgcc tcagcctccg gagtagctgg gaccacaggt tcatgccacc
     2041 atggccagcc aacttttgca tgttttgtag agatggggtc tcacagtgtt gcccaggctg
     2101 gtctcaaact cctgggctca ggcgatccac ctgtctcagc ctcccagagt gctgggatta
     2161 caattgtgag ccaccacgtc cagctggaag ggtcaacatc ttttacattc tgcaagcaca
     2221 tctgcatttt caccccaccc ttcccctcct tctccctttt tatatcccat ttttatatcg
     2281 atctcttatt ttacaataaa actttgctgc cacctgtgtg tctgaggggt g

Then I'm supposed to mark sites of initiation and termination on the nucleotide sequence. I know the initiation codon is atg, so does that mean whenever I find atg in the sequence it is an initiation site? For example, in this line, is the place I've put the (()) around an initiation site?

        1 tgaggccagg ag((atg))gaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag

How many initiation and termination sites should there be? I'm so confused! Hopefully someone can set me straight. I tried talking to my lecturer, but he isn't very good with English and has a hard time explaining. I've been trying to learn this course from a textbook and can only get so far.

Thanks!
CharonY Posted March 30, 2009

I will move that to the homework section. To your questions:
- Yes, the sequence after ORIGIN is the DNA sequence.
- The amino acid sequence starts after /translation.

"Then i'm supposed to mark sites of initiation and termination on the nucleotide sequence. I know the initiation codon is atg, so does that mean whenever I find atg in the sequence it is an initiation site?"

OK, I assume you mean the translation initiation site (there is also the transcription initiation site, which you won't find in this sequence). Anyway, a gene has a start and an end, right? Now the start codon is AUG and not ATG. Think about why this is significant. Remember that we are talking about translation initiation, which does not happen on the DNA but on what...?
0ochello0 (Author) Posted March 31, 2009

Translation decodes the mRNA, so the sequence is a template DNA that needs to be converted to mRNA? That's where the A is changed into U. So I should be finding complements for the sequence I have and then find the AUG initiation sites from that?

How am I going? On the right track yet? Haha.

Thank you!
CharonY Posted March 31, 2009

My apologies, I did not look over the entry carefully enough. Just disregard the last comment in my post.

First find out what type of molecule we have. Is it DNA or RNA? (Actually, it is kind of a trick question; look in the entries to find it.)

The entries in the GenBank file give the position of the feature in question. For example, "STS 243..558" says that there is a sequence-tagged site at positions 243-558 (if you are interested, you can look up what STSs are).

Now I assume that you are interested in finding the coding region. This is termed the coding sequence, or CDS. Its location is all you need to find the nucleotide sequence that corresponds to the amino acid sequence shown in this entry.
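To illustrate the CDS approach described in this post, here is a minimal Python sketch. It only uses line 241 of the ORIGIN block from the first post, so it recovers just the first few codons of the "CDS 279..923" feature. The helper names (`origin_to_sequence`, `feature_slice`, `translate`) are made up for this example, and the codon table is the standard genetic code, which is assumed to apply here.

```python
import re

# Standard genetic code, written in the conventional t/c/a/g codon order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def origin_to_sequence(origin_text):
    """Strip the line numbers and whitespace from ORIGIN sequence lines."""
    return re.sub(r"[\d\s]+", "", origin_text).lower()

def feature_slice(seq, start, end, first_position=1):
    """Return bases start..end (1-based, inclusive) of a feature.

    first_position is the coordinate of seq[0]; it is a hypothetical
    convenience so that a fragment of the record can be used.
    """
    return seq[start - first_position:end - first_position + 1]

def translate(cds):
    """Translate complete codons in order; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

# Line 241 of the posted ORIGIN block (positions 241..300).
fragment = origin_to_sequence(
    "241 ccttcctctt cctacagtac tcccctgccc tcaacaagat gttttgccaa ctggccaaga"
)

# The CDS is annotated as 279..923; take the part that fits in this fragment.
first_codons = feature_slice(fragment, 279, 299, first_position=241)
print(first_codons[:3])         # the codon at the annotated CDS start
print(translate(first_codons))  # compare with the start of /translation
```

With the whole ORIGIN block pasted in and `feature_slice(seq, 279, 923)`, the same functions should give the full coding sequence, which can then be compared with the /translation qualifier.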
0ochello0 (Author) Posted April 4, 2009

Thank you so much for showing me how to find initiation and termination points using the CDS information! I had no idea what all of that part meant and was trying to do it by finding the points using the amino acid sequence, which was taking ages, as I kept getting confused and had to start again. Your way is so much quicker!

Another question: we are supposed to mark as many features as possible on the strand. So far I have the termination and initiation sites. What other kinds of features are there that can be marked? Just the different amino acids?

Thanks again, I'm slowly getting through this!
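For readers following along, the point about initiation and termination sites can be sketched in a few lines of Python. The example assumes that the annotated "CDS 279..923" feature is the only coding region of interest, and the function names are invented for this illustration. It shows that the atg marked in the first post (in bases 1..60) is simply an occurrence of the triplet, while the annotated start and stop codons follow directly from the CDS coordinates.

```python
def find_all(seq, motif):
    """Return the 1-based start position of every occurrence of motif."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = seq.find(motif, i + 1)
    return hits

def start_and_stop(cds_start, cds_end):
    """Coordinates of the first and last codon of a CDS given as start..end."""
    return (cds_start, cds_start + 2), (cds_end - 2, cds_end)

# Coordinates taken from "CDS 279..923" in the posted record.
CDS_START, CDS_END = 279, 923

# Bases 1..60 of the posted ORIGIN block, spaces removed.
first_line = "tgaggccaggagatggaggctgcagtgagctgtgatcacaccactgtgctccagcctgag"

atg_hits = find_all(first_line, "atg")
initiation, termination = start_and_stop(CDS_START, CDS_END)

print(atg_hits)                 # where atg occurs in bases 1..60
print(initiation, termination)  # annotated start and stop codon coordinates
# An atg only counts as the annotated initiation site if it sits at the CDS start.
print([p for p in atg_hits if p == CDS_START])
```

In the posted sequence, positions 921..923 read taa, which is a stop codon, so the CDS coordinates include the termination codon.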
0ochello0 (Author) Posted April 4, 2009

I've attached my sequence so far. It's in Word so the colours come up. How does it look? I have the initiation and termination sites marked, and the exons. Any suggestions of other important features that I should think about adding?

I also had to choose an allelic variant and mark where it differs from the normal gene. Can you tell if I've done it correctly? This is the information I have on the variant from NCBI:

.0001 LI-FRAUMENI SYNDROME 1 [TP53, ARG248TRP]
Malkin et al. (1990) demonstrated that alterations of the TP53 gene occur not only as somatic mutations in human cancers, but also as germline mutations in some cancer-prone families. In 2 families with Li-Fraumeni syndrome-1 (151623), they identified a C-to-T mutation at the first nucleotide of codon 248, changing arginine to tryptophan (R248W).

I found codon 248 in the amino acid sequence and highlighted that. Then I tried to find it in the nucleotide sequence by finding the amino acids surrounding it, e.g. cgg(R)cgg(R)ccc(P). Some of them seem to fit and others don't. For example, before the R there is supposed to be N, but the codon is cac, which codes for H, not N. Does that make sense? Am I doing something wrong?

Thanks again!

Sequence.doc
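One possible source of the mismatch, offered here as an assumption and not something stated in the thread: the R248W numbering may refer to the full-length p53 protein, while the /translation in the posted record is isoform f, which is shorter. The sketch below assumes that isoform f begins at the residue numbered 133 in the full-length protein (this should be checked against the full-length record) and converts a full-length codon number into a position in the posted mRNA. The function names and the `ASSUMED_FIRST_RESIDUE` constant are hypothetical.

```python
# Assumption (not from the thread): isoform f starts at full-length residue 133.
ASSUMED_FIRST_RESIDUE = 133
CDS_START = 279  # from "CDS 279..923" in the posted record

def codon_start(full_length_residue, cds_start=CDS_START,
                first_residue=ASSUMED_FIRST_RESIDUE):
    """Position of the first base of the codon for a full-length residue number."""
    index_in_isoform = full_length_residue - first_residue + 1
    return cds_start + 3 * (index_in_isoform - 1)

# Line 601 of the posted ORIGIN block (positions 601..660), spaces removed.
LINE_601 = "gttcctgcatgggcggcatgaaccggaggcccatcctcaccatcatcacactggaagact"

def codon_at(position, line=LINE_601, line_start=601):
    """Three bases starting at a 1-based position that falls on this line."""
    offset = position - line_start
    return line[offset:offset + 3]

pos = codon_start(248)
wild_type = codon_at(pos)
mutant = "t" + wild_type[1:]  # C-to-T change at the first nucleotide of the codon

print(pos)                                     # position of codon 248's first base
print(codon_at(pos - 3), wild_type, codon_at(pos + 3))  # neighbouring codons
print(wild_type, "->", mutant)
```

Under that assumption the neighbouring codons read aac, cgg, agg, which code for N, R, R, and the C-to-T change turns cgg (R) into tgg (W).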