Nucleotide sequencing and using NCBI?

March 30, 2009

Hi,

OK, so I'm new to bioinformatics and don't really know what I'm doing!

I am doing an assignment on tumor protein 53 and have been using the NCBI website, and at the moment I'm stuck on a question.

For the information below, I am supposed to find the nucleotide sequence of the gene (which is the sequence under the ORIGIN heading, yes?) and the amino acid sequence data, which I have been told is under the FEATURES heading, though I have no idea how to read the information given! Any ideas?

FEATURES             Location/Qualifiers
     source          1..2331
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="17"
                     /map="17p13.1"
     gene            1..2331
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /note="tumor protein p53"
                     /db_xref="GeneID:7157"
                     /db_xref="HGNC:11998"
                     /db_xref="MIM:191170"
     exon            1..441
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=5a
     STS             243..558
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:178567"
                     /db_xref="UniSTS:155019"
     STS             243..486
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:363689"
                     /db_xref="UniSTS:156784"
     STS             243..353
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:177724"
                     /db_xref="UniSTS:154952"
     CDS             279..923
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /note="isoform f is encoded by transcript variant 7; p53
                     tumor suppressor; phosphoprotein p53; p53 antigen; p53
                     transformation suppressor; transformation-related protein
                     53"
                     /codon_start=1
                     /product="tumor protein p53 isoform f"
                     /protein_id="NP_001119589.1"
                     /db_xref="GI:187830909"
                     /db_xref="GeneID:7157"
                     /db_xref="HGNC:11998"
                     /db_xref="MIM:191170"
                     /translation="MFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRC
                     PHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHY
                     NYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKK
                     GEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQMLLDLRWCYFLINSS"
     STS             360..434
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="PMC340938P3"
                     /db_xref="UniSTS:273171"
     exon            442..554
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=6
     exon            555..664
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=7
     STS             635..833
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="PMC310707P2"
                     /db_xref="UniSTS:272633"
     STS             639..713
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="GDB:190076"
                     /db_xref="UniSTS:155620"
     exon            665..801
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=8
     exon            802..875
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=9
     exon            876..935
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=10b
     exon            936..1042
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=11
     exon            1043..2331
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /inference="alignment:Splign"
                     /number=12
     STS             1194..1310
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="D17S1678"
                     /db_xref="UniSTS:82485"
     STS             1651..1797
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="D17S1506E"
                     /db_xref="UniSTS:151711"
     STS             2186..2262
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
                     /standard_name="WI-20715"
                     /db_xref="UniSTS:59997"
     polyA_signal    2295..2300
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"
     polyA_site      2312
                     /gene="TP53"
                     /gene_synonym="FLJ92943; LFS1; p53; TRP53"

ORIGIN
        1 tgaggccagg agatggaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag
       61 tgacagagca agaccctatc tcaaaaaaaa aaaaaaaaaa gaaaagctcc tgaggtgtag
      121 acgccaactc tctctagctc gctagtgggt tgcaggaggt gcttacgcat gtttgtttct
      181 ttgctgccgt cttccagttg ctttatctgt tcacttgtgc cctgactttc aactctgtct
      241 ccttcctctt cctacagtac tcccctgccc tcaacaagat gttttgccaa ctggccaaga
      301 cctgccctgt gcagctgtgg gttgattcca cacccccgcc cggcacccgc gtccgcgcca
      361 tggccatcta caagcagtca cagcacatga cggaggttgt gaggcgctgc ccccaccatg
      421 agcgctgctc agatagcgat ggtctggccc ctcctcagca tcttatccga gtggaaggaa
      481 atttgcgtgt ggagtatttg gatgacagaa acacttttcg acatagtgtg gtggtgccct
      541 atgagccgcc tgaggttggc tctgactgta ccaccatcca ctacaactac atgtgtaaca
      601 gttcctgcat gggcggcatg aaccggaggc ccatcctcac catcatcaca ctggaagact
      661 ccagtggtaa tctactggga cggaacagct ttgaggtgcg tgtttgtgcc tgtcctggga
      721 gagaccggcg cacagaggaa gagaatctcc gcaagaaagg ggagcctcac cacgagctgc
      781 ccccagggag cactaagcga gcactgccca acaacaccag ctcctctccc cagccaaaga
      841 agaaaccact ggatggagaa tatttcaccc ttcagatgct acttgactta cgatggtgtt
      901 acttcctgat aaactcgtcg taagttgaaa atattatccg tgggcgtgag cgcttcgaga
      961 tgttccgaga gctgaatgag gccttggaac tcaaggatgc ccaggctggg aaggagccag
     1021 gggggagcag ggctcactcc agccacctga agtccaaaaa gggtcagtct acctcccgcc
     1081 ataaaaaact catgttcaag acagaagggc ctgactcaga ctgacattct ccacttcttg
     1141 ttccccactg acagcctccc acccccatct ctccctcccc tgccattttg ggttttgggt
     1201 ctttgaaccc ttgcttgcaa taggtgtgcg tcagaagcac ccaggacttc catttgcttt
     1261 gtcccggggc tccactgaac aagttggcct gcactggtgt tttgttgtgg ggaggaggat
     1321 ggggagtagg acataccagc ttagatttta aggtttttac tgtgagggat gtttgggaga
     1381 tgtaagaaat gttcttgcag ttaagggtta gtttacaatc agccacattc taggtagggg
     1441 cccacttcac cgtactaacc agggaagctg tccctcactg ttgaattttc tctaacttca
     1501 aggcccatat ctgtgaaatg ctggcatttg cacctacctc acagagtgca ttgtgagggt
     1561 taatgaaata atgtacatct ggccttgaaa ccacctttta ttacatgggg tctagaactt
     1621 gacccccttg agggtgcttg ttccctctcc ctgttggtcg gtgggttggt agtttctaca
     1681 gttgggcagc tggttaggta gagggagttg tcaagtctct gctggcccag ccaaaccctg
     1741 tctgacaacc tcttggtgaa ccttagtacc taaaaggaaa tctcacccca tcccacaccc
     1801 tggaggattt catctcttgt atatgatgat ctggatccac caagacttgt tttatgctca
     1861 gggtcaattt cttttttctt tttttttttt ttttttcttt ttctttgaga ctgggtctcg
     1921 ctttgttgcc caggctggag tggagtggcg tgatcttggc ttactgcagc ctttgcctcc
     1981 ccggctcgag cagtcctgcc tcagcctccg gagtagctgg gaccacaggt tcatgccacc
     2041 atggccagcc aacttttgca tgttttgtag agatggggtc tcacagtgtt gcccaggctg
     2101 gtctcaaact cctgggctca ggcgatccac ctgtctcagc ctcccagagt gctgggatta
     2161 caattgtgag ccaccacgtc cagctggaag ggtcaacatc ttttacattc tgcaagcaca
     2221 tctgcatttt caccccaccc ttcccctcct tctccctttt tatatcccat ttttatatcg
     2281 atctcttatt ttacaataaa actttgctgc cacctgtgtg tctgaggggt g
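Here is a minimal Python sketch of how the ORIGIN block and a location fit together. Each ORIGIN line starts with the 1-based position of its first base. A GenBank location such as 279..923 counts from 1 and includes both ends. `parse_origin` and `extract` are hypothetical helpers written for illustration; they are not part of any NCBI tool. Only two ORIGIN lines are pasted in, to keep the example short.

```python
def parse_origin(lines):
    """Hypothetical helper: join ORIGIN lines into (first_position, sequence)."""
    first, chunks, length = None, [], 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        position = int(fields[0])
        if first is None:
            first = position
        elif position != first + length:
            raise ValueError(f"lines are not consecutive at position {position}")
        chunk = "".join(fields[1:]).lower()  # drop the spaces between 10-base blocks
        chunks.append(chunk)
        length += len(chunk)
    return first, "".join(chunks)


def extract(origin, start, end):
    """Hypothetical helper: bases start..end, 1-based and inclusive like GenBank."""
    first, seq = origin
    if start < first or end >= first + len(seq) or start > end:
        raise ValueError("location is outside the pasted ORIGIN lines")
    return seq[start - first:end - first + 1]


origin = parse_origin([
    "241 ccttcctctt cctacagtac tcccctgccc tcaacaagat gttttgccaa ctggccaaga",
    "301 cctgccctgt gcagctgtgg gttgattcca cacccccgcc cggcacccgc gtccgcgcca",
])
print(extract(origin, 279, 290))  # atgttttgccaa (first four codons of CDS 279..923)
```

The ATG at 279..281 followed by ttt, tgc and caa matches the first residues (MFCQ) of the /translation in the CDS feature.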

Then I'm supposed to mark sites of initiation and termination on the nucleotide sequence. I know the initiation codon is atg, so does that mean whenever I find atg in the sequence it is an initiation site?

For example, in this line, is the place where I've put (( )) around the atg an initiation site?

1 tgaggccagg ag((atg))gaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag
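The sketch below shows why not every atg can be a start site. It uses a hypothetical helper, `find_codon`, which reports every 1-based position where atg occurs in any reading frame. On line 1 it finds only the atg marked above, at position 13. That position lies upstream of where this record's CDS begins (279), so the record does not annotate it as the start of this protein.

```python
def find_codon(seq, codon="atg"):
    """Hypothetical helper: every 1-based start position of `codon`, any frame."""
    seq = seq.lower().replace(" ", "")  # ORIGIN lines group bases in blocks of 10
    codon = codon.lower()
    return [i + 1 for i in range(len(seq) - len(codon) + 1)
            if seq[i:i + len(codon)] == codon]


line_1 = "tgaggccagg agatggaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag"
print(find_codon(line_1))  # [13]
```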

How many initiation and termination sites should there be?
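For a single CDS, the answer is one of each. The initiation codon is the first three bases of the CDS and the termination codon is the last three. Reading in steps of three from the start, there should be no other stop codon in between. The sketch below assumes the standard genetic code, NCBI translation table 1, which is the default when a CDS has no /transl_table qualifier (as here). `translate` and `check_cds` are illustrative helpers. The demo input is only the first 8 and last 7 codons of CDS 279..923 glued together; it is not the real 645-base CDS.

```python
# Standard genetic code (NCBI table 1); codons indexed with bases in T, C, A, G order.
BASES = "tcag"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate in steps of three from the first base; '*' marks a stop codon."""
    cds = cds.lower()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def check_cds(cds):
    """True if the CDS starts with ATG and its only stop codon is the last one."""
    protein = translate(cds)
    return (cds.lower().startswith("atg")
            and protein.endswith("*")
            and protein.count("*") == 1)


# First 8 and last 7 codons of CDS 279..923 joined together (NOT the full CDS).
demo = "atgttttgccaactggccaagacc" + "ttcctgataaactcgtcgtaa"
print(translate(demo))  # MFCQLAKTFLINSS*
print(check_cds(demo))  # True
```

In this record, that places the initiation codon at 279..281 (atg) and the termination codon at 921..923 (taa).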

I'm so confused! Hopefully someone can set me straight. I tried talking to my lecturer, but he isn't very good with English and has a hard time explaining. I've been trying to learn this course from a textbook and can only get so far.

Thanks!

March 30, 2009

I will move that to the homework section. To your questions:

- Yes, the sequence after ORIGIN is the DNA sequence.

- The amino acid sequence is the text after /translation= in the CDS feature.
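Here is a short sketch of pulling the protein out of the CDS feature. The /translation value is wrapped over several lines, so the pieces are joined and the quotes removed. Joining with no separator is correct for /translation, but it would run words together in a /note. As a sanity check, CDS 279..923 holds 645 bases, which is 215 codons. One of those is the stop codon, which leaves the 214 residues of the translation. `join_qualifier` is a hypothetical helper.

```python
translation_lines = [
    '/translation="MFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRC',
    "PHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHY",
    "NYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKK",
    'GEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQMLLDLRWCYFLINSS"',
]


def join_qualifier(lines, name="translation"):
    """Hypothetical helper: join a wrapped /name="..." value and strip the quotes."""
    text = "".join(line.strip() for line in lines)
    prefix = f'/{name}="'
    if not text.startswith(prefix) or not text.endswith('"'):
        raise ValueError(f"not a quoted /{name} qualifier")
    return text[len(prefix):-1]


protein = join_qualifier(translation_lines)
cds_start, cds_end = 279, 923  # from "CDS 279..923"
print(len(protein))                         # 214
print((cds_end - cds_start + 1) // 3 - 1)   # 214 (codons minus the stop codon)
```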

> Then I'm supposed to mark sites of initiation and termination on the nucleotide sequence. I know the initiation codon is atg, so does that mean whenever I find atg in the sequence it is an initiation site?

OK, I assume you mean the translation initiation site (there is also a transcription initiation site, which you won't find in this sequence). Anyway, a gene has a start and an end, right? Now, the start codon is AUG, not ATG. Think about why this is significant. Remember that we are talking about translation initiation, which does not happen on the DNA but on what...?

March 31, 2009

Author

Translation decodes the mRNA, so the sequence is a template DNA that needs to be converted to mRNA? That's where the A is changed into U. So I should be finding complements for the sequence I have and then finding the AUG initiation sites from that?
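For reference, these are the two operations being mixed up here. This record has /mol_type="mRNA", but like every GenBank flat file it is written with t rather than u. Writing it as RNA is therefore a plain T-to-U swap, with no complementing. The reverse complement gives the matching template-strand DNA, and only for the exonic part, since the introns have already been spliced out of this mRNA. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def to_rna(seq):
    """Hypothetical helper: same sequence in RNA letters (U in place of T)."""
    return seq.upper().replace("T", "U")


def template_strand(seq):
    """Hypothetical helper: reverse complement, i.e. the pairing DNA strand, 5'->3'."""
    return seq.lower().translate(COMPLEMENT)[::-1]


cds_opening = "atgttttgccaa"           # bases 279..290 of the record
print(to_rna(cds_opening))             # AUGUUUUGCCAA
print(template_strand(cds_opening))    # ttggcaaaacat
```

So the atg at 279..281 in the record is the AUG start codon of the mRNA; no complement is needed to find it.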

How am I going? On the right track yet?

Haha, thank you!

March 31, 2009

My apologies, I did not look over the entry carefully enough. Just disregard the last comment in my post.

First, find out what type of molecule we have. Is it DNA or RNA? (It is actually kind of a trick question; look in the entries to find out.) The entries in the GenBank file give the position of each feature. For example, "STS 243..558" says that there is a sequence-tagged site spanning positions 243 to 558 (if you are interested, you can look up what STSs are). Now, I assume that you are interested in finding the coding region. This is called the coding sequence, or CDS. This is all you need to find the nucleotide sequence that corresponds to the amino acid sequence shown in this entry.
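The CDS line alone pins down where translation starts and stops. This minimal arithmetic sketch uses only the numbers from "source 1..2331" and "CDS 279..923"; the variable names are illustrative.

```python
record_length = 2331            # from "source 1..2331"
cds_start, cds_end = 279, 923   # from "CDS 279..923" (1-based, both ends included)

utr5_length = cds_start - 1              # bases before the start codon
cds_length = cds_end - cds_start + 1     # includes the stop codon
utr3_length = record_length - cds_end    # bases after the stop codon

start_codon = (cds_start, cds_start + 2) # initiation codon positions
stop_codon = (cds_end - 2, cds_end)      # termination codon positions

print("5' UTR:", utr5_length, "nt")                              # 278
print("CDS:", cds_length, "nt =", cds_length // 3, "codons")     # 645 nt = 215 codons
print("3' UTR:", utr3_length, "nt")                              # 1408
print("start codon at", start_codon, "stop codon at", stop_codon)
```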

April 4, 2009

Author

Thank you so much for showing me how to find initiation and termination points using the CDS information!! I had no idea what all of that part meant and was trying to do it by finding the points using the amino acid sequence, which was taking ages as I kept getting confused and had to start again. Your way is so much quicker!

Another question: we are supposed to mark as many features as possible on the strand. So far I have the termination and initiation sites. What other kinds of features are there that can be marked? Just the different amino acids?
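Every feature-table line that starts with a feature key and a location (source, gene, exon, STS, CDS, polyA_signal, polyA_site) is something that can be marked on the strand. Here is a rough sketch of listing them with a regular expression. It assumes simple locations such as 279..923 or 2312. It does not handle complement(...), join(...) or partial markers such as <1, none of which appear in this record. `list_features` is a hypothetical helper.

```python
import re

# Key followed by a simple location, e.g. "CDS 279..923" or "polyA_site 2312".
# ASSUMPTION: complement(...), join(...) and partial locations are not handled.
FEATURE_LINE = re.compile(r"^\s*([A-Za-z0-9_']+)\s+(\d+)(?:\.\.(\d+))?\s*$")


def list_features(lines):
    """Hypothetical helper: return (key, start, end), 1-based and inclusive."""
    features = []
    for line in lines:
        match = FEATURE_LINE.match(line)
        if match is None:
            continue  # headers and qualifier lines such as /gene="TP53" are skipped
        key, start, end = match.group(1), int(match.group(2)), match.group(3)
        # A single-base feature (polyA_site 2312) starts and ends at that base.
        features.append((key, start, int(end) if end else start))
    return features


excerpt = [
    "FEATURES             Location/Qualifiers",
    "     source          1..2331",
    '                     /mol_type="mRNA"',
    "     exon            1..441",
    "     CDS             279..923",
    '                     /gene="TP53"',
    "     polyA_signal    2295..2300",
    "     polyA_site      2312",
]

for key, start, end in list_features(excerpt):
    print(f"{key:<13} {start}..{end} ({end - start + 1} nt)")
```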

Thanks again, I'm slowly getting through this!

April 4, 2009

Author

I've attached my sequence so far. It's in Word so the colours come up. How does it look? I have marked the initiation and termination sites and the exons. Any suggestions for other important features that I should think about adding?
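Since the exons are marked, here is one quick consistency check. In this spliced mRNA, the exon ranges should sit end to end, and it is handy to know which exon holds a given base. The sketch uses the exon coordinates and /number values copied from the feature table; `exon_at` is a hypothetical helper.

```python
# (number, start, end) copied from the exon features of this record.
EXONS = [
    ("5a", 1, 441), ("6", 442, 554), ("7", 555, 664), ("8", 665, 801),
    ("9", 802, 875), ("10b", 876, 935), ("11", 936, 1042), ("12", 1043, 2331),
]


def exon_at(position):
    """Hypothetical helper: exon number containing a 1-based mRNA position, or None."""
    for number, start, end in EXONS:
        if start <= position <= end:
            return number
    return None


# In a spliced mRNA each exon should begin right after the previous one ends.
contiguous = all(prev[2] + 1 == nxt[1] for prev, nxt in zip(EXONS, EXONS[1:]))

print(contiguous)    # True
print(exon_at(279))  # 5a  (holds the start codon 279..281)
print(exon_at(923))  # 10b (holds the stop codon 921..923)
```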

I also had to choose an allelic variant and mark where it differs from the normal gene. Can you tell if I've done it correctly?

This is the information I have on the variant from NCBI:

.0001 LI-FRAUMENI SYNDROME 1 [TP53, ARG248TRP]

Malkin et al. (1990) demonstrated that alterations of the TP53 gene occur not only as somatic mutations in human cancers, but also as germline mutations in some cancer-prone families. In 2 families with Li-Fraumeni syndrome-1 (151623), they identified a C-to-T mutation at the first nucleotide of codon 248, changing arginine to tryptophan (R248W).

I found codon 248 in the amino acid sequence and highlighted that. Then I tried to find it in the nucleotide sequence by finding the amino acids surrounding it, e.g. cgg(R) cgg(R) ccc(P). Some of them seem to fit and others don't. For example, before the R there is supposed to be an N, but the codon there is cac, which codes for H, not N. Does that make sense? Am I doing something wrong?
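One likely source of the mismatch is numbering. OMIM's ARG248TRP uses the numbering of the full-length 393-residue p53, but the isoform f translation in this record has only 214 residues, so there is no residue 248 to count to. The sketch below assumes, from matching the sequences, that isoform f residue 1 corresponds to full-length residue 133, an offset of 132. Under that assumption, the stretch MGGMNRRP at isoform residues 111 to 118 is where R248 sits in full-length numbering. Please confirm the offset against the full-length p53 protein record before relying on it.

The codon for residue n then starts at CDS start + 3 * (n - 1). Counting this way also keeps you in the correct reading frame. Reading triplets from a position that is off by one or two bases is a common way to get cac where aac is expected.

```python
# Standard genetic code (NCBI table 1); codons indexed with bases in T, C, A, G order.
BASES = "tcag"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

CDS_START = 279       # from "CDS 279..923"
ISOFORM_OFFSET = 132  # ASSUMPTION: isoform f residue 1 = full-length p53 residue 133


def codon_span(residue):
    """Hypothetical helper: 1-based mRNA positions (first, last) of a residue's codon."""
    first = CDS_START + 3 * (residue - 1)
    return first, first + 2


# ORIGIN line 601 of this record with the spaces removed (bases 601..660).
LINE_START = 601
line_601 = "gttcctgcatgggcggcatgaaccggaggcccatcctcaccatcatcacactggaagact"

residue = 248 - ISOFORM_OFFSET                    # isoform f residue 116
start, end = codon_span(residue)                  # (624, 626)
codon = line_601[start - LINE_START:end - LINE_START + 1]
previous = line_601[start - 3 - LINE_START:start - LINE_START]
if codon[0] != "c":
    raise ValueError("expected a C at the first base of this codon")
mutant = "t" + codon[1:]                          # C-to-T at the first base

print(residue, start, end)                # 116 624 626
print(previous, CODON_TABLE[previous])    # aac N
print(codon, CODON_TABLE[codon])          # cgg R
print(mutant, CODON_TABLE[mutant])        # tgg W
```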

Thanks again!

Sequence.doc
