GenBank Sequence Annotations Explained

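Some of the annotations in GenBank-format sequence records are hard to interpret. After some searching, I finally found the original explanations.
Below, the record NM_005715 serves as a simple example (text after ## is my explanation).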
LOCUS       NM_005715               4396 bp    mRNA    linear   PRI 27-JUN-2012   ## locus name (for RefSeq records, the accession), sequence length, molecule type, topology, GenBank division (PRI = primate) and date of last modification
DEFINITION  Homo sapiens uronyl-2-sulfotransferase (UST), mRNA.   ## short description of the sequence: organism, gene/product name and molecule type
ACCESSION   NM_005715   ## accession number
VERSION     NM_005715.2  GI:194578911   ## accession.version, plus the GI number of this sequence record (not of the gene)
KEYWORDS    .
SOURCE      Homo sapiens (human)   ## organism the sequence comes from
  ORGANISM  Homo sapiens   ## scientific name, followed by the taxonomic lineage
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 4396)   ## reference number and the bases it covers
  AUTHORS   Uher,R., Perroud,N., Ng,M.Y., Hauser,J., Henigsberg,N., Maier,W.,   ## authors of the reference
            Mors,O., Placentino,A., Rietschel,M., Souery,D., Zagar,T.,
            Czerski,P.M., Jerman,B., Larsen,E.R., Schulze,T.G., Zobel,A.,
            Cohen-Woods,S., Pirlo,K., Butler,A.W., Muglia,P., Barnes,M.R.,
            Lathrop,M., Farmer,A., Breen,G., Aitchison,K.J., Craig,I.,
            Lewis,C.M. and McGuffin,P.
  TITLE     Genome-wide pharmacogenetics of antidepressant response in the   ## title of the reference
            GENDEP project
  JOURNAL   Am J Psychiatry 167 (5), 555-564 (2010)   ## journal in which the reference was published
   PUBMED   20360315   ## PubMed ID of the reference
  REMARK    GeneRIF: Clinical trial and genome-wide association study of
            gene-disease association, gene-environment interaction, and
            pharmacogenomic / toxicogenomic. (HuGE Navigator)
REFERENCE   2  (bases 1 to 4396)
  AUTHORS   Trynka,G., Zhernakova,A., Romanos,J., Franke,L., Hunt,K.A.,
            Turner,G., Bruinenberg,M., Heap,G.A., Platteel,M., Ryan,A.W.,
            de Kovel,C., Holmes,G.K., Howdle,P.D., Walters,J.R., Sanders,D.S.,
            Mulder,C.J., Mearin,M.L., Verbeek,W.H., Trimble,V., Stevens,F.M.,
            Kelleher,D., Barisani,D., Bardella,M.T., McManus,R., van Heel,D.A.
            and Wijmenga,C.
  TITLE     Coeliac disease-associated risk variants in TNFAIP3 and REL
            implicate altered NF-kappaB signalling
  JOURNAL   Gut 58 (8), 1078-1083 (2009)
   PUBMED   19240061
  REMARK    GeneRIF: Observational study of gene-disease association. (HuGE
            Navigator)
REFERENCE   3  (bases 1 to 4396)
  AUTHORS   Xu,D., Song,D., Pedersen,L.C. and Liu,J.
  TITLE     Mutational study of heparan sulfate 2-O-sulfotransferase and
            chondroitin sulfate 2-O-sulfotransferase
  JOURNAL   J. Biol. Chem. 282 (11), 8356-8367 (2007)
   PUBMED   17227754
  REMARK    GeneRIF: analysis of differences and similarities various residues
            play in the biological roles of the HS-2OST and CS-2OST enzymes
REFERENCE   4  (bases 1 to 4396)
  AUTHORS   Ohtake,S., Kimata,K. and Habuchi,O.
  TITLE     Recognition of sulfation pattern of chondroitin sulfate by uronosyl
            2-O-sulfotransferase
  JOURNAL   J. Biol. Chem. 280 (47), 39115-39123 (2005)
   PUBMED   16192264
  REMARK    GeneRIF: 2OST transfers sulfate preferentially to the GlcA residue
            located in a unique sequence,
            -GalNAc(4SO(4))-GlcA-GalNAc(6SO(4))-.
REFERENCE   5  (bases 1 to 4396)
  AUTHORS   Mungall,A.J., Palmer,S.A., Sims,S.K., Edwards,C.A., Ashurst,J.L.,
            Wilming,L., Jones,M.C., Horton,R., Hunt,S.E., Scott,C.E.,
            Gilbert,J.G., Clamp,M.E., Bethel,G., Milne,S., Ainscough,R.,
            Almeida,J.P., Ambrose,K.D., Andrews,T.D., Ashwell,R.I.,
            Babbage,A.K., Bagguley,C.L., Bailey,J., Banerjee,R., Barker,D.J.,
            Barlow,K.F., Bates,K., Beare,D.M., Beasley,H., Beasley,O.,
            Bird,C.P., Blakey,S., Bray-Allen,S., Brook,J., Brown,A.J.,
            Brown,J.Y., Burford,D.C., Burrill,W., Burton,J., Carder,C.,
            Carter,N.P., Chapman,J.C., Clark,S.Y., Clark,G., Clee,C.M.,
            Clegg,S., Cobley,V., Collier,R.E., Collins,J.E., Colman,L.K.,
            Corby,N.R., Coville,G.J., Culley,K.M., Dhami,P., Davies,J.,
            Dunn,M., Earthrowl,M.E., Ellington,A.E., Evans,K.A., Faulkner,L.,
            Francis,M.D., Frankish,A., Frankland,J., French,L., Garner,P.,
            Garnett,J., Ghori,M.J., Gilby,L.M., Gillson,C.J., Glithero,R.J.,
            Grafham,D.V., Grant,M., Gribble,S., Griffiths,C., Griffiths,M.,
            Hall,R., Halls,K.S., Hammond,S., Harley,J.L., Hart,E.A.,
            Heath,P.D., Heathcott,R., Holmes,S.J., Howden,P.J., Howe,K.L.,
            Howell,G.R., Huckle,E., Humphray,S.J., Humphries,M.D., Hunt,A.R.,
            Johnson,C.M., Joy,A.A., Kay,M., Keenan,S.J., Kimberley,A.M.,
            King,A., Laird,G.K., Langford,C., Lawlor,S., Leongamornlert,D.A.,
            Leversha,M., Lloyd,C.R., Lloyd,D.M., Loveland,J.E., Lovell,J.,
            Martin,S., Mashreghi-Mohammadi,M., Maslen,G.L., Matthews,L.,
            McCann,O.T., McLaren,S.J., McLay,K., McMurray,A., Moore,M.J.,
            Mullikin,J.C., Niblett,D., Nickerson,T., Novik,K.L., Oliver,K.,
            Overton-Larty,E.K., Parker,A., Patel,R., Pearce,A.V., Peck,A.I.,
            Phillimore,B., Phillips,S., Plumb,R.W., Porter,K.M., Ramsey,Y.,
            Ranby,S.A., Rice,C.M., Ross,M.T., Searle,S.M., Sehra,H.K.,
            Sheridan,E., Skuce,C.D., Smith,S., Smith,M., Spraggon,L.,
            Squares,S.L., Steward,C.A., Sycamore,N., Tamlyn-Hall,G., Tester,J.,
            Theaker,A.J., Thomas,D.W., Thorpe,A., Tracey,A., Tromans,A.,
            Tubby,B., Wall,M., Wallis,J.M., West,A.P., White,S.S.,
            Whitehead,S.L., Whittaker,H., Wild,A., Willey,D.J., Wilmer,T.E.,
            Wood,J.M., Wray,P.W., Wyatt,J.C., Young,L., Younger,R.M.,
            Bentley,D.R., Coulson,A., Durbin,R., Hubbard,T., Sulston,J.E.,
            Dunham,I., Rogers,J. and Beck,S.
  TITLE     The DNA sequence and analysis of human chromosome 6
  JOURNAL   Nature 425 (6960), 805-811 (2003)
   PUBMED   14574404
REFERENCE   6  (bases 1 to 4396)
  AUTHORS   Kobayashi,M., Sugumaran,G., Liu,J., Shworak,N.W., Silbert,J.E. and
            Rosenberg,R.D.
  TITLE     Molecular cloning and characterization of a human uronyl
            2-sulfotransferase that sulfates iduronyl and glucuronyl residues
            in dermatan/chondroitin sulfate
  JOURNAL   J. Biol. Chem. 274 (15), 10474-10480 (1999)
   PUBMED   10187838
COMMENT     VALIDATED REFSEQ: This record has undergone validation or
            preliminary review. The reference sequence was derived from
            AI570697.1, DB496757.1, BC093668.1, AB020316.1 and CA842761.1.
            On Jul 29, 2008 this sequence version replaced gi:5032218.

            Summary: Uronyl 2-sulfotransferase transfers sulfate to the
            2-position of uronyl residues, such as iduronyl residues in
            dermatan sulfate and glucuronyl residues in chondroitin sulfate
            (Kobayashi et al., 1999 [PubMed 10187838]).[supplied by OMIM, Mar
            2008].

            ##RefSeq-Attributes-START##
            Transcript_exon_combination_evidence :: AB020316.1, AK292922.1 [ECO:0000332]
            ##RefSeq-Attributes-END##
PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-40                AI570697.1         1-40
            41-248              DB496757.1         18-225
            249-1537            BC093668.1         1-1289
            1538-4015           AB020316.1         1345-3822
            4016-4396           CA842761.1         1-381               c
FEATURES             Location/Qualifiers
     source          1..4396   ## span of the sequence described by this source feature
                     /organism="Homo sapiens"   ## organism
                     /mol_type="mRNA"   ## molecule type
                     /db_xref="taxon:9606"   ## NCBI Taxonomy ID of the organism
                     /chromosome="6"   ## chromosome on which the gene lies
                     /map="6q25.1"   ## cytogenetic band (chromosomal region)
     gene            1..4396   ## span of the sequence covered by the gene
                     /gene="UST"   ## gene symbol
                     /gene_synonym="2OST"   ## alternative gene symbol
                     /note="uronyl-2-sulfotransferase"   ## full name of the gene
                     /db_xref="GeneID:10090"   ## ID in NCBI Gene (Entrez Gene), not a GenBank accession
                     /db_xref="HGNC:17223"   ## ID in the HGNC database
                     /db_xref="HPRD:10298"   ## ID in the HPRD database
                     /db_xref="MIM:610752"   ## ID in the MIM (OMIM) database
                     ## the database abbreviations used in /db_xref will be explained in a separate post
     exon            1..543   ## location of one exon within the sequence
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"   ## how the exon was inferred: by alignment, using the Splign tool
                     /number=1
     STS             249..1540   ## STS (sequence-tagged site): a short sequence that occurs only once in the genome and serves as a mapping marker
                     /gene="UST"
                     /gene_synonym="2OST"
                     /db_xref="UniSTS:485655"   ## ID in the UniSTS database
     misc_feature    249..251   ## region of biological interest that cannot be described by any other feature key
                     /gene="UST"
                     /gene_synonym="2OST"
                     /note="upstream in-frame stop codon"   ## free-text description of the feature
     CDS             297..1517   ## coding sequence (the location includes the stop codon)
                     /gene="UST"
                     /gene_synonym="2OST"
                     /note="dermatan/chondroitin sulfate 2-sulfotransferase"   ## description of the encoded product
                     /codon_start=1
                     /product="uronyl 2-sulfotransferase"   ## name of the encoded product
                     /protein_id="NP_005706.1"   ## protein accession.version
                     /db_xref="GI:5032219"
                     /db_xref="CCDS:CCDS5213.1"   ## ID in the CCDS database
                     /db_xref="GeneID:10090"
                     /db_xref="HGNC:17223"
                     /db_xref="HPRD:10298"
                     /db_xref="MIM:610752"
                     /translation="MKKKQQHPGGGADPWPHGAPMGGAPPGLGSWKRRVPLLPFLRFS
                     LRDYGFCMATLLVFCLGSLLYQLSGGPPRFLLDLRQYLGNSTYLDDHGPPPSKVLPFP
                     SQVVYNRVGKCGSRTVVLLLRILSEKHGFNLVTSDIHNKTRLTKNEQMELIKNISTAE
                     QPYLFTRHVHFLNFSRFGGDQPVYINIIRDPVNRFLSNYFFRRFGDWRGEQNHMIRTP
                     SMRQEERYLDINECILENYPECSNPRLFYIIPYFCGQHPRCREPGEWALERAKLNVNE
                     NFLLVGILEELEDVLLLLERFLPHYFKGVLSIYKDPEHRKLGNMTVTVKKTVPSPEAV
                     QILYQRMRYEYEFYHYVKEQFHLLKRKFGLKSHVSKPPLRPHFFIPTPLETEEPIDDE
                     EQDDEKWLEDIYKR"   ## translated amino-acid sequence
     misc_feature    444..506
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="non-experimental evidence, no additional
                     details recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (Q9Y2C2.1);
                     transmembrane region"
     exon            544..587
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"
                     /number=2
     exon            588..743
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"
                     /number=3
     exon            744..823
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"
                     /number=4
     exon            824..977
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"
                     /number=5
     exon            978..1075
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"
                     /number=6
     exon            1076..1233
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"
                     /number=7
     exon            1234..4391
                     /gene="UST"
                     /gene_synonym="2OST"
                     /inference="alignment:Splign"
                     /number=8
     STS             4236..4336
                     /gene="UST"
                     /gene_synonym="2OST"
                     /standard_name="D6S1148E"
                     /db_xref="UniSTS:83075"
ORIGIN
        1 ggcgcggcgg ggcgcggggc gtggggacgc tagcgggcgc cggacgggcg cggcgccccg
       61 tcacgggcag cgccccgaac cggggccgga cacctcggcc gctcgggccg cggcggcggg
      121 gaccatgccg aagaaagtct cctgagcccg gcaacttcgg cccctccccg cccccacccg
      181 gctgccctcc gcgcggccct ccccatgtgc agccggccag ccgggctctc ctcctcgcgg
      241 cggatgggtg accttttcct ggcacgggca ggctgtggga ggcagcggag caggcgatga
      301 agaagaagca gcagcatccc ggcggcggcg cggatccctg gccccatggg gcccctatgg
      361 ggggcgcccc tccgggcctg ggcagctgga agcgtcgggt gcccctgctg cctttcctgc
      421 gcttctccct ccgggactac ggcttctgca tggccaccct gctggtcttc tgcctgggct
      481 ccctcctcta tcagctcagc gggggacccc ctcgcttcct gctcgacctg cggcagtact
      541 tgggaaattc cacttacttg gatgaccatg gaccacctcc tagtaaggta ctacctttcc
      601 caagccaggt ggtgtacaac agggtaggca agtgtgggag ccgtactgtg gtcttgcttc
      661 tgagaatctt gtcggagaag cacggattta atttggtcac atcagacatt cacaacaaaa
      721 ccaggcttac taaaaatgaa caaatggaac tgattaaaaa tataagtact gccgaacaac
      781 cctatttatt cactcgacat gttcatttcc tcaacttctc aaggtttgga ggagaccagc
      841 ctgtctacat caacatcatt agagaccccg tcaaccggtt cttatccaac tattttttcc
      901 gtcgctttgg agactggaga ggggaacaaa atcacatgat ccgcaccccc agcatgaggc
      961 aggaggagcg ctacctggat atcaatgagt gtattcttga aaactatccc gagtgctcca
     1021 accccaggtt attttacatc attccgtact tttgtggaca gcatcccaga tgcagggagc
     1081 ctggtgaatg ggcccttgag agagcaaagc tgaacgtgaa tgaaaacttc ctgctcgtgg
     1141 ggattcttga agagttggaa gatgtgctgc tgttactgga aagattttta cctcattact
     1201 tcaagggcgt gctcagtatc tacaaagacc cagagcacag gaagcttgga aacatgactg
     1261 tgacggtgaa gaagactgtc ccctctcctg aggctgtgca gatcctctac cagcggatga
     1321 gatacgagta cgagttttac cactacgtca aagagcagtt ccacctgctg aagcgcaagt
     1381 ttggacttaa gtctcacgtc agcaagcccc ccctgaggcc acacttcttt atcccaactc
     1441 cactggaaac cgaggagcca atcgacgatg aagaacagga tgatgaaaag tggctggaag
     1501 atatttataa gaggtgatgt gactgtgttg cctctatggc tttatctccc ttttccagaa
     1561 agttctttgt ttggggaagt aaaatcctta agggactaaa ttaatgcttg ggtgcattaa
     1621 aaagaacaaa acattcccac atgttggggt cattgggaga tgcccggttt tgcgggtttt
     1681 atttgtttaa ttttattctg tgttttctct tggctctttg ggtctttccc gggtacacta
     1741 gatggctcca tcccaaggca tcttgtcata aaacagcttt cccccacccc atatcatggg
     1801 aaaaggggga gaaatatagc ccctagccta ataacttatc atttgtaaaa tgacttataa
     1861 aaatattacc tcaatggtag gagacatcca gacttgtata tttcagtgga aatacaaaac
     1921 cacttcagag accagggtat ctcctctgga aggatctaag agaaggtaag acagattagg
     1981 acatcgaaaa ggaggatgga gccaggtgcc atggcttgag cctataatcc gaggctgagg
     2041 tgggaggatc acttgagccc aggagtttga ggttgcagtg agctgtgatc acaccactgc
     2101 actccagcct gggtgacaga gtgagactct gtctcaatta attttttttt tttaaaggag
     2161 gaggatctcc atgggtaagt ggtttctacc cgcatgggta gagttctgcc tctggtcctt
     2221 ctcagggggc actttcacca agagcagtgt aattatctct gaaagagcaa gtcagcttgt
     2281 gccgcatccc caaccaatcc acagcctgga gtacctttca aggtcaaagt gcatggccag
     2341 ctccattgag acattccatt tcaaagcacc gtgctgacag atatcaaagt actctagcag
     2401 ggaaaataat ttgtttgctg tgtaaggaag aatgtagaca agacagataa atctgaaggt
     2461 catgtggcat cagggaaagg gcatggctgt gtcttttgca cccaatatga aacatcttct
     2521 cccaacactg ctttaatgga agttctagga accaatttag ctcaggcatt tgactcctac
     2581 agcagaagtt ctgagcctga ccacagatgg tgtgtaatct atcaaacaca cccctggcca
     2641 agttgggtcc tataggacct ggtactatgt actattgtaa cttctagttc cctaagaggt
     2701 acctgttttc agtaaaaagg ggtcctgagt tctgtgcagg tggaagagct acccgagaac
     2761 tacctgagtt ctgtgcaggt agagtcccat ttcttatggg acctgtgtgc tcctgagaac
     2821 tcttacttga gacatcaaaa agaagcagca agagcttctg ggacagagac tgcttggcca
     2881 gctttgtaag taagtggctg cctccaatgt gatgtgagta catgttgggc agtctcactg
     2941 tcctaaggta tgtcttcttt ccacctccca ctgcccctcc cctgccacct atcaatgatg
     3001 ccttggttca gtcattagaa atctgttgct ttgagttctg aaatattttc accttaaaaa
     3061 aaatgctgaa aatacacatt ctcctgggaa gacgataaac agctagctaa gaagccgagg
     3121 ttcagtggtg gcagcaggaa ggacactgcc acaaattttg tctatttcat atttgtcccc
     3181 tagagccagc cctagcaaat gtgtgagttg ggagtagtta atagtaaata agactctgac
     3241 tttacacaag ctacacattt tatacttttc ataaaccaca aagtctctct agaatttttt
     3301 ctgccttcac taaaattgga ctgtagccaa gatataaagc aagtcatttg gaacctgccg
     3361 agtgagcact gaagctactt tatcatgaga tgtgtgttaa gaaggctgca gcccacagga
     3421 gtccagggaa ggcggggacc acagaggcac agagtccagc acttggccgc tcatgggcct
     3481 tctttctgcc tcagaggacg ggggcagaga agtgatgaag ggaaatgttc ttagaggagg
     3541 aaatatcctt tgtcctgttc agagagacca gggccctacc attaggcata ctttcagaag
     3601 caacctggag aacagctatc aatcatattc aaaaccagta caagaactgc tgcctggtac
     3661 cctgtgagtc atttctatga aattccatat aaagaatgat gataagttta cacactgtgc
     3721 aatctcacaa tctgaaaata aagttgagtt ggctgtgttt tctctgctct tgtcagaaca
     3781 ttgggacaat tggtcgttca aaaacattca tcctcttact gcaagtttat ctgggtactt
     3841 ttacctgtgt gttcaaaggc atttcttttc agcagtgatc attataactt cacaaaaaaa
     3901 gatgctgacg gatttactta cagggcctta atgttatttt gtcccagcca acaccctcta
     3961 ggtcctaaaa gtcaaggtac ttcagtttat ttggcaaaca tgacaacatt ttttttggcc
     4021 ctgggcccaa cagtttgtac ttcatgaaac atattgtaca ttttacatag tttaatttaa
     4081 aaaatacctt ttaagctagt tgatctttga ctgtcttatt tattataacc tttcagcaca
     4141 ttccaaggtt ttagttactc aggaaggagt taattaaaat gattttattt tggtctgatg
     4201 gatgtttttt aaaaggaaaa ttattattat gaaccttcag cctactttct tgagtgccgt
     4261 aaaagtgctt gtaaatcttt tttttttttt aagaagaaag aaaaaaatgg tgtttgacgt
     4321 tgatggaaat tcaaaaatat atatggaact gaaacattaa cttagctaaa ataaaagcaa
     4381 tctgtgtttg aaaaaa
//
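To make the location convention concrete, here is a minimal Python sketch (my own illustration, not part of the NCBI documentation) that splits the LOCUS line into its fields and pulls bases out of a single ORIGIN line. It checks two claims made by the annotations above: that misc_feature 249..251 is a TGA stop codon in the same reading frame as the CDS, and that CDS 297..1517 starts with ATG and, because a CDS location includes the stop codon, encodes one residue fewer than its codon count. The function names are hypothetical, and the LOCUS parser assumes all fields are present as in this record; for real work a dedicated parser such as Biopython's SeqIO is the safer choice.

```python
# Hypothetical helpers for reading two parts of a GenBank flat file by hand.
# Assumption: the LOCUS line has the same whitespace-separated fields as NM_005715.

def parse_locus(line):
    """Split a LOCUS line into named fields."""
    parts = line.split()
    if len(parts) < 8 or parts[0] != "LOCUS" or parts[3] != "bp":
        raise ValueError("unexpected LOCUS layout: " + line)
    return {
        "name": parts[1],          # locus name (the accession for RefSeq records)
        "length": int(parts[2]),   # sequence length in bp
        "mol_type": parts[4],      # molecule type, e.g. mRNA
        "topology": parts[5],      # linear or circular
        "division": parts[6],      # GenBank division, PRI = primate
        "date": parts[7],          # date of last modification
    }


def parse_origin_line(line):
    """Return (1-based position of the first base, bases) for one ORIGIN line."""
    number, *blocks = line.split()
    return int(number), "".join(blocks)


def subseq(first_pos, bases, start, end):
    """Extract the 1-based, inclusive span start..end from one ORIGIN line."""
    offset = start - first_pos
    length = end - start + 1
    if offset < 0 or offset + length > len(bases):
        raise ValueError("span is not contained in this ORIGIN line")
    return bases[offset:offset + length]


locus = parse_locus(
    "LOCUS       NM_005715               4396 bp    mRNA    linear   PRI 27-JUN-2012")
first_pos, bases = parse_origin_line(
    "      241 cggatgggtg accttttcct ggcacgggca ggctgtggga ggcagcggag caggcgatga")

upstream_stop = subseq(first_pos, bases, 249, 251)  # misc_feature 249..251
start_codon = subseq(first_pos, bases, 297, 299)    # first codon of CDS 297..1517

cds_length = 1517 - 297 + 1        # locations are 1-based and inclusive
codons = cds_length // 3           # the CDS location includes the stop codon...
residues = codons - 1              # ...so the protein is one codon shorter
in_frame = (297 - 249) % 3 == 0    # does the upstream stop share the CDS frame?

print(locus["name"], locus["length"], locus["division"])
print(upstream_stop, start_codon, cds_length, codons, residues, in_frame)
```

Run as-is, it prints `NM_005715 4396 PRI` and then `tga atg 1221 407 406 True`: the upstream TGA sits 48 bases (16 codons) before the ATG, which is why the record labels it an in-frame stop.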
Many feature keys do not appear in this example, so here is an alphabetically ordered summary for easy reference.

attenuator	1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; 2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription.
C_region	Constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Includes one or more exons, depending on the particular chain.
CAAT_signal	CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units that may be involved in RNA polymerase binding; consensus=GG(C or T)CAATCT.
CDS	coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon). Feature includes amino acid conceptual translation.
centromere	Region of chromosome to which spindle traction fibers attach during mitosis and meiosis. Must be experimentally characterized.
D-loop	Displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein.
D_segment	Diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain.
enhancer	A cis-acting sequence that increases the utilization of (some) eukaryotic promoters and can function in either orientation and in any location (upstream or downstream) relative to the promoter.
exon	Region of genome that codes for portion of spliced mRNA; may contain 5′ UTR, all CDSs, and 3′ UTR.
GC_signal	GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units that may occur in multiple copies or in either orientation; consensus=GGGCGG.
gene	Region of biological interest identified as a gene and for which a name has been assigned.
iDNA	Intervening DNA; DNA which is eliminated through any of several kinds of recombination.
intron	A segment of DNA that is transcribed, but removed from within the transcript, by splicing together the sequences (exons) on either side of it.
J_segment	Joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.
LTR	Long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses.
mat_peptide	Mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification. The location does not include the stop codon (unlike the corresponding CDS).
misc_binding	Site in nucleic acid that covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind).
misc_difference	Feature sequence is different from that presented in the entry and cannot be described by any other Difference key (unsure, mutation, variation, or modified_base).
misc_feature	Region of biological interest which cannot be described by any other feature key.
misc_recomb	Site of any generalized, site-specific, or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/proviral).
misc_RNA	Any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'UTR, 3'UTR, exon, transit_peptide, polyA_site, rRNA, tRNA, and ncRNA).
misc_signal	Any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin).
misc_structure	Any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop).
mobile_element	Region of genome containing an element capable of or derived from movement from one location to another in the genome. The mobile_element_type qualifier is mandatory; in Sequin, a pull-down menu lists the approved types, and the name of the specific element can be given in the text box.
modified_base	The indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value).
mRNA	messenger RNA; includes 5′ untranslated region (5′ UTR), coding sequences (CDS, exon) and 3′ untranslated region (3′ UTR).
ncRNA	non-coding RNA; a non-protein-coding transcript other than ribosomal RNA and transfer RNA, including antisense RNA, guide RNA, scRNA, siRNA, miRNA, piRNA, snoRNA, and snRNA. The specific type of ncRNA must be specified in the /ncRNA_class qualifier.
N_region	Extra nucleotides inserted between rearranged immunoglobulin segments.
operon	Region containing polycistronic transcript under the control of the same regulatory sequences.
oriT	Origin of transfer; region of DNA where transfer is initiated during the process of conjugation or mobilization.
polyA_signal	Recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA.
polyA_site	Site on an RNA transcript to which adenine residues are added by post-transcriptional polyadenylation.
precursor_RNA	Any RNA species that is not yet the mature RNA product; may include 5′ clipped region (5′ clip), 5′ untranslated region (5′ UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3′ UTR), and 3′ clipped region (3′ clip).
prim_transcript	Primary (initial, unprocessed) transcript; includes 5′ clipped region (5′ clip), 5′ untranslated region (5′ UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3′ UTR), and 3′ clipped region (3′ clip).
primer_bind	Non-covalent primer binding site for initiation of replication, transcription, or reverse transcription. Includes sites for synthetic elements such as PCR primers.
promoter	Region on a DNA molecule involved in RNA polymerase binding to initiate transcription.
protein_bind	Non-covalent protein binding site on nucleic acid.
RBS	Ribosome binding site.
repeat_region	Region of genome containing repeating units. Some qualifiers, such as rpt_type and satellite, have controlled vocabularies; in Sequin, these qualifiers have check boxes or pull-down menus to ensure that the correct format is used.
rep_origin	Origin of replication; starting site for duplication of nucleic acid to give two identical copies.
rRNA	Mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) that assembles amino acids into proteins.
S_region	Switch region of immunoglobulin heavy chains. Involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell.
sig_peptide	Signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence.
source	Identifies the biological source of the specified span of the sequence. This key is mandatory: every entry has, at a minimum, a single source key spanning the entire sequence. More than one source key per sequence is permitted.
stem_loop	Hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA.
STS	Sequence Tagged Site. Short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR. A region of the genome can be mapped by determining the order of a series of STSs.
TATA_signal	TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit that may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T).
telomere	Experimentally characterized specialized DNA segment found at the ends of eukaryotic chromosomes.
terminator	Sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein.
tmRNA	Transfer messenger RNA; acts as a tRNA first, then an mRNA that encodes a peptide tag.
transit_peptide	Transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle.
tRNA	Mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence.
unsure	Author is unsure of exact sequence in this region.
V_region	Variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Codes for the variable amino terminal portion. Can be made up from V_segments, D_segments, N_regions, and J_segments.
V_segment	Variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Codes for most of the variable region (V_region) and the last few amino acids of the leader peptide.
variation	A related strain contains stable mutations from the same gene (e.g., RFLPs, polymorphisms, etc.) that differ from the presented sequence at this location (and possibly others).
3'UTR	Region near or at the 3′ end of a mature transcript (usually following the stop codon) that is not translated into a protein; trailer.
5'UTR	Region near or at the 5′ end of a mature transcript (usually preceding the initiation codon) that is not translated into a protein; leader.
-10_signal	Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units that may be involved in binding RNA polymerase; consensus=TAtAaT.
-35_signal	A conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa or TGTTGACA.
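Several entries above (CAAT_signal, GC_signal, TATA_signal, polyA_signal, -35_signal, -10_signal) quote a consensus sequence. As a rough illustration only, the Python sketch below turns those consensus strings into regular expressions and reports exact matches in GenBank-style 1-based, inclusive coordinates. The dictionary name and `scan` function are hypothetical, and exact matching on one strand is a deliberate simplification: real signals often deviate from the consensus, so a hit is at most a candidate, never an annotation by itself.

```python
import re

# Consensus sequences from the table, written as regular expressions:
# "(C or T)" becomes the character class [CT]. Lowercase letters in the
# table's consensus strings (TTGACa, TAtAaT) mark weaker positions; this
# sketch treats every position as required, which is a simplification.
# For -35_signal only the hexamer form (TTGACA) is used.
CONSENSUS = {
    "CAAT_signal": "GG[CT]CAATCT",
    "GC_signal": "GGGCGG",
    "TATA_signal": "TATA[AT]A[AT]",
    "polyA_signal": "AATAAA",
    "-35_signal": "TTGACA",
    "-10_signal": "TATAAT",
}


def scan(seq):
    """Report exact consensus matches as (feature_key, start, end), 1-based and inclusive.

    Only the given strand is searched; a fuller search would also scan the
    reverse complement and tolerate mismatches.
    """
    seq = seq.upper()
    hits = []
    for key, pattern in CONSENSUS.items():
        # A zero-width lookahead lets overlapping matches be reported as well.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + 1
            hits.append((key, start, start + len(m.group(1)) - 1))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


for demo in ["ccGGGCGGtt", "ccTTGACAcc", "ggggAATAAAgg"]:
    print(demo, scan(demo))
```

The lookahead form is used instead of a plain `re.finditer(pattern, ...)` because a plain search skips past each match and would miss motifs that overlap, which is common in AT-rich promoter regions.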

Source: http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp

Bioinformatics Blog

