Explanation of GenBank Sequence Annotations

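Some of the annotations in GenBank-format sequence records are hard to understand. After some searching, I finally found the original explanations.
Below, NM_005715 serves as a simple example (text after ## is my explanation).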

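LOCUS NM_005715 4396 bp mRNA linear PRI 27-JUN-2012 ##locus name (here the same as the accession number), sequence length, molecule type, topology, GenBank division (PRI = primate) and date of last modification
DEFINITION Homo sapiens uronyl-2-sulfotransferase (UST), mRNA. ##short description of the sequence: organism, gene name and symbol, molecule type
ACCESSION NM_005715 ##accession number
VERSION NM_005715.2 GI:194578911 ##accession.version and the GI number of this sequence record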
KEYWORDS .
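SOURCE Homo sapiens (human) ##source of the sequence, i.e. the organism it comes from
ORGANISM Homo sapiens ##scientific name of the organism, followed by its taxonomic lineage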
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 4396) ##reference number and the bases it covers
AUTHORS Uher,R., Perroud,N., Ng,M.Y., Hauser,J., Henigsberg,N., Maier,W., ##authors of the reference
Mors,O., Placentino,A., Rietschel,M., Souery,D., Zagar,T.,
Czerski,P.M., Jerman,B., Larsen,E.R., Schulze,T.G., Zobel,A.,
Cohen-Woods,S., Pirlo,K., Butler,A.W., Muglia,P., Barnes,M.R.,
Lathrop,M., Farmer,A., Breen,G., Aitchison,K.J., Craig,I.,
Lewis,C.M. and McGuffin,P.
TITLE Genome-wide pharmacogenetics of antidepressant response in the ##title of the reference
GENDEP project
JOURNAL Am J Psychiatry 167 (5), 555-564 (2010) ##journal in which the reference was published (volume, issue, pages, year)
PUBMED 20360315 ##PubMed ID of the reference
REMARK GeneRIF: Clinical trial and genome-wide association study of
gene-disease association, gene-environment interaction, and
pharmacogenomic / toxicogenomic. (HuGE Navigator)
REFERENCE 2 (bases 1 to 4396)
AUTHORS Trynka,G., Zhernakova,A., Romanos,J., Franke,L., Hunt,K.A.,
Turner,G., Bruinenberg,M., Heap,G.A., Platteel,M., Ryan,A.W., de
Kovel,C., Holmes,G.K., Howdle,P.D., Walters,J.R., Sanders,D.S.,
Mulder,C.J., Mearin,M.L., Verbeek,W.H., Trimble,V., Stevens,F.M.,
Kelleher,D., Barisani,D., Bardella,M.T., McManus,R., van Heel,D.A.
and Wijmenga,C.
TITLE Coeliac disease-associated risk variants in TNFAIP3 and REL
implicate altered NF-kappaB signalling
JOURNAL Gut 58 (8), 1078-1083 (2009)
PUBMED 19240061
REMARK GeneRIF: Observational study of gene-disease association. (HuGE
Navigator)
REFERENCE 3 (bases 1 to 4396)
AUTHORS Xu,D., Song,D., Pedersen,L.C. and Liu,J.
TITLE Mutational study of heparan sulfate 2-O-sulfotransferase and
chondroitin sulfate 2-O-sulfotransferase
JOURNAL J. Biol. Chem. 282 (11), 8356-8367 (2007)
PUBMED 17227754
REMARK GeneRIF: analysis of differences and similarities various residues
play in the biological roles of the HS-2OST and CS-2OST enzymes
REFERENCE 4 (bases 1 to 4396)
AUTHORS Ohtake,S., Kimata,K. and Habuchi,O.
TITLE Recognition of sulfation pattern of chondroitin sulfate by uronosyl
2-O-sulfotransferase
JOURNAL J. Biol. Chem. 280 (47), 39115-39123 (2005)
PUBMED 16192264
REMARK GeneRIF: 2OST transfers sulfate preferentially to the GlcA residue
located in a unique sequence, -GalNAc(4SO(4))-GlcA-GalNAc(6SO(4))-.
REFERENCE 5 (bases 1 to 4396)
AUTHORS Mungall,A.J., Palmer,S.A., Sims,S.K., Edwards,C.A., Ashurst,J.L.,
Wilming,L., Jones,M.C., Horton,R., Hunt,S.E., Scott,C.E.,
Gilbert,J.G., Clamp,M.E., Bethel,G., Milne,S., Ainscough,R.,
Almeida,J.P., Ambrose,K.D., Andrews,T.D., Ashwell,R.I.,
Babbage,A.K., Bagguley,C.L., Bailey,J., Banerjee,R., Barker,D.J.,
Barlow,K.F., Bates,K., Beare,D.M., Beasley,H., Beasley,O.,
Bird,C.P., Blakey,S., Bray-Allen,S., Brook,J., Brown,A.J.,
Brown,J.Y., Burford,D.C., Burrill,W., Burton,J., Carder,C.,
Carter,N.P., Chapman,J.C., Clark,S.Y., Clark,G., Clee,C.M.,
Clegg,S., Cobley,V., Collier,R.E., Collins,J.E., Colman,L.K.,
Corby,N.R., Coville,G.J., Culley,K.M., Dhami,P., Davies,J.,
Dunn,M., Earthrowl,M.E., Ellington,A.E., Evans,K.A., Faulkner,L.,
Francis,M.D., Frankish,A., Frankland,J., French,L., Garner,P.,
Garnett,J., Ghori,M.J., Gilby,L.M., Gillson,C.J., Glithero,R.J.,
Grafham,D.V., Grant,M., Gribble,S., Griffiths,C., Griffiths,M.,
Hall,R., Halls,K.S., Hammond,S., Harley,J.L., Hart,E.A.,
Heath,P.D., Heathcott,R., Holmes,S.J., Howden,P.J., Howe,K.L.,
Howell,G.R., Huckle,E., Humphray,S.J., Humphries,M.D., Hunt,A.R.,
Johnson,C.M., Joy,A.A., Kay,M., Keenan,S.J., Kimberley,A.M.,
King,A., Laird,G.K., Langford,C., Lawlor,S., Leongamornlert,D.A.,
Leversha,M., Lloyd,C.R., Lloyd,D.M., Loveland,J.E., Lovell,J.,
Martin,S., Mashreghi-Mohammadi,M., Maslen,G.L., Matthews,L.,
McCann,O.T., McLaren,S.J., McLay,K., McMurray,A., Moore,M.J.,
Mullikin,J.C., Niblett,D., Nickerson,T., Novik,K.L., Oliver,K.,
Overton-Larty,E.K., Parker,A., Patel,R., Pearce,A.V., Peck,A.I.,
Phillimore,B., Phillips,S., Plumb,R.W., Porter,K.M., Ramsey,Y.,
Ranby,S.A., Rice,C.M., Ross,M.T., Searle,S.M., Sehra,H.K.,
Sheridan,E., Skuce,C.D., Smith,S., Smith,M., Spraggon,L.,
Squares,S.L., Steward,C.A., Sycamore,N., Tamlyn-Hall,G., Tester,J.,
Theaker,A.J., Thomas,D.W., Thorpe,A., Tracey,A., Tromans,A.,
Tubby,B., Wall,M., Wallis,J.M., West,A.P., White,S.S.,
Whitehead,S.L., Whittaker,H., Wild,A., Willey,D.J., Wilmer,T.E.,
Wood,J.M., Wray,P.W., Wyatt,J.C., Young,L., Younger,R.M.,
Bentley,D.R., Coulson,A., Durbin,R., Hubbard,T., Sulston,J.E.,
Dunham,I., Rogers,J. and Beck,S.
TITLE The DNA sequence and analysis of human chromosome 6
JOURNAL Nature 425 (6960), 805-811 (2003)
PUBMED 14574404
REFERENCE 6 (bases 1 to 4396)
AUTHORS Kobayashi,M., Sugumaran,G., Liu,J., Shworak,N.W., Silbert,J.E. and
Rosenberg,R.D.
TITLE Molecular cloning and characterization of a human uronyl
2-sulfotransferase that sulfates iduronyl and glucuronyl residues
in dermatan/chondroitin sulfate
JOURNAL J. Biol. Chem. 274 (15), 10474-10480 (1999)
PUBMED 10187838
COMMENT VALIDATED REFSEQ: This record has undergone validation or
preliminary review. The reference sequence was derived from
AI570697.1, DB496757.1, BC093668.1, AB020316.1 and CA842761.1.
On Jul 29, 2008 this sequence version replaced gi:5032218.

Summary: Uronyl 2-sulfotransferase transfers sulfate to the
2-position of uronyl residues, such as iduronyl residues in
dermatan sulfate and glucuronyl residues in chondroitin sulfate
(Kobayashi et al., 1999 [PubMed 10187838]).[supplied by OMIM, Mar
2008].

##RefSeq-Attributes-START##
Transcript_exon_combination_evidence :: AB020316.1, AK292922.1
[ECO:0000332]
##RefSeq-Attributes-END##
PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
1-40 AI570697.1 1-40
41-248 DB496757.1 18-225
249-1537 BC093668.1 1-1289
1538-4015 AB020316.1 1345-3822
4016-4396 CA842761.1 1-381 c
FEATURES Location/Qualifiers
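source 1..4396 ##range of the sequence covered by this source feature
/organism="Homo sapiens" ##organism
/mol_type="mRNA" ##molecule type
/db_xref="taxon:9606" ##NCBI Taxonomy ID of the organism
/chromosome="6" ##chromosome
/map="6q25.1" ##cytogenetic map location on the chromosome
gene 1..4396 ##range of the sequence that belongs to the gene
/gene="UST" ##gene symbol
/gene_synonym="2OST" ##synonym (alias) of the gene symbol
/note="uronyl-2-sulfotransferase" ##free-text note, here the full gene name
/db_xref="GeneID:10090" ##ID in the NCBI Gene database
/db_xref="HGNC:17223" ##ID in the HGNC database
/db_xref="HPRD:10298" ##ID in the HPRD database
/db_xref="MIM:610752" ##MIM number in the OMIM database
##I will explain the codes of the various databases separately
exon 1..543 ##location of one exon within the sequence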
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign" ##the exon was inferred from an alignment made with the Splign tool
/number=1
STS 249..1540 ##STS (Sequence Tagged Site): a short sequence that occurs only once in the genome and serves as a mapping landmark
/gene="UST"
/gene_synonym="2OST"
/db_xref="UniSTS:485655" ##ID in the UniSTS database
misc_feature 249..251 ##a region of biological interest that cannot be described by any other feature key
/gene="UST"
/gene_synonym="2OST"
/note="upstream in-frame stop codon" ##description of the feature
CDS 297..1517 ##coding sequence (coding region)
/gene="UST"
/gene_synonym="2OST"
/note="dermatan/chondroitin sulfate 2-sulfotransferase" ##note describing the encoded product
/codon_start=1
/product="uronyl 2-sulfotransferase" ##encoded product
/protein_id="NP_005706.1" ##protein ID (accession.version)
/db_xref="GI:5032219"
/db_xref="CCDS:CCDS5213.1" ##ID in the CCDS (Consensus CDS) database
/db_xref="GeneID:10090"
/db_xref="HGNC:17223"
/db_xref="HPRD:10298"
/db_xref="MIM:610752"
/translation="MKKKQQHPGGGADPWPHGAPMGGAPPGLGSWKRRVPLLPFLRFS
LRDYGFCMATLLVFCLGSLLYQLSGGPPRFLLDLRQYLGNSTYLDDHGPPPSKVLPFP
SQVVYNRVGKCGSRTVVLLLRILSEKHGFNLVTSDIHNKTRLTKNEQMELIKNISTAE
QPYLFTRHVHFLNFSRFGGDQPVYINIIRDPVNRFLSNYFFRRFGDWRGEQNHMIRTP
SMRQEERYLDINECILENYPECSNPRLFYIIPYFCGQHPRCREPGEWALERAKLNVNE
NFLLVGILEELEDVLLLLERFLPHYFKGVLSIYKDPEHRKLGNMTVTVKKTVPSPEAV
QILYQRMRYEYEFYHYVKEQFHLLKRKFGLKSHVSKPPLRPHFFIPTPLETEEPIDDE
EQDDEKWLEDIYKR" ##翻译的氨基酸序列
misc_feature 444..506
/gene="UST"
/gene_synonym="2OST"
/inference="non-experimental evidence, no additional
details recorded"
/note="propagated from UniProtKB/Swiss-Prot (Q9Y2C2.1);
transmembrane region"
exon 544..587
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign"
/number=2
exon 588..743
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign"
/number=3
exon 744..823
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign"
/number=4
exon 824..977
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign"
/number=5
exon 978..1075
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign"
/number=6
exon 1076..1233
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign"
/number=7
exon 1234..4391
/gene="UST"
/gene_synonym="2OST"
/inference="alignment:Splign"
/number=8
STS 4236..4336
/gene="UST"
/gene_synonym="2OST"
/standard_name="D6S1148E"
/db_xref="UniSTS:83075"
ORIGIN
1 ggcgcggcgg ggcgcggggc gtggggacgc tagcgggcgc cggacgggcg cggcgccccg
61 tcacgggcag cgccccgaac cggggccgga cacctcggcc gctcgggccg cggcggcggg
121 gaccatgccg aagaaagtct cctgagcccg gcaacttcgg cccctccccg cccccacccg
181 gctgccctcc gcgcggccct ccccatgtgc agccggccag ccgggctctc ctcctcgcgg
241 cggatgggtg accttttcct ggcacgggca ggctgtggga ggcagcggag caggcgatga
301 agaagaagca gcagcatccc ggcggcggcg cggatccctg gccccatggg gcccctatgg
361 ggggcgcccc tccgggcctg ggcagctgga agcgtcgggt gcccctgctg cctttcctgc
421 gcttctccct ccgggactac ggcttctgca tggccaccct gctggtcttc tgcctgggct
481 ccctcctcta tcagctcagc gggggacccc ctcgcttcct gctcgacctg cggcagtact
541 tgggaaattc cacttacttg gatgaccatg gaccacctcc tagtaaggta ctacctttcc
601 caagccaggt ggtgtacaac agggtaggca agtgtgggag ccgtactgtg gtcttgcttc
661 tgagaatctt gtcggagaag cacggattta atttggtcac atcagacatt cacaacaaaa
721 ccaggcttac taaaaatgaa caaatggaac tgattaaaaa tataagtact gccgaacaac
781 cctatttatt cactcgacat gttcatttcc tcaacttctc aaggtttgga ggagaccagc
841 ctgtctacat caacatcatt agagaccccg tcaaccggtt cttatccaac tattttttcc
901 gtcgctttgg agactggaga ggggaacaaa atcacatgat ccgcaccccc agcatgaggc
961 aggaggagcg ctacctggat atcaatgagt gtattcttga aaactatccc gagtgctcca
1021 accccaggtt attttacatc attccgtact tttgtggaca gcatcccaga tgcagggagc
1081 ctggtgaatg ggcccttgag agagcaaagc tgaacgtgaa tgaaaacttc ctgctcgtgg
1141 ggattcttga agagttggaa gatgtgctgc tgttactgga aagattttta cctcattact
1201 tcaagggcgt gctcagtatc tacaaagacc cagagcacag gaagcttgga aacatgactg
1261 tgacggtgaa gaagactgtc ccctctcctg aggctgtgca gatcctctac cagcggatga
1321 gatacgagta cgagttttac cactacgtca aagagcagtt ccacctgctg aagcgcaagt
1381 ttggacttaa gtctcacgtc agcaagcccc ccctgaggcc acacttcttt atcccaactc
1441 cactggaaac cgaggagcca atcgacgatg aagaacagga tgatgaaaag tggctggaag
1501 atatttataa gaggtgatgt gactgtgttg cctctatggc tttatctccc ttttccagaa
1561 agttctttgt ttggggaagt aaaatcctta agggactaaa ttaatgcttg ggtgcattaa
1621 aaagaacaaa acattcccac atgttggggt cattgggaga tgcccggttt tgcgggtttt
1681 atttgtttaa ttttattctg tgttttctct tggctctttg ggtctttccc gggtacacta
1741 gatggctcca tcccaaggca tcttgtcata aaacagcttt cccccacccc atatcatggg
1801 aaaaggggga gaaatatagc ccctagccta ataacttatc atttgtaaaa tgacttataa
1861 aaatattacc tcaatggtag gagacatcca gacttgtata tttcagtgga aatacaaaac
1921 cacttcagag accagggtat ctcctctgga aggatctaag agaaggtaag acagattagg
1981 acatcgaaaa ggaggatgga gccaggtgcc atggcttgag cctataatcc gaggctgagg
2041 tgggaggatc acttgagccc aggagtttga ggttgcagtg agctgtgatc acaccactgc
2101 actccagcct gggtgacaga gtgagactct gtctcaatta attttttttt tttaaaggag
2161 gaggatctcc atgggtaagt ggtttctacc cgcatgggta gagttctgcc tctggtcctt
2221 ctcagggggc actttcacca agagcagtgt aattatctct gaaagagcaa gtcagcttgt
2281 gccgcatccc caaccaatcc acagcctgga gtacctttca aggtcaaagt gcatggccag
2341 ctccattgag acattccatt tcaaagcacc gtgctgacag atatcaaagt actctagcag
2401 ggaaaataat ttgtttgctg tgtaaggaag aatgtagaca agacagataa atctgaaggt
2461 catgtggcat cagggaaagg gcatggctgt gtcttttgca cccaatatga aacatcttct
2521 cccaacactg ctttaatgga agttctagga accaatttag ctcaggcatt tgactcctac
2581 agcagaagtt ctgagcctga ccacagatgg tgtgtaatct atcaaacaca cccctggcca
2641 agttgggtcc tataggacct ggtactatgt actattgtaa cttctagttc cctaagaggt
2701 acctgttttc agtaaaaagg ggtcctgagt tctgtgcagg tggaagagct acccgagaac
2761 tacctgagtt ctgtgcaggt agagtcccat ttcttatggg acctgtgtgc tcctgagaac
2821 tcttacttga gacatcaaaa agaagcagca agagcttctg ggacagagac tgcttggcca
2881 gctttgtaag taagtggctg cctccaatgt gatgtgagta catgttgggc agtctcactg
2941 tcctaaggta tgtcttcttt ccacctccca ctgcccctcc cctgccacct atcaatgatg
3001 ccttggttca gtcattagaa atctgttgct ttgagttctg aaatattttc accttaaaaa
3061 aaatgctgaa aatacacatt ctcctgggaa gacgataaac agctagctaa gaagccgagg
3121 ttcagtggtg gcagcaggaa ggacactgcc acaaattttg tctatttcat atttgtcccc
3181 tagagccagc cctagcaaat gtgtgagttg ggagtagtta atagtaaata agactctgac
3241 tttacacaag ctacacattt tatacttttc ataaaccaca aagtctctct agaatttttt
3301 ctgccttcac taaaattgga ctgtagccaa gatataaagc aagtcatttg gaacctgccg
3361 agtgagcact gaagctactt tatcatgaga tgtgtgttaa gaaggctgca gcccacagga
3421 gtccagggaa ggcggggacc acagaggcac agagtccagc acttggccgc tcatgggcct
3481 tctttctgcc tcagaggacg ggggcagaga agtgatgaag ggaaatgttc ttagaggagg
3541 aaatatcctt tgtcctgttc agagagacca gggccctacc attaggcata ctttcagaag
3601 caacctggag aacagctatc aatcatattc aaaaccagta caagaactgc tgcctggtac
3661 cctgtgagtc atttctatga aattccatat aaagaatgat gataagttta cacactgtgc
3721 aatctcacaa tctgaaaata aagttgagtt ggctgtgttt tctctgctct tgtcagaaca
3781 ttgggacaat tggtcgttca aaaacattca tcctcttact gcaagtttat ctgggtactt
3841 ttacctgtgt gttcaaaggc atttcttttc agcagtgatc attataactt cacaaaaaaa
3901 gatgctgacg gatttactta cagggcctta atgttatttt gtcccagcca acaccctcta
3961 ggtcctaaaa gtcaaggtac ttcagtttat ttggcaaaca tgacaacatt ttttttggcc
4021 ctgggcccaa cagtttgtac ttcatgaaac atattgtaca ttttacatag tttaatttaa
4081 aaaatacctt ttaagctagt tgatctttga ctgtcttatt tattataacc tttcagcaca
4141 ttccaaggtt ttagttactc aggaaggagt taattaaaat gattttattt tggtctgatg
4201 gatgtttttt aaaaggaaaa ttattattat gaaccttcag cctactttct tgagtgccgt
4261 aaaagtgctt gtaaatcttt tttttttttt aagaagaaag aaaaaaatgg tgtttgacgt
4321 tgatggaaat tcaaaaatat atatggaact gaaacattaa cttagctaaa ataaaagcaa
4381 tctgtgtttg aaaaaa
//
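The Python sketch below (standard library only, not Biopython) makes the layout above more concrete. It reads three parts of a record like this one: the LOCUS line, the numbered ORIGIN lines, and a simple feature location such as the CDS `297..1517`. The function names are my own hypothetical helpers. The sketch assumes several things:

- the whitespace-separated LOCUS layout shown here;
- only `start..end` and `complement(start..end)` locations (not `join(...)`, `<` or `>`);
- the standard genetic code.

A CDS location includes the stop codon while `/translation` does not, so the trailing `*` is stripped.

```python
import re

# Standard genetic code; codons ordered by bases t, c, a, g at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def parse_locus(line):
    """Split a LOCUS line into named fields.

    Assumes the token order of the example record:
    LOCUS name length bp mol_type topology division date
    """
    tokens = line.split()
    if len(tokens) < 8 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS line: " + line)
    name, length, _unit, mol_type, topology, division, date = tokens[1:8]
    return {"name": name, "length": int(length), "mol_type": mol_type,
            "topology": topology, "division": division, "date": date}


def origin_to_seq(origin_lines):
    """Join numbered ORIGIN lines like '1 ggcgcggcgg ggcgcggggc' into one
    string, dropping position numbers, spaces and the '//' terminator.
    Pass only the sequence lines, not the 'ORIGIN' header itself."""
    return "".join(re.sub(r"[^a-z]", "", line.lower()) for line in origin_lines)


def extract(seq, location):
    """Return the bases for 'start..end' or 'complement(start..end)'.

    Positions are 1-based and inclusive, as in the FEATURES table.
    join(), partial ends ('<', '>') and other operators are not handled.
    """
    is_comp = location.startswith("complement(") and location.endswith(")")
    core = location[len("complement("):-1] if is_comp else location
    match = re.fullmatch(r"(\d+)\.\.(\d+)", core)
    if match is None:
        raise ValueError("unsupported location: " + location)
    start, end = (int(x) for x in match.groups())
    sub = seq[start - 1:end]
    if is_comp:
        # Reverse complement for a feature on the minus strand.
        sub = sub.translate(COMPLEMENT)[::-1]
    return sub


def translate(cds, codon_start=1):
    """Translate a CDS with the standard code; unknown codons become 'X'.
    Trailing stop symbols ('*') are removed, since /translation omits them."""
    cds = cds[codon_start - 1:]
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                      for i in range(0, len(cds) - 2, 3))
    return protein.rstrip("*")


if __name__ == "__main__":
    info = parse_locus("LOCUS NM_005715 4396 bp mRNA linear PRI 27-JUN-2012")
    print(info["length"], info["division"])  # 4396 PRI
    seq = origin_to_seq(["        1 ccatgaaaaa gaagtaacc", "//"])
    print(translate(extract(seq, "3..17")))  # MKKK
```

You can apply it to the record above by passing the numbered ORIGIN lines to `origin_to_seq`. Calling `translate(extract(seq, "297..1517"))` on the result should then give back the `/translation` string, assuming the sequence was copied intact. For real work, a full parser such as Biopython's `Bio.SeqIO` (format `"genbank"`) is the more robust choice.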

Many other feature keys do not appear in this example. Here is a summary, roughly in alphabetical order, for easy reference.

attenuator 1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; 2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription.
C_region Constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Includes one or more exons, depending on the particular chain.
CAAT_signal CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units that may be involved in RNA polymerase binding; consensus=GG(C or T)CAATCT.
CDS coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon). Feature includes amino acid conceptual translation.
centromere Region of chromosome to which spindle traction fibers attach during mitosis and meiosis. Must be experimentally characterized.
D-loop Displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein.
D_segment Diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain.
enhancer A cis-acting sequence that increases the utilization of (some) eukaryotic promoters and can function in either orientation and in any location (upstream or downstream) relative to the promoter.
exon Region of genome that codes for portion of spliced mRNA; may contain 5′ UTR, all CDSs, and 3′ UTR.
GC_signal GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units that may occur in multiple copies or in either orientation; consensus=GGGCGG.
gene Region of biological interest identified as a gene and for which a name has been assigned.
iDNA Intervening DNA; DNA which is eliminated through any of several kinds of recombination.
intron A segment of DNA that is transcribed, but removed from within the transcript, by splicing together the sequences (exons) on either side of it.
J_segment Joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains.
LTR Long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses.
mat_peptide Mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification. The location does not include the stop codon (unlike the corresponding CDS).
misc_binding Site in nucleic acid that covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind).
misc_difference Feature sequence is different from that presented in the entry and cannot be described by any other Difference key (unsure, mutation, variation, or modified_base).
misc_feature Region of biological interest which cannot be described by any other feature key.
misc_recomb Site of any generalized, site-specific, or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/proviral).
misc_RNA Any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'UTR, 3'UTR, exon, transit_peptide, polyA_site, rRNA, tRNA, and ncRNA).
misc_signal Any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin).
misc_structure Any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop).
mobile_element Region of genome containing an element capable of or derived from movement from one location to another in the genome. The mobile_element_type qualifier is mandatory and a pull-down menu lists approved types. The name of the specific element can be given in the text box.
modified_base The indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value).
mRNA messenger RNA; includes 5′ untranslated region (5′ UTR), coding sequences (CDS, exon) and 3′ untranslated region (3′ UTR).
ncRNA non-coding RNA; a non-protein-coding transcript other than ribosomal RNA and transfer RNA, including antisense RNA, guide RNA, scRNA, siRNA, miRNA, piRNA, snoRNA, and snRNA. The specific type of ncRNA must be specified in the /ncRNA_class qualifier.
N_region Extra nucleotides inserted between rearranged immunoglobulin segments.
operon Region containing polycistronic transcript under the control of the same regulatory sequences.
oriT Origin of transfer; region of DNA where transfer is initiated during the process of conjugation or mobilization.
polyA_signal Recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA.
polyA_site Site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation.
precursor_RNA Any RNA species that is not yet the mature RNA product; may include 5′ clipped region (5′ clip), 5′ untranslated region (5′ UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3′ UTR), and 3′ clipped region (3′ clip).
prim_transcript Primary (initial, unprocessed) transcript; includes 5′ clipped region (5′ clip), 5′ untranslated region (5′ UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3′ UTR), and 3′ clipped region (3′ clip).
primer_bind Non-covalent primer binding site for initiation of replication, transcription, or reverse transcription. Includes site(s) for synthetic e.g., PCR primer elements.
promoter Region on a DNA molecule involved in RNA polymerase binding to initiate transcription.
protein_bind Non-covalent protein binding site on nucleic acid.
RBS Ribosome binding site.
repeat_region Region of genome containing repeating units. Some qualifiers such as rpt_type and satellite have controlled vocabularies. These qualifiers have check boxes or pull-down menus to ensure that the correct format is used.
rep_origin Origin of replication; starting site for duplication of nucleic acid to give two identical copies.
rRNA Mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) that assembles amino acids into proteins.
S_region Switch region of immunoglobulin heavy chains. Involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell.
sig_peptide Signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence.
source Identifies the biological source of the specified span of the sequence. This key is mandatory. Every entry will have, as a minimum, a single source key spanning the entire sequence. More than one source key per sequence is permitted.
stem_loop Hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA.
STS Sequence Tagged Site. Short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR. A region of the genome can be mapped by determining the order of a series of STSs.
TATA_signal TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit that may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T).
telomere Experimentally characterized specialized DNA segment found at the ends of eukaryotic chromosomes.
terminator Sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein.
tmRNA Transfer messenger RNA; acts as a tRNA first, then an mRNA that encodes a peptide tag.
transit_peptide Transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle.
tRNA Mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence.
unsure Author is unsure of exact sequence in this region.
V_region Variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Codes for the variable amino terminal portion. Can be made up from V_segments, D_segments, N_regions, and J_segments.
V_segment Variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Codes for most of the variable region (V_region) and the last few amino acids of the leader peptide.
variation A related strain contains stable mutations from the same gene (e.g., RFLPs, polymorphisms, etc.) that differ from the presented sequence at this location (and possibly others).
3'UTR Region near or at the 3′ end of a mature transcript (usually following the stop codon) that is not translated into a protein; trailer.
5'UTR Region near or at the 5′ end of a mature transcript (usually preceding the initiation codon) that is not translated into a protein; leader.
-10_signal Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units that may be involved in binding RNA polymerase; consensus=TAtAaT.
-35_signal A conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus = TTGACa or TGTTGACA.
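Several signal keys in the table give a consensus sequence, including CAAT_signal, TATA_signal, polyA_signal and -10_signal. The Python sketch below is a rough illustration: it turns that notation into a regular expression and lists the matching positions on one strand. It is a plain string search, not a promoter or signal predictor. The notation handling is my own simplification:

- lowercase (less conserved) bases are treated like uppercase;
- only the "(X or Y)" form is expanded;
- a consensus given as two alternatives, such as the -35_signal's "TTGACa or TGTTGACA", has to be searched once per alternative.

```python
import re


def consensus_to_regex(consensus):
    """Convert consensus notation such as 'TATA(A or T)A(A or T)' into a
    compiled regex. '(X or Y)' becomes the character class '[XY]';
    lowercase (less conserved) bases are simply upper-cased."""
    pattern = re.sub(r"\(([ACGTacgt]) or ([ACGTacgt])\)", r"[\1\2]", consensus)
    return re.compile(pattern.upper())


def find_motif(seq, consensus):
    """Return 1-based start positions of all matches, overlapping ones
    included, on the given strand only (the reverse strand is not scanned)."""
    regex = consensus_to_regex(consensus)
    seq = seq.upper()
    return [i + 1 for i in range(len(seq)) if regex.match(seq, i)]


if __name__ == "__main__":
    print(consensus_to_regex("GG(C or T)CAATCT").pattern)      # GG[CT]CAATCT
    print(find_motif("gggtataaatg", "TATA(A or T)A(A or T)"))  # [4]
    print(find_motif("ccaataaagg", "AATAAA"))                  # [3]
```

Testing `regex.match(seq, i)` at every offset, rather than using `re.finditer`, keeps overlapping hits. This matters for short, repetitive motifs like AATAAA.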

Source: http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp
