Appendix E to Subpart G of Part 1—List of Feature Keys Related to Nucleotide Sequences

Source: World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (2009).

| Key | Description |
| --- | --- |
| allele | a related individual or strain contains stable, alternative forms of the same gene, which differs from the presented sequence at this location (and perhaps others). |
| attenuator | (1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; (2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription. |
| C_region | constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; includes one or more exons depending on the particular chain. |
| CAAT_signal | CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG (C or T) CAATCT. |
| CDS | coding sequence; sequence of nucleotides that corresponds with the sequence of amino acids in a protein (location includes stop codon); feature includes amino acid conceptual translation. |
| conflict | independent determinations of the “same” sequence differ at this site or region. |
| D-loop | displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein. |
| D-segment | diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain. |
| enhancer | a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. |
| exon | region of genome that codes for portion of spliced mRNA; may contain 5′UTR, all CDSs, and 3′UTR. |
| GC_signal | GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG. |
| gene | region of biological interest identified as a gene and for which a name has been assigned. |
| iDNA | intervening DNA; DNA which is eliminated through any of several kinds of recombination. |
| intron | a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. |
| J_segment | joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. |
| LTR | long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses. |
| mat_peptide | mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification; the location does not include the stop codon (unlike the corresponding CDS). |
| misc_binding | site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind). |
| misc_difference | feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base). |
| misc_feature | region of biological interest which cannot be described by any other feature key; a new or rare feature. |
| misc_recomb | site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral). |
| misc_RNA | any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5′clip, 3′clip, 5′UTR, 3′UTR, exon, CDS, sig_peptide, transit_peptide, mat_peptide, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA). |
| misc_signal | any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin). |
| misc_structure | any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop). |
| modified_base | the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value). |
| mRNA | messenger RNA; includes 5′ untranslated region (5′UTR), coding sequences (CDS, exon) and 3′ untranslated region (3′UTR). |
| mutation | a related strain has an abrupt, inheritable change in the sequence at this location. |
| N_region | extra nucleotides inserted between rearranged immunoglobulin segments. |
| old_sequence | the presented sequence revises a previous version of the sequence at this location. |
| polyA_signal | recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA. |
| polyA_site | site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation. |
| precursor_RNA | any RNA species that is not yet the mature RNA product; may include 5′ clipped region (5′clip), 5′ untranslated region (5′UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3′UTR), and 3′ clipped region (3′clip). |
| prim_transcript | primary (initial, unprocessed) transcript; includes 5′ clipped region (5′clip), 5′ untranslated region (5′UTR), coding sequences (CDS, exon), intervening sequences (intron), 3′ untranslated region (3′UTR), and 3′ clipped region (3′clip). |
| primer_bind | non-covalent primer binding site for initiation of replication, transcription, or reverse transcription; includes site(s) for synthetic, for example, PCR primer elements. |
| promoter | region on a DNA molecule involved in RNA polymerase binding to initiate transcription. |
| protein_bind | non-covalent protein binding site on nucleic acid. |
| RBS | ribosome binding site. |
| repeat_region | region of genome containing repeating units. |
| repeat_unit | single repeat element. |
| rep_origin | origin of replication; starting site for duplication of nucleic acid to give two identical copies. |
| rRNA | mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins. |
| S_region | switch region of immunoglobulin heavy chains; involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell. |
| satellite | many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA. |
| scRNA | small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote. |
| sig_peptide | signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence. |
| snRNA | small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions. |
| source | identifies the biological source of the specified span of the sequence; this key is mandatory; every entry will have, as a minimum, a single source key spanning the entire sequence; more than one source key per sequence is permissible. |
| stem_loop | hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA. |
| STS | Sequence Tagged Site; short, single-copy DNA sequence that characterizes a mapping landmark on the genome and can be detected by PCR; a region of the genome can be mapped by determining the order of a series of STSs. |
| TATA_signal | TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T). |
| terminator | sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein. |
| transit_peptide | transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle. |
| tRNA | mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence. |
| unsure | author is unsure of exact sequence in this region. |
| V_region | variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for the variable amino terminal portion; can be made up from V_segments, D_segments, N_regions, and J_segments. |
| V_segment | variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains; codes for most of the variable region (V_region) and the last few amino acids of the leader peptide. |
| variation | a related strain contains stable mutations from the same gene (for example, RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others). |
| 3′clip | 3′-most region of a precursor transcript that is clipped off during processing. |
| 3′UTR | region at the 3′ end of a mature transcript (following the stop codon) that is not translated into a protein. |
| 5′clip | 5′-most region of a precursor transcript that is clipped off during processing. |
| 5′UTR | region at the 5′ end of a mature transcript (preceding the initiation codon) that is not translated into a protein. |
| -10_signal | pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT. |
| -35_signal | a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus=TTGACa [ ] or TGTTGACA [ ]. |
[86 FR 57052, Oct. 14, 2021]
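
Illustrative note (not part of the regulation): several of the signal keys listed above give a consensus sequence, and a notation such as "(A or T)" maps naturally onto a regular-expression character class. The sketch below is one possible way to locate exact consensus matches in a nucleotide string. The names `CONSENSUS` and `find_consensus` are hypothetical, the lowercase letters in the source consensus strings (for example TAtAaT) are simply uppercased here, only the first -35_signal alternative is included, and reporting 1-based inclusive positions is an assumption rather than something this appendix prescribes.

```python
import re

# Consensus strings transcribed from the feature key descriptions.
# "(C or T)" and "(A or T)" become character classes. Uppercasing the
# lowercase positions (TAtAaT, TTGACa) is an assumption of this sketch.
CONSENSUS = {
    "CAAT_signal": "GG[CT]CAATCT",
    "GC_signal": "GGGCGG",
    "polyA_signal": "AATAAA",
    "TATA_signal": "TATA[AT]A[AT]",
    "-10_signal": "TATAAT",
    "-35_signal": "TTGACA",
}


def find_consensus(sequence: str, key: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of exact consensus matches."""
    # A lookahead with a capturing group lets overlapping matches be reported.
    pattern = re.compile(f"(?=({CONSENSUS[key]}))")
    seq = sequence.upper()
    return [(m.start(1) + 1, m.end(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(find_consensus("ccTATAAATgg", "TATA_signal"))
    print(find_consensus("GGGCGGGCGG", "GC_signal"))
```

An exact match to a consensus string is only a candidate location. The descriptions above hedge these signals with "may be involved", so a match does not by itself establish that the corresponding feature key applies.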