[Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split
Mark Dalphin
mdalphin@amgen.com
Thu, 25 Jan 2001 13:51:03 -0800
Hilmar Lapp wrote:
> > - in plain simple genbank/embl terms
> > <5..12> and <5.12>
> > are valid, but
> > >5..12, 5<..12, 5..12<, 5..>12
> > are invalid.
>
> The GenBank documentation is somewhat inconsistent here. Let me quote:
>
> From http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB
>
> <quote>
> If the "<" symbol precedes a base span, the sequence is partial on the
> 5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the
> sequence is partial on the 3' end (e.g., CDS 435..915>).
> </quote>
>
> From http://www.ncbi.nlm.nih.gov/collab/FT/index.html
>
> <quote>
> CDS <1..>336
> /codon_start=1
> /gene="IGHV1"
> /product="immunoglobulin heavy chain variable region"
> V_region <1..>336
> /gene="IGHV1"
> /product="immunoglobulin heavy chain variable region"
> </quote>
>
> From the BNF grammar definition of the feature table, to be found at
> http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur
>
> <quote>
> local_location ::= <base_position> | <between_position> | <base_range>
> base_position ::= <integer> | <low_base_bound> | <high_base_bound> |
> <two_base_bound>
>
> low_base_bound ::= > <integer>
>
> high_base_bound ::= < <integer>
>
> two_base_bound ::= <base_position>.<base_position>
>
> between_position ::= <base_position>^<base_position>
>
> base_range ::= <base_position>..<base_position>
> </quote>
>
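To make the quoted grammar concrete, here is a minimal Python sketch of a
recognizer for just the productions quoted above. It is not BioPerl code and
says nothing about how Bio::Location::Fuzzy works; the function names are made
up for illustration. One simplification is mine: the left-recursive
two_base_bound rule is flattened into "bounds joined by single dots".

```python
import re

# One bound: an integer with an optional '<' or '>' written BEFORE it,
# which is the only placement the quoted BNF allows.
_SIMPLE = r"[<>]?\d+"
# base_position: one bound, or several joined by single dots (two_base_bound).
_BASE_POSITION = re.compile(rf"{_SIMPLE}(?:\.{_SIMPLE})*")


def is_base_position(text):
    """True if the whole string matches the (flattened) base_position rule."""
    return _BASE_POSITION.fullmatch(text) is not None


def classify_location(text):
    """Return the name of the matching local_location production, or None."""
    if ".." in text:
        left, right = text.split("..", 1)
        ok = is_base_position(left) and is_base_position(right)
        return "base_range" if ok else None
    if "^" in text:
        left, right = text.split("^", 1)
        ok = is_base_position(left) and is_base_position(right)
        return "between_position" if ok else None
    return "base_position" if is_base_position(text) else None


# Location strings that appear in this thread.
for example in ["<1..>336", "<1..206", "435..915>", "<5..12>", "5<..12", "5.12"]:
    print(example, "->", classify_location(example))
```

Read this literally, <1..>336 and <1..206 are accepted, while 435..915> and
<5..12> are rejected, because the grammar only ever puts '<' or '>' in front of
an integer.
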
I just looked for an example at NCBI and found this:
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=234355&dopt=GenBank
As you can see, the symbol '>' does end up BEFORE the position it is modifying, which is
consistent with the BNF. Hope this helps...

LOCUS       S52564         10 bp    DNA             PRI       05-APR-1999
DEFINITION  Homo sapiens phenylalanine hydroxylase (PAH) gene, partial cds.
ACCESSION   S52564
VERSION     S52564.1  GI:234355
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
FEATURES             Location/Qualifiers
     source          1..10
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     gene            <1..>10
                     /gene="PAH"
     CDS             <1..>10
                     /gene="PAH"
                     /note="missense mutation"
                     /codon_start=2
                     /product="phenylalanine hydroxylase"
                     /protein_id="AAD14912.2"
                     /db_xref="GI:4559419"
                     /translation="HGV"
     variation       5..7
                     /gene="PAH"
                     /note="Gly for Glu221"
BASE COUNT        3 a      2 c      3 g      2 t
ORIGIN
        1 ccatggagta
//
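
For anyone who wants to check this mechanically, below is a rough Python sketch
that pulls the feature keys and locations out of the feature table above and
reports which ends are marked partial. The helper names are hypothetical, the
table is abridged to a few qualifiers, and the parsing assumes every qualifier
line starts with "/" (true for this record, not for GenBank records in
general, where qualifiers can wrap onto continuation lines).

```python
# Abridged copy of the feature table from record S52564 shown above.
FEATURE_TABLE = """\
     source          1..10
                     /organism="Homo sapiens"
     gene            <1..>10
                     /gene="PAH"
     CDS             <1..>10
                     /gene="PAH"
                     /codon_start=2
     variation       5..7
                     /gene="PAH"
"""


def feature_locations(feature_text):
    """Yield (feature_key, location) pairs, skipping '/qualifier' lines."""
    for line in feature_text.splitlines():
        line = line.strip()
        if not line or line.startswith("/"):
            continue
        parts = line.split()
        if len(parts) == 2:  # a feature line is "key location"
            yield parts[0], parts[1]


def partial_ends(location):
    """Return (partial_at_start, partial_at_end) for a simple a..b range.

    Only handles the prefix form seen in this record: '<' before the first
    position and '>' before the second one.
    """
    start, _, end = location.partition("..")
    return start.startswith("<"), end.startswith(">")


for key, location in feature_locations(FEATURE_TABLE):
    print(key, location, partial_ends(location))
```

On this record the gene and CDS features come out partial at both ends, and
source and variation at neither.
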

Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)