[Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split

Mark Dalphin mdalphin@amgen.com
Thu, 25 Jan 2001 13:51:03 -0800


Hilmar Lapp wrote:

> > - in plain simple genbank/embl terms
> >   <5..12> and <5.12>
> >    are valid, but
> >   >5..12, 5<..12, 5..12<, 5..>12
> >   are invalid.
>
> The GenBank documentation is somewhat inconsistent here. Let me quote:
>
> From http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB
>
> <quote>
> If the "<" symbol precedes a base span, the sequence is partial on the
> 5' end (e.g., CDS  <1..206).  If the ">" symbol follows a base span, the
> sequence is partial on the 3' end (e.g., CDS   435..915>).
> </quote>
>
> From http://www.ncbi.nlm.nih.gov/collab/FT/index.html
>
> <quote>
> CDS             <1..>336
>                 /codon_start=1
>                 /gene="IGHV1"
>                 /product="immunoglobulin heavy chain variable region"
> V_region        <1..>336
>                 /gene="IGHV1"
>                 /product="immunoglobulin heavy chain variable region"
> </quote>
>
> From the BNF grammar definition of the feature table, to be found at
> http://www.ncbi.nlm.nih.gov/collab/FT/index.html#backus-naur
>
> <quote>
> local_location ::= <base_position> | <between_position> | <base_range>
> base_position ::= <integer> | <low_base_bound> | <high_base_bound> | <two_base_bound>
>
> low_base_bound ::= > <integer>
>
> high_base_bound ::= < <integer>
>
> two_base_bound ::= <base_position>.<base_position>
>
> between_position ::= <base_position>^<base_position>
>
> base_range ::= <base_position>..<base_position>
> </quote>
>

I just looked for an example at NCBI and found this:

http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=234355&dopt=GenBank

As you can see, the '>' symbol does end up BEFORE the position it modifies, which is
consistent with the BNF. Hope this helps...
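To make the point concrete, here is a minimal Python sketch (not BioPerl code) of the quoted local_location rule. The names SIMPLE, BASE_POSITION, LOCAL_LOCATION and is_local_location are made up for illustration. The recursive two_base_bound is also flattened to a single "a.b" level, so this only approximates the BNF and is not a full parser. It does not handle join, complement or remote entries.

```python
import re

# Simplified reading of the quoted BNF (an assumption: two_base_bound is
# recursive in the grammar, here each side is limited to one simple position).
# A simple position is an optional '<' or '>' placed BEFORE the integer.
SIMPLE = r"[<>]?\d+"
# base_position: a simple position, or two of them joined by a single '.'
BASE_POSITION = rf"{SIMPLE}(?:\.{SIMPLE})?"
# local_location: base_position, between_position (a^b) or base_range (a..b)
LOCAL_LOCATION = re.compile(
    rf"{BASE_POSITION}(?:\.\.{BASE_POSITION}|\^{BASE_POSITION})?"
)


def is_local_location(text: str) -> bool:
    """Return True if text matches the simplified local_location rule."""
    return LOCAL_LOCATION.fullmatch(text) is not None


if __name__ == "__main__":
    for loc in ["<1..>10", "5..7", "435..915>", "5<..12", "5..12<", "<5.12", "123^124"]:
        print(f"{loc:10} {is_local_location(loc)}")
```

Under this reading, `<1..>10` from the record below matches. The `435..915>` example from the sample-record page does not match, because the grammar never allows a symbol after the integer. That is the inconsistency Hilmar pointed out.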

LOCUS       S52564         10 bp    DNA             PRI       05-APR-1999
DEFINITION  Homo sapiens phenylalanine hydroxylase (PAH) gene, partial cds.
ACCESSION   S52564
VERSION     S52564.1  GI:234355
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
FEATURES             Location/Qualifiers
     source          1..10
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     gene            <1..>10
                     /gene="PAH"
     CDS             <1..>10
                     /gene="PAH"
                     /note="missense mutation"
                     /codon_start=2
                     /product="phenylalanine hydroxylase"
                     /protein_id="AAD14912.2"
                     /db_xref="GI:4559419"
                     /translation="HGV"
     variation       5..7
                     /gene="PAH"
                     /note="Gly for Glu221"
BASE COUNT        3 a      2 c      3 g      2 t
ORIGIN
        1 ccatggagta
//
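For completeness, here is a small Python sketch that reads the feature lines of this record and flags partial ends. It assumes, as in this entry, that '<' appears only on the start position and '>' only on the end position. FEATURE_LINE, feature_locations and partial_ends are hypothetical helpers, and only simple start..end locations are handled.

```python
import re
from typing import List, Tuple

# Feature lines copied from the S52564 record above (most qualifiers omitted)
FEATURES = """\
     source          1..10
     gene            <1..>10
                     /gene="PAH"
     CDS             <1..>10
     variation       5..7
"""

# Hypothetical pattern: the feature key starts in column 6, followed by the location.
# Qualifier lines start further right, so they do not match.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S+)$")


def feature_locations(text: str) -> List[Tuple[str, str]]:
    """Return (key, location) pairs, skipping qualifier lines."""
    pairs = []
    for line in text.splitlines():
        m = FEATURE_LINE.match(line)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs


def partial_ends(location: str) -> Tuple[bool, bool]:
    """Return (partial at 5' end, partial at 3' end) for a start..end location."""
    start, _, end = location.partition("..")
    return start.startswith("<"), end.startswith(">")


if __name__ == "__main__":
    for key, loc in feature_locations(FEATURES):
        print(key, loc, partial_ends(loc))
```

Run on this record, gene and CDS come out partial at both ends, while source and variation are partial at neither end.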

Mark Dalphin                          email: mdalphin@amgen.com
Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
One Amgen Center Drive                       +1-805-375-0680 (home)
Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)