Bioperl: Re: feature parsing for GenBank/EMBL

James Gilbert jgrg@sanger.ac.uk
Mon, 8 May 2000 12:23:06 +0100 (BST)



Hilmar,

We are aware that we ignore these kinds of
features. I think we'll continue to do so until
someone complains that they NEED to parse them.
As Ewan suggested in a previous email, the
GenBank/EMBL/swissprot parsers probably need
reworking, so that such features could be handled
by users' custom object handlers.

	Cheers,  James

On Mon, 8 May 2000, Hilmar Lapp wrote:

> There's documentation of the feature table format at the NCBI website (URL
> http://www.ncbi.nlm.nih.gov/collab/FT/index.html). Locations in particular are
> documented at http://www.ncbi.nlm.nih.gov/collab/FT/index.html#location
> 
> A couple of these are presently not covered by the feature table parsing
> methods (i.e., they fail, but are ignored after a warning), and some are not
> really covered even by the SeqFeatureI interface, for example (quoted from
> the URL):
> 
> (23.45)..600       Specifies that the starting point is one of the bases
>                    between bases 23 and 45, inclusive, and the end point is
>                    base 600
> 
> (122.133)..(204.221) The feature starts at a base between 122 and 133,
>                      inclusive, and ends at a base between 204 and 221,
>                      inclusive
> 
> 145^177            Points to a site between two adjacent bases anywhere 
>                    between bases 145 and 177 
> 
> 
> order(location,location, ... location)
>      The elements can be found in the specified order (5' to 3' direction),
>      but nothing is implied about the reasonableness about joining them
> 
> J00194:(100..202)  Points to bases 100 to 202, inclusive, in the entry (in 
>                    this database) with primary accession number 
>                    'J00194'
> 
> Do you see a point in having 'wobble' information for start and end in the
> SeqFeatureI interface, or in an implementation module?
> 
> I think simply saying that we won't let ourselves be governed by GenBank
> parsing issues (actually, it's a joint definition for GenBank/EMBL/DDBJ) may
> not be the best answer, because the feature annotation rules reflect the
> biological knowledge we have at present, and I think that's what we are
> trying to model, at least to some extent.
> 
> Just a few thoughts off the top of my head.
> 
> Cheers,
> 
> 	Hilmar
> -- 
> -----------------------------------------------------------------------
> Hilmar Lapp                                      email: hlapp@gmx.net
> NFI Vienna, IFD/Bioinformatics                   phone: +43 1 86634 631
> A-1235 Vienna                                      fax: +43 1 86634 727
> ROI: Bioinformatics (arrays, expression, seqs), Programming, Databases,
>      Mountain Biking (hard tail, hard fork: feel the trail)
> -----------------------------------------------------------------------
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 

James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge                        Tel: 01223 494906
CB10 1SA                         Fax: 01223 494919
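
The location forms quoted from the NCBI feature table documentation can be recognized with a small parser. What follows is a minimal sketch in Python, not Bioperl code and not the SeqFeatureI interface. The `FuzzyLocation` class and the `parse_position`, `parse_location` and `parse_order` functions are hypothetical names chosen for illustration. The sketch assumes each end of a location is stored as a (min, max) range, which is one possible way to carry the 'wobble' information discussed in the thread. It handles only the forms listed in the quoted message and does not cover complement(), join(), or the < and > qualifiers.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical representation (not Bioperl's SeqFeatureI): each end of a
# location is a (min, max) range, so an exact base n is stored as (n, n).
@dataclass
class FuzzyLocation:
    start: Tuple[int, int]
    end: Tuple[int, int]
    between: bool = False            # True for the "145^177" form: a site between bases
    accession: Optional[str] = None  # set for remote locations such as "J00194:(100..202)"


def parse_position(text: str) -> Tuple[int, int]:
    """Parse "600" as (600, 600) and "(23.45)" as (23, 45)."""
    m = re.fullmatch(r"\((\d+)\.(\d+)\)", text)
    if m:
        return int(m.group(1)), int(m.group(2))
    if re.fullmatch(r"\d+", text):
        n = int(text)
        return n, n
    raise ValueError(f"unsupported position: {text!r}")


def parse_location(text: str) -> FuzzyLocation:
    """Parse one simple location (n, a..b or a^b), optionally written as ACCESSION:(...)."""
    text = text.strip()
    accession = None
    # Remote entry, e.g. "J00194:(100..202)": peel off the accession prefix.
    m = re.fullmatch(r"([A-Za-z][A-Za-z0-9_.]*):\((.*)\)", text)
    if m:
        accession, text = m.group(1), m.group(2)
    if "^" in text:
        # A site between two adjacent bases somewhere within the given bounds.
        left, right = text.split("^", 1)
        return FuzzyLocation(parse_position(left), parse_position(right),
                             between=True, accession=accession)
    if ".." in text:
        # "(122.133)" contains a single dot, so the first ".." is the range separator.
        left, right = text.split("..", 1)
        return FuzzyLocation(parse_position(left), parse_position(right),
                             accession=accession)
    pos = parse_position(text)
    return FuzzyLocation(pos, pos, accession=accession)


def parse_order(text: str) -> List[FuzzyLocation]:
    """Parse order(loc,loc,...); assumes the elements contain no nested operators."""
    m = re.fullmatch(r"order\((.*)\)", text.strip())
    if not m:
        raise ValueError(f"not an order() location: {text!r}")
    return [parse_location(part) for part in m.group(1).split(",")]


if __name__ == "__main__":
    for loc in ["(23.45)..600", "(122.133)..(204.221)", "145^177", "J00194:(100..202)"]:
        print(loc, "->", parse_location(loc))
    print(parse_order("order(1..10,20..30)"))
```

With (min, max) pairs an exact position is just the degenerate case (n, n), so code that needs a single coordinate could pick either bound. Whether such ranges belong in SeqFeatureI itself or only in an implementation module is the open question raised in the thread.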



