Bioperl: Re: feature parsing for GenBank/EMBL
James Gilbert
jgrg@sanger.ac.uk
Mon, 8 May 2000 12:23:06 +0100 (BST)
Hilmar,
We are aware that we ignore these kinds of
features. I think we'll continue to do so until
someone complains that they NEED to parse them.
As Ewan suggested in a previous email, the
GenBank/EMBL/swissprot parsers probably need
reworking, so that such features could be handled
by users' custom object handlers.
Cheers, James
On Mon, 8 May 2000, Hilmar Lapp wrote:
> There's documentation of the feature table format at the NCBI website (URL
> http://www.ncbi.nlm.nih.gov/collab/FT/index.html). Locations in particular are
> documented at http://www.ncbi.nlm.nih.gov/collab/FT/index.html#location
>
> A couple of these are not presently covered by the feature table parsing
> methods (i.e., they fail, but are ignored after a warning), and some are not
> even really covered by the SeqFeatureI interface, like (quoted from the URL)
>
> (23.45)..600 Specifies that the starting point is one of the bases
> between bases 23 and 45, inclusive, and the end point is
> base 600
>
> (122.133)..(204.221) The feature starts at a base between 122 and 133,
> inclusive, and ends at a base between 204 and 221,
> inclusive
>
> 145^177 Points to a site between two adjacent bases anywhere
> between bases 145 and 177
>
>
> order(location,location, ... location)
> The elements can be found in the specified order (5' to 3' direction),
> but nothing is implied about the reasonableness about joining them
>
> J00194:(100..202) Points to bases 100 to 202, inclusive, in the entry (in
> this database) with primary accession number
> 'J00194'
>
> Do you see a point in having 'wobble' information for start and end in the
> SeqFeatureI interface, or in an implementation module?
>
> I think just saying we won't let ourselves be governed by GenBank parsing issues
> (actually, it's a joint definition for GenBank/EMBL/DDBJ) may not be the best
> answer, because the feature annotation rules obviously reflect the biological
> knowledge we have at present, and I think that's what we are trying to model,
> at least to some extent.
>
> Just a few thoughts off the top of my head.
>
> Cheers,
>
> Hilmar
> --
> -----------------------------------------------------------------------
> Hilmar Lapp email: hlapp@gmx.net
> NFI Vienna, IFD/Bioinformatics phone: +43 1 86634 631
> A-1235 Vienna fax: +43 1 86634 727
> ROI: Bioinformatics (arrays, expression, seqs), Programming, Databases,
> Mountain Biking (hard tail, hard fork: feel the trail)
> -----------------------------------------------------------------------
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
>
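To make the 'wobble' question concrete, here is a minimal sketch of how
the quoted location forms could be parsed into start and end positions
that each carry a range. It is written in Python purely for illustration
and is not Bioperl code or the SeqFeatureI interface. The names FuzzyPos,
Location and parse_location are hypothetical, and order(...) (like join)
is deliberately not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FuzzyPos:
    """A position known only to lie within [lo, hi]; lo == hi means exact."""
    lo: int
    hi: int


@dataclass(frozen=True)
class Location:
    """Hypothetical container for one feature-table location."""
    start: FuzzyPos
    end: FuzzyPos
    between: bool = False            # True for the 'a^b' site form
    accession: Optional[str] = None  # set for remote 'ACC:(...)' locations


# One endpoint: either '(23.45)' (a range of candidates) or '600' (exact).
_POS = r"(?:\((\d+)\.(\d+)\)|(\d+))"
_RANGE = re.compile(rf"^{_POS}\.\.{_POS}$")
_SITE = re.compile(r"^(\d+)\^(\d+)$")


def _pos(lo, hi, exact):
    """Build a FuzzyPos from the three capture groups of one endpoint."""
    if exact is not None:
        return FuzzyPos(int(exact), int(exact))
    return FuzzyPos(int(lo), int(hi))


def _parse_local(text):
    """Return (start, end, between) or None if the form is not recognised."""
    m = _SITE.match(text)
    if m:
        span = FuzzyPos(int(m.group(1)), int(m.group(2)))
        return span, span, True
    m = _RANGE.match(text)
    if m:
        s_lo, s_hi, s_exact, e_lo, e_hi, e_exact = m.groups()
        return _pos(s_lo, s_hi, s_exact), _pos(e_lo, e_hi, e_exact), False
    return None


def parse_location(text):
    """Parse one location string; raise ValueError for unsupported forms."""
    text = text.strip()
    accession = None
    if ":" in text:
        # Remote reference such as 'J00194:(100..202)'.
        accession, text = text.split(":", 1)
    parsed = _parse_local(text)
    if parsed is None and text.startswith("(") and text.endswith(")"):
        # Retry without the parentheses that wrap a whole remote range.
        parsed = _parse_local(text[1:-1])
    if parsed is None:
        raise ValueError(f"unsupported location: {text!r}")
    start, end, between = parsed
    return Location(start, end, between, accession)
```

For example, parse_location("(23.45)..600") gives a start of
FuzzyPos(23, 45) and an exact end of FuzzyPos(600, 600), while an
unsupported string such as "order(1..5,10..20)" raises ValueError
instead of being silently dropped.
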
James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge Tel: 01223 494906
CB10 1SA Fax: 01223 494919