[Bioperl-l] How sequence fetching should fail?
Heikki Lehvaslaiho
heikki at nildram.co.uk
Sat Apr 10 09:07:22 EDT 2004
When I started to make the changes, the problem turned out to be slightly
more complicated than I anticipated.
Firstly, I refreshed my memory of the BioFetch spec:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/biofetch/biofetch.txt?rev=1.2&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup
According to the spec, it should be returning an EXCEPTION. The implementation
catches that and prints it out as a warning that I could not turn off. I am
not worried by this discrepancy because it fits well with the behaviour of
the other modules.
The WARNING from GenBank turned out to be due to the parser
(Bio::SeqIO::genbank, line 239). next_seq() returns undef if the first line
does not start with 'LOCUS'. In the EMBL and SWISS-PROT parsers, this
modification was missing and the parser threw an error on a malformed ID line.
My understanding is that the modification helps quick streaming of entries
from the NCBI server. Since we get EMBL and SWISS-PROT entries from the EBI
BioFetch server, the first response line of a failed retrieval starts with
"ERROR", and this gets passed to the parser, which throws an error. I've now
modified the parsers and the situation looks like this:
Bio::DB::BioFetch WARNING
Bio::DB::GenBank WARNING
Bio::DB::GenPept WARNING
Bio::DB::SwissProt WARNING
Bio::DB::RefSeq WARNING
Bio::DB::EMBL WARNING
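
For illustration, the parser-side behaviour described above (warn and return
undef when the first line is not a proper ID line) can be sketched roughly as
follows. This is only a minimal, self-contained sketch and not the actual
Bio::SeqIO code: the function name next_entry_or_undef, the sample entry and
the wording of the "ERROR" line are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: read one EMBL-style entry from a
# filehandle. If the first line is not an ID line (for example an "ERROR ..."
# line sent by a fetch server), warn and return undef instead of dying.
sub next_entry_or_undef {
    my ($fh) = @_;
    my $first = <$fh>;
    return undef unless defined $first;    # end of stream

    my ($id) = $first =~ /^ID\s+([^\s;]+)/;
    unless (defined $id) {
        chomp $first;
        warn "No ID line found, got: '$first'\n";
        return undef;
    }

    my %entry = (id => $id, lines => [$first]);
    while (my $line = <$fh>) {
        push @{ $entry{lines} }, $line;
        last if $line =~ m{^//};           # end-of-entry marker
    }
    return \%entry;
}

# In-memory filehandles stand in for the server responses (made-up text).
my $ok_text  = "ID   X12345; SV 1\nSQ   Sequence 4 BP;\n     acgt\n//\n";
my $bad_text = "ERROR entry not found (illustrative text)\n";
open my $ok,  '<', \$ok_text  or die "cannot open in-memory handle: $!";
open my $bad, '<', \$bad_text or die "cannot open in-memory handle: $!";

my $entry  = next_entry_or_undef($ok);
my $failed = next_entry_or_undef($bad);

print "fetched ", $entry->{id}, "\n" if $entry;
print "skipped failed retrieval\n" unless $failed;
```

The caller only has to test whether it got a defined value back, which is the
same check the user code in the quoted message below relies on.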
-Heikki
On Monday 05 Apr 2004 11:55, Heikki Lehvaslaiho wrote:
> Last week Web Barris asked more questions about sequence retrieval.
> I had a look at how different modules behave when the retrieval fails due
> to a nonexistent id. The response can be summarised as follows:
>
> Bio::DB::BioFetch WARNING
> Bio::DB::GenBank WARNING
> Bio::DB::GenPept WARNING
> Bio::DB::SwissProt EXCEPTION
> Bio::DB::RefSeq WARNING
> Bio::DB::EMBL EXCEPTION
>
> I suggest that we treat this situation as an error that needs to be fixed
> in both development cvs head and in the 1.4 branch. All modules should
> print a warning (rather than die on an error) and return undef when
> retrieval fails. It is then up to the user to test whether the sequence
> variable got assigned. This is the functionality defined in the OBDA (Open
> Bioinformatics Database Access) specs and implemented in Bio::DB::BioFetch.
>
>
> The user code will always look something like this:
>
> # Bio::DB::SeqRetrievalClass stands for any of the Bio::DB::* classes above
> my $db = Bio::DB::SeqRetrievalClass->new;
> for my $id (@ids) {
>     my $seq = $db->get_Seq_by_id($id);
>     if ($seq) {
>         # do what you wanted
>     } else {
>         # skip and keep log
>     }
> }
>
> Unless I hear any strong differing opinions within a day or two, I'll
> commit the necessary changes. The critical question here is: will this
> break any existing code?
>
>
> -Heikki
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________