[Bioperl-l] retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails

Daniel Lang daniel.lang at biologie.uni-freiburg.de
Wed Sep 6 09:11:42 UTC 2006

Hi Brian,

I'm now iterating over all uniprot_trembl sequences and recording those
for which retrieval fails. Let's see if STANDARD entries also fail...
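
For illustration only, the bookkeeping for such a run could be sketched
like this. It is in Python purely to keep it self-contained; it is not
the BioPerl code in use here. `fetch` is a hypothetical stand-in for
whatever lookup is being tested (for example a wrapper around the
registry database), assumed to return None or raise when an entry
cannot be retrieved.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def entry_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the entry name (second field) of every 'ID' line."""
    for line in lines:
        if line.startswith("ID "):
            fields = line.split()
            if len(fields) > 1:
                yield fields[1]


def failed_lookups(
    lines: Iterable[str],
    fetch: Callable[[str], Optional[object]],  # hypothetical lookup
) -> List[str]:
    """Return the entry names for which `fetch` yields nothing.

    Exceptions are treated the same as a silent None, so entries that
    fail without any warning are recorded as well.
    """
    failed = []
    for name in entry_names(lines):
        try:
            found = fetch(name)
        except Exception:
            found = None
        if found is None:
            failed.append(name)
    return failed
```

Passing an open file handle on the .dat file as `lines` keeps memory
use flat, since the file is read one line at a time.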

How is the second field of the swissprot ID line handled anyway? I ask
because PRELIMINARY entries end up as STANDARD when parsed by
Bio::SeqIO::swiss.
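
To make clear which field is meant, here is a minimal sketch that splits
an ID line of the form quoted further down in this thread. It is written
in Python only as an illustration and says nothing about how
Bio::SeqIO::swiss actually parses the line. The field names
(`entry_name`, `status`, `molecule_type`, `length`) are my own labels,
and the pattern assumes exactly the four-field layout shown in the
thread.

```python
import re
from typing import NamedTuple, Optional


class IdLine(NamedTuple):
    entry_name: str
    status: str          # second field, e.g. STANDARD or PRELIMINARY
    molecule_type: str
    length: int


# Assumed layout: "ID   <name>   <status>;   <type>;   <n> AA."
_ID_RE = re.compile(
    r"^ID\s+(\S+)\s+([^\s;]+);\s+([^\s;]+);\s+(\d+)\s+AA\.\s*$"
)


def parse_id_line(line: str) -> Optional[IdLine]:
    """Split an ID line into its fields, or return None if it does not fit."""
    match = _ID_RE.match(line)
    if match is None:
        return None
    name, status, molecule_type, length = match.groups()
    return IdLine(name, status, molecule_type, int(length))
```

The point is that the status is kept exactly as it appears in the file
rather than being replaced by a default.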

On the other hand, I'm still confused about why there's no error or
warning when the retrieval fails. Can you give me a hint as to which
modules (besides swiss.pm) to look at?


Brian Osborne wrote:
> Daniel,
> Well, if you can isolate the bug please add it to bugzilla.
> Brian O.
> On 9/5/06 5:57 AM, "Daniel Lang" <daniel.lang at biologie.uni-freiburg.de>
> wrote:
>> Hi Brian,
>> sorry for the belated response!
>> I've compiled you a set of 100 PRELIMINARY entries from the latest
>> uniprot_trembl release. I've tried to reproduce the bug using only these
>> as input to build an index, but (sadly) all of them can be retrieved
>> using the latest checkout:-(
>> Maybe it's not connected to these entries after all, but to the size or
>> some other feature of the uniprot distribution?
>> I could now make it work using the 1.5.1 release.
>> Originally I built the index using the flat protocol; when I try bdb
>> with bioperl-live, even more problems occur:
>> bp_bioflat_index.pl --dbname sw -i bdb -f swiss -l . -c uniprot_sprot.dat
>> ------------- EXCEPTION  -------------
>> MSG: The lineage 'Eukaryota, Metazoa, Chordata, Craniata, Vertebrata,
>> Euteleostomi, Amphibia, Batrachia, Anura, Mesobatrachia, Pipoidea,
>> Pipidae, Xenopodinae, Xenopus, Silurana, Xenopus, tropicalis' had two
>> non-consecutive nodes with the same name. Can't cope!
>> STACK Bio::DB::Taxonomy::list::add_lineage
>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:163
>> STACK Bio::DB::Taxonomy::list::new
>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy/list.pm:100
>> STACK Bio::DB::Taxonomy::new
>> /home/lang/bioperl/bioperl-live/Bio/DB/Taxonomy.pm:106
>> STACK Bio::Species::classification
>> /home/lang/bioperl/bioperl-live/Bio/Species.pm:171
>> STACK Bio::SeqIO::swiss::_read_swissprot_Species
>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:1049
>> STACK Bio::SeqIO::swiss::next_seq
>> /home/lang/bioperl/bioperl-live/Bio/SeqIO/swiss.pm:240
>> STACK Bio::DB::Flat::parse_one_record
>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat.pm:333
>> STACK Bio::DB::Flat::BDB::_index_file
>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:235
>> STACK Bio::DB::Flat::BDB::build_index
>> /home/lang/bioperl/bioperl-live/Bio/DB/Flat/BDB.pm:218
>> STACK toplevel
>> /share/apps/bioperl/bioperl-live/scripts_temp/bp_bioflat_index.pl:113
>> But I think this is connected to the recent changes to taxonomy
>> handling in Bio::Taxon...
>> I'm unsure whether to submit this separately, but I could also provide
>> an example of a swissprot entry that causes this error.
>> Thanks, again.
>> Daniel
>> Brian Osborne wrote:
>>> Daniel,
>>> Bug, presumably in SeqIO/swiss.pm. Can you send me a small file with such a
>>> PRELIMINARY entry?
>>> Brian O.
>>> On 9/1/06 6:11 AM, "Daniel Lang" <daniel.lang at biologie.uni-freiburg.de>
>>> wrote:
>>>> Hi,
>>>> when using Bio::Registry (bioperl-live) to fetch uniprot entries from
>>>> locally indexed uniprot *.dat files, I found that several entries
>>>> could not be retrieved even though they are present in the files!
>>>> A closer look reveals that they have the status PRELIMINARY:
>>>> uniprot_trembl.dat:ID   Q16EZ1_AEDAE   PRELIMINARY;   PRT;   222 AA.
>>>> Grepping for PRELIMINARY finds nothing anywhere in my cvs checkout.
>>>> I also can't retrieve the sequences from the online database defined as
>>>> follows:
>>>> [swissprot_ebi]
>>>> protocol=biofetch
>>>> location=http://www.ebi.ac.uk/cgi-bin/dbfetch
>>>> dbname=swall
>>>> Is this a bug or a feature? If it's a feature, how can I bypass it?
>>>> Thanks in advance,
>>>> Daniel
