[Bioperl-l] Fwd: can't get seq with bioperl

Mon Oct 7 04:08:18 UTC 2013

Warren Gallin submitted this temporary hack to fix problems with WP seqs
but accidentally sent this to me only. Resending to the list.

---------- Forwarded message ----------
From: Warren Gallin <wgallin at ualberta.ca>
Date: 2013/10/5
Subject: Re: [Bioperl-l] can't get seq with bioperl
To: Alexey Morozov <alexeymorozov1991 at gmail.com>

This is another case of the new RefSeq WP series of protein entries, which
do not have a link to the underlying nucleotide sequence.

NCBI has changed the way that highly redundant protein sequences from
bacterial genomes are stored.  Although a sequence appears when you view
the entry on the NCBI web site, that protein sequence is not returned by
the BioPerl retrieval approaches that have worked up to now.

The give-away is the line:

CONTIG      join(WP_015639704.1:1..205)

The WP designation is for these problematic sequences.
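
As a rough illustration of spotting that give-away line, here is a minimal
sketch in Python (chosen only so that it is self-contained; the same regular
expression carries over to Perl). The function name wp_contig_accession is
hypothetical, and the pattern assumes the CONTIG line has the simple
single-range join(...) form shown above.

```python
import re

# Matches a GenBank CONTIG line whose join() points at a RefSeq WP_ accession,
# e.g. "CONTIG      join(WP_015639704.1:1..205)".
# Assumption: a single accession:start..end range inside join().
WP_CONTIG = re.compile(
    r"^CONTIG\s+join\((WP_\d+\.\d+):\d+\.\.\d+\)",
    re.MULTILINE,
)


def wp_contig_accession(flatfile_text):
    """Return the WP_ accession named in the CONTIG line, or None if absent."""
    match = WP_CONTIG.search(flatfile_text)
    return match.group(1) if match else None
```

A record for which this returns an accession is a likely candidate for the
problem described here.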

The work-around that I used was to do the sequence retrieval within an eval
block and, if no sequence was forthcoming, to use the gi number to
retrieve the sequence in FASTA format and grab it that way.

Not pretty, but it will make your pipeline work.
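
The control flow of that work-around might look roughly like the sketch
below. It is written in Python with try/except standing in for Perl's eval
block, and it is not the code I actually ran. The names parse_fasta and
sequence_with_fallback are hypothetical, and fetch_record and fetch_fasta
are placeholder callables that you would supply yourself (for example,
wrappers around whatever BioPerl or E-utilities calls your pipeline
already uses).

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; keep only the first
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def sequence_with_fallback(identifier, gi, fetch_record, fetch_fasta):
    """Try the normal retrieval first; fall back to FASTA by gi number.

    fetch_record(identifier) -> sequence string (may raise or return nothing)
    fetch_fasta(gi)          -> FASTA text for that gi number
    Both callables are assumptions standing in for the real retrieval code.
    """
    try:
        sequence = fetch_record(identifier)
    except Exception:
        # Equivalent of the eval block: swallow the failure and fall through.
        sequence = None
    if sequence:
        return sequence
    # No sequence forthcoming, so fetch FASTA by gi and extract the residues.
    _header, sequence = parse_fasta(fetch_fasta(gi))
    return sequence
```

Passing the two fetchers in as arguments keeps the fallback logic separate
from the network code, so it can be checked with stub functions before
being wired into a real pipeline.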

Warren Gallin

-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.