[Bioperl-l] Missing Sequences

Mick Watson michaelwatson@paradigm-therapeutics.co.uk
Thu, 30 May 2002 16:29:53 +0100


This is an old-ish problem when using Bioperl to fetch multiple
sequences from GenBank/EMBL.

I am using EMBL.pm (Bioperl 1.0) to fetch multiple sequences that have
been identified from a BLAST search against UniGene.  Parsing the
accession from UniGene entries is simple as I just look for the

    /gb=.....

token and I have the accessions.  Simple.
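
For anyone following along, the parsing step can be sketched roughly as
below.  This is in Python rather than Perl, purely as an illustration.
The sample header lines are made up (the real UniGene layout may
differ), and extract_accessions is a hypothetical helper name; the only
thing taken from the description above is the /gb= token.

```python
import re

# Matches the /gb=<accession> token described above; the accession is
# taken to be everything up to the next whitespace (an assumption).
GB_TOKEN = re.compile(r"/gb=(\S+)")


def extract_accessions(lines):
    """Collect accessions from /gb= tokens, first-seen order, no duplicates."""
    seen = []
    for line in lines:
        for acc in GB_TOKEN.findall(line):
            if acc not in seen:
                seen.append(acc)
    return seen


# Hypothetical header lines, for illustration only.
sample = [
    "example hit one /gb=AL117415 /len=1200",
    "example hit two /gb=NM_022139 /len=900",
    "example hit one again /gb=AL117415 /len=1200",
]

print(extract_accessions(sample))
```

Dropping duplicates here means each accession is fetched only once even
if several BLAST hits point at the same entry.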

The problem is, I guess, that these are GenBank accessions so I get the
following list:

AL117415 AJ291674 AJ291673 AJ291675 NM_022139 AF253318 NM_025220
AB055891 BI826766 BG547620

When I use EMBL.pm to fetch these, it croaks with the error that
NM_022139 and NM_025220 do not exist, and when I try to fetch them from
the EBI, it's right, they don't.  However, when I go to the NCBI, they
DO exist in GenBank (or at least the NCBI's nucleotide fetch tool says
that they do).

So my question is why is it that there are sequences in GenBank that
aren't in EMBL?  I'm guessing the NM_ prefix has some sort of
relevance....
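
One possible reading of the prefix: accessions made of two letters, an
underscore and then digits follow the format NCBI uses for its RefSeq
records, which NCBI curates itself rather than exchanging with EMBL.
The sketch below (Python, for illustration only) splits the list above
on that pattern so the two kinds could be routed to different
databases.  The regular expression and the name split_by_prefix are
assumptions, not anything taken from Bioperl.

```python
import re

# Assumption: RefSeq-style accessions look like two uppercase letters,
# an underscore, then letters/digits, with an optional ".version" suffix.
REFSEQ_STYLE = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def split_by_prefix(accessions):
    """Return (refseq_like, other), each preserving the input order."""
    refseq_like, other = [], []
    for acc in accessions:
        if REFSEQ_STYLE.match(acc):
            refseq_like.append(acc)
        else:
            other.append(acc)
    return refseq_like, other


# The accession list from the message above.
accessions = (
    "AL117415 AJ291674 AJ291673 AJ291675 NM_022139 AF253318 NM_025220 "
    "AB055891 BI826766 BG547620"
).split()

refseq_like, other = split_by_prefix(accessions)
print("RefSeq-style:", refseq_like)
print("Other:", other)
```

With a split like this, only the underscore-prefixed accessions would
need to go to the NCBI; the rest could still be fetched through EMBL.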

Also, it looks as if this will force me to use GenBank.pm to fetch the
sequences and not EMBL.pm, and I don't want to do this for various
reasons....

Thanks
Mick