[Bioperl-l] Missing Sequences

Mick Watson michaelwatson@paradigm-therapeutics.co.uk
Thu, 30 May 2002 16:54:44 +0100


Thanks for your help! :-)

I guess my bad assumption is this: when I look at a UniGene record and see

    /gb=NM_etc

I assume that gb stands for GenBank and that NM_etc is a GenBank accession
number, when in fact it could be a RefSeq accession number.
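
To make the distinction concrete, here is a rough sketch (in Python, purely for illustration) of separating RefSeq-style accessions from the rest by prefix before choosing which database to query. It assumes that a two-letter prefix followed by an underscore (NM_, NT_, ...) marks a RefSeq accession; the function name split_accessions is made up for this example, and the accession list is the one from my original message.

```python
import re

# Assumption: RefSeq accessions start with two capital letters and an
# underscore (NM_, NT_, ...); GenBank/EMBL accessions contain no underscore.
REFSEQ_PREFIX = re.compile(r"^[A-Z]{2}_")


def split_accessions(accessions):
    """Return (refseq, other), preserving the input order (hypothetical helper)."""
    refseq, other = [], []
    for acc in accessions:
        if REFSEQ_PREFIX.match(acc):
            refseq.append(acc)
        else:
            other.append(acc)
    return refseq, other


# Accession list taken from the /gb= tokens in the original message.
accessions = (
    "AL117415 AJ291674 AJ291673 AJ291675 NM_022139 AF253318 "
    "NM_025220 AB055891 BI826766 BG547620"
).split()

refseq, other = split_accessions(accessions)
print("RefSeq:", refseq)
print("GenBank/EMBL:", other)
```

The same split could be done in Perl before handing one list to Bio::DB::RefSeq and the other to the EMBL or GenBank module.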

But aren't RefSeq entries in some way derived from GenBank/EMBL entries?  So
why not put the GenBank accession in the /gb= tag and add a new tag, /rs=,
for the RefSeq accession?

Or maybe I am just confused....

It is also rather unfortunate that the fetch software at both the EBI and NCBI
will croak when just one of a whole list of accessions is not present in the
database.
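
One way around this, sketched below in Python for illustration only, is to request each accession separately and collect the failures instead of letting one missing entry abort the whole batch. Everything here is hypothetical: fetch_tolerant and fake_fetch are made-up names, and the dictionary stands in for a real database lookup that is assumed to raise an error when an accession is absent.

```python
def fetch_tolerant(accessions, fetch_one):
    """Fetch accessions one at a time; return (found, missing).

    fetch_one is any callable that returns a record for an accession and
    raises KeyError when the accession is not in the database (assumption).
    """
    found, missing = {}, []
    for acc in accessions:
        try:
            found[acc] = fetch_one(acc)
        except KeyError:
            # Record the failure and carry on with the rest of the list.
            missing.append(acc)
    return found, missing


# Stand-in for a real remote database (hypothetical contents).
_FAKE_DB = {"AL117415": "record-1", "AF253318": "record-2"}


def fake_fetch(acc):
    return _FAKE_DB[acc]  # raises KeyError for unknown accessions


found, missing = fetch_tolerant(["AL117415", "NM_022139", "AF253318"], fake_fetch)
print("found:", sorted(found))
print("missing:", missing)
```

The cost is one request per accession rather than one per batch, but the missing list can then be retried against a different database.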

Thanks again
Mick

Brian Osborne wrote:

> Mick,
>
> Those NM_* ids correspond to RefSeq entries. From the FAQ:
>
>   Q2.3: How can I get NT_ or NM_ accessions from NCBI (Reference
>         Sequences)?
>
>         Use Bio::DB::RefSeq not Bio::DB::GenBank when you are retrieving
>         the NM_ accessions. This is still an area of active development
>         because the data providers have not provided the best interface for
>         us to query.  EBI has provided a mirror with their dbfetch system
>         which is accessible through the Bio::DB::RefSeq object; however,
>         there are cases where NT_ accessions will not be retrievable.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
> Behalf Of Mick Watson
> Sent: Thursday, May 30, 2002 11:30 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] Missing Sequences
>
> This is an old-ish problem when using Bioperl to fetch multiple
> sequences from GenBank/EMBL.
>
> I am using EMBL.pm (Bioperl 1.0) to fetch multiple sequences that have
> been identified from a BLAST search against UniGene.  Parsing the
> accession from UniGene entries is simple as I just look for the
>
>     /gb=.....
>
> token and I have the accessions.  Simple.
>
> The problem is, I guess, that these are GenBank accessions so I get the
> following list:
>
> AL117415 AJ291674 AJ291673 AJ291675 NM_022139 AF253318 NM_025220
> AB055891 BI826766 BG547620
>
> When I use EMBL.pm to fetch these, it croaks with the error that
> NM_022139 and NM_025220 do not exist, and when I try to fetch them from
> the EBI, it's right, they don't.  However, when I go to the NCBI, they
> DO exist in GenBank (or at least the NCBI's nucleotide fetch tool says
> that they do).
>
> So my question is why is it that there are sequences in GenBank that
> aren't in EMBL?  I'm guessing the NM_ prefix has some sort of
> relevance....
>
> Also, this looks as if this will force me to use GenBank.pm to fetch the
> sequences and not EMBL.pm, and I don't want to do this for various
> reasons....
>
> Thanks
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l