[Bioperl-l] Missing Sequences
Mick Watson
michaelwatson@paradigm-therapeutics.co.uk
Thu, 30 May 2002 16:54:44 +0100
Thanks for your help! :-)
I guess I was making a bad assumption. When I look at a UniGene record and see:
/gb=NM_etc
I assume that gb stands for GenBank and that NM_etc is a GenBank accession
number, when in fact it could be a RefSeq accession number.
But aren't RefSeq entries in some way derived from GenBank/EMBL entries? So
why not put the GenBank accession in the /gb= tag and add a new tag, /rs=,
for the RefSeq accession?
Or maybe I am just confused....
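
For anyone hitting the same thing, here is a rough sketch of how a list of /gb= values could be sorted into the two kinds of accession before fetching. It is written in Python only to keep it self-contained, and the same pattern match is a one-liner in Perl. It assumes that a two-letter prefix followed by an underscore (NM_, NT_ and so on) marks a RefSeq accession and that GenBank/EMBL accessions never contain an underscore. The function name split_by_source is made up for illustration and is not part of Bioperl.

```python
import re

# Assumption: RefSeq accessions start with two capital letters and an
# underscore (e.g. NM_022139); GenBank/EMBL accessions have no underscore.
REFSEQ_RE = re.compile(r"^[A-Z]{2}_")


def split_by_source(accessions):
    """Split accessions into (genbank_embl, refseq) lists by prefix alone."""
    genbank_embl, refseq = [], []
    for acc in accessions:
        if REFSEQ_RE.match(acc):
            refseq.append(acc)
        else:
            genbank_embl.append(acc)
    return genbank_embl, refseq


# The accession list from the original post.
accs = ("AL117415 AJ291674 AJ291673 AJ291675 NM_022139 AF253318 "
        "NM_025220 AB055891 BI826766 BG547620").split()
genbank_embl, refseq = split_by_source(accs)
print(refseq)  # ['NM_022139', 'NM_025220']
```

The first list could then go to EMBL.pm as before and the second to Bio::DB::RefSeq, as the FAQ suggests.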
It is also rather unfortunate that the fetch software at both the EBI and NCBI
will croak when just one of a whole list of accessions is not present in the
database.
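
One way around the all-or-nothing behaviour is to ask for the accessions one at a time and collect the failures rather than stopping at the first one. Below is a minimal sketch of that idea, again in Python and again purely illustrative. fetch_one is a hypothetical callable standing in for whatever single-accession fetch is actually used, and the in-memory dictionary is a stand-in for the remote database, not real data. The cost is one request per accession instead of one batch request.

```python
def fetch_each(accessions, fetch_one):
    """Fetch accessions one by one, collecting failures instead of aborting.

    fetch_one is a hypothetical callable: it returns a record, or raises
    (or returns None) when the accession cannot be retrieved.
    """
    found, missing = {}, []
    for acc in accessions:
        try:
            record = fetch_one(acc)
        except Exception:
            # Treat any failure for this accession as "not retrievable".
            record = None
        if record is None:
            missing.append(acc)
        else:
            found[acc] = record
    return found, missing


# Stand-in for a remote database: only one accession is "present".
fake_db = {"AL117415": "placeholder record"}
found, missing = fetch_each(["AL117415", "NM_022139"], lambda acc: fake_db[acc])
print(sorted(found), missing)
```

The missing list can then be retried against a different database instead of killing the whole run.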
Thanks again
Mick
Brian Osborne wrote:
> Mick,
>
> Those NM_* ids correspond to RefSeq entries. From the FAQ:
>
> Q2.3: How can I get NT_ or NM_ accessions from NCBI (Reference
> Sequences)?
>
> Use Bio::DB::RefSeq not Bio::DB::GenBank when you are retrieving
> the NM_ accessions. This is still an area of active development
> because the data providers have not provided the best interface for
> us to query. EBI has provided a mirror with their dbfetch system,
> which is accessible through the Bio::DB::RefSeq object; however,
> there are cases where NT_ accessions will not be retrievable.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
> Behalf Of Mick Watson
> Sent: Thursday, May 30, 2002 11:30 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] Missing Sequences
>
> This is an old-ish problem when using Bioperl to fetch multiple
> sequences from GenBank/EMBL.
>
> I am using EMBL.pm (Bioperl 1.0) to fetch multiple sequences that have
> been identified from a BLAST search against UniGene. Parsing the
> accession from UniGene entries is simple, as I just look for the
>
> /gb=.....
>
> token and I have the accessions. Simple.
>
> The problem is, I guess, that these are GenBank accessions so I get the
> following list:
>
> AL117415 AJ291674 AJ291673 AJ291675 NM_022139 AF253318 NM_025220
> AB055891 BI826766 BG547620
>
> When I use EMBL.pm to fetch these, it croaks with the error that
> NM_022139 and NM_025220 do not exist, and when I try to fetch them from
> the EBI, it's right, they don't. However, when I go to the NCBI, they
> DO exist in GenBank (or at least the NCBI's nucleotide fetch tool says
> that they do).
>
> So my question is why is it that there are sequences in GenBank that
> aren't in EMBL? I'm guessing the NM_ prefix has some sort of
> relevance....
>
> Also, this looks as if this will force me to use GenBank.pm to fetch the
> sequences and not EMBL.pm, and I don't want to do this for various
> reasons....
>
> Thanks
> Mick