[Bioperl-l] Genbank Bioperl problem
Ali Al-Shahib
alshahib@dcs.gla.ac.uk
Thu, 13 Jun 2002 16:00:28 +0100 (BST)
Hi Brian
GenPept seemed to work for me, but I had to use 'id' instead of 'acc':
my $seq = $gb->get_Seq_by_id('NP_457465.1');
I also tried RefSeq, but it didn't work.
However, I have now hit another problem. I wanted to use the batch method (my
$seq = $gb->get_Seq_by_batch($filename)), but GenPept doesn't support it.
Do you have any ideas how I can solve this? I have a lot of NP_ accessions
that I need to fetch from NCBI, and it's impossible for me to do them
without a batch.
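
One possible workaround, sketched here under assumptions and not tested
against NCBI, is to read the accessions from a file and call get_Seq_by_id
once per accession. Bio::DB::GenPept, get_Seq_by_id, species and
classification are the calls already used in this thread; the helper names
read_ids, fetch_one_by_one and main, and the file name ids.txt, are
hypothetical and only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one accession per line, skipping blank lines and '#' comments.
sub read_ids {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        next if $line eq '' or $line =~ /^#/;
        push @ids, $line;
    }
    close $fh;
    return @ids;
}

# Fetch each accession individually. $db is any object that provides
# get_Seq_by_id (assumed here to be a Bio::DB::GenPept instance).
# Returns a hash ref of accession => sequence object (undef on failure).
sub fetch_one_by_one {
    my ($db, @ids) = @_;
    my %result;
    for my $id (@ids) {
        # eval keeps one failed accession from aborting the whole run
        my $seq = eval { $db->get_Seq_by_id($id) };
        warn "Could not fetch $id: $@" if $@;
        $result{$id} = $seq;
    }
    return \%result;
}

# Print the classification for every accession that could be fetched.
sub main {
    my ($file) = @_;
    require Bio::DB::GenPept;       # loaded only when actually fetching
    my $gb   = Bio::DB::GenPept->new;
    my $seqs = fetch_one_by_one($gb, read_ids($file));
    for my $id (sort keys %$seqs) {
        my $seq = $seqs->{$id} or next;
        my $sp  = $seq->species    or next;
        print "$id\t", join('; ', $sp->classification), "\n";
    }
}

# Run only when executed directly with a file argument, e.g.:
#   perl fetch_np.pl ids.txt
main(@ARGV) if !caller && @ARGV;
```

This is slower than a true batch request because it makes one request per
accession, but it relies only on the single-record call that worked above.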
Thank you in advance.
Ali
On Thu, 13 Jun 2002, Brian Osborne wrote:
> Ali and Stefan,
>
> Accession numbers starting with NP_ are Genbank RefSeq entries (see
> http://www.ncbi.nlm.nih.gov/LocusLink/RSfaq.html). From the Bioperl FAQ:
>
> Q2.3: How can I get NT_ or NM_ accessions from NCBI (Reference
> Sequences)?
>
> Use Bio::DB::RefSeq not Bio::DB::GenBank when you are retrieving
> the NM_ accessions. This is still an area of active development
> because the data providers have not provided the best interface for
> us to query. EBI has provided a mirror with their dbfetch system,
> which is accessible through the Bio::DB::RefSeq object; however,
> there are cases where NT_ accessions will not be retrievable.
>
> Bio::DB::GenPept won't work, but a one-liner using Bio::DB::RefSeq seemed to
> work. I'll change the FAQ so that it refers to NP_'s as well.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org] On
> Behalf Of Stefan A Kirov
> Sent: Wednesday, June 12, 2002 2:59 PM
> To: Ali Al-Shahib
> Cc: Bioperl
> Subject: Re: [Bioperl-l] Genbank Bioperl help
>
> Use Bio::DB::GenPept for proteins!
> Good luck!
> Stefan
>
> On Wed, 12 Jun 2002, Ali Al-Shahib wrote:
>
> >
> >Hi
> >
> >I've got a set of accession numbers but they start with 'NP_' as they are
> >proteins. I've used the genbank module (Bio::DB::GenBank) and produced
> >the following script:
> >
> >#!/usr/local/bin/perl -w
> >
> >use Bio::DB::GenBank;
> >use Bio::Species;
> >my $gb = new Bio::DB::GenBank;
> >
> >#get a particular accession number
> >my $seq = $gb->get_Seq_by_acc('NP_347647');
> >
> >#get the species from the 'sequence' object
> >my $sp = $seq->species();
> >
> >#get the classification
> >my @class = $sp->classification();
> >
> >#print out the result, line by line
> >print join ("\n", @class), "\n";
> >
> >It works for accession numbers of nucleotide sequences, but not for
> >protein sequences. How can I change the script so that it fetches the
> >organism name from GenBank using a protein accession number that starts
> >with 'NP_' (example: NP_347647.1)? It fetches accession numbers like
> >AC021953, but not 'NP_.....'.
> >
> >I would greatly appreciate it if you can answer my query.
> >
> >Thank you in advance
> >
> >Ali
> >--
> >Mr Ali Al-Shahib
> >Research Student
> >Bioinformatics Research Centre
> >Department of Computing Science
> >17 Lilybank Gardens
> >University of Glasgow
> >Glasgow G12 8QQ
> >Scotland, UK
> >Tel: 0141 330 2421 (direct)
> >E-mail: alshahib@dcs.gla.ac.uk
> >Web page: http://www.dcs.gla.ac.uk/~alshahib
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l@bioperl.org
> >http://bioperl.org/mailman/listinfo/bioperl-l
> >