[Bioperl-l] Genbank Bioperl problem

Stefan Kirov skirov@utk.edu
Thu, 13 Jun 2002 11:43:12 -0500


Ali,
You should pass an array (by reference), not a filename, to get_Stream_by_acc
(the method is not get_Seq_by_batch!); this is in the documentation. It does
not matter whether you are using RefSeq or GenPept. Read your file, use
push(@array, $accession) to load the accessions into an array, and then you can
query NCBI using batch retrieval. Go to
http://doc.bioperl.org/releases/bioperl-1.0.1/ and read the GenPept and RefSeq
descriptions. Also, you are retrieving sequences by versioned identifiers
(e.g. NP_457465.1). You may want to chop the version off. Maybe this is the
reason RefSeq does not work for you; here is what the documentation says:
>  # especially when using versions, you better be prepared
>  # in not getting what what want
>  eval {
>      $seq = $db->get_Seq_by_version('NM_006732.1'); # RefSeq VERSION
>  };
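As a rough sketch of that advice (this is not taken from the documentation):
try the versioned lookup inside eval and, if nothing comes back, retry with the
version chopped off. The helper names strip_version and fetch_with_fallback are
made up for illustration, and Bio::DB::RefSeq is only assumed as the database
class. The lookup runs only when identifiers are given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop a trailing ".N" version suffix,
# e.g. 'NP_457465.1' becomes 'NP_457465'.
sub strip_version {
    my ($id) = @_;
    (my $acc = $id) =~ s/\.\d+\z//;
    return $acc;
}

# Hypothetical helper: try the versioned lookup first, then fall back to
# the bare accession. $db is any object that provides get_Seq_by_version
# and get_Seq_by_acc (for example a Bio::DB::RefSeq instance).
sub fetch_with_fallback {
    my ($db, $id) = @_;
    my $seq = eval { $db->get_Seq_by_version($id) };
    return $seq if defined $seq;
    return eval { $db->get_Seq_by_acc(strip_version($id)) };
}

if (@ARGV) {
    # Loaded here so the helpers above can be used without BioPerl.
    require Bio::DB::RefSeq;
    my $db = Bio::DB::RefSeq->new;
    for my $id (@ARGV) {
        my $seq = fetch_with_fallback($db, $id);
        print $id, "\t",
            (defined $seq ? $seq->display_id : 'not retrieved'), "\n";
    }
}
```

Saved under a name of your choice, it could be run as
"perl fetch_one.pl NP_457465.1"; whether a given identifier resolves still
depends on the remote database.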
You should read the documentation! It is all in there!
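The batch approach could look roughly like this. It is a minimal sketch, not a
tested recipe: it assumes one accession per line in the input file,
read_accessions is a made-up helper name, and Bio::DB::GenPept could be swapped
for Bio::DB::RefSeq. The key point is that get_Stream_by_acc receives a
reference to the array and returns a stream you iterate with next_seq.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one accession per line, skip blank lines,
# and chop off any ".N" version suffix.
sub read_accessions {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @accessions;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next unless length $line;
        $line =~ s/\.\d+$//;        # 'NP_457465.1' becomes 'NP_457465'
        push @accessions, $line;
    }
    close $fh;
    return @accessions;
}

if (@ARGV) {
    # Loaded here so read_accessions can be used without BioPerl.
    require Bio::DB::GenPept;
    my @accessions = read_accessions($ARGV[0]);

    my $db     = Bio::DB::GenPept->new;
    my $stream = $db->get_Stream_by_acc(\@accessions);   # array REFERENCE

    # Print each accession followed by its classification, if any.
    while (my $seq = $stream->next_seq) {
        my $species = $seq->species;
        my @class   = $species ? $species->classification : ();
        print join("\t", $seq->accession_number, @class), "\n";
    }
}
```

Run it with the file of accessions as the only argument, for example
"perl fetch_batch.pl accessions.txt".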
Stefan

Ali Al-Shahib wrote:

> Hi Brian
>
> GenPept seemed to work for me, but I had to use 'id' instead of 'acc', so:
> my $seq = $gb->get_Seq_by_id('NP_457465.1');
> I also tried RefSeq, but it doesn't work.
>
> However, now I've run into another problem.  I wanted to use batch
> retrieval (my $seq = $gb->get_Seq_by_batch($filename)), but GenPept doesn't
> support this. Do you have any idea how I can solve this? I have a lot of
> NP_ entries to fetch from NCBI, and it's impossible for me to do them
> without a batch.
>
> Thank you in advance.
>
> Ali
>
> On Thu, 13 Jun 2002, Brian Osborne wrote:
>
> > Ali and Stefan,
> >
> > Accession numbers starting with NP_ are Genbank RefSeq entries (see
> > http://www.ncbi.nlm.nih.gov/LocusLink/RSfaq.html). From the Bioperl FAQ:
> >
> >   Q2.3: How can I get NT_ or NM_ accessions from NCBI (Reference
> >       Sequences)?
> >
> >       Use Bio::DB::RefSeq not Bio::DB::GenBank when you are retrieving
> >       the NM_ accessions. This is still an area of active development
> >       because the data providers have not provided the best interface for
> >       us to query.  EBI has provided a mirror with their dbfetch system
> >       which is accessible through the Bio::DB::RefSeq object however,
> >       there are cases where NT_ accessions will not be retrievable.
> >
> > Bio::DB::GenPept won't work, and a one-liner using Bio::DB::RefSeq seemed to
> > work. I'll change the FAQ so that it refers to NP_'s as well.
> >
> > Brian O.
> >
> > -----Original Message-----
> > From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]On
> > Behalf Of Stefan A Kirov
> > Sent: Wednesday, June 12, 2002 2:59 PM
> > To: Ali Al-Shahib
> > Cc: Bioperl
> > Subject: Re: [Bioperl-l] Genbank Bioperl help
> >
> > Use Bio::DB::GenPept for proteins!
> > Good luck!
> > Stefan
> >
> > On Wed, 12 Jun 2002, Ali Al-Shahib wrote:
> >
> > >
> > >Hi
> > >
> > >I've got a set of accession numbers, but they start with 'NP_' as they are
> > >proteins.  I've used the GenBank module (Bio::DB::GenBank) and produced
> > >the following script:
> > >
> > >#!/usr/local/bin/perl -w
> > >
> > >use Bio::DB::GenBank;
> > >use Bio::Species;
> > >my $gb = new Bio::DB::GenBank;
> > >
> > >#get a particular accession number
> > >my $seq = $gb->get_Seq_by_acc('NP_347647');
> > >
> > >#get the species from the 'sequence' object
> > >my $sp = $seq->species();
> > >
> > >#get the classification
> > >my @class = $sp->classification();
> > >
> > >#print out the result, line by line
> > >print join ("\n", @class), "\n";
> > >
> > >However, it works for accession numbers of nucleotide sequences but not of
> > >protein sequences.  How can I change the script so that it fetches the
> > >organism name from GenBank using a protein accession number that starts
> > >with 'NP_' (example: NP_347647.1)?  It fetches accession numbers like
> > >AC021953, but not 'NP_.....'.
> > >
> > >I would greatly appreciate it if you can answer my query.
> > >
> > >Thank you in advance
> > >
> > >Ali
> > >--
> > >Mr Ali Al-Shahib
> > >Research Student
> > >Bioinformatics Research Centre
> > >Department of Computing Science
> > >17 Lilybank Gardens
> > >University of Glasgow
> > >Glasgow G12 8QQ
> > >Scotland, UK
> > >Tel: 0141 330 2421 (direct)
> > >E-mail: alshahib@dcs.gla.ac.uk
> > >Web page: http://www.dcs.gla.ac.uk/~alshahib
> > >
> > >_______________________________________________
> > >Bioperl-l mailing list
> > >Bioperl-l@bioperl.org
> > >http://bioperl.org/mailman/listinfo/bioperl-l
> > >

--
Please Note: New area code 865, effective NOW!
Stefan A. Kirov, Ph.D.
Dept Biochemistry and Cellular and Molecular Biology
F233 Walters Life Sciences Building
1414 Cumberland Avenue
University of Tennessee
Knoxville, TN  37996-0840
Tel: 865-974-6710