[Bioperl-l] Recommended way to download qual files from Genbank?

Cook, Malcolm MEC at stowers-institute.org
Fri Jan 11 19:40:14 UTC 2008


Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

> ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual >
AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.

It depends upon NCBI_PowerScripting.pm disributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
	 Retrieving 1 records from nucleotide...
... and footer:
	Received records 1 - 1.
	Wrote data to -.

(otherwise they are interspersed with downloaded qual files)

It also depends on recent version of GetOpt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu] 
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual 
> files from Genbank?
> 
> Hi Malcolm,
>     Looks like your email was (inadvertantly?) redacted in 
> some way. (No attachment and last sentence truncated.) Would 
> it be possible to get a complete version so I can be sure I'm 
> following you?
> Thanks,
> Phillip
> 
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil (attached) script yeilds what you
> > want:
> >
> > ncbi_eutil -search db=nucleotide term=AC207960 -fetch 
> rettype=qual > 
> > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm , such as is 
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics Stowers 
> Institute for 
> > Medical Research - Kansas City, Missouri
> >   
> >
> >   
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris 
> >> Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual 
> files from 
> >> Genbank?
> >>
> >> I don't think this is possible with the current setup for 
> >> Bio::DB::GenBank (which the script uses).  We'll have to 
> investigate 
> >> whether it is possible to retrieve this data via NCBI's 
> eutils; if so 
> >> we can try adding it in.  If you want you can submit this as an 
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>     
> >>> No problem getting sequence from genbank via a myriad of 
> methods.  
> >>> But as the volume of non-finished sequence in genbank 
> increases the 
> >>> importance of also obtaining quality values for a given sequence 
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't 
> >>> designed to pull down quals evidently:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual 
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/
> >>> 5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq
> >>>       
> >> /usr/local/perl_5.8/lib/site_perl/
> >>     
> >>> 5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>>
> >>>       
> >> 
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=15493746
> >> 0
> >>     
> >>>       
> >> 
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1
> >> 5
> >>     
> >>> 4937460&dopt=fasta
> >>>
> >>>
> >>>       
> >> 
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=1
> >> 5
> >>     
> >>> 4937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values? They aren't 
> >>> present in "GenBank(Full)" format. They are present in an ASN.1 
> >>> format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>       
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>     
> >
> >   
> 
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncbi_eutil
Type: application/octet-stream
Size: 1854 bytes
Desc: ncbi_eutil
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080111/aedce802/attachment-0003.obj>


More information about the Bioperl-l mailing list