[Bioperl-l] SeqIO issue? EUtilities Cookbook
Phillip San Miguel
pmiguel at purdue.edu
Fri Mar 26 17:28:09 UTC 2010
Ah, yes. That does the trick. Actually, I have already downloaded a few
thousand records in whatever format is returned when 'genbank'
is specified instead of 'gb'. (See below; it begins with 'Seq-entry ::=
seq {'.) Any idea what format that is and how to convert it to something
SeqIO can use?
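
For sorting out which of the already-downloaded files are affected, here is a minimal sketch that looks at the first non-blank line of a record. It assumes that the ASN.1 text dumps open with 'Seq-entry ::=' (as in the file quoted below) and that GenBank flat files open with a LOCUS line. The function name guess_format and the labels it returns are made up for illustration, and the LOCUS line in the example is an abbreviated stand-in, not real output.

```perl
use strict;
use warnings;

# Guess the record format from the first non-blank line of the text.
# Returns one of the hypothetical labels:
#   'asn1-text', 'genbank', 'unknown', 'empty'.
sub guess_format {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;                      # skip blank lines
        return 'asn1-text' if $line =~ /^\s*Seq-entry\s*::=/;
        return 'genbank'   if $line =~ /^LOCUS\s/;
        return 'unknown';                               # first real line matched neither
    }
    return 'empty';
}

print guess_format("Seq-entry ::= seq {\n  id {\n"), "\n";
print guess_format("LOCUS       EZ113652   450 bp   mRNA\n"), "\n";
```

Only files reported as 'genbank' would be worth handing to Bio::SeqIO with -format => 'genbank'.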
If not, I can just pull them all down again by sending about 200 gi's
per request. That should not offend the genbank gods...
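
A rough sketch of that re-download plan, under some assumptions: the IDs are split into batches of at most 200, and each batch gets its own efetch request with the rettype set to 'gb' and an email address supplied, as suggested in the reply quoted below. The helper names (batch_ids, efetch_args, fetch_batches), the output file names, and the one-second pause are all hypothetical. fetch_batches is defined but not called here, since it needs BioPerl and network access.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size entries each.
sub batch_ids {
    my ($size, @ids) = @_;
    die "batch size must be a positive integer\n" unless $size =~ /^[1-9]\d*$/;
    my @batches;
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

# Build the argument list for one efetch request.
# 'gb' is used because 'genbank' returned ASN.1 text in this thread.
sub efetch_args {
    my ($email, $batch) = @_;
    return (
        -eutil   => 'efetch',
        -db      => 'nucleotide',
        -rettype => 'gb',
        -email   => $email,
        -id      => $batch,
    );
}

# Fetch every batch into its own file. Not called in this sketch.
sub fetch_batches {
    my ($email, @batches) = @_;
    require Bio::DB::EUtilities;       # loaded only when a fetch is requested
    my $n = 0;
    for my $batch (@batches) {
        my $file    = sprintf 'myseqs_%04d.gb', ++$n;   # hypothetical file name
        my $factory = Bio::DB::EUtilities->new(efetch_args($email, $batch));
        $factory->get_Response(-file => $file);
        sleep 1;                       # assumed courtesy delay between requests
    }
    return $n;
}

my @gis     = (1 .. 450);              # placeholder IDs, not real gi numbers
my @batches = batch_ids(200, @gis);
print join(', ', map { scalar @$_ } @batches), "\n";   # prints: 200, 200, 50
```

To run the downloads, one would call something like fetch_batches('you@example.org', @batches) with a real address and real gi numbers.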
Thanks for your help,
Phillip
Chris Fields wrote:
> Change the rettype from 'genbank' to 'gb' or 'gbwithparts' (use the latter if you always want a full nucleotide sequence instead of possibly getting contig files). 'genbank' used to be an alias for 'gb', but apparently it no longer is; this appears to have changed on NCBI's end.
>
> Also, note that the email is now required (you'll get a warning about this with code from SVN). I'll update the wiki to reflect both.
>
> chris
>
> On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote:
>
>
>> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work.
>>
>> I am trying to use the code provided at:
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO
>>
>> and modified to request gi228534658
>>
>> The EUtilities downloads a record from genbank and SeqIO seems as if it is parsing it, but also seems not to return anything.
>>
>> Nothing is printed when I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1:
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>>
>> use Bio::SeqIO;
>> use Bio::DB::EUtilities;
>>
>> my @ids;
>> push @ids, '228534658';
>> my $factory = Bio::DB::EUtilities->new(
>>     -eutil   => 'efetch',
>>     -db      => 'nucleotide',
>>     -rettype => 'genbank',
>>     -id      => \@ids);
>>
>> my $file = 'myseqs.gb';
>>
>> # dump HTTP::Response content to a file (not retained in memory)
>> $factory->get_Response(-file => $file);
>>
>> my $seqin = Bio::SeqIO->new(
>>     -file   => $file,
>>     -format => 'genbank');
>>
>> while (my $seq = $seqin->next_seq) {
>>     print "I see a sequence\n";
>>     print $seq->species();
>> }
>>
>>
>> "myseqs.gb" does have content:
>>
>> Seq-entry ::= seq {
>> id {
>> general {
>> db "gpid:36555" ,
>> tag
>> str "contig49313" } ,
>> genbank {
>> accession "EZ113652" ,
>> version 1 } ,
>> gi 228534658 } ,
>> descr {
>> title "TSA: Zea mays contig49313, mRNA sequence." ,
>> source {
>> genome genomic ,
>> org {
>> taxname "Zea mays" ,
>> db {
>> {
>> db "taxon" ,
>> tag
>> id 4577 } } ,
>> orgname {
>> name
>> binomial {
>> genus "Zea" ,
>> species "mays" } ,
>> lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta;
>> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae;
>> PACCAD clade; Panicoideae; Andropogoneae; Zea" ,
>> gcode 1 ,
>> mgcode 1 ,
>> div "PLN" } } } ,
>> molinfo {
>> biomol mRNA ,
>> tech tsa } ,
>> pub {
>> pub {
>> article {
>> title {
>> name "Deep sampling of the Palomero maize transcriptome by a high
>> throughput strategy of pyrosequencing." } ,
>> authors {
>> names
>> std {
>> {
>> name
>> name {
>> last "Vega-Arreguin" ,
>> initials "J.C." } } ,
>> {
>> name
>> name {
>> last "Ibarra-Laclette" ,
>> initials "E." } } ,
>> {
>> name
>> name {
>> last "Jimenez-Moraila" ,
>> initials "B." } } ,
>> {
>> name
>> name {
>> last "Martinez" ,
>> initials "O." } } ,
>> {
>> name
>> name {
>> last "Vielle-Calzada" ,
>> initials "J.P." } } ,
>> {
>> name
>> name {
>> last "Herrera-Estrella" ,
>> initials "L." } } ,
>> {
>> name
>> name {
>> last "Herrera-Estrella" ,
>> initials "A." } } } } ,
>> from
>> journal {
>> title {
>> iso-jta "BMC Genomics" ,
>> ml-jta "BMC Genomics" ,
>> issn "1471-2164" ,
>> name "BMC genomics" } ,
>> imp {
>> date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 6 } ,
>> volume "10" ,
>> issue "1" ,
>> pages "299" ,
>> language "ENG" ,
>> pubstatus aheadofprint ,
>> history {
>> {
>> pubstatus received ,
>> date
>> std {
>> year 2008 ,
>> month 12 ,
>> day 2 } } ,
>> {
>> pubstatus accepted ,
>> date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 6 } } ,
>> {
>> pubstatus aheadofprint ,
>> date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 6 } } ,
>> {
>> pubstatus other ,
>> date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 8 ,
>> hour 9 ,
>> minute 0 } } ,
>> {
>> pubstatus pubmed ,
>> date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 8 ,
>> hour 9 ,
>> minute 0 } } ,
>> {
>> pubstatus medline ,
>> date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 8 ,
>> hour 9 ,
>> minute 0 } } } } } ,
>> ids {
>> pii "1471-2164-10-299" ,
>> doi "10.1186/1471-2164-10-299" ,
>> pubmed 19580677 } } ,
>> pmid 19580677 } } ,
>> pub {
>> pub {
>> sub {
>> authors {
>> names
>> std {
>> {
>> name
>> name {
>> last "Vega-Arreguin" ,
>> first "Julio" ,
>> initials "J.C." } } ,
>> {
>> name
>> name {
>> last "Ibarra-Laclette" ,
>> first "Enrique" ,
>> initials "E." } } ,
>> {
>> name
>> name {
>> last "Jimenez-Moraila" ,
>> first "Beatriz" ,
>> initials "B." } } ,
>> {
>> name
>> name {
>> last "Martinez" ,
>> first "Octavio" ,
>> initials "O." } } ,
>> {
>> name
>> name {
>> last "Vielle-Calzada" ,
>> first "Jean" ,
>> initials "J.Philippe." } } ,
>> {
>> name
>> name {
>> last "Herrera-Estrella" ,
>> first "Luis" ,
>> initials "L." } } ,
>> {
>> name
>> name {
>> last "Herrera-Estrella" ,
>> first "Alfredo" ,
>> initials "A." } } } ,
>> affil
>> std {
>> affil "Laboratorio Nacional de Genomica para la Biodiversidad" ,
>> div "Cinvestav Campus Guanajuato" ,
>> city "Irapuato" ,
>> sub "Guanajuato" ,
>> country "Mexico" ,
>> street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" ,
>> postal-code "36821" } } ,
>> medium other ,
>> date
>> std {
>> year 2009 ,
>> month 3 ,
>> day 23 } } } } ,
>> user {
>> type
>> str "GenomeProjectsDB" ,
>> data {
>> {
>> label
>> str "ProjectID" ,
>> data
>> int 36555 } ,
>> {
>> label
>> str "ParentID" ,
>> data
>> int 0 } } } ,
>> create-date
>> std {
>> year 2009 ,
>> month 5 ,
>> day 5 } ,
>> update-date
>> std {
>> year 2009 ,
>> month 7 ,
>> day 14 } } ,
>> inst {
>> repr raw ,
>> mol rna ,
>> length 450 ,
>> seq-data
>> ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02
>> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A
>> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63
>> 760CFF0'H } }
>>
>>
>> Maybe I am using the wrong format? This looks more like ASN than genbank format to me.
>>
>> Phillip
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>