[Bioperl-l] SeqIO issue? EUtilities Cookbook

Fri Mar 26 16:16:22 UTC 2010

Change the rettype from 'genbank' to 'gb' or 'gbwithparts'. Use the latter if you always want the full nucleotide sequence rather than possibly getting a contig record. 'genbank' used to be an alias for 'gb' but apparently no longer is. This looks like a change on NCBI's end, and efetch now hands back ASN.1 instead, which is what ended up in your file.

Also note that an email address is now required (the current code in SVN will warn you if you don't supply one). I'll update the wiki to reflect both changes.
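If you want a script to notice this kind of silent format change instead of just printing nothing, one option is to check the first line of the downloaded file before handing it to Bio::SeqIO. Below is a minimal sketch. sniff_format is a hypothetical helper, not a BioPerl method, and it only looks at the first non-blank line: a GenBank flat file starts with LOCUS, while the file you got starts with "Seq-entry ::=" (ASN.1 text).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): report what kind of record
# efetch actually wrote, judging only by the first non-blank line.
sub sniff_format {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $first;
    while (my $line = <$fh>) {
        if ($line =~ /\S/) { $first = $line; last; }
    }
    close $fh;
    return 'empty'   unless defined $first;
    return 'genbank' if $first =~ /^LOCUS\s/;          # GenBank flat file
    return 'asn1'    if $first =~ /^Seq-entry\s*::=/;  # ASN.1 text, as in myseqs.gb
    return 'fasta'   if $first =~ /^>/;
    return 'unknown';
}

# Usage: perl sniff.pl myseqs.gb
if (@ARGV) {
    my $file   = $ARGV[0];
    my $format = sniff_format($file);
    print "$file looks like: $format\n";
    warn "Not a GenBank flat file; check the -rettype passed to efetch\n"
        unless $format eq 'genbank';
}
```

Run against the myseqs.gb from the original script, this should report asn1; after switching to -rettype => 'gb' it should report genbank.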

chris

On Mar 26, 2010, at 10:52 AM, Phillip San Miguel wrote:

> Could someone tell me what I am doing wrong? This seems simple, but I have not been able to get it to work.
> 
> I am trying to use the code provided at:
> 
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO
> 
> and modified to request gi228534658
> 
> Bio::DB::EUtilities downloads a record from GenBank, and SeqIO seems to parse it, but it doesn't seem to return anything.
> 
> Nothing is printed when I run the following script on a Solaris box running perl 5.10.0 and bioperl 1.6.1:
> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> use Bio::SeqIO;
> use Bio::DB::EUtilities;
> 
> my @ids;
> push @ids, '228534658';
> my $factory = Bio::DB::EUtilities->new(
>                       -eutil => 'efetch',
>                       -db => 'nucleotide',
>                       -rettype => 'genbank',
>                       -id => \@ids);
> 
> my $file = 'myseqs.gb';
> 
> # dump HTTP::Response content to a file (not retained in memory)
> $factory->get_Response(-file => $file);
> 
> my $seqin = Bio::SeqIO->new(-file => $file,
>                          -format => 'genbank');
> 
> while (my $seq = $seqin->next_seq) {
>  print "I see a sequence\n";
>  print $seq->species();
> }
> 
> 
> "myseqs.gb" does have content:
> 
> Seq-entry ::= seq {
> id {
>  general {
>    db "gpid:36555" ,
>    tag
>      str "contig49313" } ,
>  genbank {
>    accession "EZ113652" ,
>    version 1 } ,
>  gi 228534658 } ,
> descr {
>  title "TSA: Zea mays contig49313, mRNA sequence." ,
>  source {
>    genome genomic ,
>    org {
>      taxname "Zea mays" ,
>      db {
>        {
>          db "taxon" ,
>          tag
>            id 4577 } } ,
>      orgname {
>        name
>          binomial {
>            genus "Zea" ,
>            species "mays" } ,
>        lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta;
> Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae;
> PACCAD clade; Panicoideae; Andropogoneae; Zea" ,
>        gcode 1 ,
>        mgcode 1 ,
>        div "PLN" } } } ,
>  molinfo {
>    biomol mRNA ,
>    tech tsa } ,
>  pub {
>    pub {
>      article {
>        title {
>          name "Deep sampling of the Palomero maize transcriptome by a high
> throughput strategy of pyrosequencing." } ,
>        authors {
>          names
>            std {
>              {
>                name
>                  name {
>                    last "Vega-Arreguin" ,
>                    initials "J.C." } } ,
>              {
>                name
>                  name {
>                    last "Ibarra-Laclette" ,
>                    initials "E." } } ,
>              {
>                name
>                  name {
>                    last "Jimenez-Moraila" ,
>                    initials "B." } } ,
>              {
>                name
>                  name {
>                    last "Martinez" ,
>                    initials "O." } } ,
>              {
>                name
>                  name {
>                    last "Vielle-Calzada" ,
>                    initials "J.P." } } ,
>              {
>                name
>                  name {
>                    last "Herrera-Estrella" ,
>                    initials "L." } } ,
>              {
>                name
>                  name {
>                    last "Herrera-Estrella" ,
>                    initials "A." } } } } ,
>        from
>          journal {
>            title {
>              iso-jta "BMC Genomics" ,
>              ml-jta "BMC Genomics" ,
>              issn "1471-2164" ,
>              name "BMC genomics" } ,
>            imp {
>              date
>                std {
>                  year 2009 ,
>                  month 7 ,
>                  day 6 } ,
>              volume "10" ,
>              issue "1" ,
>              pages "299" ,
>              language "ENG" ,
>              pubstatus aheadofprint ,
>              history {
>                {
>                  pubstatus received ,
>                  date
>                    std {
>                      year 2008 ,
>                      month 12 ,
>                      day 2 } } ,
>                {
>                  pubstatus accepted ,
>                  date
>                    std {
>                      year 2009 ,
>                      month 7 ,
>                      day 6 } } ,
>                {
>                  pubstatus aheadofprint ,
>                  date
>                    std {
>                      year 2009 ,
>                      month 7 ,
>                      day 6 } } ,
>                {
>                  pubstatus other ,
>                  date
>                    std {
>                      year 2009 ,
>                      month 7 ,
>                      day 8 ,
>                      hour 9 ,
>                      minute 0 } } ,
>                {
>                  pubstatus pubmed ,
>                  date
>                    std {
>                      year 2009 ,
>                      month 7 ,
>                      day 8 ,
>                      hour 9 ,
>                      minute 0 } } ,
>                {
>                  pubstatus medline ,
>                  date
>                    std {
>                      year 2009 ,
>                      month 7 ,
>                      day 8 ,
>                      hour 9 ,
>                      minute 0 } } } } } ,
>        ids {
>          pii "1471-2164-10-299" ,
>          doi "10.1186/1471-2164-10-299" ,
>          pubmed 19580677 } } ,
>      pmid 19580677 } } ,
>  pub {
>    pub {
>      sub {
>        authors {
>          names
>            std {
>              {
>                name
>                  name {
>                    last "Vega-Arreguin" ,
>                    first "Julio" ,
>                    initials "J.C." } } ,
>              {
>                name
>                  name {
>                    last "Ibarra-Laclette" ,
>                    first "Enrique" ,
>                    initials "E." } } ,
>              {
>                name
>                  name {
>                    last "Jimenez-Moraila" ,
>                    first "Beatriz" ,
>                    initials "B." } } ,
>              {
>                name
>                  name {
>                    last "Martinez" ,
>                    first "Octavio" ,
>                    initials "O." } } ,
>              {
>                name
>                  name {
>                    last "Vielle-Calzada" ,
>                    first "Jean" ,
>                    initials "J.Philippe." } } ,
>              {
>                name
>                  name {
>                    last "Herrera-Estrella" ,
>                    first "Luis" ,
>                    initials "L." } } ,
>              {
>                name
>                  name {
>                    last "Herrera-Estrella" ,
>                    first "Alfredo" ,
>                    initials "A." } } } ,
>          affil
>            std {
>              affil "Laboratorio Nacional de Genomica para la Biodiversidad" ,
>              div "Cinvestav Campus Guanajuato" ,
>              city "Irapuato" ,
>              sub "Guanajuato" ,
>              country "Mexico" ,
>              street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" ,
>              postal-code "36821" } } ,
>        medium other ,
>        date
>          std {
>            year 2009 ,
>            month 3 ,
>            day 23 } } } } ,
>  user {
>    type
>      str "GenomeProjectsDB" ,
>    data {
>      {
>        label
>          str "ProjectID" ,
>        data
>          int 36555 } ,
>      {
>        label
>          str "ParentID" ,
>        data
>          int 0 } } } ,
>  create-date
>    std {
>      year 2009 ,
>      month 5 ,
>      day 5 } ,
>  update-date
>    std {
>      year 2009 ,
>      month 7 ,
>      day 14 } } ,
> inst {
>  repr raw ,
>  mol rna ,
>  length 450 ,
>  seq-data
>    ncbi2na '77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02
> 0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A
> A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63
> 760CFF0'H } }
> 
> 
> Maybe I am using the wrong format? This looks more like ASN than genbank format to me.
> 
> Phillip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l