[Bioperl-l] SeqIO issue? EUtilities Cookbook

Phillip San Miguel pmiguel at purdue.edu
Fri Mar 26 15:52:17 UTC 2010


Could someone tell me what I am doing wrong? This seems simple, but I 
have not been able to get it to work.

I am trying to use the code provided at:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Retrieve_raw_data_records_from_GenBank.2C_save_raw_data_to_file.2C_then_parse_via_Bio::SeqIO

modified to request GI 228534658.

EUtilities downloads a record from GenBank, and SeqIO appears to parse 
it but does not return any sequences.

Nothing is printed when I run the following script on a Solaris box 
running Perl 5.10.0 and BioPerl 1.6.1:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::EUtilities;

my @ids;
push @ids, '228534658';
my $factory = Bio::DB::EUtilities->new(
                         -eutil => 'efetch',
                         -db => 'nucleotide',
                         -rettype => 'genbank',
                         -id => \@ids);

my $file = 'myseqs.gb';

# dump HTTP::Response content to a file (not retained in memory)
$factory->get_Response(-file => $file);

my $seqin = Bio::SeqIO->new(-file => $file,
                            -format => 'genbank');

while (my $seq = $seqin->next_seq) {
    print "I see a sequence\n";
    print $seq->species();
}


"myseqs.gb" does have content:

Seq-entry ::= seq {
  id {
    general {
      db "gpid:36555" ,
      tag
        str "contig49313" } ,
    genbank {
      accession "EZ113652" ,
      version 1 } ,
    gi 228534658 } ,
  descr {
    title "TSA: Zea mays contig49313, mRNA sequence." ,
    source {
      genome genomic ,
      org {
        taxname "Zea mays" ,
        db {
          {
            db "taxon" ,
            tag
              id 4577 } } ,
        orgname {
          name
            binomial {
              genus "Zea" ,
              species "mays" } ,
          lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta;
 Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae;
 PACCAD clade; Panicoideae; Andropogoneae; Zea" ,
          gcode 1 ,
          mgcode 1 ,
          div "PLN" } } } ,
    molinfo {
      biomol mRNA ,
      tech tsa } ,
    pub {
      pub {
        article {
          title {
            name "Deep sampling of the Palomero maize transcriptome by a high
 throughput strategy of pyrosequencing." } ,
          authors {
            names
              std {
                {
                  name
                    name {
                      last "Vega-Arreguin" ,
                      initials "J.C." } } ,
                {
                  name
                    name {
                      last "Ibarra-Laclette" ,
                      initials "E." } } ,
                {
                  name
                    name {
                      last "Jimenez-Moraila" ,
                      initials "B." } } ,
                {
                  name
                    name {
                      last "Martinez" ,
                      initials "O." } } ,
                {
                  name
                    name {
                      last "Vielle-Calzada" ,
                      initials "J.P." } } ,
                {
                  name
                    name {
                      last "Herrera-Estrella" ,
                      initials "L." } } ,
                {
                  name
                    name {
                      last "Herrera-Estrella" ,
                      initials "A." } } } } ,
          from
            journal {
              title {
                iso-jta "BMC Genomics" ,
                ml-jta "BMC Genomics" ,
                issn "1471-2164" ,
                name "BMC genomics" } ,
              imp {
                date
                  std {
                    year 2009 ,
                    month 7 ,
                    day 6 } ,
                volume "10" ,
                issue "1" ,
                pages "299" ,
                language "ENG" ,
                pubstatus aheadofprint ,
                history {
                  {
                    pubstatus received ,
                    date
                      std {
                        year 2008 ,
                        month 12 ,
                        day 2 } } ,
                  {
                    pubstatus accepted ,
                    date
                      std {
                        year 2009 ,
                        month 7 ,
                        day 6 } } ,
                  {
                    pubstatus aheadofprint ,
                    date
                      std {
                        year 2009 ,
                        month 7 ,
                        day 6 } } ,
                  {
                    pubstatus other ,
                    date
                      std {
                        year 2009 ,
                        month 7 ,
                        day 8 ,
                        hour 9 ,
                        minute 0 } } ,
                  {
                    pubstatus pubmed ,
                    date
                      std {
                        year 2009 ,
                        month 7 ,
                        day 8 ,
                        hour 9 ,
                        minute 0 } } ,
                  {
                    pubstatus medline ,
                    date
                      std {
                        year 2009 ,
                        month 7 ,
                        day 8 ,
                        hour 9 ,
                        minute 0 } } } } } ,
          ids {
            pii "1471-2164-10-299" ,
            doi "10.1186/1471-2164-10-299" ,
            pubmed 19580677 } } ,
        pmid 19580677 } } ,
    pub {
      pub {
        sub {
          authors {
            names
              std {
                {
                  name
                    name {
                      last "Vega-Arreguin" ,
                      first "Julio" ,
                      initials "J.C." } } ,
                {
                  name
                    name {
                      last "Ibarra-Laclette" ,
                      first "Enrique" ,
                      initials "E." } } ,
                {
                  name
                    name {
                      last "Jimenez-Moraila" ,
                      first "Beatriz" ,
                      initials "B." } } ,
                {
                  name
                    name {
                      last "Martinez" ,
                      first "Octavio" ,
                      initials "O." } } ,
                {
                  name
                    name {
                      last "Vielle-Calzada" ,
                      first "Jean" ,
                      initials "J.Philippe." } } ,
                {
                  name
                    name {
                      last "Herrera-Estrella" ,
                      first "Luis" ,
                      initials "L." } } ,
                {
                  name
                    name {
                      last "Herrera-Estrella" ,
                      first "Alfredo" ,
                      initials "A." } } } ,
            affil
              std {
                affil "Laboratorio Nacional de Genomica para la 
Biodiversidad" ,
                div "Cinvestav Campus Guanajuato" ,
                city "Irapuato" ,
                sub "Guanajuato" ,
                country "Mexico" ,
                street "Km 9.6 Libramiento Norte, Carretera Irapuato-Leon" ,
                postal-code "36821" } } ,
          medium other ,
          date
            std {
              year 2009 ,
              month 3 ,
              day 23 } } } } ,
    user {
      type
        str "GenomeProjectsDB" ,
      data {
        {
          label
            str "ProjectID" ,
          data
            int 36555 } ,
        {
          label
            str "ParentID" ,
          data
            int 0 } } } ,
    create-date
      std {
        year 2009 ,
        month 5 ,
        day 5 } ,
    update-date
      std {
        year 2009 ,
        month 7 ,
        day 14 } } ,
  inst {
    repr raw ,
    mol rna ,
    length 450 ,
    seq-data
      ncbi2na 
'77499DA7905DD417DCB7F1D538536238E08229108D89A87E2CDA6282DA3AD02
0524AE9C0D4154576794E0420BFA8E351A9ED347A504D3B6FE927E94E475EB17A52427227B820A
A21086117F7597EFB837ED2FB463AEF9F9E774052FD00FA0C1C803A521131212AFFB00D11CDD63
760CFF0'H } }


Maybe I am requesting the wrong format? The file looks more like ASN.1 
than GenBank flat-file format to me.
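
One way to check that suspicion before handing the file to Bio::SeqIO is to 
look at its first non-blank line. A GenBank flat file begins with a LOCUS 
line, while the dump above begins with "Seq-entry ::=", which is ASN.1 text. 
The following is only a minimal sketch: guess_record_format is a hypothetical 
helper written for illustration, not part of BioPerl, and it distinguishes 
only these two cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): guess the format of a
# downloaded record from its first non-blank line.
sub guess_record_format {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;    # skip leading blank lines
        close $in;
        # GenBank flat files start with a LOCUS header line
        return 'genbank' if $line =~ /^LOCUS\s/;
        # ASN.1 text starts with a type assignment, e.g. "Seq-entry ::= seq {"
        return 'asn1' if $line =~ /^\s*[\w-]+\s*::=/;
        return 'unknown';
    }
    close $in;
    return 'empty';                  # no non-blank lines at all
}

# Optional command-line use: perl guess_format.pl myseqs.gb
my $target = shift @ARGV;
if (defined $target) {
    print "$target looks like: ", guess_record_format($target), "\n";
}
```

Run against the myseqs.gb shown above, this should report asn1, which would 
be consistent with the 'genbank' parser finding no records to return.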

Phillip


