[Bioperl-l] Bio::DB::EntrezGene or Bio::DB::Query::GenBank to obtain sequence metadata without sequence
Chris Fields
cjfields at illinois.edu
Sun Oct 11 22:53:18 UTC 2009
On Oct 11, 2009, at 5:03 PM, Dan Kortschak wrote:
> Hi Russell,
>
> I ended up using a hodgepodge of Bio::DB::GenBank and
> Bio::DB::EUtililties
>
> #!/usr/bin/perl
>
> use Bio::DB::EUtilities;
> use Bio::DB::GenBank;
>
> my @uids = qw(89161185 89161199 89161205 89161207 51511721 89161210
> 89161213 51511724 89161216 89161187 51511727 89161190 51511729
> 51511730 51511731 51511732 51511734 51511735 42406306 51511747
> 51511750 89161203 17981852 89161218 89161220);
>
> my $gb = Bio::DB::GenBank->new(-complexity => 1,-seq_stop=>1);
> my $seqio = $gb->get_Stream_by_gi(\@uids);
> my $summary = Bio::DB::EUtilities->new(-eutil => 'esummary',
> -db => 'nucleotide',
> -id => \@uids);
>
> print "chromosome,refSeq,name,length\n";
>
> my $index=0;
>
> while (my $seq = $seqio->next_seq() and my $ds = $summary-
> >next_DocSum) {
> warn "Database queries don't reconcile",$seq->primary_id,"-",$ds-
> >get_id,"\n" if $seq->primary_id != $ds->get_id;
> print $index++,",",$seq->id(),".",$seq->version,",";
> ($feat)=$seq->get_SeqFeatures;
> if( defined $seq->species && $feat->annotation->get_Annotations
> ('chromosome')) {
> print $seq->species->binomial;
> print " chromosome ",$feat->annotation->get_Annotations
> ('chromosome'),",";
> } elsif (defined $seq->species && $feat->annotation->get_Annotations
> ('organelle')) {
> print $seq->species->binomial;
> print " ",$feat->annotation->get_Annotations('organelle'),",";
> } else {
> $_=$seq->desc;
> /^([[:alnum:] ]+)[[:graph:]]/;
> print "$1,";
> }
> print $ds->get_contents_by_name('Length') if $ds-
> >get_contents_by_name('Length');
> print "\n";
> }
>
>
> If Bio::DB:EUtilities gives rich seq or equivalent I might change over
> from Bio::DB::GenBank, but it pretty much works at the moment (the
> fail-overs give me grief and I don't like the kludge of asking for a
> single base, but they work to get some of the details that the DocSum
> doesn't - sensible title for example).
>
> cheers
> Dan
Bio::DB::EUtilities is a generic front-end to NCBI's eutils, so you
can retrieve the gene info in whatever format via efetch and pass it
on to the appropriate parser (Bio::SeqIO::entrezgene, which requires
ASN1 at this time). It's probably best to save the ASN1 data to a
temp file prior to that (getting the raw content as a file handle
isn't set up yet).
chris
More information about the Bioperl-l
mailing list