[Bioperl-l] GI identifier missing when using Bio::Index::GenBank?

Brian Osborne osborne1 at optonline.net
Thu Apr 27 18:04:18 UTC 2006


Todd,

No, I don't think so, I think this is a bug. Can you put this into Bugzilla
along with that GenBank file, CJ521890, that shows it? Then I'll take a
closer look...
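
In the meantime, the accession/GI hash workaround you mention below could look roughly like the following. This is only a minimal sketch: build_gi_map is a hypothetical helper, not part of BioPerl, and it assumes every record in the flat file carries a VERSION line in the "ACC.N  GI:NNN" form shown in your diff.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): scan a GenBank flat file
# and build an accession => GI lookup from the VERSION lines.
sub build_gi_map {
    my ($fh) = @_;
    my %gi_for;
    while ( my $line = <$fh> ) {
        # Matches lines such as "VERSION     CJ521890.1  GI:93266243";
        # the accession is stored without its ".N" version suffix.
        if ( $line =~ /^VERSION\s+(\S+?)\.\d+\s+GI:(\d+)/ ) {
            $gi_for{$1} = $2;
        }
    }
    return \%gi_for;
}

# Usage on an in-memory line taken from the diff in this thread;
# a real run would open the plant flat file instead.
my $record = "VERSION     CJ521890.1  GI:93266243\n";
open my $fh, '<', \$record or die "Cannot open in-memory file: $!";
my $gi_for = build_gi_map($fh);
close $fh;

print "CJ521890 => $gi_for->{CJ521890}\n";
```

The GI looked up this way could then be attached to the sequence object before loading, for example via primary_id, assuming that is the field your load script reads the identifier from.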

Brian O.


On 4/27/06 1:56 PM, "Todd Richmond" <richmond.todd at gmail.com> wrote:

> I could, but I don't want to store all that information. For instance,
> in the past two weeks, 387000 plant sequences have been added to
> GenBank. I'm interested in storing complete information for the ~600
> sequences from that set that are related to the gene families I'm
> interested in.
> 
> I can certainly come up with a workaround myself by implementing a
> hash of accession/GI numbers or modifying the load script supplied by
> bioperl to accept a list of accession numbers as a filter. I was just
> wondering if I'm missing something obvious...
> 
> Todd
> 
> 
> On 4/27/06, Brian Osborne <osborne1 at optonline.net> wrote:
>> Todd,
>> 
>> Can't you go directly from the daily update to the database?
>> 
>> Brian O.
>> 
>> 
>> On 4/26/06 9:47 PM, "Todd Richmond" <richmond.todd at gmail.com> wrote:
>> 
>>> I've got an application where I grab the daily updates from NCBI, pull
>>> out just the plant sequences and store them in a separate flat file.
>>> Then I use Bio::Index::GenBank to index the plant flat file so I can
>>> pull out my sequences of interest. I'm in the midst of converting my
>>> scripts to using bioperl-db/biosql so I can push those sequences into
>>> the database. The problem is that the NCBI GI identifier isn't
>>> returned when using the index file.
>>> 
>>> When I run the following test script:
>>> ***
>>> use strict;
>>> use Bio::Index::GenBank;
>>> use Bio::SeqIO;
>>> 
>>> # Open the existing index built over the plant flat file
>>> my $Index_File_Name = 'nc0425.idx';
>>> my $inx = Bio::Index::GenBank->new('-filename' => $Index_File_Name);
>>> 
>>> # No -file or -fh is given, so the writer prints to STDOUT
>>> my $seqio = Bio::SeqIO->new( '-format' => 'genbank' );
>>> my $seq = $inx->get_Seq_by_acc('CJ521890');
>>> $seqio->write_seq($seq);
>>> ***
>>> 
>>> Diffing to the original GenBank record, the only difference is the GI
>>> identifier:
>>> 
>>> diff CJ521890_orig.out CJ521890_seqio.out
>>> 5c5
>>> < VERSION     CJ521890.1  GI:93266243
>>> ---
>>> > VERSION     CJ521890.1
>>> 
>>> Is this expected behaviour? If so, is there a workaround that will
>>> allow me to retrieve the GI from the index file so I can store it in
>>> the bioentry table?
>>> 
>>> Thanks, Todd
>>> 
>>> 
>>> --
>>> Todd Richmond
>>> richmond.todd at gmail.com
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> 
> 
> 
> --
> Todd Richmond
> richmond.todd at gmail.com
