[Bioperl-l] acquiring a local refseq + index
Chris Fields
cjfields at uiuc.edu
Sun Dec 31 19:36:47 UTC 2006
As a followup, I have committed the fix Erik had in Bugzilla. I
don't know if this helps with the below issue Erik describes (they
sound unrelated).
chris
On Dec 30, 2006, at 8:33 PM, Chris Fields wrote:
> Agree with Hilmar, in that we need examples. If you are referring to
> your submitted bug:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2167
>
> we could add this in as long as it passes (I'll try giving it a
> workout with my local bacterial seqs tonight or tomorrow). However,
> in the not-too-distant future your patch would likely be rendered
> obsolete, as any parsing in Bio::SeqIO modules pertaining to
> Bio::Species-related matters will be deprecated in favor of simple
> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
> optional db lookups using NCBI Taxonomy). Bio::Species and anything
> related to it are considered marked for deprecation. Fair warning...
>
> chris
>
> On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:
>
>> Can you send examples and the resulting error messages? Also, I'm
>> assuming you are running the 1.5.2 release of Bioperl; if not, that's what
>> I would try first.
>>
>> -hilmar
>>
>> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>>
>>> Hi all,
>>>
>>> I downloaded the refseq files (.gbff) and want to index the lot with
>>> Bio::DB::Flat.
>>>
>>> It turns out that there are many cases where the SOURCE and ORGANISM
>>> lines are messed up, sometimes to a degree where the indexing fails on
>>> a Bio::SeqIO::genbank error.
>>>
>>> I'd like to change Bio::SeqIO::genbank so that parsing gets at least
>>> far enough to make indexing the refseq files possible, and hopefully
>>> also improves the taxonomic output ($seq->species->binomial is often
>>> mutilated at the moment).
>>>
>>> Is it still worthwhile to change parsing modules like
>>> Bio::SeqIO::genbank? Is anyone already working on a rewrite? If so,
>>> I may be better off writing my own indexing scheme.
>>>
>>> Below is (an outline of) my indexing program, which uses
>>> Bio::DB::Flat::BDB. If anyone knows of a better way to get a locally
>>> searchable refseq flat file index, I would be very interested.
>>>
>>> Thanks for your help,
>>>
>>> Erikjan
>>>
>>>
>>> -------------
>>> use Bio::DB::Flat;
>>>
>>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>>> my $db=Bio::DB::Flat->new(
>>> -directory => $refseq_dir,
>>> -dbname => 'refseq',
>>> -format => 'genbank',
>>> -index => 'bdb',
>>> -write_flag => 1,
>>> );
>>> my @files = getfiles($refseq_dir);
>>> for my $f (@files) {
>>> $db->build_index($f);
>>> }
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
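
Erik's outline above calls getfiles() without showing it, and a single
Bio::SeqIO::genbank failure stops the whole run. The sketch below is one
way to fill both gaps using only core Perl. getfiles() here is a
hypothetical implementation (the original was not posted), index_files()
is an illustrative name, and the sketch assumes the parser failure
surfaces as a Perl exception that eval can trap. Whether a file that
fails partway leaves partial entries in the index has not been checked.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every *.gbff file under $dir (recursively),
# returned sorted by path.
sub getfiles {
    my ($dir) = @_;
    my @files;
    find(
        {
            # With no_chdir, $_ holds the full path of the current entry.
            wanted => sub {
                push @files, $File::Find::name
                    if -f $_ && $_ =~ /\.gbff\z/;
            },
            no_chdir => 1,
        },
        $dir
    );
    my @sorted = sort @files;
    return @sorted;
}

# Index files one at a time so one bad record does not abort the run.
# $db is any object with a build_index() method, such as the
# Bio::DB::Flat object from the outline. Returns the files that failed.
sub index_files {
    my ($db, @files) = @_;
    my @failed;
    for my $f (@files) {
        my $ok = eval { $db->build_index($f); 1 };
        unless ($ok) {
            my $err = $@ || "unknown error\n";
            warn "skipping $f: $err";
            push @failed, $f;
        }
    }
    return @failed;
}
```

With the $db object from the outline, this would be called as
my @failed = index_files($db, getfiles($refseq_dir));
which leaves @failed holding the files worth attaching to a bug report.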