[Bioperl-l] acquiring a local refseq + index
Chris Fields
cjfields at uiuc.edu
Sun Dec 31 02:33:23 UTC 2006
I agree with Hilmar that we need examples. If you are referring to
your submitted bug:
http://bugzilla.open-bio.org/show_bug.cgi?id=2167
we could add it as long as it passes (I'll try giving it a workout
with my local bacterial sequences tonight or tomorrow). However, in
the not-too-distant future your patch would likely be rendered
obsolete: any parsing in Bio::SeqIO modules that deals with
Bio::Species will be deprecated in favor of simple parsing (more
foolproof, less uncertainty) and Bio::Taxon (which has optional
database lookups using NCBI Taxonomy). Bio::Species and anything
related to it should be considered marked for deprecation. Fair
warning...
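To illustrate what "simple parsing" of the taxonomy block could look like, here is a minimal pure-Perl sketch. It is not the Bio::SeqIO::genbank code and not a Bio::Taxon API: `parse_source_block` is a hypothetical helper, and the record is an abbreviated, illustrative GenBank header. It keeps the ORGANISM name verbatim instead of trying to split it into genus, species and strain, and returns the classification as a plain list.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): extract the SOURCE name, the
# ORGANISM name and the classification lineage from one GenBank record,
# without trying to interpret the names.
sub parse_source_block {
    my ($record) = @_;
    my (%info, @lineage_lines);
    my $in_organism = 0;

    for my $line (split /\n/, $record) {
        if ($line =~ /^SOURCE\s+(.*?)\s*$/) {
            $info{source} = $1;
        }
        elsif ($line =~ /^\s+ORGANISM\s+(.*?)\s*$/) {
            # Keep the organism name verbatim: no genus/species/strain split
            $info{organism} = $1;
            $in_organism    = 1;
        }
        elsif ($in_organism && $line =~ /^\s{12}(\S.*?)\s*$/) {
            # Indented lines right after ORGANISM hold the lineage
            push @lineage_lines, $1;
        }
        else {
            # Any other line ends the ORGANISM block
            $in_organism = 0;
        }
    }

    my $lineage = join ' ', @lineage_lines;
    $lineage =~ s/\.\s*$//;    # drop the terminating period
    $info{lineage} = [ split /;\s*/, $lineage ];
    return \%info;
}

# Abbreviated, illustrative record header (not copied from a RefSeq file)
my $record = <<'END';
DEFINITION  Escherichia coli K-12, complete genome.
SOURCE      Escherichia coli K-12
  ORGANISM  Escherichia coli K-12
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1  (bases 1 to 100)
END

my $info = parse_source_block($record);
print "organism: $info->{organism}\n";
print "lineage:  ", join(' > ', @{ $info->{lineage} }), "\n";
```

Keeping the name as a single string avoids guessing where a binomial ends when strain designations such as "K-12" follow it; resolving the name to an actual taxon would be left to Bio::Taxon and its optional NCBI Taxonomy lookups. The sketch assumes the ORGANISM name fits on one line.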
chris
On Dec 30, 2006, at 7:48 PM, Hilmar Lapp wrote:
> Can you send examples and the resulting error messages? Also, I'm
> assuming you are running the 1.5.2 release of Bioperl; if not, that's
> what I would try first.
>
> -hilmar
>
> On Dec 30, 2006, at 7:05 PM, Erik wrote:
>
>> Hi all,
>>
>> I downloaded the refseq files (.gbff) and want to index the lot with
>> Bio::DB::Flat.
>>
>> It turns out that there are many cases where the SOURCE and ORGANISM lines
>> are messed up, sometimes to the point where indexing fails with a
>> Bio::SeqIO::genbank error.
>>
>> I'd like to change Bio::SeqIO::genbank so that this parsing gets at least
>> far enough to make indexing the refseq files possible, and hopefully to
>> improve the taxonomic output ($seq->species->binomial is often mangled
>> at the moment).
>>
>> Is it still worthwhile to change parsing modules like Bio::SeqIO::genbank?
>> Is anyone already working on a rewrite? If so, I may be better off
>> writing my own indexing scheme.
>>
>> Below is an outline of my indexing program, which uses Bio::DB::Flat::BDB.
>> If anyone knows of a better way to get a locally searchable index of the
>> refseq flat files, I would be very interested.
>>
>> Thanks for your help,
>>
>> Erikjan
>>
>>
>> -------------
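>> use strict;
>> use warnings;
>> use Bio::DB::Flat;
>>
>> my $refseq_dir = '/data/ftp.ncbi.nih.gov/refseq/release/complete';
>>
>> # Create (-write_flag => 1) a Berkeley DB-backed index named 'refseq'
>> my $db = Bio::DB::Flat->new(
>>     -directory  => $refseq_dir,
>>     -dbname     => 'refseq',
>>     -format     => 'genbank',
>>     -index      => 'bdb',
>>     -write_flag => 1,
>> );
>>
>> # getfiles() is not shown in the outline; it is assumed here to be
>> # a glob over the uncompressed *.gbff files in the release directory
>> my @files = glob("$refseq_dir/*.gbff");
>> for my $f (@files) {
>>     $db->build_index($f);    # add this file's records to the index
>> }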
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign