[Bioperl-l] bp_classify_hits_kingdom.pl

Chris Fields cjfields at illinois.edu
Wed Jun 22 19:51:21 UTC 2011


I think we should actually do a general switchover to SQLite, or at least abstract the storage backend.
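
A rough sketch of what abstracting the lookup might look like, purely as an illustration. It is written in Python rather than Perl to keep it short, and it is not the code checked into bioperl-live. The class names, the table name gi2taxid and its column names are all hypothetical, and it assumes the gi_taxid dump is two whitespace-separated integer columns (gi, then taxid):

```python
import sqlite3
from abc import ABC, abstractmethod


class GiTaxidStore(ABC):
    """Hypothetical interface: map a GI number to an NCBI taxid."""

    @abstractmethod
    def load(self, lines):
        """Load 'gi<TAB>taxid' lines into the store."""

    @abstractmethod
    def taxid_for(self, gi):
        """Return the taxid for a GI, or None if the GI is unknown."""


class SQLiteGiTaxidStore(GiTaxidStore):
    """One possible backend; a DB_File-style backend could sit beside it."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Table and column names are assumptions made for this sketch.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS gi2taxid "
            "(gi INTEGER PRIMARY KEY, taxid INTEGER NOT NULL)"
        )

    def load(self, lines):
        # Skip blank lines; take the first two columns of each remaining line.
        rows = (
            (int(gi), int(taxid))
            for gi, taxid in (ln.split()[:2] for ln in lines if ln.strip())
        )
        # One transaction for the whole batch keeps bulk loading fast.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO gi2taxid (gi, taxid) VALUES (?, ?)",
                rows,
            )

    def taxid_for(self, gi):
        row = self.conn.execute(
            "SELECT taxid FROM gi2taxid WHERE gi = ?", (gi,)
        ).fetchone()
        return row[0] if row else None
```

A caller would only ever see load() and taxid_for(), e.g. store = SQLiteGiTaxidStore("gi2taxid.db"), then store.load(open(dump_path)) and store.taxid_for(gi), so the backend could be swapped without touching the script logic. A None return corresponds to the "no taxid for ..." case in the report below.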

chris

On Jun 22, 2011, at 11:04 AM, Jason Stajich wrote:

> Hi Dan
> 
> Looks like -outfmt 6 is the right one for the BLAST+ toolkit; it is -m 8 or -m 9 for the C toolkit blastall application.
> 
> I think DB_File was falling over because the now 40M+ gi-to-taxid pairs were overwhelming DB_File and the Berkeley DB implementation behind it.
> 
> To solve it I reimplemented it with SQLite -- which will require you to install DBD::SQLite.
> 
> I've checked in code to the main trunk in the GitHub repo if you want to take a look -- you can either download the file https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS or check it out via git (recommended).
> 
> -jason
> On Jun 21, 2011, at 7:25 AM, Jackson, Daniel wrote:
> 
>> Hi Jason,
>> 
>> My name is Dan and I'm hoping to use your bioperl script bp_classify_hits_kingdom.pl to categorise some ESTs I recently acquired. I've been stuck on this problem for days now - can you help?!? I suspect it's an easy and obvious solution.... I'm not a complete newbie to using scripts, but wouldn't say I'm experienced! I've just installed Bioperl and have generated a small BLASTx test file of my sequences searched against a local installation of GenBank's nr database. The BLAST search was run locally as follows:
>> 
>> gzgbio-48:~ djackson$ blastx -query /Users/djackson/Desktop/10_Vaceltia_contigs_fna.txt -db /Users/djackson/BLAST-2.2.25+/db/nr/nr -outfmt 6 -out /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt -numthreads 2 -evalue .00001 -show_gis -num_descriptions 10 -num_alignments 10 -max_target_seqs 10
>> 
>> 
>> The results of this file are attached (BLASTx_10_Vaceltia_contigs_m6.txt). I realise the BLASTx output is supposed to be in -outfmt 8 or -outfmt 9, but providing these files to bp_classify_hits_kingdom.pl generates the following error:
>> 
>> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt  -e .0001
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
>> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 1.
>> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 1.
>> no GI in
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
>> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 2.
>> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 2.
>> no GI in
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
>> .
>> .
>> etc...
>> .
>> .
>> no GI in
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9 total=1182
>>                    1182 100.00%
>> gzgbio-48:~ djackson$
>> 
>> 
>> Providing the bp_classify_hits_kingdom.pl script with an -outfmt 6 format seems to get closer to a meaningful output, but still generates the following error:
>> 
>> 
>> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt  -e .0001 -v
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt
>> no taxid for 51127506
>> no taxid for 51127506
>> no taxid for 51127506
>> no taxid for 317419045
>> no taxid for 47219014
>> .
>> .
>> etc...
>> .
>> .
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6 total=10
>>                    10 100.00%
>> gzgbio-48:~ djackson$
>> 
>> 
>> Kind regards and thanks in advance,
>> Dan
>> 
>> <BLASTx_10_Vaceltia_contigs_m6.txt>
>> 
>> 
>> ---------------------------------------------------------------
>> Junior Professor Daniel J. Jackson
>> Courant Research Centre Geobiology
>> Georg-August University of Göttingen
>> Goldschmidtstr.3
>> 37077 Göttingen
>> Germany
>> 
>> Tel: +49 (0) 551 39 14177
>> Fax: +49 (0) 551 39 7918
>> 
>> djackso at uni-goettingen.de
>> http://www.uni-goettingen.de/en/102705.html
>> ---------------------------------------------------------------
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l