[Bioperl-l] bp_classify_hits_kingdom.pl
Jason Stajich
jason.stajich at gmail.com
Wed Jun 22 16:04:59 UTC 2011
Hi Dan
Looks like -outfmt 6 is the right format for the BLAST+ toolkit; the equivalent for the C toolkit's blastall application is -m 8 or -m 9.
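
To illustrate what the tabular format contains, here is a minimal Python sketch (not the script's actual Perl code) that reads the 12 default columns of -outfmt 6, skips the '#' comment lines that the commented variant adds, filters on e-value, and pulls the GI out of the subject ID. The function name parse_hits, the FIELDS tuple, and the max_evalue parameter are hypothetical names chosen for this example, and it assumes subject IDs of the form gi|NNN|... as produced with -show_gis.

```python
import re

# Default column order for BLAST+ "-outfmt 6" (same layout as blastall -m 8).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Assumption: subject IDs look like "gi|51127506|ref|...".
GI_PATTERN = re.compile(r"gi\|(\d+)")


def parse_hits(lines, max_evalue=1e-4):
    """Yield (query, gi, evalue) for every hit with evalue <= max_evalue."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' comment lines of the commented format.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            continue  # not a complete tabular hit line
        row = dict(zip(FIELDS, cols))
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue
        match = GI_PATTERN.search(row["sseqid"])
        if match is None:
            continue  # no GI in the subject ID
        yield row["qseqid"], int(match.group(1)), evalue
```

Lines that lack an e-value column or a GI are skipped here rather than reported, which is a simplification for the sketch.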
I think DB_File was falling over because the gi to taxid table, now at 40M+ pairs, was overwhelming it and the underlying Berkeley DB implementation.
To solve it I reimplemented it with SQLite -- which will require you to install DBD::SQLite.
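
The idea of keeping the gi to taxid table in SQLite can be sketched as follows. This is a Python illustration under stated assumptions, not the Perl code that was checked in: the table name gi2taxid and the functions read_pairs, build_gi_taxid_db, and lookup_taxid are hypothetical, and the input is assumed to be the two-column, whitespace-separated gi/taxid layout of gi_taxid_prot.dmp.

```python
import sqlite3


def read_pairs(lines):
    """Yield (gi, taxid) integer pairs from two-column dump lines."""
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            yield int(fields[0]), int(fields[1])


def build_gi_taxid_db(pairs, db_path=":memory:"):
    """Load (gi, taxid) pairs into an indexed SQLite table and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gi2taxid ("
        "gi INTEGER PRIMARY KEY, taxid INTEGER NOT NULL)"
    )
    # One transaction for the whole load; per-row commits would be far slower.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO gi2taxid (gi, taxid) VALUES (?, ?)", pairs
        )
    return conn


def lookup_taxid(conn, gi):
    """Return the taxid for a GI, or None when the GI is not in the table."""
    row = conn.execute(
        "SELECT taxid FROM gi2taxid WHERE gi = ?", (gi,)
    ).fetchone()
    return row[0] if row else None
```

In practice the pairs would be streamed from the dump file, for example build_gi_taxid_db(read_pairs(open(path)), "gi2taxid.db"), so the full table never has to sit in memory.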
I've checked the code in to the main trunk of the github repo if you want to take a look. You can either download the file https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS or check it out via git (recommended).
-jason
On Jun 21, 2011, at 7:25 AM, Jackson, Daniel wrote:
> Hi Jason,
>
> My name is Dan and I'm hoping to use your bioperl script bp_classify_hits_kingdom.pl to categorise some ESTs I recently acquired. I've been stuck on this problem for days now - can you help?!? I suspect it's an easy and obvious solution.... I'm not a complete newbie to using scripts, but wouldn't say I'm experienced! I've just installed Bioperl and have generated a small BLASTx test file of my sequences searched against a local installation of GenBank's nr database. The BLAST search was run locally as follows:
>
> gzgbio-48:~ djackson$ blastx -query /Users/djackson/Desktop/10_Vaceltia_contigs_fna.txt -db /Users/djackson/BLAST-2.2.25+/db/nr/nr -outfmt 6 -out /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt -numthreads 2 -evalue .00001 -show_gis -num_descriptions 10 -num_alignments 10 -max_target_seqs 10
>
>
> The results of this file are attached (BLASTx_10_Vaceltia_contigs_m6.txt). I realise the BLASTx output is supposed to be in -outfmt 8 or -outfmt 9, but providing these files to bp_classify_hits_kingdom.pl generates the following error:
>
> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt -e .0001
> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt
> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 1.
> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 1.
> no GI in
> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 2.
> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 2.
> no GI in
> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
> .
> .
> etc...
> .
> .
> no GI in
> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9 total=1182
> 1182 100.00%
> gzgbio-48:~ djackson$
>
>
> Providing the bp_classify_hits_kingdom.pl script with an -outfmt 6 format seems to get closer to a meaningful output, but still generates the following error:
>
>
> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt -e .0001 -v
> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt
> no taxid for 51127506
> no taxid for 51127506
> no taxid for 51127506
> no taxid for 317419045
> no taxid for 47219014
> .
> .
> etc...
> .
> .
> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6 total=10
> 10 100.00%
> gzgbio-48:~ djackson$
>
>
> Kind regards and thanks in advance,
> Dan
>
> <BLASTx_10_Vaceltia_contigs_m6.txt>
>
>
> ---------------------------------------------------------------
> Junior Professor Daniel J. Jackson
> Courant Research Centre Geobiology
> Georg-August University of Göttingen
> Goldschmidtstr.3
> 37077 Göttingen
> Germany
>
> Tel: +49 (0) 551 39 14177
> Fax: +49 (0) 551 39 7918
>
> djackso at uni-goettingen.de
> http://www.uni-goettingen.de/en/102705.html
> ---------------------------------------------------------------
>