[Bioperl-l] bp_classify_hits_kingdom.pl

Bernd Web bernd.web at gmail.com
Wed Jun 22 18:07:39 UTC 2011


Hi Jason,

Until now I did my GI-to-taxid mapping in plain Perl, so it is nice to
know this script exists. Thanks for this.
Just one question. I noticed in
https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS:

line 96: my $dbh = tie(%gi2node, 'DB_File', 'gi2class');
and
line 100: my $dbh2 = my $dbh = DBI->connect("dbi:SQLite:dbname=$giidxfile","","");

So the second $dbh masks the earlier declaration.
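
To illustrate, here is a minimal sketch. The string values are made-up
stand-ins for the real tie() and DBI->connect() calls, and I am assuming
both declarations sit in the same scope, which I have not verified in the
script itself:

```perl
use strict;
use warnings;

# Hypothetical stand-in for: tie(%gi2node, 'DB_File', 'gi2class')
my $dbh = 'DB_File handle';

# Hypothetical stand-in for: DBI->connect("dbi:SQLite:dbname=$giidxfile","","")
# With warnings enabled, perl reports here that the "my" variable $dbh
# masks the earlier declaration in the same scope.
my $dbh2 = my $dbh = 'SQLite handle';

# From this point on, $dbh names the second variable; the first one
# can no longer be reached by name.
print "dbh  = $dbh\n";
print "dbh2 = $dbh2\n";
```

If the two declarations turn out to live in separate blocks (for example
the two branches of an if/else), there is no masking and nothing to fix.
Otherwise, dropping the second "my" or renaming one of the handles would
silence the warning.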


Cheers,
Bernd

On Wed, Jun 22, 2011 at 6:04 PM, Jason Stajich <jason.stajich at gmail.com> wrote:
> Hi Dan
>
> Looks like -outfmt 6 is the right format for the BLAST+ toolkit; -m 8 or -m 9 is for the C toolkit blastall application.
>
> I think DB_File was falling over because the now 40M+ gi to taxid pairs were overwhelming DB_File and the BerkeleyDB implementation behind it.
>
> To solve it I reimplemented it with SQLite -- which will require you to install DBD::SQLite.
>
> I've checked in code to the main trunk in the github repo if you want to take a look -- you can either download the file https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/classify_hits_kingdom.PLS or check it out via git (recommended).
>
> -jason
> On Jun 21, 2011, at 7:25 AM, Jackson, Daniel wrote:
>
>> Hi Jason,
>>
>> My name is Dan and I'm hoping to use your bioperl script bp_classify_hits_kingdom.pl to categorise some ESTs I recently acquired. I've been stuck on this problem for days now - can you help?!? I suspect it's an easy and obvious solution.... I'm not a complete newbie to using scripts, but wouldn't say I'm experienced! I've just installed Bioperl and have generated a small BLASTx test file of my sequences searched against a local installation of GenBank's nr database. The BLAST search was run locally as follows:
>>
>> gzgbio-48:~ djackson$ blastx -query /Users/djackson/Desktop/10_Vaceltia_contigs_fna.txt -db /Users/djackson/BLAST-2.2.25+/db/nr/nr -outfmt 6 -out /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt -numthreads 2 -evalue .00001 -show_gis -num_descriptions 10 -num_alignments 10 -max_target_seqs 10
>>
>>
>> The results of this file are attached (BLASTx_10_Vaceltia_contigs_m6.txt). I realise the BLASTx output is supposed to be in -outfmt 8 or -outfmt 9, but providing these files to bp_classify_hits_kingdom.pl generates the following error:
>>
>> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt  -e .0001
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9.txt
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 1.
>> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 1.
>> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 1.
>> no GI in
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 2.
>> Use of uninitialized value $hname in pattern match (m//) at /usr/local/bin/bp_classify_hits_kingdom.pl line 148, <$fh> line 2.
>> Use of uninitialized value $hname in concatenation (.) or string at /usr/local/bin/bp_classify_hits_kingdom.pl line 195, <$fh> line 2.
>> no GI in
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
>> Use of uninitialized value $evalue in numeric gt (>) at /usr/local/bin/bp_classify_hits_kingdom.pl line 143, <$fh> line 3.
>> .
>> .
>> etc...
>> .
>> .
>> no GI in
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m9 total=1182
>>                     1182 100.00%
>> gzgbio-48:~ djackson$
>>
>>
>> Providing the bp_classify_hits_kingdom.pl script with an -outfmt 6 format seems to get closer to a meaningful output, but still generates the following error:
>>
>>
>> gzgbio-48:~ djackson$ bp_classify_hits_kingdom.pl -t /Users/djackson/taxdump -g /Users/djackson/taxdump/gi_taxid_prot.dmp -i /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt  -e .0001 -v
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6.txt
>> no taxid for 51127506
>> no taxid for 51127506
>> no taxid for 51127506
>> no taxid for 317419045
>> no taxid for 47219014
>> .
>> .
>> etc...
>> .
>> .
>> /Users/djackson/Desktop/BLASTx_10_Vaceltia_contigs_m6 total=10
>>                     10 100.00%
>> gzgbio-48:~ djackson$
>>
>>
>> Kind regards and thanks in advance,
>> Dan
>>
>> <BLASTx_10_Vaceltia_contigs_m6.txt>
>>
>>
>> ---------------------------------------------------------------
>> Junior Professor Daniel J. Jackson
>> Courant Research Centre Geobiology
>> Georg-August University of Göttingen
>> Goldschmidtstr.3
>> 37077 Göttingen
>> Germany
>>
>> Tel: +49 (0) 551 39 14177
>> Fax: +49 (0) 551 39 7918
>>
>> djackso at uni-goettingen.de
>> http://www.uni-goettingen.de/en/102705.html
>> ---------------------------------------------------------------



