[Bioperl-l] parsing blast report with long description
shalabh sharma
shalabh.sharma7 at gmail.com
Thu May 13 15:07:26 UTC 2010
Hi All,
I need some help parsing BLAST output.
I have an in-house database that contains sequences with really long
descriptions.
>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV
My BLAST report looks like this:
.....
.....
>SMPL_IDI_1105131728043
/GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821
6887/Open Ocean/Galapagos Islands/134 miles NE of
Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2
m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
Length = 213
Score = 124 bits (310), Expect = 5e-27, Method: Compositional matrix
adjust.
Identities = 62/155 (40%), Positives = 96/155 (61%), Gaps = 1/155 (0%)
.....
.....
(Note that the tag "TI_1000008216887" is split across two lines.)
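A likely explanation, sketched under the assumption that the parser rejoins the wrapped lines of the report with a single space (the observed "TI_100000821 6887" suggests this): the text report wraps the definition line at a fixed width. Here it mostly breaks at spaces ("134 miles NE of" / "Galapagos"), but a long run without spaces gets broken mid-token. Rejoining with a space is right in the first case and wrong in the second, so deleting all whitespace afterwards is not an option either. The snippet below uses the wrapped lines copied from the report above.

```perl
use strict;
use warnings;

# Wrapped definition-line fragments, copied verbatim from the report.
my @wrapped = (
    q{/GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_100000821},
    q{6887/Open Ocean/Galapagos Islands/134 miles NE of},
    q{Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2},
    q{m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04},
);

# Assumption: continuation lines are rejoined with one space.
my $desc = join ' ', @wrapped;

# The description starts with '/', so $f[0] is an empty string.
my @f = split m{/}, $desc;

print "$f[4]\n";    # prints: TI_100000821 6887   (wrap fell mid-token)
print "$f[7]\n";    # prints: 134 miles NE of Galapagos   (wrap fell at a real space)
```

Both fields come out of the same join, which is why no single whitespace rule applied after parsing can repair the first without breaking the second.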
I am using SearchIO to parse this report. I then split each hit's
description field on "/" to get all the tags, like this:
....
....
my $desc = $hit->description;
my @f = split m{/}, $desc;   # one element per '/'-delimited tag
print OUT "$_\t" for @f;     # print each tag followed by a tab
.....
.....
The report parses fine, except that the field containing TI_1000008216887
comes out with a space in it: TI_100000821 6887.
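One possible workaround, sketched under the assumption that the FASTA file the database was built from is still at hand: take the full, unwrapped description from that file instead of from the report, keyed on the sequence ID. `read_fasta_descriptions` is a hypothetical helper written for this sketch; Bio::SeqIO (`$seq->id`, `$seq->desc`) could do the same reading.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): map each FASTA ID to its full,
# unwrapped description, read straight from the FASTA source.
sub read_fasta_descriptions {
    my ($fh) = @_;
    my %desc_for;
    while (my $line = <$fh>) {
        next unless $line =~ /^>/;    # only header lines matter here
        chomp $line;
        # Header is ">ID description": the ID is the first whitespace-free word.
        my ($id, $desc) = $line =~ /^>(\S+)\s*(.*)$/;
        $desc_for{$id} = $desc if defined $id;
    }
    return \%desc_for;
}

# Demo on an in-memory copy of the record shown above; in practice, open
# the FASTA file the BLAST database was built from.
my $fasta = <<'END';
>SMPL_IDI_1105131728043 /GS026/SMPL_READ_1095454077952/SMPL_READ_1095454041540/TI_1000008216887/Open Ocean/Galapagos Islands/134 miles NE of Galapagos/Ecuador/0.1 - 0.8/1d15'51N"/90d17'42W"/2 m/2386 m/0.22 ug-kg/32.6 psu/27.8 C/2-1-04
IHWWLFEVGQKGFLNFSWCFGQVFKRLEHVCIRPKYVPYSSNLYRDSVKTLETPMWRRNSMRVFLKGSLFAVSLIASGAV
END

open(my $fh, '<', \$fasta) or die "cannot open in-memory FASTA: $!";
my $desc_for = read_fasta_descriptions($fh);
close $fh;

# The tag that was broken in the report is intact here.
my @f = split m{/}, $desc_for->{'SMPL_IDI_1105131728043'};
print "$f[4]\n";    # prints: TI_1000008216887
```

In the SearchIO loop the lookup could then be something like `my $desc = $desc_for->{ $hit->name } // $hit->description;`, assuming `$hit->name` returns the same ID as the FASTA header. Keeping every description in a hash is the simplest option; for a very large database an on-disk index would use less memory.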
I would really appreciate it if anyone could help me out.
Thanks
Shalabh Sharma