[Biojava-dev] bioperl like blastparser
Andreas Prlic
ap3 at sanger.ac.uk
Thu Dec 20 16:15:31 UTC 2007
Hi Michael,
The BLAST parser (BlastLikeSaxParser) in BioJava has been around for
a while and is frequently used to parse a variety of different BLAST
outputs. Still, it is not complete and cannot parse PSI-BLAST output.
We have had a number of requests about it lately, so I suppose it
needs a little maintenance now.
Writing a new BLAST parser from scratch would take a significant
amount of time: fixing all the bugs, adding support for the different
BLAST versions, and writing documentation. Much of this is already
available in BioJava, so I would prefer it if you could submit patches
for the current BLAST parser. Would you also be interested in
collaborating in this direction?
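
To make the iterative style from the BioPerl example concrete, here is
a minimal sketch of how hits could be pulled one at a time in Java. It
is only an illustration under stated assumptions: it uses the JDK's
StAX API (javax.xml.stream) rather than BlastLikeSaxParser, the class
BlastXmlHitIterator and its nested Hit class are hypothetical names
that do not exist in BioJava, and it assumes NCBI BLAST XML output with
the elements BlastOutput_query-len, Hit, Hit_id and Hit_len.

```java
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper (not part of BioJava): yields one hit at a time. */
class BlastXmlHitIterator implements Iterator<BlastXmlHitIterator.Hit> {

    /** Minimal value object; a fuller parser would also carry the HSPs. */
    static final class Hit {
        final String id;
        final int length;
        Hit(String id, int length) { this.id = id; this.length = length; }
    }

    private final XMLStreamReader xml;
    private int queryLength = -1;   // -1 until BlastOutput_query-len is seen
    private Hit next;               // hit that the next call to next() returns

    BlastXmlHitIterator(Reader in) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Do not try to load the DTD named in the report's DOCTYPE line.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            xml = factory.createXMLStreamReader(in);
            advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot read BLAST XML", e);
        }
    }

    /** Query length from the report header, or -1 if it was not present. */
    int getQueryLength() { return queryLength; }

    /** Reads forward to the end of the next Hit element, keeping only that hit. */
    private void advance() throws XMLStreamException {
        next = null;
        String id = null;
        int length = -1;
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if (name.equals("BlastOutput_query-len")) {
                    queryLength = Integer.parseInt(xml.getElementText().trim());
                } else if (name.equals("Hit_id")) {
                    id = xml.getElementText().trim();
                } else if (name.equals("Hit_len")) {
                    length = Integer.parseInt(xml.getElementText().trim());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("Hit")) {
                next = new Hit(id, length);
                return;
            }
        }
    }

    @Override
    public boolean hasNext() { return next != null; }

    @Override
    public Hit next() {
        if (next == null) throw new NoSuchElementException();
        Hit current = next;
        try {
            advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot read BLAST XML", e);
        }
        return current;
    }
}
```

A caller would loop with hasNext() and next(), much like next_hit in
BioPerl, so only one hit is held in memory at a time. The same idea
could presumably be layered on the existing SAX events instead of
StAX; that choice is left open here.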
Another feature that would be nice to support is the ability to
submit BLAST searches to web services...
Cheers,
Andreas
On 20 Dec 2007, at 12:54, Michael Gang wrote:
> Hi All,
>
> I used the interface of the Java BLAST parser.
> I had two main problems with it:
> 1) The parser does not extract all the information (for example,
> the query length).
> 2) The parser reads the whole BLAST report into a list, which
> eats a lot of memory.
>
> I would be interested in writing and contributing a BLAST parser
> which extracts all the information in the BLAST report and parses
> the report iteratively.
> Something like the following BioPerl code (just in Java):
> use Bio::SearchIO;
> # format can be e.g. 'fasta', 'blast' or 'blastxml'
> my $searchio = Bio::SearchIO->new( -format => 'blastxml',
>                                    -file   => 'blastout.xml' );
> while ( my $result = $searchio->next_result() ) {
>     while ( my $hit = $result->next_hit ) {
>         # process the Bio::Search::Hit::HitI object
>         while ( my $hsp = $hit->next_hsp ) {
>             # process the Bio::Search::HSP::HSPI object
>         }
>     }
> }
>
> Would you be interested in such a contribution ?
>
> Best regards,
> Michael
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------