[Biopython-dev] Fwd: [blast-announce] Correction: BLAST 2.2.24 release announcement

Peter biopython at maubp.freeserve.co.uk
Tue Sep 7 11:17:37 UTC 2010


On Sat, Sep 4, 2010 at 4:23 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Being able to convert Blast ASN.1 output into any of the other formats
> will make a big difference to us. If we had a parser for ASN.1 Blast
> output, then strictly speaking there is no reason to have a parser for
> any of the other formats (in practice, we can be more flexible of course).

Dave Messina made a good point on the BioPerl list that (depending
on what data you are interested in) post-processing to generate the
alignment views is a waste of CPU time:
http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033972.html

Also, and we see this already with the XML output, the output file
size is quite inflated, especially if all you need can be presented
in one of the tabular forms, which are smaller and quicker to parse.

So yes, in principle a parser for ASN.1 Blast would be all we need,
but in practice tabular/plaintext/XML BLAST parsers are still useful.
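
To give an idea of how little work the tabular form needs, here is a
minimal sketch, assuming the default 12-column layout that BLAST+ writes
with -outfmt 6 (or -outfmt 7, which adds '#' comment lines). The function
name parse_tabular and the dict-per-hit representation are just for
illustration, not an existing Biopython API, and custom column
selections would need a different field list.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6 / 7).
DEFAULT_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_tabular(handle, fields=DEFAULT_FIELDS):
    """Yield one dict per hit line (hypothetical helper, not Biopython).

    Blank lines and '#' comment lines are skipped; a line whose column
    count does not match `fields` raises ValueError rather than being
    silently misread.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        if len(row) != len(fields):
            raise ValueError(
                "expected %d columns, found %d" % (len(fields), len(row))
            )
        hit = dict(zip(fields, row))
        # Convert the numeric columns, but only those actually present.
        for name in INT_FIELDS:
            if name in hit:
                hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            if name in hit:
                hit[name] = float(hit[name])
        yield hit
```

Something like "for hit in parse_tabular(open(filename)): ..." would then
stream through the hits one line at a time, without ever building the
alignment views.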

> I looked some more into the Blast parser issues we discussed
> earlier (starting here:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-May/007762.html
> ). Unfortunately things are not as easy as I had hoped. Except
> for the new ASN.1 output format, none of the other output
> formats (plain text, XML, tabular) contain all of the output
> generated by the Blast run. Some results are only found in
> the XML, some only in the plain text output, and tabular
> output can contain all kinds of stuff depending on the exact
> options that were used. As a consequence, it's hard to design
> a generic Blast record class; having a specialized Record
> class for plain text, XML, and tabular seems more appropriate,
> and these record classes may not be fully consistent with each
> other (some elements may exist in one class but not in the
> other).

I thought it would be hard :(

> Also, we cannot read in the Blast output in one format and
> write out the Blast output in a different format (at least not
> reliably).

In some cases this isn't surprising of course (e.g. tabular
to XML isn't going to work).

> With the format converter in Blast 2.2.24, luckily there is
> no longer a need for such converters in Biopython.
> If we had an ASN.1 parser, we could run Blast, save its
> output in ASN.1, load the Blast output into Python, filter
> the Blast output or otherwise modify it, write out the
> modified output in ASN.1 format, and then use the Blast
> 2.2.24 format converter to convert the modified output
> to plain text or some other format. That would be really
> useful.
>
> Unfortunately, making a parser for ASN.1 will not be so
> easy. As far as I know there isn't anything like expat or
> DOM for ASN.1 like we have for XML. Maybe this is
> something for a Google Summer of Code?

Maybe. There are some Python libraries out there for
ASN.1 (it is an ISO standard used beyond the NCBI).
http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One
http://bitbucket.org/haypo/hachoir/wiki/hachoir-parser
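
Even without an ASN.1 parser, the "search once, format later" part of
the workflow Michiel describes can be driven from Python today. A rough
sketch, assuming the BLAST+ 2.2.24 command line tools are on the PATH:
-outfmt 11 asks for the ASN.1 archive, and blast_formatter then converts
that archive to other formats. The helper names and file names here are
made up for illustration, and the filter/modify step in the middle is
exactly the piece that would still need an ASN.1 parser.

```python
import subprocess


def blast_archive_cmd(query, db, archive, program="blastn"):
    """Command line for a search saved as a BLAST archive (ASN.1)."""
    return [program, "-query", query, "-db", db,
            "-outfmt", "11", "-out", archive]


def format_archive_cmd(archive, out, outfmt="5"):
    """Command line to convert an archive; 5 is XML, 6 is tabular."""
    return ["blast_formatter", "-archive", archive,
            "-outfmt", str(outfmt), "-out", out]


def search_then_format(query, db, archive, outputs):
    """Run the search once, then write each requested output format.

    `outputs` maps an -outfmt value to an output file name, for example
    {"5": "hits.xml", "6": "hits.tsv"}. Raises CalledProcessError if
    any of the tools exits with a non-zero status.
    """
    subprocess.check_call(blast_archive_cmd(query, db, archive))
    for outfmt, path in outputs.items():
        subprocess.check_call(format_archive_cmd(archive, path, outfmt))
```

The point is that the expensive search runs only once, and the tabular
or XML views are generated afterwards only if someone actually wants
them.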

Peter


