[Biopython-dev] Blast parsers and records

Michael Sandford sandford at ufl.edu
Sun May 30 02:35:18 UTC 2010


I've got a few comments as well:
> 4) The current Blast record stores its information in attributes. If you use Bio.Entrez to parse Blast XML output (Biopython 1.54 contains the necessary DTDs to do so), the information is stored in dictionaries. This has some advantages. For example, it allows you to use record.keys() to find out what the record contains. Ideally, I think that a Blast Record class should inherit from a dictionary.

The disadvantage I can immediately think of with this approach is that 
you lose the ability to have a heavyweight IDE offer code completion 
(IntelliSense) on the available fields.  Many may say that IntelliSense 
is evil and/or a crutch, and I won't really argue with that.  But 
Eclipse is pretty good at offering options: type "variablename." and 
it brings up a whole list of attributes and functions, which I find 
handy.  Moving to a dictionary-based approach will stop that.

With the attribute-based record, calling dir(variablename) shows not 
only the available attributes but the methods as well.  That may not be 
as elegant as iterating over the keys of a dictionary, but it is at 
least a partial alternative.
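
As a rough illustration of the difference, here is a minimal sketch.  The 
class names, the describe() method and the field values are made up for 
the example; they are not Biopython's actual Blast record API.

```python
class AttrRecord:
    """Hypothetical attribute-based record (not Biopython's Blast record)."""

    def __init__(self):
        self.query = "example_query"
        self.database = "nr"

    def describe(self):
        return "%s against %s" % (self.query, self.database)


class DictRecord(dict):
    """Hypothetical dictionary-based record."""


attr_record = AttrRecord()
dict_record = DictRecord(query="example_query", database="nr")

# dir() lists data attributes and methods together (dunder names filtered out).
dir_names = [name for name in dir(attr_record) if not name.startswith("_")]

# vars() returns only the instance's own data attributes.
attr_fields = sorted(vars(attr_record))

# A dictionary-based record exposes its fields through keys().
dict_fields = sorted(dict_record.keys())

print(dir_names)
print(attr_fields)
print(dict_fields)
```

Note that dir() mixes methods in with the data fields, whereas keys() 
returns only the stored fields.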

It seems to me that a fair amount of XML parsing gets done in 
bioinformatics these days.  I know that one of the goals of the project 
is minimal dependence on external libraries; however, I think that lxml 
(http://codespeak.net/lxml/) might substantially reduce the complexity 
of the parsing code.  I also think that the lxml/etree representation of 
parsed data is fairly reasonable.
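
To give a feel for the etree style, here is a minimal sketch.  It uses 
the standard library's xml.etree.ElementTree so that it runs without 
extra dependencies; lxml.etree offers a largely compatible API, so 
swapping the import should be enough for the calls used here.  The XML 
fragment is a hand-written, heavily trimmed stand-in for Blast XML 
output, and the hit identifiers in it are invented.

```python
import xml.etree.ElementTree as etree  # assumption: with lxml, "from lxml import etree"

# Hand-written, trimmed fragment; real Blast XML contains many more elements.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>hypothetical|1</Hit_id>
          <Hit_len>120</Hit_len>
        </Hit>
        <Hit>
          <Hit_id>hypothetical|2</Hit_id>
          <Hit_len>85</Hit_len>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

root = etree.fromstring(SAMPLE)

# findtext() returns the text of the first matching child element.
program = root.findtext("BlastOutput_program")

# iter() walks the whole tree, so nested Hit elements are found directly.
hits = [
    (hit.findtext("Hit_id"), int(hit.findtext("Hit_len")))
    for hit in root.iter("Hit")
]

print(program)
print(hits)
```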

Mike



