[Biopython-dev] Proposal of a patch for FastaIO

Peter Cock p.j.a.cock at googlemail.com
Thu Jun 21 09:48:22 UTC 2012


On Thu, Jun 21, 2012 at 7:30 AM, Roberto Mosca
<roberto.mosca at irbbarcelona.org> wrote:
> Dear Biopython developers,
>
> I am new to this mailing list but I would like to propose a patch for
> the parser of the fasta-m10 alignment format and I do not really know
> how to do it...
>
> The fact is that from a Python script I need to access the details
> (start and stop residues) of the alignment in every sequence and also
> the E-value (sw_expect), which are saved in private variables (_al_start,
> _al_stop and _annotations["sw_expect"]) but are not accessible from the
> public interface of the SeqRecord and Alignment classes.
>
> For this reason I had to modify the Bio/AlignIO/FastaIO.py file of my
> local copy of BioPython.
>
> Since I feel that other people could also benefit from these changes, I
> would like to propose to include them in the standard distribution, but
> I do not know what is the right procedure to follow. Could you help me?
>
> I attach the patch file to FastaIO.py.
>
> The code introduces two keys in the public "annotations" member of
> SeqRecord ("start" and "end") and one key ("sw_expect") in the public
> "annotations" member of Alignment.
>
> Can someone give me any feedback on this?
>
> I also have a github account (rmosca)...
>
> Thank you in advance and thanks for this wonderful library that is
> BioPython!
>
> Roberto

Hi Roberto,

Apologies if my quick reply to your pull request was too curt
(I was checking emails over breakfast again):
https://github.com/biopython/biopython/pull/51

These attributes were deliberately stored under private variables
(_al_start, _al_stop and _annotations["sw_expect"]) so that you
can use them in the short term, but this was never intended as a
long-term solution; see also:
http://lists.open-bio.org/pipermail/biopython/2009-October/005760.html

It has taken longer than I expected, but that work is now happening
in Bow's Google Summer of Code project:
http://biopython.org/wiki/SearchIO

This will probably result in deprecating the current Bio.AlignIO.FastaIO
module (but you'd still be able to use the "fasta-m10" format with the
main Bio.AlignIO.parse() function).

So, for the medium term, please just use the private variables,
and help out with testing or other feedback on Bow's GSoC work.
As it happens, Bill Pearson's FASTA -m10 output is (I think) next
on the list...
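To make "just use the private variables" concrete, here is a minimal sketch of a helper that pulls those values out of one alignment. It assumes the alignment came from Bio.AlignIO.parse(handle, "fasta-m10") in a Biopython release whose FastaIO parser sets alignment._annotations and record._al_start / record._al_stop, as described above. The function name m10_hit_summary is hypothetical, and because these are private attributes the code uses getattr() with None defaults rather than assuming they are present.

```python
from typing import Any, Dict, List


def m10_hit_summary(alignment: Any) -> Dict[str, Any]:
    """Collect the private FASTA -m10 details from one parsed alignment.

    Hypothetical helper. Assumes the alignment came from
    Bio.AlignIO.parse(handle, "fasta-m10") and that the parser stored
    alignment._annotations["sw_expect"] plus record._al_start and
    record._al_stop. These are private and may change between releases.
    """
    # Alignment-level annotations (private); treat a missing dict as empty.
    annotations = getattr(alignment, "_annotations", None) or {}
    records: List[Dict[str, Any]] = []
    for record in alignment:
        # Per-sequence alignment coordinates (private); None if not set.
        records.append(
            {
                "id": record.id,
                "start": getattr(record, "_al_start", None),
                "stop": getattr(record, "_al_stop", None),
            }
        )
    # sw_expect is returned exactly as the parser stored it (possibly a
    # string), so convert with float() yourself if you need a number.
    return {"sw_expect": annotations.get("sw_expect"), "records": records}
```

For example, with a hypothetical output file opened as handle, you would call m10_hit_summary() on each alignment yielded by AlignIO.parse(handle, "fasta-m10"). Keeping the private-attribute access in one small function means only that function needs changing once the SearchIO work provides a public interface.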

More generally, I'd like to do something more organised and consistent
for start/end coordinates across other file formats, i.e. in the SeqRecord:
http://lists.open-bio.org/pipermail/biopython-dev/2012-May/009646.html

Regards,

Peter
