[Biopython] fasta-m10 al_start and al_end?

Fri Oct 23 18:40:12 UTC 2009

On Fri, Oct 23, 2009 at 4:57 PM, Anne Pajon <ap12 at sanger.ac.uk> wrote:
> Dear,
>
> I am using Biopython to parse a fasta alignment file:
>
>    alignments = AlignIO.parse(
>        open("fastaresults/78_Spneumoniae_ATCC700669/all_bases_435_1055_cds.fres"),
>        "fasta-m10", seq_count=2)
>    for alignment in alignments:
>
>        record_query = alignment[0]
>        record_match = alignment[1]
>
>        print alignment._annotations["sw_score"], alignment._annotations["sw_ident"]
>        print record_query.annotations["original_length"]
>        # print record_query.annotations["al_start"], record_query.annotations["al_end"]
>
> I would like to print the start/end of each aligned sequence.
>
> I can see in Bio.AlignIO.FastaIO.next() that sq_len is stored in
> annotations:
>        record.annotations["original_length"] = int(query_annotation["sq_len"])
> but I cannot find a way of accessing al_start and al_end.
>
> Thanks in advance for your help.
> Kind regards,
> Anne.

Hi Anne,

That's a good question, but the answer may be a little
disappointing.

That information isn't currently recorded in the SeqRecord,
partly because I didn't need it at the time, but mainly because
I was undecided about whether the start location should be
converted to Python counting (zero-based rather than one-based).
What would you prefer? My inclination is Python counting.
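
To make the two conventions concrete, here is a minimal sketch of the
conversion I have in mind. It assumes the al_start and al_end values in
the FASTA output are one-based and inclusive, with start <= end. The
helper name one_based_to_python is made up for illustration and is not
part of Biopython.

```python
def one_based_to_python(al_start, al_end):
    """Convert a one-based inclusive range to Python slice bounds.

    Hypothetical helper, not a Biopython function. Assumes
    al_start <= al_end (forward-strand coordinates only).
    """
    # Only the start shifts: Python slices exclude the end index,
    # so a one-based inclusive end is already the right slice end.
    return al_start - 1, al_end


# Toy sequence, not taken from any real alignment.
seq = "ACGTACGTAC"

# One-based positions 3..6 cover the letters G, T, A, C.
start, end = one_based_to_python(3, 6)
print(seq[start:end])
```

With Python counting the stored values could be used directly as slice
bounds on the original sequence, which is the main reason I lean that
way.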

Peter

P.S. Most of the alignment-level annotation is recorded,
but it is currently hidden in a "private" property (leading
underscore). You can access this, but be warned that it
will change in the future. Improving the alignment object is
something I am working on for a future release.