[BioPython] Clustalw.parse_file errors
Nick Matzke
matzke at berkeley.edu
Tue Aug 5 23:19:23 UTC 2008
Never mind. It turns out my alignment file was missing a separator line
after each block of the alignment. The .aln file doesn't necessarily need
a consensus line with "*" and ":" characters in it, but it does need at
least a line of spaces as long as the aligned block (this is what
protein.aln has).
I inserted a line of spaces after each chunk of the alignment and now it
parses.
(My alignment wasn't generated by Clustal anyway, so I also added this
header line to make the parser happy: "CLUSTAL W (1.83) formatted
alignment done with PROMALS3D")
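For anyone who wants to script the same fix rather than edit by hand, here
is a minimal sketch in plain Python (no Biopython needed). The function name
pad_clustal_blocks is hypothetical, and it rests on assumptions of mine: that
sequence lines start with an ID in column one, that blocks are separated by
an empty line or by nothing at all, and that a spaces line as long as the
last sequence line is enough for the parser.

```python
# Header text taken from the message above; any "CLUSTAL ..." line may do.
DEFAULT_HEADER = "CLUSTAL W (1.83) formatted alignment done with PROMALS3D"


def _is_sequence_line(line):
    # Assumption: sequence lines are non-blank and start with an ID, not a space.
    return bool(line.strip()) and not line[0].isspace() and not line.startswith("CLUSTAL")


def pad_clustal_blocks(text, header=DEFAULT_HEADER):
    """Add a header if missing and close every block with a line of spaces."""
    lines = text.splitlines()
    out = []
    first = next((ln for ln in lines if ln.strip()), "")
    if not first.startswith("CLUSTAL"):
        out.extend([header, ""])
    prev_len = 0      # length of the last sequence line, 0 when outside a block
    seen_ids = set()  # IDs seen in the current block
    for ln in lines:
        if _is_sequence_line(ln):
            seq_id = ln.split()[0]
            if seq_id in seen_ids:
                # A repeated ID means a new block started with no separator.
                out.extend([" " * prev_len, ""])
                seen_ids.clear()
            seen_ids.add(seq_id)
            out.append(ln)
            prev_len = len(ln)
        else:
            if prev_len and ln == "":
                # Empty line right after a block: put a spaces line before it.
                out.append(" " * prev_len)
            out.append(ln)
            prev_len = 0
            seen_ids.clear()
    if prev_len:
        # The last block also needs its closing line of spaces.
        out.append(" " * prev_len)
    return "\n".join(out) + "\n"
```

Reading the .aln file into a string, passing it through pad_clustal_blocks,
and writing the result back out should give a file shaped like the one that
parsed for me. A file that already has consensus or spaces lines is left
unchanged.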
I.e. for future readers (truncating my .aln file)...
...this got the _star_info error:
================================
CLUSTAL W (1.83) formatted alignment done with PROMALS3D
SctN_Salt ----------------MKNEL---------------------------------------
SctN_EHEC MISEHDSVLEKYPRIQKVLNST--------------------------------------
SctN_Chrm ---------MRLPDIRLIENTL--------------------------------------
SctN_Yers ---------MKLPDIARLTPRL--------------------------------------
SctN_Soda ----------MTCNSQRLASML--------------------------------------
SctN_Laws ----------------MALEYI--------------------------------------
SctN_Chl4 ----------------MEEITTE-------------------------------------
SctN_Salt --------------------------MQRLRLKYPPP---------DGYCR--------W
SctN_EHEC --------------------------VPALSLN-------------SSTRY--------E
SctN_Chrm --------------------------RERLTLAPA---PPGQR---SGVEL--------F
SctN_Yers --------------------------QQQLTRPSAPP---------EGLRY--------R
SctN_Soda --------------------------AQHLTPVDEPP---------DGYRL--------T
SctN_Laws --------------------------ASLLEEAVQNT---------SPVEV--------R
SctN_Chl4 --------------------------FNTLMTELPDV---------QLTAV--------V
===================================
...but this parsed successfully:
================================
CLUSTAL W (1.83) formatted alignment done with PROMALS3D
SctN_Salt ----------------MKNEL---------------------------------------
SctN_EHEC MISEHDSVLEKYPRIQKVLNST--------------------------------------
SctN_Chrm ---------MRLPDIRLIENTL--------------------------------------
SctN_Yers ---------MKLPDIARLTPRL--------------------------------------
SctN_Soda ----------MTCNSQRLASML--------------------------------------
SctN_Laws ----------------MALEYI--------------------------------------
SctN_Chl4 ----------------MEEITTE-------------------------------------
SctN_Salt --------------------------MQRLRLKYPPP---------DGYCR--------W
SctN_EHEC --------------------------VPALSLN-------------SSTRY--------E
SctN_Chrm --------------------------RERLTLAPA---PPGQR---SGVEL--------F
SctN_Yers --------------------------QQQLTRPSAPP---------EGLRY--------R
SctN_Soda --------------------------AQHLTPVDEPP---------DGYRL--------T
SctN_Laws --------------------------ASLLEEAVQNT---------SPVEV--------R
SctN_Chl4 --------------------------FNTLMTELPDV---------QLTAV--------V
===================================
...the difference is that the first line after each block must be made of
spaces (or consensus characters *:. etc.), not just an empty line.
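To check a file for this problem before handing it to the parser, something
like the sketch below may help. It is plain Python and the function name
blocks_missing_separator is hypothetical. It assumes sequence lines start
with an ID in column one, and it only flags blocks that are followed by a
truly empty line or by the end of the file.

```python
def _is_sequence_line(line):
    # Assumption: sequence lines are non-blank and start with an ID, not a space.
    return bool(line.strip()) and not line[0].isspace() and not line.startswith("CLUSTAL")


def blocks_missing_separator(text):
    """Return 1-based line numbers of block-ending lines with no spaces/consensus line after them."""
    lines = text.splitlines()
    bad = []
    for i, ln in enumerate(lines):
        if not _is_sequence_line(ln):
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if _is_sequence_line(nxt):
            continue  # still inside the same run of sequence lines
        if nxt == "":
            # The block ends in an empty line (or end of file) instead of spaces.
            bad.append(i + 1)
    return bad
```

An empty list means every block is closed by a line of spaces or consensus
characters. Any numbers returned point at the last sequence line of a block
that would need a spaces line added after it.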
Thanks for the hints!
Nick
Peter wrote:
> On Tue, Aug 5, 2008 at 10:39 PM, Nick Matzke <matzke at berkeley.edu> wrote:
>> Thanks for the help Peter, it really is a great tutorial!
>>
>> I've replaced just the ClustalIO.py file as you suggested, and it parses
>> both the example.aln and protein.aln files.
>
> Good :)
>
>> However, I tried a ClustalW-formatted alignment file I made a while ago with
>> my own data and still got the _star_info error:
>>
>> AttributeError: Alignment instance has no attribute '_star_info'
>>
>> But my file could be weird. Does the _star_info error indicate alphabet
>> issues or something?
>
> The _star_info is a nasty private variable used to store the ClustalW
> consensus, used if writing the file back out again in clustal format.
> The error suggests something else has gone wrong with the consensus
> parsing... (and shouldn't be anything to do with the alphabet).
>
> Could you file a bug, and (after filing the bug) could you upload one
> of these example files to the bug as an attachment please?
>
> Peter
>