[BioPython] Comment/Suggestion about Bio.PDB.Polypeptide class. How
to keep gaps information ?
Julie Bernauer
julie.bernauer at ibbmc.u-psud.fr
Tue May 24 13:26:13 EDT 2005
Hello
Let's imagine we want a FASTA file or a Seq object listing the amino
acids that are present in a structure, with gaps where residues are
missing.
Example: 1t6b, chain X.
Using this code:
    # ppd: a polypeptide builder (presumably Bio.PDB.PPBuilder() or CaPPBuilder())
    for pp in ppd.build_peptides(structure[0]["X"]):
        print(pp)
We get :
<Polypeptide start=16 end=158>
<Polypeptide start=175 end=275>
<Polypeptide start=288 end=303>
<Polypeptide start=320 end=735>
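For reference, the residue numbers missing between these fragments follow directly from the start/end values printed above. A minimal sketch in plain Python; the helper name gaps_between is hypothetical and not part of Bio.PDB:

```python
def gaps_between(fragments):
    """Return (first_missing, last_missing, length) for each gap
    between consecutive (start, end) fragments."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(fragments, fragments[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1, next_start - prev_end - 1))
    return gaps

# start/end values copied from the Polypeptide output above
fragments = [(16, 158), (175, 275), (288, 303), (320, 735)]
chain_gaps = gaps_between(fragments)
for first, last, length in chain_gaps:
    print("missing residues %d-%d (%d positions)" % (first, last, length))
```

So three stretches (16, 12 and 16 positions) are absent from the structure and would need to appear as gaps in the sequence.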
To join those peptides together, let's try to define an empty
polypeptide:
    pp1 = Polypeptide.Polypeptide([])
and extend it with the peptides we get :
    pp1 = Polypeptide.Polypeptide([])
    for pp in ppd.build_peptides(structurecomplex[0][chaineR]):
        pp1.extend(pp)
    print(pp1)
    seq = pp1.get_sequence()
    # tostring() is the 2005-era call; newer Biopython releases use str(seq)
    print(seq.tostring())
We have :
<Polypeptide start=16 end=735>
SQGLLGYYFSDLNFQAPMVVTSSTTGDLSIPSSELENIPSENQYFQSAIWSGFIKVKKSDEYTFATSADNHVTMWVDDQEVINKASNSNKIRLEKGRLYQIKIQYQRENPTEKGLDFKLYWTDSQNKKEVISSDNLQLPELKQVPDRDNDGIPDSLEVEGYTVDVKNKRTFLSPWISNIHEKKGLTKYKSSPEKWSTASDPYSDFEKVTGRIDKNVSPEARHPLVAAYPIVHVDMENIILSKNETISKNTSTSRTHTSEVVSAGFSNSNSSTVAIDHSLSLAGERTWAETMGLNTADTARLNANIRYVNTGTAPIYNVLPTTSLVLGKNQTLATIKAKENQLSQILAPNNYYPSKNLAPIALNAQDDFSSTPITMNYNQFLELEKTKQLRLDTDQVYGNIATYNFENGRVRVDTGSNWSEVLPQIQETTARIIFNGKDLNLVERRIAAVNPSDPLETTKPDMTLKEALKIAFGFNEPNGNLQYQGKDITEFDFNFDQQTSQNIKNQLAELNATNIYTVLDKIKLNAKMNILIRDKRFHYDRNNIAVGADESVVKEAHREVINSSTEGLLLNIDKDIRKILSGYIVEIEDTEGLKEVINDRYDMLNISSLRQDGKTFIDFKKYNDKLPLYISNPNYKVNVYAVTKENTIINPSENGDTSTNGIKKILIFSKKGYEIG
That is, we completely lose the gap information. "pp1" still contains
this information but cannot pass it on to "seq", even when using the
gapped alphabet.
I know it is possible to recover it by iterating over the residues of
the structure. However, I think it would be better to fill each gap with
an 'X' or a '-' when calling pp1.get_sequence(), i.e. to change the
get_sequence method to handle this case.
Instead of:
    for res in self:
        resname = res.get_resname()
        if resname in to_one_letter_code:
            resname = to_one_letter_code[resname]
        else:
            resname = 'X'
        s = s + resname
I think it would be nice to iterate over resseq, so that missing residue
numbers can be filled in.
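A rough sketch of that idea, written as a standalone function rather than as a patch to get_sequence. The names gapped_sequence and THREE_TO_ONE are hypothetical, the mapping is only a small subset for illustration, and the input is invented toy data, not taken from 1t6b. It assumes residues arrive sorted by resseq and ignores insertion codes:

```python
# Small subset of a three-letter to one-letter mapping, for illustration only.
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "SER": "S", "LEU": "L", "GLN": "Q"}

def gapped_sequence(residues, gap="-"):
    """Build a one-letter string from (resseq, resname) pairs,
    padding every missing resseq number with `gap`."""
    parts = []
    previous = None
    for resseq, resname in residues:
        if previous is not None and resseq > previous + 1:
            # one gap character per missing residue number
            parts.append(gap * (resseq - previous - 1))
        # unknown residue names become 'X', as in the current method
        parts.append(THREE_TO_ONE.get(resname, "X"))
        previous = resseq
    return "".join(parts)

# toy input: residues 19 and 20 are missing
example = [(16, "SER"), (17, "GLN"), (18, "GLY"), (21, "LEU"), (22, "LEU")]
gapped = gapped_sequence(example)
print(gapped)  # SQG--LL
```

With Bio.PDB residues, the pairs could be built as (res.get_id()[1], res.get_resname()), since get_id() returns the (hetero flag, resseq, insertion code) tuple.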
What do you think ?
--
Julie BERNAUER
Equipe de Génomique Structurale http://www.genomics.eu.org
IBBMC - UMR 8619 - U.P.S. Bât.430 Tel. : +33 1 69 15 31 57
91405 Orsay - FRANCE Fax. : +33 1 69 85 37 15