PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter
hates me)
Iddo
idoerg at burnham.org
Tue Nov 2 18:34:20 EST 2004
Trouble is, that the conversion here might not be good for a some
purposes, as usually structure ->sequence conversion applications want
(1) unique mapping and (2) a 20 letter alphabet + 'X' for everything else.
Gavin Crooks wrote:
> There is also a longer three letter code conversion table in
> Bio/SCOP/Raf.py
> The PDB contains a whole bunch of weird 3 letter codes for different
> chemically modified amino acids.
>
> Another possibility is to get the fasta sequences directly from the
> ASTRAL database, since they have already grubbed around and done the
> conversion.
>
> Gavin
>
>
> # This table is taken from the RAF release notes, and includes the
> # undocumented mapping "UNK" -> "X"
> to_one_letter_code= {
> 'ALA':'A', 'VAL':'V', 'PHE':'F', 'PRO':'P', 'MET':'M',
> 'ILE':'I', 'LEU':'L', 'ASP':'D', 'GLU':'E', 'LYS':'K',
> 'ARG':'R', 'SER':'S', 'THR':'T', 'TYR':'Y', 'HIS':'H',
> 'CYS':'C', 'ASN':'N', 'GLN':'Q', 'TRP':'W', 'GLY':'G',
> '2AS':'D', '3AH':'H', '5HP':'E', 'ACL':'R', 'AIB':'A',
> 'ALM':'A', 'ALO':'T', 'ALY':'K', 'ARM':'R', 'ASA':'D',
> 'ASB':'D', 'ASK':'D', 'ASL':'D', 'ASQ':'D', 'AYA':'A',
> 'BCS':'C', 'BHD':'D', 'BMT':'T', 'BNN':'A', 'BUC':'C',
> 'BUG':'L', 'C5C':'C', 'C6C':'C', 'CCS':'C', 'CEA':'C',
> 'CHG':'A', 'CLE':'L', 'CME':'C', 'CSD':'A', 'CSO':'C',
> 'CSP':'C', 'CSS':'C', 'CSW':'C', 'CXM':'M', 'CY1':'C',
> 'CY3':'C', 'CYG':'C', 'CYM':'C', 'CYQ':'C', 'DAH':'F',
> 'DAL':'A', 'DAR':'R', 'DAS':'D', 'DCY':'C', 'DGL':'E',
> 'DGN':'Q', 'DHA':'A', 'DHI':'H', 'DIL':'I', 'DIV':'V',
> 'DLE':'L', 'DLY':'K', 'DNP':'A', 'DPN':'F', 'DPR':'P',
> 'DSN':'S', 'DSP':'D', 'DTH':'T', 'DTR':'W', 'DTY':'Y',
> 'DVA':'V', 'EFC':'C', 'FLA':'A', 'FME':'M', 'GGL':'E',
> 'GLZ':'G', 'GMA':'E', 'GSC':'G', 'HAC':'A', 'HAR':'R',
> 'HIC':'H', 'HIP':'H', 'HMR':'R', 'HPQ':'F', 'HTR':'W',
> 'HYP':'P', 'IIL':'I', 'IYR':'Y', 'KCX':'K', 'LLP':'K',
> 'LLY':'K', 'LTR':'W', 'LYM':'K', 'LYZ':'K', 'MAA':'A',
> 'MEN':'N', 'MHS':'H', 'MIS':'S', 'MLE':'L', 'MPQ':'G',
> 'MSA':'G', 'MSE':'M', 'MVA':'V', 'NEM':'H', 'NEP':'H',
> 'NLE':'L', 'NLN':'L', 'NLP':'L', 'NMC':'G', 'OAS':'S',
> 'OCS':'C', 'OMT':'M', 'PAQ':'Y', 'PCA':'E', 'PEC':'C',
> 'PHI':'F', 'PHL':'F', 'PR3':'C', 'PRR':'A', 'PTR':'Y',
> 'SAC':'S', 'SAR':'G', 'SCH':'C', 'SCS':'C', 'SCY':'C',
> 'SEL':'S', 'SEP':'S', 'SET':'S', 'SHC':'C', 'SHR':'K',
> 'SOC':'C', 'STY':'Y', 'SVA':'S', 'TIH':'A', 'TPL':'W',
> 'TPO':'T', 'TPQ':'A', 'TRG':'K', 'TRO':'W', 'TYB':'Y',
> 'TYQ':'Y', 'TYS':'Y', 'TYY':'Y', 'AGM':'R', 'GL3':'G',
> 'SMC':'C', 'ASX':'B', 'CGU':'E', 'CSX':'C', 'GLX':'Z',
> 'UNK':'X'
> }
>
> On Nov 2, 2004, at 13:51, Iddo wrote:
>
>> Welcome aboard, and I am glad we managed to save one Jedi from the
>> dark side of bioinformatics...;)
>>
>> As to your question: see the attached class. I should get that in
>> Biopython, but I keep not doing that....
>>
>> ./I
>>
>>
>> Douglas Kojetin wrote:
>>
>>> Hi All-
>>>
>>> I'm a beginner @ biopython (and I'm 'switching' from perl to python
>>> ...). First off, many thanks for the structrual biopython FAQ ...
>>> very helpful! My question: can anyone help me with some ideas on
>>> how to whip up a quick PDB->FASTA (sequence) script?
>>>
>>> From the structural biopython faq, I've been able to extract residue
>>> information in the form of:
>>>
>>> <Residue MET het= resseq=1 icode= >
>>>
>>> I take it I would just need to grab the MET (split the residue
>>> object and grab the r[1] index?) and convert into M, then append to
>>> a sequence string ....
>>>
>>> but I didn't know if biopython had something that did an
>>> autoconversion of MET->M, or vice versa (M->MET).
>>>
>>> Thanks for the input,
>>> Doug
>>>
>>> _______________________________________________
>>> BioPython mailing list - BioPython at biopython.org
>>> http://biopython.org/mailman/listinfo/biopython
>>>
>>>
>>
>
> Gavin E. Crooks
> Postdoctoral Fellow tel: (510) 642-9614
> 461 Koshland Hall aim:notastring
> University of California http://threeplusone.com/
> Berkeley, CA 94720-3102, USA gec at compbio.berkeley.edu
>
>
>
--
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo
More information about the BioPython
mailing list