PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter hates me)

Tue Nov 2 18:34:20 EST 2004

Trouble is, that the conversion here might not be good for a some 
purposes, as usually structure ->sequence conversion applications want 
(1) unique mapping and (2) a 20 letter alphabet + 'X' for everything else.

Gavin Crooks wrote:

> There is also a longer three letter code conversion table in 
> Bio/SCOP/Raf.py
> The PDB contains a whole bunch of weird 3 letter codes for different
> chemically modified amino acids.
>
> Another possibility is to get the fasta sequences directly from the
> ASTRAL database, since they have already grubbed around and done the
> conversion.
>
> Gavin
>
>
> # This table is taken from the RAF release notes, and includes the
> # undocumented mapping "UNK" -> "X"
> to_one_letter_code= {
>     'ALA':'A', 'VAL':'V', 'PHE':'F', 'PRO':'P', 'MET':'M',
>     'ILE':'I', 'LEU':'L', 'ASP':'D', 'GLU':'E', 'LYS':'K',
>     'ARG':'R', 'SER':'S', 'THR':'T', 'TYR':'Y', 'HIS':'H',
>     'CYS':'C', 'ASN':'N', 'GLN':'Q', 'TRP':'W', 'GLY':'G',
>     '2AS':'D', '3AH':'H', '5HP':'E', 'ACL':'R', 'AIB':'A',
>     'ALM':'A', 'ALO':'T', 'ALY':'K', 'ARM':'R', 'ASA':'D',
>     'ASB':'D', 'ASK':'D', 'ASL':'D', 'ASQ':'D', 'AYA':'A',
>     'BCS':'C', 'BHD':'D', 'BMT':'T', 'BNN':'A', 'BUC':'C',
>     'BUG':'L', 'C5C':'C', 'C6C':'C', 'CCS':'C', 'CEA':'C',
>     'CHG':'A', 'CLE':'L', 'CME':'C', 'CSD':'A', 'CSO':'C',
>     'CSP':'C', 'CSS':'C', 'CSW':'C', 'CXM':'M', 'CY1':'C',
>     'CY3':'C', 'CYG':'C', 'CYM':'C', 'CYQ':'C', 'DAH':'F',
>     'DAL':'A', 'DAR':'R', 'DAS':'D', 'DCY':'C', 'DGL':'E',
>     'DGN':'Q', 'DHA':'A', 'DHI':'H', 'DIL':'I', 'DIV':'V',
>     'DLE':'L', 'DLY':'K', 'DNP':'A', 'DPN':'F', 'DPR':'P',
>     'DSN':'S', 'DSP':'D', 'DTH':'T', 'DTR':'W', 'DTY':'Y',
>     'DVA':'V', 'EFC':'C', 'FLA':'A', 'FME':'M', 'GGL':'E',
>     'GLZ':'G', 'GMA':'E', 'GSC':'G', 'HAC':'A', 'HAR':'R',
>     'HIC':'H', 'HIP':'H', 'HMR':'R', 'HPQ':'F', 'HTR':'W',
>     'HYP':'P', 'IIL':'I', 'IYR':'Y', 'KCX':'K', 'LLP':'K',
>     'LLY':'K', 'LTR':'W', 'LYM':'K', 'LYZ':'K', 'MAA':'A',
>     'MEN':'N', 'MHS':'H', 'MIS':'S', 'MLE':'L', 'MPQ':'G',
>     'MSA':'G', 'MSE':'M', 'MVA':'V', 'NEM':'H', 'NEP':'H',
>     'NLE':'L', 'NLN':'L', 'NLP':'L', 'NMC':'G', 'OAS':'S',
>     'OCS':'C', 'OMT':'M', 'PAQ':'Y', 'PCA':'E', 'PEC':'C',
>     'PHI':'F', 'PHL':'F', 'PR3':'C', 'PRR':'A', 'PTR':'Y',
>     'SAC':'S', 'SAR':'G', 'SCH':'C', 'SCS':'C', 'SCY':'C',
>     'SEL':'S', 'SEP':'S', 'SET':'S', 'SHC':'C', 'SHR':'K',
>     'SOC':'C', 'STY':'Y', 'SVA':'S', 'TIH':'A', 'TPL':'W',
>     'TPO':'T', 'TPQ':'A', 'TRG':'K', 'TRO':'W', 'TYB':'Y',
>     'TYQ':'Y', 'TYS':'Y', 'TYY':'Y', 'AGM':'R', 'GL3':'G',
>     'SMC':'C', 'ASX':'B', 'CGU':'E', 'CSX':'C', 'GLX':'Z',
>     'UNK':'X'
>     }
>
> On Nov 2, 2004, at 13:51, Iddo wrote:
>
>> Welcome aboard, and I am glad we managed to save one  Jedi from the 
>> dark side of  bioinformatics...;)
>>
>> As to your question:  see the attached class. I should get that in 
>> Biopython, but I keep not doing that....
>>
>> ./I
>>
>>
>> Douglas Kojetin wrote:
>>
>>> Hi All-
>>>
>>> I'm a beginner @ biopython (and I'm 'switching' from perl to python 
>>> ...).  First off, many thanks for the structrual biopython FAQ ... 
>>> very helpful!  My question:  can anyone help me with some ideas on 
>>> how to whip up a quick PDB->FASTA (sequence) script?
>>>
>>> From the structural biopython faq, I've been able to extract residue 
>>> information in the form of:
>>>
>>>  <Residue MET het=  resseq=1 icode= >
>>>
>>> I take it I would just need to grab the MET (split the residue 
>>> object and grab the r[1] index?) and convert into M, then append to 
>>> a sequence string ....
>>>
>>> but I didn't know if biopython had something that did an 
>>> autoconversion of MET->M, or vice versa (M->MET).
>>>
>>> Thanks for the input,
>>> Doug
>>>
>>> _______________________________________________
>>> BioPython mailing list  -  BioPython at biopython.org
>>> http://biopython.org/mailman/listinfo/biopython
>>>
>>>
>>
>
> Gavin E. Crooks
> Postdoctoral Fellow                  tel:  (510) 642-9614
> 461 Koshland Hall                    aim:notastring
> University of California             http://threeplusone.com/
> Berkeley, CA 94720-3102, USA         gec at compbio.berkeley.edu
>
>
>

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo