PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter hates me)

Tue Nov 2 19:36:50 EST 2004

I believe it is for local Astral, not for over-the-web.

./I

Douglas Kojetin wrote:

> Thanks for all the quick and detailed responses!  Quite a motivation 
> to stick with (Bio)Python!
>
> I was unaware of the ASTRAL database -- that might be useful in the 
> future (i was looking to do this particular conversion on an 
> experimentally determined PDB ... not yet submitted).  Can BioPython 
> interact directly w/ the (remote) ASTRAL database?  Or is it something 
> I would need to have locally?
>
> Thanks again,
> Doug
>
>
> On Nov 2, 2004, at 6:34 PM, Iddo wrote:
>
>> Trouble is, that the conversion here might not be good for a some 
>> purposes, as usually structure ->sequence conversion applications 
>> want (1) unique mapping and (2) a 20 letter alphabet + 'X' for 
>> everything else.
>>
>>
>> Gavin Crooks wrote:
>>
>>> There is also a longer three letter code conversion table in 
>>> Bio/SCOP/Raf.py
>>> The PDB contains a whole bunch of weird 3 letter codes for different
>>> chemically modified amino acids.
>>>
>>> Another possibility is to get the fasta sequences directly from the
>>> ASTRAL database, since they have already grubbed around and done the
>>> conversion.
>>>
>>> Gavin
>>>
>>>
>>> # This table is taken from the RAF release notes, and includes the
>>> # undocumented mapping "UNK" -> "X"
>>> to_one_letter_code= {
>>>     'ALA':'A', 'VAL':'V', 'PHE':'F', 'PRO':'P', 'MET':'M',
>>>     'ILE':'I', 'LEU':'L', 'ASP':'D', 'GLU':'E', 'LYS':'K',
>>>     'ARG':'R', 'SER':'S', 'THR':'T', 'TYR':'Y', 'HIS':'H',
>>>     'CYS':'C', 'ASN':'N', 'GLN':'Q', 'TRP':'W', 'GLY':'G',
>>>     '2AS':'D', '3AH':'H', '5HP':'E', 'ACL':'R', 'AIB':'A',
>>>     'ALM':'A', 'ALO':'T', 'ALY':'K', 'ARM':'R', 'ASA':'D',
>>>     'ASB':'D', 'ASK':'D', 'ASL':'D', 'ASQ':'D', 'AYA':'A',
>>>     'BCS':'C', 'BHD':'D', 'BMT':'T', 'BNN':'A', 'BUC':'C',
>>>     'BUG':'L', 'C5C':'C', 'C6C':'C', 'CCS':'C', 'CEA':'C',
>>>     'CHG':'A', 'CLE':'L', 'CME':'C', 'CSD':'A', 'CSO':'C',
>>>     'CSP':'C', 'CSS':'C', 'CSW':'C', 'CXM':'M', 'CY1':'C',
>>>     'CY3':'C', 'CYG':'C', 'CYM':'C', 'CYQ':'C', 'DAH':'F',
>>>     'DAL':'A', 'DAR':'R', 'DAS':'D', 'DCY':'C', 'DGL':'E',
>>>     'DGN':'Q', 'DHA':'A', 'DHI':'H', 'DIL':'I', 'DIV':'V',
>>>     'DLE':'L', 'DLY':'K', 'DNP':'A', 'DPN':'F', 'DPR':'P',
>>>     'DSN':'S', 'DSP':'D', 'DTH':'T', 'DTR':'W', 'DTY':'Y',
>>>     'DVA':'V', 'EFC':'C', 'FLA':'A', 'FME':'M', 'GGL':'E',
>>>     'GLZ':'G', 'GMA':'E', 'GSC':'G', 'HAC':'A', 'HAR':'R',
>>>     'HIC':'H', 'HIP':'H', 'HMR':'R', 'HPQ':'F', 'HTR':'W',
>>>     'HYP':'P', 'IIL':'I', 'IYR':'Y', 'KCX':'K', 'LLP':'K',
>>>     'LLY':'K', 'LTR':'W', 'LYM':'K', 'LYZ':'K', 'MAA':'A',
>>>     'MEN':'N', 'MHS':'H', 'MIS':'S', 'MLE':'L', 'MPQ':'G',
>>>     'MSA':'G', 'MSE':'M', 'MVA':'V', 'NEM':'H', 'NEP':'H',
>>>     'NLE':'L', 'NLN':'L', 'NLP':'L', 'NMC':'G', 'OAS':'S',
>>>     'OCS':'C', 'OMT':'M', 'PAQ':'Y', 'PCA':'E', 'PEC':'C',
>>>     'PHI':'F', 'PHL':'F', 'PR3':'C', 'PRR':'A', 'PTR':'Y',
>>>     'SAC':'S', 'SAR':'G', 'SCH':'C', 'SCS':'C', 'SCY':'C',
>>>     'SEL':'S', 'SEP':'S', 'SET':'S', 'SHC':'C', 'SHR':'K',
>>>     'SOC':'C', 'STY':'Y', 'SVA':'S', 'TIH':'A', 'TPL':'W',
>>>     'TPO':'T', 'TPQ':'A', 'TRG':'K', 'TRO':'W', 'TYB':'Y',
>>>     'TYQ':'Y', 'TYS':'Y', 'TYY':'Y', 'AGM':'R', 'GL3':'G',
>>>     'SMC':'C', 'ASX':'B', 'CGU':'E', 'CSX':'C', 'GLX':'Z',
>>>     'UNK':'X'
>>>     }
>>>
>>> On Nov 2, 2004, at 13:51, Iddo wrote:
>>>
>>>> Welcome aboard, and I am glad we managed to save one  Jedi from the 
>>>> dark side of  bioinformatics...;)
>>>>
>>>> As to your question:  see the attached class. I should get that in 
>>>> Biopython, but I keep not doing that....
>>>>
>>>> ./I
>>>>
>>>>
>>>> Douglas Kojetin wrote:
>>>>
>>>>> Hi All-
>>>>>
>>>>> I'm a beginner @ biopython (and I'm 'switching' from perl to 
>>>>> python ...).  First off, many thanks for the structrual biopython 
>>>>> FAQ ... very helpful!  My question:  can anyone help me with some 
>>>>> ideas on how to whip up a quick PDB->FASTA (sequence) script?
>>>>>
>>>>> From the structural biopython faq, I've been able to extract 
>>>>> residue information in the form of:
>>>>>
>>>>>  <Residue MET het=  resseq=1 icode= >
>>>>>
>>>>> I take it I would just need to grab the MET (split the residue 
>>>>> object and grab the r[1] index?) and convert into M, then append 
>>>>> to a sequence string ....
>>>>>
>>>>> but I didn't know if biopython had something that did an 
>>>>> autoconversion of MET->M, or vice versa (M->MET).
>>>>>
>>>>> Thanks for the input,
>>>>> Doug
>>>>>
>>>>> _______________________________________________
>>>>> BioPython mailing list  -  BioPython at biopython.org
>>>>> http://biopython.org/mailman/listinfo/biopython
>>>>>
>>>>>
>>>>
>>>
>>> Gavin E. Crooks
>>> Postdoctoral Fellow                  tel:  (510) 642-9614
>>> 461 Koshland Hall                    aim:notastring
>>> University of California             http://threeplusone.com/
>>> Berkeley, CA 94720-3102, USA         gec at compbio.berkeley.edu
>>>
>>>
>>>
>>
>>
>> -- 
>> Iddo Friedberg, Ph.D.
>> The Burnham Institute
>> 10901 North Torrey Pines Road
>> La Jolla, CA 92037 USA
>> T: (858) 646 3100 x3516
>> F: (858) 713 9930
>> http://ffas.ljcrf.edu/~iddo
>>
>
> _______________________________________________
> BioPython mailing list  -  BioPython at biopython.org
> http://biopython.org/mailman/listinfo/biopython
>
>

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037 USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo