[Biopython] Generating a fasta file from atomic coordinate file

Wed Mar 21 07:27:44 UTC 2018

Hi

The mmCIF coordinate file has this information readily available in the 
_entity_poly category, either with non-standard residues in brackets 
(pdbx_seq_one_letter_code) or with non-standard residues replaced by the 
one-letter code of their parent residue, e.g. MSE -> MET 
(pdbx_seq_one_letter_code_can).

I do not know whether Biopython's mmCIF parser exposes this.
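
Failing a dedicated method, a minimal sketch using Biopython's generic 
mmCIF dictionary reader (Bio.PDB.MMCIF2Dict) might look like the one 
below. It assumes the file carries an _entity_poly category. The function 
names, the ">entity_N" header format and the 60-column wrapping are 
illustrative choices, not anything defined by Biopython or the PDB.

```python
def entity_poly_to_fasta(cif_dict, canonical=True, width=60):
    """Build FASTA text from the _entity_poly items of a parsed mmCIF file.

    cif_dict is any mapping keyed by mmCIF item names, such as the object
    returned by Bio.PDB.MMCIF2Dict.MMCIF2Dict. (Hypothetical helper.)
    """
    def as_list(value):
        # Some Biopython versions return a bare string when the category
        # has a single row, so normalise everything to a list.
        return [value] if isinstance(value, str) else list(value)

    # "_can" selects the canonical form (non-standard residues mapped to
    # their parent); without it, non-standard residues appear in brackets.
    key = "_entity_poly.pdbx_seq_one_letter_code" + ("_can" if canonical else "")
    entity_ids = as_list(cif_dict["_entity_poly.entity_id"])
    sequences = as_list(cif_dict[key])

    lines = []
    for entity_id, seq in zip(entity_ids, sequences):
        # Multi-line mmCIF text fields keep their line breaks; remove them.
        seq = "".join(seq.split())
        lines.append(f">entity_{entity_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def mmcif_file_to_fasta(cif_path, fasta_path, canonical=True):
    """Read an mmCIF file with Biopython and write one record per entity."""
    # Imported here so the helper above can be used without Biopython.
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    with open(fasta_path, "w") as handle:
        handle.write(entity_poly_to_fasta(MMCIF2Dict(cif_path), canonical))
```

Calling mmcif_file_to_fasta("1xyz.cif", "1xyz.fasta") would write one 
record per polymer entity rather than per chain, since _entity_poly is 
indexed by entity.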

Please note that PDB format files are not available for every entry in 
the PDB due to limitations of the format. However, mmCIF files are 
available for every entry.

Regards

John

On 20/03/2018 23:23, Peter Cock wrote:
> That is using the 3D structure to get the protein sequence
> (using the PDB parser and NumPy as a dependency), and
> the code to call it can be shortened to just:
>
> from Bio import SeqIO
> SeqIO.convert("input.pdb", "pdb-atom", "output.fasta", "fasta")
>
> Or, if you just want the sequence in the SEQRES header:
>
> from Bio import SeqIO
> SeqIO.convert("input.pdb", "pdb-seqres", "output.fasta", "fasta")
>
> See:
>
> http://biopython.org/wiki/SeqIO
>
> Peter
>
> On Tue, Mar 20, 2018 at 10:05 PM, João Rodrigues
> <j.p.g.l.m.rodrigues at gmail.com> wrote:
>> Hi Ahmad,
>>
>> You can use Bio.SeqIO directly on the PDB file:
>>
>> from Bio import SeqIO
>> records = SeqIO.parse('1xyz.pdb', 'pdb-atom')
>> with open('1xyz.fasta', 'w') as handle:
>>     SeqIO.write(records, handle, "fasta")
>>
>> Not sure if there is a way to couple SeqIO directly to the Bio.PDB code (a
>> method that reads the sequence from the SMCRA object); that would
>> be cool to add.
>>
>> Cheers,
>>
>> João
>>
>> 2018-03-20 12:47 GMT-07:00 Jared Adolf-Bryfogle <jadolfbr at gmail.com>:
>>> Hey Ahmad,
>>>
>>> I have a script called get_seq.py in the bio-jade module, which uses
>>> BioPython.
>>>
>>> pip install bio-jade
>>>
>>> The script will be installed to your path and you can use get_seq.py
>>> --help for more info. You may need to source your bashrc/profile afterward
>>> or open a new terminal to see it.
>>>
>>> If you have any issues, please let me know.  I may need to put out a new
>>> version.
>>>
>>>
>>> https://bio-jade.readthedocs.io/en/latest/apps_public_api/apps.public.general.html#get-seq-py
>>>
>>> https://github.com/SchiefLab/Jade
>>>
>>> -Jared
>>>
>>>
>>> Jared Adolf-Bryfogle, Ph.D.
>>> Research Associate
>>> Lab of Dr. William Schief
>>> The Scripps Research Institute
>>>
>>> On Tue, Mar 20, 2018 at 1:42 PM, Ahmad Abdelzaher <underoath006 at gmail.com>
>>> wrote:
>>>> Hello,
>>>>
>>>> Can I generate a properly formatted fasta sequence from the atomic
>>>> coordinates of a pdb file? I sort of know how to code it, but hopefully
>>>> there's some ready method in one of Biopython's modules that can do that.
>>>>
>>>> Any other suggestions?
>>>>
>>>> Regards.
>>>>
>>>> _______________________________________________
>>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>>> http://mailman.open-bio.org/mailman/listinfo/biopython

-- 
John Berrisford
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Genome Campus
Hinxton
Cambridge CB10 1SD UK
Tel: +44 1223 492529

https://www.pdbe.org
https://www.facebook.com/proteindatabank
https://twitter.com/PDBeurope