[EMBOSS] extracting sequence from a pdb file

Jon Ison jison at ebi.ac.uk
Wed Nov 5 12:24:28 UTC 2008

Hi Perdeep

You can use the program pdbparse from the "structure" EMBASSY package.

It reads the ATOM and SEQRES records and writes the true (corrected) molecular
sequence, which is not an easy task, to a file in a "clean" (easy to parse)
format that other EMBASSY packages can then use.  There are various options
for controlling how it handles, for example, missing atoms, mismatches and
non-amino-acid groups in the original file.
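
To give a feel for the simplest part of the job, here is a rough Python sketch
that reads only the SEQRES records and returns a one-letter sequence per chain.
This is not how pdbparse works: it makes no attempt to reconcile SEQRES with
the ATOM records, which is where the real difficulty lies. The names
seqres_sequences and THREE_TO_ONE are made up for illustration, and the column
positions assume the standard fixed-width PDB layout (chain ID in column 12,
residue names in columns 20-70).

```python
# Illustrative sketch only; NOT the pdbparse algorithm.
# Assumes standard fixed-width PDB SEQRES records.

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records.

    Residue names outside the 20 standard amino acids (modified residues,
    nucleotides, etc.) are written as "X".
    """
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue  # ignore ATOM, HETATM and everything else
        chain_id = line[11]             # column 12
        residues = line[19:70].split()  # columns 20-70, up to 13 names
        seq = chains.setdefault(chain_id, [])
        seq.extend(THREE_TO_ONE.get(name, "X") for name in residues)
    return {chain: "".join(seq) for chain, seq in chains.items()}
```

Calling seqres_sequences(open("entry.pdb").read()) (the file name is just a
placeholder) gives a dictionary keyed by chain ID. The result is the sequence
as deposited in SEQRES, so residues missing from the coordinates are still
included and any numbering mismatches are not detected.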

It's been a long while since I tested the program on the whole of PDB, though,
so there are probably files for which it will fail.

If you need to extract the sequence into any specified sequence format then
we could write a program for that (or something else) for the next release.
Let me know.



> Hi,
> Does anyone know if there is a program in EMBOSS that can extract protein sequence from a pdb format file?
> thanks,
> perdeep
> Perdeep K. Mehta, PhD
> Research Informatics, Information Sciences Division
> St. Jude Children's Research Hospital
> 262 Danny Thomas Place
> Memphis, TN 38105-2794
> Tel: 901-495 3774
> http://www.hatwellcenter.org/
