[Biopython-dev] PIR parsing

Andrew Dalke dalke at acm.org
Sat Dec 9 02:29:11 EST 2000


Forgot to ask,

  What is the point of having both the "ref" and "dat" formats
in PIR?

ref format example:

>P1;I52708
ELAV-like neuronal protein 1, truncated splice form - human
N;Alternate names: Drosophila ELAV(embryonic lethal, abnormal vision)-like 4; Hu antigen D; paraneoplastic encephalomyelitis antigen
C;Species: Homo sapiens (man)

dat format example:

ENTRY           I52708  #type complete
TITLE           ELAV-like neuronal protein 1, truncated splice form - human
ALTERNATE_NAMES Drosophila ELAV(embryonic lethal, abnormal vision)-like 4;
                Hu antigen D; paraneoplastic encephalomyelitis antigen
ORGANISM        #formal_name Homo sapiens #common_name man


As far as I can tell, the ref format is easier to machine parse
than the dat one, and is more compact.  The dat format is easier
for a human to scan.  Also, the dat format contains the sequence
information while the ref one does not.
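
To make the parsing comparison concrete, here is a rough Python sketch
of reading both samples. It is based only on the few lines quoted above,
not on the PIR specification, and it assumes each ref field sits on a
single line (a mail-wrapped record would need rejoining first). The names
parse_ref and parse_dat are made up for illustration and are not
Biopython functions.

```python
REF_SAMPLE = """\
>P1;I52708
ELAV-like neuronal protein 1, truncated splice form - human
N;Alternate names: Drosophila ELAV(embryonic lethal, abnormal vision)-like 4; Hu antigen D; paraneoplastic encephalomyelitis antigen
C;Species: Homo sapiens (man)
"""

DAT_SAMPLE = """\
ENTRY           I52708  #type complete
TITLE           ELAV-like neuronal protein 1, truncated splice form - human
ALTERNATE_NAMES Drosophila ELAV(embryonic lethal, abnormal vision)-like 4;
                Hu antigen D; paraneoplastic encephalomyelitis antigen
ORGANISM        #formal_name Homo sapiens #common_name man
"""


def parse_ref(text):
    """Parse one ref-style record (assumption: layout as in the sample)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header looks like ">P1;I52708": a type code, then the entry ID.
    kind, entry_id = lines[0][1:].split(";", 1)
    record = {"type": kind, "id": entry_id, "title": lines[1], "fields": {}}
    for ln in lines[2:]:
        # Remaining lines look like "C;Species: Homo sapiens (man)".
        code, rest = ln.split(";", 1)
        key, value = rest.split(":", 1)
        record["fields"][key.strip()] = value.strip()
    return record


def parse_dat(text):
    """Parse dat-style lines: a keyword, then a value, with indented
    continuation lines appended to the previous keyword's value."""
    fields = {}
    key = None
    for ln in text.splitlines():
        if not ln.strip():
            continue
        if ln[0].isspace() and key is not None:
            # Indented line: continuation of the previous field.
            fields[key] += " " + ln.strip()
        else:
            key, _, value = ln.partition(" ")
            fields[key] = value.strip()
    return fields


ref = parse_ref(REF_SAMPLE)
dat = parse_dat(DAT_SAMPLE)
print(ref["id"], ref["fields"]["Species"])
print(dat["ORGANISM"])
```

With these samples, the ref parser needs only two fixed delimiters per
line, while the dat parser has to track continuation lines, and the
"#formal_name"-style subfields would need a further pass.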

Can anyone here give me some background?

                    Andrew




