[Bioperl-l] problems parsing EBI interproscan.xml

Mariano Latorre A malatorr at genoma.ciencias.uchile.cl
Fri Dec 3 13:17:06 EST 2004


I installed Bioperl 1.4 (along with the dependencies Heap and Graph). I
need to parse InterProScan XML reports.

When I ran "make test", the Interproscan_parser test passed. But when I
run InterProScan at EBI, I get an XML file that cannot be parsed.
Bioperl says:

Can't call method "identifier" on an undefined value at /usr/lib/perl5/site_perl/5.8.3/Bio/Ontology/SimpleOntologyEngine.pm line 410.


So I went to the test directory inside the Bioperl installation and
compared the XML generated by EBI with the one provided in the Bioperl
installation package, and noticed that they are totally different!
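
One quick way to confirm the difference is to look only at the root
element of each file. The following is a minimal sketch in Python (not
Bioperl); the function names root_tag and describe_format are made up
for illustration, and the two root element names are taken from the
file beginnings pasted in this message.

```python
import io
import xml.etree.ElementTree as ET


def root_tag(source):
    """Return the root element name of an XML document.

    `source` is a filename or a binary file-like object.
    """
    # The first "start" event reported by iterparse is the root element,
    # so we can return as soon as it arrives.
    for _event, elem in ET.iterparse(source, events=("start",)):
        return elem.tag
    return None


def describe_format(source):
    """Label a file by its root element (labels are illustrative)."""
    tag = root_tag(source)
    if tag == "interprodb":
        return "interprodb (like the bundled Bioperl test file)"
    if tag == "EBIInterProScanResults":
        return "EBIInterProScanResults (like the EBI output)"
    return "unknown"


if __name__ == "__main__":
    demo = io.BytesIO(b"<EBIInterProScanResults/>")
    print(describe_format(demo))
```

If the two files report different root elements, a parser written for
one of them should not be expected to read the other.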

I paste the beginning of both files below (as you'll see, they use different tags):

Thanks!
Mariano


1.- The one provided for Bioperl testing:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- edited with XML Spy v4.4 U (http://www.xmlspy.com) by LYNN WHITE
(EMBL OUTSTATION THE EBI) -->
<!DOCTYPE interprodb SYSTEM "interpro.dtd">
<interprodb>
  <release>
    <dbinfo dbname="INTERPRO" version="5.1" entry_count="5630"
file_date="12-JUL-2002 00:00:00"/>
    <dbinfo dbname="SWISS" version="40.22" entry_count="110823"
file_date="24-JUN-2002 00:00:00"/>
    <dbinfo dbname="TREMBL" version="21.2" entry_count="671586"
file_date="05-JUL-2002 00:00:00"/>
    <dbinfo dbname="PRINTS" version="33.0" entry_count="1650"
file_date="24-JAN-2002 00:00:00"/>
    <dbinfo dbname="PREFILE" version="N/A" entry_count="252"
file_date="18-JUL-2001 00:00:00"/>
    <dbinfo dbname="PROSITE" version="17.5" entry_count="1565"
file_date="21-JUN-2002 00:00:00"/>
    <dbinfo dbname="PFAM" version="7.3" entry_count="3865"
file_date="17-MAY-2002 00:00:00"/>
    <dbinfo dbname="PRODOM" version="2001.3" entry_count="1346"
file_date="28-JAN-2002 00:00:00"/>
    <dbinfo dbname="SMART" version="3.1" entry_count="509"
file_date="16-NOV-2000 00:00:00"/>
    <dbinfo dbname="TIGRFAMs" version="1.2" entry_count="814"
file_date="03-AUG-2001 00:00:00"/>
  </release>
  <interpro id="IPR000001" type="Domain" short_name="Kringle"
protein_count="129">
    <name>Kringle</name>
    <abstract>
Kringles are autonomous structural domains, found throughout the blood
               clotting and fibrinolytic proteins.
Kringle domains are believed to play a role in binding mediators (e.g.,
membranes,
other proteins or phospholipids), and in the regulation of proteolytic
activity
<cite idref="PUB00002414"/>, <cite idref="PUB00001541"/>, <cite
idref="PUB00003257"/>.
Kringle domains <cite idref="PUB00003400"/>, <cite
idref="PUB00000803"/>, <cite idref="PUB00001620"/> are characterised by
a triple loop, 3-disulphide bridge structure, whose  conformation is
defined by a number of hydrogen bonds and small pieces of  anti-parallel
beta-sheet. They are found in a varying number  of  copies,  in some
serine proteases and
plasma proteins.</abstract>
    <example_list>
      <example><db_xref dbkey="P00748" db="SWISS"/>Blood coagulation
factor XII (Hageman factor) (1 copy)</example>
      <example><db_xref dbkey="P00749" db="SWISS"/>Urokinase-type
plasminogen activator (1 copy)</example>
      <example><db_xref dbkey="Q08048" db="SWISS"/>Hepatocyte growth
factor (HGF) (4 copies)</example>
      <example><db_xref dbkey="Q04756" db="SWISS"/>Hepatocyte growth
factor activator <cite idref="PUB00003400"/> (1 copy) <cite
idref="PUB00002776"/></example>
      <example>
                                <db_xref dbkey="P06867" db="SWISS"/>
Plasminogen (5 copies)
      </example>
      <example>
                                <db_xref dbkey="P26927" db="SWISS"/>
Hepatocyte growth factor like protein (4 copies) <cite
idref="PUB00000355"/>





2.- The output from EBI INTERPRO:

<?xml version="1.0" encoding="ISO-8859-1"?>
<EBIInterProScanResults>
        <Header>
                <program name="InterProScan" version="4.0"
citation="PMID:11590104" />
                <parameters>
                        <sequences total="1" />
                        <databases total="11">
                                <database number="1" name="PRODOM"
type="sequences" />
                                <database number="2" name="PRINTS"
type="matrix" />
                                <database number="3" name="PIR"
type="model" />
                                <database number="4" name="PFAM"
type="model" />
                                <database number="5" name="SMART"
type="model" />
                                <database number="6" name="TIGRFAMs"
type="model" />
                                <database number="7" name="PROFILE"
type="strings" />
                                <database number="8" name="PROSITE"
type="strings" />
                                <database number="9" name="SUPERFAMILY"
type="model" />
                                <database number="10" name="SIGNALP"
type="model" />
                                <database number="11" name="TMHMM"
type="model" />
                        </databases>
                </parameters>
        </Header>
<interpro_matches>
 
   <protein id="SAM" length="393" crc64="847CBC4BD0EAA1BC" >
        <interpro id="IPR002133" name="S-adenosylmethionine synthetase"
type="Family">
          <classification id="GO:0004478" class_type="GO">
            <category>Molecular Function</category>
            <description>methionine adenosyltransferase
activity</description>
          </classification>
          <classification id="GO:0005524" class_type="GO">
            <category>Molecular Function</category>
            <description>ATP binding</description>
          </classification>
          <classification id="GO:0006730" class_type="GO">
            <category>Biological Process</category>
            <description>one-carbon compound metabolism</description>
          </classification>
          <match id="PIRSF000497" name="Methionine adenosyltransferase"
dbname="PIR">
            <location start="2" end="387" score="2.6e-224" status="T"
evidence="HMMPIR" />
          </match>
          <match id="PF00438.9" name="S-adenosylmethionine synthetase,
N-te" dbname="PFAM">
            <location start="2" end="102" score="2.7e-63" status="T"
evidence="HMMPfam" />
          </match>
          <match id="PF02772.5" name="S-adenosylmethionine synthetase,
cent" dbname="PFAM">
            <location start="116" end="238" score="5.1e-97" status="T"
evidence="HMMPfam" />
          </match>
          <match id="PF02773.5" name="S-adenosylmethionine synthetase,
C-te" dbname="PFAM">
            <location start="240" end="382" score="2e-83" status="T"
evidence="HMMPfam" />
          </match>
          <match id="TIGR01034" name="metK: S-adenosylmethionine
synthetase" dbname="TIGRFAMs">
            <location start="5" end="393" score="6.2e-232" status="T"
evidence="HMMTigr" />
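
For comparison, the match locations in the EBI layout above can be read
with a generic XML library. This is only a sketch in Python, not a
Bioperl solution: the function name list_matches is hypothetical, and
SAMPLE is a shortened copy of the excerpt with closing tags added by
hand, since the pasted file is cut off.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the EBI excerpt; the closing tags are an assumption,
# because the pasted file ends before them.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<EBIInterProScanResults>
<interpro_matches>
  <protein id="SAM" length="393" crc64="847CBC4BD0EAA1BC">
    <interpro id="IPR002133" name="S-adenosylmethionine synthetase" type="Family">
      <match id="PIRSF000497" name="Methionine adenosyltransferase" dbname="PIR">
        <location start="2" end="387" score="2.6e-224" status="T" evidence="HMMPIR" />
      </match>
      <match id="PF00438.9" name="S-adenosylmethionine synthetase, N-te" dbname="PFAM">
        <location start="2" end="102" score="2.7e-63" status="T" evidence="HMMPfam" />
      </match>
    </interpro>
  </protein>
</interpro_matches>
</EBIInterProScanResults>
"""


def list_matches(xml_bytes):
    """Return (protein, interpro id, db, match id, start, end) tuples."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for protein in root.iter("protein"):
        for ipr in protein.findall("interpro"):
            for match in ipr.findall("match"):
                for loc in match.findall("location"):
                    rows.append((
                        protein.get("id"),
                        ipr.get("id"),
                        match.get("dbname"),
                        match.get("id"),
                        int(loc.get("start")),
                        int(loc.get("end")),
                    ))
    return rows


if __name__ == "__main__":
    for row in list_matches(SAMPLE):
        print(row)
```

The element and attribute names (protein, interpro, match, location,
start, end, dbname) are the ones visible in the excerpt; anything past
the point where the excerpt stops is not covered here.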





