[Bioperl-l] BioPerl parse interproscan xml not working

blpapery at gmail.com blpapery at gmail.com
Wed Nov 6 15:35:42 UTC 2013


Hi all,

I have been trying to use Bio::SeqIO to parse an InterProScan XML result
(XML version 1.0, which is what InterProScan outputs), but I keep getting
the following error:

no element found at line 24, column 0, byte 1421 at /System/Library/Perl/Extras/5.10.0/darwin-thread-multi-2level/XML/Parser.pm line 187

My code is below:

use strict;
use warnings;
use Bio::SeqIO;

my $io = Bio::SeqIO->new(-format => "interpro", -file => "ipr.xml");

while (my $seq = $io->next_seq) {
    # trying to print out anything here
    print $seq->accession_number, "\n";
}


The XML file is shown below:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5">
    <protein>
        <sequence md5="d95d12290aaa87a91f47d25299cfb6ce">MKYKHLILSLSLIMLGPLAHAEEIGSVDTVFKMIGPDHKIVVEAFDDPDVKNVTCYVSRAKTGGIKGGLGLAEDTSDAAISCQQVGPIELSDRIKNGKAQGEVVFKKRTSLVFKSLQVVRFYDAKRNALAYLAYSDKVVEGSPKNAISAVPVMPWRQ</sequence>
        <xref id="ecoli_3"/>
        <matches>
            <hmmer3-match evalue="1.0E-57" score="193.0">
                <signature ac="PF05981" desc="CreA protein" name="CreA">
                    <entry ac="IPR010292" desc="Uncharacterised protein family CreA" name="Uncharacterised_CreA" type="FAMILY"/>
                    <models>
                        <model ac="PF05981" desc="CreA protein" name="CreA"/>
                    </models>
                    <signature-library-release library="PFAM" version="27.0"/>
                </signature>
                <locations>
                    <hmmer3-location env-end="157" env-start="24" score="192.8" evalue="1.2E-57" hmm-start="1" hmm-end="128" hmm-length="0" start="24" end="156"/>
                </locations>
            </hmmer3-match>
        </matches>
    </protein>
</protein-matches>
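
For comparison, here is a minimal sketch that reads the same kind of file
with XML::Parser directly (the module named in the error trace), bypassing
Bio::SeqIO. It is only a cross-check under stated assumptions, not a fix
for the interpro driver: one possible cause, not confirmed in this thread,
is that the driver expects an older InterProScan XML layout than the
interproscan5 schema used here. The sub name read_ipr5_matches and the
trimmed inline sample are illustrative and not part of BioPerl.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (not part of BioPerl): returns one hash ref per
# <protein>, holding its <xref id="..."> and <signature ac="..."> values.
sub read_ipr5_matches {
    my ($xml) = @_;
    my (@proteins, $current);

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $element, %attr) = @_;
                if ($element eq 'protein') {
                    $current = { ids => [], signatures => [] };
                }
                elsif ($current && $element eq 'xref') {
                    push @{ $current->{ids} }, $attr{id};
                }
                elsif ($current && $element eq 'signature') {
                    push @{ $current->{signatures} }, $attr{ac};
                }
            },
            End => sub {
                my ($expat, $element) = @_;
                if ($element eq 'protein') {
                    push @proteins, $current;
                    $current = undef;
                }
            },
        },
    );

    $parser->parse($xml);    # dies if the XML is not well-formed
    return @proteins;
}

# Trimmed sample modelled on the file in this post (sequence omitted).
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<protein-matches xmlns="http://www.ebi.ac.uk/interpro/resources/schemas/interproscan5">
    <protein>
        <xref id="ecoli_3"/>
        <matches>
            <hmmer3-match evalue="1.0E-57" score="193.0">
                <signature ac="PF05981" desc="CreA protein" name="CreA">
                    <entry ac="IPR010292" name="Uncharacterised_CreA" type="FAMILY"/>
                </signature>
            </hmmer3-match>
        </matches>
    </protein>
</protein-matches>
XML

# Print one tab-separated line per protein: xref ids, then signature accessions.
for my $protein (read_ipr5_matches($sample)) {
    print join(',', @{ $protein->{ids} }), "\t",
          join(',', @{ $protein->{signatures} }), "\n";
}
```

The helper takes the XML as a string; to run it against ipr.xml, read the
file into a scalar first, or adapt the sketch to call parsefile('ipr.xml')
on the XML::Parser object instead of parse.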




Thanks in advance for your help.

Ben


