[Bioperl-l] Homologene parser?

Siddhartha Basu basu at pharm.sunysb.edu
Tue Aug 14 15:02:06 UTC 2007

neeti somaiya wrote:
> Hi Andrew,
> I think the homologene data files have changed now on the ftp, from what you
> had used.
> It is now homologene.data and homologene.xml.
> I tried using your parser, but because it was written for the file
> hmlg.trip.ftp, it doesn't work anymore.
> I came across a parser
> http://bioinformatics.tgen.org/brunit/software/bioparser/docs/pod_bio_parser_homologene_fileparser_pm.shtml
> .
> I am looking at it to see if it works for me. Not sure if it will.
> ~Neeti.

Hi Neeti,
I recently wrote a parser for HomoloGene XML data, tailored to my own 
needs. I am not sure whether it will suit your purpose, but it could be 
extended into a general-purpose parser, so I am putting it forward. 
Here is how it works:

* It parses only a single HomoloGene entry, <HG-Entry>.....</HG-Entry>.
* It does SAX-based parsing (currently using XML::SAX::ExpatXS).
* It returns a graph object (using Perl's Graph module) in which each 
node is a homologue entry identified by its Entrez Gene id. Each node 
also carries the following attributes:
	* RefSeq protein id
	* Protein id (pid)
	* NCBI taxon id
* The edge attributes record the ortholog (true/false) relationship 
between two nodes.
* The remaining tags are not currently extracted. However, parsing 
them should not be very difficult.
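
The design described above can be sketched roughly as follows. This is 
not my Perl code, only a minimal illustration in Python using the 
standard xml.sax module in place of XML::SAX::ExpatXS and plain dicts in 
place of the Graph module. Apart from HG-Entry, the element names 
(HG-Gene, HG-Gene_geneid, and so on) are assumptions about the XML 
layout, and the rule that marks two genes from different taxa as 
orthologs is a placeholder for illustration only.

```python
import xml.sax
from itertools import combinations

# Assumed element names; check them against the real homologene.xml.
GENE_TAG = "HG-Gene"
FIELDS = {
    "HG-Gene_geneid": "gene_id",
    "HG-Gene_taxid": "taxon_id",
    "HG-Gene_prot-gi": "pid",
    "HG-Gene_prot-acc": "refseq_protein",
}


class EntryHandler(xml.sax.ContentHandler):
    """Collects one attribute dict per gene element in a single entry."""

    def __init__(self):
        super().__init__()
        self.nodes = {}      # gene_id -> attribute dict
        self._gene = None    # gene currently being read, if any
        self._field = None   # attribute currently being read, if any
        self._text = []

    def startElement(self, name, attrs):
        if name == GENE_TAG:
            self._gene = {}
        elif self._gene is not None and name in FIELDS:
            self._field = FIELDS[name]
            self._text = []

    def characters(self, content):
        # SAX may deliver text in several pieces, so accumulate it.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if self._field is not None and name in FIELDS:
            self._gene[self._field] = "".join(self._text).strip()
            self._field = None
        elif name == GENE_TAG and self._gene is not None:
            gene_id = self._gene.get("gene_id")
            if gene_id is not None:
                self.nodes[gene_id] = self._gene
            self._gene = None


def parse_entry(xml_text):
    """Parse one <HG-Entry>...</HG-Entry> string into nodes and edges."""
    handler = EntryHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    edges = {}
    for a, b in combinations(sorted(handler.nodes), 2):
        # Placeholder rule (an assumption): different taxa => ortholog.
        different_taxa = (handler.nodes[a].get("taxon_id")
                          != handler.nodes[b].get("taxon_id"))
        edges[(a, b)] = {"ortholog": different_taxa}
    return {"nodes": handler.nodes, "edges": edges}
```

Calling parse_entry() on the text of one entry returns a dict whose 
"nodes" are keyed by gene id and whose "edges" are keyed by pairs of 
gene ids, each carrying an "ortholog" flag.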

I generally get the HomoloGene XML stream from an 'efetch' through 
Bio::DB::EUtilities, feed it to the parser, get back a 'Graph' object, 
and then work on that.

So, to make it more generic and able to work on a local file:

* We need another class that reads each chunk between 
<HG-Entry>.....</HG-Entry> and sends it to the parser.
* Add support for most of the tags.
* Massage the data into a bioperl-compatible object.
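
The first item, the chunk reader, might look something like the sketch 
below. Again this is a Python illustration rather than the Perl class 
itself. The function name iter_entries is hypothetical, and the sketch 
assumes the opening tag is written exactly as <HG-Entry> (no 
attributes) and that a tag is never split across two lines.

```python
def iter_entries(lines, tag="HG-Entry"):
    """Yield each <tag>...</tag> chunk found in an iterable of lines.

    Assumes the opening tag has no attributes and that neither tag is
    broken across a line boundary.
    """
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    buf = None  # None while outside an entry, a list of pieces inside one
    for line in lines:
        while line:
            if buf is None:
                start = line.find(open_tag)
                if start == -1:
                    break              # nothing of interest on this line
                buf = []
                line = line[start:]    # drop anything before the entry
            end = line.find(close_tag)
            if end == -1:
                buf.append(line)       # entry continues on the next line
                break
            cut = end + len(close_tag)
            buf.append(line[:cut])
            yield "".join(buf)
            buf = None
            line = line[cut:]          # a new entry may start on this line
```

Because it accepts any iterable of lines, an open file handle on the 
local homologene.xml can be passed in directly, and each yielded chunk 
can then be handed to a single-entry parser without loading the whole 
file into memory.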

I could work out the first two; for the last one I have to figure out 
which bioperl object would be suitable (like Bio::Cluster or 

Let me know if this sounds interesting and I will send you the code.


> On 8/14/07, Andrew Macgregor <amacgregor at ccg.murdoch.edu.au> wrote:
>> On 13/08/2007, at 6:29 PM, neeti somaiya wrote:
>>> Hi,
>>> Does anyone know of any Homologene parser, if available?
>>> Please let me know.
>>> Thanks and Regards,
>>> Neeti.
>> Hi Neeti,
>> Quite a long time ago now I wrote a Homologene parser and posted it
>> to the mailing list:
>> <http://www.bioperl.org/pipermail/bioperl-l/2002-February/007288.html>
>> I don't know if this still works but you could use it as a starting
>> point. There may also be something newer out there too, I don't know.
>> If you search the mailing list archives you'll get a few messages
>> around the topic.
>> Cheers, Andrew.
>> Andrew Macgregor
>> Centre for Comparative Genomics, Murdoch University
>> Email: amacgregor at ccg.murdoch.edu.au
>> Tel: (08) 9360 2961
