[Bioperl-l] ID mapping (or: contributing to BioPerl)

Sun May 30 15:05:37 UTC 2010

On May 30, 2010, at 4:32 AM, Farkas, Illes wrote:

> Hi,
> 
> I've run across a relatively simple but specific task. I would like to put
> interaction (<protein_A>, <protein_B>, <PubMed_ID>) data from many sources
> (databases) into a single list containing the following in each record:
> <UniProt_primary_AC_of_A>, <UniProt_primary_AC_of_B>, <PubMed_ID>,
> <name_of_source_db>. (I am aware that there will be some loss during the ID
> conversion.)
> 
> I have found so far the following possibilities:
> 
> (1) BioMart Perl API. Seems to be much smarter (and more complex) than what
> I would need. Also, I would need to parse input and output just as much as
> with newly written subroutines/modules.

Or, I wonder whether you could create a set of BioPerl<->BioMart bridge modules.

> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
> KEGG IDs, but I could not find them on the "From" list.

I added an id_mapper to Bio::DB::SwissProt that calls this UniProt ID mapping service. It hasn't been broadly tested yet, but you are welcome to add more to it.

It might also be useful to have a DB wrapper around a locally built ID mapping database, which would give you more flexibility than the web interface.
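
To make the local-database idea concrete, here is a minimal sketch in Python rather than Perl, using SQLite. Everything in it is an assumption for illustration: the class name LocalIdMapper, the id_map table layout, and the one-to-one mapping from a source ID to a UniProt accession (real mapping files are often one-to-many). It is not existing BioPerl code.

```python
import sqlite3


class LocalIdMapper:
    """Hypothetical wrapper around a locally built ID mapping table."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS id_map ("
            " source_db TEXT, source_id TEXT, uniprot_ac TEXT,"
            " PRIMARY KEY (source_db, source_id))"
        )

    def load(self, source_db, pairs):
        """Store (source_id, uniprot_ac) pairs parsed from a mapping file."""
        self.conn.executemany(
            "INSERT OR REPLACE INTO id_map VALUES (?, ?, ?)",
            [(source_db, source_id, ac) for source_id, ac in pairs],
        )
        self.conn.commit()

    def to_uniprot(self, source_db, source_id):
        """Return the mapped accession, or None if the ID is unknown."""
        row = self.conn.execute(
            "SELECT uniprot_ac FROM id_map"
            " WHERE source_db = ? AND source_id = ?",
            (source_db, source_id),
        ).fetchone()
        return row[0] if row else None
```

Each source database would get its own small parser that feeds load(); the lookup side then stays the same no matter where the mapping came from.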

> (3) Synergizer. I cannot run it in remote batch mode. Of the identifiers I
> need, I could not find BioGrid, ENSP and KEGG.
> 
> (4) Writing it all with ID mapping files downloaded from each database and
> contributing it to BioPerl. How can I contribute? How do I find the best
> place within BioPerl to add a particular module? Whom do I need to ask for
> approval?
> 
> Thanks in advance for any comments.
> Illes

A generalized ID mapping interface would be nice.  You could also incorporate some of NCBI's eutils stuff along these lines, or their gi2acc mappings. 
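
As a rough illustration of what such an interface might look like from the caller's side, here is a small Python sketch (not Perl, and not an existing BioPerl API). The function name unify_interactions and the to_uniprot callable are hypothetical. It turns (protein_A, protein_B, PubMed_ID) records from one source into the unified four-column records you describe, and counts the records lost because an ID could not be converted.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Interaction = Tuple[str, str, str]   # (protein_A, protein_B, PubMed_ID)
Unified = Tuple[str, str, str, str]  # (AC_of_A, AC_of_B, PubMed_ID, source_db)


def unify_interactions(
    records: Iterable[Interaction],
    to_uniprot: Callable[[str, str], Optional[str]],
    source_db: str,
) -> Tuple[List[Unified], int]:
    """Map both partners of each interaction to UniProt accessions.

    to_uniprot(source_db, source_id) is any lookup that returns an
    accession or None. Records with an unmappable partner are dropped
    and counted, so the loss during conversion is visible.
    """
    unified: List[Unified] = []
    dropped = 0
    for protein_a, protein_b, pubmed_id in records:
        ac_a = to_uniprot(source_db, protein_a)
        ac_b = to_uniprot(source_db, protein_b)
        if ac_a is None or ac_b is None:
            dropped += 1
            continue
        unified.append((ac_a, ac_b, pubmed_id, source_db))
    return unified, dropped
```

Because the lookup is passed in as a callable, the same loop would work whether the mapping comes from a web service, a downloaded file, or a local database.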

chris