[Bioperl-l] ID mapping (or: contributing to BioPerl)

Farkas, Illes fij at elte.hu
Sun May 30 09:32:58 UTC 2010


Hi,

I've run across a relatively simple, but specific task. I would like to
merge interaction (<protein_A>, <protein_B>, <PubMed_ID>) data from many
sources (databases) into a single list in which each record contains:
<UniProt_primary_AC_of_A>, <UniProt_primary_AC_of_B>, <PubMed_ID>,
<name_of_source_db>. (I am aware that some records will be lost during the
ID conversion.)

So far I have found the following possibilities:

(1) BioMart Perl API. It seems to be much smarter (and more complex) than
what I need. Also, I would have to parse input and output just as much as
with newly written subroutines/modules.

(2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
KEGG IDs, but I could not find them on the "From" list.

(3) Synergizer. I cannot run it in remote batch mode. Also, of the
identifier types I need, I could not find BioGrid, ENSP and KEGG.

(4) Writing it all with ID mapping files downloaded from each database and
contributing it to BioPerl. How can I contribute? How do I find the best
place within BioPerl to add a particular module? Whom do I need to ask for
approval?
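
As a rough illustration of what option (4) amounts to, here is a minimal
sketch. It is written in Python only to keep it short, it is not BioPerl
code, and it assumes that each database's mapping file has already been
reduced to two tab-separated columns (<source_ID>, <UniProt_primary_AC>).
The function names load_mapping and convert, the file layout, and the IDs
in the usage note are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Interaction = Tuple[str, str, str]   # (protein_A, protein_B, PubMed_ID)
Record = Tuple[str, str, str, str]   # (AC_of_A, AC_of_B, PubMed_ID, source_db)


def load_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'source_id<TAB>uniprot_ac' lines into a dict.

    Assumption: lines starting with '#' are comments, and the first
    accession seen for a source ID is treated as the primary one.
    """
    mapping: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mapping.setdefault(row[0], row[1])
    return mapping


def convert(interactions: Iterable[Interaction],
            mapping: Dict[str, str],
            source_db: str) -> Tuple[List[Record], int]:
    """Translate both interactors to UniProt ACs.

    Interactions with an unmappable partner are dropped and counted,
    so the loss during ID conversion can be reported per database.
    """
    kept: List[Record] = []
    dropped = 0
    for prot_a, prot_b, pmid in interactions:
        ac_a = mapping.get(prot_a)
        ac_b = mapping.get(prot_b)
        if ac_a is None or ac_b is None:
            dropped += 1
            continue
        kept.append((ac_a, ac_b, pmid, source_db))
    return kept, dropped
```

Usage would be one load_mapping call per database followed by one convert
call, for example convert([("src:1", "src:2", "12345678")], mapping,
"HPRD") with made-up source IDs and a made-up PubMed ID; the per-database
record lists can then simply be concatenated.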

Thanks in advance for any comments.
Illes

-- 
http://hal.elte.hu/fij


