[Bioperl-l] ID mapping (or: contributing to BioPerl)
Chris Fields
cjfields at illinois.edu
Sun May 30 15:05:37 UTC 2010
On May 30, 2010, at 4:32 AM, Farkas, Illes wrote:
> Hi,
>
> I've run across a relatively simple, but specific task. I would like to put
> interaction (<protein_A>, <protein_B>, <PubMed_ID>) data from many sources
> (databases) into a single list containing the following in each record:
> <UniProt_primary_AC_of_A>, <UniProt_primary_AC_of_B>, <PubMed_ID>,
> <name_of_source_db>. (I am aware that there will be some loss during the ID
> conversion.)
>
> I have found so far the following possibilities:
>
> (1) BioMart perl API. Seems to be much smarter (and more complex) than what
> I would need. Also, I would need to parse input and output just as much as
> with newly written subroutines/modules.
Or I wonder whether you could create a set of BioPerl<->BioMart bridge modules.
> (2) UniProt.org -> ID mapping. I would need to convert BioGrid, HPRD and
> KEGG IDs, but I could not find them on the "From" list.
I added an id_mapper method to Bio::DB::SwissProt that calls this service. It hasn't been broadly tested yet, but you are welcome to add more to it.
It might also be useful to have a DB wrapper around a locally built ID mapping database, which would give you more flexibility than the web interface.
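To make the local-database idea concrete, here is a rough sketch in Python rather than Perl, using only the standard sqlite3 module. Everything in it is an assumption for illustration: the class name LocalIdMapper, the id_map table layout, and the convert_interactions helper are hypothetical and not part of BioPerl. It assumes the mapping files have already been parsed into (source_db, source_id, uniprot_ac) tuples.

```python
import sqlite3


class LocalIdMapper:
    """Hypothetical wrapper around a locally built ID mapping table."""

    def __init__(self, path=":memory:"):
        # Assumed schema: one UniProt primary AC per (source_db, source_id).
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS id_map ("
            " source_db TEXT NOT NULL,"
            " source_id TEXT NOT NULL,"
            " uniprot_ac TEXT NOT NULL,"
            " PRIMARY KEY (source_db, source_id))"
        )

    def load(self, rows):
        """Insert an iterable of (source_db, source_id, uniprot_ac) tuples."""
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO id_map VALUES (?, ?, ?)", rows
            )

    def to_uniprot(self, source_db, source_id):
        """Return the UniProt primary AC, or None if the ID is unmapped."""
        row = self.conn.execute(
            "SELECT uniprot_ac FROM id_map"
            " WHERE source_db = ? AND source_id = ?",
            (source_db, source_id),
        ).fetchone()
        return row[0] if row else None


def convert_interactions(mapper, source_db, interactions):
    """Convert (id_a, id_b, pubmed_id) triples into UniProt-based records.

    Returns (records, dropped). Each record is
    (uniprot_ac_a, uniprot_ac_b, pubmed_id, source_db); dropped holds the
    input triples for which at least one ID could not be mapped.
    """
    records, dropped = [], []
    for id_a, id_b, pubmed_id in interactions:
        ac_a = mapper.to_uniprot(source_db, id_a)
        ac_b = mapper.to_uniprot(source_db, id_b)
        if ac_a is None or ac_b is None:
            dropped.append((id_a, id_b, pubmed_id))
        else:
            records.append((ac_a, ac_b, pubmed_id, source_db))
    return records, dropped
```

Returning the dropped triples separately keeps the loss during ID conversion visible instead of silently discarding records.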
> (3) Synergizer. I cannot run it in remote batch mode. Of the identifiers I
> need, I could not find BioGrid, ENSP and KEGG.
>
> (4) Writing it all with ID mapping files downloaded from each database and
> contributing it to BioPerl. How can I contribute? How do I find the best
> place within BioPerl to add a particular module? Whom do I need to ask for
> approval?
>
> Thanks in advance for any comments.
> Illes
A generalized ID mapping interface would be nice. You could also incorporate some of NCBI's eutils stuff along these lines, or their gi2acc mappings.
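One possible shape for such an interface, again as a Python sketch and not as existing BioPerl code: a single map_ids method that every backend implements, plus a chain that tries backends in order. The names IdMapper, TableMapper and ChainedMapper are hypothetical. A real backend might wrap a web service or a downloaded mapping file; here an in-memory table stands in for both.

```python
from abc import ABC, abstractmethod


class IdMapper(ABC):
    """Hypothetical common interface for all ID mapping backends."""

    @abstractmethod
    def map_ids(self, ids, source, target):
        """Return {input_id: [target_ids]}; unmapped IDs are omitted."""


class TableMapper(IdMapper):
    """Backend that looks IDs up in an in-memory table.

    table has the form {(source, target): {input_id: [target_ids]}}.
    """

    def __init__(self, table):
        self.table = table

    def map_ids(self, ids, source, target):
        lookup = self.table.get((source, target), {})
        return {i: list(lookup[i]) for i in ids if i in lookup}


class ChainedMapper(IdMapper):
    """Try each backend in order; earlier backends take precedence."""

    def __init__(self, backends):
        self.backends = list(backends)

    def map_ids(self, ids, source, target):
        result = {}
        remaining = list(ids)
        for backend in self.backends:
            if not remaining:
                break
            found = backend.map_ids(remaining, source, target)
            result.update(found)
            # Only IDs that are still unmapped go to the next backend.
            remaining = [i for i in remaining if i not in found]
        return result
```

With this layout, a local mapping table can be placed first in the chain and a slower remote service last, so the remote backend is only asked about IDs the local one could not resolve.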
chris