[Bioperl-l] Homologene parsing

Tue Oct 7 07:19:33 EDT 2003

Hello,
    I'm writing some code to access data from NCBI's homologene dataset
of reciprocal best blast matches
 (see http://www.ncbi.nlm.nih.gov/HomoloGene/details.shtml) and am
thinking of Bioperlifying it.
At the moment I just have code for trawling through the file, parsing
individual entries and accessing their attributes,
     e.g.,

            my @orthologues  = get_orthologous_accessions();
            my @locus_ids    = get_orthologous_locusIds();
            my $cluster_size = orthologue_count();
            my $boolean      = has_orthologue('mouse');   # etc.

    but to be useful for accessing clusters quickly I guess it needs
some sort of indexing - there are about 21,000 entries. I don't know
much about databases, indexing etc., but would inheriting from
Bio::Index::Abstract be the way to go, in a similar way to SwissPfam?
Ideally one would get a db handle and use that to retrieve a particular
entry indexed by any of the identifiers in the cluster,

    e.g., my $db = Bio::DB::Homologene->new;
                    $db->fetch_cluster_by_name('UBE2B');    # access by gene name
                    $db->fetch_cluster_by_id('NM_005197');

    or by taxon id
                  my @human_clusters =   $db->fetch_all_by_taxon(9606);
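
To make the indexing idea concrete, here is a rough sketch of the general
approach: record the byte offset of each entry once, keyed by every
identifier in the cluster, then seek straight to the entry on lookup. As
far as I understand it, this is broadly the key-to-file-offset idea behind
the Bio::Index modules, but the sketch uses only core Perl and does not
call any BioPerl API. The record layout (one cluster per line, members
written as taxon:gene:accession), the identifiers, and the names
build_index and fetch_at are all made up for illustration; this is not the
real HomoloGene file format.

```perl
use strict;
use warnings;

# Hypothetical flat file, NOT the real HomoloGene layout:
# cluster_id <TAB> taxon:gene:accession <TAB> taxon:gene:accession ...
my $flat = <<"END";
C1\t9606:GENE_A:ACC_H1\t10090:Gene_a:ACC_M1
C2\t9606:GENE_B:ACC_H2
C3\t10090:Gene_c:ACC_M3\t10116:Gene_c:ACC_R3
END

# Scan the file once, remembering where each cluster starts.
sub build_index {
    my ($fh) = @_;
    my (%by_key, %by_taxon);
    while (1) {
        my $offset = tell($fh);          # start of the next record
        my $line   = <$fh>;
        last unless defined $line;
        chomp $line;
        next unless length $line;
        my ($cluster_id, @members) = split /\t/, $line;
        $by_key{$cluster_id} = $offset;
        for my $member (@members) {
            my ($taxon, $gene, $acc) = split /:/, $member;
            # Simplification: assumes gene names and accessions are
            # unique across clusters; a later cluster would overwrite.
            $by_key{$_} = $offset for ($gene, $acc);
            $by_taxon{$taxon}{$offset} = 1;   # hash avoids duplicates
        }
    }
    return (\%by_key, \%by_taxon);
}

# Jump to a stored offset and parse just that one record.
sub fetch_at {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $line = <$fh>;
    chomp $line;
    my ($cluster_id, @members) = split /\t/, $line;
    return { id => $cluster_id, members => [ map { [ split /:/ ] } @members ] };
}

# An in-memory filehandle stands in for the real data file here.
open my $fh, '<', \$flat or die "cannot open in-memory file: $!";
my ($by_key, $by_taxon) = build_index($fh);

my $by_acc  = fetch_at($fh, $by_key->{'ACC_M1'});
my $by_gene = fetch_at($fh, $by_key->{'Gene_c'});
my @human   = map  { fetch_at($fh, $_)->{id} }
              sort { $a <=> $b } keys %{ $by_taxon->{9606} };

print "ACC_M1 is in cluster $by_acc->{id}\n";
print "Gene_c is in cluster $by_gene->{id}\n";
print "human clusters: @human\n";
```

In a real module the two hashes would presumably live in a DBM file on
disk rather than in memory, so the scan only has to happen once per
release of the data file.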

Any broad pointers in the right direction concerning indexing would be
greatly appreciated, I am quite happy to do all the coding
but want to make sure it's useful at the end of the day.

Cheers

Richard

--
Dr Richard Adams
Bioinformatician,
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU

Tel: 44 131 651 1084
richard.adams at ed.ac.uk