[Bioperl-l] Homologene parsing
    Richard Adams 
    Richard.Adams at ed.ac.uk
       
    Tue Oct  7 07:19:33 EDT 2003
    
    
  
Hello,
    I'm writing some code to access data from NCBI's HomoloGene dataset
of reciprocal best BLAST matches
(see http://www.ncbi.nlm.nih.gov/HomoloGene/details.shtml) and am
thinking of Bioperlifying it.
At the moment I just have code for trawling through the file, parsing
individual entries and accessing their attributes,
     e.g.,
             my @orthologues  = get_orthologous_accessions();
             my @locus_ids    = get_orthologous_locusIds();
             my $cluster_size = orthologue_count();
             my $boolean      = has_orthologue('mouse');   # etc.
    but to be useful for accessing clusters quickly I guess it needs
some sort of indexing - there are about 21,000 entries. I don't know
much about databases/indexing etc., but would inheriting from
Bio::Index::Abstract be the way to go, in a similar way to SwissPfam?
Ideally one would get a db handle and use that to retrieve a particular
entry indexed by any of the identifiers in the cluster,
    e.g., my $db = Bio::DB::Homologene->new();
                    $db->fetch_cluster_by_name('UBE2B');    # access by gene name
                    $db->fetch_cluster_by_id('NM_005197');
    or by taxon id
                  my @human_clusters = $db->fetch_all_by_taxon(9606);
Any broad pointers in the right direction concerning indexing would be
greatly appreciated. I am quite happy to do all the coding
but want to make sure it's useful at the end of the day.
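To make the indexing idea concrete, here is a rough sketch in plain Perl
(core only, not using Bio::Index::Abstract) of the general approach: record
the byte offset of each cluster while reading the file once, keyed by every
identifier in the cluster, then seek() straight to a cluster on lookup.
Everything in it is an assumption for illustration: the flat-file layout
(a ">cluster_id" line followed by tab-separated taxon id, gene name and
accession lines) is a made-up simplification and not the real HomoloGene
format, the identifiers are toy values, and the subroutine names are
hypothetical. A real index would also be stored on disk rather than in
memory.

```perl
use strict;
use warnings;

# Read the file once and remember where each cluster starts.
# ASSUMED toy format (not the real HomoloGene layout):
#   >cluster_id
#   taxon_id <TAB> gene_name <TAB> accession     (one line per member)
sub build_index {
    my ($fh) = @_;
    my (%by_key, %by_taxon);
    my $start;                          # byte offset of the current cluster
    while (1) {
        my $pos  = tell($fh);           # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        if ($line =~ /^>/) { $start = $pos; next; }
        next unless defined $start && length $line;
        my ($taxon, $name, $acc) = split /\t/, $line;
        $by_key{$name} = $start;        # gene name -> cluster offset
        $by_key{$acc}  = $start;        # accession -> cluster offset
        $by_taxon{$taxon}{$start} = 1;  # taxon -> set of cluster offsets
    }
    return (\%by_key, \%by_taxon);
}

# Jump to a stored offset and parse just that one cluster.
sub fetch_cluster_at {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $header = <$fh>;
    chomp $header;
    my @members;
    while (my $line = <$fh>) {
        last if $line =~ /^>/;          # next cluster begins
        chomp $line;
        next unless length $line;
        my ($taxon, $name, $acc) = split /\t/, $line;
        push @members, { taxon => $taxon, name => $name, accession => $acc };
    }
    return { id => substr($header, 1), members => \@members };
}

# All clusters containing at least one member from the given taxon.
sub fetch_all_by_taxon {
    my ($fh, $by_taxon, $taxon) = @_;
    my @offsets = sort { $a <=> $b } keys %{ $by_taxon->{$taxon} || {} };
    return map { fetch_cluster_at($fh, $_) } @offsets;
}

# Toy data (invented identifiers), read through an in-memory filehandle.
my $data = join '', map { join("\t", @$_) . "\n" } (
    ['>c1'], [9606, 'geneA', 'ACC_H1'], [10090, 'GeneA', 'ACC_M1'],
    ['>c2'], [9606, 'geneB', 'ACC_H2'],
);
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my ($by_key, $by_taxon) = build_index($fh);

my $cluster = fetch_cluster_at($fh, $by_key->{'geneB'});
printf "%s has %d member(s)\n", $cluster->{id}, scalar @{ $cluster->{members} };

my @human = fetch_all_by_taxon($fh, $by_taxon, 9606);
print "human clusters: ", join(', ', map { $_->{id} } @human), "\n";
```

The point of keeping offsets rather than parsed entries is that the index
stays small and each lookup only parses the one cluster it needs.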
Cheers
Richard
--
Dr Richard Adams
Bioinformatician,
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU
Tel: 44 131 651 1084
richard.adams at ed.ac.uk
    
    