[Bioperl-l] Homologene parsing
Richard Adams
Richard.Adams at ed.ac.uk
Tue Oct 7 07:19:33 EDT 2003
Hello,
I'm writing some code to access data from NCBI's HomoloGene dataset
of reciprocal best BLAST matches
(see http://www.ncbi.nlm.nih.gov/HomoloGene/details.shtml) and am
thinking of Bioperlifying it.
At the moment I just have code for trawling through the file, parsing
individual entries and accessing their attributes,
e.g.,
my @orthologue = get_orthologous_accessions();
my @locus_ids = get_orthologous_locusIds();
my $cluster_size = orthologue_count();
my $boolean = has_orthologue('mouse'); etc.,
but to be useful for accessing clusters quickly I guess it needs
some sort of indexing - there are about 21,000 entries. I don't know
much about databases/indexing etc., but would inheriting from
Bio::Index::Abstract be the way to go, in a similar way to SwissPfam?
Ideally one would get a db handle and use that to retrieve a particular
entry indexed by any of the identifiers in the cluster
e.g., my $db = new Bio::DB::Homologene;
$db->fetch_cluster_by_name('UBE2B'); # access by gene name
$db->fetch_cluster_by_id('NM_005197');
or by taxon id
my @human_clusters = $db->fetch_all_by_taxon(9606);
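
To make the idea concrete, here is a minimal sketch of the kind of lookup
interface described above, held entirely in memory with plain Perl hashes.
It does not use Bio::Index::Abstract or any other BioPerl class. The package
name My::HomologeneIndex, the add_line method and the four-column
tab-separated input (cluster id, taxon id, gene symbol, accession) are all
assumptions made for illustration, not the real HomoloGene file layout, and
the example rows are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory index; NOT a BioPerl module.
package My::HomologeneIndex;

sub new {
    my ($class) = @_;
    my $self = {
        clusters => {},    # cluster id => array ref of member records
        by_name  => {},    # gene symbol => cluster id
        by_acc   => {},    # accession   => cluster id
        by_taxon => {},    # taxon id    => { cluster id => 1 }
    };
    return bless $self, $class;
}

# Add one cluster member.
# ASSUMED line format: cluster_id <TAB> taxon_id <TAB> symbol <TAB> accession
sub add_line {
    my ($self, $line) = @_;
    chomp $line;
    my ($cluster_id, $taxon_id, $symbol, $acc) = split /\t/, $line;
    push @{ $self->{clusters}{$cluster_id} },
        { taxon => $taxon_id, symbol => $symbol, acc => $acc };
    # Simplification: one cluster per symbol/accession (last one wins).
    $self->{by_name}{$symbol} = $cluster_id;
    $self->{by_acc}{$acc}     = $cluster_id;
    $self->{by_taxon}{$taxon_id}{$cluster_id} = 1;
    return;
}

# Return the member list (array ref) of the cluster containing this symbol.
sub fetch_cluster_by_name {
    my ($self, $name) = @_;
    my $id = $self->{by_name}{$name};
    return defined $id ? $self->{clusters}{$id} : undef;
}

# Return the member list (array ref) of the cluster containing this accession.
sub fetch_cluster_by_id {
    my ($self, $acc) = @_;
    my $id = $self->{by_acc}{$acc};
    return defined $id ? $self->{clusters}{$id} : undef;
}

# Return every cluster (as array refs) with at least one member in the taxon.
sub fetch_all_by_taxon {
    my ($self, $taxon_id) = @_;
    my $ids = $self->{by_taxon}{$taxon_id} || {};
    return map { $self->{clusters}{$_} } sort keys %$ids;
}

package main;

my $db = My::HomologeneIndex->new;

# Made-up example rows, not real HomoloGene data.
$db->add_line($_) for (
    "1\t9606\tGENE_A\tACC_H1",
    "1\t10090\tGene_a\tACC_M1",
    "2\t9606\tGENE_B\tACC_H2",
);

my $cluster        = $db->fetch_cluster_by_id('ACC_M1');
my @human_clusters = $db->fetch_all_by_taxon(9606);

printf "cluster size: %d\n",   scalar @$cluster;
printf "human clusters: %d\n", scalar @human_clusters;
```

With about 21,000 entries an in-memory hash like this may well be fast
enough; a file-based index would instead store byte offsets into the flat
file and seek to them, so nothing has to be re-parsed on start-up.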
Any broad pointers in the right direction concerning indexing would be
greatly appreciated. I am quite happy to do all the coding
but want to make sure it's useful at the end of the day.
Cheers
Richard
--
Dr Richard Adams
Bioinformatician,
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU
Tel: 44 131 651 1084
richard.adams at ed.ac.uk