[Bioperl-l] OMIMparser/OMIMentry fail to retrieve gene symbols
Enrico Ferrero
enricoferrero86 at gmail.com
Fri Jul 12 16:33:40 UTC 2013
Hi,
I'm using Bio::Phenotype::OMIM::OMIMparser to query the OMIM database [1].
In the simplest scenario I just want to retrieve all gene symbols
associated with all the diseases in OMIM and store disease and gene
information in a flat file.
I have modified the code on the CPAN page for OMIMparser [2] just
slightly to add my custom query, but my script fails to retrieve the
gene symbols associated with the OMIM entry (better formatted code
also available here: http://pastebin.com/8SFx4mUW):
***CODE START***
use strict;
use warnings;
use Bio::Phenotype::OMIM::OMIMparser;
my $omim_parser = Bio::Phenotype::OMIM::OMIMparser->new(
    -genemap  => "genemap",
    -omimtext => "omim.txt",
);
my $parsedOMIM="parsed.OMIM.txt";
open my $fh, ">", $parsedOMIM or die "Can't open $parsedOMIM: $!";
print $fh "ID" . "\t" . "Disease" . "\t" . "Genes" . "\n";
while ( my $omim_entry = $omim_parser->next_phenotype() ) {
# This prints everything.
#~ print( $omim_entry->to_string() );
#~ print "\n\n";
# This gets individual data (some of them object-arrays)
# (and illustrates the relevant methods of OMIMentry).
my $numb = $omim_entry->MIM_number(); # *FIELD* NO
my $title = $omim_entry->title(); # *FIELD* TI - first line
my $alt = $omim_entry->alternative_titles_and_symbols(); # *FIELD* TI - additional lines
my $mtt = $omim_entry->more_than_two_genes(); # "#" before title
my $sep = $omim_entry->is_separate(); # "*" before title
my $desc = $omim_entry->description(); # *FIELD* TX
my $mm = $omim_entry->mapping_method(); # from genemap
my $gs = $omim_entry->gene_status(); # from genemap
my $cr = $omim_entry->created(); # *FIELD* CD
my $cont = $omim_entry->contributors(); # *FIELD* CN
my $ed = $omim_entry->edited(); # *FIELD* ED
my $sa = $omim_entry->additional_references(); # *FIELD* SA
my $cs = $omim_entry->clinical_symptoms_raw(); # *FIELD* CS
my $comm = $omim_entry->comment(); # from genemap
my $mini_mim = $omim_entry->miniMIM(); # *FIELD* MN
# A Bio::Phenotype::OMIM::MiniMIMentry object.
# class Bio::Phenotype::OMIM::MiniMIMentry
# provides the following:
# - description()
# - created()
# - contributors()
# - edited()
#
# Prints the contents of the MINI MIM entry (most OMIM entries do
# not have MINI MIM entries, though).
#~ print $mini_mim->description()."\n";
#~ print $mini_mim->created()."\n";
#~ print $mini_mim->contributors()."\n";
#~ print $mini_mim->edited()."\n";
my @corrs = $omim_entry->each_Correlate(); # from genemap
# Array of Bio::Phenotype::Correlate objects.
# class Bio::Phenotype::Correlate
# provides the following:
# - name()
# - description() (not used)
# - species() (always mouse)
# - type() ("OMIM mouse correlate")
# - comment()
my @refs = $omim_entry->each_Reference(); # *FIELD* RF
# Array of Bio::Annotation::Reference objects.
my @avs = $omim_entry->each_AllelicVariant(); # *FIELD* AV
# Array of Bio::Phenotype::OMIM::OMIMentryAllelicVariant objects.
# class Bio::Phenotype::OMIM::OMIMentryAllelicVariant
# provides the following:
# - number (e.g ".0001" )
# - title (e.g "ALCOHOL INTOLERANCE" )
# - symbol (e.g "ALDH2*2" )
# - description (e.g "The ALDH2*2-encoded protein has a change ..." )
# - aa_ori (used if information in the form "LYS123ARG" is found)
# - aa_mut (used if information in the form "LYS123ARG" is found)
# - position (used if information in the form "LYS123ARG" is found)
# - additional_mutations (used for e.g. "1-BP DEL, 911T")
my @cps = $omim_entry->each_CytoPosition(); # from genemap
# Array of Bio::Map::CytoPosition objects.
my @gss = $omim_entry->each_gene_symbol(); # from genemap
# Array of strings.
### A handy string to store gene symbols
my $geneSymbols = join(",", @gss);
### My query (this is just an example; I actually need to perform more
### complex queries)
if ($title =~ /^#/) {
print $fh $numb . "\t" . $title . "\t" . $geneSymbols . "\n";
}
}
close $fh;
***CODE END***
So, my understanding is that '$omim_entry->each_gene_symbol()' returns
an empty list for all but a handful of entries (which makes the issue
a lot more mysterious).
I'm still a beginner, so it's entirely possible I'm doing something
stupid or wrong.
Alternatively, there might be something wrong with how
OMIMparser/OMIMentry parse and link the 'omim.txt' and 'genemap'
files.
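One way to narrow this down might be to tally, for each kind of entry, how many come back with and without gene symbols. The sketch below is only a diagnostic, not a fix. It uses the same parser calls as the script above. The helper names classify_entry and tally_omim are made up for illustration, and I am assuming that more_than_two_genes() and is_separate() return true values for "#" and "*" entries respectively, as the comments copied from the CPAN page suggest.

```perl
use strict;
use warnings;

# Hypothetical helper: label an entry by its title flag and by whether
# any gene symbols were returned. Takes plain values, so it can be
# checked without BioPerl installed.
sub classify_entry {
    my ( $is_hash, $is_star, $n_symbols ) = @_;
    my $kind = $is_hash ? 'hash' : $is_star ? 'star' : 'other';
    my $has  = $n_symbols > 0 ? 'with_symbols' : 'without_symbols';
    return "$kind/$has";
}

# Hypothetical helper: walk all entries and count each label.
# BioPerl is loaded only when this sub is actually called.
sub tally_omim {
    my ( $genemap, $omimtext ) = @_;
    require Bio::Phenotype::OMIM::OMIMparser;
    my $parser = Bio::Phenotype::OMIM::OMIMparser->new(
        -genemap  => $genemap,
        -omimtext => $omimtext,
    );
    my %count;
    while ( my $entry = $parser->next_phenotype() ) {
        my @symbols = $entry->each_gene_symbol();
        my $label   = classify_entry(
            $entry->more_than_two_genes(),    # "#" before title
            $entry->is_separate(),            # "*" before title
            scalar @symbols,
        );
        $count{$label}++;
    }
    return \%count;
}

# Example call (needs BioPerl and the two OMIM files):
# my $count = tally_omim( "genemap", "omim.txt" );
# print "$_\t$count->{$_}\n" for sort keys %$count;
```

If nearly all of the entries that do have symbols turn out to be of one kind, that would suggest the 'genemap' file is only being linked to that kind of entry, rather than the parser failing at random.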
Any help on how to get this to work is greatly appreciated.
Thank you.
Best,
[1] http://europe.omim.org/
[2] http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/Phenotype/OMIM/OMIMparser.pm
--
Enrico Ferrero