[Bioperl-l] OMIM parser
Hilmar Lapp
hlapp@gnf.org
Fri, 12 Jul 2002 13:03:17 -0700
Interesting view. The problem arises as soon as you consider this against the background of BioSQL: we'd like to load OMIM into BioSQL as one of the more important data sources.
In my biosql-ized view an OMIM entry refers to a locus. A locus would go to a bioentry. This means that the proposed Bio::Organism::Phenotype objects would be loaded into BioSQL as bioentries.
This doesn't sound completely off to me; we just hadn't thought of it that way yet. I'm rather curious whether this matches or is compatible with what you, Heikki and Jason, had in mind when you conceived that class hierarchy. What else would be in Bio::Organism::Phenotype?
Or do we need a Locus (Bio::Organism::Locus?) class?
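To make the Locus question a bit more concrete, here is a minimal sketch of an entry record whose locus is optional, so that entries like 100070 (no associated gene) are kept rather than dropped. It is written in Python purely for illustration; it is not BioPerl and not an existing API. The names OmimEntry, Locus and split_by_locus, the gene symbol "GENE1" and the entry ID "000000" are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Locus:
    """Hypothetical stand-in for a possible Bio::Organism::Locus class."""
    symbol: str


@dataclass
class OmimEntry:
    """Hypothetical record for one parsed OMIM entry."""
    omim_id: str
    names: List[str]
    # None for entries that describe a disease with no associated gene.
    locus: Optional[Locus] = None
    keywords: List[str] = field(default_factory=list)

    def has_locus(self) -> bool:
        return self.locus is not None


def split_by_locus(
    entries: List[OmimEntry],
) -> Tuple[List[OmimEntry], List[OmimEntry]]:
    """Separate entries that refer to a locus from pure phenotype entries."""
    with_locus = [e for e in entries if e.has_locus()]
    without_locus = [e for e in entries if not e.has_locus()]
    return with_locus, without_locus


entries = [
    # Real OMIM number quoted in this thread; it has no associated gene.
    OmimEntry("100070", ["ABDOMINAL AORTIC ANEURYSM"]),
    # Made-up placeholder entry with a made-up gene symbol.
    OmimEntry("000000", ["PLACEHOLDER ENTRY"], locus=Locus("GENE1")),
]
with_locus, without_locus = split_by_locus(entries)
```

Under this assumption only the entries in with_locus would be candidates for bioentries keyed on a locus; how the rest would be stored is exactly the open question.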
If anyone has any ideas or comments or feelings, please post ...
-hilmar
> -----Original Message-----
> From: Heikki Lehvaslaiho [mailto:heikki@ebi.ac.uk]
> Sent: Friday, July 12, 2002 3:58 AM
> To: Chris Zmasek
> Cc: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] OMIM parser
>
>
> Chris,
>
> From the sequence point of view OMIM is annotation. On its own it contains
> the best we have describing human phenotype.
>
>
> Last year Jason and I were brainstorming during ISMB. The results are in
> the bioperl file models/maps_and_markers.dia. It is already a bit outdated.
> Maps and markers all ended up in Bio::Map but we also outlined a
> Bio::Organism namespace where Phenotype is one component. So, I'd suggest
> Bio::Phenotype or even better Bio::Organism::Phenotype::OMIM.
>
> OMIM phenotypes are quite generic, but in practice they are associated with
> sequences and individuals. We'll need Bio::Organism::Individual which could
> have more than one subphenotype, which together form that person's
> phenotype.
>
> The important thing to remember about OMIM is that it is not a database in
> the rigorous sense. It is a loosely structured (much more so than general
> semistructured biological databases) collection of free text and various
> other structures:
>
> - ID
> - Name(s)
> - Keywords
> - Summary
> - Main text
> - Mutations (Bio::Variation)
> -- ID
> -- keywords including mutation description
> -- free text
> - Crossreferences (Bio::Annotation::DBLink)
> - References (Bio::Biblio or Bio::Annotation::Reference)
> - Contributors & History
> - it implies Species (Bio::Species)
>
>
> I am not saying you have to parse and write out everything, but at least try
> to keep the bigger picture in mind and future options open.
>
> Good luck,
>
> -Heikki
>
>
>
> Chris Zmasek wrote:
> > Hi!
> >
> > I am in the process of writing a parser for the OMIM database (to be submitted to Bioperl).
> >
> > Not all entries in OMIM are linked to a gene/locus; some of them are just diseases without an associated gene, for example the entry for "ABDOMINAL AORTIC ANEURYSM" (100070).
> >
> > Therefore I am not clear what the best output for such a parser might be:
> > Sequence objects (without an actual "sequence string") or annotation objects?
> > If the output consists of sequence objects, entries without an associated gene would have to be ignored.
> >
> > What do you think?
> >
> > Thanks,
> >
> > Christian [czmasek@gnf.org]
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
>
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/ http://www.ebi.ac.uk/mutations/
> _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
> _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> _/ _/ _/ Cambs. CB10 1SD, United Kingdom
> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
>