[Bioperl-l] OMIM parser
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Fri, 12 Jul 2002 11:57:54 +0100
Chris,
From the sequence point of view, OMIM is annotation. On its own, it contains
the best description of human phenotype that we have.
Last year Jason and I were brainstorming during ISMB. The results are in the
bioperl file models/maps_and_markers.dia. It is already a bit outdated. Maps
and markers all ended up in Bio::Map, but we also outlined a Bio::Organism
namespace where Phenotype is one component. So, I'd suggest Bio::Phenotype
or, even better, Bio::Organism::Phenotype::OMIM.
OMIM phenotypes are quite generic, but in practice they are associated with
sequences and individuals. We'll need Bio::Organism::Individual, which could
have more than one subphenotype; together these form that person's phenotype.
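To make the individual/subphenotype relationship concrete, here is a minimal
sketch in Python rather than Perl. The class and method names (Phenotype,
Individual, add_subphenotype, phenotype_ids) are hypothetical illustrations,
not existing BioPerl classes, and the second phenotype is a made-up
placeholder rather than a real OMIM record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Phenotype:
    """A single phenotype description (hypothetical class)."""
    source: str       # where the description comes from, e.g. "OMIM"
    identifier: str   # identifier within that source
    name: str


@dataclass
class Individual:
    """A person whose overall phenotype is the set of subphenotypes."""
    label: str
    subphenotypes: List[Phenotype] = field(default_factory=list)

    def add_subphenotype(self, phenotype: Phenotype) -> None:
        # Ignore exact duplicates so each subphenotype is stored once.
        if phenotype not in self.subphenotypes:
            self.subphenotypes.append(phenotype)

    def phenotype_ids(self) -> List[str]:
        # The combined phenotype, reported as "source:identifier" strings.
        return sorted(f"{p.source}:{p.identifier}" for p in self.subphenotypes)


aneurysm = Phenotype("OMIM", "100070", "ABDOMINAL AORTIC ANEURYSM")
placeholder = Phenotype("local", "note-1", "placeholder second phenotype")

person = Individual("individual-1")
person.add_subphenotype(aneurysm)
person.add_subphenotype(placeholder)
person.add_subphenotype(aneurysm)  # duplicate, not stored again
```

Keeping the phenotype separate from the individual means the same generic
OMIM description can be attached to many people without being copied.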
The important thing to remember about OMIM is that it is not a database in
the rigorous sense. It is a loosely structured collection of free text and
various other structures, much looser than the usual semistructured
biological databases:
- ID
- Name(s)
- Keywords
- Summary
- Main text
- Mutations (Bio::Variation)
-- ID
-- keywords including mutation description
-- free text
- Crossreferences (Bio::Annotation::DBLink)
- References (Bio::Biblio or Bio::Annotation::Reference)
- Contributors & History
- it implies Species (Bio::Species)
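The outline above could be captured as a plain record type. The following is
only a sketch in Python rather than Perl; the class and field names are
hypothetical and simply mirror the list, they are not an existing BioPerl
API. The default species is an assumption based on OMIM describing human
phenotype.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mutation:
    """One mutation listed under an entry (hypothetical class)."""
    mutation_id: str
    keywords: List[str] = field(default_factory=list)  # incl. mutation description
    text: str = ""                                      # free text


@dataclass
class OmimEntry:
    """One OMIM entry; every part except the ID may be empty."""
    entry_id: str
    names: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    summary: str = ""
    main_text: str = ""
    mutations: List[Mutation] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    contributors: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)
    species: str = "Homo sapiens"  # assumption: implied, never stated per entry


# An entry with only an ID and a name is still a valid record.
entry = OmimEntry(entry_id="100070", names=["ABDOMINAL AORTIC ANEURYSM"])
```

Because every field other than the ID has an empty default, a parser can
fill in only the parts it understands and leave the rest for later.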
I am not saying you have to parse and write out everything, but at least try
to keep the bigger picture in mind and future options open.
Good luck,
-Heikki
Chris Zmasek wrote:
> Hi!
>
> I am in the process of writing a parser for the OMIM database (to be submitted to Bioperl).
>
> Not all entries in OMIM are linked to a gene/locus; some of them are just diseases without an associated gene, for example the entry for "ABDOMINAL AORTIC ANEURYSM" (100070).
>
> Therefore I am not clear what the best output for such a parser might be:
> Sequence objects (without an actual "sequence string") or annotation objects?
> If the output consists of sequence objects, entries without an associated gene would have to be ignored.
>
> What do you think?
>
> Thanks,
>
> Christian [czmasek@gnf.org]
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________