[BioRuby] GSOC: phyloXML for BioRuby: Mapping sequence

Sun Jun 7 06:28:53 UTC 2009

Hi,

Sorry for the delay.

On Sat, 30 May 2009 17:27:52 -0400
Diana Jaunzeikare <rozziite at gmail.com> wrote:

> Hi all,
> 
> So I looked more carefully at the sequence element of phyloXML, and it
> contains information that cannot be mapped to a Bio::Sequence object. I
> suggest having a sequence class that closely resembles the phyloXML
> structure, plus a method that extracts the relevant elements and returns
> a Bio::Sequence object. What do you think?

In that case, a method to convert a Bio::Sequence into the
phyloXML sequence class is also needed.

If some of the attributes are really essential, are not specific
to phyloXML, and would also be needed for other data types, it is
also possible to add new attributes to Bio::Sequence.

> On the left I listed the phyloXML sequence tag elements, and after the
> arrow (->) the possible corresponding attribute of Bio::Sequence:
> * type
> ** rna, dna  -> Bio::Sequence::NA -> molecule type
> ** aa -> Bio::Sequence::AA
> * id_source (string ?) -> id_namespace
> * id_ref (string ) -> entry_id
> * symbol (string ?)
> * accession
> ** source (example: "UniProtKB") ->
> ** id (example: "P17304") ->  primary_accession
> * name (string )
> * location (string ? )
> * mol_seq (string) -> seq / Bio::Sequence::NA/AA
> * uri
> ** desc (string)
> ** type (string )
> ** uri
> 
> * annotation []
> ** ref
> ** source
> ** evidence
> ** type
> ** desc
> ** confidence
> ** property []
> ** uri
> 
> * domain_architecture
> ** length
> ** domain []
> *** from
> *** to
> *** confidence
> *** id
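A minimal Ruby sketch of the proposed approach, assuming a plain Struct for the phyloXML sequence class. `PhyloXMLSequence`, `to_sequence_attributes` and `from_sequence_attributes` are hypothetical names, not BioRuby API. To stay runnable without the bio gem, it returns a Hash keyed by the Bio::Sequence attribute names from the list above instead of constructing a Bio::Sequence. It omits uri, annotation and domain_architecture, and elements with no counterpart (symbol, name, location, accession source) are dropped, so the round trip is lossy.

```ruby
# Hypothetical class mirroring the phyloXML <sequence> element.
# Member names follow the element names in the list; not an existing BioRuby API.
PhyloXMLSequence = Struct.new(
  :type,             # "dna", "rna" or "aa"
  :id_source,
  :id_ref,
  :symbol,
  :accession_source, # e.g. "UniProtKB"
  :accession_id,     # e.g. "P17304"
  :name,
  :location,
  :mol_seq
) do
  # Extract the elements that have a Bio::Sequence counterpart.
  # Returns a plain Hash keyed by Bio::Sequence attribute names.
  def to_sequence_attributes
    {
      molecule_type:     type,
      id_namespace:      id_source,
      entry_id:          id_ref,
      primary_accession: accession_id,
      seq:               mol_seq,
      # Which sequence class the mol_seq string would become.
      seq_class: case type
                 when "aa" then "Bio::Sequence::AA"
                 when "dna", "rna" then "Bio::Sequence::NA"
                 end
    }
  end

  # Reverse direction: rebuild a phyloXML sequence from such a Hash.
  # symbol, accession_source, name and location cannot be recovered.
  def self.from_sequence_attributes(attrs)
    new(attrs[:molecule_type], attrs[:id_namespace], attrs[:entry_id],
        nil, nil, attrs[:primary_accession], nil, nil, attrs[:seq])
  end
end
```

For example, `PhyloXMLSequence.new("aa", nil, nil, nil, "UniProtKB", "P17304", nil, nil, "MKLV").to_sequence_attributes` (the sequence string is made up) gives `primary_accession: "P17304"` and `seq_class: "Bio::Sequence::AA"`.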

The annotations and domain architecture could be mapped to the features
in Bio::Sequence. However, in some cases they are difficult to map,
depending on the vocabulary used in the annotations/domain_architecture.
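A hedged sketch of mapping domain_architecture domains to feature-like records. `PhyloXMLDomain`, `domains_to_features`, the default feature key "domain" and the qualifier names are assumptions; picking them is exactly the vocabulary problem described above. In BioRuby the analogous object would be a Bio::Feature (a feature key, a position string and qualifiers), but plain Hashes are used here so the sketch runs without the bio gem.

```ruby
# Hypothetical record for one <domain> inside <domain_architecture>.
PhyloXMLDomain = Struct.new(:from, :to, :confidence, :id)

# Map each domain to a Hash shaped like a feature: a feature key,
# a "from..to" position string, and qualifiers. The key and qualifier
# names are illustrative assumptions, not an agreed vocabulary.
def domains_to_features(domains, feature_key: "domain")
  domains.map do |d|
    {
      feature:    feature_key,
      position:   "#{d.from}..#{d.to}",
      # Drop qualifiers whose value is missing.
      qualifiers: { "domain_id" => d.id, "confidence" => d.confidence }.compact
    }
  end
end
```

Making `feature_key` a parameter leaves the vocabulary choice to the caller rather than hard-coding one.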

-- 
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org