[BioRuby] [GSoC][NeXML and RDF API] Sequences( doubts )

Anurag Priyam anurag08priyam at gmail.com
Sat Jul 3 11:13:43 UTC 2010

This is going to be a long mail.

NeXML's characters tag serves as a storage block for sequences. Sequences
can be described in NeXML in two ways: raw (with the seq tag) and granular
(with the cell tags). NeXML offers six kinds of sequences:
1. Protein( AA )
2. DNA
3. RNA
4. Restriction
5. Standard
6. Continuous
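
To illustrate the difference between the two forms, here is a minimal sketch
of turning a granular row into its raw equivalent. It assumes that each cell
names a character (column) and a state, and that states map to symbols; the
ids (c1, s1, ...) and the names STATES, CHAR_ORDER, CELLS and cells_to_raw
are made up for illustration and are not part of the NeXML parser.

```ruby
# Hypothetical state definitions: state id => symbol.
STATES = { "s1" => "A", "s2" => "C", "s3" => "G", "s4" => "T" }.freeze

# Hypothetical column order of the matrix.
CHAR_ORDER = %w[c1 c2 c3 c4].freeze

# Granular form: one entry per cell. Cells may appear in any order.
CELLS = [
  { char: "c2", state: "s2" },
  { char: "c1", state: "s1" },
  { char: "c4", state: "s4" },
  { char: "c3", state: "s3" }
].freeze

# Build the raw string by looking up each column's state symbol in order.
# Raises KeyError if a column has no cell or a state id is unknown.
def cells_to_raw(cells, char_order, states)
  state_by_char = cells.to_h { |cell| [cell[:char], cell[:state]] }
  char_order.map { |char| states.fetch(state_by_char.fetch(char)) }.join
end

puts cells_to_raw(CELLS, CHAR_ORDER, STATES)
```

The raw form is just the joined symbols, which is what the parser currently
hands back as a string.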

As of now, the NeXML parser just returns the sequence as a string. It should
return a Bio::Sequence instead. BioRuby already has classes to work with AA
and NA sequences. I was thinking of adding classes to represent Restriction,
Standard and Continuous sequences. Should I add support for these as core
BioRuby classes, or just as part of the NeXML lib? I will have to adapt the
Bio::Sequence class to recognize the new sequences.
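
As a rough sketch of what such classes could look like, assuming Continuous
sequences are whitespace-separated numbers, Standard sequences are
whitespace-separated state tokens, and Restriction sequences are strings of
0 and 1: none of these classes exist in BioRuby, the module name NexmlSketch
and the type labels are hypothetical, and the other types simply fall back
to the plain string here, where a real version would return the existing
BioRuby AA and NA classes.

```ruby
# Hypothetical sketch; none of these classes are part of BioRuby.
module NexmlSketch
  # Continuous characters: a list of floating point values.
  class ContinuousSequence
    attr_reader :values

    def initialize(raw)
      @values = raw.split.map { |token| Float(token) }
    end
  end

  # Standard characters: a list of opaque state tokens.
  class StandardSequence
    attr_reader :states

    def initialize(raw)
      @states = raw.split
    end
  end

  # Restriction characters: presence (1) or absence (0) of a site.
  class RestrictionSequence
    attr_reader :sites

    def initialize(raw)
      compact = raw.gsub(/\s+/, "")
      unless compact.match?(/\A[01]*\z/)
        raise ArgumentError, "restriction sequence must contain only 0 and 1"
      end
      @sites = compact.chars.map { |c| c == "1" }
    end
  end

  # Assumed type labels, chosen for this sketch only.
  SEQUENCE_CLASSES = {
    "Continuous"  => ContinuousSequence,
    "Standard"    => StandardSequence,
    "Restriction" => RestrictionSequence
  }.freeze

  # Return a typed object for the new kinds, or the raw string otherwise.
  def self.build(type, raw)
    klass = SEQUENCE_CLASSES[type]
    klass ? klass.new(raw) : raw
  end
end

p NexmlSketch.build("Continuous", "0.5 1.25").values
p NexmlSketch.build("Restriction", "0101").sites
```

Keeping the lookup in one table means the same dispatch would work whether
the classes end up in the BioRuby core or inside the NeXML lib.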

Why does the Bio::Sequence#guess method use a threshold of roughly 90% to
distinguish between AA and NA sequences? Why not use a regexp instead?
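
To make the question concrete, here is a minimal sketch of the two
approaches. It is a simplified stand-in and not BioRuby's actual code: the
method names guess_by_threshold and guess_by_regexp are hypothetical, and
the threshold version only assumes that a string counts as NA when more than
90% of its non-N characters are A, T, G, C or U.

```ruby
# Threshold-style guess: NA if the share of A/T/G/C/U among the
# non-N characters exceeds the threshold, otherwise AA.
def guess_by_threshold(str, threshold = 0.9)
  bases = str.count("ATGCUatgcu")
  total = str.length - str.count("Nn")
  return :aa if total.zero?
  bases.to_f / total > threshold ? :na : :aa
end

# Strict regexp guess: NA only if every character is A, T, G, C, U or N.
def guess_by_regexp(str)
  str.match?(/\A[ATGCUN]+\z/i) ? :na : :aa
end

# A nucleotide string with one ambiguity code (R).
ambiguous = "ATGCATGCATGCATGCATGR"
# A short peptide.
peptide = "MKVLAAGIVGLLLAQ"

p guess_by_threshold(ambiguous) # 19 of 20 characters are bases
p guess_by_regexp(ambiguous)    # the single R makes the strict match fail
p guess_by_threshold(peptide)
p guess_by_regexp(peptide)
```

The two versions agree on the peptide but disagree on the nucleotide string
with a single ambiguity code, so one possible reason for the threshold is
tolerance of such characters. That is only a guess on my part.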

Anurag Priyam,
2nd Year Undergraduate,
Department of Mechanical Engineering,
IIT Kharagpur.
