[BioRuby] Bio::Sequence forces a DNA sequence to lowercase?

Pjotr Prins pjotr2008 at thebird.nl
Sun Feb 3 09:43:49 UTC 2008


On Sun, Feb 03, 2008 at 02:21:21PM +0900, Naohisa GOTO wrote:
> On Sat, 2 Feb 2008 11:47:41 +0100
> pjotr2008 at thebird.nl (Pjotr Prins) wrote:
> 
> > As case contains (external) information to the Sequence class I would
> > favour 'translate' and 'complement' would conserve case after their
> > job. I think that is the correct thing to do.
> 
> How do 'translate' conserve case when both upcase and downcase
> characters are mixed?

What I actually meant was to retain case for both AA and NA sequences.
Converting a sequence to lower or upper case assumes that this is what
the user wants, while he may in fact want to keep that information.

After some thought I realise that the requirement on the user's end for
retaining case is really a poor man's solution for storing positional
binary information in addition to the sequence itself. He has:

'aAttGa' rather than ('aattga','010010').

His reason is that later modifications to the sequence, e.g. a
deletion, can then be done in one step rather than two.
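
To make the one-step versus two-step point concrete, here is a small
plain-Ruby sketch. It does not use BioRuby, and the helper names
split_case and join_case are made up for this illustration only:

```ruby
# Split a mixed-case sequence into lowercase bases plus a '0'/'1' case mask.
# (Hypothetical helper, not a BioRuby method.)
def split_case(seq)
  mask = seq.chars.map { |c| c =~ /[A-Z]/ ? '1' : '0' }.join
  [seq.downcase, mask]
end

# Recombine bases and mask into the mixed-case string.
# (Hypothetical helper, not a BioRuby method.)
def join_case(bases, mask)
  bases.chars.zip(mask.chars).map { |b, m| m == '1' ? b.upcase : b }.join
end

# One structure: a single deletion at index 2.
mixed = 'aAttGa'.dup
mixed.slice!(2)

# Two structures: the same deletion has to be applied to both.
bases, mask = split_case('aAttGa')   # 'aattga' and '010010'
bases.slice!(2)
mask.slice!(2)

puts mixed                    # result of the one-step deletion
puts join_case(bases, mask)   # same string, as long as both were kept in sync
```

Forgetting the second slice! leaves the mask one position longer than
the sequence, which is exactly the bookkeeping the user is trying to
avoid.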

I have been in that position myself, when I wanted to store a maximum
likelihood value with each nucleotide. Any post-processing then
required processing two data structures.

It would be really nice to resolve this in a way where we can create a
sequence, attach positional information (any object type), and allow
processing where the positional information is retained with the NAs
or AAs. I think a generic solution would be used often if it is
simple enough.

Jan's proposal of using Bio::Feature was useful in this case, but
would not hold for information over a full sequence - or would be
overkill. So perhaps downcasing the sequence is, indeed, the right
thing to do - as long as positional information can be added.

It is a classic case for an adapter. Perhaps something like:

seq = Bio::Sequence::NA.new(:adapter => Bio::Sequence::RetainCase.new('aAttGA'))

and for maximum likelihoods

class ML < Bio::Sequence::PositionalInformation
end

seqml = Bio::Sequence::NA.new('aattga', :adapter => ML.new([0.1,0.2,0.3,0.2,0.3,0.1]))

where the adapter handles additional methods like:

seqml[1]
 >> 'a'
seqml.posinfo[1]
 >> 0.2

and in the RetainCase example:

seq[1]
 >> 'a'
seq.posinfo[1]
 >> 'A'
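
As a rough illustration of how such an adapter could behave, here is a
self-contained sketch in plain Ruby. The class PositionalSequence and
its methods are hypothetical stand-ins invented for this example; they
are not part of BioRuby, and a real implementation would have to hook
into Bio::Sequence::NA and AA instead:

```ruby
# Hypothetical stand-in for the proposed adapter; not part of BioRuby.
# Keeps a sequence string and one annotation object per position.
class PositionalSequence
  attr_reader :seq, :posinfo

  def initialize(seq, posinfo)
    raise ArgumentError, 'length mismatch' unless seq.length == posinfo.length
    @seq = seq.dup
    @posinfo = posinfo.dup
  end

  # Build from a mixed-case string: store lowercase residues and keep
  # the original characters as the positional information.
  def self.retain_case(mixed)
    new(mixed.downcase, mixed.chars)
  end

  # Residue at a zero-based index.
  def [](index)
    @seq[index]
  end

  # Delete one position from the sequence and its annotation together.
  def delete_at(index)
    @seq.slice!(index)
    @posinfo.delete_at(index)
    self
  end

  # Reverse both, so annotations stay attached to their residues.
  def reverse
    self.class.new(@seq.reverse, @posinfo.reverse)
  end
end

seqml = PositionalSequence.new('aattga', [0.1, 0.2, 0.3, 0.2, 0.3, 0.1])
seqml.delete_at(0)          # one call updates both structures
puts seqml.seq
p seqml.posinfo

seq = PositionalSequence.retain_case('aAttGA')
puts seq[1]
puts seq.posinfo[1]
```

The point of the design is that every operation that changes positions
(deletion, reversal, slicing) goes through one object, so the
annotation cannot drift out of step with the sequence.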

Pj.



