[BioRuby] Bio::Sequence forces a DNA sequence to lowercase?

Toshiaki Katayama ktym at hgc.jp
Mon Feb 4 04:54:19 UTC 2008


Hi guys,

Please do not make things complicated.

On 2008/02/03, at 14:21, Naohisa GOTO wrote:
> How do 'translate' conserve case when both upcase and downcase
> characters are mixed?
>  'atg' ==> 'm'
>  'ATG' ==> 'M'
>  'Atg', 'aTg', 'atG' ==> ??? 'm' or 'M'?
>  'ATg', 'AtG', 'aTG' ==> ??? 'm' or 'M'?


Suppose upper case letters in the nucleic acid sequence are the marks of something,

>  'Atg', 'aTg', 'atG' ==> ??? 'm' or 'M'?
>  'ATg', 'AtG', 'aTG' ==> ??? 'm' or 'M'?


these should be 'M'.
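
As a rough sketch of that rule (illustration only: translate_keeping_marks
and the four-codon table below are made up for this mail and are not part
of BioRuby), a codon gives an upper case amino acid as soon as any of its
three bases is upper case:

```ruby
# Hypothetical codon table, deliberately tiny; a real one has 64 entries.
CODONS = { 'atg' => 'm', 'tgg' => 'w', 'aaa' => 'k', 'taa' => '*' }.freeze

# Hypothetical helper, not a BioRuby method.
# Translates codon by codon and upcases the amino acid when the codon
# contains at least one upper case base (i.e. the codon is "marked").
def translate_keeping_marks(na)
  na.scan(/.{3}/).map do |codon|
    aa = CODONS.fetch(codon.downcase, 'x') # 'x' for codons missing from the toy table
    codon =~ /[A-Z]/ ? aa.upcase : aa
  end.join
end

puts translate_keeping_marks('atg')
puts translate_keeping_marks('aTg')
puts translate_keeping_marks('atgAaatgg')
```

A trailing partial codon is simply dropped by the scan in this sketch.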

However, if we change the policy to retain the case of the input sequence,
we need to go over every aspect of the BioRuby functionality which
utilizes the Bio::Sequence object (including restriction enzymes, features,
locations, I/O adapters etc., although many of these would be OK).

And I think it is not worth creating another class, PositionalInformation.
All you would need is an instance variable to store an array of properties.
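
A minimal sketch of what that could look like, assuming one property per
residue. The class name AnnotatedSeq and its methods are hypothetical and
only illustrate the idea; nothing like this exists in Bio::Sequence today:

```ruby
# Hypothetical class, not BioRuby API.
class AnnotatedSeq
  attr_reader :seq, :properties

  def initialize(str, properties = nil)
    @seq = str.downcase
    # Without explicit properties, remember which input positions were upper case.
    @properties = properties ? properties.dup : str.chars.map { |c| c != c.downcase }
    unless @properties.size == @seq.size
      raise ArgumentError, 'one property per residue is required'
    end
  end

  # Delete len residues starting at index start (0-based),
  # removing the matching properties in the same step.
  def delete_at(start, len = 1)
    @seq.slice!(start, len)
    @properties.slice!(start, len)
    self
  end
end

s = AnnotatedSeq.new('aAttGa')
p s.seq
p s.properties
s.delete_at(1)
p s.seq
p s.properties
```

The properties array can hold any object (a flag, a likelihood value), and
an edit such as a deletion touches both the sequence and the array in one
call, so the two cannot drift apart.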

# Also, I think there are many other important things to be done before we tackle this ...

Regards,
Toshiaki Katayama

On 2008/02/03, at 18:43, Pjotr Prins wrote:

>
> On Sun, Feb 03, 2008 at 02:21:21PM +0900, Naohisa GOTO wrote:
>> On Sat, 2 Feb 2008 11:47:41 +0100
>> pjotr2008 at thebird.nl (Pjotr Prins) wrote:
>>
>>> As case contains (external) information to the Sequence class I would
>>> favour 'translate' and 'complement' would conserve case after their
>>> job. I think that is the correct thing to do.
>>
>> How do 'translate' conserve case when both upcase and downcase
>> characters are mixed?
>
> In fact what I meant was to retain case for AA and NA - as converting
> a sequence to lower or upper case makes the assumption the user
> actually wants that, while it may be he wants to retain that
> information.
>
> After some thought I realise that the requirement on the users-end for
> retaining case is really a poor-man's solution for storing positional
> binary information in addition to the sequence itself. He has:
>
> 'aAttGa' rather than ('aattga','010010').
>
> His reason is that later modifications to the sequence - i.e. a
> deletion - he can do in one step, rather than two.
>
> I have been in that position where I wanted to store maximum
> likelihood values with each nucleotide. Any post processing required
> processing two data structures. 
>
> It would be really nice to resolve this in a way where we can create a
> sequence, attach positional information (any object type), and allow
> processing where the positional information gets retained with the NAs
> or AAs. I think a generic solution would be often used if it is
> simple enough.
>
> Jan's proposal for using Bio::Feature was in this case useful, but
> would not hold for information over a full sequence - or would be
> overkill. So perhaps downcasing the sequence is, indeed, the right
> thing to do - but allow adding of positional information.
>
> It is a classic case for an adapter. Perhaps something like:
>
> seq = Bio::Sequence::NA.new(:adapter => Bio::Sequence::RetainCase('aAttGA'))
>
> and for maximum likelihoods
>
> class ML < Bio::Sequence::PositionalInformation
> end
>
> seqml = Bio::Sequence::NA.new('aattga', :adapter => ML([0.1,0.2,0.3,0.2,0.3,0.1]))
>
> where the adapter handles additional methods like:
>
> seqml[1]
>>> 'a'
> seqml.posinfo[1]
>>> 0.2
>
> and in the RetainCase example:
>
> seq[1]
>>> 'a'
> seq.posinfo[1]
>>> 'A'
>
> Pj.
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby




