[Bioperl-l] SimpleAlign - get_seq_by_id

Sat Oct 25 17:31:53 UTC 2008

>
> It would make sense to also look into implementing a Gblocks-style filtering
> method (but not in SimpleAlign, given the number of methods it already has,
> as you mention!).

Yes, that would be really useful :-)

Some aligners, such as T-Coffee, can also produce a score matrix for a given
MSA. This matrix of alignment "quality" scores could be used to filter the
alignment against a threshold not only vertically (columns) but also
horizontally (rows, e.g., mispredicted coding exons that don't align well to
the rest of the proteins). I think there is growing interest in meta-aligners
nowadays, i.e., methods that combine the MSAs produced by a set of aligners
into a final MSA together with a score matrix of alignment quality:

http://nar.oxfordjournals.org/cgi/content/full/34/6/1692

It would be very interesting to have methods for this kind of alignment
filtering in BioPerl, so that one could call them on a $sa object and get
filtered alignments for given thresholds.
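A minimal sketch of the two-way filtering described above, written in Python for brevity rather than as BioPerl code. It assumes a hypothetical `filter_alignment()` helper that takes the aligned sequences plus a score matrix of the same shape (one numeric score per residue, higher meaning more reliable), first drops rows whose mean score falls below one threshold, then drops columns whose mean score over the remaining rows falls below another. Using the mean, and filtering rows before columns, are illustrative assumptions, not how T-Coffee or any meta-aligner actually scores or filters.

```python
from statistics import mean


def filter_alignment(seqs, scores, row_min, col_min):
    """Hypothetical helper: filter an MSA using a per-residue score matrix.

    seqs    -- list of equal-length aligned sequences (strings)
    scores  -- list of lists, one numeric score per residue (same shape as seqs)
    row_min -- drop sequences whose mean score is below this value
    col_min -- drop columns whose mean score (over kept rows) is below this value
    """
    if len(seqs) != len(scores) or any(len(s) != len(r) for s, r in zip(seqs, scores)):
        raise ValueError("sequences and score matrix must have the same shape")

    # Horizontal filtering: remove poorly aligned sequences (rows).
    kept = [i for i, row in enumerate(scores) if mean(row) >= row_min]
    if not kept:
        return []

    # Vertical filtering: remove unreliable columns, judged on the kept rows only.
    ncols = len(seqs[0])
    cols = [c for c in range(ncols) if mean(scores[i][c] for i in kept) >= col_min]

    return ["".join(seqs[i][c] for c in cols) for i in kept]


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TTTT"]
    scores = [[9, 9, 2, 9], [9, 8, 1, 9], [1, 1, 1, 1]]
    # The third row (mean 1.0) and the third column (mean 1.5) are dropped.
    print(filter_alignment(seqs, scores, 5, 5))  # ['ACT', 'ACA']
```

Filtering rows first means a single badly aligned sequence cannot drag down otherwise good columns; the opposite order is an equally valid design choice.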

> -jason
>
> On Oct 24, 2008, at 3:05 AM, Heikki Lehvaslaiho wrote:
>
>> Spoke too soon: each_seq_with_id() already exists. Is there really a need
>> for get_seq_by_id()?
>>
>> A more general observation: Bio::SimpleAlign, with its 83 methods, has grown
>> too big to keep all the code (3055 lines total) in one file. Any volunteers
>> to break it up into more manageable chunks?
>>
>> The methods in the current file have already been categorised, which should
>> help in the task:
>>
>> =head1 Modifier methods
>> =head1 Sequence selection methods
>> =head1 Create new alignments
>> =head1 Change sequences within the MSA
>> =head1 MSA attributes
>> =head1 Alignment descriptors
>> =head1 Alignment positions
>> =head1 Sequence names
>>
>> The helper modules should go into Bio::Align name space.
>>
>>
>>  -Heikki
>>
>>
>> On Friday 24 October 2008 08:32:49 Heikki Lehvaslaiho wrote:
>>
>>> The main reason it has not been in Bio::SeqAlign is that a sequence ID is
>>> not necessarily a unique identifier in an MSA: multiple regions of the
>>> sequence defined by one ID can be present in one alignment.
>>>
>>> The current code returns only the first Bio::LocatableSeqI object with
>>> that ID, which is more or less randomly selected. Should we make it
>>> context sensitive and return a list of sequences in list context?
>>>
>>> That brings up another question: after the change, get_seq_by_id() will
>>> behave differently from all other instances of that method, so should it
>>> be renamed to reflect that?
>>>
>>>     -Heikki
>>>
>>> On Thursday 23 October 2008 21:29:20 Jason Stajich wrote:
>>>
>>>> I added get_seq_by_id to Bio::SimpleAlign to allow retrieval of a
>>>> particular sequence from the alignment by ID. Not sure why this didn't
>>>> exist before.
>>>>
>>>> -jason
>>>> --
>>>> Jason Stajich
>>>> jason at bioperl.org
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>> --
>> ______ _/      _/_____________________________________________________
>>     _/      _/
>>    _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>   _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>>  _/  _/  _/  SANBI, South African National Bioinformatics Institute
>>  _/  _/  _/  University of Western Cape, South Africa
>>    _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
>> ___ _/_/_/_/_/________________________________________________________
>>
>
> Jason Stajich
> jason at bioperl.org
>
>
>