[Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning
Frank Schwach
fs5 at sanger.ac.uk
Fri Jan 13 21:47:50 UTC 2012
Hi Chris,
Thanks for merging my pull request. I have a version now tha allows
using clone as an option. Would that be of interest too?
Frank
On 11/01/12 21:03, Frank Schwach wrote:
> Great, I'll work on a branch that gives the user the option to use
> clone instead of new and then we can see if we want to use that in the
> end. In the meantime, what do you think about pulling this into
> bioperl-live? When I have some time again I can work on the HOWTO for
> these new features for the BioPerl wiki
>
> Frank
>
>
> On 11/01/12 18:42, Fields, Christopher J wrote:
>> Note that Bio::Root::Root now has a clone() method that one can take
>> advantage of for this purpose; if Storable or Clone is available, it
>> will pick one of the two, preferably Clone over Storable. It's
>> fairly untested, but we haven't run into problems with it yet (I
>> think it was in the last CPAN release).
>>
>> chris
>>
>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote:
>>
>>> Hi Frank,
>>>
>>> Looks great, I like the use of between locations, didn't think of that.
>>>
>>> It was suggested that I avoid using Clone for cat,
>>> trunc_with_features etc. to avoid adding a dependency (which may no
>>> longer be an issue) and because it would cause problems for Bio::Seq
>>> implementations that use a database as the back-end. Maybe you could
>>> add it as an option, but keep the default as is?
>>>
>>> Cheers,
>>> Roy.
>>>
>>> On 11/01/2012 18:16, Frank Schwach wrote:
>>>> Hi Roy and Chris,
>>>>
>>>> I have made the changes to the code now. As you suggested, feature
>>>> ends
>>>> no longer change type and I insert a note instead to inform about the
>>>> deletion (or insertion), showing the length and position.
>>>> I have also added a feature to annotate deletion sites themselves
>>>> (with
>>>> IN-BETWEEN locations).
>>>>
>>>> Roy's test script now prints:
>>>>
>>>> LOCUS seq-accession_number 7 bp dna
>>>> linear UNK
>>>> ACCESSION unknown
>>>> FEATURES Location/Qualifiers
>>>> CDS join(2..3,4..6)
>>>> /note="3bp internal deletion between pos 3
>>>> and 4"
>>>> CDS 2..3
>>>> /note="2bp deleted from feature end"
>>>> misc_feature 3^4
>>>> /note="deletion of 3bp"
>>>> ORIGIN
>>>> 1 aaaaaaa
>>>> //
>>>>
>>>>
>>>> or, if you add strand information (-1 in this case) to the second
>>>> feature:
>>>>
>>>> LOCUS seq-accession_number 7 bp dna
>>>> linear UNK
>>>> ACCESSION unknown
>>>> FEATURES Location/Qualifiers
>>>> CDS join(2..3,4..6)
>>>> /note="3bp internal deletion between pos 3
>>>> and 4"
>>>> CDS complement(2..3)
>>>> /note="2bp deleted from feature 5' end"
>>>> misc_feature 3^4
>>>> /note="deletion of 3bp"
>>>> ORIGIN
>>>> 1 aaaaaaa
>>>> //
>>>>
>>>> I have comitted this along with some bugfixes to my master branch
>>>> on GitHub
>>>> https://github.com/fschwach/bioperl-live
>>>> so it's now also in my existing pull request.
>>>>
>>>> I'm still wondering if cloning the sequence objects rather than
>>>> calling
>>>> 'new' on their respective classes would be an option inside
>>>> 'delete' and
>>>> 'insert'?
>>>> I'm experimenting with this for my own purposes because I have to work
>>>> with custom sub-classes of Bio::Seq which have additional
>>>> attributes and
>>>> therefore set 'can_call_new' to false.
>>>> Without cloning the objects, I first have to convert the custom
>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to
>>>> avoid.
>>>> Is there any reason why something like Clone::Fast should not be
>>>> used in
>>>> this case? It seems to work for me but there may be situations where
>>>> this is going to blow up which I am not aware of.
>>>> Cloning rather than calling new could be made an option in
>>>> Bio::SeqUtils. I have most of the code for that already.
>>>>
>>>> Frank
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 10/01/12 17:31, Roy Chaudhuri wrote:
>>>>> Or without the typo:
>>>>>
>>>>> CDS join(2..3,4..6)
>>>>> /note="3 bp internal deletion"
>>>>> CDS 2..3
>>>>> /note="2 bp deleted from 3' end"
>>>>>
>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>>>>> I think it's me that didn't explain very well - I was talking about
>>>>>> overlapping (rather than spanning) a deletion, although I think
>>>>>> the same
>>>>>> principle applies to the spanning example you gave. Here's some test
>>>>>> code:
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>> use warnings FATAL=>qw(all);
>>>>>> use strict;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::SeqUtils;
>>>>>> use Bio::SeqFeature::Generic;
>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>
>>>>>> -start=>2,
>>>>>> -end=>9));
>>>>>>
>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>
>>>>>> -start=>2,
>>>>>> -end=>5));
>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank');
>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>>>>> print $out $trunc;
>>>>>>
>>>>>>
>>>>>> This currently outputs:
>>>>>> LOCUS seq-accession_number 7 bp dna
>>>>>> linear UNK
>>>>>> ACCESSION unknown
>>>>>> FEATURES Location/Qualifiers
>>>>>> CDS join(2..>3,<4..6)
>>>>>> CDS 2..>3
>>>>>> ORIGIN
>>>>>> 1 aaaaaaa
>>>>>> //
>>>>>>
>>>>>> However, I was suggesting that the feature table should be something
>>>>>> like:
>>>>>> CDS join(2..3,4..6)
>>>>>> /note="3 bp internal deletion"
>>>>>> CDS join(2..3)
>>>>>> /note="2 bp deleted from 3' end"
>>>>>>
>>>>>> Fuzzy locations are intended to represent features which have
>>>>>> boundaries
>>>>>> spanning outside of the sequence. For a defined deletion that's
>>>>>> not the
>>>>>> case, the boundaries of the feature aren't unknown, they have been
>>>>>> specifically altered.
>>>>>>
>>>>>> Hope this is clearer.
>>>>>> Cheers,
>>>>>> Roy.
>>>>>>
>>>>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>>>>> Hi Roy,
>>>>>>>
>>>>>>> Sorry, I hadn't explained that very well: it's not the outer
>>>>>>> boundaries
>>>>>>> of the feature that become fuzzy but the "inner" ones of the split
>>>>>>> locations:
>>>>>>>
>>>>>>> -------------------- a feature's location
>>>>>>> ==========xxxx================= sequence
>>>>>>>
>>>>>>>
>>>>>>> --------- sublocation 1
>>>>>>> -------- sublocation 2
>>>>>>> ===============================
>>>>>>>
>>>>>>> x= sequence to delete
>>>>>>> The feature's location has changed from Simple to Split.
>>>>>>>
>>>>>>> Sublocation 1:
>>>>>>> start is still EXACT and has not changed
>>>>>>> end is now AFTER because this is not a true end of the feature
>>>>>>>
>>>>>>> Sublocation 2:
>>>>>>> start is BEFORE
>>>>>>> end is EXACT (but shifted)
>>>>>>>
>>>>>>> I hope this makes more sense(?)
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>>>>>>> Hi Frank,
>>>>>>>>
>>>>>>>> Looks good to me. One thing I'm not sure about - why do features
>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in
>>>>>>>> trunc_with_features because it's intended to represent a taking a
>>>>>>>> subregion of a larger sequence, but if you're representing an
>>>>>>>> internal
>>>>>>>> deletion then the boundaries of the overlapping feature aren't
>>>>>>>> unknown,
>>>>>>>> they have been specifically altered. Maybe you could give absolute
>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end
>>>>>>>> has been
>>>>>>>> truncated by however many bases.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Roy.
>>>>>>>>
>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>>>>>>> Hi Chris,
>>>>>>>>>
>>>>>>>>> I have made the changes in a Git fork and made the pull
>>>>>>>>> request now.
>>>>>>>>> If this is accepted into BioPerl I can also write a little
>>>>>>>>> SeqUtils
>>>>>>>>> HOWTO for the BioPerl wiki.
>>>>>>>>>
>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a
>>>>>>>>>> fork of the code on Github with a pull request (as you already
>>>>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>>>>
>>>>>>>>>> chris
>>>>>>>>>>
>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>>>>>>> sequence
>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to
>>>>>>>>>>> cut a
>>>>>>>>>>> vector
>>>>>>>>>>> and insert a fragment into it while preserving all the
>>>>>>>>>>> annotations and
>>>>>>>>>>> moving the features accordingly.
>>>>>>>>>>> My main aim was to split features that span deletion/insertion
>>>>>>>>>>> sites in
>>>>>>>>>>> a meaningful way, which can not be done with the currently
>>>>>>>>>>> availble
>>>>>>>>>>> methods.
>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>>>>>>> methods:
>>>>>>>>>>>
>>>>>>>>>>> delete
>>>>>>>>>>> ======
>>>>>>>>>>> removes a segment from a sequence object and adjusts positions
>>>>>>>>>>> and types
>>>>>>>>>>> of locations of sequence features:
>>>>>>>>>>> - locations of features that span the deletion sites are turned
>>>>>>>>>>> into
>>>>>>>>>>> Splits.
>>>>>>>>>>> - locations that extend into the deleted region are turned to
>>>>>>>>>>> Fuzzy to
>>>>>>>>>>> indicate that their true start/end was lost.
>>>>>>>>>>> - locations contained inside the deleted regions are lost.
>>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>>> deletion.
>>>>>>>>>>>
>>>>>>>>>>> insert
>>>>>>>>>>> ======
>>>>>>>>>>> adds a Bio::Seq object into another one between specified
>>>>>>>>>>> insertion
>>>>>>>>>>> sites. This also affects the features on the recipient
>>>>>>>>>>> sequence:
>>>>>>>>>>> - locations of features that span the insertion site are
>>>>>>>>>>> split but
>>>>>>>>>>> position types are not turned to Fuzzy because no part of the
>>>>>>>>>>> original
>>>>>>>>>>> feature is lost.
>>>>>>>>>>> - other features are shifted according to the length of the
>>>>>>>>>>> insertion.
>>>>>>>>>>>
>>>>>>>>>>> ligate
>>>>>>>>>>> ======
>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one
>>>>>>>>>>> or two
>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if
>>>>>>>>>>> required.
>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and
>>>>>>>>>>> insert in
>>>>>>>>>>> turn.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans
>>>>>>>>>>> the
>>>>>>>>>>> origin
>>>>>>>>>>> of a circular molecule but that should be a rare thing to do
>>>>>>>>>>> anyway. The
>>>>>>>>>>> code currently throws an error if this is attempted.
>>>>>>>>>>>
>>>>>>>>>>> I'm happy to contribute the code on Github if there is
>>>>>>>>>>> interest?
>>>>>>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>>>>>>
>>>>>>>>>>> Frank
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
More information about the Bioperl-l
mailing list