[Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning

Frank Schwach fs5 at sanger.ac.uk
Tue Jan 10 22:35:46 UTC 2012


Hi Roy,

I see what you mean and I had the same thought, but somehow I liked the
fuzzy locations more because they suggest to me that the feature is not
complete (anymore). But I do take your point that this is not the
intended use of this location type. I can add notes as you suggest, but I
guess I should also add a misc_feature "deletion" (in your example,
between bases 3 and 4) to make it clearer that something has happened to
the feature.
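
Roughly, the bookkeeping I have in mind looks like the sketch below. It is
plain Perl rather than the actual Bio::SeqUtils code: adjust_for_deletion
and deletion_site are made-up helper names, only forward-strand features
are handled, and writing the marker as /note="deletion" at the
between-bases location 3^4 is just an assumption about how it could be
rendered.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqUtils): recompute a forward-strand
# feature's coordinates after deleting bases $del_start..$del_end (1-based,
# inclusive). Returns nothing if the feature lies entirely inside the
# deletion, otherwise a hash ref with a location string and an optional note.
sub adjust_for_deletion {
    my ($start, $end, $del_start, $del_end) = @_;
    my $len = $del_end - $del_start + 1;

    # Entirely upstream of the deletion: nothing changes.
    return { location => "$start..$end" } if $end < $del_start;

    # Entirely downstream: shift left by the deleted length.
    return { location => ($start - $len) . '..' . ($end - $len) }
        if $start > $del_end;

    # Entirely inside the deleted region: the feature is lost.
    return if $start >= $del_start && $end <= $del_end;

    # Spans the whole deletion: split into two exact sublocations.
    if ($start < $del_start && $end > $del_end) {
        return {
            location => sprintf('join(%d..%d,%d..%d)',
                                $start, $del_start - 1, $del_start, $end - $len),
            note     => "$len bp internal deletion",
        };
    }

    # Right-hand end runs into the deletion: trim it back.
    if ($start < $del_start) {
        my $lost = $end - $del_start + 1;
        return {
            location => $start . '..' . ($del_start - 1),
            note     => "$lost bp deleted from 3' end",
        };
    }

    # Left-hand end starts inside the deletion: trim it and shift.
    my $lost = $del_end - $start + 1;
    return {
        location => $del_start . '..' . ($end - $len),
        note     => "$lost bp deleted from 5' end",
    };
}

# Hypothetical helper: between-bases location marking where the deletion
# happened. Assumes the deletion does not start at base 1.
sub deletion_site {
    my ($del_start) = @_;
    return ($del_start - 1) . '^' . $del_start;
}

# Roy's example: two CDS features (2..9 and 2..5), deleting bases 4..6.
for my $f ([2, 9], [2, 5]) {
    my $r = adjust_for_deletion(@$f, 4, 6) or next;
    printf "     CDS             %s\n", $r->{location};
    printf "                     /note=\"%s\"\n", $r->{note} if $r->{note};
}
printf "     misc_feature    %s\n", deletion_site(4);
print  "                     /note=\"deletion\"\n";
```

For a feature on the reverse strand the 5'/3' wording in the notes would
have to be swapped, which this sketch does not attempt.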

Frank



On 10/01/12 17:27, Roy Chaudhuri wrote:
> I think it's me that didn't explain very well - I was talking about 
> overlapping (rather than spanning) a deletion, although I think the 
> same principle applies to the spanning example you gave. Here's some 
> test code:
>
> #!/usr/bin/perl
> use warnings FATAL=>qw(all);
> use strict;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::SeqUtils;
> use Bio::SeqFeature::Generic;
> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>                                                    -start=>2,
>                                                    -end=>9));
>
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>                                                    -start=>2,
>                                                    -end=>5));
> my $out=Bio::SeqIO->newFh(-format=>'genbank');
> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
> print $out $trunc;
>
>
> This currently outputs:
> LOCUS       seq-accession_number            7 bp    dna     linear   UNK
> ACCESSION   unknown
> FEATURES             Location/Qualifiers
>      CDS             join(2..>3,<4..6)
>      CDS             2..>3
> ORIGIN
>         1 aaaaaaa
> //
>
> However, I was suggesting that the feature table should be something 
> like:
> CDS             join(2..3,4..6)
>                 /note="3 bp internal deletion"
> CDS             join(2..3)
>                 /note="2 bp deleted from 3' end"
>
> Fuzzy locations are intended to represent features which have 
> boundaries spanning outside of the sequence. For a defined deletion 
> that's not the case: the boundaries of the feature aren't unknown, 
> they have been specifically altered.
>
> Hope this is clearer.
> Cheers,
> Roy.
>
> On 10/01/2012 16:47, Frank Schwach wrote:
>> Hi Roy,
>>
>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>> of the feature that become fuzzy but the "inner" ones of the split
>> locations:
>>
>>   --------------------           a feature's location
>> ==========xxxx================= sequence
>>
>>
>>   ---------                     sublocation 1
>>            --------             sublocation 2
>> ===============================
>>
>> x= sequence to delete
>> The feature's location has changed from Simple to Split.
>>
>> Sublocation 1:
>> start is still EXACT and has not changed
>> end is now AFTER because this is not a true end of the feature
>>
>> Sublocation 2:
>> start is BEFORE
>> end is EXACT (but shifted)
>>
>> I hope this makes more sense(?)
>>
>> Cheers,
>>
>> Frank
>>
>>
>>
>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>> Hi Frank,
>>>
>>> Looks good to me. One thing I'm not sure about - why do features
>>> overlapping a deletion become fuzzy? That behaviour is in
>>> trunc_with_features because it's intended to represent taking a
>>> subregion of a larger sequence, but if you're representing an internal
>>> deletion then the boundaries of the overlapping feature aren't unknown,
>>> they have been specifically altered. Maybe you could give absolute
>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>> truncated by however many bases.
>>>
>>> Cheers,
>>> Roy.
>>>
>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>> Hi Chris,
>>>>
>>>> I have made the changes in a Git fork and made the pull request now.
>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>> HOWTO for the BioPerl wiki.
>>>>
>>>> Frank
>>>>
>>>>
>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>> Sounds very promising!  The easiest way to contribute is via a 
>>>>> fork of the code on Github with a pull request (as you already 
>>>>> know, being a contributor to the Primer3 modules).
>>>>>
>>>>> chris
>>>>>
>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence
>>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector
>>>>>> and insert a fragment into it while preserving all the annotations and
>>>>>> moving the features accordingly.
>>>>>> My main aim was to split features that span deletion/insertion sites in
>>>>>> a meaningful way, which cannot be done with the currently available
>>>>>> methods.
>>>>>> I have modified Bio::SeqUtils so that I have the following new methods:
>>>>>>
>>>>>> delete
>>>>>> ======
>>>>>> removes a segment from a sequence object and adjusts positions and types
>>>>>> of locations of sequence features:
>>>>>> - locations of features that span the deletion sites are turned into
>>>>>> Splits.
>>>>>> - locations that extend into the deleted region are turned to Fuzzy to
>>>>>> indicate that their true start/end was lost.
>>>>>> - locations contained inside the deleted regions are lost.
>>>>>> - other features are shifted according to the length of the deletion.
>>>>>>
>>>>>> insert
>>>>>> ======
>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>> - locations of features that span the insertion site are split but
>>>>>> position types are not turned to Fuzzy because no part of the original
>>>>>> feature is lost.
>>>>>> - other features are shifted according to the length of the insertion.
>>>>>>
>>>>>> ligate
>>>>>> ======
>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>> Simply calls delete [, reverse_complement_with_features] and insert in
>>>>>> turn.
>>>>>>
>>>>>>
>>>>>> One situation I haven't handled yet is a deletion that spans the origin
>>>>>> of a circular molecule but that should be a rare thing to do anyway. The
>>>>>> code currently throws an error if this is attempted.
>>>>>>
>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>
>>>>>> Frank
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


