[Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning
Frank Schwach
fs5 at sanger.ac.uk
Tue Jan 10 22:35:46 UTC 2012
Hi Roy,
I see what you mean and I had the same thought, but somehow I liked the
fuzzy locations more because they suggest to me that the feature is not
complete (anymore). But I do take your point that this is not the
intended use of this location type. I can add notes as you suggest, but I
guess I should also add a misc_feature "deletion", in your example
between bases 3 and 4, to make it clearer that something has happened to
the feature.
Frank
On 10/01/12 17:27, Roy Chaudhuri wrote:
> I think it's me that didn't explain very well - I was talking about
> overlapping (rather than spanning) a deletion, although I think the
> same principle applies to the spanning example you gave. Here's some
> test code:
>
> #!/usr/bin/perl
> use warnings FATAL=>qw(all);
> use strict;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::SeqUtils;
> use Bio::SeqFeature::Generic;
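> # A 10 bp test sequence with two CDS features
> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
> # CDS 2..9 spans the region deleted below
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>     -start=>2,
>     -end=>9));
> # CDS 2..5 loses its 3' end to the deletion
> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>     -start=>2,
>     -end=>5));
> # Delete bases 4..6 and write the result as GenBank to STDOUT
> my $out=Bio::SeqIO->newFh(-format=>'genbank');
> my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
> print $out $trunc;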
>
>
> This currently outputs:
> LOCUS seq-accession_number 7 bp dna linear UNK
> ACCESSION unknown
> FEATURES Location/Qualifiers
> CDS join(2..>3,<4..6)
> CDS 2..>3
> ORIGIN
> 1 aaaaaaa
> //
>
> However, I was suggesting that the feature table should be something
> like:
> CDS join(2..3,4..6)
> /note="3 bp internal deletion"
> CDS 2..3
> /note="2 bp deleted from 3' end"
>
> Fuzzy locations are intended to represent features whose boundaries
> extend beyond the ends of the sequence. For a defined deletion that's
> not the case: the boundaries of the feature aren't unknown, they have
> been specifically altered.
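Roy's alternative uses exact coordinates plus a note instead of fuzzy ends. It can be sketched in plain Perl. This is an illustration under stated assumptions only. `adjust_for_deletion` is a hypothetical helper, not part of Bio::SeqUtils. It works on 1-based, inclusive integer coordinates rather than Bio::Location objects. It also assumes forward-strand features; on the reverse strand the 5'/3' labels in the notes would swap.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SeqUtils. Takes a forward-strand
# feature $fs..$fe and a deletion $ds..$de (1-based, inclusive integers) and
# returns (location, note). Returns an empty list if the feature is lost.
sub adjust_for_deletion {
    my ($fs, $fe, $ds, $de) = @_;
    my $len = $de - $ds + 1;    # number of bases removed

    # Wholly upstream of the deletion: unchanged, no note
    return ("$fs..$fe", undef) if $fe < $ds;

    # Wholly downstream: shifted left by the deletion length, no note
    return (join('..', $fs - $len, $fe - $len), undef) if $fs > $de;

    # Wholly inside the deleted region: the feature is lost
    return if $fs >= $ds && $fe <= $de;

    # Spans the deletion: two exact segments plus a note
    if ($fs < $ds && $fe > $de) {
        my $loc = 'join(' . join(',', "$fs.." . ($ds - 1),
                                      "$ds.." . ($fe - $len)) . ')';
        return ($loc, "$len bp internal deletion");
    }

    # Deletion removes the right-hand (3') end of the feature
    if ($fs < $ds) {
        my $lost = $fe - $ds + 1;
        return ("$fs.." . ($ds - 1), "$lost bp deleted from 3' end");
    }

    # Otherwise it removes the left-hand (5') end
    my $lost = $de - $fs + 1;
    return ("$ds.." . ($fe - $len), "$lost bp deleted from 5' end");
}

# Roy's test case: CDS 2..9 and CDS 2..5, bases 4..6 deleted
for my $f ([2, 9], [2, 5]) {
    my ($loc, $note) = adjust_for_deletion(@$f, 4, 6);
    print "CDS $loc\n";
    print "    /note=\"$note\"\n" if defined $note;
}
```

For Roy's test case this prints `join(2..3,4..6)` with a "3 bp internal deletion" note, and a plain `2..3` with a "2 bp deleted from 3' end" note.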
>
> Hope this is clearer.
> Cheers,
> Roy.
>
> On 10/01/2012 16:47, Frank Schwach wrote:
>> Hi Roy,
>>
>> Sorry, I hadn't explained that very well: it's not the outer boundaries
>> of the feature that become fuzzy but the "inner" ones of the split
>> locations:
>>
>>  -------------------- a feature's location
>> ==========xxxx================= sequence
>>
>>
>>  --------- sublocation 1
>>           -------- sublocation 2
>> ===============================
>>
>> x= sequence to delete
>> The feature's location has changed from Simple to Split.
>>
>> Sublocation 1:
>> start is still EXACT and has not changed
>> end is now AFTER because this is not a true end of the feature
>>
>> Sublocation 2:
>> start is BEFORE
>> end is EXACT (but shifted)
>>
>> I hope this makes more sense(?)
>>
>> Cheers,
>>
>> Frank
>>
>>
>>
>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote:
>>> Hi Frank,
>>>
>>> Looks good to me. One thing I'm not sure about - why do features
>>> overlapping a deletion become fuzzy? That behaviour is in
>>> trunc_with_features because it's intended to represent taking a
>>> subregion of a larger sequence, but if you're representing an internal
>>> deletion then the boundaries of the overlapping feature aren't unknown,
>>> they have been specifically altered. Maybe you could give absolute
>>> coordinates, but add a note indicating that the 5' or 3' end has been
>>> truncated by however many bases.
>>>
>>> Cheers,
>>> Roy.
>>>
>>> On 10/01/2012 13:10, Frank Schwach wrote:
>>>> Hi Chris,
>>>>
>>>> I have made the changes in a Git fork and made the pull request now.
>>>> If this is accepted into BioPerl I can also write a little SeqUtils
>>>> HOWTO for the BioPerl wiki.
>>>>
>>>> Frank
>>>>
>>>>
>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote:
>>>>> Sounds very promising! The easiest way to contribute is via a
>>>>> fork of the code on Github with a pull request (as you already
>>>>> know, being a contributor to the Primer3 modules).
>>>>>
>>>>> chris
>>>>>
>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I needed to manipulate Bio::Seq objects with annotations and
>>>>>> sequence features to simulate molecular cloning techniques, e.g. to
>>>>>> cut a vector and insert a fragment into it while preserving all the
>>>>>> annotations and moving the features accordingly.
>>>>>> My main aim was to split features that span deletion/insertion sites
>>>>>> in a meaningful way, which cannot be done with the currently
>>>>>> available methods.
>>>>>> I have modified Bio::SeqUtils so that I have the following new
>>>>>> methods:
>>>>>>
>>>>>> delete
>>>>>> ======
>>>>>> removes a segment from a sequence object and adjusts the positions
>>>>>> and types of the locations of its sequence features:
>>>>>> - locations of features that span the deletion site are turned into
>>>>>> Splits.
>>>>>> - locations that extend into the deleted region are turned into Fuzzy
>>>>>> locations to indicate that their true start/end was lost.
>>>>>> - locations contained inside the deleted region are lost.
>>>>>> - other features are shifted according to the length of the deletion.
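The delete rules above can be mimicked on plain integers to see which location strings they produce. This is a minimal sketch with several assumptions. Coordinates are 1-based and inclusive, and each feature has a single simple location. `location_after_delete` is a hypothetical name, not the Bio::SeqUtils code, which operates on Bio::Location objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the delete rules listed above on plain
# 1-based, inclusive integers. Returns a GenBank-style location string,
# or nothing if the feature lies entirely inside the deletion.
sub location_after_delete {
    my ($fs, $fe, $ds, $de) = @_;
    my $len = $de - $ds + 1;    # number of bases removed

    return "$fs..$fe" if $fe < $ds;                             # upstream: unchanged
    return join('..', $fs - $len, $fe - $len) if $fs > $de;    # downstream: shifted
    return if $fs >= $ds && $fe <= $de;                         # inside: lost

    # Boundaries created by the cut are fuzzy: '>' (AFTER) on the end of the
    # piece left of the deletion, '<' (BEFORE) on the start of the piece right of it
    my @parts;
    push @parts, "$fs..>" . ($ds - 1) if $fs < $ds;
    push @parts, "<$ds.." . ($fe - $len) if $fe > $de;

    # Two pieces make a Split location; one piece stays a single range
    return @parts > 1 ? 'join(' . join(',', @parts) . ')' : $parts[0];
}

# Roy's test case: CDS 2..9 and CDS 2..5, bases 4..6 deleted
print location_after_delete(@$_, 4, 6), "\n" for [2, 9], [2, 5];
# join(2..>3,<4..6)
# 2..>3
```

These two strings are the same locations that appear in the GenBank output of Roy's test script.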
>>>>>>
>>>>>> insert
>>>>>> ======
>>>>>> adds a Bio::Seq object into another one between specified insertion
>>>>>> sites. This also affects the features on the recipient sequence:
>>>>>> - locations of features that span the insertion site are split, but
>>>>>> their position types are not turned to Fuzzy because no part of the
>>>>>> original feature is lost.
>>>>>> - other features are shifted according to the length of the
>>>>>> insertion.
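The insertion rules above can be sketched on plain integers too. This is a hypothetical helper working on 1-based, inclusive coordinates. It assumes the fragment goes in between base `$site` and base `$site + 1` of the recipient. It covers only the recipient's own features, not those carried over from the inserted fragment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: where a recipient feature $fs..$fe ends up after
# $ins_len bases are inserted between positions $site and $site + 1
# (1-based, inclusive coordinates; plain integers, not Bio::Location objects).
sub location_after_insert {
    my ($fs, $fe, $site, $ins_len) = @_;

    # Feature ends at or before the insertion site: unchanged
    return "$fs..$fe" if $fe <= $site;

    # Feature starts after the insertion site: shifted right
    return join('..', $fs + $ins_len, $fe + $ins_len) if $fs > $site;

    # Feature spans the site: split into two exact pieces (no fuzzy ends,
    # since no bases of the original feature were removed)
    return 'join(' . join(',', "$fs..$site",
        join('..', $site + $ins_len + 1, $fe + $ins_len)) . ')';
}

# Insert a 5 bp fragment between bases 3 and 4 of the recipient
print location_after_insert(@$_, 3, 5), "\n" for [2, 9], [5, 8], [1, 3];
# join(2..3,9..14)
# 10..13
# 1..3
```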
>>>>>>
>>>>>> ligate
>>>>>> ======
>>>>>> just for convenience. Supply a recipient, a fragment and one or two
>>>>>> sites to cut the recipient. Can also flip the fragment if required.
>>>>>> Simply calls delete [, reverse_complement_with_features] and insert
>>>>>> in turn.
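The order of operations in ligate (delete, optional flip, insert) can be illustrated on bare DNA strings. This is a sketch only. `ligate_strings` and `revcom_string` are hypothetical helpers, and features are ignored entirely. The meaning of the cut sites is also an assumption, not taken from the Bio::SeqUtils documentation. Bases strictly between `$left` and `$right` are removed, and the fragment goes in after base `$left`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a plain DNA string (IUPAC ambiguity codes ignored)
sub revcom_string {
    my $s = reverse shift;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Hypothetical string-only ligate: cut the recipient after base $left,
# drop the bases strictly between $left and $right (if $right is given),
# optionally flip the fragment, then insert it at the cut.
sub ligate_strings {
    my ($recipient, $fragment, $left, $right, $flip) = @_;
    $right //= $left + 1;    # a single site: nothing is deleted

    # delete step
    my $cut = substr($recipient, 0, $left) . substr($recipient, $right - 1);

    # optional flip step
    $fragment = revcom_string($fragment) if $flip;

    # insert step: the fragment goes in after base $left
    return substr($cut, 0, $left) . $fragment . substr($cut, $left);
}

# Replace bases 5..8 (CCCC) with the flipped fragment TTG -> CAA
print ligate_strings('AAAACCCCGGGG', 'TTG', 4, 9, 1), "\n";
# prints AAAACAAGGGG
```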
>>>>>>
>>>>>>
>>>>>> One situation I haven't handled yet is a deletion that spans the
>>>>>> origin of a circular molecule, but that should be a rare thing to do
>>>>>> anyway. The code currently throws an error if this is attempted.
>>>>>>
>>>>>> I'm happy to contribute the code on Github if there is interest?
>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>
>>>>>> Frank
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.