[Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning
Roy Chaudhuri
roy.chaudhuri at gmail.com
Thu Jan 19 15:15:07 UTC 2012
I'm not sure I understand the problem with "<1", < means "less than" not
"less than or equal to", so it does not imply that the feature could
start at position 1.
I can see that there would be cases where negative coordinates might be
useful, but I think it is opening a can of worms and could introduce
many subtle bugs, so I'd vote for throwing an error. If you were to do
it, it would be better to stick with the biological convention of -1
being the base before 1 (as used for -10 and -35 elements).
Roy.
On 19/01/2012 11:15, Frank Schwach wrote:
> I'd prefer "<1" still but I don't insist on this feature if it is
> controversial. Either way, I think we should either allow handling
> negative positions or explicitly throw an error if a negative
> position is passed in. At the moment, the method will take a negative
> position but then do something unexpected with it as it treats it
> like a positive value. The way I handled the negative position
> assumes that "0" is to the left of "1", as you get when calculating
> relative positions to a 1-based location. I can add something to the
> POD to explain this. I was thinking of using this behaviour to add
> another method to SeqUtils to add seqFeatures with relative
> corodinates that may start/end outside of the Bio::Seq you want to
> annotate. Roy, Chris: what do you prefer?
>
> Frank
>
>
>
>
> Fields, Christopher J wrote:
>> You could probably change "<1" to "<" if the start isn't meant to
>> be defined; the former seems to imply the feature location is '1 or
>> before the start', the latter is simply 'before the start'. Either
>> version should work within bioperl AFAIK, though I'm not sure
>> whether the feature table definition covers '<N' as a location
>> type.
>>
>> Just thinking aloud, but such behavior might be something that
>> needs to be defined more specifically in the Bio::RangeI
>> implementation, via trunc().
>>
>> chris
>>
>> On Jan 18, 2012, at 11:46 AM, Frank Schwach wrote:
>>
>>
>>> a yes, that's true. Still shoud be ok with _coord_adjust because
>>> it trims everything<1 to a "<1" fuzzy location, so whether or not
>>> 0 is a location doesn't matter in this case. But if you want me
>>> to revert this change then I am happy to do that.
>>>
>>> Frank
>>>
>>>
>>> On Wed, 2012-01-18 at 17:24 +0000, Fields, Christopher J wrote:
>>>
>>>>> From Bio::RangeI: "The behaviour of a range is undefined if
>>>>> ranges with negative numbers or zero are used." This is left
>>>>> ambiguous b/c the implementation may define this more
>>>>> specifically, but SeqFeatures AFAIK do not clarify this any
>>>>> more that Bio::RangeI does, so I don't think you can rely on
>>>>> any particular consistent behavior.
>>>>>
>>>> Re: why this is so, the ambiguities pertain to how length,
>>>> contains, overlaps, etc are calculated (these all assume
>>>> positive 1-based coords). For example, since we use 1-based
>>>> coords, do we include a 0 position with negative coordinates,
>>>> or do negative coordinates start at -1? The current interface
>>>> and implementations all adhere to how locations are defined in
>>>> the DDBJ/EMBL/GenBank feature table definition, hence Roy's
>>>> question.
>>>>
>>>> chris
>>>>
>>>> On Jan 18, 2012, at 11:08 AM, Frank Schwach wrote:
>>>>
>>>>
>>>>> Thanks Roy! I have pushed that to master now and it's part of
>>>>> my pending pull request. I just wanted to ask you because
>>>>> this is code you had written and there might be a good reason
>>>>> not to do this. Yes, you can create a SeqFeature object with
>>>>> negative coordinates.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>> On Wed, 2012-01-18 at 16:49 +0000, Roy Chaudhuri wrote:
>>>>>
>>>>>> Ok, if the tests pass and it's useful then it's fine by me
>>>>>> (not that you need my permission, of course). Just out of
>>>>>> interest, do negative feature coordinates work with the
>>>>>> rest of BioPerl? I don't think they are covered in the
>>>>>> DDBJ/EMBL/GenBank feature table definition.
>>>>>>
>>>>>> Roy.
>>>>>>
>>>>>> On 18/01/2012 16:21, Frank Schwach wrote:
>>>>>>
>>>>>>> Hi Roy,
>>>>>>>
>>>>>>> I have a use-case for Bio::SeqUtils truncating a feature
>>>>>>> with negative start location. In my case, this is
>>>>>>> happening because I transform genomic coordinates of a
>>>>>>> feature to the coordinate frame of another feature, so
>>>>>>> feature A that starts 10nt before the reference feature
>>>>>>> now has a start of -10. I want to trim that to a fuzzy
>>>>>>> start = "<1"
>>>>>>>
>>>>>>> Bio::SeqUtils::_coord_adjust can be used to trim feature
>>>>>>> A accordingly but we need to change the regex that
>>>>>>> manipulates the coordinates slightly by adding an
>>>>>>> optional "-":
>>>>>>>
>>>>>>> map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length
>>>>>>> and $add +$1>$length) {">$length"} else {$add+$1}/ge,
>>>>>>> @coords;
>>>>>>>
>>>>>>> becomes
>>>>>>>
>>>>>>> map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined
>>>>>>> $length and $add +$1>$length) {">$length"} else
>>>>>>> {$add+$1}/ge, @coords;
>>>>>>>
>>>>>>> It doesn't change anything else as far as I can tell and
>>>>>>> adds some more flexibility to the method. If you are ok
>>>>>>> with it I can push this to my queued pull request for
>>>>>>> Bio::SeqUtils.
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote:
>>>>>>>
>>>>>>>> I have now created a version that gives the option to
>>>>>>>> create the products of 'delete' and 'insert' via
>>>>>>>> Bio::Root::Root:clone instead of calling 'new' on the
>>>>>>>> input seq object class. Seems to be working fine for me
>>>>>>>> so far.
>>>>>>>>
>>>>>>>> 'delete' and 'insert' can now take a hashref of
>>>>>>>> options. The only option so far is to set 'clone_obj to
>>>>>>>> true, to use cloning instead of creating objects via
>>>>>>>> 'new'. Setting this parameter to false or not supplying
>>>>>>>> the options hashref at all will give you the old
>>>>>>>> behaviour (call 'new'). Example:
>>>>>>>>
>>>>>>>> my $product = Bio::SeqUtils->delete( $seq_obj, 11, 20,
>>>>>>>> { clone_obj => 1} );
>>>>>>>>
>>>>>>>> The ligate method takes clone_obj as a named
>>>>>>>> parameter:
>>>>>>>>
>>>>>>>> my $new_molecule = Bio::Sequtils::Pbrtools->ligate(
>>>>>>>> -recipient => $vector, -fragment => $fragment,
>>>>>>>> -left => 1000, -right => 1100, -flip => 1,
>>>>>>>> -clone_obj => 1 );
>>>>>>>>
>>>>>>>> This is in a branch of my GitHub repo if you would like
>>>>>>>> to have a look:
>>>>>>>>
>>>>>>>> https://github.com/fschwach/bioperl-live/tree/sequtils_clone
>>>>>>>>
>>>>>>>>
>>>>>>>>
Unfortunately, I can't add this option to trunc_with_features because
>>>>>>>> the creation of the new object is delegated to 'trunc'.
>>>>>>>> I guess I could implement 'trunc' in Bio::SeqUtils
>>>>>>>> itself(?)
>>>>>>>>
>>>>>>>> What do you think, could this be merged into
>>>>>>>> bioperl-live?
>>>>>>>>
>>>>>>>>
>>>>>>>> Frank
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Great, I'll work on a branch that gives the user the
>>>>>>>>> option to use clone instead of new and then we can
>>>>>>>>> see if we want to use that in the end. In the
>>>>>>>>> meantime, what do you think about pulling this into
>>>>>>>>> bioperl-live? When I have some time again I can work
>>>>>>>>> on the HOWTO for these new features for the BioPerl
>>>>>>>>> wiki
>>>>>>>>>
>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 11/01/12 18:42, Fields, Christopher J wrote:
>>>>>>>>>
>>>>>>>>>> Note that Bio::Root::Root now has a clone() method
>>>>>>>>>> that one can take advantage of for this purpose; if
>>>>>>>>>> Storable or Clone is available, it will pick one of
>>>>>>>>>> the two, preferably Clone over Storable. It's
>>>>>>>>>> fairly untested, but we haven't run into problems
>>>>>>>>>> with it yet (I think it was in the last CPAN
>>>>>>>>>> release).
>>>>>>>>>>
>>>>>>>>>> chris
>>>>>>>>>>
>>>>>>>>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hi Frank,
>>>>>>>>>>>
>>>>>>>>>>> Looks great, I like the use of between locations,
>>>>>>>>>>> didn't think of that.
>>>>>>>>>>>
>>>>>>>>>>> It was suggested that I avoid using Clone for
>>>>>>>>>>> cat, trunc_with_features etc. to avoid adding a
>>>>>>>>>>> dependency (which may no longer be an issue) and
>>>>>>>>>>> because it would cause problems for Bio::Seq
>>>>>>>>>>> implementations that use a database as the
>>>>>>>>>>> back-end. Maybe you could add it as an option,
>>>>>>>>>>> but keep the default as is?
>>>>>>>>>>>
>>>>>>>>>>> Cheers, Roy.
>>>>>>>>>>>
>>>>>>>>>>> On 11/01/2012 18:16, Frank Schwach wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Roy and Chris,
>>>>>>>>>>>>
>>>>>>>>>>>> I have made the changes to the code now. As you
>>>>>>>>>>>> suggested, feature ends no longer change type
>>>>>>>>>>>> and I insert a note instead to inform about
>>>>>>>>>>>> the deletion (or insertion), showing the length
>>>>>>>>>>>> and position. I have also added a feature to
>>>>>>>>>>>> annotate deletion sites themselves (with
>>>>>>>>>>>> IN-BETWEEN locations).
>>>>>>>>>>>>
>>>>>>>>>>>> Roy's test script now prints:
>>>>>>>>>>>>
>>>>>>>>>>>> LOCUS seq-accession_number 7
>>>>>>>>>>>> bp dna linear UNK ACCESSION unknown
>>>>>>>>>>>> FEATURES Location/Qualifiers CDS
>>>>>>>>>>>> join(2..3,4..6) /note="3bp internal deletion
>>>>>>>>>>>> between pos 3 and 4" CDS 2..3
>>>>>>>>>>>> /note="2bp deleted from feature end"
>>>>>>>>>>>> misc_feature 3^4 /note="deletion of 3bp"
>>>>>>>>>>>> ORIGIN 1 aaaaaaa //
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> or, if you add strand information (-1 in this
>>>>>>>>>>>> case) to the second feature:
>>>>>>>>>>>>
>>>>>>>>>>>> LOCUS seq-accession_number 7
>>>>>>>>>>>> bp dna linear UNK ACCESSION unknown
>>>>>>>>>>>> FEATURES Location/Qualifiers CDS
>>>>>>>>>>>> join(2..3,4..6) /note="3bp internal deletion
>>>>>>>>>>>> between pos 3 and 4" CDS
>>>>>>>>>>>> complement(2..3) /note="2bp deleted from
>>>>>>>>>>>> feature 5' end" misc_feature 3^4
>>>>>>>>>>>> /note="deletion of 3bp" ORIGIN 1 aaaaaaa //
>>>>>>>>>>>>
>>>>>>>>>>>> I have comitted this along with some bugfixes
>>>>>>>>>>>> to my master branch on GitHub
>>>>>>>>>>>> https://github.com/fschwach/bioperl-live so
>>>>>>>>>>>> it's now also in my existing pull request.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm still wondering if cloning the sequence
>>>>>>>>>>>> objects rather than calling 'new' on their
>>>>>>>>>>>> respective classes would be an option inside
>>>>>>>>>>>> 'delete' and 'insert'? I'm experimenting with
>>>>>>>>>>>> this for my own purposes because I have to
>>>>>>>>>>>> work with custom sub-classes of Bio::Seq which
>>>>>>>>>>>> have additional attributes and therefore set
>>>>>>>>>>>> 'can_call_new' to false. Without cloning the
>>>>>>>>>>>> objects, I first have to convert the custom
>>>>>>>>>>>> Bio::Seq::Foo objects to standard Bio::Seq,
>>>>>>>>>>>> which I would like to avoid. Is there any
>>>>>>>>>>>> reason why something like Clone::Fast should
>>>>>>>>>>>> not be used in this case? It seems to work for
>>>>>>>>>>>> me but there may be situations where this is
>>>>>>>>>>>> going to blow up which I am not aware of.
>>>>>>>>>>>> Cloning rather than calling new could be made
>>>>>>>>>>>> an option in Bio::SeqUtils. I have most of the
>>>>>>>>>>>> code for that already.
>>>>>>>>>>>>
>>>>>>>>>>>> Frank
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Or without the typo:
>>>>>>>>>>>>>
>>>>>>>>>>>>> CDS join(2..3,4..6) /note="3 bp
>>>>>>>>>>>>> internal deletion" CDS 2..3
>>>>>>>>>>>>> /note="2 bp deleted from 3' end"
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think it's me that didn't explain very
>>>>>>>>>>>>>> well - I was talking about overlapping
>>>>>>>>>>>>>> (rather than spanning) a deletion, although
>>>>>>>>>>>>>> I think the same principle applies to the
>>>>>>>>>>>>>> spanning example you gave. Here's some
>>>>>>>>>>>>>> test code:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #!/usr/bin/perl use warnings
>>>>>>>>>>>>>> FATAL=>qw(all); use strict; use Bio::Seq;
>>>>>>>>>>>>>> use Bio::SeqIO; use Bio::SeqUtils; use
>>>>>>>>>>>>>> Bio::SeqFeature::Generic; my
>>>>>>>>>>>>>> $seq=Bio::Seq->new(-id=>'seq',
>>>>>>>>>>>>>> -seq=>'AAAAAAAAAA');
>>>>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
-start=>2,
>>>>>>>>>>>>>> -end=>9));
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
-start=>2,
>>>>>>>>>>>>>> -end=>5)); my
>>>>>>>>>>>>>> $out=Bio::SeqIO->newFh(-format=>'genbank');
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
>>>>>>>>>>>>>> print $out $trunc;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This currently outputs: LOCUS
>>>>>>>>>>>>>> seq-accession_number 7 bp dna
>>>>>>>>>>>>>> linear UNK ACCESSION unknown FEATURES
>>>>>>>>>>>>>> Location/Qualifiers CDS
>>>>>>>>>>>>>> join(2..>3,<4..6) CDS 2..>3
>>>>>>>>>>>>>> ORIGIN 1 aaaaaaa //
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I was suggesting that the feature
>>>>>>>>>>>>>> table should be something like: CDS
>>>>>>>>>>>>>> join(2..3,4..6) /note="3 bp internal
>>>>>>>>>>>>>> deletion" CDS join(2..3)
>>>>>>>>>>>>>> /note="2 bp deleted from 3' end"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fuzzy locations are intended to represent
>>>>>>>>>>>>>> features which have boundaries spanning
>>>>>>>>>>>>>> outside of the sequence. For a defined
>>>>>>>>>>>>>> deletion that's not the case, the
>>>>>>>>>>>>>> boundaries of the feature aren't unknown,
>>>>>>>>>>>>>> they have been specifically altered.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this is clearer. Cheers, Roy.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Roy,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry, I hadn't explained that very well:
>>>>>>>>>>>>>>> it's not the outer boundaries of the
>>>>>>>>>>>>>>> feature that become fuzzy but the "inner"
>>>>>>>>>>>>>>> ones of the split locations:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------------------- a
>>>>>>>>>>>>>>> feature's location
>>>>>>>>>>>>>>> ==========xxxx================= sequence
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------- sublocation
>>>>>>>>>>>>>>> 1 -------- sublocation 2
>>>>>>>>>>>>>>> ===============================
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> x= sequence to delete The feature's
>>>>>>>>>>>>>>> location has changed from Simple to
>>>>>>>>>>>>>>> Split.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sublocation 1: start is still EXACT and
>>>>>>>>>>>>>>> has not changed end is now AFTER because
>>>>>>>>>>>>>>> this is not a true end of the feature
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sublocation 2: start is BEFORE end is
>>>>>>>>>>>>>>> EXACT (but shifted)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I hope this makes more sense(?)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Frank
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy
>>>>>>>>>>>>>>> Chaudhuri wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Frank,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looks good to me. One thing I'm not
>>>>>>>>>>>>>>>> sure about - why do features
>>>>>>>>>>>>>>>> overlapping a deletion become fuzzy?
>>>>>>>>>>>>>>>> That behaviour is in
>>>>>>>>>>>>>>>> trunc_with_features because it's
>>>>>>>>>>>>>>>> intended to represent a taking a
>>>>>>>>>>>>>>>> subregion of a larger sequence, but if
>>>>>>>>>>>>>>>> you're representing an internal
>>>>>>>>>>>>>>>> deletion then the boundaries of the
>>>>>>>>>>>>>>>> overlapping feature aren't unknown,
>>>>>>>>>>>>>>>> they have been specifically altered.
>>>>>>>>>>>>>>>> Maybe you could give absolute
>>>>>>>>>>>>>>>> coordinates, but add a note indicating
>>>>>>>>>>>>>>>> that the 5' or 3' end has been
>>>>>>>>>>>>>>>> truncated by however many bases.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers, Roy.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 10/01/2012 13:10, Frank Schwach
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have made the changes in a Git fork
>>>>>>>>>>>>>>>>> and made the pull request now. If
>>>>>>>>>>>>>>>>> this is accepted into BioPerl I can
>>>>>>>>>>>>>>>>> also write a little SeqUtils HOWTO
>>>>>>>>>>>>>>>>> for the BioPerl wiki.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Frank
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000,
>>>>>>>>>>>>>>>>> Fields, Christopher J wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sounds very promising! The easiest
>>>>>>>>>>>>>>>>>> way to contribute is via a fork of
>>>>>>>>>>>>>>>>>> the code on Github with a pull
>>>>>>>>>>>>>>>>>> request (as you already know, being
>>>>>>>>>>>>>>>>>> a contributor to the Primer3
>>>>>>>>>>>>>>>>>> modules).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> chris
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank
>>>>>>>>>>>>>>>>>> Schwach wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I needed to manipulate Bio::Seq
>>>>>>>>>>>>>>>>>>> objects with annotations and
>>>>>>>>>>>>>>>>>>> sequence features to simulate
>>>>>>>>>>>>>>>>>>> molecular cloning techniques,
>>>>>>>>>>>>>>>>>>> e.g. to cut a vector and insert a
>>>>>>>>>>>>>>>>>>> fragment into it while preserving
>>>>>>>>>>>>>>>>>>> all the annotations and moving
>>>>>>>>>>>>>>>>>>> the features accordingly. My main
>>>>>>>>>>>>>>>>>>> aim was to split features that
>>>>>>>>>>>>>>>>>>> span deletion/insertion sites in
>>>>>>>>>>>>>>>>>>> a meaningful way, which can not
>>>>>>>>>>>>>>>>>>> be done with the currently
>>>>>>>>>>>>>>>>>>> availble methods. I have modified
>>>>>>>>>>>>>>>>>>> Bio::SeqUtils so that I have the
>>>>>>>>>>>>>>>>>>> following new methods:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> delete ====== removes a segment
>>>>>>>>>>>>>>>>>>> from a sequence object and
>>>>>>>>>>>>>>>>>>> adjusts positions and types of
>>>>>>>>>>>>>>>>>>> locations of sequence features: -
>>>>>>>>>>>>>>>>>>> locations of features that span
>>>>>>>>>>>>>>>>>>> the deletion sites are turned
>>>>>>>>>>>>>>>>>>> into Splits. - locations that
>>>>>>>>>>>>>>>>>>> extend into the deleted region
>>>>>>>>>>>>>>>>>>> are turned to Fuzzy to indicate
>>>>>>>>>>>>>>>>>>> that their true start/end was
>>>>>>>>>>>>>>>>>>> lost. - locations contained
>>>>>>>>>>>>>>>>>>> inside the deleted regions are
>>>>>>>>>>>>>>>>>>> lost. - other features are
>>>>>>>>>>>>>>>>>>> shifted according to the length
>>>>>>>>>>>>>>>>>>> of the deletion.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> insert ====== adds a Bio::Seq
>>>>>>>>>>>>>>>>>>> object into another one between
>>>>>>>>>>>>>>>>>>> specified insertion sites. This
>>>>>>>>>>>>>>>>>>> also affects the features on the
>>>>>>>>>>>>>>>>>>> recipient sequence: - locations
>>>>>>>>>>>>>>>>>>> of features that span the
>>>>>>>>>>>>>>>>>>> insertion site are split but
>>>>>>>>>>>>>>>>>>> position types are not turned to
>>>>>>>>>>>>>>>>>>> Fuzzy because no part of the
>>>>>>>>>>>>>>>>>>> original feature is lost. - other
>>>>>>>>>>>>>>>>>>> features are shifted according to
>>>>>>>>>>>>>>>>>>> the length of the insertion.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ligate ====== just for
>>>>>>>>>>>>>>>>>>> convenience. Supply a recipient,
>>>>>>>>>>>>>>>>>>> a fragment and one or two sites
>>>>>>>>>>>>>>>>>>> to cut the recipient. Can also
>>>>>>>>>>>>>>>>>>> flip the fragment if required.
>>>>>>>>>>>>>>>>>>> Simply calls delete [,
>>>>>>>>>>>>>>>>>>> reverse_complement_with_features]
>>>>>>>>>>>>>>>>>>> and insert in turn.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> One situation I haven't handled
>>>>>>>>>>>>>>>>>>> yet is a deletion that spans the
>>>>>>>>>>>>>>>>>>> origin of a circular molecule but
>>>>>>>>>>>>>>>>>>> that should be a rare thing to
>>>>>>>>>>>>>>>>>>> do anyway. The code currently
>>>>>>>>>>>>>>>>>>> throws an error if this is
>>>>>>>>>>>>>>>>>>> attempted.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm happy to contribute the code
>>>>>>>>>>>>>>>>>>> on Github if there is interest?
>>>>>>>>>>>>>>>>>>> Comments on the handling of
>>>>>>>>>>>>>>>>>>> feature locations highly
>>>>>>>>>>>>>>>>>>> welcome!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Frank
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>> -- The Wellcome Trust Sanger Institute is operated by Genome
>>>>> Research Limited, a charity registered in England with number
>>>>> 1021457 and a company registered in England with number
>>>>> 2742969, whose registered office is 215 Euston Road, London,
>>>>> NW1 2BE. _______________________________________________
>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>
>>> -- The Wellcome Trust Sanger Institute is operated by Genome
>>> Research Limited, a charity registered in England with number
>>> 1021457 and a company registered in England with number 2742969,
>>> whose registered office is 215 Euston Road, London, NW1 2BE.
>>>
>>
>>
>
>
>
More information about the Bioperl-l
mailing list