[Bioperl-l] order of sublocations

Jason Stajich jason at cgt.duhs.duke.edu
Wed Sep 17 08:17:35 EDT 2003


This alone won't work: when we have remote locations, sorting will mess up
their order.  Besides, the order specified does mean something - we had to
go through this with the spliced_seq method to generate the spliced-out
sequence.
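
As a rough illustration of the remote-location case, here is a plain-Perl
sketch. It does not use the Bio::Location API; the hash layout, the sub
names and the accession J00194.1 are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sub-locations, in the order the feature lists them.
# The first one lives on another entry (J00194.1 is only an example).
my @subs = (
    { seq_id => 'J00194.1', start => 100, end => 202 },
    { seq_id => undef,      start => 1,   end => 245 },
    { seq_id => undef,      start => 300, end => 422 },
);

# Render one sub-location, prefixing the accession when it is remote.
sub loc_str {
    my ($loc) = @_;
    my $range = sprintf '%d..%d', $loc->{start}, $loc->{end};
    return defined $loc->{seq_id} ? "$loc->{seq_id}:$range" : $range;
}

# Build a join(...) string from sub-locations, keeping the order given.
sub join_str {
    my @strs = map { loc_str($_) } @_;
    return sprintf 'join(%s)', join(',', @strs);
}

# Order as specified:
print join_str(@subs), "\n";
# prints: join(J00194.1:100..202,1..245,300..422)

# Sorted by start: the remote piece lands between the two local ones,
# because its coordinates belong to a different sequence.
print join_str(sort { $a->{start} <=> $b->{start} } @subs), "\n";
# prints: join(1..245,J00194.1:100..202,300..422)
```

The two strings describe different spliced products, which is why the
specified order has to be preserved.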

If we're specifying a gene on the complement strand it would be incorrect to
sort the exons like you have shown - that puts them in the wrong order.
Perhaps complement(join(<1..66,129..218,649..>1045)) is what VectorNTI
wants - but we've been down that road before in getting the code to output
this, and it would require doing something a bit different to distribute
the complementation status to all the subfeatures.
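
To make the two notations concrete, here is a minimal plain-Perl sketch. It
is not BioPerl code: the array layout and sub names are invented, and the
fuzzy ends (< and >) from the example are left out for brevity.

```perl
use strict;
use warnings;

# Exons of a minus-strand CDS in transcript (5' to 3') order,
# as [start, end] pairs on the top strand.
my @exons = ( [ 649, 1045 ], [ 129, 218 ], [ 1, 66 ] );

# join(complement(a),complement(b),...): pieces are complemented first
# and then joined, so the list order is the splice order.
sub join_of_complements {
    my @parts = map { sprintf 'complement(%d..%d)', @$_ } @_;
    return sprintf 'join(%s)', join(',', @parts);
}

# complement(join(a,b,...)): pieces are joined first and then
# complemented, so the inner ranges ascend along the top strand.
sub complement_of_join {
    my @asc   = sort { $a->[0] <=> $b->[0] } @_;
    my @parts = map { sprintf '%d..%d', @$_ } @asc;
    return sprintf 'complement(join(%s))', join(',', @parts);
}

print join_of_complements(@exons), "\n";
# prints: join(complement(649..1045),complement(129..218),complement(1..66))

print complement_of_join(@exons), "\n";
# prints: complement(join(1..66,129..218,649..1045))
```

Both strings describe the same spliced sequence. Sorting is only safe in
the second form, where the complement is applied after the join.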

To please VectorNTI, it seems to me we need to split CDSes into individual
exons.  If this is in regard to SequenceDumping in Gbrowse, I have a 'CDS'
track which I use to dump the CDSes as individual features rather than
joined together.
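
A rough sketch of what dumping each exon as its own feature line could look
like, in plain Perl. This is not the Gbrowse dumper code; the sub name and
arguments are hypothetical, and fuzzy ends are again omitted.

```perl
use strict;
use warnings;

# Emit one feature-table line per exon instead of a single joined location.
# $strand is +1 or -1; @exons holds [start, end] pairs.
sub split_into_features {
    my ($key, $strand, @exons) = @_;
    return map {
        my $range = sprintf '%d..%d', @$_;
        $range = "complement($range)" if $strand < 0;
        # Five leading spaces, then the key padded to 16 columns.
        sprintf '     %-16s%s', $key, $range;
    } @exons;
}

print "$_\n" for split_into_features('CDS', -1,
    [ 649, 1045 ], [ 129, 218 ], [ 1, 66 ]);
```

Each line then carries a single simple location, so the order of
sublocations no longer matters to the reader of the file.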


-jason

On Wed, 17 Sep 2003, Marc Logghe wrote:

> Hi,
> I could not find anything about this in the 'DDBJ/EMBL/GenBank Feature
> Table Definition', but apparently some applications (e.g. VectorNTI) are
> really picky about the order of sublocations in split locations.
>      CDS             join(complement(649..>1045),complement(129..218),
>                      complement(<1..66))
> for instance, is not accepted by VectorNTI. The following is OK:
>      CDS             join(complement(<1..66),complement(129..218),
>                      complement(649..>1045))
> I expect this is more a problem of VectorNTI than of BioPerl ;-) but
> anyhow, this can easily be fixed by sorting the sublocations first, in
> Bio::Location::Split:
> sub to_FTstring  {
>     my ($self) = @_;
>     my @strs;
>     foreach my $loc ( sort { $a->start <=> $b->start } $self->sub_Location() ) {
> #                            ~~~~~~~~~~~~~~~~~~~~~~~
>         my $str = $loc->to_FTstring();
>         # we only append the remote seq_id if it hasn't been done already
>         # by the sub-location (which it should if it knows it's remote)
>         # (and of course only if it's necessary)
>         if( (! $loc->is_remote) &&
>             defined($self->seq_id) && defined($loc->seq_id) &&
>             ($loc->seq_id ne $self->seq_id) ) {
>             $str = sprintf("%s:%s", $loc->seq_id, $str);
>         }
>         push @strs, $str;
>     }
>
>     my $str = sprintf("%s(%s)",lc $self->splittype, join(",", @strs));
>     return $str;
> }
>
> Cheers,
> Marc

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

