[Bioperl-l] Bio::RangeI intersection and Bio::DB::GFF
Brian Osborne
osborne1 at optonline.net
Tue May 2 21:49:49 UTC 2006
Marco,
Odd, because the intersection() code is quite simple and it's clear how it
should behave. What version of Bioperl are you using? I'm looking at the
latest, in bioperl-live...
Brian O.
On 5/2/06 4:32 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
> Brian--
>
> Even when both elements of intersection() are from the negative strand, the
> return object is from the positive strand and $overlap is actually the
> revervese complement of the intersection between the 2 exons. Here is part
> of the output from the script below:
>
> ===
> ex1 Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTG
> ex2 Strand: -1
> CTTTTTTCCACATACGTCGTCAACGTGATTCGACCTTTTCCGGTTTATTAGTTGAACATGGCAGTCGGCAAAAATA
> AAGGTCTTTCCAAGGGTGGTAAGAAGGGCGG
> TAAGAAGAAGGTGGTGGACCCGTTTTCTCGCAAGGACTGGTACGATGTCAAAGCTCCGAATATGTTTCAAACCCGT
> CAAATCG
> overlap Strand: 1
> CAGTCCTTGCGAGAAAACGGGTCCACCACCTTCTTCTTACCGCCCTTCTTACCACCCTTGGAAAGACCTTTATTTT
> TGCCGACTGCCATGTTCAACTAATAAACCGG
> AAAAGGTCGAATCACGTTGACGACGTATGTGGAAAAAAG
> ...
>
> If both are from the positive strand, the return object is positive as in:
>
> ===
> ex1 Strand: 1
> CAACGCAGACGTGGTACGGCGTTTTAAATCTGATAACATTTTGAACCGGGAATTATTTTAGAGTACCATTCTTTGT
> TTTGTGCCTGTTTCAGTATAAATTAATTATG
> CGCCTGATTTAAAGTACAAAATGTGTAAATATATCACCTTACCGTCGCGGGTGCACCCAATTGTGCTTTGATGAAT
> AAATATACATATATGCAACATATATAACTTC
> CTGTGTTAGTATAAGTGTATGTCAGCCAAAAACAAATATATATATGAGTGTTTATCGGCATTCGTGTGCTGGCAGA
> GCAGCGATCAAAGCTGCGTTCGGTACTCGTT
> GACTGGCCCAAGAATGAATTCTCGTGCAAGTGTGTTGATAAAAAGTATACGTATGTAT
> ex2 Strand: 1
> ATCGACAGTTGCCATCGTCGTTATTCCAGCACTAATTTAAAAAAAATTCGATCAACGCAGACGTG
> overlap Strand: 1
> CAACGCAGACGTG
>
> Is there something I am missing? Here is the script generating the output
>
> Many thanks all...
>
> Marco
>
>
> use strict;
> use warnings;
> use Bio::DB::GFF;
>
> MAIN:{
>
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> -dsn =>
> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
> -user => 'guest');
> my $test_db = $db->segment('4');
>
> # Load up the exons into $exons_p
> for my $gene ($test_db->features(-types => 'gene')){
>
> my $exons_p = extractExons($gene);
>
> cluster($exons_p) unless ($#{$exons_p} == -1);
>
> }
> }
>
> sub extractExons {
> my $gene = shift;
> my %ex_list;
> my @tcs = $gene->features( -type =>'processed_transcript',
> -attributes =>{Gene => $gene->group});
>
> for my $tc (@tcs){
> my @exons = $tc->features (-type => 'exon',
> -attributes => {Parent => $tc->group}
> );
>
> for (@exons){
> my $ex_id = $_->id;
> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>
> }
>
> }
> my @values = values %ex_list;
> return(\@values);
> }
>
> sub cluster {
> my $exons_p = shift;
>
> for (my $s = 0; $s <= $#{$exons_p}; $s++){
> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
> my $exon1 = $exons_p->[$s];
> my $exon2 = $exons_p->[$t];
>
> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>
> my $overlap = $exon1->intersection($exon2);
>
> print "===\n";;
> print "ex1\tStrand: ", $exon1->strand, "\n",
> $exon1->seq, "\n";
> print "ex2\tStrand: ", $exon2->strand, "\n",
> $exon2->seq, "\n";
> print "overlap\tStrand: ", $overlap->strand, "\n",
> $overlap->seq, "\n";
> }
> }
> }
> }
>
> On 5/2/06 13:17, "Brian Osborne" <osborne1 at optonline.net> wrote:
>
>> Marco,
>>
>> Yes, this is how intersection() is supposed to work. If both of the Range
>> objects have the same strand then the strand information is returned as part
>> of the result but if they aren't on the same strand then no strand
>> information is returned.
>>
>> Brian O.
>>
>>
>> On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche at berkeley.edu> wrote:
>>
>>> Dear all--
>>>
>>> I have been trying to use the intersection function to extract overlapping
>>> region from alternatively spliced exons as in the following script. The
>>> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
>>> actually loosing the strand of $exon1 if $exon1 is from the negative strand.
>>> Is this behavior expected? Should I check the strand of $exon1 before
>>> working on the object return by any Bio::RangeI function?
>>>
>>> Many thanks
>>>
>>> #!/usr/bin/perl
>>> use strict;
>>> use warnings;
>>> use Bio::DB::GFF;
>>>
>>> MAIN:{
>>>
>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>> -dsn =>
>>> 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
>>> -user => 'guest');
>>> my $test_db = $db->segment('4');
>>>
>>> # Load up the exons into $exons_p
>>> for my $gene ($test_db->features(-types => 'gene')){
>>>
>>> my $exons_p = extractExons($gene);
>>>
>>> cluster($exons_p) unless ($#{$exons_p} == -1);
>>>
>>> }
>>> }
>>>
>>> sub extractExons {
>>> my $gene = shift;
>>> my %ex_list;
>>> my @tcs = $gene->features( -type =>'processed_transcript',
>>> -attributes =>{Gene => $gene->group});
>>>
>>> for my $tc (@tcs){
>>> my @exons = $tc->features (-type => 'exon',
>>> -attributes => {Parent => $tc->group}
>>> );
>>>
>>> for (@exons){
>>> my $ex_id = $_->id;
>>> $ex_list{$ex_id} = $_ unless (exists $ex_list{$ex_id});
>>>
>>> }
>>>
>>> }
>>> my @values = values %ex_list;
>>> return(\@values);
>>> }
>>>
>>> sub cluster {
>>> my $exons_p = shift;
>>>
>>> for (my $s = 0; $s <= $#{$exons_p}; $s++){
>>> for (my $t = $s+1; $t <= $#{$exons_p}; $t++){
>>> my $exon1 = $exons_p->[$s];
>>> my $exon2 = $exons_p->[$t];
>>>
>>> if (!($exon1->equals($exon2)) && $exon1->overlaps($exon2)){
>>>
>>> my $overlap = $exon1->intersection($exon2);
>>>
>>> print "===\n";;
>>> print "ex1\n", $exon1->seq, "\n";
>>> print "ex2\n", $exon2->seq, "\n";
>>> print "overlap\n", $overlap->seq, "\n";
>>> }
>>> }
>>> }
>>> }
>>> ______________________________
>>> Marco Blanchette, Ph.D.
>>>
>>> mblanche at uclink.berkeley.edu
>>>
>>> Donald C. Rio's lab
>>> Department of Molecular and Cell Biology
>>> 16 Barker Hall
>>> University of California
>>> Berkeley, CA 94720-3204
>>>
>>> Tel: (510) 642-1084
>>> Cell: (510) 847-0996
>>> Fax: (510) 642-6062
>>
>>
>
> ______________________________
> Marco Blanchette, Ph.D.
>
> mblanche at uclink.berkeley.edu
>
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
>
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
More information about the Bioperl-l
mailing list