[Bioperl-l] check for the continuous segments to extract the sequences
Cook, Malcolm
MEC at stowers-institute.org
Fri Apr 27 13:52:10 UTC 2007
Gopu/Jason,
Another option is Set::IntSpan, available on CPAN at
http://search.cpan.org/~swmcd/Set-IntSpan-1.11/IntSpan.pm
Here's a Perl one-liner that shows you how easy it is:
perl -MSet::IntSpan -e 'my @array = (1, 1000, 1001, 2000, 4001, 5000,
  5001, 6000, 6001, 7000, 7001, 8000, 12001, 13000);
  my $is = Set::IntSpan->new;
  while (@array) { $is->U(shift(@array) . "-" . shift(@array)) }
  print $is;'
1-2000,4001-8000,12001-13000
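If adding a CPAN dependency is not an option, the same merge can be sketched in plain Perl. This is only an illustration, not how Set::IntSpan works internally: the sub name merge_segments is made up, and it assumes the list is a flat sequence of integer start/end pairs with start <= end. Spans are joined when the next start is at most one past the current end, so overlapping spans are merged as well as adjacent ones.

```perl
use strict;
use warnings;

# Merge (start, end) pairs from a flat list into maximal contiguous spans.
# Hypothetical helper, for illustration only.
sub merge_segments {
    my @flat = @_;
    die "expected an even number of coordinates\n" if @flat % 2;

    # Pair up the flat list, then sort by start (and end) coordinate.
    my @pairs;
    while (@flat) {
        my ($start, $end) = splice(@flat, 0, 2);
        push @pairs, [$start, $end];
    }
    @pairs = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @pairs;

    my @merged;
    for my $pair (@pairs) {
        if (@merged && $pair->[0] <= $merged[-1][1] + 1) {
            # Adjacent or overlapping: extend the current span if needed.
            $merged[-1][1] = $pair->[1] if $pair->[1] > $merged[-1][1];
        }
        else {
            # Gap before this span: start a new one (copy the pair).
            push @merged, [@$pair];
        }
    }
    return @merged;
}

my @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
             6001, 7000, 7001, 8000, 12001, 13000);
my $run_list = join ',', map { "$_->[0]-$_->[1]" } merge_segments(@array);
print "$run_list\n";    # 1-2000,4001-8000,12001-13000
```

Sorting first means the input pairs do not have to arrive in order, at the cost of an O(n log n) step; for input that is already sorted, the single pass over the pairs is all that matters.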
I use it all the time to great effect and have utility functions that
convert between BioPerl split locations and IntSpans.
There is another module which extends it nicely, Set::IntSpan::Island,
worth a gander.
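The conversion utilities mentioned above are not reproduced here. As a rough sketch of the last step in the original question (going from a merged run list back to coordinates and then to subsequences), something like the following could work. The names run_list_to_pairs and extract_spans are hypothetical, the run list is assumed to be finite and non-negative, and the sequence is assumed to be a plain string with 1-based inclusive coordinates rather than a BioPerl object.

```perl
use strict;
use warnings;

# Turn a finite run list such as "1-2000,4001-8000" into [start, end] pairs.
# Hypothetical helper; assumes non-negative integers and no infinite runs.
sub run_list_to_pairs {
    my ($run_list) = @_;
    my @pairs;
    for my $run (split /,/, $run_list) {
        if ($run =~ /^(\d+)-(\d+)$/) {
            push @pairs, [$1, $2];
        }
        elsif ($run =~ /^(\d+)$/) {
            push @pairs, [$1, $1];    # a single position
        }
        else {
            die "unsupported run '$run'\n";
        }
    }
    return @pairs;
}

# Extract each span from a sequence string, using 1-based inclusive
# coordinates (substr itself is 0-based, hence the "- 1").
sub extract_spans {
    my ($seq, @pairs) = @_;
    return map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @pairs;
}

my $seq    = 'ACGTACGTACGT';
my @pairs  = run_list_to_pairs('1-4,7-8,12');
my @chunks = extract_spans($seq, @pairs);
print join(',', @chunks), "\n";    # ACGT,GT,T
```

With a real genome the same pairs could be passed to whatever sequence accessor is in use instead of substr.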
Cheers,
Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Jason Stajich
> Sent: Thursday, April 26, 2007 8:55 PM
> To: gopu_36
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] check for the continuous segments to
> extract the sequences
>
> You want a connectivity algorithm. One can be found on perlmonks.org,
> and another is the collapse_nums() method in Bio::Search::SearchUtils.
> You'll have to modify aspects of it to deal with ranges.
>
> Good luck.
> -jason
> On Apr 26, 2007, at 6:29 PM, gopu_36 wrote:
>
> >
> > As a newbie to programming, thanks for the support from this group.
> > Please ignore this message if it is not relevant to the group, as my
> > problem may be a typical computer science recursion problem (I am
> > not sure).
> >
> > I have an array like
> > @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
> > 6001, 7000, 7001, 8000, 12001, 13000);
> > The array gives the positions of sequences: the first element, 1, is a
> > start position and the second element, 1000, is the end of that
> > sequence, and so on. The elements at even indices (0, 2, 4, ...) are
> > the start positions, and the elements at odd indices (1, 3, 5, ...),
> > such as 1000, 2000 and 5000, are the end positions of the sequences
> > to be retrieved. Basically, I have to check whether any continuous
> > segments lie in the list and join them into one whole chunk.
> > For example, 1-1000 and 1001-2000 can be joined to extract the
> > sequence from 1-2000. In the same way, 4001-8000 should be extracted,
> > then 12001-13000, and so on. As I said, once the positions are
> > checked, I will be able to extract those parts of the sequence from a
> > whole genome. Thanks for taking your time. Any tip or help would be
> > greatly appreciated.
> >
> > Regards
> > Gopu
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/