[Bioperl-l] problem to fit genomic coordinates

Thu Mar 26 20:28:36 UTC 2009

I'm _not_ doing your homework, but I did something similar with hashes. 
You'd probably look at the loops in the code and expect it to be slow but it was faster and simpler than any other solution I could come up with.

I was indexing the positions of 4.7 million repeats in a genome, then finding (about 1 million) SSRs within 50bp of them. I was reading from gff files but you could split your own data to get the required fields. Instead of using $OFFSET when searching, use your start and end coords.
(this is not all the code and it probably won't run as I've chopped bits out but you get the general idea...)

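##load the repeats from a gff file into a hash/array
##please excuse my poorly named variables :-)
use strict;
use warnings;

my $OFFSET = 50;       # search window in bp (50bp in my case)
my %rmarray_starts;    # {chr}{start} => [gff lines of the repeats starting there]

open(RM,"repeats.gff") or die $!;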
while(<RM>){
	chomp;
	my($chr_rm,undef,undef,$start_rm,$end_rm) = split(/\t/,$_);

	# MULTIPLE REPEATS CAN HAVE THE SAME CHR AND START OR END POSITION
	# SO INDEXING LIKE THIS:  {BTA1}{12345}(repeat1,repeat2,repeat3)
	push(@{$rmarray_starts{$chr_rm}{$start_rm}},$_);
}
close RM;

#read the SSRs and find their positions in the repeat hash
open(SSR,"ssrs.gff") or die $!;
while(<SSR>){
	chomp;
	my($chr_ssr,undef,undef,$start_ssr,$end_ssr) = split(/\t/,$_);

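	# check every position in the $OFFSET bp upstream of the SSR start
	for(my $i = $start_ssr - $OFFSET; $i < $start_ssr; $i++){
		if(defined $rmarray_starts{$chr_ssr}{$i}){
			foreach my $s (@{$rmarray_starts{$chr_ssr}{$i}}){
				#do something with the hit
				print "$s\n";	# the newline was removed by chomp when the repeats were loaded
			}
		}
	}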

}
close SSR;
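
For the "use your start and end coords" variant, a minimal sketch could look like the following. It is an illustration under assumptions, not the original script: the sub names index_features and features_in_range are hypothetical, the toy data is made up, and the input is assumed to be GFF-style tab-separated lines with the start coordinate in the 4th column.

```perl
use strict;
use warnings;

# Hypothetical helper: index GFF-style lines by chromosome and start position.
# Builds {chr}{start} => [line, line, ...], the same layout as %rmarray_starts.
sub index_features {
    my (@lines) = @_;
    my %index;
    for my $line (@lines) {
        my ($chr, undef, undef, $start) = split /\t/, $line;
        push @{ $index{$chr}{$start} }, $line;
    }
    return \%index;
}

# Hypothetical helper: return every indexed feature on $chr whose start
# lies in [$from, $to] (inclusive), using one hash lookup per position.
sub features_in_range {
    my ($index, $chr, $from, $to) = @_;
    my @hits;
    return @hits unless exists $index->{$chr};
    for my $pos ($from .. $to) {
        my $found = $index->{$chr}{$pos} or next;
        push @hits, @$found;
    }
    return @hits;
}

# Toy data, made up for illustration
my $index = index_features(
    "BTA1\tsrc\trepeat\t100\t150",
    "BTA1\tsrc\trepeat\t100\t180",
    "BTA1\tsrc\trepeat\t400\t450",
    "BTA2\tsrc\trepeat\t120\t160",
);

# All repeats that start inside the interval BTA1:90-200
print "$_\n" for features_in_range($index, 'BTA1', 90, 200);
```

The cost of each query grows with the length of the interval, so this suits short windows. For very long intervals, something like a sorted list of starts per chromosome may be a better fit.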

Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Laurent MANCHON
> Sent: Friday, 27 March 2009 4:30 a.m.
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] problem to fit genomic coordinates
> 
> okay, you are right,
> but in my opinion my question is a good question about parsing an
> enormous range of intervals.
> The problem is not perl, bioperl, or any other language, it's just an
> algorithmic question.
> I'm not a professional in Bioperl and I don't know what is possible to
> do with all the Bioperl modules.
> So if you think it's possible to resolve my question with Bioperl maybe
> you are right, but in my position I stay at the same point.
> If you want, I can send you the two files needed for my question. And if you
> agree, try to use Bioperl to resolve it. Maybe it's not possible because
> the files are big, I don't know.
> 
> 
> 
> Chris Fields wrote:
> > (edited for those with sensitive eyes)
> >
> > Laurent,
> >
> > Please keep all responses, no matter how puerile, on the mail list ;>
> >
> > We're trying to point out the blatantly obvious: this isn't the place
> > for your question.  Sorry if that irritates you.  And, to reiterate,
> > don't be surprised if you get some nasty responses.
> >
> > chris
> >
> > (hoping this isn't one of the GSoC students, as he's introducing
> > Laurent to his spam filter)
> >
> > On Mar 26, 2009, at 10:04 AM, Laurent MANCHON wrote:
> >
> >> p*** off
> >>
> >> Chris Fields wrote:
> >>>
> >>> On Mar 26, 2009, at 8:36 AM, Laurent MANCHON wrote:
> >>>
> >>>> Chris Fields wrote:
> >>>>>
> >>>>> On Mar 26, 2009, at 3:31 AM, Laurent MANCHON wrote:
> >>>>>
> >>>>>> yes but this is a school problem that my teacher ask us to
> >>>>>> resolve without using Bioperl modules !
> >>>>>
> >>>>> I didn't bother reading beyond that sentence.  Not to state the
> >>>>> absolute obvious here, but:
> >>>>>
> >>>>> 1) you are posting to the bioperl list for a non-bioperl-related
> >>>>> question, and
> >>>> genomic coordinates are not questions about biology ?
> >>>> i'm speaking about GENOME, and not GEOGRAPHY
> >>>
> >>> And this is a mail list for BioPerl (the toolkit), not perl and
> >>> biology.  We will sometimes answer questions along these lines if
> >>> they are relevant, but apparently our answers (all notably
> >>> BioPerl-related, mind you) were tossed to the side and you asked for
> >>> more.
> >>>
> >>> I suppose you at least showed some honesty and revealed exactly why
> >>> you needed this answered, but again, don't be surprised if you get a
> >>> nasty response and no answers.  You won't get any from me.
> >>>
> >>>>> 2) you are committing one of the biggest no-no's for a list,
> >>>>> asking us to help you with your homework.
> >>>> in bioperl you have BIO, okay but too you have PERL !
> >>>
> >>> Interesting how you skirted that last question.  We won't do your
> >>> homework for you.  Sorry.
> >>>
> >>> chris
> >>>
> >>>
> >>
> >>
> >
> >
> 
> 