[Bioperl-l] Extract features from GFF

Barry Moore barry.moore at genetics.utah.edu
Tue Oct 23 04:09:05 UTC 2007


Hang-

Something like the code below would work. For a simple coordinate
query like this you could also iterate over the file with your own
while loop, split the columns, and do the same test in about as many
lines. You'll need to run each chromosome as a separate GFF file, or
add a check on the chromosome inside the loop ($feature->seq_id).

     use strict;
     use warnings;
     use Bio::Tools::GFF;

     my $coord_start = 12345;
     my $coord_end   = 67890;

     # specify input via -fh or -file
     my $gffio = Bio::Tools::GFF->new(-fh => \*STDIN, -gff_version => 2);

     # loop over the input stream
     while (my $feature = $gffio->next_feature()) {
         # keep features that overlap the region, not only those inside it
         if ($feature->start <= $coord_end && $feature->end >= $coord_start) {
             # gff_string returns the line without a trailing newline
             print $feature->gff_string, "\n";
         }
     }
     $gffio->close();
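
For comparison, the plain while-loop alternative could look roughly like
the sketch below. It does not use BioPerl at all, it assumes tab-separated
GFF columns with the start and end in columns 4 and 5, and it includes the
chromosome check. The sample lines and the "demo" source name are made up
for illustration; only the 2L region comes from the question.

```perl
use strict;
use warnings;

# Query region from the example in the question (chromosome 2L).
my ($chrom, $coord_start, $coord_end) = ('2L', 123_456, 123_489);

# Made-up GFF lines for illustration only; in practice open the real file instead.
my $sample = join '', map { join("\t", @$_) . "\n" } (
    [qw(2L demo gene 123000 123460 . + . ID=overlaps_left)],
    [qw(2L demo CDS  123470 123480 . + . ID=inside)],
    [qw(2L demo gene 124000 125000 . - . ID=outside)],
    [qw(3R demo gene 123456 123489 . + . ID=other_chromosome)],
);
open my $fh, '<', \$sample or die "Cannot open in-memory GFF: $!";

my @hits;
while (my $line = <$fh>) {
    next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blank lines
    chomp(my $record = $line);
    # GFF columns: seqid, source, type, start, end, score, strand, frame, attributes
    my ($seqid, $source, $type, $start, $end) = split /\t/, $record;
    next unless defined $end && $seqid eq $chrom;
    # Two closed intervals overlap when each one starts before the other ends.
    push @hits, $line if $start <= $coord_end && $end >= $coord_start;
}
close $fh;

print @hits;
```

To run this on a real file, replace the in-memory handle with something
like open my $fh, '<', $gff_path (where $gff_path is your own file name).
With 100,000 regions you would not want to rescan the file once per
region, so some kind of per-chromosome lookup is probably worth adding.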

Barry

On Oct 22, 2007, at 4:30 PM, Hang wrote:

> Hello,
>
> I have a list of about 100,000 short genomic regions with paired start and end
> coordinates on the reference fly genome (R5.3). I also have GFF files from the
> same genome release. I wonder how I can extract all features that overlap
> these regions.
>
> For example:
>
> Region A is on chromosome 2L between 123,456 bp and 123,489 bp. What code should
> I use to extract features, such as gene, CDS, etc., that overlap this region?
>
> Thank you in advance!
>
> -- Hang
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



