[Bioperl-l] Chromosome coordinates
Scott Cain
scott at scottcain.net
Thu Dec 1 16:59:56 UTC 2011
Hi Jay,
Since the maize GFF file is likely to be fairly large, I would consider
putting it in a database, using either Bio::DB::GFF if it is GFF2 or
Bio::DB::SeqFeature::Store if it is gff3. Then you can use the methods
that come along with either of those modules to search regions for for
genes. They both support a get_features_by_location method, so you could
get the range for each of the regions you want to look at, and check the
database with that method to see if anything is there.
Scott
On Thu, Dec 1, 2011 at 11:38 AM, Boddu, Jayanand <jboddu at illinois.edu>wrote:
> Hello
> I am newbie to Perl scripts.
> I have a file with short reads mapped to the MAIZE genome
> The format is a simple BLASTN output.
> READ_ID
>
> Chr
>
> % Similarity
>
> Alignment
>
> Mismatches
>
> Gaps
>
> READ Start
>
> READ End
>
> Chr Start
>
> Chr End
>
> E Value
>
> Score
>
> READ1
>
> chrPt
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 35021
>
> 35037
>
> 0.21
>
> 34.2
>
> READ1
>
> chr10
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 128587356
>
> 128587372
>
> 0.21
>
> 34.2
>
> READ1
>
> chr6
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 160769803
>
> 160769787
>
> 0.21
>
> 34.2
>
> READ1
>
> chr5
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 172103083
>
> 172103067
>
> 0.21
>
> 34.2
>
> READ1
>
> chr4
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 213173683
>
> 213173699
>
> 0.21
>
> 34.2
>
> READ1
>
> chr3
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 23689132
>
> 23689116
>
> 0.21
>
> 34.2
>
> READ2
>
> chr8
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 161048603
>
> 161048587
>
> 0.21
>
> 34.2
>
> READ2
>
> chr6
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 155768884
>
> 155768868
>
> 0.21
>
> 34.2
>
> READ2
>
> chr5
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 32958812
>
> 32958828
>
> 0.21
>
> 34.2
>
> READ2
>
> chr3
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 212451090
>
> 212451074
>
> 0.21
>
> 34.2
>
> READ2
>
> chr2
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 2046449
>
> 2046465
>
> 0.21
>
> 34.2
>
> READ2
>
> chr1
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 223233801
>
> 223233785
>
> 0.21
>
> 34.2
>
> READ2
>
> chr1
>
> 100
>
> 17
>
> 0
>
> 0
>
> 1
>
> 17
>
> 277573037
>
> 277573021
>
> 0.21
>
> 34.2
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> As expected the same read maps to multiple places on the same/different
> chromosome.
> I have a GFF file with annotated coordinates.
> I would like to run a PERL script to find out READS that are within the
> GENES in the GFF file and that are not.
> The anticipated script should;
>
> 1. Take the READ coordinates on the genome (by chromosome);
>
> 2. Go the GFF file;
>
> 3. Find the Chromosome;
>
> 4. Find the GENE (by coordinates);
>
> 5. and report READ-its coordinates-Chromosome-GENE-and its
> coordinates.
>
> It doesn't need to be in the same order.
> After this, I guess I could use simple Microsoft ACCESS query to pull out
> READS that are not mapped to the GENEs.
> I would greatly appreciate if anyone can has a script that more or less
> similar job.
>
> Thanks
> Jay
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot
net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
More information about the Bioperl-l
mailing list