[Bioperl-l] Chromosome coordinates
Boddu, Jayanand
jboddu at illinois.edu
Thu Dec 1 16:38:00 UTC 2011
Hello
I am newbie to Perl scripts.
I have a file with short reads mapped to the MAIZE genome
The format is a simple BLASTN output.
READ_ID
Chr
% Similarity
Alignment
Mismatches
Gaps
READ Start
READ End
Chr Start
Chr End
E Value
Score
READ1
chrPt
100
17
0
0
1
17
35021
35037
0.21
34.2
READ1
chr10
100
17
0
0
1
17
128587356
128587372
0.21
34.2
READ1
chr6
100
17
0
0
1
17
160769803
160769787
0.21
34.2
READ1
chr5
100
17
0
0
1
17
172103083
172103067
0.21
34.2
READ1
chr4
100
17
0
0
1
17
213173683
213173699
0.21
34.2
READ1
chr3
100
17
0
0
1
17
23689132
23689116
0.21
34.2
READ2
chr8
100
17
0
0
1
17
161048603
161048587
0.21
34.2
READ2
chr6
100
17
0
0
1
17
155768884
155768868
0.21
34.2
READ2
chr5
100
17
0
0
1
17
32958812
32958828
0.21
34.2
READ2
chr3
100
17
0
0
1
17
212451090
212451074
0.21
34.2
READ2
chr2
100
17
0
0
1
17
2046449
2046465
0.21
34.2
READ2
chr1
100
17
0
0
1
17
223233801
223233785
0.21
34.2
READ2
chr1
100
17
0
0
1
17
277573037
277573021
0.21
34.2
As expected the same read maps to multiple places on the same/different chromosome.
I have a GFF file with annotated coordinates.
I would like to run a PERL script to find out READS that are within the GENES in the GFF file and that are not.
The anticipated script should;
1. Take the READ coordinates on the genome (by chromosome);
2. Go the GFF file;
3. Find the Chromosome;
4. Find the GENE (by coordinates);
5. and report READ-its coordinates-Chromosome-GENE-and its coordinates.
It doesn't need to be in the same order.
After this, I guess I could use simple Microsoft ACCESS query to pull out READS that are not mapped to the GENEs.
I would greatly appreciate if anyone can has a script that more or less similar job.
Thanks
Jay
More information about the Bioperl-l
mailing list