[Bioperl-l] Chromosome coordinates
Jason Stajich
jason.stajich at gmail.com
Thu Dec 1 17:31:29 UTC 2011
You might try using BEDtools, intersectBED which can do a lot of what you are doing in simple command line program.
Jason
On Dec 1, 2011, at 8:59 AM, Scott Cain wrote:
> Hi Jay,
>
> Since the maize GFF file is likely to be fairly large, I would consider
> putting it in a database, using either Bio::DB::GFF if it is GFF2 or
> Bio::DB::SeqFeature::Store if it is gff3. Then you can use the methods
> that come along with either of those modules to search regions for for
> genes. They both support a get_features_by_location method, so you could
> get the range for each of the regions you want to look at, and check the
> database with that method to see if anything is there.
>
> Scott
>
>
> On Thu, Dec 1, 2011 at 11:38 AM, Boddu, Jayanand <jboddu at illinois.edu>wrote:
>
>> Hello
>> I am newbie to Perl scripts.
>> I have a file with short reads mapped to the MAIZE genome
>> The format is a simple BLASTN output.
>> READ_ID
>>
>> Chr
>>
>> % Similarity
>>
>> Alignment
>>
>> Mismatches
>>
>> Gaps
>>
>> READ Start
>>
>> READ End
>>
>> Chr Start
>>
>> Chr End
>>
>> E Value
>>
>> Score
>>
>> READ1
>>
>> chrPt
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 35021
>>
>> 35037
>>
>> 0.21
>>
>> 34.2
>>
>> READ1
>>
>> chr10
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 128587356
>>
>> 128587372
>>
>> 0.21
>>
>> 34.2
>>
>> READ1
>>
>> chr6
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 160769803
>>
>> 160769787
>>
>> 0.21
>>
>> 34.2
>>
>> READ1
>>
>> chr5
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 172103083
>>
>> 172103067
>>
>> 0.21
>>
>> 34.2
>>
>> READ1
>>
>> chr4
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 213173683
>>
>> 213173699
>>
>> 0.21
>>
>> 34.2
>>
>> READ1
>>
>> chr3
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 23689132
>>
>> 23689116
>>
>> 0.21
>>
>> 34.2
>>
>> READ2
>>
>> chr8
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 161048603
>>
>> 161048587
>>
>> 0.21
>>
>> 34.2
>>
>> READ2
>>
>> chr6
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 155768884
>>
>> 155768868
>>
>> 0.21
>>
>> 34.2
>>
>> READ2
>>
>> chr5
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 32958812
>>
>> 32958828
>>
>> 0.21
>>
>> 34.2
>>
>> READ2
>>
>> chr3
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 212451090
>>
>> 212451074
>>
>> 0.21
>>
>> 34.2
>>
>> READ2
>>
>> chr2
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 2046449
>>
>> 2046465
>>
>> 0.21
>>
>> 34.2
>>
>> READ2
>>
>> chr1
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 223233801
>>
>> 223233785
>>
>> 0.21
>>
>> 34.2
>>
>> READ2
>>
>> chr1
>>
>> 100
>>
>> 17
>>
>> 0
>>
>> 0
>>
>> 1
>>
>> 17
>>
>> 277573037
>>
>> 277573021
>>
>> 0.21
>>
>> 34.2
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> As expected the same read maps to multiple places on the same/different
>> chromosome.
>> I have a GFF file with annotated coordinates.
>> I would like to run a PERL script to find out READS that are within the
>> GENES in the GFF file and that are not.
>> The anticipated script should;
>>
>> 1. Take the READ coordinates on the genome (by chromosome);
>>
>> 2. Go the GFF file;
>>
>> 3. Find the Chromosome;
>>
>> 4. Find the GENE (by coordinates);
>>
>> 5. and report READ-its coordinates-Chromosome-GENE-and its
>> coordinates.
>>
>> It doesn't need to be in the same order.
>> After this, I guess I could use simple Microsoft ACCESS query to pull out
>> READS that are not mapped to the GENEs.
>> I would greatly appreciate if anyone can has a script that more or less
>> similar job.
>>
>> Thanks
>> Jay
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot
> net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
More information about the Bioperl-l
mailing list