[Bioperl-l] Getting sequences by base pair locations
Sendu Bala
bix at sendu.me.uk
Fri Jul 28 13:13:44 UTC 2006
Yuval Itan wrote:
> Hello all,
>
> I was BLATing a few hundred human genes against the chimp genome, and
> kept the best chimp hits for every human gene.
> I have the base pair start and end location for every chimp hit, and I
> need to get the sequence for each of these chimp hits. Here is an
> example for a few chimp hits bp locations:
>
> Start End
> 142854 144504
> 154479 155198
> 153066 167370
> 163146 163559
>
> I have one chimp genome file (about 3GB) including all chromosomes, but
> I could also get one file per chromosome if that would make things
> easier. Does anyone have a script or a link for an interface that can do
> the job?
If your genome file is in a standard format such as FASTA, use Bio::SeqIO to read it:
http://www.bioperl.org/wiki/HOWTO:SeqIO
Then pick out the sequence object for the correct chromosome and
extract each hit with subseq(start, end):
http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object
You'll also need to make sure that the sequence data you ran BLAT against is
exactly the same data as in your big file; otherwise the coordinates won't line up.
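
As a rough illustration of the SeqIO plus subseq() approach, here is a minimal sketch. It assumes the genome file is FASTA and that you know which chromosome each hit is on; neither is stated in the original question. The helper name extract_hits, the file name and the chromosome ID in the example run are all hypothetical. subseq() takes 1-based, inclusive coordinates, so if your positions come from BLAT's PSL output (where the start is zero-based) you would probably need to add 1 to each start.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return [start, end, sequence] for each hit on one
# chromosome. $hits is an array ref of [start, end] pairs, 1-based and
# inclusive, which is what subseq() expects.
sub extract_hits {
    my ($fasta_file, $chr_id, $hits) = @_;

    # Assumption: the genome file is in FASTA format
    my $in = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta');

    my @found;
    while (my $seq = $in->next_seq) {
        # Skip records until we reach the chromosome we want
        next unless $seq->display_id eq $chr_id;

        for my $hit (@$hits) {
            my ($start, $end) = @$hit;
            # Ignore coordinates that fall outside this record
            next if $start < 1 || $end > $seq->length || $start > $end;
            push @found, [$start, $end, $seq->subseq($start, $end)];
        }
        last;    # no need to read the rest of the file
    }
    return @found;
}

# Example run (file name and chromosome ID are placeholders):
#   perl get_hits.pl chimp_genome.fa chr1
if (@ARGV == 2) {
    my ($file, $chr) = @ARGV;
    my @hits = ([142854, 144504], [154479, 155198]);
    for my $r (extract_hits($file, $chr, \@hits)) {
        my ($start, $end, $dna) = @$r;
        print ">$chr:$start-$end\n$dna\n";
    }
}
```

This reads one whole chromosome into memory at a time, which should be manageable with one record per chromosome. If scanning a 3GB file for every lookup turns out to be too slow, an indexed reader such as Bio::DB::Fasta may be worth a look.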