[Biojava-l] particular region of genomic sequence

Matthew Pocock matthew_pocock@yahoo.co.uk
Mon, 22 Apr 2002 11:30:13 +0100


hz5@njit.edu wrote:
> Hi,
> Can anybody give a hint on how to use BioJava to extract a specific region
> (say, -800 to +200 relative to the transcription start site) of a gene's
> genomic sequence from NCBI?
> 
> I wrote a Java program to do this myself, but I am not sure whether my
> parsing and retrieval schemes are efficient and accurate.
> 
> Thanks!
> 

Morning,

If you have a GenBank file covering this region (both the TSS and the 
-800 to +200 window around it), you can read it with 
SeqIOTools.readGenbank, then call the filter() method on the sequence 
with an instance of FeatureFilter (by location, by type or whatever you 
need) to pull out the TSS feature. After that, 
new SubSequence(seq, tssLoc.getMin() - 800, tssLoc.getMax() + 200) cuts 
out that bit of sequence. You may need to check the strandedness of the 
TSS and flip the subsequence accordingly: for a gene on the reverse 
strand, upstream lies at higher coordinates, so the two offsets swap and 
the result needs to be reverse-complemented.
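
To make the strand handling concrete, here is a minimal plain-Java sketch 
of just the coordinate arithmetic. It deliberately uses no BioJava classes, 
so PromoterWindow, extract and reverseComplement are hypothetical names 
made up for illustration. It assumes the TSS is a single 1-based position 
in forward-strand coordinates, and that a window running off either end of 
the sequence is simply clamped.

```java
// Illustration only: plain strings stand in for BioJava sequences.
final class PromoterWindow {

    /** Complement of a single base; anything unrecognised becomes 'N'. */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    /** Reverse complement of an upper-case DNA string. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    /**
     * Cuts out the window from `upstream` bases before the TSS to
     * `downstream` bases after it, read in the gene's own orientation.
     *
     * @param genomic       forward-strand sequence
     * @param tss           1-based TSS position on the forward strand (assumption)
     * @param forwardStrand true if the gene is on the forward strand
     */
    static String extract(String genomic, int tss, boolean forwardStrand,
                          int upstream, int downstream) {
        // On the reverse strand, upstream is towards higher coordinates.
        int min = forwardStrand ? tss - upstream : tss - downstream;
        int max = forwardStrand ? tss + downstream : tss + upstream;

        // Clamp to the available sequence (an assumption of this sketch).
        min = Math.max(1, min);
        max = Math.min(genomic.length(), max);

        // Convert 1-based inclusive coordinates to substring indices.
        String window = genomic.substring(min - 1, max);
        return forwardStrand ? window : reverseComplement(window);
    }
}
```

For the case in the question you would call something like 
PromoterWindow.extract(seqString, tssPos, isForward, 800, 200), where 
seqString, tssPos and isForward come from whatever parsed your file.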

Matthew