[Biojava-l] Parsing circular sequences

Keith James kdj@sanger.ac.uk
11 Nov 2002 18:40:22 +0000


>>>>> "Greg" == Cox, Greg <gcox@cle.lionbioscience.com> writes:

    Greg> I'm taking a look at a circular genbank sequence of length n
    Greg> with a location n^1 on it.  I think that what has to happen
    Greg> is the size of the sequence and if it is circular have to be
    Greg> passed down to EmblLikeLocationParser, which will check each
    Greg> location and convert x..y on the sequence to a
    Greg> CircularLocation around RangedLocation x..y+n.

    Greg> 	This approach involves changing a lot of method
    Greg> signatures, since there are a lot of layers between the
    Greg> formatter and the location parser.  That's something I'd
    Greg> like to avoid, but I've convinced myself it has to be done
    Greg> this way since at the least the location parser needs the
    Greg> length of the circular sequence.  I'd like a sanity check if
    Greg> someone has a better idea of how to work through this.

I wonder whether, for circular sequences, you could create some sort
of proxy location for features that include the origin, which would
then be resolved later?
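
To make the proxy idea concrete, here is a rough sketch under stated assumptions. The names DeferredLocationSketch, PendingLocation and Span are hypothetical and are not part of the BioJava API. The parser would only record the raw coordinates, and the sequence length and topology would be supplied once they are known, using the x..y+n convention from Greg's message.

```java
/** Hypothetical sketch only; none of these types exist in BioJava. */
final class DeferredLocationSketch {

    /** A resolved span; max may exceed the sequence length when the span crosses the origin. */
    static final class Span {
        final int min;
        final int max;

        Span(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Placeholder that keeps the raw coordinates exactly as read from the feature table. */
    static final class PendingLocation {
        final int start;
        final int end;

        PendingLocation(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Resolve once the sequence length and topology are available. */
        Span resolve(int seqLength, boolean circular) {
            if (start <= end) {
                // Ordinary location: nothing to adjust.
                return new Span(start, end);
            }
            if (!circular) {
                throw new IllegalStateException(
                        "start > end on a linear sequence: " + start + ".." + end);
            }
            // Location spans the origin: extend the end by the sequence length (x..y+n).
            return new Span(start, end + seqLength);
        }
    }
}
```

With this arrangement the location parser would not need the length at all; whatever builds the final sequence would call resolve on each pending location after the length has been read.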

Given that in a GenBank stream the actual sequence comes last, I assume
you're trusting the length in the LOCUS line? I haven't checked how
reliable this is, but I wonder...
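
If the LOCUS length is trusted while parsing, one option is to check it against the real sequence once that has been read. The sketch below assumes a nucleotide record whose LOCUS line carries the length as a number followed by "bp" and the word "circular" for circular molecules. The class and method names are hypothetical, and the example line is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of BioJava. */
final class LocusLineSketch {

    // Assumes the length appears as digits followed by "bp" (nucleotide records).
    private static final Pattern LENGTH = Pattern.compile("\\s(\\d+)\\s+bp\\b");
    private static final Pattern CIRCULAR = Pattern.compile("\\bcircular\\b");

    /** Length declared in the LOCUS line. */
    static int declaredLength(String locusLine) {
        Matcher m = LENGTH.matcher(locusLine);
        if (!m.find()) {
            throw new IllegalArgumentException("No length found in LOCUS line: " + locusLine);
        }
        return Integer.parseInt(m.group(1));
    }

    /** True if the LOCUS line marks the molecule as circular. */
    static boolean isCircular(String locusLine) {
        return CIRCULAR.matcher(locusLine).find();
    }

    /** Call after the sequence itself has been read, to catch an unreliable LOCUS length. */
    static void checkLength(int declared, int actual) {
        if (declared != actual) {
            throw new IllegalStateException(
                    "LOCUS line declares " + declared + " bp but sequence has " + actual);
        }
    }

    public static void main(String[] args) {
        String line = "LOCUS       EXAMPLE1     5028 bp    DNA     circular BCT 11-NOV-2002";
        System.out.println(declaredLength(line) + " circular=" + isCircular(line));
    }
}
```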

Keith

-- 

- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -