[Biojava-l] How handle genbank location like AC1234:790078..790119

Thomas Down td2@sanger.ac.uk
Fri, 18 Oct 2002 14:41:17 +0100


On Fri, Oct 18, 2002 at 02:45:33PM +0200, Stephane Marcel wrote:
> Hi,
> 
> I am a new user of biojava and need help with a GenBank parser.
> My problem is when I try to get the location of my feature. If the location
> is : join(790078..790119,791193..791337)
> no problem... I can get the min, the max and both blocks.
> 
> But, if the location is : join(AC1234:790078..790119,791193..791337) the
> first block describes a clone.
> In this case, the first block is simply ignored and only the second block is
> returned.

This feature should be created in BioJava as a RemoteFeature.
You can call getRegions() on this and inspect the returned list
to discover portions of the feature which are attached to different
sequences.
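
To make the idea concrete, here is a minimal stand-alone sketch of
splitting such a location into per-sequence regions. It does not use
BioJava at all: the class RemoteLocationSketch, its Region type and the
parse() method are hypothetical names invented for this illustration,
and it only handles a flat join() of simple ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration only: these are NOT BioJava classes.
class RemoteLocationSketch {

    /** One block of a join(); seqId is null when the block lies on the local sequence. */
    static final class Region {
        final String seqId;
        final int min;
        final int max;

        Region(String seqId, int min, int max) {
            this.seqId = seqId;
            this.min = min;
            this.max = max;
        }

        boolean isRemote() {
            return seqId != null;
        }
    }

    /** Parses "join(a..b,ID:c..d)" or a single "a..b" block into regions. */
    static List<Region> parse(String location) {
        String body = location.trim();
        if (body.startsWith("join(") && body.endsWith(")")) {
            body = body.substring(5, body.length() - 1);
        }
        List<Region> regions = new ArrayList<>();
        for (String block : body.split(",")) {
            String seqId = null;
            String range = block.trim();
            // An "ACCESSION:" prefix marks a block on another sequence.
            int colon = range.indexOf(':');
            if (colon >= 0) {
                seqId = range.substring(0, colon);
                range = range.substring(colon + 1);
            }
            String[] ends = range.split("\\.\\.");
            int min = Integer.parseInt(ends[0]);
            int max = ends.length > 1 ? Integer.parseInt(ends[1]) : min;
            regions.add(new Region(seqId, min, max));
        }
        return regions;
    }

    public static void main(String[] args) {
        for (Region r : parse("join(AC1234:790078..790119,791193..791337)")) {
            String where = r.isRemote() ? "remote " + r.seqId : "local";
            System.out.println(where + " " + r.min + ".." + r.max);
        }
    }
}
```

With the example location from the question, the first region is
reported as remote (on AC1234) and the second as local, which is the
distinction you would look for in the list of regions.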

> There are similar problems with locations like (123.567) or 123^567
> 
> It seems there is a class EmblLikeLocationParser which manages this kind of
> location, but it has no public constructor and I do not know how to use
> it.

EmblLikeLocationParser is used internally by the EMBL and Genbank file
parsers.  It should be creating FuzzyLocations to represent
these `awkward' cases.
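
For reference, a small stand-alone sketch of how the three notations
differ. This is not BioJava code and says nothing about how
EmblLikeLocationParser or FuzzyLocation work internally; the class
FuzzyLocationSketch and its Kind and Parsed types are hypothetical.
It assumes the usual feature-table reading: a..b is an exact range,
(a.b) is a single base somewhere between a and b, and a^b is a site
between bases.

```java
// Stand-alone illustration only: these are NOT BioJava classes.
class FuzzyLocationSketch {

    enum Kind { EXACT_RANGE, SINGLE_BASE_WITHIN, BETWEEN_BASES }

    static final class Parsed {
        final Kind kind;
        final int min;
        final int max;

        Parsed(Kind kind, int min, int max) {
            this.kind = kind;
            this.min = min;
            this.max = max;
        }
    }

    /** Classifies "a..b", "(a.b)", "a^b" or a bare position "a". */
    static Parsed parse(String text) {
        String s = text.trim();
        // Drop one pair of enclosing parentheses, as in "(123.567)".
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1);
        }
        Kind kind;
        String[] ends;
        if (s.contains("..")) {            // must be tested before the single dot
            kind = Kind.EXACT_RANGE;
            ends = s.split("\\.\\.");
        } else if (s.contains("^")) {
            kind = Kind.BETWEEN_BASES;
            ends = s.split("\\^");
        } else if (s.contains(".")) {
            kind = Kind.SINGLE_BASE_WITHIN;
            ends = s.split("\\.");
        } else {
            kind = Kind.EXACT_RANGE;       // a bare position is a range of length one
            ends = new String[] { s, s };
        }
        return new Parsed(kind, Integer.parseInt(ends[0]), Integer.parseInt(ends[1]));
    }

    public static void main(String[] args) {
        for (String loc : new String[] { "123..567", "(123.567)", "123^567" }) {
            Parsed p = parse(loc);
            System.out.println(loc + " -> " + p.kind + " " + p.min + "," + p.max);
        }
    }
}
```

All three inputs give the same min and max (123 and 567); only the
kind differs, which is the extra information a plain min/max range
cannot carry.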

Let me know if you have any trouble with these,

     Thomas.