[Bioperl-l] Bio::Location::Split question
Chris Fields
cjfields at uiuc.edu
Mon Sep 18 15:02:32 UTC 2006
This is a general question about how locations are described for split
locations, so whoever has an opinion, please chip in. This is particularly
pertinent to GenBank/EMBL/swiss formats. Okay, stick with me here...
A pretty interesting question was raised while I was working on a bug (bug
1953), which deals with split location data with the following formats:
join(complement(1..100),complement(201..300),complement(401..500))
complement(join(1..100,201..300,401..500))
GenBank acc #AL137247 has examples of both, if you want a real example.
According to BioPerl these are syntactically the same (look at the last few
tests in LocationFactory.t). However, according to GenBank (and the
rationale outlined in bug 1953), these are actually quite different.
According to the GenBank/EMBL/DDBJ feature table definition, the use of the
operator 'join' entails that the segments in the following parentheses are
joined in the order presented ('placed end-to-end'), whereas the use of
'complement' uses the complementary strand of the segment in parentheses.
So, the operator tells one how to treat the sequence data using the
locations shown.
Here are examples from the definition:
...
complement(join(2691..4571,4918..5163))
    Joins regions 2691 to 4571 and 4918 to 5163, then
    complements the joined segments (the feature is on the
    strand complementary to the presented strand)
join(complement(4918..5163),complement(2691..4571))
    Complements regions 4918 to 5163 and 2691 to 4571, then
    joins the complemented segments (the feature is on the
    strand complementary to the presented strand)
...
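To make the operator semantics concrete, here is a minimal Python sketch (not BioPerl code) that applies 'join' and 'complement' to a toy sequence. The sequence, the coordinates, and the helper names revcomp and region are hypothetical and chosen only for illustration; 'complement' is assumed to mean reverse complement of the given span.

```python
def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def region(seq, start, end):
    """Return a 1-based, inclusive span, as in feature table coordinates."""
    return seq[start - 1:end]

seq = "AAACCCGGGTTTACGT"  # toy sequence, made up for this example

# complement(join(1..3,7..9)): join first, then complement the whole
a = revcomp(region(seq, 1, 3) + region(seq, 7, 9))

# join(complement(7..9),complement(1..3)): complement each, reversed order
b = revcomp(region(seq, 7, 9)) + revcomp(region(seq, 1, 3))

# join(complement(1..3),complement(7..9)): complement each, same order
c = revcomp(region(seq, 1, 3)) + revcomp(region(seq, 7, 9))

print(a, b, c)
```

Under these assumptions a and b come out identical, which matches the pair of examples in the definition, while c differs because its segments are joined in the opposite order.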
Using this rationale, substituting in letters for clarity and lower case to
indicate the complement strand:
Location #1 : join(complement(A..B),complement(C..D),complement(E..F))
would be:
join(b..a,d..c,f..e)
and the following:
Location #2 : complement(join(A..B,C..D,E..F))
would be:
join(f..e,d..c,b..a)
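The same substitution can be written as a short Python sketch. This only models the segment ordering argued for above; it is not how Bio::Location::Split is implemented, and the names segments, flip and fmt are made up for the example.

```python
segments = [("A", "B"), ("C", "D"), ("E", "F")]

def flip(seg):
    """Complement one segment: A..B becomes b..a."""
    start, end = seg
    return (end.lower(), start.lower())

def fmt(segs):
    """Render a list of segments as a join(...) string."""
    return "join(" + ",".join(f"{s}..{e}" for s, e in segs) + ")"

# Location #1: complement each segment, then join in the order given
loc1 = [flip(s) for s in segments]

# Location #2: join in the order given, then complement the whole,
# which flips every segment and also reverses their order
loc2 = [flip(s) for s in reversed(segments)]

print(fmt(loc1))
print(fmt(loc2))
```

The two results contain the same flipped segments and differ only in their order, which is the point of the distinction.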
The current behavior of Bio::Location::Split propagates the strand
information (flips) to the sublocations without re-sorting them. We could
sort them, but wouldn't it be much simpler to not propagate strand changes
at all? It seems we're making this more complicated than it actually is.
Thoughts?
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign