[Bioperl-l] feature table definitions
Andrew Dalke <dalke@dalkescientific.com>
Mon, 21 Jan 2002 21:36:56 -0700
Hey all,
I'm back to regression testing my parsers for Biopython and making
sure our usage matches Bioperl's. I have a question about it, and
a bug report.
First, the FTHelper location parser toggles the strand if the
word "complement" exists anywhere in the location.
my $strand = ( $fth->loc =~ /complement/ ) ? -1 : 1;
I noticed record "M96344" contains the location
order(112..131,complement(54..73))
which in XEMBL (I assume it uses Bioperl so I'll use that as a way
to show the results of Bioperl parsing) is converted to
<Qualifier value-type="order" value="order(112..131,complement(54..73))" />
<Interval-loc startpos="54" endpos="131" complement="1" startopen="0"
endopen="0" onepos="0" />
I'm having a hard time understanding why this makes sense. The
BSML meaning of 'complement="1"' is that the feature is on the
complement strand, while this feature has one part on each strand. I can mimic the
existing behaviour, I just don't know if it's right. It feels
suspicious, though it could be because BSML isn't powerful enough
to handle this case.
To make sure I understand things: if a complement is used anywhere in
the location, should the overall feature be considered to be on the
complement, leaving the specific subregions to handle the details?
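To make the alternative concrete, here is a rough Python sketch of tracking the strand per subregion instead of toggling it once for the whole location string. The function names (split_top_level, parse_location) are made up for illustration and are not Bioperl or Biopython API. It assumes only complement(), join(), order(), plain N..M ranges and single positions, and it does not reverse the part order inside complement(join(...)).

```python
import re

def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts

def parse_location(loc, strand=1):
    """Return one (start, end, strand) tuple per simple range.

    Hypothetical helper: handles only complement/join/order operators,
    'N..M' ranges and single positions. Fuzzy positions are not handled.
    """
    loc = loc.strip()
    m = re.fullmatch(r"(complement|join|order)\((.*)\)", loc)
    if m:
        op, inner = m.groups()
        if op == "complement":
            strand = -strand  # flips only the parts nested inside
        result = []
        for part in split_top_level(inner):
            result.extend(parse_location(part, strand))
        return result
    m = re.fullmatch(r"(\d+)\.\.(\d+)", loc)
    if m:
        return [(int(m.group(1)), int(m.group(2)), strand)]
    if re.fullmatch(r"\d+", loc):
        return [(int(loc), int(loc), strand)]
    raise ValueError("unsupported location: %r" % loc)

def naive_strand(loc):
    """Mimics the whole-string test: -1 if 'complement' appears anywhere."""
    return -1 if "complement" in loc else 1

parts = parse_location("order(112..131,complement(54..73))")
```

For the M96344 location this keeps 112..131 on the forward strand and 54..73 on the complement, whereas the whole-string test reports -1 for the entire feature.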
Second, I noticed in U16238 there is a 'one-of()' operator in the
feature location. That's not mentioned in the Feature Table
documentation dated Dec. 15, 2001 at
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
nor does the Bioperl code handle that case (it gives a warning
message). Since that document doesn't provide a contact email and
since there's a bunch of EMBL people on this list, perhaps someone
here can tell the maintainers to update their documentation or
fix the database entry.
BTW, the XEMBL conversion for
5'UTR one-of(845,953,963,1078,1104)..1354
/evidence=experimental
exon one-of(845,953,963,1078,1104)..1742
/number=1
/evidence=experimental
is at
http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?id=U16238&format=Bsml
<Feature id="FTR_U16238.1_3" class="5'UTR" value-type="5'utr" title="5'UTR"
display-auto="1">
<Qualifier value-type="evidence" value="EXPERIMENTAL" />
<Interval-loc startpos="1" endpos="1895" startopen="0" endopen="0"
onepos="0" complement="0" />
</Feature>
<Feature id="FTR_U16238.1_4" class="EXON" value-type="exon" title="exon"
display-auto="1">
<Qualifier value-type="evidence" value="EXPERIMENTAL" />
<Qualifier value-type="number" value="1" />
<Interval-loc startpos="1" endpos="1895" startopen="0" endopen="0"
onepos="0" complement="0" />
</Feature>
so it looks like the conversion code sticks in the start/end values
of the whole entry if it can't figure things out. It should include
some sort of information about the fuzzy position in this case, which I
suspect Bioperl doesn't provide.
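As an illustration of what keeping the fuzzy information might look like, here is a small Python sketch that parses a range whose endpoints may be a plain number or a one-of() list. The names (parse_position, parse_range, outer_bounds) are invented for this example and come from neither Bioperl nor Biopython. It assumes the location is a single START..END range with no other operators.

```python
import re

# Matches one-of(845,953,...) and captures the comma-separated numbers.
ONE_OF = re.compile(r"one-of\(([\d,]+)\)")

def parse_position(text):
    """Return a tuple of candidate coordinates for one endpoint."""
    text = text.strip()
    m = ONE_OF.fullmatch(text)
    if m:
        return tuple(int(x) for x in m.group(1).split(","))
    if text.isdigit():
        return (int(text),)
    raise ValueError("unsupported position: %r" % text)

def parse_range(loc):
    """Split 'START..END' and keep every alternative for each endpoint."""
    start_text, sep, end_text = loc.partition("..")
    if not sep:
        raise ValueError("not a range: %r" % loc)
    return parse_position(start_text), parse_position(end_text)

def outer_bounds(loc):
    """Widest span covered by any choice of the alternatives."""
    starts, ends = parse_range(loc)
    return min(starts), max(ends)

starts, ends = parse_range("one-of(845,953,963,1078,1104)..1354")
```

With this, the 5'UTR from U16238 keeps all five candidate start positions, and a converter that can only emit one interval could fall back to the outer bounds (845 to 1354) rather than the extent of the whole entry.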
Andrew
dalke@dalkescientific.com