[Bioperl-l] feature table definitions

Andrew Dalke <dalke@dalkescientific.com>
Mon, 21 Jan 2002 21:36:56 -0700


Hey all,

  I'm back to regression testing my parsers for Biopython and making
sure our usage matches Bioperl's.  I have a question about it, and
a bug report.

  First, the FTHelper location parser toggles the strand if the
word "complement" exists anywhere in the location.

    my $strand = ( $fth->loc =~ /complement/ ) ? -1 : 1;

I noticed record "M96344" contains the location
     order(112..131,complement(54..73))

which in XEMBL (I assume it uses Bioperl so I'll use that as a way
to show the results of Bioperl parsing) is converted to

<Qualifier value-type="order" value="order(112..131,complement(54..73))" />
  <Interval-loc startpos="54" endpos="131" complement="1" startopen="0"
   endopen="0" onepos="0" />

I'm having a hard time understanding why this makes sense.  The
BSML meaning of 'complement="1"' is that the feature is on the
complement, while in this case it's on both strands.  I can mimic the
existing behaviour; I just don't know if it's right.  It feels
suspicious, though it could be because BSML isn't powerful enough
to handle this case.

To make sure I understand things: if any use is made of a complement,
should the overall feature be considered to be on the complement,
leaving the specific subregions to handle the details?
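
To make the question concrete, here is a rough Python sketch of the two
readings.  The first function mirrors the FTHelper test quoted above; the
second assigns a strand to each subregion instead.  Both function names are
made up for illustration, and the tokenizer only understands order(),
join(), complement() and plain a..b ranges, so it is a sketch and not how
either Bioperl or Biopython actually does it.

```python
import re

def whole_location_strand(loc):
    # Same idea as the Perl test: -1 if "complement" appears anywhere.
    return -1 if re.search(r"complement", loc) else 1

# Hypothetical tokenizer: an operator with its opening paren, a closing
# paren, or a simple "start..end" range. Everything else is skipped.
TOKEN = re.compile(r"(complement|order|join)\(|(\))|(\d+)\.\.(\d+)")

def subregion_strands(loc):
    """Return a (start, end, strand) tuple for each a..b range in loc."""
    stack = []    # operators currently enclosing the range being read
    regions = []
    for match in TOKEN.finditer(loc):
        op, close, start, end = match.groups()
        if op:
            stack.append(op)
        elif close:
            stack.pop()   # raises IndexError on unbalanced input
        else:
            # A range is on the complement if it sits inside an odd
            # number of complement() operators.
            strand = -1 if stack.count("complement") % 2 else 1
            regions.append((int(start), int(end), strand))
    return regions

loc = "order(112..131,complement(54..73))"
print(whole_location_strand(loc))
print(subregion_strands(loc))
```

For the M96344 location the first function reports the whole feature as
being on the complement, while the second keeps 112..131 on the forward
strand and only 54..73 on the complement.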


Second, I noticed in U16238 there is a 'one-of()' operator in the
feature location.  That's not mentioned in the Feature Table
documentation dated Dec. 15, 2001 at
  http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
nor does the Bioperl code handle that case (it gives a warning
message).  Since that document doesn't provide a contact email and
since there's a bunch of EMBL people on this list, perhaps someone
here can tell the maintainers to update their documentation or
fix the database entry.

BTW, the XEMBL conversion for

     5'UTR           one-of(845,953,963,1078,1104)..1354
                     /evidence=experimental
     exon            one-of(845,953,963,1078,1104)..1742
                     /number=1
                     /evidence=experimental

is at
 http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?id=U16238&format=Bsml


 <Feature id="FTR_U16238.1_3" class="5'UTR" value-type="5'utr" title="5'UTR"
 display-auto="1">
  <Qualifier value-type="evidence" value="EXPERIMENTAL" />
  <Interval-loc startpos="1" endpos="1895" startopen="0" endopen="0"
 onepos="0" complement="0" />
  </Feature>

 <Feature id="FTR_U16238.1_4" class="EXON" value-type="exon" title="exon"
 display-auto="1">
  <Qualifier value-type="evidence" value="EXPERIMENTAL" />
  <Qualifier value-type="number" value="1" />
  <Interval-loc startpos="1" endpos="1895" startopen="0" endopen="0"
  onepos="0" complement="0" />
  </Feature>

so it looks like the conversion code sticks in the start/end values
of the whole entry if it can't figure things out.  It should include
some sort of information about the fuzzy location for this case, which I
suspect bioperl doesn't provide.
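
As a rough idea of what "information about the fuzzy location" could look
like, here is a small Python sketch that pulls the alternatives out of a
one-of(...)..end location.  The function name and the returned dictionary
keys are my own invention for illustration, and it assumes one-of() only
appears as the start of a simple range, as in U16238; nothing here is taken
from the Feature Table document or from Bioperl.

```python
import re

# Matches only the shape seen in U16238: one-of(a,b,c)..end
ONE_OF_RANGE = re.compile(r"^one-of\((\d+(?:,\d+)*)\)\.\.(\d+)$")

def parse_one_of_range(loc):
    """Return the possible starts and the end, or None if loc has another shape."""
    match = ONE_OF_RANGE.match(loc)
    if match is None:
        return None
    starts = [int(s) for s in match.group(1).split(",")]
    return {
        "start_choices": starts,    # every alternative start position
        "min_start": min(starts),   # outermost possible extent
        "max_start": max(starts),   # innermost possible extent
        "end": int(match.group(2)),
    }

print(parse_one_of_range("one-of(845,953,963,1078,1104)..1354"))
```

With something like this a converter could at least report 845..1354 as the
widest extent of the 5'UTR rather than falling back to 1..1895.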

                    Andrew
                    dalke@dalkescientific.com