[Biojava-l] Fasta & EMBL feature table parsing

Keith James kdj@sanger.ac.uk
27 Nov 2000 17:11:19 +0000


>>>>> "Matthew" == Matthew Pocock <mrp@sanger.ac.uk> writes:

    >>  And an observation:
    >> 
    >> The EMBL flatfile feature table parser (at least, as it was
    >> until the new io stuff) would overwrite qualifiers. e.g. where
    >> there were several /gene names in a feature, only the last one
    >> would be retained. Also quirks similar to earlier Bioperl (like
    >> discarding information from < and > in locations, which is
    >> important for us to keep). Are these going to be addressed in
    >> the io shakeup?

    Matthew> The qualifier overwriting should be addressed by the new
    Matthew> IO (fingers crossed). Fuzzy locations are evil. I ducked
    Matthew> handling this one until somebody required it. You
    Matthew> require it, so I guess the days of ducking are over. I am
    Matthew> willing to add a new implementation of the Location
    Matthew> interface called FuzzyLocation. It will have isMinFuzzy()
    Matthew> and isMaxFuzzy() boolean methods, and will decorate
    Matthew> another Location for all the other location methods. This
    Matthew> way I think we can store everything & lose
    Matthew> nothing. Sounds good?
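
To make sure I have understood the proposal, here is a minimal sketch of
the decorator as I read it. Only the names Location, FuzzyLocation,
isMinFuzzy() and isMaxFuzzy() come from the text above; the cut-down
Location interface, the RangeLocation class and the toString() format
are my own assumptions for illustration and are not the real BioJava
API.

```java
// Hypothetical, cut-down stand-in for the real Location interface.
interface Location {
    int getMin();
    int getMax();
    boolean contains(int point);
}

// Hypothetical plain range, used here only as the thing being decorated.
final class RangeLocation implements Location {
    private final int min;
    private final int max;

    RangeLocation(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    public int getMin() { return min; }
    public int getMax() { return max; }
    public boolean contains(int point) { return point >= min && point <= max; }
}

// Decorator: adds the two fuzziness flags and forwards everything else.
final class FuzzyLocation implements Location {
    private final Location parent;
    private final boolean minFuzzy;
    private final boolean maxFuzzy;

    FuzzyLocation(Location parent, boolean minFuzzy, boolean maxFuzzy) {
        this.parent = parent;
        this.minFuzzy = minFuzzy;
        this.maxFuzzy = maxFuzzy;
    }

    public boolean isMinFuzzy() { return minFuzzy; }
    public boolean isMaxFuzzy() { return maxFuzzy; }

    // All ordinary location methods delegate to the wrapped Location.
    public int getMin() { return parent.getMin(); }
    public int getMax() { return parent.getMax(); }
    public boolean contains(int point) { return parent.contains(point); }

    // Writes the flags back out as EMBL-style < and > markers.
    public String toString() {
        return (minFuzzy ? "<" : "") + getMin() + ".." + (maxFuzzy ? ">" : "") + getMax();
    }
}
```

Because the flags live in the wrapper, code that only knows about
Location keeps working unchanged, and code that cares can test for
FuzzyLocation and read the flags.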

I think we call fuzzy locations something different, e.g.

FT   fuzzy_3p        complement(130.140..2780)
FT   fuzzy_both      123.130..789.796

Thankfully, I have some Perl classes to deal with these and I'm going
to ignore them.

The < and > fuzziness is more important for us because it signifies,
e.g., that there is more of the feature on an adjacent cosmid, or
perhaps just 'beware incomplete CDS'. We sometimes use this to
reconstitute bacterial genes across cosmid overlaps.

Support for these would be great.
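
As a rough illustration of what we need kept, here is a small sketch
that parses a single range such as <1..200 or complement(<130..>2780)
and retains the < and > markers instead of dropping them. The class
names PartialRange and PartialRangeParser are hypothetical, and the
sketch deliberately handles only simple start..end ranges (no join(),
no single positions, no 130.140 style positions).

```java
// Hypothetical value class: a range plus the flags a parser must not lose.
final class PartialRange {
    final int min;
    final int max;
    final boolean minPartial;  // true if the start was written as <n
    final boolean maxPartial;  // true if the end was written as >n
    final boolean complement;  // true if wrapped in complement(...)

    PartialRange(int min, int max, boolean minPartial,
                 boolean maxPartial, boolean complement) {
        this.min = min;
        this.max = max;
        this.minPartial = minPartial;
        this.maxPartial = maxPartial;
        this.complement = complement;
    }
}

// Hypothetical parser for simple ranges only.
final class PartialRangeParser {
    // Optional '<', start, '..', optional '>', end.
    private static final java.util.regex.Pattern RANGE =
        java.util.regex.Pattern.compile("(<?)(\\d+)\\.\\.(>?)(\\d+)");
    private static final java.util.regex.Pattern COMPLEMENT =
        java.util.regex.Pattern.compile("complement\\((.+)\\)");

    static PartialRange parse(String text) {
        String s = text.trim();
        boolean complement = false;

        // Unwrap complement(...) first, remembering that we saw it.
        java.util.regex.Matcher c = COMPLEMENT.matcher(s);
        if (c.matches()) {
            complement = true;
            s = c.group(1);
        }

        java.util.regex.Matcher m = RANGE.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported location: " + text);
        }
        return new PartialRange(
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(4)),
            !m.group(1).isEmpty(),
            !m.group(3).isEmpty(),
            complement);
    }
}
```

For example, PartialRangeParser.parse("complement(<130..>2780)") yields
a range of 130 to 2780 with both ends marked partial and the complement
flag set, which is the information we use when stitching a gene back
together across a cosmid overlap.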

Keith

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA