[Biojava-l] Fasta & EMBL feature table parsing
Keith James
kdj@sanger.ac.uk
27 Nov 2000 17:11:19 +0000
>>>>> "Matthew" == Matthew Pocock <mrp@sanger.ac.uk> writes:
>> And an observation:
>>
>> The EMBL flatfile feature table parser (at least, as it was
>> until the new io stuff) would overwrite qualifiers. e.g. where
>> there were several /gene names in a feature, only the last one
>> would be retained. There were also quirks similar to those in
>> earlier Bioperl (like discarding the < and > information in
>> locations, which is important for us to keep). Are these going
>> to be addressed in the io shakeup?
Matthew> The qualifier overwriting should be addressed by the new
Matthew> IO (fingers crossed). Fuzzy locations are evil. I ducked
Matthew> handling this one until somebody required it. You
Matthew> require it, so I guess the days of ducking are over. I am
Matthew> willing to add a new implementation of the Location
Matthew> interface called FuzzyLocation. It will have isMinFuzzy()
Matthew> and isMaxFuzzy() boolean methods, and will decorate
Matthew> another Location for all the other location methods. This
Matthew> way I think we can store everything & lose
Matthew> nothing. Sounds good?
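For concreteness, the decorator described above might look roughly
like the sketch below. The Location interface here is a cut-down
stand-in with hypothetical getMin(), getMax() and contains() methods,
and RangeLocation is a made-up helper; neither is meant to match the
real BioJava interface. Only the isMinFuzzy() and isMaxFuzzy() names
come from the proposal itself.

```java
// Cut-down stand-in for a location interface (hypothetical, not BioJava's API).
interface Location {
    int getMin();
    int getMax();
    boolean contains(int position);
}

// Plain closed range, used here only as something to decorate.
final class RangeLocation implements Location {
    private final int min;
    private final int max;

    RangeLocation(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    public int getMin() { return min; }
    public int getMax() { return max; }
    public boolean contains(int position) {
        return position >= min && position <= max;
    }
}

// Decorator: records which ends are fuzzy and forwards everything else.
final class FuzzyLocation implements Location {
    private final Location delegate;
    private final boolean minFuzzy;
    private final boolean maxFuzzy;

    FuzzyLocation(Location delegate, boolean minFuzzy, boolean maxFuzzy) {
        this.delegate = delegate;
        this.minFuzzy = minFuzzy;
        this.maxFuzzy = maxFuzzy;
    }

    public boolean isMinFuzzy() { return minFuzzy; }
    public boolean isMaxFuzzy() { return maxFuzzy; }

    // All ordinary location queries are answered by the wrapped location.
    public int getMin() { return delegate.getMin(); }
    public int getMax() { return delegate.getMax(); }
    public boolean contains(int position) { return delegate.contains(position); }

    // Writes the range back out with the < and > markers restored.
    @Override
    public String toString() {
        return (minFuzzy ? "<" : "") + getMin() + ".."
             + (maxFuzzy ? ">" : "") + getMax();
    }
}
```

Because the fuzzy flags live in the wrapper, code that only cares
about coordinates can keep treating the object as a Location, while
anything writing the feature table back out can test for
FuzzyLocation and restore the markers.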
I think what we call fuzzy locations are something different, e.g.
FT   fuzzy_3p        complement(130.140..2780)
FT   fuzzy_both      123.130..789.796
Thankfully, I have some Perl classes to deal with these and I'm going
to ignore them.
The < and > fuzziness is more important for us because it signifies,
e.g., that there is more of the feature on an adjacent cosmid, or
perhaps just 'beware incomplete CDS'. We sometimes use this to
reconstitute bacterial genes across cosmid overlaps.
Support for these would be great.
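To make the request concrete, here is a minimal sketch of keeping the
< and > markers when reading a simple range such as <1..>200.
PartialRange and its fields are hypothetical names for illustration,
not existing BioJava classes, and the parser only handles plain
min..max ranges (no complement(), join() or single positions).

```java
// Hypothetical holder for a range plus its < and > markers.
final class PartialRange {
    final int min;
    final int max;
    final boolean minPartial; // true if the start was written as <n
    final boolean maxPartial; // true if the end was written as >n

    PartialRange(int min, int max, boolean minPartial, boolean maxPartial) {
        this.min = min;
        this.max = max;
        this.minPartial = minPartial;
        this.maxPartial = maxPartial;
    }

    // Parses "n..m", "<n..m", "n..>m" or "<n..>m", keeping the markers
    // as flags instead of discarding them.
    static PartialRange parse(String text) {
        int sep = text.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("not a simple range: " + text);
        }
        String left = text.substring(0, sep).trim();
        String right = text.substring(sep + 2).trim();

        boolean minPartial = left.startsWith("<");
        boolean maxPartial = right.startsWith(">");

        int min = Integer.parseInt(minPartial ? left.substring(1) : left);
        int max = Integer.parseInt(maxPartial ? right.substring(1) : right);
        return new PartialRange(min, max, minPartial, maxPartial);
    }
}
```

With the flags retained, a feature whose end is marked > on one
cosmid and whose start is marked < on the next could be picked out as
a candidate for joining across the overlap.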
Keith
--
-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA