Bioperl: FTHelper changes
James Gilbert
jgrg@sanger.ac.uk
Thu, 11 May 2000 12:21:05 +0100 (BST)
Hilmar,
Thanks for all your hard work.
> > Hilmar,
> >
> > Yes, please can you commit your changes to
> > FTHelper.pm
> >
> > I'm sure I saw them at some point, but I can't
> > find them anywhere.
> >
>
> Done. Apparently you or someone else propagated fixes from the branch to the
> main trunk, thereby invalidating some of my fixes and re-introducing bugs.
> I've corrected those again (like reading 1 line too much etc). In addition,
> there was now a bug in Species.pm (subspecies(): no underscore). I've also
> added an implementation to the SwissProt parsing method, but not to the one
> that writes.
Yes, this was me. I'm sorry that I reintroduced
the bug, and thanks for spotting the sub_species
bug.
> The files I changes comprise FTHelper.pm, genbank.pm, embl.pm, swiss.pm,
> Species.pm, t/Species.t, and t/SeqIO.t. All done in the main trunk, so I leave
> it up to you to decide how to propagate these into the branch (copying should
> be safe).
I've been looking over your new code, and have a
few queries:
In Bio::SeqIO::FTHelper::_generic_seqfeature if
_parse_loc fails when making sub_SeqFeatures, we
are just skipping making a sub_SeqFeature, but I
think we should fail to add the parent SeqFeature
(and warn), because the SeqFeature may now be
invalid (missing an exon or something).
>From Bio::SeqIO::FTHelper::_parse_loc
if($loc =~ /^\s*(\w+[A-Za-z])?\(?\<?(\d+)[ \W]{1,3}\>?(\d+)[,;\" ]*([A-Za-z]\w*)?\"?\)?\s*$/) {
#print "1 = \"$1\", 2 = \"$2\", 3 = \"$3\", 4 = \"$4\"\n";
$fea_type = $1 if $1;
$start = $2;
$end = $3;
$tagval = $4 if $4;
$sf->start($start);
$sf->end($end);
I guess this is for catching features like "allele
- replace". I like the idea of having a separate
_parse_loc subroutine, but I think we're doing too
much work in it. I'd like to explictly catch
"join", "complement", and "replace" in
_generic_seqfeature, and have _parse_loc just make
a Bio::SeqFeature::Generic. This should make the
code more explicit, and avoid us having to write
catch-all pattern matches.
# now that we've returned the extra location information to the outbound file
# remove the extra tags
$sub->remove_tag('_part_feature');
We shouldn't do this. Someone may want to print
their Bio::Seq to EMBL and then to GenBank, and
will get different features in the two files.
Finally, we don't (as far as I can see) explicitly
trap locations which we can't yet model such as:
23.45 Fuzzy location (a single base somewhere
between 23 and 45 inclusive)
J00194:(100..202),1..245,300..422
Part of this feature is bases 100..202 of
entry with accession number J00194.
In both cases I think we should fail to return a
SeqFeature, and issue a warning.
I'll go ahead and make these changes myself.
Thanks, James
James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge Tel: 01223 494906
CB10 1SA Fax: 01223 494919
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================