Bioperl: FTHelper changes

James Gilbert jgrg@sanger.ac.uk
Thu, 11 May 2000 12:21:05 +0100 (BST)



Hilmar,

Thanks for all your hard work.

> > Hilmar,
> > 
> > Yes, please can you commit your changes to
> > FTHelper.pm
> > 
> > I'm sure I saw them at some point, but I can't
> > find them anywhere.
> > 
> 
> Done. Apparently you or someone else propagated fixes from the branch to the
> main trunk, thereby invalidating some of my fixes and re-introducing bugs.
> I've corrected those again (like reading 1 line too much etc). In addition,
> there was now a bug in Species.pm (subspecies(): no underscore). I've also
> added an implementation to the SwissProt parsing method, but not to the one
> that writes.

Yes, this was me.  I'm sorry that I reintroduced
the bug, and thanks for spotting the sub_species
bug.

> The files I changes comprise FTHelper.pm, genbank.pm, embl.pm, swiss.pm,
> Species.pm, t/Species.t, and t/SeqIO.t. All done in the main trunk, so I leave
> it up to you to decide how to propagate these into the branch (copying should
> be safe).

I've been looking over your new code, and have a
few queries:

In Bio::SeqIO::FTHelper::_generic_seqfeature if
_parse_loc fails when making sub_SeqFeatures, we
are just skipping making a sub_SeqFeature, but I
think we should fail to add the parent SeqFeature
(and warn), because the SeqFeature may now be
invalid (missing an exon or something).


>From Bio::SeqIO::FTHelper::_parse_loc

    if($loc =~ /^\s*(\w+[A-Za-z])?\(?\<?(\d+)[ \W]{1,3}\>?(\d+)[,;\" ]*([A-Za-z]\w*)?\"?\)?\s*$/) {
	#print "1 = \"$1\", 2 = \"$2\", 3 = \"$3\", 4 = \"$4\"\n";
	$fea_type = $1 if $1;
	$start = $2;
	$end   = $3;
	$tagval = $4 if $4;
	$sf->start($start);
	$sf->end($end);

I guess this is for catching features like "allele
- replace".  I like the idea of having a separate
_parse_loc subroutine, but I think we're doing too
much work in it.  I'd like to explictly catch
"join", "complement", and "replace" in
_generic_seqfeature, and have _parse_loc just make
a Bio::SeqFeature::Generic.  This should make the
code more explicit, and avoid us having to write
catch-all pattern matches.

    # now that we've returned the extra location information to the outbound file
    # remove the extra tags

    $sub->remove_tag('_part_feature');

We shouldn't do this.  Someone may want to print
their Bio::Seq to EMBL and then to GenBank, and
will get different features in the two files.

Finally, we don't (as far as I can see) explicitly
trap locations which we can't yet model such as:

23.45   Fuzzy location (a single base somewhere
        between 23 and 45 inclusive)

J00194:(100..202),1..245,300..422
        Part of this feature is bases 100..202 of
        entry with accession number J00194.

In both cases I think we should fail to return a
SeqFeature, and issue a warning.

I'll go ahead and make these changes myself.

	Thanks,  James

James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge                        Tel: 01223 494906
CB10 1SA                         Fax: 01223 494919


=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================