thoughts/questions on Seq.pm and Parse.pm
Georg Fuellen
fuellen@dali.Mathematik.Uni-Bielefeld.DE
Mon, 17 Mar 1997 18:36:34 +0000 (GMT)
Hi Chris,
> It's getting tough keeping track of the 'open' issues that should be
> resolved, I've tried to distill most of them into the "ToDo" file with the
> Seq.pm distribution. I'm limited both by time and by programming skill
> (working on Seq.pm has been like a trial by fire-- learning by doing) which
> limits the amount of 'real' contributions I can make.
>
> I've got some questions/comments about several of the issues so here goes:
>
> Parse.pm
> --------
> I wrote 2 basic methods that were necessary to get things in Seq.pm working
> without thinking much about the overall interface scheme. Any
> guidance/code/observations on method names, interface or implementation
> would be appreciated.
I really don't feel qualified to say much about this; I hope Steve can
mail soon!
Do you see a way to integrate convert_from_raw() into convert()?
Should the case where we want to convert but don't know the format yet
be handled by Parse.pm or by Seq.pm? Do we want to move the whole parsing
system into Parse.pm, and make it an object then? Can UnivAln.pm use
Parse.pm, too?
I'd very much prefer ffmt (file format) to fmt; why did you change this?
I'd also prefer ``-input'' to ``-sequence'', since ReadSeq reads multiple
alignment formats as well.
``-location'' is a little long; how about ``-loc'', which we already
use for $seq->{names}? Or call it ``-file''?
I also note that there's FileHandle::new_tmpfile(), and the Camel
seems to prefer this (p.485).
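A minimal sketch of the new_tmpfile() idea, assuming the goal is to hand raw
sequence text to an external converter through a scratch file. The function
name roundtrip_via_tmpfile() is made up for illustration and is not part of
Parse.pm:

```perl
use strict;
use warnings;
use FileHandle;

# Write text to an anonymous temporary file and read it back.
# The file has no name to clean up; it goes away when the handle is closed.
sub roundtrip_via_tmpfile {
    my ($text) = @_;
    my $fh = FileHandle->new_tmpfile
        or die "cannot create temporary file: $!";
    print {$fh} $text;
    seek($fh, 0, 0) or die "cannot rewind: $!";   # back to the start before reading
    local $/;                                      # slurp mode: read everything at once
    my $copy = <$fh>;
    close $fh;
    return $copy;
}
```

Note that an anonymous temporary file cannot be passed by name to another
program, so this only helps where a filehandle is enough.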
> Seq.pm
> -------
>
> o One major interface change that needs to happen SOON is changing
> Dna_to_Rna(), Rna_to_Dna() and translate() so that they return biosequence
> objects instead of strings. I tried for a little while to do this, using
> the Perl-OO-tutorial as a guide but kept running into problems with
> scoping. I'm also not sure if the "Right Way" involves returning an object
> or a ref. to an object. I don't want to waste any more time doing this if
> the answer involves a tiny piece of code that is immediately obvious to
> someone on this list. So- if someone knows the "Right Way" to do this,
> please let me know!
In UnivAln.pm, there's a method aln() which returns an alignment; I hope
it can serve as a model (Steve?)...
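On "object or ref. to an object": in Perl an object is a blessed reference,
so returning the result of new() covers both. Here is a rough sketch, not the
actual Seq.pm code; the package name MiniSeq, its fields and its accessors are
invented for the example:

```perl
use strict;
use warnings;

package MiniSeq;   # hypothetical stand-in for Seq.pm

sub new {
    my ($class, %args) = @_;
    my $self = { id => $args{-id}, seq => $args{-seq}, type => $args{-type} };
    return bless $self, $class;   # a Perl object is a blessed reference
}

sub seq  { $_[0]{seq} }
sub type { $_[0]{type} }

# Return a new object of the caller's class instead of a bare string.
# The original object is left unchanged.
sub Dna_to_Rna {
    my ($self) = @_;
    (my $rna = $self->{seq}) =~ tr/Tt/Uu/;
    return ref($self)->new(-id => $self->{id}, -seq => $rna, -type => 'Rna');
}

package main;

my $dna = MiniSeq->new(-id => 'x1', -seq => 'ATGTTT', -type => 'Dna');
my $rna = $dna->Dna_to_Rna;   # $rna is itself a MiniSeq object
```

Using ref($self)->new(...) rather than a hard-coded class name means a
subclass gets back objects of its own class.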
> o Non-fatal use of Parse.pm if ReadSeq does not exist or not configured
> I wrapped some code around an eval{} statement in Seq.pm that tries to
> politely figure out if Parse.pm is available -- it checks for the presence
> of an exported "OK" variable in Parse.pm. Is this the right approach?
I'd check for $PARSE::VERSION > release_number_which_you_need.
Maybe there's something better though.
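One possible shape for the version check, as a sketch only. It assumes
Parse.pm sets a $VERSION in its package; the helper name module_available()
and the version number in the usage line are placeholders:

```perl
use strict;
use warnings;

# Return 1 if $module can be loaded and is at least $min_version, else 0.
# Failures are trapped by eval, so nothing dies and nothing is printed.
sub module_available {
    my ($module, $min_version) = @_;
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;   # plain package names only
    (my $file = "$module.pm") =~ s{::}{/}g;
    my $ok = eval {
        require $file;
        # UNIVERSAL::VERSION dies if the loaded module is older than requested
        $module->VERSION($min_version) if defined $min_version;
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical usage in Seq.pm; 'Parse' and 0.01 are placeholders.
my $have_parse = module_available('Parse', 0.01);
```

Seq.pm could then branch on $have_parse and skip the parsing features
quietly when it is false.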
> Seq.pm should be able to use/not-use Parse.pm without any obvious error
> messages.
>
>
> o Site-specific configuration issues.
> Right now, Seq.pm does not have to be edited by users but Parse.pm and the
> test scripts do. I'm going to hit the POD docs for MakeMaker, etc. and try
> to figure out how to set up a system where users edit a ".config" file or
> somesuch and the resulting info is used to automatically tweak Parse.pm and
> Seq.pm during the 'make' process. Again, any help/suggestions on this would
> be appreciated.
Let's postpone this until Steve replies.
I thought MakeMaker was so complex because it can take care of _all_
configuration issues automatically?! ;-)
> o Proposed validity markers
> - A marker that would be set to 'false' whenever Seq.pm makes a call to carp()
> - A marker to specify valid/invalid biosequence object
> Are these permutations of the same idea or two different things? I'm also
> not sure about how to implement.
They are both ways of defining what ``valid'' means.
For me, a valid object conforms to some requirements, for example (for
UnivAln) that $self{type} is correct (especially that it reflects the case
where the alignment is just a sequence bag, i.e. the rows are of different
lengths), that $self{id} has no whitespace, that $self{desc} conforms to
$self{descffmt}, and that $self{row_ids}, etc., have the correct size.
This is something I don't have time for right now, but it's needed eventually.
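To make the "conforms to some requirements" idea concrete, here is a small
sketch that checks two of the requirements named above. It is not UnivAln.pm
code: the function name validity_problems() and the 'seqs' field holding the
rows are assumptions for the example.

```perl
use strict;
use warnings;

# Return a list of problems found in an alignment-like hash.
# An empty list means the object passed every check.
sub validity_problems {
    my ($aln) = @_;
    my @problems;
    push @problems, 'id is missing or contains whitespace'
        if !defined $aln->{id} or $aln->{id} =~ /\s/;
    my $n_rows = @{ $aln->{seqs} || [] };   # 'seqs' is a hypothetical field name
    push @problems, 'row_ids does not match the number of rows'
        if $aln->{row_ids} and @{ $aln->{row_ids} } != $n_rows;
    return @problems;
}
```

Returning the list of problems rather than a bare true/false lets the caller
decide whether to carp(), set a marker, or ignore them.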
>
> o Default constructor ID
> Steve commented that the default constructor ID should be changed from
> "No_Id_Given" to "No_Id" plus a unique number. Assigning a number is easy
> enough but how would you keep track of "unique" numbers assigned? Is there
> a way to save state or remember these numbers each time new() is called? I
> think I see the potential problems that objects with the same 'ID' field
> could cause but I'm unsure how a 'unique' naming process would work.
Same here..
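One way state could be kept between calls to new(), as a sketch: a lexical
counter declared at file scope in the module. The package name CountedSeq is
invented, and the numbers are unique only within one running program, not
across runs.

```perl
use strict;
use warnings;

package CountedSeq;   # hypothetical stand-in for Seq.pm

my $id_counter = 0;   # file-scoped lexical, shared by every call to new()

sub new {
    my ($class, %args) = @_;
    # Use the caller's ID if given, otherwise "No_Id_" plus the next number.
    my $id = defined $args{-id} ? $args{-id} : 'No_Id_' . ++$id_counter;
    return bless { id => $id }, $class;
}

sub id { $_[0]{id} }

package main;

my $first  = CountedSeq->new;
my $second = CountedSeq->new;
my $named  = CountedSeq->new(-id => 'HUMBG');
```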
best wishes,
georg
> o translate() treats ambiguity inconsistently
> Steve mentioned this, but I want to be sure that I understand the problem
> -- it looks like the code deals with "N" unknown bases but does not deal
> with any of the other IUPAC symbols for ambiguity. Is this what you were
> pointing out Steve?
>
>
>
> Sorry for the length!
>
> Regards,
> Chris Dagdigian
> cdagdigian@genetics.com
>
>
>