[Biopython-dev] [Biopython (old issues only) - Bug #2968] (Closed) Modifications to Emboss eprimer3 parser and associated files

Thu Jul 5 14:56:08 UTC 2018

Issue #2968 has been updated by Peter Cock.

Description updated
Status changed from New to Closed
% Done changed from 0 to 100

I'm going to mark this as closed - the original git commit has gone, and it seems this work had some influence on the main repository already. Thanks!

----------------------------------------
Bug #2968: Modifications to Emboss eprimer3 parser and associated files
https://redmine.open-bio.org/issues/2968#change-15422

* Author: Leighton Pritchard
* Status: Closed
* Priority: Normal
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.52
* URL: 
----------------------------------------
The existing Emboss primer3/eprimer3 code has a couple of issues, and some scope for improvement:

- The existing Primer3.py parser code can only parse output when eprimer3 is applied to a single sequence. When eprimer3 is applied to multiple sequence input, the parser groups all primers for all sequences into a single record, which may associate primers with the wrong sequences in downstream analysis.
- The current parser has no iterator for stepping through multiple sequence output.
- The current parser creates 'ghost' primers for all primer pairs, with length zero and sequence as an empty string; it does not do this for internal oligos. A more intuitive solution might be to return None for absent primers/oligos.
- The current data model stores all primer data as individual attributes. It might be more useful to group the attributes of individual primers into their natural associations.
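
To illustrate the multiple sequence point, here is a minimal sketch of splitting eprimer3 output into one block per input sequence. It is not the code from the commit; the function name iter_record_blocks is hypothetical, and it assumes each sequence's results begin with a "# EPRIMER3 RESULTS FOR <name>" header line.

```python
def iter_record_blocks(lines, header="# EPRIMER3 RESULTS FOR"):
    """Yield (sequence_name, block_lines) for each per-sequence block.

    Hypothetical helper: assumes every sequence's results start with a
    header line beginning with `header`, followed by the sequence name.
    """
    name, block = None, []
    for line in lines:
        if line.startswith(header):
            # A new header closes the previous block, if there was one.
            if name is not None:
                yield name, block
            name, block = line[len(header):].strip(), []
        elif name is not None:
            block.append(line.rstrip("\n"))
    # Emit the final block once the input is exhausted.
    if name is not None:
        yield name, block
```

Each yielded block could then be handed to a single-sequence parser, so that primers stay attached to the sequence they were designed for.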

I have written new code for Emboss/Primer3.py that adds iterator/multiple sequence parsing functionality to the parser, and extensively revises the object model for the data.  The Record and Primers objects are retained, but each primer/oligo is now represented by a Primer object that collects the relevant data together.  The Record object has a new attribute that allows the sequence to be recorded directly, rather than having to be parsed from the comments attribute.  The new data model retains the old attribute-based access for compatibility, but adds direct access to the Primer objects (where present) by .forward, .reverse and .oligo attributes, and by keywords.
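
The shape of the revised data model might look roughly like the sketch below. This is an illustration only, not the code from the commit: the field names on Primer, the restriction of keyword access to three keys, and the choice to return None from the legacy-style forward_seq property when the primer is absent are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Primer:
    """Hypothetical container grouping the data for one primer/oligo."""
    start: int
    length: int
    tm: float
    gc: float
    seq: str


@dataclass
class Primers:
    """Hypothetical primer set; absent primers/oligos are None."""
    size: int = 0
    forward: Optional[Primer] = None
    reverse: Optional[Primer] = None
    oligo: Optional[Primer] = None

    def __getitem__(self, key):
        # Keyword-style access, e.g. primers["forward"].
        if key not in ("forward", "reverse", "oligo"):
            raise KeyError(key)
        return getattr(self, key)

    @property
    def forward_seq(self):
        # Legacy-style flat attribute kept alongside the grouped object
        # (assumed here to be None when no forward primer exists).
        return self.forward.seq if self.forward is not None else None
```

With this layout, a check such as `if primers.reverse is None` replaces testing for a zero-length 'ghost' primer.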

One change was required to the unit test, to account for the reporting of absent primers as None, rather than having 'null' attributes. I've added two further test output files, which may be rather large for the distribution (60 kB total), and doctests that use these.

The code can be inspected at my GitHub repository:

http://github.com/widdowquinn/biopython/commit/b4701079ced297d7af5aa75b46738280c8783fe0

This enhancement request also relates to bug 2966.
