[Biopython-dev] Bio.Motif AlignAce parser

Bartek Wilczynski bartek at rezolwenta.eu.org
Mon Aug 13 13:12:35 UTC 2012


Sounds great to me.

Bartek

On Sat, Aug 11, 2012 at 6:25 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi guys,
>
> Looking some more at the parsers in Bio.Motif.
>
> In the Record class in Bio/Motif/Parsers/AlignAce.py, we have an attribute self.current_motif that points to the motif currently being parsed by the parser (or, after the parser finishes, the last motif that was parsed). As far as I can tell, using a temporary variable current_motif within the read() function would be sufficient; we don't need to store it in the record.
>
> I would also suggest that the read() function strip() all lines. Currently the end-of-line markers are kept; for example, the version and the command line are stored as "AlignACE 4.0 05/13/04\n" and "./AlignACE -i test.fa \n", respectively.
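
A minimal sketch of the two suggestions above (a local current_motif and strip() on every line). It uses a toy input layout, not real AlignACE output, and the Record and Motif classes are simplified stand-ins rather than the actual Bio.Motif code. The attribute names version and command are the ones proposed further down in this message.

```python
import io


class Motif:
    """Stand-in motif: a number plus a list of instance strings."""

    def __init__(self, number):
        self.number = number
        self.instances = []


class Record:
    """Stand-in record; note that it has no current_motif attribute."""

    def __init__(self):
        self.version = None
        self.command = None
        self.motifs = []


def read(handle):
    """Parse the toy layout: version line, command line, then motifs."""
    record = Record()
    # strip() removes the end-of-line marker and any trailing spaces.
    record.version = next(handle).strip()
    record.command = next(handle).strip()
    current_motif = None  # local to read(), never stored on the record
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("Motif "):
            current_motif = Motif(int(line.split()[1]))
            record.motifs.append(current_motif)
        elif current_motif is not None:
            current_motif.instances.append(line)
    return record


# Hypothetical input, loosely modelled on the two header lines quoted above.
text = (
    "AlignACE 4.0 05/13/04\n"
    "./AlignACE -i test.fa \n"
    "Motif 1\n"
    "TCTACGATTGAG\n"
    "Motif 2\n"
    "CTGCACCTGGAG\n"
)
record = read(io.StringIO(text))
```

With this layout, the version and command come back without the trailing "\n", and the parser state lives only inside read().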
>
> The version of the AlignACE program is stored in record.ver. The MEME and Mast parsers in Bio.Motif instead use record.version. For consistency I would suggest to use record.version also in the AlignACE parser.
>
> The command line is stored in record.cmd_line. The MEME parser uses record.command instead. I think both are fine, but I would also prefer this to be consistent.
>
> Then there are two attributes, param_dict and seq_dict. The former is a dictionary that stores the parameters used in the run. The latter is not a dictionary but a list of sequence-related information. Since we usually don't put the type of the object in attribute names, I would suggest calling these simply parameters and sequences. For comparison, the Mast parser uses record.sequences for an analogous attribute; MEME uses record.sequence_names. For consistency I would suggest using record.sequences in all three.
>
> This would create some backward-incompatible changes that may confuse users. Currently the parsers are located in Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast. I would prefer Bio.Motif.AlignAce, Bio.Motif.MEME, and Bio.Motif.Mast. Currently, to parse the AlignAce output one would do
> >>> from Bio.Motif.Parsers import AlignAce
> >>> record = AlignAce.read(handle)
> >>> record
> <Bio.Motif.Parsers.AlignAce.Record object at 0x10058c7d0>
> If we move the parsers one level up, this would be
> >>> from Bio.Motif import AlignAce
> >>> record = AlignAce.read(handle)
> >>> record
> <Bio.Motif.AlignAce.Record object at 0x10058c7d0>
> which looks a bit more straightforward to me. In addition, this allows us to put a deprecation warning on the Bio.Motif.Parsers.AlignAce, Bio.Motif.Parsers.MEME, and Bio.Motif.Parsers.Mast modules as a whole, and we won't have to put deprecation warnings on each change separately.
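
A rough sketch of what a single module-wide deprecation warning could look like. The helper name warn_moved is hypothetical, and the standard DeprecationWarning stands in for whichever warning class the project prefers; this is not the code that is in Biopython.

```python
import warnings


def warn_moved(old_name, new_name):
    """Emit one deprecation warning for a module that has moved."""
    warnings.warn(
        f"{old_name} is deprecated; please use {new_name} instead.",
        DeprecationWarning,
        stacklevel=2,
    )


# Placed at the top of the old module, one call like this would cover
# every renamed attribute at once:
#     warn_moved("Bio.Motif.Parsers.AlignAce", "Bio.Motif.AlignAce")

# Demonstration: capture the warning instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_moved("Bio.Motif.Parsers.AlignAce", "Bio.Motif.AlignAce")

message = str(caught[0].message)
```

Users importing from the old location would then see the message once per import, while code using the new location would see nothing.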
>
> Any comments, objections?
>
> Best,
> -Michiel.



-- 
Bartek Wilczynski



