[Biopython-dev] SeqIO Abi Parser

Fri Jul 29 09:39:20 UTC 2011

On Fri, Jul 29, 2011 at 9:07 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> I made a local branch tracking your seqio-abi tree. I agree with most of the
> changes, but I think I'm a bit lost on the filename part.
> My intention is to use the filename of the Abi file as the ID for the
> SeqRecord, instead of the stored record identifier returned by seqret. The
> reason is that it's easier to see which Abi file a SeqRecord came from by
> looking at the ID (or output file name, in case the SeqRecord is written in
> another format), since the record identifier data is not readily available.
> I chose to store the record identifier in SeqRecord.name (sample_id), so
> users can still cross-check if they want to.
> My 'except' block (AbiIO.py:83) is a bad way to deal with '.name' being
> absent, now that I think of it. But do you think instead of 'None', maybe we
> could use 'file_id = str(handle)' or 'file_id = self.name'?

There may not be a filename - the ABI file might be piped from stdin,
or supplied as a StringIO handle, or a network handle. So using the
filename as the primary identifier seems wrong to me. I would want
the same ID regardless of how the file was loaded, or what the name
was. Using the filename as the SeqRecord name (if available, "" if
not) would be OK with me.
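
To make the fallback concrete, here is a minimal sketch of how a parser
could derive a name from a handle without assuming a real file is behind
it. The helper name_from_handle is hypothetical (it is not in AbiIO.py),
and treating names like "<stdin>" as "no filename" is my own assumption:

```python
import os
from io import BytesIO


def name_from_handle(handle):
    """Return the handle's base filename without extension, or "" if none.

    Hypothetical helper for illustration only; not part of AbiIO.py.
    """
    # StringIO/BytesIO and many network handles have no .name attribute.
    name = getattr(handle, "name", None)
    # Handles opened from a file descriptor report an int as their name.
    if not isinstance(name, str):
        return ""
    # Assumption: pseudo-names such as "<stdin>" are not real filenames.
    if name.startswith("<") and name.endswith(">"):
        return ""
    return os.path.splitext(os.path.basename(name))[0]


class FakeHandle(object):
    """Stand-in for a handle that only carries a .name attribute."""

    def __init__(self, name):
        self.name = name


# An in-memory handle has no filename, so the name falls back to "".
in_memory_name = name_from_handle(BytesIO(b"ABIF"))
# A handle backed by a named file yields the base name without extension.
on_disk_name = name_from_handle(FakeHandle("/data/traces/sample_310.ab1"))
```

With something like this the SeqRecord id would still come from the
identifier stored in the file, and only the name would depend on how the
data was loaded.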

The other justification for using the ID in the file as the SeqRecord's id
is consistency with EMBOSS. We should also check how BioPerl does
it - but I'm not sure if I have all the dependencies installed.

Also, is it possible to concatenate multiple ABI files together?

> And lastly, could you clarify what you mean by the alphabet issue in
> test_SeqIO.py?

Add the three good ABI test files to the list in test_SeqIO.py and
run the test; you'll get a complaint about the alphabet handling.
I haven't had time to look into exactly what is going on yet.

Peter