[emboss-dev] EMBOSS 6.4.0 using EMBOSS_001 as the ID in ABI files

Peter Cock p.j.a.cock at googlemail.com
Tue Aug 2 18:01:54 UTC 2011


Hi EMBOSS folk,

I'm reporting a regression in EMBOSS 6.4.0 spotted by Wibowo Arindrarto
who has been adding ABI support to Biopython.

With EMBOSS 6.3.1 compiled from source on Mac (as an example),

$ seqret -osformat="fastq-sanger" -filter 310.ab1
@D11F
TGATNTTNACNNTTTTGAANCANTGAGTTAATAGCAATNCTTTACNAATAAGAATATACACTTTCTGCTTAGGGATGATAATTGGCAGGCAAGTGAATCCCTGAGCGTGNATTTGATAATGACCTAAATAATGGATGGGGTTTTAATTCCCAGACCTTCCCCTTTTTAANNGGNGGATTANTGGGGGNNNAACNNGGGGGGCCCTTNCCNAAGGGGGAAAAAATTTNAAACCCCCCNAGGNNGGGNAAAAAAAAATTTCCAAATTNCCGGGGTNNCCCCCAANTTTTTNCCGCNGGGAAAANNNNCCCCCCCNGGGNCCCCCCCCNNAAAAAAAAAAAAAAAAACCCCCCCCCCNTTGGGGNGGTNTNCNCCCCCNNANAANNGGGGGNNAAAAAAAAAGGCCCCCCCCAAAAAAAACCCNCNTTCTNNCNNNNNGNNCNGNNCCCCCNNCCNTNTNGGGGGGGGGGGNGGAAAAAAAACCCCTTTNTGNNNANANNAACCCNCTCNTNTTTTTTTTTTTANGNNNNCNNNNCAAAAAAAAANCNCCCCCNNCNNNCNNNCNCCCCNNNNTNAAAANANNAANNNNTTTTTTTNGGGGGGGTGNGCGNCCCNNANCNNNNNNNNGCGNGGNCNCCNNCCCNCNANAAANNNTNTTTTTTTTTTTTTTTNTNNTCNNCCCNNNCCCCNNCCCCCCCCCCCCCNCCNCNNNNNGGGGNNNCGGNNCNNNNNNNCCNTNCTNNANATNCCNTTNNNNNNNNGNNNNNNNNACNNNNNTNNTNNNCNNNNNNNNNNNNNNCNNNNNNCNNCCCNNCANNNNNNNCNNNNNNNNNNNNNNNNNNNNNTCNCTNCNCNCCCCNCCCNNNNNNNG
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


With EMBOSS 6.4.0 compiled from source on 64 bit Linux, rather
than the expected ID from within the file we get EMBOSS_001,

$ seqret -osformat="fastq-sanger" -filter 310.ab1
@EMBOSS_001
TGATNTTNACNNTTTTGAANCANTGAGTTAATAGCAATNCTTTACNAATAAGAATATACACTTTCTGCTTAGGGATGATAATTGGCAGGCAAGTGAATCCCTGAGCGTGNATTTGATAATGACCTAAATAATGGATGGGGTTTTAATTCCCAGACCTTCCCCTTTTTAANNGGNGGATTANTGGGGGNNNAACNNGGGGGGCCCTTNCCNAAGGGGGAAAAAATTTNAAACCCCCCNAGGNNGGGNAAAAAAAAATTTCCAAATTNCCGGGGTNNCCCCCAANTTTTTNCCGCNGGGAAAANNNNCCCCCCCNGGGNCCCCCCCCNNAAAAAAAAAAAAAAAAACCCCCCCCCCNTTGGGGNGGTNTNCNCCCCCNNANAANNGGGGGNNAAAAAAAAAGGCCCCCCCCAAAAAAAACCCNCNTTCTNNCNNNNNGNNCNGNNCCCCCNNCCNTNTNGGGGGGGGGGGNGGAAAAAAAACCCCTTTNTGNNNANANNAACCCNCTCNTNTTTTTTTTTTTANGNNNNCNNNNCAAAAAAAAANCNCCCCCNNCNNNCNNNCNCCCCNNNNTNAAAANANNAANNNNTTTTTTTNGGGGGGGTGNGCGNCCCNNANCNNNNNNNNGCGNGGNCNCCNNCCCNCNANAAANNNTNTTTTTTTTTTTTTTTNTNNTCNNCCCNNNCCCCNNCCCCCCCCCCCCCNCCNCNNNNNGGGGNNNCGGNNCNNNNNNNCCNTNCTNNANATNCCNTTNNNNNNNNGNNNNNNNNACNNNNNTNNTNNNCNNNNNNNNNNNNNNCNNNNNNCNNCCCNNCANNNNNNNCNNNNNNNNNNNNNNNNNNNNNTCNCTNCNCNCCCCNCCCNNNNNNNG
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
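
To check a given build for this behaviour, the ID can be pulled from the first title line of the seqret output and compared with the expected sample name. The following is only a minimal Python sketch: the helper name first_fastq_id is hypothetical (not part of EMBOSS or Biopython), and it assumes the FASTQ text has already been captured as a string.

```python
def first_fastq_id(fastq_text):
    """Return the identifier from the first FASTQ title line.

    Hypothetical helper for illustration only. The identifier is taken
    to be the text after '@' up to the first whitespace.
    """
    lines = fastq_text.splitlines()
    if not lines or not lines[0].startswith("@"):
        raise ValueError("Expected a FASTQ title line starting with '@'")
    fields = lines[0][1:].split()
    # An empty title line ("@" on its own) yields an empty identifier.
    return fields[0] if fields else ""


def id_matches(fastq_text, expected_id):
    """True if the first record's ID equals the expected sample name."""
    return first_fastq_id(fastq_text) == expected_id
```

With the two outputs shown above, id_matches(output, "D11F") would hold for the 6.3.1 output and fail for the 6.4.0 output, where the ID is EMBOSS_001.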

Regards,

Peter Cock

---------- Forwarded message ----------
From: Wibowo Arindrarto <w.arindrarto at gmail.com>
Date: Sat, Jul 30, 2011 at 8:42 AM
Subject: Re: [Biopython-dev] SeqIO Abi Parser
To: Peter Cock <p.j.a.cock at googlemail.com>
Cc: biopython-dev at lists.open-bio.org


Hi Peter,
I've done some more improvements to the code:
- I've written the check and unit test for the file handle mode. An
ABI file now has to be opened in 'rb' mode; otherwise the parser raises
an error. While opening in 'r' mode works under Python 2 on Linux, the
mode has to be 'rb' on Windows and/or Python 3 for the file to be read
correctly, so I decided that requiring 'rb' is best. Because of this,
I changed 'test_SeqIO.py:503' to include the mode argument when
opening.
- I've also checked against test_Emboss.py for seqret output, after
including the abi format in it. My EMBOSS version is 6.4.0. There was
a slight problem with this testing: for some reason the ID returned by
seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS
installation, since when I previously tested against 6.1.0 the ID was
correct (although the quality values were not, so I had to upgrade).
As expected, if I comment out the code that tests the sequence ID
('test_Emboss.py:168-172') the tests pass. Maybe you could try testing
it as well and see if EMBOSS also returns the default ID instead of
the sample name?
- Finally, I made some small cosmetic changes to the code (typos, etc).
All changes have been pushed to my GitHub fork. I still have time over
the weekend to improve whatever needs to be improved :).
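
The binary-mode check described in the first item could look roughly like the sketch below. This is not the code from the fork: the function name require_binary_handle is hypothetical, and the approach (a zero-length read, then inspecting the returned type) is an assumption that only distinguishes the two modes on Python 3.

```python
import io


def require_binary_handle(handle):
    """Raise ValueError if the handle appears to be in text mode.

    Hypothetical helper for illustration only. A zero-length read
    consumes no data: on Python 3 a binary handle returns bytes,
    while a text handle returns str.
    """
    if isinstance(handle.read(0), str):
        raise ValueError(
            "ABI files must be opened in binary mode, e.g. open(path, 'rb')"
        )
    return handle
```

A parser would call this on the handle before reading any data, so that a file opened with 'r' fails with a clear message rather than yielding corrupted values.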
Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id


On Fri, Jul 29, 2011 at 18:20, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Hi again,
>
> I had a bit of time this afternoon so I looked at this.
>
> On Fri, Jul 29, 2011 at 1:14 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > On Fri, Jul 29, 2011 at 12:34 PM, Wibowo Arindrarto wrote:
> >> Hi Peter,
> >> Thanks for explaining. I understand why we should stick to the stored
> >> sequence id. In this case, we can use the filename as SeqRecord.name as
> >> well. Regarding BioPerl, I don't have it installed myself -- but I took a
> >> quick look at their source and it seems they also use the stored sequence ID
> >> as their main identifier instead of the filename. If the stored sequence ID
> >> is not present, it's "(unknown)" in their case.
> >
> > OK good, that means Biopython, BioPerl and EMBOSS should be
> > consistent :)
>
> I've made that switch,
>
> >> I'll look at test_SeqIO.py over the weekend. I think it'll have
> >> something to do with some ambiguous DNA bases stored in the ABI files.
> >> Regards,
> >
> > Some of the alphabet stuff is a bit nasty - so please feel free to ask
> > or get me to help.
>
> I've done enough to get the test_SeqIO.py unit test to pass.
>
> We probably need a check (like in SFF) to verify the user hasn't given
> a handle opened in text mode. That should probably have a unit test
> too.
>
> I still haven't cross checked the sequence and PHRED scores from
> your code and EMBOSS.
>
> Anyway - I'll leave the code for you to work on for now...
>
> Peter



