[EMBOSS] ABI to FASTQ with seqret

Tom Keller kellert at ohsu.edu
Thu Jul 22 16:33:43 UTC 2010


Greetings,
The latest versions of the ABI basecaller do indeed give quality scores.
Nicola Vitacolonna wrote a Perl module that accesses the metadata encoded in the ab1 files.

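use strict;
use warnings;
use Bio::Trace::ABIF;

my $abif = Bio::Trace::ABIF->new();
$abif->open_abif('/Path/to/my/file.ab1') or die "Cannot open ab1 file\n";
my $sequence = $abif->sequence();
my @quality_values = $abif->quality_values();
# A FASTQ record starts with '@' followed by the read name
print '@', $abif->sample_name(), "\n";
print $sequence, "\n";
print "+\n";
# fastq-sanger stores each Phred score as the single character chr(score + 33)
print join("", map { chr($_ + 33) } @quality_values), "\n";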

This will generate output in fastq-sanger format.
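
For anyone who wants to check the record layout without installing the module, here is a minimal Python sketch of the same four-line fastq-sanger layout. It does not read ab1 files and is not part of Bio::Trace::ABIF; the function name fastq_record and the example read are made up for illustration, and the 0-93 range check assumes the usual printable-ASCII limit of the Sanger variant.

```python
def fastq_record(name, sequence, phred_scores):
    """Return one fastq-sanger record as a four-line string.

    Hypothetical helper for illustration only.
    """
    if len(sequence) != len(phred_scores):
        raise ValueError("need exactly one quality score per base")
    for q in phred_scores:
        # chr(q + 33) must stay within printable ASCII ('!' to '~')
        if not 0 <= q <= 93:
            raise ValueError(f"Phred score {q} is outside the range 0-93")
    # Each score becomes one character: Phred value plus an offset of 33
    quality = "".join(chr(q + 33) for q in phred_scores)
    return f"@{name}\n{sequence}\n+\n{quality}\n"


# Made-up read: four bases with four Phred scores
print(fastq_record("example_read", "ACGT", [40, 30, 20, 10]), end="")
```

The point of the sketch is only that the quality line must be the same length as the sequence line, with one character per base rather than a list of numbers.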

regards,
Tom

Thomas (Tom) Keller, PhD
kellert at ohsu.edu
503.494.2442
6339b R Jones Hall (BSc/CROET)
http://www.ohsu.edu/xd/research/research-cores/dna-analysis/




On Jul 22, 2010, at 7:42 AM, Chevreux, Bastien wrote:

AFAIK ab1 files do not have phred quality scores included. At least they did not a couple of years ago.

You need to mangle them through a basecaller (TraceTuner, phred, others) to get these scores.

B.

--
DSM Nutritional Products AG
R&D Human Nutrition & Health
Bioinformatics - Bldg. 203.4 / 188
P.O. Box 2676
CH-4002 Basel / Switzerland
Tel. +41 61 815 8264


-----Original Message-----
From: emboss-bounces at lists.open-bio.org On Behalf Of Peter
Sent: Thursday, 22 July 2010 15:14
To: Peter Rice
Cc: emboss at lists.open-bio.org
Subject: Re: [EMBOSS] ABI to FASTQ with seqret

On Thu, Jul 22, 2010 at 1:28 PM, Peter Rice <pmr at ebi.ac.uk> wrote:

On 22/07/10 12:22, Peter C. wrote:

I truncated this for brevity. Here the quality string repeats ASCII 34,
ASCII 33 (PHRED quality 1, quality 0) which is rather strange. The sequence
appears to agree with the provided file pGEM_(ABI)_A01.seq

Have I just been unlucky with the AB1 files that I have looked at? Thus
far all the quality scores seem meaningless.

There are two sets of quality scores in that file. Both are the
alternating characters 1 and 0. Adding 33 gives the scores you see.

Looks as though EMBOSS is just reporting what it finds.

The file offset is the value returned by function
ajSeqABIGetConfidOffset. It simply reads one byte from there for each
base of sequence length.
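
The behaviour described above (one byte per base starting at a file offset, then adding 33) can be sketched in Python as follows. This is not the EMBOSS code: the helper names read_confidences and to_sanger_quality, and the six-byte stand-in header, are assumptions made up for the example.

```python
def read_confidences(data, offset, n_bases):
    """Read one confidence byte per base, starting at the given offset.

    Hypothetical helper; 'data' is the whole file as a bytes object.
    """
    if offset < 0 or n_bases < 0:
        raise ValueError("offset and n_bases must be non-negative")
    chunk = data[offset:offset + n_bases]
    if len(chunk) != n_bases:
        raise ValueError("file is too short for the requested number of bases")
    # Iterating over bytes yields integers in the range 0-255
    return list(chunk)


def to_sanger_quality(scores):
    """Encode scores as fastq-sanger characters by adding 33 to each."""
    return "".join(chr(q + 33) for q in scores)


# Stand-in for a file whose confidence block alternates 1 and 0
fake_file = b"HEADER" + bytes([1, 0, 1, 0, 1, 0])
scores = read_confidences(fake_file, offset=6, n_bases=6)
print(scores)
print(to_sanger_quality(scores))
```

With stored values alternating 1 and 0, the encoded string alternates ASCII 34 and ASCII 33, which matches the odd quality string reported for the pGEM file.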

Looks like that particular random example from the internet was just odd.

I went back through my old emails, and see you had been testing with

http://www.appliedbiosystems.com/support/software_community/ab1_files.zip
(I had trouble downloading this with curl - Firefox worked). Looking at
these ABI files with seqret as FASTQ does seem to give meaningful quality
scores. Curious.

It should look for a PCON tag in the file and pick up the second of two,
or the first if there is only one.
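
A rough Python sketch of that selection rule, for illustration only. It is not the EMBOSS implementation: the function name pick_pcon and the (tag_name, tag_number, data) tuple layout are hypothetical, and it assumes "second" means the PCON entry with the higher tag number.

```python
def pick_pcon(entries):
    """Pick the PCON data to use from a list of directory entries.

    entries: list of (tag_name, tag_number, data) tuples. This layout is
    a made-up stand-in for a parsed tag directory.
    Returns the data of the second PCON entry if there are two or more,
    the first if there is only one, or None if there is no PCON entry.
    """
    pcon = sorted(
        (e for e in entries if e[0] == "PCON"),
        key=lambda e: e[1],  # order by tag number (assumption)
    )
    if not pcon:
        return None
    chosen = pcon[1] if len(pcon) >= 2 else pcon[0]
    return chosen[2]


# Made-up directory with two PCON entries
entries = [
    ("DATA", 1, b""),
    ("PCON", 1, bytes([1, 0, 1, 0])),
    ("PCON", 2, bytes([20, 30, 40, 35])),
]
print(list(pick_pcon(entries)))
```

If both PCON entries hold the same values, as reported for the pGEM file, the choice between them makes no difference to the output.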

Can anyone on the list enlighten us further on what is intended for the
quality scores in these example files?

The pGEM example I have no idea - I just found it with Google.

I can send you a couple of our locally produced AB1 files off list
if you wouldn't mind having a look at them. It may be that however
these are being generated there simply are no useful scores inside.

Peter
_______________________________________________
EMBOSS mailing list
EMBOSS at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
