[emboss-dev] EMBOSS 6.3.0 released - SAM/BAM
Peter Rice
pmr at ebi.ac.uk
Thu Jul 15 11:12:02 UTC 2010
On 15/07/2010 12:01, Peter C. wrote:
> Congratulations on the latest release.
>
>> Some highlights include:
>>
>> ...
>> Support for BAM/SAM files
>> ...
>>
>
> Cool. I should take a look at this before (if) merging SAM/BAM
> support into Biopython. The use case I had in mind was for
> conversion to FASTQ (discarding any alignment information).
>
> What do you do about naming for paired reads? I was appending
> /1 or /2 to match the Illumina convention. Doing nothing means
> the paired reads will have the same names.
Not addressed yet - let's look into a common approach though.
We would also have to look into what the '/' character does to EMBOSS's
handling of sequence names.
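As a starting point for comparing notes, here is a minimal Python sketch of the /1 and /2 suffix approach described in the question. It assumes the pairing is taken from the SAM FLAG field (bit 0x40 for the first read of a pair, bit 0x80 for the second). The function name fastq_name is hypothetical and is not part of EMBOSS or Biopython.

```python
# Hypothetical helper for illustration only; not an EMBOSS or Biopython API.
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first read of the pair
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: second read of the pair


def fastq_name(qname: str, flag: int) -> str:
    """Return the read name with an Illumina-style /1 or /2 suffix."""
    if flag & FIRST_IN_PAIR:
        return qname + "/1"
    if flag & SECOND_IN_PAIR:
        return qname + "/2"
    # Neither pairing bit set: leave the name unchanged.
    return qname


if __name__ == "__main__":
    print(fastq_name("read42", 99))   # FLAG 99 has bit 0x40 set
    print(fastq_name("read42", 147))  # FLAG 147 has bit 0x80 set
```

Without a suffix of this kind, both mates of a pair come out with the same name. Whether '/' is safe in an EMBOSS sequence name is the open question noted above.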
> What do you do about the strand issue? SAM/BAM stores reads
> which map onto the reverse strand in reverse complement. If
> you want to get back to the original orientation for output as
> FASTQ you must apply the reverse complement (plus reverse
> the quality scores too of course).
So far we read as sequences. Reading as mapped reads (very large
alignments) is planned for the very near future so it can appear in the
next release.
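To make the orientation point in the question concrete, here is a minimal Python sketch, assuming bit 0x10 of the SAM FLAG field marks a read aligned to the reverse strand. The function name original_orientation is hypothetical, and the complement table covers only A, C, G, T and N.

```python
# Hypothetical helper for illustration only; not an EMBOSS or Biopython API.
REVERSE_STRAND = 0x10  # SAM FLAG bit: read is stored reverse complemented

# Simplified complement table (assumption: no IUPAC ambiguity codes besides N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_orientation(seq: str, qual: str, flag: int) -> tuple:
    """Return (seq, qual) as originally sequenced, ready for FASTQ output."""
    if flag & REVERSE_STRAND:
        # Undo the reverse complement and reverse the qualities to match.
        return seq.translate(_COMPLEMENT)[::-1], qual[::-1]
    return seq, qual


if __name__ == "__main__":
    print(original_orientation("AACG", "IIH#", 16))
    print(original_orientation("AACG", "IIH#", 0))
```

The quality string is only reversed, never complemented, so each score stays attached to its base.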
> Do you support writing SAM/BAM files? If so, would this be
> for aligned reads or unaligned reads only?
Yes we do write them - so far unaligned but we will add aligned reads when
we can treat that as an input type.
> Assuming you do write BAM files, do you support the recent
> convention to use a single BGZF block, and that where possible
> reads should not span a BGZF block boundary?
We looked at samtools 0.1.7 to get things working. We still need to look at
issues such as using the index for access to remote BAM files, and various
flavours of blocks. I was not aware of the single block version. Again, we
should compare notes.
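One way to picture the "reads should not span a block boundary" convention from the question is the Python sketch below. It only groups already-encoded records into per-block payloads and does no BGZF compression. The function name group_into_blocks and the 0xFF00 payload cap are assumptions for illustration; the only firm constraint is that a BGZF block holds at most 64 KiB.

```python
# Hypothetical helper for illustration only; it does not write BGZF itself.
from typing import Iterable, Iterator

MAX_PAYLOAD = 0xFF00  # assumed cap on uncompressed bytes per block


def group_into_blocks(records: Iterable[bytes],
                      max_payload: int = MAX_PAYLOAD) -> Iterator[bytes]:
    """Yield payloads, one per block, without splitting any record."""
    buffer = bytearray()
    for record in records:
        # Start a new block if this record would not fit in the current one.
        if buffer and len(buffer) + len(record) > max_payload:
            yield bytes(buffer)
            buffer.clear()
        # A record longer than max_payload is emitted alone and oversized;
        # this sketch leaves splitting such a record to the caller.
        buffer += record
    if buffer:
        yield bytes(buffer)


if __name__ == "__main__":
    demo = [b"a" * 4, b"b" * 4, b"c" * 4]
    print(list(group_into_blocks(demo, max_payload=8)))
```

Each yielded payload would then be compressed into its own block, so a reader seeking to the start of a block always lands on a record boundary.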
> (I'm assuming some of the EMBOSS team must be on the
> samtools-devel mailing list which is where most SAM/BAM
> format discussion seems to take place)
Actually no, but I will join it ASAP and catch up.
regards,
Peter Rice