[BioRuby] Proposal: Bio::FastaFormat#each_entry

MISHIMA, Hiroyuki missy at be.to
Fri Jan 29 11:24:15 UTC 2010


Hi, Naohisa GOTO,

Thank you so much for the detailed explanation and the sample code. It was
a big help for me in understanding BioRuby's overall design.

Although I used here-documents in my code, all I actually wanted to do was
make a FASTQ file from a regular FASTA file and its FASTA.QUAL file.

I tried your code using my relatively large input files. It was much
faster than my code.

The final code is simply the following:
----
require 'bio'

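# ARGV[0] is the FASTA file; the quality file is expected at ARGV[0] + ".qual"
ff_fasta = Bio::FlatFile.open(ARGV[0])
ff_qual = Bio::FlatFile.open(ARGV[0]+".qual")

# Read both files in step, assuming the entries are in the same order
while entry_fasta = ff_fasta.next_entry
   seq = entry_fasta.to_biosequence
   seq.quality_score_type = :phred
   seq.quality_scores = ff_qual.next_entry.data
   puts seq.output(:fastq, :title => entry_fasta.definition)
end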
----
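
For readers without BioRuby at hand, here is a rough stdlib-only sketch of
what this conversion amounts to. It is not how BioRuby implements it:
read_records and to_fastq are hypothetical helper names, and the sketch
assumes Sanger-style FASTQ (Phred score + 33 as an ASCII character) and that
both inputs list their records in the same order.

```ruby
# Stdlib-only sketch. read_records and to_fastq are hypothetical helpers,
# not part of BioRuby.

# Split FASTA-style text into [definition, body_lines] pairs.
def read_records(text)
  records = []
  text.each_line do |line|
    line = line.chomp
    if line.start_with?('>')
      records << [line[1..-1], []]
    elsif !line.empty? && !records.empty?
      records.last[1] << line
    end
  end
  records
end

# Pair each FASTA record with the QUAL record at the same position and
# encode each Phred score as one ASCII character (assumed offset: 33).
def to_fastq(fasta_text, qual_text)
  seqs  = read_records(fasta_text)
  quals = read_records(qual_text)
  raise ArgumentError, 'record counts differ' unless seqs.size == quals.size

  seqs.zip(quals).map do |(title, seq_lines), (_, qual_lines)|
    seq    = seq_lines.join
    scores = qual_lines.join(' ').split.map { |s| Integer(s, 10) }
    unless seq.length == scores.length
      raise ArgumentError, "length mismatch in #{title}"
    end

    quality = scores.map { |q| (q + 33).chr }.join
    "@#{title}\n#{seq}\n+\n#{quality}\n"
  end.join
end
```

Unlike the BioRuby version above, this sketch reads both inputs fully into
memory, so it is meant for illustration rather than for large files.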

Hiro.

Naohisa GOTO wrote (2010/01/29 19:25):
> Hi,
>
> On Fri, 29 Jan 2010 15:46:15 +0900
> "MISHIMA, Hiroyuki"<missy at be.to>  wrote:
>
>> Hi all,
>>
>> How about implementing the following methods?
>>
>> 	Bio::FastaFormat#each_entry
>> 	Bio::FastaNumericFormat#each_entry
>>
>> The following is sample code to generate a FASTQ string from a FASTA
>> string and a FASTA.QUAL string. This sample may need Ruby 1.8.7 or later.
>>
>> I am afraid that simpler or easier ways already exist in BioRuby...
>
> I think mixing a single-entry parser with a multiple-entry iterator
> would cause confusion, and is not a good approach.
>
> For most parser classes in BioRuby, the expected data source is a
> String containing a single entry. In addition, for an IO with
> possibly multiple entries, Bio::FlatFile is the front-end that
> detects the data type, splits the input into entries, and calls the
> assigned parser class.
>
> For a String containing multiple entries, wrapping it in StringIO and
> then using Bio::FlatFile is the easiest way, although it is indirect.
> Recently, many efficient memory-mapped data transfer methods have
> become available, e.g. memcached, IPC shared memory, and the mmap(2)
> system call. I'm now thinking about how to handle such data efficiently.

-- 
MISHIMA, Hiroyuki, DDS, Ph.D.
COE Research Fellow
Department of Human Genetics
Nagasaki University Graduate School of Biomedical Sciences


