[BioRuby] fastq files reading

Naohisa Goto ngoto at gen-info.osaka-u.ac.jp
Sun May 30 14:31:55 UTC 2010


Hi,

The external iterator can be used with Ruby 1.8.7 or later.
(It can't be used with Ruby 1.8.6 or earlier.)
In addition, it consumes a lot of resources and is inefficient in the
current Ruby implementation. (In the future, it will be optimized.)
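
For reference, here is a minimal sketch of how an external iterator
behaves, using only the Ruby standard library rather than BioRuby.
The helper name paired_lines and the sample data are made up for
illustration. Enumerator#next raises StopIteration when nothing is
left, and Kernel#loop rescues that exception, so the loop simply ends.

```ruby
require 'stringio'

# Hypothetical helper: pair up lines from two IO objects by calling
# Enumerator#next on each of them in turn.
def paired_lines(io_a, io_b)
  enum_a = io_a.each_line   # without a block, each_line returns an Enumerator
  enum_b = io_b.each_line
  pairs = []
  loop do
    # Either call to next may raise StopIteration; Kernel#loop rescues it
    # and the loop stops, so pairing ends with the shorter input.
    pairs << [enum_a.next.chomp, enum_b.next.chomp]
  end
  pairs
end

a = StringIO.new("a1\na2\na3\n")
b = StringIO.new("b1\nb2\n")
paired_lines(a, b).each { |x, y| puts "#{x} #{y}" }
```

Note that in this sketch the unmatched third line of the longer input
is read and then silently dropped, which may or may not be what you want.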

I think using Bio::FlatFile#next_entry is a good fit in this case.
The next_entry method returns nil once the end of the file is reached.
In the following example, "entry1" and "entry2" are each checked for
nil on every pass (in "if entry1 then ... end" and
"if entry2 then ... end"). If you are sure the two files always have
the same number of entries, the checks can be skipped.

  require 'bio'
  ff1 = Bio::FlatFile.open(Bio::Fastq, 'readsA.fastq')
  ff2 = Bio::FlatFile.open(Bio::Fastq, 'readsB.fastq')
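  loop do
    # Read one entry from each file; next_entry returns nil at end of file.
    entry1 = ff1.next_entry
    entry2 = ff2.next_entry
    # Stop once both files are exhausted.
    break unless entry1 or entry2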
    if entry1 then
      header1 = entry1.entry_id
      seq1 = entry1.seq
      puts seq1.to_fasta(header1 + "qwa")
    end
    if entry2 then
      header2 = entry2.entry_id
      seq2 = entry2.seq 
      puts seq2.to_fasta(header2 + "qwa")
    end
  end
  ff2.close
  ff1.close
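
If you want to try the same read-until-nil pattern without BioRuby
installed, the following is a rough, self-contained sketch. The names
SimpleFastqReader, FastqEntry and interleave_as_fasta are hypothetical
stand-ins, not BioRuby classes, and the reader assumes plain four-line
FASTQ records (header, sequence, "+", quality), which real files do not
always follow.

```ruby
require 'stringio'

# Hypothetical stand-in for a parsed entry.
FastqEntry = Struct.new(:entry_id, :seq)

# Hypothetical stand-in for Bio::FlatFile: returns nil from next_entry
# at end of input, like the method used in the example above.
class SimpleFastqReader
  def initialize(io)
    @io = io
  end

  def next_entry
    header = @io.gets
    return nil if header.nil?
    seq = @io.gets
    @io.gets # "+" separator line (ignored)
    @io.gets # quality line (ignored)
    FastqEntry.new(header.chomp.sub(/\A@/, ''), seq.to_s.chomp)
  end
end

# Write entries from both readers alternately as FASTA records.
def interleave_as_fasta(reader1, reader2, out, suffix = 'qwa')
  loop do
    entry1 = reader1.next_entry
    entry2 = reader2.next_entry
    break if entry1.nil? && entry2.nil?
    # compact drops the nil from whichever input has already ended.
    [entry1, entry2].compact.each do |entry|
      out.puts ">#{entry.entry_id}#{suffix}"
      out.puts entry.seq
    end
  end
  out
end

reads_a = StringIO.new("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
reads_b = StringIO.new("@s1\nTTTT\n+\nIIII\n")
interleave_as_fasta(SimpleFastqReader.new(reads_a),
                    SimpleFastqReader.new(reads_b), $stdout)
```

Both readers are advanced on every pass, so the output alternates
between the two inputs until the shorter one ends.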


> Hello xyz,
> 
> You should be able to solve this problem by parallel iteration over the two
> files. An external iterator will be required here. You can call next on an
> external iterator to get the next object. It will raise a StopIteration
> exception when there are no more items to iterate over. You will have to add a
> case to handle that too.
> 
> Give something like the following a try:
> 
> require 'bio'
> 
> #open the two files
> one = Bio::FlatFile.open(Bio::Fastq, 'readsA.fastq')
> two = Bio::FlatFile.open(Bio::Fastq, 'readsB.fastq')
> 
> #get an external iterator for two
> two_iterator = two.to_enum
> 
> #now iterate
> one.each do |entry1|
> 
>   header1 = entry1.entry_id
>   seq1 = entry1.seq
> 
>   puts seq1.to_fasta(header1 + "qwa")
> 
>   entry2 = two_iterator.next
>   header2 = entry2.entry_id
>   seq2 = entry2.seq
>   puts seq2.to_fasta(header2 + "qwa")
> end
> 
> #close the files
> one.close
> two.close
> 
> I did not have any fastq files to test it on, but it should work.
> 
> On Sat, May 29, 2010 at 5:44 PM, xyz <mitlox at op.pl> wrote:
> 
> > Hello,
> > I would like to read two fastq files at the same time in order to
> > save them to a fasta file.
> >
> > require 'bio'
> > Bio::FlatFile.open(Bio::Fastq, 'readsA.fastq') do |ff1|
> >  ff1.each do |entry1|
> >
> >    header1 = entry1.entry_id
> >    seq1 = entry1.seq
> >
> >    puts seq1.to_fasta(header1 + "qwa")
> >
> >    #header2 = entry2.entry_id
> >    #seq2 = entry2.seq
> >    #puts seq2.to_fasta(header2 + "qwa")
> >  end
> > end
> >
> > I already have the code above, but unfortunately I do not know
> > how to read both files at the same time.
> >
> > How can I read two files at the same time and write them
> > to a fasta file?
> >
> > Thank you in advance.
> >
> > Best regards,
> >
> >
> > _______________________________________________
> > BioRuby Project - http://www.bioruby.org/
> > BioRuby mailing list
> > BioRuby at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioruby
> >
> 
> 
> 
> -- 
> Anurag Priyam,
> 2nd Year Undergraduate,
> Department of Mechanical Engineering,
> IIT Kharagpur.
> +91-9775550642

-- 
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
