[Biojava-dev] [Biojava-l] FASTA Header Parser

Michael Heuer heuermh at gmail.com
Wed Jan 11 21:34:01 UTC 2012


Hannes Brandstätter-Müller wrote:

> On Wed, Jan 11, 2012 at 16:24, Scooter Willis <HWillis at scripps.edu> wrote:
>> Hannes
>>
>> Looks like the length= is something I am specifically looking for to
>> truncate as being redundant information and not part of the unique id for
>> a header. Are you using "HG7JTKN01BFWC8 rank=0000030 x=474.5 y=10.0" as
>> your unique ID?
>
> So far I was using that (or rather, the whole header)
>
>> Is this a custom header or something output from a sequencing
>> instrument/software?
>
> It's the output of the Roche/454 Titanium FLX Sequencer

If you would rather, biojava has support for the FASTQ file format,
insofar as FASTQ files can be read in, validated, and converted among
the different FASTQ variants.  The exercise left for the reader is to
interpret the quality scores if necessary and import the result into a
biojava sequence.

http://www.biojava.org/docs/api1.8/org/biojava/bio/program/fastq/package-summary.html

Something like

 FastqReader reader = new SangerFastqReader();
 // read(File) throws IOException; createDNASequence throws IllegalSymbolException
 for (Fastq fastq : reader.read(new File("sanger.fastq")))
 {
   Sequence sequence = DNATools.createDNASequence(fastq.getSequence(), fastq.getDescription());
   // ...
 }

for biojava-legacy, or replace DNATools with the biojava3 equivalent.
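
As a rough sketch of the quality-score step mentioned above: in the
Sanger FASTQ variant each quality character encodes a Phred score as
its ASCII code minus 33.  The class and method names below
(SangerQuality, phredScores, errorProbability) are hypothetical and not
part of biojava; the quality string is assumed to be the one returned
by the Fastq object, e.g. via fastq.getQuality().

```java
import java.util.Arrays;

// Hypothetical helper, not part of biojava: decodes Sanger-variant
// FASTQ quality strings (Phred scores with an ASCII offset of 33).
final class SangerQuality {
    private static final int SANGER_OFFSET = 33;

    // Convert each quality character to its integer Phred score.
    static int[] phredScores(String quality) {
        int[] scores = new int[quality.length()];
        for (int i = 0; i < quality.length(); i++) {
            scores[i] = quality.charAt(i) - SANGER_OFFSET;
        }
        return scores;
    }

    // Phred score Q corresponds to an error probability of 10^(-Q/10).
    static double errorProbability(int phred) {
        return Math.pow(10.0, -phred / 10.0);
    }

    public static void main(String[] args) {
        // Made-up quality string, for illustration only.
        int[] scores = phredScores("II5+!");
        System.out.println(Arrays.toString(scores)); // [40, 40, 20, 10, 0]
        System.out.println(errorProbability(scores[2]));
    }
}
```

The other FASTQ variants (Solexa, Illumina) use a different offset and,
for Solexa, a different score definition, so this only applies after
reading or converting to the Sanger variant.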

   michael



