[Biojava-l] FASTA Header Parser

Scooter Willis HWillis at scripps.edu
Wed Jan 11 14:38:21 UTC 2012


It should parse up to the first space and use that as the unique id. Lots of extra info gets added to the header. You should find a getOriginalHeader method that preserves the contents of the header. I use this when writing the sequences back to disk to restore the original header.
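To make the "id up to the first space, original header kept on the side" idea concrete, here is a minimal sketch in plain Java. It does not use BioJava at all: the class FastaIdSketch, its methods, and the sample header are made up for illustration, and the real parser may treat edge cases differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a BioJava class.
final class FastaIdSketch {

    /** Returns the header text before the first whitespace, without the leading '>'. */
    static String idOf(String header) {
        String h = header.startsWith(">") ? header.substring(1) : header;
        h = h.trim();
        int end = 0;
        while (end < h.length() && !Character.isWhitespace(h.charAt(end))) {
            end++;
        }
        return h.substring(0, end);
    }

    /** Maps each short id to its untouched header line so it can be written back later. */
    static Map<String, String> indexHeaders(Iterable<String> headers) {
        Map<String, String> originalById = new LinkedHashMap<>();
        for (String header : headers) {
            originalById.put(idOf(header), header);
        }
        return originalById;
    }

    public static void main(String[] args) {
        // Made-up sample header with extra fields after the id.
        String header = ">read_001 length=250 xy=1234_5678";
        Map<String, String> index = indexHeaders(Arrays.asList(header));
        System.out.println(idOf(header));          // read_001
        System.out.println(index.get("read_001")); // >read_001 length=250 xy=1234_5678
    }
}
```

The short id serves as the lookup key, while the stored original line is what you would emit when writing the sequence back to disk.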

You can also write your own custom header parser, which is the mechanism we use to support the different known FASTA header formats. If you have extra information in the header, you can formally associate it with the sequence at parse time. We can also add support for your header if it is standard output from a device.
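As a rough illustration of the custom-parser idea, the sketch below shows a pluggable header parser in plain Java. Everything in it is hypothetical (SeqEntry, HeaderParser, KeyValueHeaderParser, WholeHeaderParser, parseWith); it deliberately avoids BioJava's own interfaces, whose exact signatures you should check in the library before wiring in anything similar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// All names below are hypothetical; this does not use BioJava's interfaces.
final class CustomHeaderSketch {

    /** Minimal stand-in for a parsed sequence entry. */
    static final class SeqEntry {
        String id;
        String originalHeader;
        final Map<String, String> properties = new LinkedHashMap<>();
    }

    /** Strategy interface: one implementation per header flavour. */
    interface HeaderParser {
        void parse(String header, SeqEntry entry);
    }

    private static String stripMarker(String header) {
        return header.startsWith(">") ? header.substring(1) : header;
    }

    /** Uses the first token as id and keeps every later key=value token as a property. */
    static final class KeyValueHeaderParser implements HeaderParser {
        @Override
        public void parse(String header, SeqEntry entry) {
            String h = stripMarker(header);
            entry.originalHeader = h;
            String[] tokens = h.trim().split("\\s+");
            entry.id = tokens[0];
            for (int i = 1; i < tokens.length; i++) {
                int eq = tokens[i].indexOf('=');
                if (eq > 0) {
                    entry.properties.put(tokens[i].substring(0, eq),
                                         tokens[i].substring(eq + 1));
                }
            }
        }
    }

    /** Keeps the whole header as the id, for callers that need it as a map key. */
    static final class WholeHeaderParser implements HeaderParser {
        @Override
        public void parse(String header, SeqEntry entry) {
            String h = stripMarker(header);
            entry.originalHeader = h;
            entry.id = h;
        }
    }

    /** Convenience wrapper: parse one header with the chosen strategy. */
    static SeqEntry parseWith(HeaderParser parser, String header) {
        SeqEntry entry = new SeqEntry();
        parser.parse(header, entry);
        return entry;
    }
}
```

With this shape, the key=value variant attaches fields such as length to the entry at parse time instead of discarding them, and the whole-header variant covers the case where the full header must be the key.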

Thanks

Scooter

----- Reply message -----
From: "Hannes Brandstätter-Müller" <biojava at hannes.oib.com>
To: "biojava-l" <biojava-l at lists.open-bio.org>
Subject: [Biojava-l] FASTA Header Parser
Date: Wed, Jan 11, 2012 9:30 am



Hi there -

I just came across a puzzling "feature" of the GenericFastaHeaderParser.
It seems to throw away everything in the header after (and including) "length="
(see GenericFastaHeaderParser.java lines 71-76)

... Why?

Also, is there a Fasta Header Parser I can use that does not mess
about with the header?

I really would like to have that as key (still working on my
FASTA/QUAL parsing) and not having that (only in the originalHeader,
not in the Hashmap key) really breaks stuff.

Hannes
_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l


