[Biojava-l] FASTA Header Parser

Hannes Brandstätter-Müller biojava at hannes.oib.com
Wed Jan 11 14:45:54 UTC 2012


No, the whole header ends up in the HashMap, except for everything
from "length=" onward. The whitespace before "length=" is not stripped,
so it is still part of the header that is used as the key.
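
To make the problem concrete, here is a minimal sketch of the behaviour
described above. It is a hypothetical re-creation, not the BioJava source:
the class HeaderKeyDemo, the method truncateAtLength, and the sample header
are all made up for illustration.

```java
// Hypothetical re-creation of the reported behaviour; NOT the BioJava code.
final class HeaderKeyDemo {

    // Cut the header at "length=" and keep whatever came before it,
    // including any whitespace that preceded "length=".
    static String truncateAtLength(String header) {
        int idx = header.indexOf("length=");
        return idx < 0 ? header : header.substring(0, idx);
    }

    public static void main(String[] args) {
        String header = "read_0001 rank=42 length=250"; // made-up header
        String key = truncateAtLength(header);
        // Brackets make the trailing space in the key visible.
        System.out.println("[" + key + "]");
        // A trimmed key is what a lookup would normally expect.
        System.out.println("[" + key.trim() + "]");
    }
}
```

A lookup with the trimmed string misses a key that still ends in whitespace,
which is why neither the original header nor its cleaned-up prefix finds
the entry.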

Either make it work the way you describe or, even better, leave the header as-is.

I need to find a sequence quickly; I don't want to iterate over all
my 35k sequences and look up the original headers.

Hannes
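
For reference, the kind of lookup asked for here can be sketched in plain
Java without any BioJava classes. This is only an illustration under stated
assumptions: the class RawHeaderFastaIndex, the method index, and the sample
records are hypothetical, and the code keeps each header exactly as written
(minus the leading '>') as the map key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: maps unmodified headers to sequences.
final class RawHeaderFastaIndex {

    // Builds header -> sequence from FASTA lines. The header is stored
    // verbatim, without the leading '>' and without any other parsing.
    static Map<String, String> index(Iterable<String> lines) {
        Map<String, String> index = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(">")) {
                // A new record starts: store the previous one, if any.
                if (header != null) {
                    index.put(header, seq.toString());
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                // Sequence data may span several lines; join them.
                seq.append(line.trim());
            }
        }
        if (header != null) {
            index.put(header, seq.toString());
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> fasta = Arrays.asList(       // made-up sample records
                ">read_0001 rank=42 length=8",
                "ACGT",
                "ACGT",
                ">read_0002 rank=7 length=4",
                "TTGA");
        Map<String, String> byHeader = index(fasta);
        // Constant-time lookup by the full, original header.
        System.out.println(byHeader.get("read_0001 rank=42 length=8"));
    }
}
```

Because the key is the untouched header, the same string read from a
matching QUAL file can be used for the lookup directly, with no pass over
all sequences.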

On Wed, Jan 11, 2012 at 15:38, Scooter Willis <HWillis at scripps.edu> wrote:
> It should parse until the first space as the unique id. Lots of extra info
> gets added in to the header. You should find a getOriginalHeader method that
> will preserve the contents of the header. I use this when writing the
> sequences back to disk to restore the original header.
>
> You can also do your own custom header parser which we use to support the
> known different fasta headers. If you have extra information in the header
> you can formally associate that with the sequence at the time of the parse.
> We can also add support for your header if it is standard output from a
> device.
>
> Thanks
>
> Scooter
>
>
> ----- Reply message -----
> From: "Hannes Brandstätter-Müller" <biojava at hannes.oib.com>
> To: "biojava-l" <biojava-l at lists.open-bio.org>
> Subject: [Biojava-l] FASTA Header Parser
> Date: Wed, Jan 11, 2012 9:30 am
>
>
>
> Hi there -
>
> I just came across a puzzling "feature" of the GenericFastaHeaderParser.
> It seems to throw away everything in the header after (and including)
> "length="
> (see GenericFastaHeaderParser.java lines 71-76)
>
> ... Why?
>
> Also, is there a Fasta Header Parser I can use that does not mess
> about with the header?
>
> I really would like to have that as key (still working on my
> FASTA/QUAL parsing) and not having that (only in the originalHeader,
> not in the Hashmap key) really breaks stuff.
>
> Hannes
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l



