[Biojava-l] blast parsing question

Wed Jul 29 13:16:04 UTC 2009

Hi,

I am new to BioJava. I want to test what is going on here in order to
potentially integrate it with KNIME.

My first project is parsing BLAST output for large files. The example in the
cookbook is very good and I had no problems integrating everything in
Eclipse and getting it to work.

Now here is my problem:

I am interested in parsing the summary table at the beginning of the
BLAST output, and I haven't found a way to get at this information.

I am blasting short sequences (20nt - 300nt) against genomic databases
(mouse/human/refseq/miRBase). I want to know if a given sequence (out of a
set of sequences) aligns to a specific genome with high identity. I want to
then separate the input source fasta file into a set that aligns to the
genome and one that doesn't (potentially another list of dubious sequences
where there is no clear answer). For this I only need the length of the
query sequence and score and the first few characters of the header line.
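To make the three-way split concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the class name QueryPartitioner, the use of bit score per query base as a rough stand-in for "high identity", and both cutoff values, which are uncalibrated placeholders that would need tuning for a real database and scoring scheme.

```java
// Hypothetical helper, plain Java only; not part of BioJava.
class QueryPartitioner {

    /** The three output sets described above. */
    enum Bin { ALIGNED, DUBIOUS, UNALIGNED }

    // Placeholder cutoffs on bit score per query base (assumed, not calibrated).
    static final double ALIGNED_MIN = 1.8;
    static final double DUBIOUS_MIN = 1.0;

    /**
     * Assigns a query to a bin from its length and its best bit score.
     * A null score means the query had no hit at all.
     */
    static Bin classify(int queryLength, Double bestBitScore) {
        if (bestBitScore == null || queryLength <= 0) {
            return Bin.UNALIGNED;
        }
        double perBase = bestBitScore / queryLength;
        if (perBase >= ALIGNED_MIN) {
            return Bin.ALIGNED;
        }
        if (perBase >= DUBIOUS_MIN) {
            return Bin.DUBIOUS;
        }
        return Bin.UNALIGNED;
    }
}
```

Normalising by query length is one possible design choice here, because the queries range from 20 nt to 300 nt and a raw score cutoff would treat short and long queries very differently.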

At least that's the way I am currently doing it. I have set the BLAST
parameters to report only the first alignment, but to keep the first 50 or
so entries in the summary table.
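For illustration, the summary table could also be read directly from the text report, without any BioJava calls. The sketch below is plain Java and rests on several assumptions: the report is classic plain-text BLAST output, each query starts with a "Query=" line followed by a "(N letters)" line, the table follows a "Sequences producing significant alignments" line, and each row ends with the bit score and the E-value. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, plain Java only; none of this is BioJava API.
class BlastSummaryParser {

    /** One row of the summary table, tagged with the query it belongs to. */
    static final class Hit {
        final String query;        // text after "Query="
        final int queryLength;     // from "(N letters)", or -1 if not seen
        final String description;  // subject header as shown in the table
        final double bitScore;
        final String eValue;       // kept as text, e.g. "2e-05"

        Hit(String query, int queryLength, String description,
            double bitScore, String eValue) {
            this.query = query;
            this.queryLength = queryLength;
            this.description = description;
            this.bitScore = bitScore;
            this.eValue = eValue;
        }
    }

    /** Collects the summary rows of every query in the report. */
    static List<Hit> parse(List<String> lines) {
        List<Hit> hits = new ArrayList<>();
        String query = null;
        int queryLength = -1;
        boolean inTable = false;
        boolean sawRow = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("Query=")) {
                query = line.substring("Query=".length()).trim();
                queryLength = -1;
                inTable = false;
            } else if (line.matches("\\([\\d,]+ letters\\)")) {
                queryLength = Integer.parseInt(line.replaceAll("[^\\d]", ""));
            } else if (line.startsWith("Sequences producing significant alignments")) {
                inTable = true;
                sawRow = false;
            } else if (inTable) {
                if (line.isEmpty()) {
                    // The blank line before the first row is skipped;
                    // a blank line after at least one row ends the table.
                    if (sawRow) {
                        inTable = false;
                    }
                    continue;
                }
                String[] tok = line.split("\\s+");
                if (tok.length < 3) {
                    continue;
                }
                double score;
                try {
                    score = Double.parseDouble(tok[tok.length - 2]);
                } catch (NumberFormatException e) {
                    continue; // not a table row in the assumed layout
                }
                String description =
                        String.join(" ", Arrays.copyOfRange(tok, 0, tok.length - 2));
                hits.add(new Hit(query, queryLength, description,
                        score, tok[tok.length - 1]));
                sawRow = true;
            }
        }
        return hits;
    }
}
```

A query that produces no summary rows contributes nothing to the returned list, so the caller would still have to compare the result against the ids in the input FASTA file to find the sequences that did not align.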

Any help or comments are appreciated.

Thanks,

Bernd

Bernd Jagla
Bioinformatician 

Institut Pasteur
Plate-forme puces a ADN
Genopole / Institut Pasteur
28 rue du Docteur Roux
75724 Paris Cedex 15
France 

bernd.jagla at pasteur.fr

tel: +33 (0) 140 61 35 13