Re: [Biojava-l] BLAST Parser for extracting all BLAST data?

BIBIS, Garnier, Christophe cgarnier at ttz-Bremerhaven.de
Tue Jun 28 08:03:48 EDT 2005


If you don't find what you need in BioJava, you can always write a
small XML parser yourself, for example with JDOM.

1 - download jdom.jar and add it to your classpath
2 - replace the path of the XML file in the main method of the code below
3 - run it: it prints every element it finds, including <Hsp_midline>


I hope this helps.

Best,
Christophe

+++++++++++++++++++++++++++++++++++++

import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class JDomParser
{

	// Walks every <Iteration>, not only the first one.
	private static void parseResults(Element iterations)
	{
		System.out.println("*** parseResults ***") ;
		
		Iterator its = iterations.getChildren("Iteration").iterator();
		
		while (its.hasNext())
		{
			Element it = (Element) its.next();
			
			Iterator iterator = it.getChildren().iterator();
			
			while (iterator.hasNext())
			{
				Element child = (Element) iterator.next();

				System.out.println(child + " - " + child.getText() + " - "
						+ child.getName());

				if ( child.getName().equals("Iteration_hits"))
				{
					parseHits(child) ;
				}
				
				if ( child.getName().equals("Iteration_stat"))
				{
					parseStatistics(child) ;
				}
			}
		}
	}

	private static void parseHits(Element element)
	{
		List elts = element.getChildren();
		
		Iterator iterator = elts.iterator();
		
		while (iterator.hasNext())
		{
			Element child = (Element) iterator.next();

			printElt(child) ;
			
			parseHit(child) ;
			
		}
	}
	
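	// <Hit_hsps> can hold several <Hsp> elements; print the fields of
	// each one, including <Hsp_midline>.
	private static void parseHspHit(Element element)
	{
		Iterator hsps = element.getChildren("Hsp").iterator();
		
		while (hsps.hasNext())
		{
			Element hsp = (Element) hsps.next();
			
			Iterator iterator = hsp.getChildren().iterator();
			
			while (iterator.hasNext())
			{
				Element child = (Element) iterator.next();

				printElt(child) ;
			}
		}
	}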
	
	private static void printElt(Element elt)
	{
		System.out.println("Element: [" + elt.getName() + "] - text:"
				+ elt.getText() ) ;
	}
	
	private static void parseHit(Element element)
	{
		List elts = element.getChildren();
		
		Iterator iterator = elts.iterator();
		
		while (iterator.hasNext())
		{
			Element child = (Element) iterator.next();

			printElt(child) ;
			
			if (child.getName().equals("Hit_hsps"))
					{
					parseHspHit(child) ;
					}
			
		}
	}
	
	
	private static void parseStatistics(Element element)
	{
		Element stat = element.getChild("Statistics") ;
		
		List elts = stat.getChildren();
		
		Iterator iterator = elts.iterator();
		
		while (iterator.hasNext())
		{
			Element child = (Element) iterator.next();

			printElt(child) ;
			
		}
		
	}
	
	
	public static void parseFile(File file)
			throws JDOMException, IOException
	{
		SAXBuilder parser = new SAXBuilder();
		Document doc = parser.build(file);

		Element root = doc.getRootElement();

		List elts = root.getChildren();
		Iterator iterator = elts.iterator();

		while (iterator.hasNext())
		{

			Element child = (Element) iterator.next();

			printElt(child) ;

			if (child.getName().equals("BlastOutput_iterations"))
				parseResults(child);

		}

	}

	/**
    * @param args
    */
	public static void main(String[] args)
	{
		File f = new File("E:/result.xml");

		try
		{
			parseFile(f);
		}
		catch (JDOMException e)
		{
			e.printStackTrace();
		}
		catch (IOException e)
		{
			e.printStackTrace();
		}
	}

}





+++++++++++++++++++++++++++++++++++++
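
If only the <Hsp_midline> text is wanted, walking the whole tree is not strictly necessary. Below is a minimal sketch that uses the DOM parser shipped with the JDK instead of JDOM, so it needs no extra jar. The class name MidlineExtractor, the method extractMidlines and the tiny XML fragment in main are made up for illustration; the fragment is not real BLAST output. A real BLAST XML file usually carries a DOCTYPE line, and the parser may try to load that DTD, so treat this as a starting point rather than a finished tool.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of BioJava or JDOM.
public class MidlineExtractor
{
	// Returns the text of every <Hsp_midline> element, at any depth,
	// in document order.
	public static List<String> extractMidlines(InputStream xml) throws Exception
	{
		DocumentBuilder builder =
				DocumentBuilderFactory.newInstance().newDocumentBuilder();
		Document doc = builder.parse(xml);

		NodeList nodes = doc.getElementsByTagName("Hsp_midline");

		List<String> midlines = new ArrayList<String>();
		for (int i = 0; i < nodes.getLength(); i++)
		{
			midlines.add(nodes.item(i).getTextContent());
		}
		return midlines;
	}

	public static void main(String[] args) throws Exception
	{
		// Made-up fragment that only mimics the nesting of BLAST XML.
		String xml = "<BlastOutput><BlastOutput_iterations><Iteration>"
				+ "<Iteration_hits><Hit><Hit_hsps>"
				+ "<Hsp><Hsp_midline>NAR   T IAK</Hsp_midline></Hsp>"
				+ "<Hsp><Hsp_midline>D +K</Hsp_midline></Hsp>"
				+ "</Hit_hsps></Hit></Iteration_hits>"
				+ "</Iteration></BlastOutput_iterations></BlastOutput>";

		InputStream in =
				new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

		for (String midline : extractMidlines(in))
		{
			System.out.println("midline: [" + midline + "]");
		}
	}
}
```

To run it on a real result, pass a java.io.FileInputStream for the XML file to extractMidlines instead of the in-memory fragment.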




-----Original Message-----
From: Sébastien PETIT [mailto:great_fred at yahoo.com]
Sent: Tuesday, 28 June 2005 13:34
To: biojava-l at biojava.org
Subject: RE: [Biojava-l] BLAST Parser for extracting all BLAST data?


Arggh!!!! I didn't find what I wanted!!

I used the program you gave me, with a slight modification because it
didn't recognize my XML file...
The parser is now a BlastXMLParserFacade....
And it gave me everything it found in the file.....
BUT not what I want!! GRRR... >:( >:( >:(

There is a tag (I don't know if that's the right word...) in my XML
file that encloses what I'm searching for: <Hsp_midline>....
Why doesn't the parser see it..??

I didn't really understand how the XML parser works.... So, how can I
modify it to get what I want...??

PLEASE DOC'!!! ;);)
Help me!!

Thanks for everything..

Sebastien

--- mark.schreiber at novartis.com wrote:

> Hi -
> 
> Try running this program 
> http://www.biojava.org/docs/bj_in_anger/blastecho.htm
> 
> If you see what you need in the output then it is being read by the
> Blast parser and emitted as an event (which you could listen for). If
> it isn't, then the Blast parser is not emitting those events, although
> someone confident with the blast format could probably modify it so it
> does.
> 
> In short, it is possible but it might not be implemented ; )
> 
> - Mark
> 
> 
> 
> 
> 
> Sébastien PETIT <great_fred at yahoo.com>
> Sent by: biojava-l-bounces at portal.open-bio.org
> 06/28/2005 05:11 PM
> 
>  
>         To:     biojava-l at biojava.org
>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>         Subject:        RE: [Biojava-l] BLAST Parser for extracting all BLAST data?
> 
> 
> Hi, everybody...
> 
> I'm like George.... I want to extract data from BLAST files.....
> I can get the alignments, no problem... But now I want the alignment
> between the 2 sequences (the lines with "+", "-" and some letters in
> George's example....) because with this we can see at a glance if the
> alignment between the 2 sequences is really good or not.
> 
> Is it possible, Docs??
> 
> Thank you.
> 
> Sebastien
> 
> --- Richard HOLLAND <hollandr at gis.a-star.edu.sg> wrote:
> 
> > BioJava's BLAST framework parses files and fires events for every
> > piece of information it finds. The SeqSimilarityAdapter class is an
> > example of how to catch these events and construct basic BLAST
> > result objects (SimpleSeqSimilarityHit); however, they are not
> > comprehensive and do not record full details of every hit.
> > 
> > If you want the kind of detail you mention below you will have to
> > write your own content handler for BLAST parsing and pass it to the
> > BLASTLikeSAXParser when parsing a file. This event handler should
> > implement the ContentHandler interface. Look at the source of
> > SeqSimilarityAdapter for guidance. You will then receive events for
> > every part of the file, from which you can construct your own custom
> > BLAST result objects to describe them.
> > 
> > If you're not sure what tag names to listen for in your
> > ContentHandler the easiest thing to do is just run it once and dump
> > them all out to see what you get.
> > 
> > cheers,
> > Richard
> > 
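
The "run it once and dump all the tag names" idea can be sketched with the plain SAX classes from the JDK. This is only an illustration of the ContentHandler pattern, fed with a small made-up XML string; it is not wired to BioJava's BLASTLikeSAXParser, and the element names that parser emits may differ from the ones shown here. The class name TagDumpHandler and the method dump are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each distinct element name it is told about.
public class TagDumpHandler extends DefaultHandler
{
	// LinkedHashSet keeps first-seen order and drops repeats.
	private final Set<String> names = new LinkedHashSet<String>();

	// Called by the parser once for every opening tag.
	public void startElement(String uri, String localName, String qName,
			Attributes attrs)
	{
		names.add(qName);
	}

	public Set<String> getNames()
	{
		return names;
	}

	// Parses the given XML text and returns the element names seen.
	public static Set<String> dump(String xml) throws Exception
	{
		TagDumpHandler handler = new TagDumpHandler();
		SAXParserFactory.newInstance().newSAXParser()
				.parse(new InputSource(new StringReader(xml)), handler);
		return handler.getNames();
	}

	public static void main(String[] args) throws Exception
	{
		// Made-up fragment, not real parser output.
		String xml = "<Hit><Hit_hsps>"
				+ "<Hsp><Hsp_midline>D +K</Hsp_midline></Hsp>"
				+ "<Hsp><Hsp_midline>NAR</Hsp_midline></Hsp>"
				+ "</Hit_hsps></Hit>";

		for (String name : dump(xml))
		{
			System.out.println(name);
		}
	}
}
```

Once the names are known, the same handler can be extended with characters() and endElement() to collect the text of the elements of interest.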
> > 
> > -----Original Message-----
> > From:          biojava-l-bounces at portal.open-bio.org on behalf of Y D Sun
> > Sent:          Sun 6/26/2005 5:42 PM
> > To:            biojava-l at biojava.org
> > Cc: 
> > Subject:               [Biojava-l] BLAST Parser for extracting all BLAST data?
> > 
> > Hi,
> > 
> > I want to extract all data from BLASTP results. In the following
> > hit, for example, I need to get the lengths of the query and subject
> > proteins, the identities (including all data: 54, 124 and 43%), the
> > positives (all data: 79, 124 and 63%), and the gaps (3, 124 and 2%).
> > Can the BLASTLikeSAXParser filter all this information? I can't find
> > the methods in the SeqSimilaritySearchHit and
> > SeqSimilaritySearchSubHit APIs to retrieve these data. Does Biojava
> > provide any methods for this purpose?
> > 
> > Thanks,
> > 
> > George
> > 
> > 
> > BLASTP 2.2.5 [Nov-16-2002]
> > 
> > Query= Prot0001
> >          (138 letters)
> > 
> > Database: /work/nys1/fasta/protein/AE000782.pro.fasta
> >            2407 sequences; 662,866 total letters
> > 
> > Searching.....done
> > 
> > 
> >                                                            Score     E
> > Sequences producing significant alignments:                (bits)  Value
> > 
> > Prot0002                                                     100   1e-23
> > Prot0003                                                      74   2e-15
> > Prot0004                                                      43   3e-06
> > 
> > >Prot0002
> >           Length = 138
> > 
> >  Score =  100 bits (250), Expect = 1e-23
> >  Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)
> > 
> > Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
> >            NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
> > Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74
> > 
> > Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
> >              K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
> > Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134
> > 
> > Query: 135 DQIK 138
> >            D +K
> > Sbjct: 135 DIVK 138
> > 
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l