[Biojava-l] BLAST Parser for extracting all BLAST data?
Sébastien PETIT
great_fred at yahoo.com
Tue Jun 28 05:11:12 EDT 2005
Hi everybody,
Like George, I want to extract data from BLAST files.
I can get the alignments, no problem. But now I also want the match line
between the 2 sequences (the lines with "+", "-" and some letters in
George's example), because it shows at a glance whether the
alignment between the 2 sequences is really good or not.
Is that possible?
Thank you.
Sebastien
--- Richard HOLLAND <hollandr at gis.a-star.edu.sg> wrote:
> BioJava's BLAST framework parses files and fires events for every
> piece of information it finds. The SeqSimilarityAdapter class is an
> example of how to catch these events and construct basic BLAST result
> objects (SimpleSeqSimilaritySearchHit). However, these objects are not
> comprehensive and do not record full details of every hit.
>
> If you want the kind of detail you mention below, you will have to
> write your own content handler for BLAST parsing and pass it to the
> BLASTLikeSAXParser when parsing a file. This event handler should
> implement the ContentHandler interface. Look at the source of
> SeqSimilarityAdapter for guidance. You will then receive events for
> every part of the file, from which you can construct your own custom
> BLAST result objects to describe them.
>
> If you're not sure what tag names to listen for in your
> ContentHandler the easiest thing to do is just run it once and dump
> them all out to see what you get.
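A minimal sketch of the "run it once and dump everything" idea described above. It uses only the JDK's SAX classes: TagDumpHandler is a hypothetical helper, not a BioJava class, and the element names in the usage line are invented for illustration. The dump() driver feeds an XML string through the JDK's own SAX parser as a stand-in; with BioJava, the assumption is that the same handler would instead be handed to the BLASTLikeSAXParser before parsing a BLAST report.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: records and prints every SAX event it receives. */
class TagDumpHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        StringBuilder sb = new StringBuilder("START " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        record(sb.toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        record("END " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character data may arrive in several chunks, so buffer it.
        text.append(ch, start, length);
    }

    private void flushText() {
        String s = text.toString().trim();
        text.setLength(0);
        if (!s.isEmpty()) {
            record("TEXT " + s);
        }
    }

    private void record(String event) {
        events.add(event);
        System.out.println(event);
    }

    /** Stand-in driver: parses an XML string with the JDK's SAX parser. */
    static List<String> dump(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TagDumpHandler handler = new TagDumpHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, TagDumpHandler.dump("<Hit id=\"Prot0002\"><Score>100</Score></Hit>") lists the start, text and end events in document order. Once you have seen which element carries the match line, you can replace the printing with code that stores it in your own result object.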
>
> cheers,
> Richard
>
>
> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org on behalf of Y D Sun
> Sent: Sun 6/26/2005 5:42 PM
> To: biojava-l at biojava.org
> Cc:
> Subject: [Biojava-l] BLAST Parser for extracting all BLAST data?
>
> Hi,
>
> I want to extract all data from BLASTP results. In the following hit,
> for example, I need to get the lengths of the query and subject
> proteins, the identities (all three values: 54, 124 and 43%), the
> positives (79, 124 and 63%), and the gaps (3, 124 and 2%). Can the
> BLASTLikeSAXParser extract all of this information? I can't find
> methods in the SeqSimilaritySearchHit and SeqSimilaritySearchSubHit
> APIs to retrieve these data. Does BioJava provide any methods for
> this purpose?
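As a stopgap outside BioJava, the numbers asked for here can also be pulled from the raw report text with regular expressions. This is a plain-JDK sketch, not the BLASTLikeSAXParser route: BlastStats and its methods are hypothetical names, and the patterns assume the plain-text BLASTP 2.2.5 layout shown in this message.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a BioJava class) for BLAST hit statistics lines. */
class BlastStats {
    // Matches e.g. "Identities = 54/124 (43%)"; \s* also tolerates a
    // percentage that has wrapped onto the next line.
    private static final Pattern FIELD = Pattern.compile(
            "(Identities|Positives|Gaps)\\s*=\\s*(\\d+)/(\\d+)\\s*\\((\\d+)%\\)");
    private static final Pattern LENGTH = Pattern.compile("Length\\s*=\\s*(\\d+)");

    /** Maps each field name to {count, alignment length, percent}. */
    static Map<String, int[]> parse(String text) {
        Map<String, int[]> out = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            out.put(m.group(1), new int[] {
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)) });
        }
        return out;
    }

    /** Returns the first "Length = N" value in the text, or -1 if none is found. */
    static int subjectLength(String text) {
        Matcher m = LENGTH.matcher(text);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }
}
```

Calling BlastStats.parse on the "Identities = ..." line of a hit gives one int[] per field, and BlastStats.subjectLength on the hit header gives the subject length. The query length would need a similar pattern for the "(138 letters)" line.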
>
> Thanks,
>
> George
>
>
> BLASTP 2.2.5 [Nov-16-2002]
>
> Query= Prot0001
> (138 letters)
>
> Database: /work/nys1/fasta/protein/AE000782.pro.fasta
> 2407 sequences; 662,866 total letters
>
> Searching.....done
>
>
>                                                        Score     E
> Sequences producing significant alignments:            (bits)  Value
>
> Prot0002                                                 100   1e-23
> Prot0003                                                  74   2e-15
> Prot0004                                                  43   3e-06
>
> >Prot0002
> Length = 138
>
> Score = 100 bits (250), Expect = 1e-23
> Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)
>
> Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
>            NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
> Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74
>
> Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
>              K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
> Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134
>
> Query: 135 DQIK 138
>            D +K
> Sbjct: 135 DIVK 138
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l