[Bioperl-l] Aggressive aggregation?

Jason Stajich jason.stajich at duke.edu
Wed Mar 9 13:42:47 EST 2005


So personally, I wouldn't use default BLASTN.

I'd use WU-BLAST with the -links option (this has worked well for
mapping Brassica ESTs to Arabidopsis in my experience).  Then you can
parse the BLAST report by writing your own slightly customized version
of search2gff that looks at the $hsp->link option to group things.  I
just lectured on this today, in fact:

http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits.pl
http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits_perlink.pl
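
The linked scripts are Perl. Purely as an illustration of the grouping
idea, here is a minimal Python sketch. It assumes each HSP has already
been parsed into a dict; the `link_group` field is a hypothetical
stand-in for whatever the parser exposes for the -links information and
is not BioPerl's API.

```python
from collections import defaultdict


def group_hsps_by_link(hsps):
    """Group HSP dicts into one 'match' per (subject, query, link_group).

    'link_group' is a hypothetical field, not a real parser attribute.
    """
    groups = defaultdict(list)
    for hsp in hsps:
        key = (hsp["subject"], hsp["query"], hsp["link_group"])
        groups[key].append(hsp)

    matches = []
    for (subject, query, link_group), members in groups.items():
        members.sort(key=lambda h: h["start"])
        matches.append({
            "subject": subject,
            "query": query,
            "link_group": link_group,
            # The match spans from the first HSP start to the last HSP end.
            "start": members[0]["start"],
            "end": max(h["end"] for h in members),
            "hsps": members,
        })
    matches.sort(key=lambda m: (m["subject"], m["start"]))
    return matches


# Made-up coordinates for illustration only.
example_hsps = [
    {"subject": "chr1", "query": "chad1", "link_group": "1-2", "start": 1, "end": 75},
    {"subject": "chr1", "query": "chad1", "link_group": "1-2", "start": 100, "end": 150},
    {"subject": "chr1", "query": "chad1", "link_group": "3-4", "start": 10000, "end": 10075},
    {"subject": "chr1", "query": "chad1", "link_group": "3-4", "start": 10100, "end": 10150},
]
linked_matches = group_hsps_by_link(example_hsps)
```

Each entry in `linked_matches` then corresponds to one parent feature
with its own HSP children, rather than one feature per chromosome.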


Or, if you are willing to accept a little more overhead, use exonerate
(http://www.ebi.ac.uk/~guy/exonerate/) with the est2genome model, which
will also try to splice the EST onto the genome for you.  You can
dump out GFF directly, though it needs to be massaged a little before
loading into Bio::DB::GFF.
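
For comparison, the manual clustering of HSPs described in the quoted
message below can be sketched roughly as follows. This is a Python
illustration under stated assumptions, not the tool Chad used: HSPs are
plain (start, end) pairs on a single chromosome, strand is ignored, and
`max_gap` is an arbitrary threshold chosen for the example.

```python
def cluster_hsps(hsps, max_gap=5000):
    """Split (start, end) HSP intervals into positional clusters.

    A new cluster starts whenever the distance from the current
    cluster's end to the next HSP's start exceeds max_gap.
    """
    clusters = []
    for start, end in sorted(hsps):
        if clusters and start - clusters[-1]["end"] <= max_gap:
            # Close enough: extend the current cluster.
            clusters[-1]["hsps"].append((start, end))
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
        else:
            # Too far away (or first HSP): open a new cluster.
            clusters.append({"start": start, "end": end, "hsps": [(start, end)]})
    return clusters


# Made-up HSPs near positions 1000, 10,000 and 50,000 on chr1.
chad1_hsps = [(1000, 1075), (1100, 1150), (10000, 10075), (10100, 10150), (50000, 50080)]
chad1_clusters = cluster_hsps(chad1_hsps, max_gap=2000)
```

A fixed gap threshold is a blunt instrument compared with the -links
grouping or a spliced alignment, since it knows nothing about intron
sizes or HSP order along the EST.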

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 8, 2005, at 10:15 PM, Chad Matsalla wrote:

>
>
> Subject: Aggressive Aggregators
>
> Greetings all,
>
> I'm looking for help in presenting Blast hits in GBrowse.
>
> I blasted Brassica EST sequences against the Arabidopsis
> pseudochromosome assemblies in order to store them in a Bio::DB::GFF
> database. I used a tool based on bp_search2gff.pl to `convert' blast
> reports into gff. A sample of that gff is below[1].
>
> My problem is partly based on a peculiarity of Blast and partly based on
> the behavior of the aggregators in GBrowse, and I'm wondering if someone
> else has seen this.
>
> Arabidopsis has five chromosomes. In order to get the coordinates
> necessary to place ESTs on the chromosomes I created a blast database
> containing 5 sequences - chr1, chr2, chr3, chr4, chr5.
>
> My problem presents itself when an EST hits at more than one place on a
> chromosome.  Let us say that on chr1 there is a cluster of HSPs for the
> est chad1 at position 1000, a second cluster at position 10,000 and a
> third cluster at 50,000. Blast will indicate a SINGLE hit on chr1.
>
> So, I manually find clusters of HSPs and create GFF that resembles that
> below[1]. Yes I know that wublast has an option to prevent that
> behavior.
>
> The problem is that the `match' aggregator joins all of the `matches'
> together.  I understand that it's because all of the matches have the
> same Target - that's necessary to have the proper sequence appear while
> viewing base-base alignments.
>
> HSPs:        <-->  <-->  <-->                 <-->  <-->  <-->
> matches:     <-------------->                 <-------------->
>
> What I get : <-->--<-->--<-->-----------------<-->--<-->--<-->
> What I want: <-->--<-->--<-->                 <-->--<-->--<-->
>
> How do I get what I want? In my gbrowse.conf I tried the standard
> `match' aggregator and a custom aggregator: csmmatch{csmhsp/csmmatch}
>
>
> Chad Matsalla
>
>
> [1]
> chr1 aafcest     HSP   1     75    .     +     .     Target "Sequence:chad1" 1 75
> chr1 aafcest     HSP   100   150   .     +     .     Target "Sequence:chad1" 100 150
> chr1 aafcest     match 1     150   .     +     .     Target "Sequence:chad1" 1 150
>
> chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
> chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
> chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450
>