[Bioperl-l] Aggressive aggregation?
Jason Stajich
jason.stajich at duke.edu
Wed Mar 9 13:42:47 EST 2005
Personally, I wouldn't use default BLASTN.
I'd use WU-BLAST with the -links option (this has worked well for
mapping Brassica ESTs to Arabidopsis in my experience). Then you can
parse the BLAST report by writing your own slightly customized version
of search2gff that looks at the $hsp->link option to group things. I
lectured on this just today, in fact:
http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits.pl
http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits_perlink.pl
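
To make the grouping idea concrete, here is a minimal sketch in Python
(not the BioPerl scripts linked above). It assumes the HSPs have already
been parsed into simple records and that each record carries a
link-group identifier taken from the WU-BLAST links information; the
Hsp record, the link_group field and both function names are
hypothetical, made up for illustration. It writes one `match' line per
link group instead of one per chromosome hit, using the same layout as
the sample GFF in the quoted message. How the group should be encoded in
column 9 so that GBrowse keeps the groups apart is not addressed here.

```python
from collections import defaultdict
from typing import NamedTuple


class Hsp(NamedTuple):
    """One parsed HSP. All field names here are illustrative assumptions."""
    ref: str         # reference sequence, e.g. "chr1"
    start: int       # start on the reference
    end: int         # end on the reference
    strand: str      # "+" or "-"
    target: str      # EST name, e.g. "chad1"
    t_start: int     # start on the EST
    t_end: int       # end on the EST
    link_group: str  # assumed: identifier of the chain of linked HSPs


def gff_line(ref, source, method, start, end, strand, target, t_start, t_end):
    """Format one tab-separated GFF line with a Target group in column 9."""
    group = f'Target "Sequence:{target}" {t_start} {t_end}'
    fields = [ref, source, method, start, end, ".", strand, ".", group]
    return "\t".join(str(f) for f in fields)


def hsps_to_gff(hsps, source="aafcest"):
    """Emit HSP lines plus one 'match' line per link group."""
    groups = defaultdict(list)
    for h in hsps:
        groups[(h.ref, h.target, h.strand, h.link_group)].append(h)

    lines = []
    for key in sorted(groups):
        ref, target, strand, _ = key
        members = sorted(groups[key], key=lambda h: h.start)
        for h in members:
            lines.append(gff_line(ref, source, "HSP", h.start, h.end,
                                  strand, target, h.t_start, h.t_end))
        # The match spans only the HSPs that belong to this link group.
        lines.append(gff_line(ref, source, "match",
                              min(h.start for h in members),
                              max(h.end for h in members),
                              strand, target,
                              min(h.t_start for h in members),
                              max(h.t_end for h in members)))
    return lines
```

Feeding it the four HSPs from the sample in the quoted message, split
into two link groups, gives two separate `match' lines rather than one
spanning everything.
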
Or, if you are willing to accept a little more overhead, use exonerate
(http://www.ebi.ac.uk/~guy/exonerate/) with the est2genome model, which
will also try to splice the EST onto the genome for you. It can dump
GFF directly, though that output needs to be massaged a little before
loading into Bio::DB::GFF.
-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
On Mar 8, 2005, at 10:15 PM, Chad Matsalla wrote:
>
>
> Subject: Aggressive Aggregators
>
> Greetings all,
>
> I'm looking for help in presenting Blast hits in GBrowse.
>
> I blasted Brassica EST sequences against the Arabidopsis
> pseudochromosome assemblies in order to store them in a Bio::DB::GFF
> database. I used a tool based on bp_search2gff.pl to `convert' blast
> reports into gff. A sample of that gff is below[1].
>
> My problem is partly based on a peculiarity of Blast and partly based
> on the behavior of the aggregators in GBrowse and I'm wondering if
> someone else has seen this.
>
> Arabidopsis has five chromosomes. In order to get the coordinates
> necessary to place ESTs on the chromosomes I created a blast database
> containing 5 query sequences - chr1, chr2, chr3, chr4, chr5.
>
> My problem presents itself when an EST hits at more than one place on
> a chromosome. Let us say that on chr1 there is a cluster of HSPs for
> the EST chad1 at position 1000, a second cluster at position 10,000
> and a third cluster at 50,000. Blast will indicate a SINGLE hit on chr1.
>
> So, I manually find clusters of HSPs and create GFF that resembles that
> below[1]. Yes, I know that WU-BLAST has an option to prevent that
> behavior.
>
> The problem is that the `match' aggregator joins all of the `matches'
> together. I understand that it's because all of the matches have the
> same Target - that's necessary to have the proper sequence appear while
> viewing base-base alignments.
>
> HSPs:        <-->  <-->  <-->                 <-->  <-->  <-->
> matches:     <-------------->                 <-------------->
>
> What I get : <-->--<-->--<-->-----------------<-->--<-->--<-->
> What I want: <-->--<-->--<-->                 <-->--<-->--<-->
>
> How do I get what I want? In my gbrowse.conf I tried the standard
> `match' aggregator and a custom aggregator: csmmatch{csmhsp/csmmatch}
>
>
> Chad Matsalla
>
>
> [1]
> chr1 aafcest HSP 1 75 . + . Target "Sequence:chad1" 1 75
> chr1 aafcest HSP 100 150 . + . Target "Sequence:chad1" 100 150
> chr1 aafcest match 1 150 . + . Target "Sequence:chad1" 1 150
>
> chr1 aafcest HSP 200 275 . - . Target "Sequence:chad1" 200 275
> chr1 aafcest HSP 300 450 . - . Target "Sequence:chad1" 300 450
> chr1 aafcest match 200 450 . - . Target "Sequence:chad1" 200 450
>
>