[Bioperl-l] XML Blast parsing?

Aaron J. Mackey <amackey@virginia.edu>
Sun, 27 May 2001 20:43:19 -0400 (EDT)



Attached is a "template" script that we use to get data out of XML BLAST
output.  All the template does is build up some simple Perl data
structures (a list of queries and, for each query, a list of hits with
their HSPs) holding minimal data, but it gives you the overall structure
to get what you need, while also taking advantage of XML::Twig's purging
features (so that you discard the pieces of the XML document you've
already handled).
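
The handler-plus-purge pattern described above can be sketched in a few
lines.  This is only a minimal sketch, not the attached script: the inline
XML is a drastically simplified stand-in for real BLAST XML (the IDs and
lengths are made up, and the wrapper elements of real output are left out),
and the @hits layout is just one possible choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Simplified stand-in for BLAST XML output; the values are hypothetical.
my $xml = <<'XML';
<BlastOutput>
  <Hit>
    <Hit_id>gi|1</Hit_id>
    <Hit_len>120</Hit_len>
  </Hit>
  <Hit>
    <Hit_id>gi|2</Hit_id>
    <Hit_len>85</Hit_len>
  </Hit>
</BlastOutput>
XML

my @hits;    # one hash reference per <Hit> element

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <Hit> element has been parsed.
        Hit => sub {
            my ($t, $hit) = @_;
            push @hits, {
                id  => $hit->first_child_text('Hit_id'),
                len => $hit->first_child_text('Hit_len'),
            };
            $t->purge;    # discard what has been parsed so far
        },
    },
);

$twig->parse($xml);

# One tab-separated line (id, length) per hit.
printf "%s\t%s\n", $_->{id}, $_->{len} for @hits;
```

Because the handler copies what it needs before calling purge, memory use
should stay roughly proportional to one Hit subtree plus whatever you choose
to keep in @hits, rather than to the whole document.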

I want to repeat that this is simply a template for more advanced usage:
the script itself does absolutely nothing useful (I'm actually pretty good
at writing that kind of script, but that's another story).

-Aaron

On Sat, 26 May 2001, Hilmar Lapp wrote:

> Aaron J Mackey wrote:
> >
> > I don't have the scripts handy at the moment, but I can post them later if
> > people are interested ...
> >
>
> Do post; we can put it somewhere (dist. or Wiki?), and if someone
> wants to extend it, he or she has got a head start.
>
> 	Hilmar
>

-- 
 o ~   ~   ~   ~   ~   ~  o
/ Aaron J Mackey           \
\  Dr. Pearson Laboratory  /
 \ University of Virginia  \
 /  (804) 924-2821          \
 \  amackey@virginia.edu    /
  o ~   ~   ~   ~   ~   ~  o


[Attachment: blast-twig.pl]

#!/usr/bin/perl -w

use strict;
use XML::Twig;
use HTML::Entities; # to get rid of "&lt; &gt;" HTML artifacts

use Data::Dumper;

my @queries;
my @hits;
my $querynum = -1; # which query number we're currently processing
my $hitnum = -1; # which hit number we're currently processing

# Note: by using only one handler for "hit", this means that for each
# hit, its entire subtree will need to be in memory at once.  Through
# use of more subhandlers and programming logic, this could be pared
# down even further (i.e. to the level of each hsp element).

my $twig = new XML::Twig( TwigHandlers => { 'BlastOutput_query-ID' => \&query_id,
                                            'BlastOutput_query-def' => \&query_def,
                                            'BlastOutput_query-len' => \&query_len,
                                            'Hit' => \&hit,
                                        },

                          CharHandler => \&decode_entities
                        );

if( $ARGV[0]) { $twig->parsefile( $ARGV[0]); }        # parse a file
else          { $twig->parse( \*STDIN);      }        # parse the standard input

print Dumper(\@queries, \@hits);
exit;

######## only Twig handlers below:

sub query_id {
    my ($t, $node) = @_;
    $querynum++; # note: we depend on the ID occurring before anything else!!
    $hitnum = -1;
    $hits[$querynum] = []; # initialize hit array

    $queries[$querynum]->{id} = $node->text();
    $t->purge();
}

sub query_def {
    my ($t, $node) = @_;
    $queries[$querynum]->{def} = $node->text();
    $t->purge();
}

sub query_len {
    my ($t, $node) = @_;
    $queries[$querynum]->{len} = $node->text();
    $t->purge();
}

sub hit {
    my ($t, $node) = @_;
    $hitnum++;

    # get the one-dimensional data:
    my %hit_fields = ( id => 'Hit_id',
                       def => 'Hit_def',
                       acc => 'Hit_accession',
                       len => 'Hit_len'
                       );
    while(my ($key, $name) = each %hit_fields) {
        $hits[$querynum]->[$hitnum]->{$key} = $node->first_child_text($name);
    }

    for my $hsp ( $node->first_child('Hit_hsps')->children('Hsp') ) {
        my %data;

        my %hsp_fields = ( e_value => 'Hsp_evalue',
                           align_len => 'Hsp_align-len',
                           identity => 'Hsp_identity',
                           query_from => 'Hsp_query-from',
                           query_to => 'Hsp_query-to',
                           query_frame => 'Hsp_query-frame',
                           hit_from => 'Hsp_hit-from',
                           hit_to => 'Hsp_hit-to',
                           hit_frame => 'Hsp_hit-frame'
                           );

        while(my($key, $name) = each %hsp_fields) {
            $data{$key} = $hsp->first_child_text($name);
        }
        push @{$hits[$querynum]->[$hitnum]->{hsps}}, \%data;
    }

    $t->purge();
}