[Biopython-dev] NCBIStandalone Blast HSP parsing

Mark Hoebeke Mark.Hoebeke at jouy.inra.fr
Mon Oct 17 10:07:13 EDT 2005


Hi all,

I wanted a quick and easy way to determine the endpoints of HSPs extracted from
Blast reports parsed with NCBIStandalone. Unfortunately, the HSP class lacks
query_end and sbjct_end attributes. Googling around led me to a recipe
that computes the endpoints from the total length, the gap length and
other niceties. Not exactly intuitive to me.
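
For illustration, the arithmetic behind such a recipe can be sketched as
follows. This is only a rough sketch: hsp_end is a hypothetical helper (not
part of Biopython), and it assumes 1-based inclusive coordinates on the plus
strand of an untranslated search, with "-" as the gap character.

```python
def hsp_end(start, aligned_seq, gap_char="-"):
    """Return the 1-based inclusive end coordinate of an aligned sequence.

    Assumes plus-strand, untranslated coordinates: every non-gap character
    in the aligned string consumes exactly one position of the sequence.
    """
    # Gap characters pad the alignment but do not consume sequence positions.
    residues = len(aligned_seq) - aligned_seq.count(gap_char)
    return start + residues - 1


# Example: 9 alignment columns, 1 gap, starting at position 1.
print(hsp_end(1, "ACGT-ACGT"))
```

Translated searches and minus-strand hits would need extra handling, which
is part of why reading the end coordinate straight from the report is
attractive.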

Hence I dove into the NCBIStandalone and HSP modules and made some slight
modifications. Basically I added the two attributes to HSP and the following
snippets to NCBIStandalone (release 1.4b). In the diff, lines marked < are my
modified version and lines marked > are the original:

972c972
<     _query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")
---
>     _query_re = re.compile(r"Query: (\d+)\s*(.+) \d")
977,978c977
<         start, seq, end = m.groups()
<       self._hsp.query_end=string.atoi(end);
---
>         start, seq = m.groups()
997,998c996,997
<         start, seq, end = _re_search(
<             r"Sbjct: (\d+)\s*(.+) (\d+)", line,
---
>         start, seq = _re_search(
>             r"Sbjct: (\d+)\s*(.+) \d", line,
1014c1013
<       self._hsp.sbjct_end=string.atoi(end)
---
>
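
As a standalone illustration of what the modified pattern captures, here is
a minimal sketch. The regular expression is the one from the patch; the
function parse_query_line and the sample alignment line are made up for this
example, and int() stands in for string.atoi, which no longer exists in
Python 3.

```python
import re

# Pattern from the patch: the trailing number is now captured as a group.
_query_re = re.compile(r"Query: (\d+)\s*(.+) (\d+)")


def parse_query_line(line):
    """Return (start, aligned_seq, end) for one 'Query:' alignment line."""
    m = _query_re.search(line)
    if m is None:
        raise ValueError("not a Query alignment line: %r" % line)
    start, seq, end = m.groups()
    return int(start), seq, int(end)


# Hypothetical alignment row in the classic text report layout.
print(parse_query_line("Query: 1   MKVLAAGIVGLLLA-QPS 17"))
```

Because the attribute is assigned for every alignment row, an HSP that spans
several rows would presumably end up with the end coordinate of its last
row, which is the value wanted here.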

Looks too easy to be true, I thought. Sorry if I'm missing some important
issue here (I'm quite new to Biopython), but is there a reason no one has made
this patch yet?

Thanks for any comments (flames and otherwise).

Cheers,

Mark


--
----------------------------Mark.Hoebeke at jouy.inra.fr-----------------------
Unité Statistique & Génome    _/_/_/    _/_/_/  http://stat.genopole.cnrs.fr
Tél : +33 (0)1 60 87 38 03  _/        _/          Fax : +33 (0)1 60 87 38 09
Tour Evry 2,                 _/_/    _/  _/_/         523, pl. des Terrasses
F-91000,                        _/  _/    _/                            Evry
PGP : A2AD52E3           _/_/_/      _/_/_/





