[Bioperl-l] Parsing Blast-Report extracting "Features flanking .."
jobu
buschj at hhu.de
Fri Jan 25 10:20:22 UTC 2013
On 22.01.2013 19:03, Mgavi Brathwaite wrote:
> What upstream and downstream elements are you interested in?
I've got a huge pile of short RNA reads.
Part of the question now is whether those RNA fragments originate from
siRNA events or may represent miRNAs / parts of pre-miRNAs.
So I ran an online BLAST search against the nt database.
The resulting report quite often just gives subject information like this:
-----
> gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
Length=23459830
-----
Now I would like to get the hit's neighbouring regions for further
analysis.
Preferably I would like to do that in an automated way, but I guess the
only possible action with this kind of subject gi | description would be
to fetch the entire chromosomal sequence?
However, right below the subject lines shown above, the report states
more precisely:
------
Features flanking this part of subject sequence:
8872 bp at 5' side: cytochrome P450 90B1
402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K
------
Still, I would like to be able to fetch the subject's sequence(s)
automatically. As of now, I think parsing the report with SearchIO won't
let me acquire that information, because SearchIO does not recognize
report sections like those.
I hope I did not miss any of SearchIO's capabilities, but I could not
find any method covering what I want.
Right now, maybe the only way to get the information I want is to
write my own parser and save its output to a separate file, which I
could then read into a hash before processing the BLAST report
with SearchIO, combining both data sets for further automated work.
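
A minimal sketch of such a pre-parser, written in Python only to
illustrate the idea (the same logic ports directly to Perl). It assumes
the plain-text report layout quoted above: a subject line starting with
">", followed somewhere below by the "Features flanking ..." header and
its "N bp at 5'/3' side: ..." lines. The function name and the dict
layout are made up for this example; nothing here is part of SearchIO.

```python
import re
from collections import defaultdict

# Matches lines such as "8872 bp at 5' side: cytochrome P450 90B1"
FLANK_RE = re.compile(r"^\s*(\d+)\s+bp at ([53])' side:\s*(.+?)\s*$")
HEADER = "Features flanking this part of subject sequence:"


def parse_flanking_features(report_text):
    """Map each subject ID to a list of flanking-feature blocks.

    Hypothetical helper: assumes the report layout quoted in this post.
    """
    flanks = defaultdict(list)
    subject_id = None
    block = None  # the block currently being filled, if any
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            # Subject line, e.g. "> gb|CP002686.1| Arabidopsis thaliana ..."
            parts = stripped.lstrip("> ").split()
            subject_id = parts[0] if parts else None
            block = None
        elif stripped == HEADER and subject_id is not None:
            block = []
            flanks[subject_id].append(block)
        elif block is not None:
            match = FLANK_RE.match(line)
            if match:
                block.append({
                    "distance_bp": int(match.group(1)),
                    "side": match.group(2) + "'",
                    "feature": match.group(3),
                })
            elif stripped:
                block = None  # any other non-empty line ends the block
    return dict(flanks)
```

The returned dict plays the role of the hash mentioned above: keyed by
subject ID, it can be consulted while iterating over the hits that
SearchIO reports for the same file.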
I am aware, though, that even successfully getting the flanking features
would leave me with the more or less wide intergenic gap my HSP is
located in.
However, I need a way to get the flanking features, including
their annotation, and the region spanning between them.
I hope I do not have to fetch complete sequences to accomplish that,
as this would be overkill.
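
To make the size of that region concrete, here is a small sketch of the
coordinate arithmetic, in Python for illustration only. It rests on
assumptions not confirmed by the report itself: that the "bp at 5'/3'
side" distances are measured from the two ends of the aligned part of
the subject, that the 5' side is the lower coordinate (plus-strand
view; minus-strand hits may need the sides swapped), and that
coordinates are 1-based and inclusive. The function name and the HSP
coordinates in the usage note are made up.

```python
def intergenic_window(hsp_start, hsp_end, dist_5prime, dist_3prime,
                      subject_length):
    """Return (start, end) of the region between the two flanking features.

    Assumption: distances count from the ends of the aligned subject
    range, with the 5' side at the lower coordinate.
    """
    low, high = sorted((hsp_start, hsp_end))
    start = max(1, low - dist_5prime)              # clamp at sequence start
    end = min(subject_length, high + dist_3prime)  # clamp at sequence end
    return start, end
```

With the distances from the report above and a hypothetical HSP at
subject positions 100000..100021, intergenic_window(100000, 100021,
8872, 402, 23459830) gives (91128, 100423), a window of roughly 9.3 kb
rather than the whole 23.5 Mb chromosome.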
with kind regards
Jochen