[Bioperl-l] Bioperl-l Digest, Vol 117, Issue 13
Dan kilburn
dr_kilburn59 at yahoo.com
Wed Jan 30 21:40:26 UTC 2013
Hi Jason,
Are there any plans to keep SearchIO up to date with NCBI BLAST? I know they change formats ridiculously often, but I had to write my own parser to get sequence identity, which I would rather not have done. I realize this job would be a heavy load for whoever takes it on, but it's so fundamental. Maybe I can help.
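For illustration, here is a minimal sketch of that kind of identity parser in Python. It assumes the plain-text report prints HSP statistics lines of the form "Identities = 20/21 (95%), Gaps = 0/21 (0%)". The function name is hypothetical, and this is not how SearchIO does it.

```python
import re

# Matches the "Identities = 20/21 (95%)" part of an HSP statistics line.
IDENTITY_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_identities(report_text):
    """Return one (matches, alignment_length, fraction) tuple per HSP found."""
    results = []
    for m in IDENTITY_RE.finditer(report_text):
        matches, length = int(m.group(1)), int(m.group(2))
        # Recompute the fraction from the counts; the printed percentage is rounded.
        fraction = matches / length if length else 0.0
        results.append((matches, length, fraction))
    return results
```

A regex like this breaks whenever the report layout changes, which is exactly the maintenance problem.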
--Dan
On Jan 30, 2013, at 12:00 PM, bioperl-l-request at lists.open-bio.org wrote:
> Today's Topics:
>
> 1. Re: Parsing Blast-Report extracting "Features flanking .."
> (Jason Stajich)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 29 Jan 2013 11:00:16 -0800
> From: Jason Stajich <jason.stajich at gmail.com>
> Subject: Re: [Bioperl-l] Parsing Blast-Report extracting "Features
> flanking .."
> To: buschj at hhu.de
> Cc: bioperl-l at lists.open-bio.org
> Message-ID: <6E83E3F3-C304-4DC4-9A11-FE1CA90F207D at gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> We don't parse the NCBI feature info from the BLAST reports, as your query asks. To look up a specific feature, you can use Bio::DB::GenBank to query for the sequence of that feature by accession number - see the HOWTOs for that.
>
> However, most people use tools that generate SAM/BAM files with short reads - then you can use a tool like bedtools to find overlaps of reads with the locations of features.
>
> Basically:
> - download the genome and GFF for Arabidopsis
> - align your sRNA reads to the genome with a short read aligner - bowtie, bwa, others
> - convert your SAM file to BAM with SAMtools or Picard
> - compare the locations of features with the reads to get expression summaries or individual reads with BEDTools
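The last step above is an interval-overlap query. A minimal sketch of the idea in Python follows. It assumes reads and features have already been reduced to (chrom, start, end, name) tuples with 0-based, half-open coordinates as in BED. The function and variable names are hypothetical, and BEDTools does this far more efficiently on real data.

```python
from collections import defaultdict


def overlapping_reads(features, reads):
    """Map each feature name to the names of the reads that overlap it.

    Both inputs are iterables of (chrom, start, end, name) tuples using
    0-based, half-open coordinates. Brute force within each chromosome.
    """
    reads_by_chrom = defaultdict(list)
    for chrom, start, end, name in reads:
        reads_by_chrom[chrom].append((start, end, name))

    hits = {}
    for chrom, f_start, f_end, f_name in features:
        hits[f_name] = [
            r_name
            for r_start, r_end, r_name in reads_by_chrom.get(chrom, [])
            # Half-open intervals overlap when each starts before the other ends.
            if r_start < f_end and f_start < r_end
        ]
    return hits
```

Counting the reads per feature gives a crude expression summary; the read names give the individual reads.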
>
>
> On Jan 25, 2013, at 2:20 AM, jobu <buschj at hhu.de> wrote:
>
>> On 22.01.2013 19:03, Mgavi Brathwaite wrote:
>>> What upstream and downstream elements are you interested in?
>>
>>
>> I've got a huge pile of short RNA reads.
>> Part of the question now is whether those RNA fragments originate from
>> siRNA events,
>> or may represent miRNAs / parts of pre-miRNAs.
>>
>> So I did an online BLAST search against the nt database.
>> The resulting report quite often just gives subject information like this:
>>
>> -----
>> >gb|CP002686.1| Arabidopsis thaliana chromosome 3, complete sequence
>> Length=23459830
>> -----
>>
>> Now I would like to get the hit's neighbouring regions for further
>> analysis.
>> Preferably I would like to do that in an automated way, but I guess the
>> only possible action with this kind of subject gi | description would be
>> to fetch the entire chromosomal sequence?
>>
>> However,
>> right below the line above, the report states more precisely:
>>
>> ------
>> Features flanking this part of subject sequence:
>> 8872 bp at 5' side: cytochrome P450 90B1
>> 402 bp at 3' side: U1 small nuclear ribonucleoprotein-70K
>> ------
>>
>> Still, I would like to be able to fetch the subject's sequence(s)
>> automatically.
>> As of now, I think parsing the report with SearchIO won't let me acquire
>> that information, because SearchIO does not recognize report sections
>> like those.
>>
>> I hope I did not miss any of SearchIO's capabilities, but I could not
>> find any method covering my wish?!
>>
>> Right now, maybe the only way to get the information I want is to
>> construct my own parser and write its output to a separate file, which
>> I could then read into a hash before processing the BLAST report
>> with SearchIO, combining both data sets for further automated work.
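A separate parser of that kind can be quite small. Here is a minimal sketch in Python. It assumes the text report lists each subject on a line starting with ">" and prints the flanking lines directly under the "Features flanking this part of subject sequence:" heading, as in the excerpt above. The function name and return shape are hypothetical.

```python
import re

# e.g. "   8872 bp at 5' side: cytochrome P450 90B1"
FLANK_RE = re.compile(r"^\s*(\d+) bp at ([53])' side:\s*(.+?)\s*$")


def parse_flanking_features(report_lines):
    """Return {subject_id: [(distance_bp, side, description), ...]}."""
    flanks = {}
    subject = None
    in_block = False
    for line in report_lines:
        if line.startswith(">"):
            # New subject: keep the identifier before the first whitespace.
            parts = line[1:].split()
            subject = parts[0] if parts else None
            in_block = False
        elif line.strip().startswith("Features flanking this part"):
            in_block = True
        elif in_block:
            m = FLANK_RE.match(line)
            if m and subject is not None:
                entry = (int(m.group(1)), m.group(2) + "'", m.group(3))
                flanks.setdefault(subject, []).append(entry)
            else:
                # The first non-matching line ends the flanking block.
                in_block = False
    return flanks
```

The resulting dictionary plays the role of the hash described above and can be looked up by subject identifier while walking the hits. If a subject has several flanking blocks, their entries all end up in the same list.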
>>
>> I am aware, though, that even successfully getting the flanking features
>> would leave me with the more or less wide intergenic gap my HSP is
>> located in.
>>
>> However, I need a way to get the flanking features, including
>> their annotation, and the region spanning between them.
>> But I hope I do not have to fetch complete sequences to accomplish that,
>> as this would be overkill.
>>
>> with kind regards
>> Jochen
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
>
>
>
> ------------------------------
>
> End of Bioperl-l Digest, Vol 117, Issue 13
> ******************************************