[Bioperl-l] recovering blast query_name

Wiepert, Mathieu Wiepert.Mathieu@mayo.edu
Wed, 20 Nov 2002 15:51:24 -0600


Hi,

I made a few assumptions in my previous answer, sorry. You need bioperl-live to get that to work; I don't think it is in the 1.0.2 distro.

Additionally, I only tested with fasta files. I assume that anything else will still work, as long as the sequence has a description.  The query name is built up like this:

	$header{'QUERY'} = ">".(defined $seq->display_id() ? $seq->display_id() : "").
		" ".(defined $seq->desc() ? $seq->desc() : "")."\n".$seq->seq();

So, as far as I can tell, the sequences have to have a display id and a description to get a full query name.
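
To make the effect of the two defined() checks concrete, here is a minimal stand-alone sketch. Note that build_query_header is a hypothetical helper, not part of bioperl; it just mirrors the expression above on plain scalars instead of a Bio::Seq object, so you can see what the header looks like when the description is missing.

```perl
use strict;
use warnings;

# Hypothetical helper (not a bioperl function): mirrors the expression
# above, but takes plain scalars instead of a Bio::Seq object.
sub build_query_header {
    my ($id, $desc, $residues) = @_;
    return ">" . (defined $id   ? $id   : "")
         . " " . (defined $desc ? $desc : "")
         . "\n" . $residues;
}

# Both id and description defined: the header carries the full name.
print build_query_header("U20499_EXON_1A", "2848-2960 of U20499", "acactgg"), "\n";

# Description undefined: the header is just the id followed by a space.
print build_query_header("U20499_EXON_1A", undef, "acactgg"), "\n";
```

If both values are undefined, the header line is reduced to ">" plus a single space, which would leave nothing to report as a query name.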


My previous example was only slightly off: I left out the description.

>U20499_EXON_1A 2848-2960 of U20499
acactggaccttcaaaaccctcagggcagagagcagccctacactccctacaccacaccc
atactcagcccctgcaggcaaggagagaacaggtcaggttcccgagagctcag

results in query name of
U20499_EXON_1A 2848-2960 of U20499

parsed from the header of this blast result (saved from the remote blast)

BLASTN 2.2.4 [Aug-26-2002]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
RID: 1033569396-029169-20578
Query= U20499_EXON_1A 2848-2960 of U20499
         (113 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
or phase 0, 1 or 2 HTGS sequences) 
           1,406,693 sequences; 6,799,009,920 total letters

Check the actual blast results and make sure the query name is in them (the "Query=" line, as in the header above). If it isn't, then we have a problem...
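
One quick way to do that check without going through the parser is to scan the saved report text for "Query=" lines. This is only a rough sketch: query_lines is a hypothetical helper, the report text is a shortened copy of the header above, and it only looks at the first line of each query (a long description may wrap onto following lines in the report).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text following each non-empty
# "Query=" line in a saved BLAST text report.
sub query_lines {
    my ($report_text) = @_;
    my @queries;
    for my $line (split /\n/, $report_text) {
        # Require at least one non-space character after "Query=".
        if ($line =~ /^Query=\s*(\S.*?)\s*$/) {
            push @queries, $1;
        }
    }
    return @queries;
}

my $report = <<'END';
RID: 1033569396-029169-20578
Query= U20499_EXON_1A 2848-2960 of U20499
         (113 letters)
END

my @found = query_lines($report);
print @found ? "query name present: $found[0]\n"
             : "no query name in report\n";
```

If this finds nothing in your saved reports, the name was lost before the report was written, and the parser has nothing to recover.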

Here is the more current documentation:
http://doc.bioperl.org/bioperl-live/Bio/Tools/Run/RemoteBlast.html

-Mat


> -----Original Message-----
> From: Lewis Lukens [mailto:llukens@uoguelph.ca]
> Sent: Wednesday, November 20, 2002 2:49 PM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] recovering blast query_name
> 
> 
> Hello,
> 
> Sorry for a basic question... I have been trying to use the 
> Bio::Tools::Run::RemoteBlast module to blast a single file with many 
> fasta formatted sequences against ncbi nt and parse the blast reports. 
> Almost everything is working well.  I get all the hit and hsp 
> features for all the hits.  I can recover the query sequence, but I 
> can't seem to recover the query sequence names.  How does one do this?
> 
> I used almost the exact code as in the Remoteblast Synopsis
> http://doc.bioperl.org/releases/bioperl-1.0.2/Bio/Tools/Run/RemoteBlast.html
> 
> in this code, this expression works:
> print "db is ", $result->database_name(), "\n";
> 
> but, these expressions return empty fields:
>      my $name = $result->query_name();
>      my $desc = $result->query_description();
>      my $acc= $result->query_accession();
> 
> I have been using SearchIO to parse blast output files and never had 
> this problem before.  Any ideas?
> 
> Thanks much,
> Lewis
> -- 
> Lewis Lukens
> Assistant Professor
> Department of Plant Agriculture
> Univ. of Guelph, Guelph, Ontario. N1G 2W1
> 
> Tel: (519) 824- 4120 ext 2304
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>