[Bioperl-l] Sequence names and descriptions in Bio::SearchIO
Jason Stajich
jason@cgt.mc.duke.edu
Wed, 15 May 2002 13:05:44 -0400 (EDT)
These are separated into two fields, name and description.
The query and hit fields are inconsistent because of the way NCBI
structured their XML BLAST output: the query name holds the whole
definition line, while each hit gets a separate accession and description
line. I should have implemented query_description by separating it from
the name, and I will try to do that before the bugfix release.
For hits, I made a best guess: the first token before whitespace is
returned as the hit name, and the rest of the line is pushed into the
description field.
So:
while( my $result = $in->next_result ) {
    # query info
    my ($queryname, $querylen) = ($result->query_name, $result->query_length);
    # this is the change I hope to implement, to push the
    # description into the right place; the limit of 2 keeps the
    # whole remainder of the line in $qdesc
    my ($qname, $qdesc) = split(/\s+/, $queryname, 2);
    while( my $hit = $result->next_hit ) {
        my ($name, $desc) = ($hit->name, $hit->description);
        ...
    }
}
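The name/description split described above can be tried on its own, without
a BLAST report. This is only a rough sketch: split_defline is a hypothetical
helper, not part of Bio::SearchIO, and the definition line in the example is
made up. It assumes the name is simply the first whitespace-delimited token.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::SearchIO): split a definition line
# into a name (first whitespace-delimited token) and a description (the rest).
sub split_defline {
    my ($defline) = @_;
    return ('', '') unless defined $defline;

    # Trim leading and trailing whitespace so split does not
    # produce an empty leading field.
    $defline =~ s/^\s+|\s+$//g;

    # A limit of 2 keeps everything after the first token together.
    my ($name, $desc) = split /\s+/, $defline, 2;
    $name = '' unless defined $name;
    $desc = '' unless defined $desc;
    return ($name, $desc);
}

# Made-up definition line, for illustration only.
my ($name, $desc) = split_defline('seq1 hypothetical protein [Homo sapiens]');
print "name: $name\n";    # prints "name: seq1"
print "desc: $desc\n";    # prints "desc: hypothetical protein [Homo sapiens]"
```

Without the limit argument, split would break the line into every word and
only the second word would end up in the description.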
On Wed, 15 May 2002, Andy Nunberg wrote:
> Hi,
> I am trying to determine how names and descriptions for sequences are
> implemented in SearchIO blast objects. I seem to be getting the entire
> definition line of the query and truncation of the hit
> description, where the description of the hit is more than one line in
> the raw blast report.
>
> accessing the query name from the result object or hsp object gives the
> same result
>
> Andy
> *******************************************************************
> Andy Nunberg, Ph.D
> Computational Biologist
> Orion Genomics, LLC
> (314) 615-6989
> http://www.oriongenomics.com
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu