[Bioperl-l] Bio::Tools::Blast::HTML questions

Andrew Dalke dalke@acm.org
Fri, 15 Sep 2000 23:08:20 -0600


Zhao, David [PRI] <DZhao1@prius.jnj.com> said:
> It seems that nobody has had the same problem, or you guy think
> this is just not significant enough to be answered.

Actually, there could be several other reasons.  First, almost the
only HTML-formatted mail I get is junk/spam, so I have an almost
instinctual urge to delete such mail when I see it.  Since your
message in no way needed the extra abilities of HTML, you should have
used ASCII instead.

Second, it was hard to figure out what the problem was from your
description alone.  A more helpful report might have been:

] Hi there,
]   It seems the HTML module doesn't recognize the genbank format in
] the summary table.  The lines look like:
]
] dbj|AU027194.1|AU027194 Rattus norvegicus, OTSUKA clone, OT17.21...    52 4e-06
]
] When I remove the ".1" from "AU027194.1", leaving "AU027194", it works.
] Here's the full table:
] ...

It isn't much harder to write than your original email, but it is much
easier for someone else to understand.

It would also be helpful to know what "doesn't recognize" means.  Does
it stop with an error?  Is the line just ignored?  Is the rest of the
input file ignored?

The better a bug report is, the more likely it will be answered.  But
it takes effort and practice to learn how to write a good report.

Third, given that it's been over 24 hours, you could have messed around
with the code yourself.  A good bug report can almost guide you to where to
look in the code.

In this case, that's the code for parsing the summary table, most likely
related to parsing genbank lines.  From a quick perusal, the problem would
likely be in the section:

## REGEXPS FOR SUMMARY TABLE LINES AT TOP OF REPORT (a.k.a. 'descriptions')
## (table of sequence id, description, score, P/Expect value, n)
##
## Not using bold face to highlight the sequence id's since this can throw off
## off formatting of the line when the IDs are different lengths. This lead to
## the scores and P/Expect values not lining up properly.

    ### NCBI-specific markups for description lines:

  # GenBank/EMBL, DDBJ hits (GenBank Format):
  s@^ ?(gb|emb|dbj)\|($Word)(\|$Word)?($Descrip)($Int +)($Signif)(.*)$@$1:<a href="$DbUrl{'gb_n'}$2">$2$3</a>$4$5<A href="\#$2_A">$6</a>$7<a name="$2_H"></a>@o;

It wasn't very hard to find this code.

If you look at the definition of "Word" you'll see it is defined as
"[\w_.]", which covers the '.' in "AU027194.1", so this pattern *should*
match the data line you gave.
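
One quick way to test that hypothesis without running the whole module
is to try the identifier part of the pattern on the sample line.  What
follows is only a rough sketch in Python, not the BioPerl code: it
assumes "Word" expands to "[\w_.]+" (the "+" is my guess), it leaves out
$Descrip, $Int and $Signif because their definitions are not shown here,
and the names WORD, ID_PREFIX and parse_id_prefix are made up for the
example.

```python
import re

# Assumption: the module's $Word is the class "[\w_.]" with a "+" quantifier.
WORD = r"[\w_.]+"

# Only the leading "db|accession|locus" part of the summary-line pattern.
# $Descrip, $Int and $Signif are omitted because their definitions are unknown.
ID_PREFIX = re.compile(r"^ ?(gb|emb|dbj)\|(" + WORD + r")(\|" + WORD + r")?")


def parse_id_prefix(line):
    """Return (db, accession, optional '|locus') or None if the prefix fails."""
    match = ID_PREFIX.match(line)
    return match.groups() if match else None


SAMPLE = ("dbj|AU027194.1|AU027194 Rattus norvegicus, "
          "OTSUKA clone, OT17.21...    52 4e-06")

if __name__ == "__main__":
    print(parse_id_prefix(SAMPLE))
```

If the versioned accession comes back whole in the second group, the
identifier part of the pattern is not what is rejecting the line.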

So in a followup email you could describe your hypothesis of the problem
and what you've done to track it down.


Fourth, given that the current code looks correct, you could check the
CVS logs, available even to anonymous external users, and see:

revision 1.3.2.1
date: 2000/05/18 20:53:31;  author: sac;  state: Exp;  lines: +6 -4
- The $Word and $Acc strings now include '.' to accomodate accessions with
  version number. Word also allows '_'  to work with ref seq accessions.
- Silencing warnings during _markup_report.

Checking bioperl-0.6.1.tar.gz (with the file datestamp on the ftp site of
May 19, 2000, so the day after Steve's fix) you'll see that the code
contains the fix mentioned in the CVS log.
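
To see why the change in that log entry matters, compare a word class
with and without the '.'.  This is a hedged Python sketch: the pre-fix
definition is not quoted anywhere in this mail, so OLD_WORD below is an
assumption based only on the log message, and covers() is a helper
invented for the illustration.

```python
import re

OLD_WORD = r"\w+"       # assumed pre-fix definition: no '.' allowed
NEW_WORD = r"[\w_.]+"   # post-fix class, per the CVS log ('.' now included)


def covers(word_pattern, accession):
    """True if the whole accession is matched by the given word pattern."""
    return re.fullmatch(word_pattern, accession) is not None


if __name__ == "__main__":
    for accession in ("AU027194", "AU027194.1"):
        print(accession, covers(OLD_WORD, accession), covers(NEW_WORD, accession))
```

Under that assumption the older class stops at the '.', which is
consistent with the line working once the ".1" is removed.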

So the answer to your statement:
> It seems that nobody has had the same problem, or you guy think
> this is just not significant enough to be answered.

is that it was seen, corrected, and distributed almost 4 months ago,
so nobody with a current release has the problem.  You need to update
your distribution.
Also, I'll bet that only 2 or 3 people ever saw the bug before it was
fixed, so most of the people on the list really have not ever seen
the problem and could not answer your email without spending non-trivial
time digging through the back logs.

This also means that if you are submitting a bug report, you do need
to include the version number in which you found the problem.

                    Andrew Dalke
                    dalke@acm.org