[Biopython-dev] GenBank parser -- first go

Fri Dec 8 00:21:12 EST 2000

> ----- Original Message -----
> From: "Brad Chapman" <chapmanb at arches.uga.edu>
> To: "Cayte" <katel at worldpath.net>
> Cc: <biopython-dev at biopython.org>
> Sent: Tuesday, December 05, 2000 11:31 PM
> Subject: Re: [Biopython-dev] GenBank parser -- first go
>
>
> >
> > You should be able to get the text GenBank version of any record
> > without having to do a "save as text" on an html page. On the NCBI
> > page, there is a Text button at the top of a list of records that
> > will give you the flat-file text version of a record you searched
> > for using Entrez. You can then save this as text, and it'll be
> > consistent between browsers.
> >
>
   This should be fine for the first go.  For a later go, I think we
should strip the xml/html ourselves.  If there are multiple ways of manually
converting to text, you can just about guarantee that all of them will be
used sooner or later.  As much as possible, manual editing should be
replaced with automated conversion.

  There are some difficulties with conversion to text, because html/xml
isn't tied to the newline mechanism.  It can position lines any way it
likes, in any kind of font.  GenBank may be one step away from a flat file,
but that's not true of all databases.  Rebase and Gobase are examples.
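
  As a rough illustration of what automated stripping might look like, here
is a minimal sketch using Python's standard html.parser module.  The names
TextExtractor, strip_markup and BREAK_TAGS are hypothetical, and the set of
tags treated as line breaks is only an assumption; a real page may lay out
its lines with other tags entirely, which is exactly the difficulty above.

```python
from html.parser import HTMLParser

# Tags assumed to imply a line break in the rendered page.
# This is an illustrative guess, not a complete list.
BREAK_TAGS = {"br", "p", "div", "tr", "li", "pre"}


class TextExtractor(HTMLParser):
    """Collect character data and turn break-like tags into newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Entities such as &amp; are already decoded at this point.
        self.parts.append(data)


def strip_markup(html):
    """Return the text of an html string, one record line per text line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Drop trailing whitespace and empty lines left behind by the tags.
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line.strip())


page = "<html><body><pre>LOCUS       AB000001<br>DEFINITION  test &amp; more</pre></body></html>"
flat = strip_markup(page)
```

  This only works when the page keeps each record line on its own line or
separates lines with one of the assumed tags.  It also throws away blank
lines, which would need revisiting for formats where they are meaningful.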

                   Cayte