[Bioperl-l] How to draw a plasmid map from a genbank-formatted file?

Mon Jun 25 16:48:30 UTC 2007

Martin,

Keep bioperl-related discussion on the bioperl mailing list.  The vast
majority of this isn't biopython-related, but maybe some of the devs
there can add to it?

On Jun 25, 2007, at 11:05 AM, Martin MOKREJŠ wrote:

...

> Would you please tell me exactly what is wrong with the spacing?

Here's a section of the seq record attached to your previous email:

DEFINITION .
ACCESSION .
VERSION .
SOURCE .
   ORGANISM .

Normally there is a fixed column width for any data present in a  
field, so it would look more like this:

DEFINITION  PYR4 (DIHYDROOROTASE, PYRIMIDIN 4, dihydroorotase); dihydroorotase
            [Arabidopsis thaliana].
ACCESSION   NP_194024
VERSION     NP_194024.1  GI:15235865
DBSOURCE    REFSEQ: accession NM_118422.3
KEYWORDS    .
SOURCE      Arabidopsis thaliana (thale cress)
  ORGANISM  Arabidopsis thaliana
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; eurosids II; Brassicales; Brassicaceae; Arabidopsis.

Here's the relevant bit in the latest release notes:

"The second part of each sequence entry record contains the information
appropriate to its keyword, in positions 13 to 80 for keywords and
positions 11 to 80 for the sequence."
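
As a rough illustration of that column rule, here is a small Python
sketch that reports header lines whose data does not start in position
13.  The function names are made up for this example and are not part
of bioperl or biopython; it only looks at the keyword section of a
record, not at the feature table or the sequence.

```python
import re

DATA_COLUMN = 13  # first data column for header keywords, per the release notes


def data_start_column(line):
    """Return the 1-based column where the data begins, or None if there is none."""
    line = line.rstrip()
    if not line:
        return None
    if line[:DATA_COLUMN - 1].strip() == "":
        # Continuation line: the keyword area is blank, data follows the indent.
        return len(line) - len(line.lstrip()) + 1
    # Keyword line: optional indent, the keyword, then the gap before the data.
    match = re.match(r"\s*\S+\s+", line)
    return match.end() + 1 if match else None


def misaligned_lines(header_lines):
    """Return (line_number, column) pairs for lines whose data is not in column 13."""
    problems = []
    for number, line in enumerate(header_lines, start=1):
        column = data_start_column(line)
        if column is not None and column != DATA_COLUMN:
            problems.append((number, column))
    return problems


record = ["DEFINITION .", "ACCESSION .", "ACCESSION   NP_194024"]
print(misaligned_lines(record))  # [(1, 12), (2, 11)]
```

The first two lines are in the style of the ApE output above and get
flagged; the third follows the standard layout and passes.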

The bioperl devs try to make our parsers as flexible as possible, but
others may not, so this is something that should probably be fixed in
ApE.  And as mentioned to you several times in the past, on the mailing
list and on bugzilla, don't expect sequence records which deviate from
the standard (in this case, the release notes) to parse correctly in
all cases.  We can try to support some records that deviate from that
standard, but only up to a point.  If doing so causes additional bugs
or headaches, or degrades performance, it won't be supported.
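
To show why the spacing matters to a parser, here is a minimal Python
sketch contrasting a strict fixed-column reader with a
whitespace-tolerant one.  Both functions are hypothetical and written
only for this example; neither is how bioperl or biopython actually
parses GenBank records.

```python
def parse_strict(line):
    """Fixed-column parse: keyword area is columns 1-12, data starts at column 13."""
    return line[:12].strip(), line[12:].strip()


def parse_lenient(line):
    """Whitespace-tolerant parse: first token is the keyword, the rest is data."""
    parts = line.strip().split(None, 1)
    keyword = parts[0] if parts else ""
    value = parts[1] if len(parts) > 1 else ""
    return keyword, value


for line in ("ACCESSION   NP_194024", "ACCESSION ."):
    print(parse_strict(line), parse_lenient(line))
```

On the standard line both readers agree.  On "ACCESSION ." the strict
reader takes the whole line as the keyword and finds no data, while the
lenient one recovers the keyword.  The lenient reader has its own
limit, though: it would misread a continuation line by taking its first
word as a keyword, which is the kind of trade-off meant by "only up to
a point".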

> ...
> Well, I just copy&pasted the script from the bioperl webpages, I think
> from a tutorial or FAQ, don't remember anymore.

Well, we can't help you if you can't point out where the code came
from.  We would like to know so that it can be corrected.

> ...
> Well, my search for such tools available on Unix to be used in a script,
> non-interactively, completely failed. My last hope except getting improved
> ApE is to use the GenomeDiagram under biopython, but so far my .gb files
> cannot be parsed yet. :(
> Martin

As mentioned previously, you will likely have to code it yourself
(perl or python) or help debug the relevant biopython code to get it
working.  We can't/won't do this for you unless/until it's something
we feel warrants implementation.  Judging by the bug list, we also
have neither the time nor the inclination to code it.  Sorry, but we
have other priorities besides doing your work for you.

chris