[Bioperl-l] bp_genbank2gff3.pl

Scott Cain scott at scottcain.net
Sat Sep 18 13:48:35 UTC 2010


Hi Dave,

The blib directory is not part of the repository; it is created when
you execute ./Build as a staging area before installation.  The
script itself resides in scripts/Bio-DB-GFF/.
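
As a rough sketch of how to find the script in a fresh clone: the helper
name find_genbank2gff3 below is made up for illustration, and the wildcard
is there on the assumption that the file in the source tree may not carry
the installed bp_ name.

```shell
# Hypothetical helper: list every file under a directory (default: the
# current one) whose name mentions genbank2gff3.
find_genbank2gff3() {
    find "${1:-.}" -type f -name '*genbank2gff3*'
}

# Example, run from the top of the clone:
#   find_genbank2gff3 .
```

Running the file it reports with something like
`perl -I . path/to/script -y NC_009789.gbk` from the top of the checkout
should then pick up the checkout's modules rather than an installed copy,
assuming the Bio/ directory sits at the top level of the clone.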

Scott


On Sat, Sep 18, 2010 at 2:40 PM, David Breimann
<david.breimann at gmail.com> wrote:
> Now I did a fresh clone (instead of pull) into a new dir:
>
> $ git clone http://github.com/bioperl/bioperl-live.git
>
> but I can't find the script at all (there is no blib dir as before)...
>
>
> On Sat, Sep 18, 2010 at 3:14 PM, David Breimann <david.breimann at gmail.com> wrote:
>>
>> Yes, I'm using Ubuntu 10.04.
>>
>> That is really weird. I tried running the script from the bioperl-live dir
>> (which I just pulled using git), and I get the same results as before
>> (`Name` instead of `locus_tag`):
>>
>>  $ wget ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>>  $ /home/dave/src/bioperl-live/blib/script/bp_genbank2gff3.pl -y NC_009789.genbank
>>
>> Attached is the resulting GFF3.
>> I also attach a copy of bp_genbank2gff3.pl as found under
>> /home/dave/src/bioperl-live/blib/script.
>>
>> This is a real mystery for me!
>>
>> On Sat, Sep 18, 2010 at 2:54 PM, Scott Cain <scott at scottcain.net> wrote:
>>>
>>> Typically I do build and install, but you can run it directly from the
>>> git checkout directory.
>>>
>>> For locating other versions of the script, are you running Linux?  If
>>> so, are you familiar with the "locate" command?
>>>
>>>  locate bp_genbank2gff3.pl
>>>
>>> If you've never used it before, you may first need to update the
>>> database that locate uses, which has to be done as root:
>>>
>>>  sudo updatedb
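
The locate database can be stale, so as a complement here is a minimal
sketch that walks a PATH-style list and prints every executable copy of a
script, in the order the shell would find them. The function name
list_on_path is hypothetical and not part of bioperl.

```shell
# Hypothetical helper: print every executable file named $1 found in the
# colon-separated directory list $2 (default: $PATH), in search order.
list_on_path() {
    name=$1
    path_list=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $path_list; do
        # An empty PATH entry means the current directory.
        candidate="${dir:-.}/$name"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
    IFS=$old_ifs
}

# Example:
#   list_on_path bp_genbank2gff3.pl
```

If this prints more than one line, the first one is the copy that runs when
the script is called by its bare name.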
>>>
>>> Scott
>>>
>>>
>>> On Sat, Sep 18, 2010 at 1:46 PM, David Breimann
>>> <david.breimann at gmail.com> wrote:
>>> > Your GFF seems fine. I get a very similar one, but with `Name=` instead
>>> > of `locus_tag=`.
>>> >
>>> > I don't really know how to check for multiple bioperl installations.
>>> > I'm using my personal server, so I don't mind removing and installing
>>> > everything from scratch -- but I don't know how to do that.
>>> >
>>> > Also, what I don't get with git is how the scripts are supposed to be
>>> > updated (unless you build and install).
>>> >
>>> > Thank you!
>>> >
>>> > On Sat, Sep 18, 2010 at 2:38 PM, Scott Cain <scott at scottcain.net> wrote:
>>> >>
>>> >> Well, if you aren't getting the same results as me then I'd say you
>>> >> aren't using the same version of the script :-)
>>> >>
>>> >> Unfortunately, the scripts are no longer automatically marked with the
>>> >> "internal" version information when committed, so there really isn't
>>> >> anything in the script I can tell you to look for.  Check whether
>>> >> there is more than one bioperl installation on your computer.
>>> >>
>>> >> I've attached the GFF3 file I got so you can look at it and tell me if
>>> >> it is what you expect.
>>> >>
>>> >> Scott
>>> >>
>>> >>
>>> >>
>>> >> On Sat, Sep 18, 2010 at 12:26 PM, David Breimann
>>> >> <david.breimann at gmail.com> wrote:
>>> >> > Hi Scott,
>>> >> >
>>> >> > I just pulled the latest bioperl-live using git.
>>> >> > I'm not sure how the scripts are updated, so I built and installed
>>> >> > anyway (perhaps exporting the path is supposed to be enough?).
>>> >> > Anyway, I still get the same results: no locus_tag.
>>> >> > How can I tell if I'm using the latest version of the script?
>>> >> >
>>> >> > Thanks again.
>>> >> >
>>> >> > On Sat, Sep 18, 2010 at 1:07 PM, Scott Cain <scott at scottcain.net> wrote:
>>> >> >>
>>> >> >> Hi Dave,
>>> >> >>
>>> >> >> A fresh "pull" of the bioperl git repository shows that
>>> >> >> bp_genbank2gff3.pl already does this.  It writes a locus_tag
>>> >> >> attribute for every feature that has a locus_tag, and uses the
>>> >> >> locus_tag for the ID when it can (it can't blindly use the locus
>>> >> >> tag for the ID since both the gene and the CDS have the same tag).
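
One way to see which features actually carry the tag is to count them in
column 9 of the GFF3. This is only a sketch: count_locus_tags is a made-up
helper, and it assumes tab-separated GFF3 with attributes written as
key=value pairs separated by semicolons.

```shell
# Hypothetical helper: count feature lines in a GFF3 file whose attribute
# column (column 9) does or does not contain a locus_tag= entry.
count_locus_tags() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }                  # skip directives, comments, sequence lines
        $9 ~ /(^|;)locus_tag=/ { with++; next }  # attribute present
        { without++ }                            # feature line lacking the attribute
        END { printf "with=%d without=%d\n", with, without }
    ' "$1"
}

# Example (the file name is a placeholder):
#   count_locus_tags NC_009789.gff
```

A nonzero "without" count on a GenBank file where every gene has a locus
tag would point to the behaviour described in this thread.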
>>> >> >>
>>> >> >> Scott
>>> >> >>
>>> >> >>
>>> >> >> On Sat, Sep 18, 2010 at 11:20 AM, David Breimann
>>> >> >> <david.breimann at gmail.com> wrote:
>>> >> >> > Hi Scott,
>>> >> >> >
>>> >> >> > Here is a very short genbank:
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_E24377A/NC_009789.gbk
>>> >> >> >
>>> >> >> > Note that all genes in the GenBank file have locus tags. In the
>>> >> >> > resulting GFF3, however, only the last gene (EcE24377A_B0005) gets
>>> >> >> > a locus_tag. I have no idea why it deserves special treatment... :)
>>> >> >> >
>>> >> >> > P.S. Making this change (i.e., copying locus_tag to the last GFF3
>>> >> >> > column whenever available) will really make my life easier.
>>> >> >> >
>>> >> >> > Thank you,
>>> >> >> > Dave
>>> >> >> >
>>> >> >> > On Sat, Sep 18, 2010 at 12:08 PM, Scott Cain <scott at scottcain.net> wrote:
>>> >> >> >>
>>> >> >> >> Hi Dave,
>>> >> >> >>
>>> >> >> >> That seems perfectly reasonable.  If you could point out a GenBank
>>> >> >> >> entry for which that does not happen, I could try to figure out
>>> >> >> >> why not.
>>> >> >> >>
>>> >> >> >> Scott
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> On Sat, Sep 18, 2010 at 10:20 AM, David Breimann
>>> >> >> >> <david.breimann at gmail.com> wrote:
>>> >> >> >> > Since locus_tag is an essential tag in GenBank, I suggest that
>>> >> >> >> > locus_tag always be added to the last GFF column if it exists in
>>> >> >> >> > the GenBank file, whether or not it is used as the ID in the GFF.
>>> >> >> >> >
>>> >> >> >> > On Sat, Sep 18, 2010 at 11:17 AM, Scott Cain <scott at scottcain.net> wrote:
>>> >> >> >> >>
>>> >> >> >> >> Hi Dave,
>>> >> >> >> >>
>>> >> >> >> >> bp_genbank2gff3.pl suffers from the fact that it has to deal with
>>> >> >> >> >> GenBank files :-)  It was designed initially to work on whole
>>> >> >> >> >> genome refseqs, and contains several ad hoc rules for trying to
>>> >> >> >> >> make it "do the right thing."  In practice, it is not unusual for
>>> >> >> >> >> a post-processing step (either by hand or a quick Perl script) to
>>> >> >> >> >> be required to really get it right.  I don't recall the specifics
>>> >> >> >> >> (if I ever knew :-) for when and how the locus tag is used, but I
>>> >> >> >> >> do know that there is a list of things that it will try to use
>>> >> >> >> >> for the ID, and while the locus is on the list, I don't know
>>> >> >> >> >> where it comes in the list, so it's possible that other items
>>> >> >> >> >> might supersede it.
>>> >> >> >> >>
>>> >> >> >> >> Scott
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On Sat, Sep 18, 2010 at 10:05 AM, David Breimann
>>> >> >> >> >> <david.breimann at gmail.com> wrote:
>>> >> >> >> >> > Hello,
>>> >> >> >> >> >
>>> >> >> >> >> > I'm not sure how bp_genbank2gff3.pl works. Sometimes it adds a
>>> >> >> >> >> > `locus_tag` in the fields and sometimes it doesn't, even though
>>> >> >> >> >> > the GenBank file has a locus tag.
>>> >> >> >> >> > Also, is the ID always equivalent to the locus tag?
>>> >> >> >> >> >
>>> >> >> >> >> > Thanks,
>>> >> >> >> >> > Dave
>>> >> >> >> >> > _______________________________________________
>>> >> >> >> >> > Bioperl-l mailing list
>>> >> >> >> >> > Bioperl-l at lists.open-bio.org
>>> >> >> >> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> >> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> --
>>> >> >> >> >> ------------------------------------------------------------------------
>>> >> >> >> >> Scott Cain, Ph. D.                                   scott at scottcain dot net
>>> >> >> >> >> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>> >> >> >> >> Ontario Institute for Cancer Research
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>
>
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



