[Bioperl-l] Thoughts on Bio::Tools::Glimmer

Andrew Stewart stewarta at nmrc.navy.mil
Wed Apr 11 18:40:18 UTC 2007

First of all, mucho kudos to those who revamped this module.  It
works really nicely.  I have a couple of thoughts:

* The .predict file from Glimmer provides frame and score information  
which could be parsed and included in the generated feature prediction
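
As a rough illustration of this first point, the extra columns can be
captured with two more groups.  This is a standalone sketch, not the
module's code: the sub name parse_predict_line and the hash keys are
made up for the example, and the column layout is assumed from the
sample Glimmer 3 lines quoted further down.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::Glimmer): split one
# Glimmer 3 .predict line into its fields, keeping frame and score.
sub parse_predict_line {
    my ($line) = @_;
    return unless $line =~ /^(\S+)\s+          # orf ID, e.g. orf00010
                             (\d+)\s+(\d+)\s+  # start, end
                             ([+-])([1-3])\s+  # strand, frame
                             (-?\d+(?:\.\d+)?) # raw score
                           /x;
    return {
        orf_id => $1, start => $2, end => $3,
        strand => $4, frame => $5, score => $6,
    };
}

my $f = parse_predict_line('orf00010     7530     8081  +3     2.58');
printf "%s frame=%s%s score=%s\n", @{$f}{qw(orf_id strand frame score)};
```

Lines that do not look like a prediction (for example the ">" header
line) simply return nothing, so a caller could skip them.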

* It'd be nice to include the orfID somewhere on the feature  
prediction..  maybe the seqID ? (these could be post-processed into  
locus_tags for those using Glimmer as a preliminary annotation tool)

* Options to set the source and primary tags to something other than
the defaults (i.e. Glimmer_3.X and 'transcript').  This could always be
done after Bio::Tools::Glimmer has run, though, of course.

* This section..

         elsif (
                # Glimmer 2.X prediction
                (/^\s+(\d+)\s+      # gene num
                 (\d+)\s+(\d+)\s+   # start, end
                 \[([\+\-])\d{1}\s+ # strand
                 /ox ) ||
                # Glimmer 3.X prediction
                (/\w+(\d+)\s+       # orf (numeric portion)
                 (\d+)\s+(\d+)\s+   # start, end
                 ([\+\-])\d{1}\s+   # strand
                /ox)) {
	    my ($genenum,$start,$end,$strand) =
		( $1,$2,$3,$4 );

...isn't picking up more than the last digit in the orf-number.  Not
sure if that's intentional.  A sample of the feature output using
->gff_string shows up as ...

test-pseudocontig       Glimmer_3.X     transcript      1018    8       .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      1134    1736    .       +       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      1832    2596    .       +       .       Group GenePrediction_4
test-pseudocontig       Glimmer_3.X     transcript      2710    3225    .       +       .       Group GenePrediction_5
test-pseudocontig       Glimmer_3.X     transcript      3246    4016    .       +       .       Group GenePrediction_6
test-pseudocontig       Glimmer_3.X     transcript      4177    5064    .       +       .       Group GenePrediction_7
test-pseudocontig       Glimmer_3.X     transcript      5083    5673    .       +       .       Group GenePrediction_8
test-pseudocontig       Glimmer_3.X     transcript      6001    7275    .       +       .       Group GenePrediction_9
test-pseudocontig       Glimmer_3.X     transcript      7530    8081    .       +       .       Group GenePrediction_0
test-pseudocontig       Glimmer_3.X     transcript      8785    8117    .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      9423    8788    .       -       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      10088   9549    .       -       .       Group GenePrediction_3

...which was parsed originally from...

orf00001     1018        8  -2     2.95
orf00002     1134     1736  +3     2.91
orf00004     1832     2596  +2     2.93
orf00005     2710     3225  +1     2.90
orf00006     3246     4016  +3     2.93
orf00007     4177     5064  +1     2.94
orf00008     5083     5673  +1     2.91
orf00009     6001     7275  +1     2.96
orf00010     7530     8081  +3     2.58
orf00011     8785     8117  -2     2.92
orf00012     9423     8788  -1     2.81
orf00013    10088     9549  -3     2.90
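
The single digit follows from how the 3.X pattern is written: \w also
matches digits, so the greedy \w+ consumes "orf0001" and backs off just
far enough to leave one digit for (\d+).  A small standalone check,
outside the module, shows this on one of the lines above.  The "fixed"
pattern is only one possible change (restricting the prefix to letters
and underscores) and is an assumption here, not something taken from
the BioPerl source.

```perl
use strict;
use warnings;

my $line = 'orf00010     7530     8081  +3     2.58';

# Pattern as quoted above: \w+ is greedy and \w includes digits,
# so it gives back only one digit for (\d+) to capture.
my ($current) = $line =~ /\w+(\d+)\s+(\d+)\s+(\d+)\s+([\+\-])\d{1}\s+/x;

# One possible fix (an assumption, not the module's code): allow only
# letters or underscores before the number, anchored at line start.
my ($fixed) = $line =~ /^[A-Za-z_]+(\d+)\s+(\d+)\s+(\d+)\s+([+-])\d\s+/x;

print "current: $current\n";            # 0
print "fixed:   $fixed\n";              # 00010
print "numeric: ", $fixed + 0, "\n";    # 10
```

Whether the leading zeros should be kept (00010) or dropped (10) in the
Group tag is a separate choice; adding 0 to the capture gives the
plain number.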

* It'd also be nice if you could somehow set the string that is  
placed in front of the orf-number in the line...

                  '-tag'         => { 'Group' => "GenePrediction_ 

...seeing as how these tag/values can't seem to be changed manually  
anymore without getting into AnnotationCollection stuff, which is no  
longer a simple matter of changing a tag/value string.  (By the way,  
where can I find a list of AnnotationCollectionI compliant objects?)

Any thoughts on the suggestions?  (I don't mind taking a stab at  
incorporating them into the code.. I've never submitted anything to  
BioPerl before)


Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270
