[Bioperl-l] Thoughts on Bio::Tools::Glimmer
Andrew Stewart
stewarta at nmrc.navy.mil
Wed Apr 11 18:40:18 UTC 2007
First of all, mucho kudos to those who revamped this module. It
works really well. I have a couple of thoughts:
* The .predict file from Glimmer provides frame and score information,
which could be parsed and included in the generated feature prediction.
* It'd be nice to include the orfID somewhere on the feature
prediction, maybe as the seqID? (These could be post-processed into
locus_tags by those using Glimmer as a preliminary annotation tool.)
* Options to set the source and primary tags to something other than
the defaults (i.e. Glimmer_3.X and 'transcript'). This could always be
done after Bio::Tools::Glimmer has run, though, of course.
* This section..
    elsif (
        # Glimmer 2.X prediction
        (/^\s+(\d+)\s+        # gene num
          (\d+)\s+(\d+)\s+    # start, end
          \[([\+\-])\d{1}\s+  # strand
         /ox ) ||
        # Glimmer 3.X prediction
        (/\w+(\d+)\s+         # orf (numeric portion)
          (\d+)\s+(\d+)\s+    # start, end
          ([\+\-])\d{1}\s+    # strand
         /ox)) {
        my ($genenum,$start,$end,$strand) =
            ( $1,$2,$3,$4 );
...isn't picking up more than the last digit of the orf number. Not
sure if that's intentional. A sample of the feature output using
->gff_string shows up as...
test-pseudocontig Glimmer_3.X transcript 1018 8 . - . Group GenePrediction_1
test-pseudocontig Glimmer_3.X transcript 1134 1736 . + . Group GenePrediction_2
test-pseudocontig Glimmer_3.X transcript 1832 2596 . + . Group GenePrediction_4
test-pseudocontig Glimmer_3.X transcript 2710 3225 . + . Group GenePrediction_5
test-pseudocontig Glimmer_3.X transcript 3246 4016 . + . Group GenePrediction_6
test-pseudocontig Glimmer_3.X transcript 4177 5064 . + . Group GenePrediction_7
test-pseudocontig Glimmer_3.X transcript 5083 5673 . + . Group GenePrediction_8
test-pseudocontig Glimmer_3.X transcript 6001 7275 . + . Group GenePrediction_9
test-pseudocontig Glimmer_3.X transcript 7530 8081 . + . Group GenePrediction_0
test-pseudocontig Glimmer_3.X transcript 8785 8117 . - . Group GenePrediction_1
test-pseudocontig Glimmer_3.X transcript 9423 8788 . - . Group GenePrediction_2
test-pseudocontig Glimmer_3.X transcript 10088 9549 . - . Group GenePrediction_3
...which was parsed originally from...
orf00001 1018 8 -2 2.95
orf00002 1134 1736 +3 2.91
orf00004 1832 2596 +2 2.93
orf00005 2710 3225 +1 2.90
orf00006 3246 4016 +3 2.93
orf00007 4177 5064 +1 2.94
orf00008 5083 5673 +1 2.91
orf00009 6001 7275 +1 2.96
orf00010 7530 8081 +3 2.58
orf00011 8785 8117 -2 2.92
orf00012 9423 8788 -1 2.81
orf00013 10088 9549 -3 2.90
* It'd also be nice if you could somehow set the string that is
placed in front of the orf-number in the line...
    '-tag' => { 'Group' => "GenePrediction_$genenum"},
...seeing as how these tag/values can't seem to be changed manually
anymore without getting into AnnotationCollection stuff, which is no
longer a simple matter of changing a tag/value string. (By the way,
where can I find a list of AnnotationCollectionI compliant objects?)
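
To make a few of the points above concrete, here is a minimal sketch in
plain Perl. It is not the module's code. The helper name
parse_glimmer3_line, the hash keys it returns, and the 'NMRC_' prefix are
all made up for illustration. The last-digit behaviour looks like the
result of the greedy \w+ in the Glimmer 3.X pattern: \w also matches
digits, so it swallows everything except the final digit before (\d+) gets
a turn. Matching the non-digit part of the ID explicitly avoids that, and
the same pattern can also capture the frame and score columns.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Glimmer: parse one
# Glimmer 3.X .predict line into a hash ref, or return nothing if the
# line does not look like a prediction.
sub parse_glimmer3_line {
    my ($line, $prefix) = @_;
    # Fall back to the prefix the module currently hard-codes.
    $prefix = 'GenePrediction_' unless defined $prefix;

    # [A-Za-z_]+ cannot consume digits, so (\d+) keeps the whole number.
    my ($orfnum, $start, $end, $strand, $frame, $score) =
        $line =~ /^\s*[A-Za-z_]+(\d+)\s+   # orf (numeric portion)
                  (\d+)\s+(\d+)\s+          # start, end
                  ([+-])(\d)\s+             # strand, frame
                  (\S+)                     # score
                 /x
        or return;

    return {
        orfnum => $orfnum,
        start  => $start,
        end    => $end,
        strand => $strand,
        frame  => $frame,
        score  => $score,
        group  => $prefix . $orfnum,   # value for the 'Group' tag
    };
}

my $line = 'orf00010 7530 8081 +3 2.58';

# The Glimmer 3.X pattern from the post: greedy \w+ leaves (\d+) with
# only the last digit of the orf number.
my ($last_digit) = $line =~ /\w+(\d+)\s+(\d+)\s+(\d+)\s+([\+\-])\d{1}\s+/;

my $default = parse_glimmer3_line($line);
my $custom  = parse_glimmer3_line($line, 'NMRC_');

print "original pattern captures: $last_digit\n";
print "default group: $default->{group}\n";
print "custom group:  $custom->{group} (frame $custom->{frame}, score $custom->{score})\n";
```

For orf00010 the original pattern captures just "0", which matches the
GenePrediction_0 line in the GFF output above. The sketch keeps the
leading zeros of the orf number. Whether to strip them, and whether a
prefix option belongs on the parser's constructor, are open design choices.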
Any thoughts on the suggestions? (I don't mind taking a stab at
incorporating them into the code, though I've never submitted anything
to BioPerl before.)
-Andrew
--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852
email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270