[Bioperl-l] Using Genscan.pm to parse GenomeScan output

Tyler tsw@uclink4.berkeley.edu
Thu, 31 Oct 2002 16:22:24 -0800


I'm new to Bioperl, but I have searched the archives and can't find an 
answer to this question.
Can Genscan.pm be used to parse GenomeScan output? I've tried it as-is 
and can't make it work, even though it works flawlessly on Genscan 
output.
Here is a dump of the Bio::Tools::Prediction::Gene object returned by 
next_prediction() when a particular GenomeScan output file is parsed:

$VAR1 = bless( {
  '_source_tag' => 'Genscan',
  '_gsf_tag_hash' => {},
  '_parse_h' => {},
  '_gsf_seqname' => 'z06s024797',
  '_features' => [
    bless( {
      '_source_tag' => 'Genscan',
      '_primary_tag' => 'Poly_A_site',
      '_gsf_tag_hash' => {},
      '_gsf_score' => '1.05',
      '_location' => bless( {
        '_start' => '9237',
        '_strand' => 1,
        '_end' => '9242'
      }, 'Bio::Location::Simple' ),
      '_parse_h' => {},
      '_root_verbose' => 0
    }, 'Bio::SeqFeature::Gene::Poly_A_site' )
  ],
  '_primary_tag' => 'GenePrediction1',
  '_location' => bless( {
    '_start' => '9237',
    '_strand' => 1,
    '_end' => '9242'
  }, 'Bio::Location::Simple' ),
  '_root_verbose' => 0
}, 'Bio::Tools::Prediction::Gene' );

As you can see, there is no predicted_gene or predicted_cds object, 
although the parser did manage to extract the poly-A site feature. Is it 
because the output contains no GenomeScan_predicted_peptide_1 record?
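
One way to check that hypothesis independently of Genscan.pm is to compare the gene numbers in the "Predicted genes/exons" table with the gene numbers in the FASTA headers of the predicted sequences. The following is only a minimal sketch, written in Python rather than Perl so that it runs without BioPerl installed. The function name genes_without_sequences and both regular expressions are illustrative assumptions derived from the sample output in this post; they are not part of Genscan.pm or GenomeScan.

```python
import re

# Abbreviated excerpt of the GenomeScan output shown in this post.
SAMPLE = """\
  1.01 PlyA +   9237   9242    6                               1.05

  2.04 PlyA -   9390   9385    6                               1.05
  2.03 Term -  10647  10543  105  0  0   62   43    75 0.000  -2.17

  3.02 Term -  19446  18796  651  0  0   -9   38   428 0.000 199.29

>z06s024797|GenomeScan_predicted_peptide_2|197_aa:test:395..421:E=3e-99
>z06s024797|GenomeScan_predicted_CDS_2|594_bp:test:395..421:E=3e-99
>z06s024797|GenomeScan_predicted_peptide_3|302_aa:test:839..863:E=1e-168
>z06s024797|GenomeScan_predicted_CDS_3|909_bp:test:839..863:E=1e-168
"""

# Assumed shape of a table row: "<gene>.<exon> <type> <strand> ..."
EXON_ROW = re.compile(r"^\s*(\d+)\.(\d+)\s+(\w+)\s+([+-])\s")
# Assumed shape of a sequence header: ">name|GenomeScan_predicted_<kind>_<gene>|..."
FASTA_HDR = re.compile(r"^\s*>\S*\|GenomeScan_predicted_(peptide|CDS)_(\d+)\|")


def genes_without_sequences(text):
    """Return gene numbers that appear in the table but have no FASTA record."""
    table_genes = set()
    seq_genes = set()
    for line in text.splitlines():
        row = EXON_ROW.match(line)
        if row:
            table_genes.add(int(row.group(1)))
            continue
        header = FASTA_HDR.match(line)
        if header:
            seq_genes.add(int(header.group(2)))
    return sorted(table_genes - seq_genes)


print(genes_without_sequences(SAMPLE))  # prints [1]
```

On the excerpt, gene 1 (a lone PlyA row) is the only gene listed in the table without a peptide or CDS record. That is consistent with the suspicion above, but it does not by itself show how Genscan.pm pairs sequences with predictions.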

Below is an example GenomeScan output ("blah blah"s inserted by me):

GenomeScan 1.0	Date run: 31-Oct-2002	Time: 14:26:43

Sequence z06s024797 : 25016 bp : 35.96% C+G : Isochore 1 ( 0 - 43 C+G%)

Options:  	GenoaOnly	BothStrands

Parameter matrix: HumanIso.smat

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

  1.01 PlyA +   9237   9242    6                               1.05

  2.04 PlyA -   9390   9385    6                               1.05
  2.03 Term -  10647  10543  105  0  0   62   43    75 0.000  -2.17
  2.02 Intr -  17885  17401  485  0  2   -1   77   326 0.000 132.07
  2.01 Init -  18136  18133    4  1  1   81   69     0 0.000  -2.39
  2.00 Prom -  18591  18552   40                             -13.78

  3.03 PlyA -  18598  18593    6                               1.05
  3.02 Term -  19446  18796  651  0  0   -9   38   428 0.000 199.29
  3.01 Init -  19779  19522  258  0  0   61    1    93 0.000  71.94
  3.00 Prom -  20062  20023   40                              -5.15

  4.06 PlyA -  20228  20223    6                              -0.45
  4.05 Term -  20602  20426  177  1  0   88   36    65 0.000  15.37
  4.04 Intr -  21273  21070  204  0  0   73   80   254 0.000  40.66
  4.03 Intr -  22129  21371  759  1  0   75   77   602 0.000 219.37
  4.02 Intr -  22515  22218  298  2  1  100   96   336 0.000  50.63
  4.01 Intr -  24962  24930   33  1  0  -11   99    53 0.000  11.82

Genoa hits used: 32 nonredundant Genoa hits

Hit1.13L     test                                       839  863  863  18876  18799 3 BLASTX     1.19e-13
Hit1.12      test                                       814  839  863  18952  18877 3 BLASTX     1.19e-13
Hit1.11      test                                       789  814  863  19028  18953 3 BLASTX     1.19e-13
.
.
.
blah blah blah

Predicted peptide sequence(s):

Predicted coding sequence(s):


>z06s024797|GenomeScan_predicted_peptide_2|197_aa:test:395..421:E=3e-99
MDIENTLNIIENNPKVRVVISFAKSSQMQLLFKGLQSRNISNNMVWVASDNWSTAKHILN
blah blah

>z06s024797|GenomeScan_predicted_CDS_2|594_bp:test:395..421:E=3e-99
atggacatcgaaaacaccttgaacatcattgaaaacaatccgaaagttagagtggtgatc
tcgtttgctaaatcctctcaaatgcagttgctatttaaggggctgcagagtagaaacatt
..blah blah

>z06s024797|GenomeScan_predicted_peptide_3|302_aa:test:839..863:E=1e-168
MWSLANSTACHPKVVEYFDWNSGFAIVLLILAALGVLLLFFMSALFFWQRHSPVVKAARG
PLCHLILVSLLGSSISVVFFVELTERSLKILLAFEMNFELKELLCMLYKPYMIVSVGMGV
blah blah

>z06s024797|GenomeScan_predicted_CDS_3|909_bp:test:839..863:E=1e-168
atgtggtcattggccaacagcactgcatgtcatcccaaggttgttgaatactttgattgg
aacagtggcttcgctattgtcctgctgatactggctgccctcggcgtccttcttctcttc
blah blah

>z06s024797|GenomeScan_predicted_peptide_4|490_aa:test:418..484:E=1e-100
XISAPEHPDCIRFYTKGLNQALAMINAVEMANKSPMLSSLNITLGYRIYDTCSDVTTALR
AVHDIMRPFSDCESPEDSSQPVQPIMAVIGTTSSEISIAVARDLNLQMIPQISYASTATI
blah blah

>z06s024797|GenomeScan_predicted_CDS_4|1473_bp:test:418..484:E=1e-100
nncatctctgcccctgagcatccggactgcatcagattctacacaaagggtctaaatcaa
gctctagcgatgattaatgctgtagaaatggcaaacaaatcccccatgttgagcagtttg
blah blah
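
For anyone who wants to inspect the "Predicted genes/exons" table without going through Genscan.pm, its rows can be split on whitespace. The sketch below is in Python and rests on assumptions read off the sample above: exon rows (Init, Intr, Term) carry 13 fields, while PlyA and Prom rows carry 7, with the score always last. The function name parse_exon_row and the dictionary keys are hypothetical, chosen for illustration only.

```python
import re

# First column is "<gene>.<exon>", e.g. "2.03".
GENE_EXON = re.compile(r"(\d+)\.(\d+)")


def parse_exon_row(line):
    """Return a dict for one table row, or None if the line is not a row."""
    fields = line.split()
    # Assumption: 13 fields for coding exons, 7 for PlyA/Prom rows.
    if len(fields) not in (7, 13):
        return None
    match = GENE_EXON.fullmatch(fields[0])
    if match is None or fields[2] not in ("+", "-"):
        return None
    begin, end = int(fields[3]), int(fields[4])
    return {
        "gene": int(match.group(1)),
        "exon": int(match.group(2)),
        "type": fields[1],
        "strand": fields[2],
        # On the minus strand Begin is greater than End, so normalize.
        "start": min(begin, end),
        "end": max(begin, end),
        "length": int(fields[5]),
        # The P column exists only on 13-field rows.
        "prob": float(fields[11]) if len(fields) == 13 else None,
        "score": float(fields[-1]),
    }


row = parse_exon_row(
    "  2.03 Term -  10647  10543  105  0  0   62   43    75 0.000  -2.17"
)
print(row["gene"], row["type"], row["start"], row["end"], row["score"])
```

Header, separator, and Genoa hit lines return None, so the function can be applied to every line of a file and the None results filtered out.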