[Bioperl-l] Using Genscan.pm to parse GenomeScan output
Tyler
tsw@uclink4.berkeley.edu
Thu, 31 Oct 2002 16:22:24 -0800
I'm new to Bioperl, but I have searched the archives and can't find an
answer to this question.
Can Genscan.pm (Bio::Tools::Genscan) be used to parse GenomeScan output?
I've tried it as-is and can't make it work, even though it parses Genscan
output flawlessly.
Here is a dump of the Bio::Tools::Prediction::Gene object returned by
next_prediction() when a particular GenomeScan output file is parsed:
$VAR1 = bless( {
  '_source_tag' => 'Genscan',
  '_gsf_tag_hash' => {},
  '_parse_h' => {},
  '_gsf_seqname' => 'z06s024797',
  '_features' => [
    bless( {
      '_source_tag' => 'Genscan',
      '_primary_tag' => 'Poly_A_site',
      '_gsf_tag_hash' => {},
      '_gsf_score' => '1.05',
      '_location' => bless( {
        '_start' => '9237',
        '_strand' => 1,
        '_end' => '9242'
      }, 'Bio::Location::Simple' ),
      '_parse_h' => {},
      '_root_verbose' => 0
    }, 'Bio::SeqFeature::Gene::Poly_A_site' )
  ],
  '_primary_tag' => 'GenePrediction1',
  '_location' => bless( {
    '_start' => '9237',
    '_strand' => 1,
    '_end' => '9242'
  }, 'Bio::Location::Simple' ),
  '_root_verbose' => 0
}, 'Bio::Tools::Prediction::Gene' );
As you can see, there is no predicted protein or predicted CDS attached to
the object, although the parser did manage to extract some information (a
single Poly_A_site feature). Is this because there is no
GenomeScan_predicted_peptide_1 in the output?
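One way to look at that question is to group the rows of the "Predicted genes/exons:" table below by gene number (the part of Gn.Ex before the dot). The following is a minimal plain-Perl sketch, not Genscan.pm's own logic. It assumes only the whitespace-separated layout of that table and hard-codes a few rows copied from it. The helper group_by_gene is hypothetical, and treating Init, Intr, Term and Sngl as the coding exon types (with PlyA and Prom as signals only) is an assumption based on Genscan's exon types.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few rows copied from the "Predicted genes/exons:" table of the
# GenomeScan output. Only the first two columns (Gn.Ex, Type) are used.
my @rows = (
    '1.01 PlyA + 9237 9242 6 1.05',
    '2.04 PlyA - 9390 9385 6 1.05',
    '2.03 Term - 10647 10543 105 0 0 62 43 75 0.000 -2.17',
    '2.02 Intr - 17885 17401 485 0 2 -1 77 326 0.000 132.07',
    '2.01 Init - 18136 18133 4 1 1 81 69 0 0.000 -2.39',
    '2.00 Prom - 18591 18552 40 -13.78',
);

# Assumed set of exon types that carry coding sequence.
my %coding = map { $_ => 1 } qw(Init Intr Term Sngl);

# Hypothetical helper: map gene number => list of feature types, in file order.
sub group_by_gene {
    my @lines = @_;
    my %genes;
    for my $line (@lines) {
        my ($gn_ex, $type) = (split ' ', $line)[0, 1];
        my ($gene) = split /\./, $gn_ex;    # "2.03" -> "2"
        push @{ $genes{$gene} }, $type;
    }
    return \%genes;
}

my $genes = group_by_gene(@rows);
for my $gene (sort { $a <=> $b } keys %$genes) {
    # grep in scalar context returns the number of matching types
    my $n_coding = grep { $coding{$_} } @{ $genes->{$gene} };
    printf "gene %d: %s (%d coding exon(s))\n",
        $gene, join(',', @{ $genes->{$gene} }), $n_coding;
}
```

On these rows it reports gene 1 as a single PlyA with no coding exons. That matches the lone Poly_A_site (9237..9242) in the dumped GenePrediction1 object.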
Below is an example of GenomeScan output (the "blah blah" lines are mine and mark where I cut text out):
GenomeScan 1.0 Date run: 31-Oct-2002 Time: 14:26:43
Sequence z06s024797 : 25016 bp : 35.96% C+G : Isochore 1 ( 0 - 43 C+G%)
Options: GenoaOnly BothStrands
Parameter matrix: HumanIso.smat
Predicted genes/exons:
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
 1.01 PlyA +   9237   9242    6                               1.05
 2.04 PlyA -   9390   9385    6                               1.05
 2.03 Term -  10647  10543  105  0  0   62   43    75 0.000  -2.17
 2.02 Intr -  17885  17401  485  0  2   -1   77   326 0.000 132.07
 2.01 Init -  18136  18133    4  1  1   81   69     0 0.000  -2.39
 2.00 Prom -  18591  18552   40                             -13.78
 3.03 PlyA -  18598  18593    6                               1.05
 3.02 Term -  19446  18796  651  0  0   -9   38   428 0.000 199.29
 3.01 Init -  19779  19522  258  0  0   61    1    93 0.000  71.94
 3.00 Prom -  20062  20023   40                              -5.15
 4.06 PlyA -  20228  20223    6                              -0.45
 4.05 Term -  20602  20426  177  1  0   88   36    65 0.000  15.37
 4.04 Intr -  21273  21070  204  0  0   73   80   254 0.000  40.66
 4.03 Intr -  22129  21371  759  1  0   75   77   602 0.000 219.37
 4.02 Intr -  22515  22218  298  2  1  100   96   336 0.000  50.63
 4.01 Intr -  24962  24930   33  1  0  -11   99    53 0.000  11.82
Genoa hits used: 32 nonredundant Genoa hits
Hit1.13L test 839 863 863
18876 18799 3 BLASTX 1.19e-13
Hit1.12 test 814 839 863
18952 18877 3 BLASTX 1.19e-13
Hit1.11 test 789 814 863
19028 18953 3 BLASTX 1.19e-13
.
.
.
blah blah blah
Predicted peptide sequence(s):
Predicted coding sequence(s):
>z06s024797|GenomeScan_predicted_peptide_2|197_aa:test:395..421:E=3e-99
MDIENTLNIIENNPKVRVVISFAKSSQMQLLFKGLQSRNISNNMVWVASDNWSTAKHILN
blah blah
>z06s024797|GenomeScan_predicted_CDS_2|594_bp:test:395..421:E=3e-99
atggacatcgaaaacaccttgaacatcattgaaaacaatccgaaagttagagtggtgatc
tcgtttgctaaatcctctcaaatgcagttgctatttaaggggctgcagagtagaaacatt
..blah blah
>z06s024797|GenomeScan_predicted_peptide_3|302_aa:test:839..863:E=1e-168
MWSLANSTACHPKVVEYFDWNSGFAIVLLILAALGVLLLFFMSALFFWQRHSPVVKAARG
PLCHLILVSLLGSSISVVFFVELTERSLKILLAFEMNFELKELLCMLYKPYMIVSVGMGV
blah blah
>z06s024797|GenomeScan_predicted_CDS_3|909_bp:test:839..863:E=1e-168
atgtggtcattggccaacagcactgcatgtcatcccaaggttgttgaatactttgattgg
aacagtggcttcgctattgtcctgctgatactggctgccctcggcgtccttcttctcttc
blah blah
>z06s024797|GenomeScan_predicted_peptide_4|490_aa:test:418..484:E=1e-100
XISAPEHPDCIRFYTKGLNQALAMINAVEMANKSPMLSSLNITLGYRIYDTCSDVTTALR
AVHDIMRPFSDCESPEDSSQPVQPIMAVIGTTSSEISIAVARDLNLQMIPQISYASTATI
blah blah
>z06s024797|GenomeScan_predicted_CDS_4|1473_bp:test:418..484:E=1e-100
nncatctctgcccctgagcatccggactgcatcagattctacacaaagggtctaaatcaa
gctctagcgatgattaatgctgtagaaatggcaaacaaatcccccatgttgagcagtttg
blah blah
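To check which gene numbers actually have a predicted peptide and CDS, the FASTA header lines above can be split into their fields. This plain-Perl sketch assumes the header layout seen in this one file: seqname|Program_predicted_kind_N|length_unit, optionally followed by GenomeScan's ":hit:range:E=value" suffix. The regex and the helper parse_header are hypothetical. They are not what Genscan.pm itself uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header lines copied from the GenomeScan output above.
my @headers = (
    '>z06s024797|GenomeScan_predicted_peptide_2|197_aa:test:395..421:E=3e-99',
    '>z06s024797|GenomeScan_predicted_CDS_2|594_bp:test:395..421:E=3e-99',
    '>z06s024797|GenomeScan_predicted_peptide_3|302_aa:test:839..863:E=1e-168',
    '>z06s024797|GenomeScan_predicted_CDS_3|909_bp:test:839..863:E=1e-168',
);

# Hypothetical helper: split one header into fields, or return undef
# (in scalar context) if the line does not fit the assumed layout.
sub parse_header {
    my ($header) = @_;
    return unless $header =~
        /^>([^|]+)\|(\w+?)_predicted_(peptide|CDS)_(\d+)\|(\d+)_(aa|bp)(?::(.*))?$/;
    return {
        seqname => $1,
        program => $2,   # "GenomeScan" here
        kind    => $3,   # "peptide" or "CDS"
        gene    => $4,
        length  => $5,
        unit    => $6,   # "aa" or "bp"
        extra   => $7,   # optional suffix, e.g. "test:395..421:E=3e-99"
    };
}

# gene number => { peptide => length, CDS => length }
my %by_gene;
for my $h (@headers) {
    my $rec = parse_header($h) or next;
    $by_gene{ $rec->{gene} }{ $rec->{kind} } = $rec->{length};
}

for my $gene (sort { $a <=> $b } keys %by_gene) {
    printf "gene %d: peptide %s aa, CDS %s bp\n",
        $gene, $by_gene{$gene}{peptide}, $by_gene{$gene}{CDS};
}
```

For the four headers included here it reports genes 2 and 3 only, with no entry for gene 1.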