[Bioperl-l] parser for GCG flavor of FASTA ?
Martin A. Hansen
maasha at image.dk
Thu Aug 21 09:02:21 EDT 2003
On Thu, Aug 21, 2003 at 09:18:45AM -0400, Jason Stajich wrote:
> Not I, but if you post an example report as a feature request to
> http://bugzilla.open-bio.org it might get on the to do list of a kind soul
> out there.
hm, i was thinking that somebody might already have written a parser for this,
so i'll wait around to see if anyone responds and then file the request.
anyway, i'll attach a sample file.
martin
>
>
> On Thu, 21 Aug 2003, Martin A. Hansen wrote:
>
> > hi
> >
> > does anyone have any code that can pipe GCG flavor FASTA reports to
> > Bio::SearchIO ?
> >
> >
> > martin
> >
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
-------------- next part --------------
!!SEQUENCE_LIST 1.0
(Nucleotide) FASTA of: BTG1.seq from: 1 to: 21 June 3, 2002 10:51
Oligo BTG1 seq
TO: @/usr/users/ddbase/seq/seq.all Sequences: 9,324 Symbols: 31,508,009 Word Size: 6
Sequences too short to analyze: 9 (27 symbols)
Sequences skipped due to type mismatch with query: 2
Searching with both strands of the query.
Scoring matrix: GenRunData:fastadna.cmp
Constant pamfactor used
Gap creation penalty: 16 Gap extension penalty: 4
Histogram Key:
Each histogram symbol represents 32 search set sequences
Each inset symbol represents 1 search set sequences
z-scores computed from opt scores
z-score obs exp
(=) (*)
< 20 1920 0:============================================================
22 2 0:=
24 0 0:
26 6 0:=
28 6 2:*
30 20 10:*
32 32 39:=*
34 91 107:===*
36 146 220:===== *
38 287 363:========= *
40 556 506:===============*==
42 691 619:===================*==
44 888 683:=====================*======
46 818 695:=====================*====
48 687 666:====================*=
50 555 607:==================*
52 442 534:============== *
54 307 456:========== *
56 314 381:========== *
58 314 313:=========*
60 258 253:=======*=
62 195 203:======*
64 137 162:=====*
66 147 128:===*=
68 108 100:===*
70 83 79:==*
72 91 62:=*=
74 37 48:=*
76 39 37:=*
78 38 29:*=
80 30 23:*
82 21 17:*
84 20 14:*
86 9 11:*
88 9 8:*
90 5 6:*
92 7 5:* :====*==
94 1 4:* := *
96 5 3:* :==*==
98 0 2:* : *
100 1 2:* :=*
102 0 1:* :*
104 0 1:* :*
106 0 1:* :*
108 0 1:* :*
110 1 0:= *=
112 0 0: *
114 0 0: *
116 0 0: *
118 0 0: *
>120 0 0: *
Joining threshold: 45, opt. threshold: 30, opt. width: 16, reg.-scaled
The best scores are: init1 initn opt z-sc E(7402)..
/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod Begin: 26864 End: 26884
! ID AC002406 standard; DNA; ROD; 194... 72 72 78 89.3 0.59
/usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum Begin: 75031 End: 75045
! ID AL355520 standard; DNA; HUM; 157... 75 75 75 85.2 1.2
/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq Begin: 915 End: 935 Strand: -
! TC104374 from TIGR. Similar to mpla... 40 40 78 109.4 1.3
\\End of List
BTG1.seq
/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod
ID AC002406 standard; DNA; ROD; 194985 BP.
XX
AC AC002406;
XX
SV AC002406.1
XX . . .
SCORES Init1: 72 Initn: 72 Opt: 78 z-score: 89.3 E(): 0.59
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod (194985 nt)
initn: 72 init1: 72 opt: 78 Z-score: 89.3 expect(): 0.59
85.7% identity in 21 nt overlap
(1-21:26864-26884)
10 20
BTG1.seq GTGACAGTGCCATAGTTTGGA
|| |||||| || ||||||||
ac002406.emr TAGAATTGGGAACAATCACCCATGGAAGGAGTTACAGTGACAAAGTTTGGAGCTGAGACA
26840 26850 26860 26870 26880 26890
ac002406.emr AAAGGATGGAACATCTAGAGACTGCCGTATCCAGAGATCCATCCCATAATTAGCCTCCAA
26900 26910 26920 26930 26940 26950
BTG1.seq
/usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum
ID AL355520 standard; DNA; HUM; 157575 BP.
XX
AC AL355520;
XX
SV AL355520.8
XX . . .
SCORES Init1: 75 Initn: 75 Opt: 75 z-score: 85.2 E(): 1.2
>>/usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum (157575 nt)
initn: 75 init1: 75 opt: 75 Z-score: 85.2 expect(): 1.2
100.0% identity in 15 nt overlap
(6-20:75031-75045)
10 20
BTG1.seq GTGACAGTGCCATAGTTTGGA
|||||||||||||||
nrf1aga4amb2 CTCGGAATCTGATTCCACATGGACATAGGAAGTGCCATAGTTTGGGTTATAAGTCAGCAT
75010 75020 75030 75040 75050 75060
nrf1aga4amb2 TTTTAATTTTATCTTTCAAATTTTTAAGTCTTTTGTAATTGGATTTATTGTCGATTTATT
75070 75080 75090 75100 75110 75120
BTG1.seq /rev
/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq
TC104374 from TIGR. Similar to mpla2rcg3m35.seq, etc. (inverse-U repeat)
retrovirus-related pol polyprotein (reverse transcriptase {Mus musculus}
SP|P11369|POL2_MOUSE RETROV
SCORES Init1: 40 Initn: 40 Opt: 78 z-score: 109.4 E(): 1.3
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq (6469 nt)
initn: 40 init1: 40 opt: 78 Z-score: 109.4 expect(): 1.3
85.7% identity in 21 nt overlap
(21-1:915-935)
20 10
BTG1.seq TCCAAACTATGGCACTGTCAC
|||||||| |||| |||| ||
U-Repeat-TC1 TCTCTACATGGTCCATCCTTTCATCTCAGCTCCAAACTTTGGCTCTGTAACTCCTTCCAT
890 900 910 920 930 940
U-Repeat-TC1 GGGTGTTTTGTTCCCAAATCTAAGGAGGGGCATAGTGTCCACACTTCAGTCTTCATTCTT
950 960 970 980 990 1000
! Distributed over 1 thread.
! Start time: Mon Jun 3 10:46:53 2002
! Completion time: Mon Jun 3 10:52:00 2002
! CPU time used:
! Database scan: 0:01:27.1
! Post-scan processing: 0:00:01.6
! Total CPU time: 0:01:28.8
! Output File: btg1.fasta
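
The per-hit blocks in the sample report above follow a regular shape: a `>>` line with the database entry path and length, an `initn: ... expect(): ...` score line, a percent-identity line, and a coordinate line such as `(1-21:26864-26884)`. As a rough illustration of how those four lines could be picked out, here is a minimal sketch in Python. It is not BioPerl code and not a Bio::SearchIO driver. The function name `parse_gcg_fasta_hits`, the dictionary keys, and the regular expressions are assumptions based only on this one sample, so other GCG FASTA reports may need different patterns.

```python
import re

# Patterns are guesses derived from the single sample report in this post.
HIT_RE = re.compile(r"^>>(\S+)\s+\((\d+)\s+(?:nt|aa)\)")
SCORE_RE = re.compile(
    r"^initn:\s*(\d+)\s+init1:\s*(\d+)\s+opt:\s*(\d+)"
    r"\s+Z-score:\s*([\d.]+)\s+expect\(\):\s*(\S+)"
)
IDENT_RE = re.compile(r"^([\d.]+)%\s+identity\s+in\s+(\d+)\s+(?:nt|aa)\s+overlap")
RANGE_RE = re.compile(r"^\((\d+)-(\d+):(\d+)-(\d+)\)$")


def parse_gcg_fasta_hits(text):
    """Return one dict per '>>' hit block found in the report text."""
    hits = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = HIT_RE.match(line)
        if m:
            # A '>>' line opens a new hit; later lines fill in its fields.
            current = {"name": m.group(1), "length": int(m.group(2))}
            hits.append(current)
            continue
        if current is None:
            continue  # still in the header, histogram, or summary list
        m = SCORE_RE.match(line)
        if m:
            current["initn"] = int(m.group(1))
            current["init1"] = int(m.group(2))
            current["opt"] = int(m.group(3))
            current["zscore"] = float(m.group(4))
            current["expect"] = float(m.group(5))
            continue
        m = IDENT_RE.match(line)
        if m:
            current["identity"] = float(m.group(1))
            current["overlap"] = int(m.group(2))
            continue
        m = RANGE_RE.match(line)
        if m:
            # Query start/end, then subject start/end, as printed.
            current["query_range"] = (int(m.group(1)), int(m.group(2)))
            current["subject_range"] = (int(m.group(3)), int(m.group(4)))
    return hits


# Two hit blocks copied from the sample report above.
SAMPLE = """\
SCORES Init1: 72 Initn: 72 Opt: 78 z-score: 89.3 E(): 0.59
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod (194985 nt)
initn: 72 init1: 72 opt: 78 Z-score: 89.3 expect(): 0.59
85.7% identity in 21 nt overlap
(1-21:26864-26884)
SCORES Init1: 40 Initn: 40 Opt: 78 z-score: 109.4 E(): 1.3
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq (6469 nt)
initn: 40 init1: 40 opt: 78 Z-score: 109.4 expect(): 1.3
85.7% identity in 21 nt overlap
(21-1:915-935)
"""

if __name__ == "__main__":
    for hit in parse_gcg_fasta_hits(SAMPLE):
        print(hit["name"], hit["opt"], hit["expect"])
```

The sketch skips the capitalised `SCORES` lines on purpose, since they repeat the same numbers in a different order. A reverse-strand hit shows up only as a descending query range (21 to 1 here), so strand would have to be inferred from that or from the `/rev` marker, which this sketch does not read.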