[Bioperl-l] parser for GCG flavor of FASTA ?

Martin A. Hansen maasha at image.dk
Thu Aug 21 09:02:21 EDT 2003


On Thu, Aug 21, 2003 at 09:18:45AM -0400, Jason Stajich wrote:
> Not I, but if you post an example report as a feature request to
> http://bugzilla.open-bio.org it might get on the to do list of a kind soul
> out there.

Hm, I was thinking that somebody might already have written a parser for
this, so I'll wait to see if anyone responds before filing a request.

Anyway, I'll attach a sample file.


martin

> 
> 
> On Thu, 21 Aug 2003, Martin A. Hansen wrote:
> 
> > hi
> >
> > does anyone have any code that can pipe GCG-flavor FASTA reports to
> > Bio::SearchIO?
> >
> >
> > martin
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
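
In case it helps anyone who picks this up, here is a minimal sketch of how the per-hit blocks in the attached report could be pulled apart. It is written in Python only to illustrate the line patterns and is not a Bio::SearchIO module. The function name `parse_gcg_fasta_hits`, the dictionary keys, and the regular expressions are all hypothetical and were inferred from the single sample report attached to this message, so other GCG versions or protein searches may need different patterns. Reading a descending query range such as `(21-1:...)` as a reverse-strand match is also an assumption based on the one `/rev` hit in the sample.

```python
import re

# Patterns inferred only from the attached sample report (an assumption;
# other GCG releases may format these lines differently).
SCORES_RE = re.compile(
    r"^SCORES\s+Init1:\s*(\d+)\s+Initn:\s*(\d+)\s+Opt:\s*(\d+)\s+"
    r"z-score:\s*([\d.]+)\s+E\(\):\s*(\S+)"
)
HIT_RE = re.compile(r"^>>(\S+)\s+\((\d+)\s+(?:nt|aa)\)")
IDENT_RE = re.compile(r"^\s*([\d.]+)% identity in (\d+) (?:nt|aa) overlap")
COORD_RE = re.compile(r"^\s*\((\d+)-(\d+):(\d+)-(\d+)\)")


def parse_gcg_fasta_hits(text):
    """Return one dict per hit found in a GCG-style FASTA report."""
    hits = []
    pending = None   # scores seen on a SCORES line, waiting for the '>>' line
    current = None   # hit currently being filled in
    for line in text.splitlines():
        m = SCORES_RE.match(line)
        if m:
            pending = {
                "init1": int(m.group(1)),
                "initn": int(m.group(2)),
                "opt": int(m.group(3)),
                "zscore": float(m.group(4)),
                "evalue": float(m.group(5)),
            }
            continue
        m = HIT_RE.match(line)
        if m and pending is not None:
            current = dict(pending, name=m.group(1), length=int(m.group(2)))
            hits.append(current)
            pending = None
            continue
        if current is None:
            continue
        m = IDENT_RE.match(line)
        if m:
            current["identity"] = float(m.group(1))
            current["overlap"] = int(m.group(2))
            continue
        m = COORD_RE.match(line)
        if m:
            qs, qe, hs, he = (int(g) for g in m.groups())
            current.update(query_start=qs, query_end=qe,
                           hit_start=hs, hit_end=he)
            # Assumption: a descending query range marks a reverse-strand match.
            current["query_strand"] = -1 if qs > qe else 1
    return hits


# Two hit blocks copied from the attached sample report.
SAMPLE = """\
SCORES   Init1: 72    Initn: 72    Opt: 78    z-score: 89.3  E(): 0.59
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod (194985 nt)
 initn:  72 init1:  72 opt:  78 Z-score: 89.3 expect(): 0.59
  85.7% identity in 21 nt overlap
 (1-21:26864-26884)

SCORES   Init1: 40    Initn: 40    Opt: 78    z-score: 109.4 E(): 1.3
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq (6469 nt)
 initn:  40 init1:  40 opt:  78 Z-score: 109.4 expect():  1.3
  85.7% identity in 21 nt overlap
 (21-1:915-935)
"""

if __name__ == "__main__":
    for hit in parse_gcg_fasta_hits(SAMPLE):
        print(hit["name"], hit["opt"], hit["evalue"], hit["query_strand"])
```

The sketch ignores the header, the histogram, the "best scores" list and the alignment text itself. A real parser would also need those, and would have to map the fields onto result, hit and HSP objects.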
-------------- next part --------------
!!SEQUENCE_LIST 1.0


(Nucleotide) FASTA of: BTG1.seq  from: 1 to: 21  June 3, 2002 10:51

Oligo BTG1 seq

 TO: @/usr/users/ddbase/seq/seq.all  Sequences:      9,324  Symbols:    31,508,009  Word Size: 6

 Sequences too short to analyze: 9 (27 symbols)
 Sequences skipped due to type mismatch with query: 2
 Searching with both strands of the query.
 Scoring matrix: GenRunData:fastadna.cmp
 Constant pamfactor used
 Gap creation penalty: 16  Gap extension penalty: 4



Histogram Key:
 Each histogram symbol represents 32 search set sequences
 Each inset symbol represents 1 search set sequences
 z-scores computed from opt scores

z-score obs    exp
        (=)    (*)

< 20   1920      0:============================================================
  22      2      0:=
  24      0      0:
  26      6      0:=
  28      6      2:*
  30     20     10:*
  32     32     39:=*
  34     91    107:===*
  36    146    220:===== *
  38    287    363:=========  *
  40    556    506:===============*==
  42    691    619:===================*==
  44    888    683:=====================*======
  46    818    695:=====================*====
  48    687    666:====================*=
  50    555    607:==================*
  52    442    534:==============  *
  54    307    456:==========    *
  56    314    381:========== *
  58    314    313:=========*
  60    258    253:=======*=
  62    195    203:======*
  64    137    162:=====*
  66    147    128:===*=
  68    108    100:===*
  70     83     79:==*
  72     91     62:=*=
  74     37     48:=*
  76     39     37:=*
  78     38     29:*=
  80     30     23:*
  82     21     17:*
  84     20     14:*
  86      9     11:*
  88      9      8:*
  90      5      6:*
  92      7      5:*         :====*==
  94      1      4:*         :=  *
  96      5      3:*         :==*==
  98      0      2:*         : *
 100      1      2:*         :=*
 102      0      1:*         :*
 104      0      1:*         :*
 106      0      1:*         :*
 108      0      1:*         :*
 110      1      0:=         *=
 112      0      0:          *
 114      0      0:          *
 116      0      0:          *
 118      0      0:          *
>120      0      0:          *

Joining threshold: 45, opt. threshold: 30, opt. width:  16, reg.-scaled


The best scores are:                    init1 initn   opt    z-sc E(7402)..

/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod    Begin: 26864  End: 26884
! ID AC002406 standard; DNA; ROD; 194...   72    72    78    89.3    0.59
/usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum    Begin: 75031  End: 75045
! ID AL355520 standard; DNA; HUM; 157...   75    75    75    85.2     1.2
/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq    Begin: 915  End: 935  Strand: -
! TC104374 from TIGR. Similar to mpla...   40    40    78   109.4     1.3
\\End of List


BTG1.seq
/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod

ID   AC002406   standard; DNA; ROD; 194985 BP.
XX
AC   AC002406;
XX
SV   AC002406.1
XX . . . 


SCORES   Init1: 72    Initn: 72    Opt: 78    z-score: 89.3  E(): 0.59  
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/mpla2/ac002406.emrod (194985 nt)
 initn:  72 init1:  72 opt:  78 Z-score: 89.3 expect(): 0.59
  85.7% identity in 21 nt overlap
 (1-21:26864-26884)

                                                   10        20          
BTG1.seq                                   GTGACAGTGCCATAGTTTGGA         
                                           || |||||| || ||||||||         
ac002406.emr TAGAATTGGGAACAATCACCCATGGAAGGAGTTACAGTGACAAAGTTTGGAGCTGAGACA
               26840     26850     26860     26870     26880     26890   

ac002406.emr AAAGGATGGAACATCTAGAGACTGCCGTATCCAGAGATCCATCCCATAATTAGCCTCCAA
               26900     26910     26920     26930     26940     26950   


BTG1.seq
/usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum

ID   AL355520   standard; DNA; HUM; 157575 BP.
XX
AC   AL355520;
XX
SV   AL355520.8
XX . . . 


SCORES   Init1: 75    Initn: 75    Opt: 75    z-score: 85.2  E(): 1.2   
>>/usr/users/ddbase/seq/Anne_Mette/nrf1aga1amb22/OtherBand/nrf1aga4amb27-al355520.emhum (157575 nt)
 initn:  75 init1:  75 opt:  75 Z-score: 85.2 expect():  1.2
 100.0% identity in 15 nt overlap
 (6-20:75031-75045)

                                              10        20               
BTG1.seq                              GTGACAGTGCCATAGTTTGGA              
                                           |||||||||||||||               
nrf1aga4amb2 CTCGGAATCTGATTCCACATGGACATAGGAAGTGCCATAGTTTGGGTTATAAGTCAGCAT
                  75010     75020     75030     75040     75050     75060

nrf1aga4amb2 TTTTAATTTTATCTTTCAAATTTTTAAGTCTTTTGTAATTGGATTTATTGTCGATTTATT
                  75070     75080     75090     75100     75110     75120


BTG1.seq /rev
/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq

TC104374 from TIGR. Similar to mpla2rcg3m35.seq, etc. (inverse-U repeat)
retrovirus-related pol polyprotein (reverse transcriptase {Mus musculus} 
 SP|P11369|POL2_MOUSE RETROV


SCORES   Init1: 40    Initn: 40    Opt: 78    z-score: 109.4 E(): 1.3   
>>/usr/users/ddbase/seq/Mouse/Rat/Masami/Lene-seqs/U-Repeat-TC104374.seq (6469 nt)
 initn:  40 init1:  40 opt:  78 Z-score: 109.4 expect():  1.3
  85.7% identity in 21 nt overlap
 (21-1:915-935)

                                           20        10                  
BTG1.seq                                   TCCAAACTATGGCACTGTCAC         
                                           |||||||| |||| |||| ||         
U-Repeat-TC1 TCTCTACATGGTCCATCCTTTCATCTCAGCTCCAAACTTTGGCTCTGTAACTCCTTCCAT
                890       900       910       920       930       940    

U-Repeat-TC1 GGGTGTTTTGTTCCCAAATCTAAGGAGGGGCATAGTGTCCACACTTCAGTCTTCATTCTT
                950       960       970       980       990      1000    



! Distributed over 1 thread.
!      Start time: Mon Jun  3 10:46:53 2002
! Completion time: Mon Jun  3 10:52:00 2002

! CPU time used:
!        Database scan:  0:01:27.1
! Post-scan processing:  0:00:01.6
!       Total CPU time:  0:01:28.8
! Output File: btg1.fasta

