[Bioperl-l] gcg.pm, another one
    Derek Gatherer 
    d.gatherer at vir.gla.ac.uk
       
    Fri Oct 31 10:40:25 EST 2003
    
    
  
Hello again,
I have submitted the last bug to Bugzilla. The following may be another bug,
but I wonder whether the problem is with my GCG format instead.
Try this script:
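#!/usr/bin/perl -w
use lib "/usr/local/lib/site_perl/5.8.0/";
use strict;
use Bio::Seq;
use Bio::SeqIO;
# Open cds.gcg as a GCG-format input stream.
my $dnain = Bio::SeqIO->new( -format => 'GCG', -file => "cds.gcg" );
# Print the ID of every sequence in the file.
# The exception shown below is thrown from next_seq().
while ( my $seqobj = $dnain->next_seq() )
{
     print $seqobj->display_id, "\n";
}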
on the file cds.gcg below, and you get:
------------- EXCEPTION  -------------
MSG: Looks like start of another sequence. See documentation.
STACK Bio::SeqIO::gcg::next_seq /usr/local/lib/site_perl/5.8.0//Bio/SeqIO/gcg.pm:124
STACK toplevel gcgtest.pl:10
The problem, I think, is that the SeqIO stream does not recognise the
changeover from one sequence to the next.  Or do I need some record
separator between my sequences?  If this really is a bug as well, I'll
submit it too.
Offending line in gcg.pm is:
  124      if( /\.\.$/ ) {
               $self->throw("Looks like start of another sequence. See documentation. ");
           }
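To see which lines that test can match, here is a small standalone check. It is only a sketch and not part of gcg.pm: the pattern is the one quoted above, the sample lines are copied from cds.gcg, and the helper name divider_lines is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines copied from cds.gcg (a few from each record).
my @lines = (
    '!!NA_SEQUENCE 1.0',
    'KEYWORDS    .',
    'SOURCE      Human herpesvirus 5 . . .',
    'merlin_ul43.cds  Length: 1269  October 27, 2003 15:32  Type: N  Check: 9039  ..',
    '     1251  GGGGCTCTTT GCTCGAAGG',
    '!!NA_SEQUENCE 1.0',
    'merlin_ul45.cds  Length: 2718  October 27, 2003 15:32  Type: N  Check: 4998  ..',
);

# Hypothetical helper: return the lines that end in "..",
# using the same pattern as the test at gcg.pm line 124.
sub divider_lines {
    my @in = @_;
    return grep { /\.\.$/ } @in;
}

my @hits = divider_lines(@lines);
print scalar(@hits), " line(s) end in '..':\n";
print "  $_\n" for @hits;
```

Only the merlin_ul43.cds and merlin_ul45.cds header lines are reported, so the second of them is presumably what trips the exception when both records sit in one file.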
And here is the file cds.gcg, containing two sequences in GCG format:
!!NA_SEQUENCE 1.0
  ASSEMBLE    October 27, 2003 15:32
Symbols:     1 to: 1269  from: merlin.seq /rev   ck: 8363, 55186 to: 56454
LOCUS       MERLIN                235645 bp    DNA     linear   VRL 14-AUG-2003
DEFINITION  Human herpesvirus 5 strain Merlin, complete genome.
ACCESSION   MERLIN
VERSION
KEYWORDS    .
SOURCE      Human herpesvirus 5 . . .
merlin_ul43.cds  Length: 1269  October 27, 2003 15:32  Type: N  Check: 9039  ..
        1  ATGGAGAAAA CGCCGGCGGA GACGACGGCG GTTTCAGCTG GCAACGTGCC
       51  ACGTGACTCA ATTCCGTGTA TAACTAACGT GTCCGCGGAC ACCCGCGGCC
      101  GTACCCGCCC CAGCAGACCA GCCACCGTCC CTCAGCGACG TCCCGCGCGG
      151  ATCGGACACT TTAGGCGGCG CAGCGCCAGC CTTAGCTTTC TTGACTGGCC
      201  GGACGACAGC GTCACAGAGG GCGTTCGGAC GACCTCCGCG TCGGTCGCCG
      251  CCTCCGCGGC CCGTTTCGAC GAAATCCGGC GGCGCCGCCA GAGCATCAAC
      301  GACGAGATGA AGGAACGCAC GCTGGAGGAC GCGCTGGCTG TCGAGCTGGT
      351  CAACGAGACC TTCCGCTGCT CTGTCACCTC CGACGCCCGC AAGGACTTGC
      401  AGAAGCTGGT TCGTCGCGTC AGCGGCACGG TGCTGCGTCT CAGCTGGCCA
      451  AACGGTTGGT TCTTCACCTA CTGCGACCTG TTACGCGTCG GCTACTTTGG
      501  ACATCTCAAT ATTAAAGGTT TGGAGAAGAC CTTCCTGTGC TGCGACAAGT
      551  TCTTGCTGCC GGTGGGCACT GTGAGTCGTT GCGAAGCCAT CGGCCGCCCA
      601  CCGCTACCCG TACTCATCGG CGAGGGCGGT CGCGTCTACG TCTACTCGCC
      651  TGTGGTGGAA TCGCTGTACC TGGTGTCGCG GTCCGGTTTC CGCGGCTTCG
      701  TGCAGGAGGG CCTGCGCAAC TACGCGCCGC TGCGCGAAGA ACTGGGCTAT
      751  GTCCGCTTCG AGACCGGCGG CGACGTGGGT CGCGAGTTCA TGTTGGCGCG
      801  CGACCTGCTG GCCCTGTGGC GCCTGTGCAT GAAGCGCGAG GGTTCTATCT
      851  TCAGCTGGCG AGACGGTAAC GAGGCGCTGA CGACGGTCGT CTTGAACGGG
      901  AGCCAGACTT ACGAGGATCC GGCCCACGGC AACTGGTTAA AAGAGACGTG
      951  CTCGCTGAAC GTGCTGCAGG TATTTGTGGT GCGGGCCGTG CCGGTGGAGT
     1001  CGCAGCAGCG CCTGGACATC TCCATACTGG TGAACGAGAG CGGCGCCGTC
     1051  TTCGGCGTGC ATCCCGATAC GCGGCAGGCG CACTTTCTGG CGCGCGGACT
     1101  CCTGGGCTTC TTTCGCGTCG GGTTCTTGCG GTTCTGCAAC AACTACTGCT
     1151  TCGCCCGCGA CTGTTTTACC CACCCTGAAA GCGTGGCACC CGCTTACCGC
     1201  GCCACCGGCT GTCCCAGAGA ACTGTTTTGT CGTCGTTTGC GCAAAAAGAA
     1251  GGGGCTCTTT GCTCGAAGG
!!NA_SEQUENCE 1.0
  ASSEMBLE    October 27, 2003 15:32
Symbols:     1 to: 2718  from: merlin.seq /rev   ck: 8363, 57946 to: 60663
LOCUS       MERLIN                235645 bp    DNA     linear   VRL 14-AUG-2003
DEFINITION  Human herpesvirus 5 strain Merlin, complete genome.
ACCESSION   MERLIN
VERSION
KEYWORDS    .
SOURCE      Human herpesvirus 5 . . .
merlin_ul45.cds  Length: 2718  October 27, 2003 15:32  Type: N  Check: 4998  ..
        1  ATGAATCCGG CTGACGCGGA CGAGGAACAG CGGGTGTCCT CGGTGCCCGC
       51  ACATCGGTGC CGGCCAGGTA GGATTCCAAG CCGCAGCGCG GAAACCGAGA
      101  CGGAGGAATC GTCGGCAGAG GTCGCCGCTG ATACTATCGG GGGAGATGAC
      151  AGCGAGCTCG AGGAGGGGCC GCTGCCCGGG GGTGACAAGG AAGCGTCCGC
      201  TGGAAATACC AACGTATCGA GCGGTGTAGC ATGTGTAGCG GGTTTTACGA
      251  GTGGTGGCGG CGTCGTCAGT TGGCGTCCCG AGTCGCCGTC TCCCGACGGC
      301  ACGCCGTCTG TGCTGTCGTT GACGCGTGAC AGCGGTCCCG CCGTGCCCAG
      351  TCGCGGTGGA CGCGTGAGTA GCGGTCTGAG CACCTTTAAT CCGGCCGGCG
      401  CGACCAGGAT GGAGCTGGAC AGTGTCGAGG AGGAGGACGA TTTCGGGGCT
      451  TCGCTCTGCA AAGTATCGCC GCCGATACAA GCTATGCGCA TGTTGATGGG
      501  CAAAAAGTGT CATTGTCACG GCTACTGGGG CAAGTTTCGC TTTTGCGGCG
      551  TACAGGAGCC GGCGCGGGAG CTGCCGTCCG ACAGGAACGC GCTGTGGCGC
      601  GAGATGGACA CCGTGTCGCG GCACAGTGCC GGTTTGGGCA GTTTCAGGCT
      651  ATTTCAGCTC ATTATGCGCC ACGGTCCCTG TCTGATTCGT CACTCGCCGC
      701  GTTGCGACCT GCTGTTGGGT CGCTTTTATT TCAAAGCCAA CTGGGCGCGT
      751  GAAAGCCGCA CGCCACTGTG TTACGCTTCG GAGCTGTGCG ATGAGTCGGT
      801  GCGCCGTTTT GTGCTGCGTC ACATGGAGGA TCTACCCAAG CTGGCCGAGG
      851  AGACGGCGCG TTTTGTGGAA TTGGCCGGTT GCTGGGGCTT GTACGCGGCC
      901  ATTTTGTGTT TGGATAAGGT GTGTCGCCAA CTGCACGGAC AGGACGAGAG
      951  CCCGGGCGGC GTGTTTTTGC GCATCGCCGT GGCGTTGACG GCCGCTATCG
     1001  AGAACAGTAG GCACTCGCGC ATCTATCGTT TCCATCTGGA TGCGCGTTTC
     1051  GAGGGCGAGG TGTTGGAATC GGTGTTGAAG CGCTGTCGCG ATGGGCAGCT
     1101  GTCGCTGTCC ACCTTCACCA TGTCTACCGT GGGTTTCGAT CGCGTGCCGC
     1151  AGTACGACTT TCTGATCTCG GCCGACCCTT TCTCGCGTGA CGCCAGTTGG
     1201  GCGGCCATGT GCAAGTGGAT GAGTACCTTG AGTTGCGGCG TTTCTGTGTC
     1251  GGTGAACGTA ACGCGACTTA ACGCCGATGT GAACAGCGTG ATTCGTTGCC
     1301  TGGGGGGATA CTGCGATTTG ATACGCGAGA AGGAGGTGCA TCGACCCGTG
     1351  GTACGTGTGT TTGTGGACAT GTGGGACGTG GCCGCTATCC GCGTGATTAA
     1401  CTTTATTCTC AAAGAAAGCA CGTCGGAGTT GACGGGGGTT TGCTACGCTT
     1451  TCAACGTGCC TAGCGTGTTA ATGAAGCGCT ACCGTGCGCG TGAGCAGCGC
     1501  TACTCGCTGT TTGGGCGGCC TGTCTCCCGG CGGCTCTCGG ACCTGGGTCA
     1551  GGAGTCGGCT TTCGAGAAAG AGTATTCGCG CTGCGAGCAA TCGTGCCCCA
     1601  AGGTGGTCGT GAACACGGAC GATTTTCTGA AAAAGATGTT GCTGTGCGCG
     1651  CTCAAGGGCC GTGCCTCGGT GGTCTTTGTC CATCACGTAG TCAAGTACTC
     1701  GATTATGGCC GACAGCGTGT GCCTGCCGCC GTGCTTGAGT CCCGATATGG
     1751  CGTCGTGCCA CTTTGGCGAG TGTGACATGC CGGTGCAGCG GCTGACGGTG
     1801  AACGTGGCTC GCTGCGTGTT TGCGCGTAGC GACGAGCAGA AGCTGCATCT
     1851  ACCCGACGTG GTTTTGGGGA ACACGCGACG TTACTTTGAT TTGAGCGTGC
     1901  TGCGCGAGTT GGTGACCGAG GCGGTGGTTT GGGGCAACGC GCGCTTGGAC
     1951  GCGCTAATGT CGGCGTCCGA ATGGTGGGTA GAGAGCGCGC TGGAAAAACT
     2001  GCGTCCGCTG CACATCGGCG TGGCTGGCTT GCACACGGCG CTCATGCGGT
     2051  TAGGGTTCAC GTACTTTGCC TCTTGGGACT TGATCGAGCG CATCTTTGAG
     2101  CACATGTACT TTGCCGCGGT GCGCGCTAGC GTCGATTTGT GCAAGTCGGG
     2151  TTTGCCGCGC TGCGAGTGGT TCGAACGCAC CATCTATCAA GAGGGCAAAT
     2201  TCATTTTCGA ATTGTATCGG TTGCCGCGGC TCTCCATCGC CAGCGCGCGC
     2251  TGGGAAGCGC TGCGCGCCGA CATGCTCGAG TTCGGATTGC GCAACTGTCA
     2301  GTTTCTGGCG GTGGGTCCCG ACGACGAGGT GGCGCATCTG TGGGGCGTGA
     2351  CGCCGTCAGT GTGGGCTTCG CGCGGCACCG TGTTCGAGGA GGAGACGGTG
     2401  TGGTCATTGT GCCCGCCCAA CCGTGAGTGT TACTTCCCCA CCGTGGTGCG
     2451  GAGGCCGCTG CGCGTGCCCG TGGTGAATTA CGCGTGGTTG GAGCAGCACC
     2501  AGGAGGAGGG CAAGGCGACG CAGTGTCTGT TCCAGGCGGC ACCGGCGATC
     2551  CAAAACGACG TGGAAATGGC GGCCGTGAAC CTGAGCGTGT TTGTGGACCA
     2601  GTGCGTGGCC CTGGTTTTCT ACTATGACTC GGGGATGACG CCCGACGTGC
     2651  TTCTGGCCAG GATGCTCAAG TGGTACCACT GGCGCTTTAA GGTCGGAGTA
     2701  TATAAGTACT GTGCCTCT
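
If it turns out that gcg.pm expects one sequence per file, a possible stopgap is to split the file into single-record files first. The following is a minimal sketch and makes two assumptions: that every record starts with a "!!" line such as "!!NA_SEQUENCE 1.0" (true of both records above), and that Bio::SeqIO then reads each single-record file without complaint, which I have not verified. The script name splitgcg.pl and the function name split_gcg_records are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the text of a multi-record GCG file into one string per record.
# Assumption: each record begins with a line starting with "!!".
sub split_gcg_records {
    my ($text) = @_;
    # Zero-width split: break just before every line that starts with "!!".
    my @records = split /^(?=!!)/m, $text;
    # Discard chunks that contain only whitespace.
    return grep { /\S/ } @records;
}

# Usage: perl splitgcg.pl cds.gcg
# Writes cds.gcg.1, cds.gcg.2, ... next to the input file.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;

    my $n = 0;
    for my $record ( split_gcg_records($text) ) {
        my $out_name = $file . '.' . ++$n;
        open my $out, '>', $out_name or die "Cannot write $out_name: $!";
        print {$out} $record;
        close $out or die "Cannot close $out_name: $!";
        print "wrote $out_name\n";
    }
}
```

Each output file could then be opened with the same Bio::SeqIO->new( -format => 'GCG', -file => ... ) call as in the test script.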
    
    