[Bioperl-l] gcg.pm, another one
Hilmar Lapp
hlapp at gnf.org
Fri Oct 31 14:10:08 EST 2003
The SeqIO gcg parser was written for single-sequence GCG files, so a file holding
several entries, like your cds.gcg, triggers exactly that exception.
In fact, I don't think the parser currently has a maintainer, so anyone willing
to take on that role is welcome. GCG has been a commercial package for many
years, and you won't easily find people among the core developers who actively
use it; that is why there is no maintainer.
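Until someone takes this on, one workaround is to split the file into single-entry chunks and hand each chunk to the parser separately. Below is a minimal sketch, assuming every entry starts with a !!NA_SEQUENCE or !!AA_SEQUENCE line as in your cds.gcg; split_gcg_entries and parse_gcg_entries are hypothetical helpers, not BioPerl functions. Each chunk is read through an in-memory filehandle passed as -fh to Bio::SeqIO->new, which needs Perl 5.8 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split the text of a
# multi-entry GCG file into one chunk per entry. Assumes each entry
# begins with a "!!NA_SEQUENCE" or "!!AA_SEQUENCE" header line.
sub split_gcg_entries {
    my ($text) = @_;
    my @entries;
    for my $line (split /^/m, $text) {
        # A header line (or any text before the first header) opens a new chunk
        push @entries, '' if $line =~ /^!!(?:NA|AA)_SEQUENCE/ || !@entries;
        $entries[-1] .= $line;
    }
    return @entries;
}

# Hypothetical helper: parse every entry with the stock gcg parser by
# reading each chunk through an in-memory filehandle (Perl 5.8+).
sub parse_gcg_entries {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded only here; needs BioPerl installed
    open my $in, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$in> };
    close $in;

    my @seqs;
    for my $entry (split_gcg_entries($text)) {
        open my $fh, '<', \$entry or die "Cannot open in-memory handle: $!";
        my $seqio = Bio::SeqIO->new(-fh => $fh, -format => 'gcg');
        while (my $seq = $seqio->next_seq) {
            push @seqs, $seq;
        }
    }
    return @seqs;
}

# Usage: perl splitgcg.pl cds.gcg
if (@ARGV) {
    print $_->display_id, "\n" for parse_gcg_entries($ARGV[0]);
}
```

Save it as, say, splitgcg.pl (a hypothetical name) and run perl splitgcg.pl cds.gcg. BioPerl is only loaded when a file name is given, so the splitting step can be tried on its own.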
-hilmar
On 10/31/03 7:40 AM, "Derek Gatherer" <d.gatherer at vir.gla.ac.uk> wrote:
> Hello again
>
> I've submitted the last bug to Bugzilla. The following may also be a bug, but I wonder
> whether the problem is my GCG format.
>
> Try this script:
>
> #!/usr/bin/perl -w
>
> use lib "/usr/local/lib/site_perl/5.8.0/";
> use strict;
> use Bio::Seq;
> use Bio::SeqIO;
>
> my $dnain = Bio::SeqIO->new( '-format' => 'GCG' , -file => "cds.gcg");
>
> while((my $seqobj = $dnain->next_seq()))
> {
> print $seqobj->display_id, "\n";
> }
>
> Run it on the file cds.gcg below and you get:
>
> ------------- EXCEPTION -------------
> MSG: Looks like start of another sequence. See documentation.
> STACK Bio::SeqIO::gcg::next_seq
> /usr/local/lib/site_perl/5.8.0//Bio/SeqIO/gcg.pm:124
> STACK toplevel gcgtest.pl:10
>
> I think the problem is that the SeqIO stream doesn't recognise the changeover
> from one sequence to the next. Or do I need some record separator between my
> sequences? If this really is a bug too, I'll submit it as well.
>
> The offending code in gcg.pm (line 124) is:
>
> if( /\.\.$/ ) {
> $self->throw("Looks like start of another sequence. See documentation. ");
> }
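The failure follows from the parser's two-phase loop. The sketch below is a simplified mimic, not the actual gcg.pm code, and assumes (as the exception suggests) that the sequence loop reads to end of file: lines are skipped up to the first one ending in "..", everything after that is read as sequence, and the next ".." line, which ends the second entry's header, raises the exception. That is also why no record separator between entries would help. mimic_gcg_next_seq is a hypothetical name, and it takes the file's lines already chomped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified, hypothetical mimic of the two-phase loop in
# Bio::SeqIO::gcg::next_seq (not the real code). Takes chomped lines.
sub mimic_gcg_next_seq {
    my @lines = @_;
    my $i = 0;

    # Phase 1: skip header lines up to the first line ending in ".."
    $i++ while $i < @lines && $lines[$i] !~ /\.\.$/;
    return if $i >= @lines;    # no entry found

    # Phase 2: every later line is read as sequence, up to end of input
    my $seq = '';
    for my $line (@lines[$i + 1 .. $#lines]) {
        # A second ".." line ends the next entry's header: give up here
        die "Looks like start of another sequence.\n" if $line =~ /\.\.$/;
        (my $residues = $line) =~ s/[\d\s]//g;    # drop coordinates and spaces
        $seq .= $residues;
    }
    return $seq;
}

my @one = ('!!NA_SEQUENCE 1.0',
           'a.cds Length: 4 Type: N Check: 1 ..',
           '',
           '       1 ACGT');
print mimic_gcg_next_seq(@one), "\n";    # prints ACGT

my @two = (@one, '!!NA_SEQUENCE 1.0',
           'b.cds Length: 4 Type: N Check: 2 ..',
           '',
           '       1 TTTT');
eval { mimic_gcg_next_seq(@two) };
print "error: $@";    # prints "error: Looks like start of another sequence."
```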
>
> And here is the file cds.gcg, containing two sequences in GCG format:
>
> !!NA_SEQUENCE 1.0
> ASSEMBLE October 27, 2003 15:32
>
> Symbols: 1 to: 1269 from: merlin.seq /rev ck: 8363, 55186 to: 56454
>
> LOCUS MERLIN 235645 bp DNA linear VRL 14-AUG-2003
> DEFINITION Human herpesvirus 5 strain Merlin, complete genome.
> ACCESSION MERLIN
> VERSION
> KEYWORDS .
> SOURCE Human herpesvirus 5 . . .
>
> merlin_ul43.cds Length: 1269 October 27, 2003 15:32 Type: N Check: 9039 ..
>
> 1 ATGGAGAAAA CGCCGGCGGA GACGACGGCG GTTTCAGCTG GCAACGTGCC
>
> 51 ACGTGACTCA ATTCCGTGTA TAACTAACGT GTCCGCGGAC ACCCGCGGCC
>
> 101 GTACCCGCCC CAGCAGACCA GCCACCGTCC CTCAGCGACG TCCCGCGCGG
>
> 151 ATCGGACACT TTAGGCGGCG CAGCGCCAGC CTTAGCTTTC TTGACTGGCC
>
> 201 GGACGACAGC GTCACAGAGG GCGTTCGGAC GACCTCCGCG TCGGTCGCCG
>
> 251 CCTCCGCGGC CCGTTTCGAC GAAATCCGGC GGCGCCGCCA GAGCATCAAC
>
> 301 GACGAGATGA AGGAACGCAC GCTGGAGGAC GCGCTGGCTG TCGAGCTGGT
>
> 351 CAACGAGACC TTCCGCTGCT CTGTCACCTC CGACGCCCGC AAGGACTTGC
>
> 401 AGAAGCTGGT TCGTCGCGTC AGCGGCACGG TGCTGCGTCT CAGCTGGCCA
>
> 451 AACGGTTGGT TCTTCACCTA CTGCGACCTG TTACGCGTCG GCTACTTTGG
>
> 501 ACATCTCAAT ATTAAAGGTT TGGAGAAGAC CTTCCTGTGC TGCGACAAGT
>
> 551 TCTTGCTGCC GGTGGGCACT GTGAGTCGTT GCGAAGCCAT CGGCCGCCCA
>
> 601 CCGCTACCCG TACTCATCGG CGAGGGCGGT CGCGTCTACG TCTACTCGCC
>
> 651 TGTGGTGGAA TCGCTGTACC TGGTGTCGCG GTCCGGTTTC CGCGGCTTCG
>
> 701 TGCAGGAGGG CCTGCGCAAC TACGCGCCGC TGCGCGAAGA ACTGGGCTAT
>
> 751 GTCCGCTTCG AGACCGGCGG CGACGTGGGT CGCGAGTTCA TGTTGGCGCG
>
> 801 CGACCTGCTG GCCCTGTGGC GCCTGTGCAT GAAGCGCGAG GGTTCTATCT
>
> 851 TCAGCTGGCG AGACGGTAAC GAGGCGCTGA CGACGGTCGT CTTGAACGGG
>
> 901 AGCCAGACTT ACGAGGATCC GGCCCACGGC AACTGGTTAA AAGAGACGTG
>
> 951 CTCGCTGAAC GTGCTGCAGG TATTTGTGGT GCGGGCCGTG CCGGTGGAGT
>
> 1001 CGCAGCAGCG CCTGGACATC TCCATACTGG TGAACGAGAG CGGCGCCGTC
>
> 1051 TTCGGCGTGC ATCCCGATAC GCGGCAGGCG CACTTTCTGG CGCGCGGACT
>
> 1101 CCTGGGCTTC TTTCGCGTCG GGTTCTTGCG GTTCTGCAAC AACTACTGCT
>
> 1151 TCGCCCGCGA CTGTTTTACC CACCCTGAAA GCGTGGCACC CGCTTACCGC
>
> 1201 GCCACCGGCT GTCCCAGAGA ACTGTTTTGT CGTCGTTTGC GCAAAAAGAA
>
> 1251 GGGGCTCTTT GCTCGAAGG
>
> !!NA_SEQUENCE 1.0
> ASSEMBLE October 27, 2003 15:32
>
> Symbols: 1 to: 2718 from: merlin.seq /rev ck: 8363, 57946 to: 60663
>
> LOCUS MERLIN 235645 bp DNA linear VRL 14-AUG-2003
> DEFINITION Human herpesvirus 5 strain Merlin, complete genome.
> ACCESSION MERLIN
> VERSION
> KEYWORDS .
> SOURCE Human herpesvirus 5 . . .
>
> merlin_ul45.cds Length: 2718 October 27, 2003 15:32 Type: N Check: 4998 ..
>
> 1 ATGAATCCGG CTGACGCGGA CGAGGAACAG CGGGTGTCCT CGGTGCCCGC
>
> 51 ACATCGGTGC CGGCCAGGTA GGATTCCAAG CCGCAGCGCG GAAACCGAGA
>
> 101 CGGAGGAATC GTCGGCAGAG GTCGCCGCTG ATACTATCGG GGGAGATGAC
>
> 151 AGCGAGCTCG AGGAGGGGCC GCTGCCCGGG GGTGACAAGG AAGCGTCCGC
>
> 201 TGGAAATACC AACGTATCGA GCGGTGTAGC ATGTGTAGCG GGTTTTACGA
>
> 251 GTGGTGGCGG CGTCGTCAGT TGGCGTCCCG AGTCGCCGTC TCCCGACGGC
>
> 301 ACGCCGTCTG TGCTGTCGTT GACGCGTGAC AGCGGTCCCG CCGTGCCCAG
>
> 351 TCGCGGTGGA CGCGTGAGTA GCGGTCTGAG CACCTTTAAT CCGGCCGGCG
>
> 401 CGACCAGGAT GGAGCTGGAC AGTGTCGAGG AGGAGGACGA TTTCGGGGCT
>
> 451 TCGCTCTGCA AAGTATCGCC GCCGATACAA GCTATGCGCA TGTTGATGGG
>
> 501 CAAAAAGTGT CATTGTCACG GCTACTGGGG CAAGTTTCGC TTTTGCGGCG
>
> 551 TACAGGAGCC GGCGCGGGAG CTGCCGTCCG ACAGGAACGC GCTGTGGCGC
>
> 601 GAGATGGACA CCGTGTCGCG GCACAGTGCC GGTTTGGGCA GTTTCAGGCT
>
> 651 ATTTCAGCTC ATTATGCGCC ACGGTCCCTG TCTGATTCGT CACTCGCCGC
>
> 701 GTTGCGACCT GCTGTTGGGT CGCTTTTATT TCAAAGCCAA CTGGGCGCGT
>
> 751 GAAAGCCGCA CGCCACTGTG TTACGCTTCG GAGCTGTGCG ATGAGTCGGT
>
> 801 GCGCCGTTTT GTGCTGCGTC ACATGGAGGA TCTACCCAAG CTGGCCGAGG
>
> 851 AGACGGCGCG TTTTGTGGAA TTGGCCGGTT GCTGGGGCTT GTACGCGGCC
>
> 901 ATTTTGTGTT TGGATAAGGT GTGTCGCCAA CTGCACGGAC AGGACGAGAG
>
> 951 CCCGGGCGGC GTGTTTTTGC GCATCGCCGT GGCGTTGACG GCCGCTATCG
>
> 1001 AGAACAGTAG GCACTCGCGC ATCTATCGTT TCCATCTGGA TGCGCGTTTC
>
> 1051 GAGGGCGAGG TGTTGGAATC GGTGTTGAAG CGCTGTCGCG ATGGGCAGCT
>
> 1101 GTCGCTGTCC ACCTTCACCA TGTCTACCGT GGGTTTCGAT CGCGTGCCGC
>
> 1151 AGTACGACTT TCTGATCTCG GCCGACCCTT TCTCGCGTGA CGCCAGTTGG
>
> 1201 GCGGCCATGT GCAAGTGGAT GAGTACCTTG AGTTGCGGCG TTTCTGTGTC
>
> 1251 GGTGAACGTA ACGCGACTTA ACGCCGATGT GAACAGCGTG ATTCGTTGCC
>
> 1301 TGGGGGGATA CTGCGATTTG ATACGCGAGA AGGAGGTGCA TCGACCCGTG
>
> 1351 GTACGTGTGT TTGTGGACAT GTGGGACGTG GCCGCTATCC GCGTGATTAA
>
> 1401 CTTTATTCTC AAAGAAAGCA CGTCGGAGTT GACGGGGGTT TGCTACGCTT
>
> 1451 TCAACGTGCC TAGCGTGTTA ATGAAGCGCT ACCGTGCGCG TGAGCAGCGC
>
> 1501 TACTCGCTGT TTGGGCGGCC TGTCTCCCGG CGGCTCTCGG ACCTGGGTCA
>
> 1551 GGAGTCGGCT TTCGAGAAAG AGTATTCGCG CTGCGAGCAA TCGTGCCCCA
>
> 1601 AGGTGGTCGT GAACACGGAC GATTTTCTGA AAAAGATGTT GCTGTGCGCG
>
> 1651 CTCAAGGGCC GTGCCTCGGT GGTCTTTGTC CATCACGTAG TCAAGTACTC
>
> 1701 GATTATGGCC GACAGCGTGT GCCTGCCGCC GTGCTTGAGT CCCGATATGG
>
> 1751 CGTCGTGCCA CTTTGGCGAG TGTGACATGC CGGTGCAGCG GCTGACGGTG
>
> 1801 AACGTGGCTC GCTGCGTGTT TGCGCGTAGC GACGAGCAGA AGCTGCATCT
>
> 1851 ACCCGACGTG GTTTTGGGGA ACACGCGACG TTACTTTGAT TTGAGCGTGC
>
> 1901 TGCGCGAGTT GGTGACCGAG GCGGTGGTTT GGGGCAACGC GCGCTTGGAC
>
> 1951 GCGCTAATGT CGGCGTCCGA ATGGTGGGTA GAGAGCGCGC TGGAAAAACT
>
> 2001 GCGTCCGCTG CACATCGGCG TGGCTGGCTT GCACACGGCG CTCATGCGGT
>
> 2051 TAGGGTTCAC GTACTTTGCC TCTTGGGACT TGATCGAGCG CATCTTTGAG
>
> 2101 CACATGTACT TTGCCGCGGT GCGCGCTAGC GTCGATTTGT GCAAGTCGGG
>
> 2151 TTTGCCGCGC TGCGAGTGGT TCGAACGCAC CATCTATCAA GAGGGCAAAT
>
> 2201 TCATTTTCGA ATTGTATCGG TTGCCGCGGC TCTCCATCGC CAGCGCGCGC
>
> 2251 TGGGAAGCGC TGCGCGCCGA CATGCTCGAG TTCGGATTGC GCAACTGTCA
>
> 2301 GTTTCTGGCG GTGGGTCCCG ACGACGAGGT GGCGCATCTG TGGGGCGTGA
>
> 2351 CGCCGTCAGT GTGGGCTTCG CGCGGCACCG TGTTCGAGGA GGAGACGGTG
>
> 2401 TGGTCATTGT GCCCGCCCAA CCGTGAGTGT TACTTCCCCA CCGTGGTGCG
>
> 2451 GAGGCCGCTG CGCGTGCCCG TGGTGAATTA CGCGTGGTTG GAGCAGCACC
>
> 2501 AGGAGGAGGG CAAGGCGACG CAGTGTCTGT TCCAGGCGGC ACCGGCGATC
>
> 2551 CAAAACGACG TGGAAATGGC GGCCGTGAAC CTGAGCGTGT TTGTGGACCA
>
> 2601 GTGCGTGGCC CTGGTTTTCT ACTATGACTC GGGGATGACG CCCGACGTGC
>
> 2651 TTCTGGCCAG GATGCTCAAG TGGTACCACT GGCGCTTTAA GGTCGGAGTA
>
> 2701 TATAAGTACT GTGCCTCT
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------