[Bioperl-l] gcg.pm, another one
Derek Gatherer
d.gatherer at vir.gla.ac.uk
Fri Oct 31 10:40:25 EST 2003
Hello again
Last bug submitted to bugzilla. The following may be a bug, but I wonder
if there is a problem with my GCG format.
Try this script:
#!/usr/bin/perl -w
use lib "/usr/local/lib/site_perl/5.8.0/";
use strict;
use Bio::Seq;
use Bio::SeqIO;
my $dnain = Bio::SeqIO->new( '-format' => 'GCG' , -file => "cds.gcg");
while((my $seqobj = $dnain->next_seq()))
{
$seqobj->display_id;
}
on the file cds.gcg below.
to get:
------------- EXCEPTION -------------
MSG: Looks like start of another sequence. See documentation.
STACK Bio::SeqIO::gcg::next_seq
/usr/local/lib/site_perl/5.8.0//Bio/SeqIO/gcg.pm:124
STACK toplevel gcgtest.pl:10
The problem, I think, is that the SeqIO stream doesn't seem to recognise
the change over from one sequence to another. Or do I need some record
separator between my sequences??? If this really is also a bug, I'll
submit it too.
Offending line in gcg.pm is:
124 if( /\.\.$/ ) {
$self->throw("Looks like start of another sequence. See
documentation. ");
}
and here's the file cds.gcg, containing two sequences in GCG format
!!NA_SEQUENCE 1.0
ASSEMBLE October 27, 2003 15:32
Symbols: 1 to: 1269 from: merlin.seq /rev ck: 8363, 55186 to: 56454
LOCUS MERLIN 235645 bp DNA linear VRL 14-AUG-2003
DEFINITION Human herpesvirus 5 strain Merlin, complete genome.
ACCESSION MERLIN
VERSION
KEYWORDS .
SOURCE Human herpesvirus 5 . . .
merlin_ul43.cds Length: 1269 October 27, 2003 15:32 Type: N Check: 9039 ..
1 ATGGAGAAAA CGCCGGCGGA GACGACGGCG GTTTCAGCTG GCAACGTGCC
51 ACGTGACTCA ATTCCGTGTA TAACTAACGT GTCCGCGGAC ACCCGCGGCC
101 GTACCCGCCC CAGCAGACCA GCCACCGTCC CTCAGCGACG TCCCGCGCGG
151 ATCGGACACT TTAGGCGGCG CAGCGCCAGC CTTAGCTTTC TTGACTGGCC
201 GGACGACAGC GTCACAGAGG GCGTTCGGAC GACCTCCGCG TCGGTCGCCG
251 CCTCCGCGGC CCGTTTCGAC GAAATCCGGC GGCGCCGCCA GAGCATCAAC
301 GACGAGATGA AGGAACGCAC GCTGGAGGAC GCGCTGGCTG TCGAGCTGGT
351 CAACGAGACC TTCCGCTGCT CTGTCACCTC CGACGCCCGC AAGGACTTGC
401 AGAAGCTGGT TCGTCGCGTC AGCGGCACGG TGCTGCGTCT CAGCTGGCCA
451 AACGGTTGGT TCTTCACCTA CTGCGACCTG TTACGCGTCG GCTACTTTGG
501 ACATCTCAAT ATTAAAGGTT TGGAGAAGAC CTTCCTGTGC TGCGACAAGT
551 TCTTGCTGCC GGTGGGCACT GTGAGTCGTT GCGAAGCCAT CGGCCGCCCA
601 CCGCTACCCG TACTCATCGG CGAGGGCGGT CGCGTCTACG TCTACTCGCC
651 TGTGGTGGAA TCGCTGTACC TGGTGTCGCG GTCCGGTTTC CGCGGCTTCG
701 TGCAGGAGGG CCTGCGCAAC TACGCGCCGC TGCGCGAAGA ACTGGGCTAT
751 GTCCGCTTCG AGACCGGCGG CGACGTGGGT CGCGAGTTCA TGTTGGCGCG
801 CGACCTGCTG GCCCTGTGGC GCCTGTGCAT GAAGCGCGAG GGTTCTATCT
851 TCAGCTGGCG AGACGGTAAC GAGGCGCTGA CGACGGTCGT CTTGAACGGG
901 AGCCAGACTT ACGAGGATCC GGCCCACGGC AACTGGTTAA AAGAGACGTG
951 CTCGCTGAAC GTGCTGCAGG TATTTGTGGT GCGGGCCGTG CCGGTGGAGT
1001 CGCAGCAGCG CCTGGACATC TCCATACTGG TGAACGAGAG CGGCGCCGTC
1051 TTCGGCGTGC ATCCCGATAC GCGGCAGGCG CACTTTCTGG CGCGCGGACT
1101 CCTGGGCTTC TTTCGCGTCG GGTTCTTGCG GTTCTGCAAC AACTACTGCT
1151 TCGCCCGCGA CTGTTTTACC CACCCTGAAA GCGTGGCACC CGCTTACCGC
1201 GCCACCGGCT GTCCCAGAGA ACTGTTTTGT CGTCGTTTGC GCAAAAAGAA
1251 GGGGCTCTTT GCTCGAAGG
!!NA_SEQUENCE 1.0
ASSEMBLE October 27, 2003 15:32
Symbols: 1 to: 2718 from: merlin.seq /rev ck: 8363, 57946 to: 60663
LOCUS MERLIN 235645 bp DNA linear VRL 14-AUG-2003
DEFINITION Human herpesvirus 5 strain Merlin, complete genome.
ACCESSION MERLIN
VERSION
KEYWORDS .
SOURCE Human herpesvirus 5 . . .
merlin_ul45.cds Length: 2718 October 27, 2003 15:32 Type: N Check: 4998 ..
1 ATGAATCCGG CTGACGCGGA CGAGGAACAG CGGGTGTCCT CGGTGCCCGC
51 ACATCGGTGC CGGCCAGGTA GGATTCCAAG CCGCAGCGCG GAAACCGAGA
101 CGGAGGAATC GTCGGCAGAG GTCGCCGCTG ATACTATCGG GGGAGATGAC
151 AGCGAGCTCG AGGAGGGGCC GCTGCCCGGG GGTGACAAGG AAGCGTCCGC
201 TGGAAATACC AACGTATCGA GCGGTGTAGC ATGTGTAGCG GGTTTTACGA
251 GTGGTGGCGG CGTCGTCAGT TGGCGTCCCG AGTCGCCGTC TCCCGACGGC
301 ACGCCGTCTG TGCTGTCGTT GACGCGTGAC AGCGGTCCCG CCGTGCCCAG
351 TCGCGGTGGA CGCGTGAGTA GCGGTCTGAG CACCTTTAAT CCGGCCGGCG
401 CGACCAGGAT GGAGCTGGAC AGTGTCGAGG AGGAGGACGA TTTCGGGGCT
451 TCGCTCTGCA AAGTATCGCC GCCGATACAA GCTATGCGCA TGTTGATGGG
501 CAAAAAGTGT CATTGTCACG GCTACTGGGG CAAGTTTCGC TTTTGCGGCG
551 TACAGGAGCC GGCGCGGGAG CTGCCGTCCG ACAGGAACGC GCTGTGGCGC
601 GAGATGGACA CCGTGTCGCG GCACAGTGCC GGTTTGGGCA GTTTCAGGCT
651 ATTTCAGCTC ATTATGCGCC ACGGTCCCTG TCTGATTCGT CACTCGCCGC
701 GTTGCGACCT GCTGTTGGGT CGCTTTTATT TCAAAGCCAA CTGGGCGCGT
751 GAAAGCCGCA CGCCACTGTG TTACGCTTCG GAGCTGTGCG ATGAGTCGGT
801 GCGCCGTTTT GTGCTGCGTC ACATGGAGGA TCTACCCAAG CTGGCCGAGG
851 AGACGGCGCG TTTTGTGGAA TTGGCCGGTT GCTGGGGCTT GTACGCGGCC
901 ATTTTGTGTT TGGATAAGGT GTGTCGCCAA CTGCACGGAC AGGACGAGAG
951 CCCGGGCGGC GTGTTTTTGC GCATCGCCGT GGCGTTGACG GCCGCTATCG
1001 AGAACAGTAG GCACTCGCGC ATCTATCGTT TCCATCTGGA TGCGCGTTTC
1051 GAGGGCGAGG TGTTGGAATC GGTGTTGAAG CGCTGTCGCG ATGGGCAGCT
1101 GTCGCTGTCC ACCTTCACCA TGTCTACCGT GGGTTTCGAT CGCGTGCCGC
1151 AGTACGACTT TCTGATCTCG GCCGACCCTT TCTCGCGTGA CGCCAGTTGG
1201 GCGGCCATGT GCAAGTGGAT GAGTACCTTG AGTTGCGGCG TTTCTGTGTC
1251 GGTGAACGTA ACGCGACTTA ACGCCGATGT GAACAGCGTG ATTCGTTGCC
1301 TGGGGGGATA CTGCGATTTG ATACGCGAGA AGGAGGTGCA TCGACCCGTG
1351 GTACGTGTGT TTGTGGACAT GTGGGACGTG GCCGCTATCC GCGTGATTAA
1401 CTTTATTCTC AAAGAAAGCA CGTCGGAGTT GACGGGGGTT TGCTACGCTT
1451 TCAACGTGCC TAGCGTGTTA ATGAAGCGCT ACCGTGCGCG TGAGCAGCGC
1501 TACTCGCTGT TTGGGCGGCC TGTCTCCCGG CGGCTCTCGG ACCTGGGTCA
1551 GGAGTCGGCT TTCGAGAAAG AGTATTCGCG CTGCGAGCAA TCGTGCCCCA
1601 AGGTGGTCGT GAACACGGAC GATTTTCTGA AAAAGATGTT GCTGTGCGCG
1651 CTCAAGGGCC GTGCCTCGGT GGTCTTTGTC CATCACGTAG TCAAGTACTC
1701 GATTATGGCC GACAGCGTGT GCCTGCCGCC GTGCTTGAGT CCCGATATGG
1751 CGTCGTGCCA CTTTGGCGAG TGTGACATGC CGGTGCAGCG GCTGACGGTG
1801 AACGTGGCTC GCTGCGTGTT TGCGCGTAGC GACGAGCAGA AGCTGCATCT
1851 ACCCGACGTG GTTTTGGGGA ACACGCGACG TTACTTTGAT TTGAGCGTGC
1901 TGCGCGAGTT GGTGACCGAG GCGGTGGTTT GGGGCAACGC GCGCTTGGAC
1951 GCGCTAATGT CGGCGTCCGA ATGGTGGGTA GAGAGCGCGC TGGAAAAACT
2001 GCGTCCGCTG CACATCGGCG TGGCTGGCTT GCACACGGCG CTCATGCGGT
2051 TAGGGTTCAC GTACTTTGCC TCTTGGGACT TGATCGAGCG CATCTTTGAG
2101 CACATGTACT TTGCCGCGGT GCGCGCTAGC GTCGATTTGT GCAAGTCGGG
2151 TTTGCCGCGC TGCGAGTGGT TCGAACGCAC CATCTATCAA GAGGGCAAAT
2201 TCATTTTCGA ATTGTATCGG TTGCCGCGGC TCTCCATCGC CAGCGCGCGC
2251 TGGGAAGCGC TGCGCGCCGA CATGCTCGAG TTCGGATTGC GCAACTGTCA
2301 GTTTCTGGCG GTGGGTCCCG ACGACGAGGT GGCGCATCTG TGGGGCGTGA
2351 CGCCGTCAGT GTGGGCTTCG CGCGGCACCG TGTTCGAGGA GGAGACGGTG
2401 TGGTCATTGT GCCCGCCCAA CCGTGAGTGT TACTTCCCCA CCGTGGTGCG
2451 GAGGCCGCTG CGCGTGCCCG TGGTGAATTA CGCGTGGTTG GAGCAGCACC
2501 AGGAGGAGGG CAAGGCGACG CAGTGTCTGT TCCAGGCGGC ACCGGCGATC
2551 CAAAACGACG TGGAAATGGC GGCCGTGAAC CTGAGCGTGT TTGTGGACCA
2601 GTGCGTGGCC CTGGTTTTCT ACTATGACTC GGGGATGACG CCCGACGTGC
2651 TTCTGGCCAG GATGCTCAAG TGGTACCACT GGCGCTTTAA GGTCGGAGTA
2701 TATAAGTACT GTGCCTCT
More information about the Bioperl-l
mailing list