[Bioperl-l] Bug in GCG SeqIO Formatting?
Tex Thompson
tex at biosysadmin.com
Mon Feb 16 17:49:04 EST 2004

Hello Mailing List,

I have a user reporting that the following script, which should convert his
GCG-formatted sequence files to FASTA, isn't working:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Read the GCG-formatted input file.
my $io  = Bio::SeqIO->new( -file => "af317472.gbpln3", -format => "gcg" );
# Write each sequence to STDOUT as FASTA.
my $out = Bio::SeqIO->new( -fh => \*STDOUT, -format => "fasta" );

while ( my $seq = $io->next_seq ) {
    $out->write_seq($seq);
}

Here's an example sequence file:
!!NA_SEQUENCE 1.0
LOCUS       AF317472                2679 bp    DNA     linear   PLN 07-DEC-2000
DEFINITION  Candida albicans cAMP-dependent protein kinase regulatory subunit
            (PKA-R) gene, complete cds.
ACCESSION   AF317472
VERSION     AF317472.1  GI:11596392
KEYWORDS    .
SOURCE      Candida albicans
  ORGANISM  Candida albicans
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; mitosporic Saccharomycetales; Candida.
REFERENCE   1  (bases 1 to 2679)
  AUTHORS   Giasson,L. and Parrot,M.
  TITLE     Sequence of the Candida albicans cAMP-dependent protein kinase
            regulatory subunit
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 2679)
  AUTHORS   Giasson,L. and Parrot,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (27-OCT-2000) School of Dentistry, Laval University,
            GREB, Ste-Foy, Quebec G1K 7P4, Canada
FEATURES             Location/Qualifiers
     source          1. .2679
                     /organism="Candida albicans"
                     /mol_type="genomic DNA"
                     /strain="CAI4"
                     /db_xref="taxon:5476"
     gene            <977. .>2356
                     /gene="PKA-R"
     mRNA            <977. .>2356
                     /gene="PKA-R"
                     /product="cAMP-dependent protein kinase regulatory
                     subunit"
     CDS             977. .2356
                     /gene="PKA-R"
                     /codon_start=1
                     /transl_table=12
                     /product="cAMP-dependent protein kinase regulatory
                     subunit"
                     /protein_id="AAG38599.1"
                     /db_xref="GI:11596393"
                     /translation="MSNPQQQFISDELSQLQKEIISKNPQDVLQFCANYFNTKLQAQR
                     SELWSQQAKAEAAGIDLFPSVDHVNVNSSGVSIVNDRQPSFKSPFGVNDPHSNHDEDP
                     HAKDTKTDTAAAAVGGGIFKSNFDVKKSASNPPTKEVDPDDPSKPSSSSQPNQQSASA
                     SSKTPSSKIPVAFNANRRTSVSAEALNPAKLKLDSWKPPVNNLSITEEETLANNLKNN
                     FLFKQLDANSKKTVIAALQQKSFAKDTVIIQQGDEGDFFYIIETGTVDFYVNDAKVSS
                     SSEGSSFGELALMYNSPRAATAVAATDVVCWALDRLTFRRILLEGTFNKRLMYEDFLK
                     DIEVLKSLSDHARSKLADALSTEMYHKGDKIVTEGEQGENFYLIESGNCQVYNEKLGN
                     IKQLTKGDYFGELALIKDLPRQATVEALDNVIVATLGKSGFQRLLGPVVEVLKEQDPT
                     KSQDPTAGH"
ORIGIN
AF317472  Length: 2679  February 16, 2004 17:02  Type: N  Check: 9369  ..
       1  GAATTCAAAA AATCAAAAAA ATCAAAAAAA AACCGTGGAA GGTAAGTTGT 
      51  ATATTTATAA ATCAACGTGA ATAATTTTCA ACACTGTGTC AACATCTGTG 
     101  AAAAAAACCT GTGTGTACTG CATATAGGAC CTCACCTATT ACGTAGAATA 
     151  TACTAGAAAT AGTTACAACC ATAAAAAGAT TAATTGTGCT TACGTGGCAA 
     201  CTTTGAGATT TTTCTTTTTT CTGTTTCTTT CTTTCTTTTT TTGGCTTAAA 
     251  CAACAAATGT CGCAAATTAT ACAAACGACA TTTGCTGCCC ATGTCATTTT 
     301  GTCGTTATCA CGTGAAGTGT CGCAGATTTA TGTATTCTCA CTTCATTTCT 
     351  ATGGTCATCA ATTGTTCATT CATTCTCTAT CTTCAAAAAT CTGTGATTTG 
     401  ATGATTTTGA TTAAAAGAAA GCAAAGAGAA TACTGAAAAA AAGCAAAGAG 
     451  AATATAGAAA AGAAACAATA AAAGAATAGT TTCTAAGTTA CTTTGGAGTC 
     501  TGCTATTACC ATGTATCTAT GTGATTGCCC TATCAAATTG GACAATACGG 
     551  GTTTTTGTTT AGTCACGATA ATCACAAACT TCCCCCAGCA ATGACATACG 
     601  TAGCAAGTAA TATTTATATC TCTTCTATTT TTTTGATCTT ACATAATCTG 
     651  TCGTGTTTTT TTAAGTTGTT GTTATGAAGA AGTAATTTCA TAATGATCAA 
     701  GTGTGTAACT GAAATTTCAT CGCAATTTTA AACAAACAAG CTAATAATTA 
     751  TTATTATTAA TAGTTAATTT GCTAAGTTGA GTAAAATTTG CTTTTCTTGA 
     801  GAAAAAGGAG AAATTACTTT GGGAGTGAGT TTGAAGAGAG AAACTAAAGT 
     851  AAGTAAATGA GTGAGAGGGA GAGACAGAGA GCGAGAGGGG GAGTAAAAAA 
     901  AAAAGTTGCC CACAAACAAA TTGTGATACC GGTCTTTTAG CATATATCTT 
     951  CTACTCTTCA ATCAACATCT TTACCAATGT CTAATCCTCA ACAACAATTC 
    1001  ATATCTGATG AATTGTCGCA GTTACAGAAA GAAATAATTT CCAAAAACCC 
    1051  GCAAGATGTC TTACAGTTTT GCGCCAACTA TTTCAACACC AAGTTACAAG 
    1101  CTCAAAGAAG TGAGTTATGG TCGCAACAAG CTAAAGCAGA AGCCGCAGGC 
    1151  ATCGACTTAT TCCCATCTGT TGATCATGTG AATGTTAATT CTAGTGGTGT 
    1201  GAGCATTGTG AATGATAGAC AACCAAGTTT TAAATCACCT TTTGGTGTTA 
    1251  ATGATCCACA TCTGAATCAC GACGAAGATC CCCATGCCAA AGATACCAAA 
    1301  ACAGATACTG CTGCTGCTGC TGTTGGTGGG GGTATTTTCA AATCAAATTT 
    1351  TGATGTTAAA AAGAGTGCTT CTAATCCTCC AACCAAGGAA GTAGATCCAG 
    1401  ATGACCCATC AAAACCATCG TCATCGAGCC AACCAAATCA ACAATCAGCA 
    1451  TCAGCATCAT CAAAAACGCC ATCATCAAAG ATCCCAGTTG CTTTCAACGC 
    1501  TAATAGAAGA ACATCTGTAT CTGCTGAAGC CTTGAATCCA GCAAAATTGA 
    1551  AATTAGATAG TTGGAAACCT CCAGTTAATA ATTTGAGCAT TACCGAAGAA 
    1601  GAAACATTAG CCAACAATTT AAAGAACAAT TTCCTTTTCA AACAATTGGA 
    1651  CGCAAACTCT AAGAAAACTG TGATTGCTGC TTTACAACAA AAATCATTTG 
    1701  CTAAAGATAC AGTAATTATC CAACAAGGTG ATGAAGGGGA CTTTTTTTAC 
    1751  ATTATTGAAA CTGGTACAGT TGATTTCTAT GTTAATGATG CTAAAGTAAG 
    1801  TTCCAGTAGC GAAGGGTCAT CTTTTGGGGA ATTGGCTTTG ATGTATAATT 
    1851  CACCAAGAGC TGCTACGGCA GTTGCTGCCA CCGATGTTGT CTGTTGGGCA 
    1901  TTGGACCGTT TGACATTCCG TCGAATTCTT TTGGAAGGTA CTTTTAACAA 
    1951  GAGATTGATG TACGAGGATT TCTTAAAAGA TATTGAGGTT TTGAAATCTC 
    2001  TTTCGGATCA TGCACGTTCA AAATTGGCAG ATGCATTGAG CACAGAAATG 
    2051  TATCACAAGG GTGATAAAAT AGTCACTGAA GGTGAACAAG GAGAGAACTT 
    2101  TTATTTAATA GAAAGTGGAA ACTGTCAAGT TTACAATGAA AAGTTGGGCA 
    2151  ATATCAAACA ATTAACAAAA GGTGATTATT TTGGTGAGCT TGCATTAATA 
    2201  AAAGACTTAC CAAGACAAGC TACTGTGGAA GCATTGGATA ATGTAATCGT 
    2251  TGCCACATTA GGTAAATCCG GGTTCCAAAG ATTATTGGGT CCTGTTGTGG 
    2301  AGGTATTGAA AGAACAAGAC CCTACAAAGA GTCAAGACCC AACTGCTGGT 
    2351  CATTAAGTGT ACAATAAGTA GTTGTTTATT ATCTTATATT GTTTTATGTT 
    2401  AGTATATTCT ATCTTTTTTT TTTTGGCTTA CTCACCTTCT GGTGTTTTCG 
    2451  TTGCGATTTT GATAATGGAT GGTTGGTGCA AAAGTTCAAC TACATTTCTT 
    2501  GTTGTCAGGT ATATACGAGA TGGCAGCATG AACGAGCTCA CCATGGGTTG 
    2551  AACATTATTG AAGTTATCCG GCCGTGCCTT TTGCGAAACA TGGTAACTAA 
    2601  TATATTGCAA ACTTGGCTTC TACAGAAAAT ATACAATCTA ATACCTTGAG 
    2651  GAATTTCCTC TATATATAAT AGAGAATTC

I'm not a GCG expert, so my first question is whether this is a correctly
formatted GCG file in the first place. If it is, is this an error in the
SeqIO parser? The behavior is the same on Solaris 8 and on Linux, both
running BioPerl 1.4 and Perl 5.8.1.

Thanks a bunch,
Tex Thompson
RIT Bioinformatics
    
    