[Bioperl-l] Bug in GCG SeqIO Formatting?
Tex Thompson
tex at biosysadmin.com
Mon Feb 16 17:49:04 EST 2004
Hello Mailing List,
I have a user complaining that the following code isn't working on his
GCG-formatted sequence files:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
# Read the GCG-formatted file and write each sequence to STDOUT as FASTA.
my $io = Bio::SeqIO->new( -file => "af317472.gbpln3", -format => "gcg" );
my $out = Bio::SeqIO->new( -fh => \*STDOUT, -format => "fasta" );
while ( my $seq = $io->next_seq ) {
    $out->write_seq( $seq );
}
Here's an example sequence file:
!!NA_SEQUENCE 1.0
LOCUS AF317472 2679 bp DNA linear PLN 07-DEC-2000
DEFINITION Candida albicans cAMP-dependent protein kinase regulatory subunit
(PKA-R) gene, complete cds.
ACCESSION AF317472
VERSION AF317472.1 GI:11596392
KEYWORDS .
SOURCE Candida albicans
ORGANISM Candida albicans
Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
Saccharomycetales; mitosporic Saccharomycetales; Candida.
REFERENCE 1 (bases 1 to 2679)
AUTHORS Giasson,L. and Parrot,M.
TITLE Sequence of the Candida albicans cAMP-dependent protein kinase
regulatory subunit
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 2679)
AUTHORS Giasson,L. and Parrot,M.
TITLE Direct Submission
JOURNAL Submitted (27-OCT-2000) School of Dentistry, Laval University,
GREB, Ste-Foy, Quebec G1K 7P4, Canada
FEATURES Location/Qualifiers
source 1. .2679
/organism="Candida albicans"
/mol_type="genomic DNA"
/strain="CAI4"
/db_xref="taxon:5476"
gene <977. .>2356
/gene="PKA-R"
mRNA <977. .>2356
/gene="PKA-R"
/product="cAMP-dependent protein kinase regulatory
subunit"
CDS 977. .2356
/gene="PKA-R"
/codon_start=1
/transl_table=12
/product="cAMP-dependent protein kinase regulatory
subunit"
/protein_id="AAG38599.1"
/db_xref="GI:11596393"
/translation="MSNPQQQFISDELSQLQKEIISKNPQDVLQFCANYFNTKLQAQR
SELWSQQAKAEAAGIDLFPSVDHVNVNSSGVSIVNDRQPSFKSPFGVNDPHSNHDEDP
HAKDTKTDTAAAAVGGGIFKSNFDVKKSASNPPTKEVDPDDPSKPSSSSQPNQQSASA
SSKTPSSKIPVAFNANRRTSVSAEALNPAKLKLDSWKPPVNNLSITEEETLANNLKNN
FLFKQLDANSKKTVIAALQQKSFAKDTVIIQQGDEGDFFYIIETGTVDFYVNDAKVSS
SSEGSSFGELALMYNSPRAATAVAATDVVCWALDRLTFRRILLEGTFNKRLMYEDFLK
DIEVLKSLSDHARSKLADALSTEMYHKGDKIVTEGEQGENFYLIESGNCQVYNEKLGN
IKQLTKGDYFGELALIKDLPRQATVEALDNVIVATLGKSGFQRLLGPVVEVLKEQDPT
KSQDPTAGH"
ORIGIN
AF317472 Length: 2679 February 16, 2004 17:02 Type: N Check: 9369 ..
1 GAATTCAAAA AATCAAAAAA ATCAAAAAAA AACCGTGGAA GGTAAGTTGT
51 ATATTTATAA ATCAACGTGA ATAATTTTCA ACACTGTGTC AACATCTGTG
101 AAAAAAACCT GTGTGTACTG CATATAGGAC CTCACCTATT ACGTAGAATA
151 TACTAGAAAT AGTTACAACC ATAAAAAGAT TAATTGTGCT TACGTGGCAA
201 CTTTGAGATT TTTCTTTTTT CTGTTTCTTT CTTTCTTTTT TTGGCTTAAA
251 CAACAAATGT CGCAAATTAT ACAAACGACA TTTGCTGCCC ATGTCATTTT
301 GTCGTTATCA CGTGAAGTGT CGCAGATTTA TGTATTCTCA CTTCATTTCT
351 ATGGTCATCA ATTGTTCATT CATTCTCTAT CTTCAAAAAT CTGTGATTTG
401 ATGATTTTGA TTAAAAGAAA GCAAAGAGAA TACTGAAAAA AAGCAAAGAG
451 AATATAGAAA AGAAACAATA AAAGAATAGT TTCTAAGTTA CTTTGGAGTC
501 TGCTATTACC ATGTATCTAT GTGATTGCCC TATCAAATTG GACAATACGG
551 GTTTTTGTTT AGTCACGATA ATCACAAACT TCCCCCAGCA ATGACATACG
601 TAGCAAGTAA TATTTATATC TCTTCTATTT TTTTGATCTT ACATAATCTG
651 TCGTGTTTTT TTAAGTTGTT GTTATGAAGA AGTAATTTCA TAATGATCAA
701 GTGTGTAACT GAAATTTCAT CGCAATTTTA AACAAACAAG CTAATAATTA
751 TTATTATTAA TAGTTAATTT GCTAAGTTGA GTAAAATTTG CTTTTCTTGA
801 GAAAAAGGAG AAATTACTTT GGGAGTGAGT TTGAAGAGAG AAACTAAAGT
851 AAGTAAATGA GTGAGAGGGA GAGACAGAGA GCGAGAGGGG GAGTAAAAAA
901 AAAAGTTGCC CACAAACAAA TTGTGATACC GGTCTTTTAG CATATATCTT
951 CTACTCTTCA ATCAACATCT TTACCAATGT CTAATCCTCA ACAACAATTC
1001 ATATCTGATG AATTGTCGCA GTTACAGAAA GAAATAATTT CCAAAAACCC
1051 GCAAGATGTC TTACAGTTTT GCGCCAACTA TTTCAACACC AAGTTACAAG
1101 CTCAAAGAAG TGAGTTATGG TCGCAACAAG CTAAAGCAGA AGCCGCAGGC
1151 ATCGACTTAT TCCCATCTGT TGATCATGTG AATGTTAATT CTAGTGGTGT
1201 GAGCATTGTG AATGATAGAC AACCAAGTTT TAAATCACCT TTTGGTGTTA
1251 ATGATCCACA TCTGAATCAC GACGAAGATC CCCATGCCAA AGATACCAAA
1301 ACAGATACTG CTGCTGCTGC TGTTGGTGGG GGTATTTTCA AATCAAATTT
1351 TGATGTTAAA AAGAGTGCTT CTAATCCTCC AACCAAGGAA GTAGATCCAG
1401 ATGACCCATC AAAACCATCG TCATCGAGCC AACCAAATCA ACAATCAGCA
1451 TCAGCATCAT CAAAAACGCC ATCATCAAAG ATCCCAGTTG CTTTCAACGC
1501 TAATAGAAGA ACATCTGTAT CTGCTGAAGC CTTGAATCCA GCAAAATTGA
1551 AATTAGATAG TTGGAAACCT CCAGTTAATA ATTTGAGCAT TACCGAAGAA
1601 GAAACATTAG CCAACAATTT AAAGAACAAT TTCCTTTTCA AACAATTGGA
1651 CGCAAACTCT AAGAAAACTG TGATTGCTGC TTTACAACAA AAATCATTTG
1701 CTAAAGATAC AGTAATTATC CAACAAGGTG ATGAAGGGGA CTTTTTTTAC
1751 ATTATTGAAA CTGGTACAGT TGATTTCTAT GTTAATGATG CTAAAGTAAG
1801 TTCCAGTAGC GAAGGGTCAT CTTTTGGGGA ATTGGCTTTG ATGTATAATT
1851 CACCAAGAGC TGCTACGGCA GTTGCTGCCA CCGATGTTGT CTGTTGGGCA
1901 TTGGACCGTT TGACATTCCG TCGAATTCTT TTGGAAGGTA CTTTTAACAA
1951 GAGATTGATG TACGAGGATT TCTTAAAAGA TATTGAGGTT TTGAAATCTC
2001 TTTCGGATCA TGCACGTTCA AAATTGGCAG ATGCATTGAG CACAGAAATG
2051 TATCACAAGG GTGATAAAAT AGTCACTGAA GGTGAACAAG GAGAGAACTT
2101 TTATTTAATA GAAAGTGGAA ACTGTCAAGT TTACAATGAA AAGTTGGGCA
2151 ATATCAAACA ATTAACAAAA GGTGATTATT TTGGTGAGCT TGCATTAATA
2201 AAAGACTTAC CAAGACAAGC TACTGTGGAA GCATTGGATA ATGTAATCGT
2251 TGCCACATTA GGTAAATCCG GGTTCCAAAG ATTATTGGGT CCTGTTGTGG
2301 AGGTATTGAA AGAACAAGAC CCTACAAAGA GTCAAGACCC AACTGCTGGT
2351 CATTAAGTGT ACAATAAGTA GTTGTTTATT ATCTTATATT GTTTTATGTT
2401 AGTATATTCT ATCTTTTTTT TTTTGGCTTA CTCACCTTCT GGTGTTTTCG
2451 TTGCGATTTT GATAATGGAT GGTTGGTGCA AAAGTTCAAC TACATTTCTT
2501 GTTGTCAGGT ATATACGAGA TGGCAGCATG AACGAGCTCA CCATGGGTTG
2551 AACATTATTG AAGTTATCCG GCCGTGCCTT TTGCGAAACA TGGTAACTAA
2601 TATATTGCAA ACTTGGCTTC TACAGAAAAT ATACAATCTA ATACCTTGAG
2651 GAATTTCCTC TATATATAAT AGAGAATTC
I'm not a GCG expert, so first: is this a correctly formatted GCG file at all?
If it is, is this a bug in the SeqIO GCG parser? I've found the behavior to be
the same on Solaris 8 and on Linux, both running BioPerl 1.4 and Perl 5.8.1.
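
One way to sanity-check the file independently of BioPerl is sketched below in
plain Perl. It reads the "Length: ... Check: ... .." divider line, strips the
position numbers from the sequence lines, and recomputes a checksum to compare
against the declared one. The sub names (parse_gcg_header, clean_sequence,
gcg_checksum) are made up for illustration, and the checksum rule (character
codes weighted by a position counter that cycles 1..57, sum taken modulo
10000) is my understanding of the usual GCG scheme, so treat it as an
assumption rather than a reference implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the GCG divider line, e.g.
#   "AF317472 Length: 2679 ... Type: N Check: 9369 .."
# Returns (length, checksum), or an empty list if the line does not match.
sub parse_gcg_header {
    my ($line) = @_;
    return $line =~ /Length:\s*(\d+)\b.*\bCheck:\s*(\d+)\s+\.\.\s*$/
        ? ( $1, $2 )
        : ();
}

# Join the sequence lines and drop position numbers and whitespace.
sub clean_sequence {
    my (@lines) = @_;
    my $seq = join '', @lines;
    $seq =~ s/[\d\s]//g;
    return $seq;
}

# ASSUMPTION: position-weighted checksum, weights cycling 1..57,
# computed on the upper-cased sequence and reduced modulo 10000.
sub gcg_checksum {
    my ($seq) = @_;
    my ( $sum, $weight ) = ( 0, 0 );
    for my $char ( split //, uc $seq ) {
        $weight++;
        $sum += $weight * ord($char);
        $weight = 0 if $weight == 57;
    }
    return $sum % 10000;
}

# Example: pull the declared values out of the divider line from the file above.
my ( $len, $check ) = parse_gcg_header(
    'AF317472 Length: 2679 February 16, 2004 17:02 Type: N Check: 9369 ..');
print "declared length=$len, declared checksum=$check\n";
```

To check a real file, collect every line after the divider, pass them through
clean_sequence, and compare length($seq) and gcg_checksum($seq) with the
declared values. A mismatch would point at the file; a match would point back
at the parser.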
Thanks a bunch,
Tex Thompson
RIT Bioinformatics