[Bioperl-l] Bug in GCG SeqIO Formatting?
Tex Thompson
tex at biosysadmin.com
Tue Feb 17 01:21:51 EST 2004
Hilmar,
Thanks for the tip. There was no stack trace, but here is the output from the
test program quoted below:
>AF317472 !!NA_SEQUENCE 1.0LOCUS AF317472 2679 bp DNA linear PLN 07-DEC-2000DEFINITION Candida albicans cAMP-dependent protein kinase regulatory subunit (PKA-R) gene, complete cds.ACCESSION AF317472VERSION AF317472.1 GI:11596392KEYWORDS .SOURCE Candida albicans ORGANISM Candida albicans Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes; Saccharomycetales; mitosporic Saccharomycetales; Candida.REFERENCE 1 (bases 1 to 2679) AUTHORS Giasson,L. and Parrot,M. TITLE Sequence of the Candida albicans cAMP-dependent protein kinase regulatory subunit JOURNAL UnpublishedREFERENCE 2 (bases 1 to 2679) AUTHORS Giasson,L. and Parrot,M. TITLE Direct Submission JOURNAL Submitted (27-OCT-2000) School of Dentistry, Laval University, GREB, Ste-Foy, Quebec G1K 7P4, CanadaFEATURES Location/Qualifiers source 1. .2679 /organism="Candida albicans" /mol_type="genomic DNA" /strain="CAI4" /db_xref="taxon:5476" gene <977. .>2356 /gene="PKA-R" mRNA <977. .>2356 /gene="PKA-R" /product="cAMP-dependent protein kinase regulatory subunit" CDS 977. .2356 /gene="PKA-R" /codon_start=1 /transl_table=12 /product="cAMP-dependent protein kinase regulatory subunit" /protein_id="AAG38599.1" /db_xref="GI:11596393" /translation="MSNPQQQFISDELSQLQKEIISKNPQDVLQFCANYFNTKLQAQR SELWSQQAKAEAAGIDLFPSVDHVNVNSSGVSIVNDRQPSFKSPFGVNDPHSNHDEDP HAKDTKTDTAAAAVGGGIFKSNFDVKKSASNPPTKEVDPDDPSKPSSSSQPNQQSASA SSKTPSSKIPVAFNANRRTSVSAEALNPAKLKLDSWKPPVNNLSITEEETLANNLKNN FLFKQLDANSKKTVIAALQQKSFAKDTVIIQQGDEGDFFYIIETGTVDFYVNDAKVSS SSEGSSFGELALMYNSPRAATAVAATDVVCWALDRLTFRRILLEGTFNKRLMYEDFLK DIEVLKSLSDHARSKLADALSTEMYHKGDKIVTEGEQGENFYLIESGNCQVYNEKLGN IKQLTKGDYFGELALIKDLPRQATVEALDNVIVATLGKSGFQRLLGPVVEVLKEQDPT KSQDPTAGH"ORIGIN
GAATTCAAAAAATCAAAAAAATCAAAAAAAAACCGTGGAAGGTAAGTTGTATATTTATAA
ATCAACGTGAATAATTTTCAACACTGTGTCAACATCTGTGAAAAAAACCTGTGTGTACTG
CATATAGGACCTCACCTATTACGTAGAATATACTAGAAATAGTTACAACCATAAAAAGAT
TAATTGTGCTTACGTGGCAACTTTGAGATTTTTCTTTTTTCTGTTTCTTTCTTTCTTTTT
TTGGCTTAAACAACAAATGTCGCAAATTATACAAACGACATTTGCTGCCCATGTCATTTT
GTCGTTATCACGTGAAGTGTCGCAGATTTATGTATTCTCACTTCATTTCTATGGTCATCA
ATTGTTCATTCATTCTCTATCTTCAAAAATCTGTGATTTGATGATTTTGATTAAAAGAAA
GCAAAGAGAATACTGAAAAAAAGCAAAGAGAATATAGAAAAGAAACAATAAAAGAATAGT
TTCTAAGTTACTTTGGAGTCTGCTATTACCATGTATCTATGTGATTGCCCTATCAAATTG
GACAATACGGGTTTTTGTTTAGTCACGATAATCACAAACTTCCCCCAGCAATGACATACG
TAGCAAGTAATATTTATATCTCTTCTATTTTTTTGATCTTACATAATCTGTCGTGTTTTT
TTAAGTTGTTGTTATGAAGAAGTAATTTCATAATGATCAAGTGTGTAACTGAAATTTCAT
CGCAATTTTAAACAAACAAGCTAATAATTATTATTATTAATAGTTAATTTGCTAAGTTGA
GTAAAATTTGCTTTTCTTGAGAAAAAGGAGAAATTACTTTGGGAGTGAGTTTGAAGAGAG
AAACTAAAGTAAGTAAATGAGTGAGAGGGAGAGACAGAGAGCGAGAGGGGGAGTAAAAAA
AAAAGTTGCCCACAAACAAATTGTGATACCGGTCTTTTAGCATATATCTTCTACTCTTCA
ATCAACATCTTTACCAATGTCTAATCCTCAACAACAATTCATATCTGATGAATTGTCGCA
GTTACAGAAAGAAATAATTTCCAAAAACCCGCAAGATGTCTTACAGTTTTGCGCCAACTA
TTTCAACACCAAGTTACAAGCTCAAAGAAGTGAGTTATGGTCGCAACAAGCTAAAGCAGA
AGCCGCAGGCATCGACTTATTCCCATCTGTTGATCATGTGAATGTTAATTCTAGTGGTGT
GAGCATTGTGAATGATAGACAACCAAGTTTTAAATCACCTTTTGGTGTTAATGATCCACA
TCTGAATCACGACGAAGATCCCCATGCCAAAGATACCAAAACAGATACTGCTGCTGCTGC
TGTTGGTGGGGGTATTTTCAAATCAAATTTTGATGTTAAAAAGAGTGCTTCTAATCCTCC
AACCAAGGAAGTAGATCCAGATGACCCATCAAAACCATCGTCATCGAGCCAACCAAATCA
ACAATCAGCATCAGCATCATCAAAAACGCCATCATCAAAGATCCCAGTTGCTTTCAACGC
TAATAGAAGAACATCTGTATCTGCTGAAGCCTTGAATCCAGCAAAATTGAAATTAGATAG
TTGGAAACCTCCAGTTAATAATTTGAGCATTACCGAAGAAGAAACATTAGCCAACAATTT
AAAGAACAATTTCCTTTTCAAACAATTGGACGCAAACTCTAAGAAAACTGTGATTGCTGC
TTTACAACAAAAATCATTTGCTAAAGATACAGTAATTATCCAACAAGGTGATGAAGGGGA
CTTTTTTTACATTATTGAAACTGGTACAGTTGATTTCTATGTTAATGATGCTAAAGTAAG
TTCCAGTAGCGAAGGGTCATCTTTTGGGGAATTGGCTTTGATGTATAATTCACCAAGAGC
TGCTACGGCAGTTGCTGCCACCGATGTTGTCTGTTGGGCATTGGACCGTTTGACATTCCG
TCGAATTCTTTTGGAAGGTACTTTTAACAAGAGATTGATGTACGAGGATTTCTTAAAAGA
TATTGAGGTTTTGAAATCTCTTTCGGATCATGCACGTTCAAAATTGGCAGATGCATTGAG
CACAGAAATGTATCACAAGGGTGATAAAATAGTCACTGAAGGTGAACAAGGAGAGAACTT
TTATTTAATAGAAAGTGGAAACTGTCAAGTTTACAATGAAAAGTTGGGCAATATCAAACA
ATTAACAAAAGGTGATTATTTTGGTGAGCTTGCATTAATAAAAGACTTACCAAGACAAGC
TACTGTGGAAGCATTGGATAATGTAATCGTTGCCACATTAGGTAAATCCGGGTTCCAAAG
ATTATTGGGTCCTGTTGTGGAGGTATTGAAAGAACAAGACCCTACAAAGAGTCAAGACCC
AACTGCTGGTCATTAAGTGTACAATAAGTAGTTGTTTATTATCTTATATTGTTTTATGTT
AGTATATTCTATCTTTTTTTTTTTGGCTTACTCACCTTCTGGTGTTTTCGTTGCGATTTT
GATAATGGATGGTTGGTGCAAAAGTTCAACTACATTTCTTGTTGTCAGGTATATACGAGA
TGGCAGCATGAACGAGCTCACCATGGGTTGAACATTATTGAAGTTATCCGGCCGTGCCTT
TTGCGAAACATGGTAACTAATATATTGCAAACTTGGCTTCTACAGAAAATATACAATCTA
ATACCTTGAGGAATTTCCTCTATATATAATAGAGAATTC
It looks like all of the header information ends up stuck on that first
line, as the FASTA description. Looking at it more carefully, it is a
valid FASTA file, but is this really the desired behavior?
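For anyone following along, here is a minimal core-Perl sketch of how a GCG reader can separate the header from the sequence. It is not BioPerl's Bio::SeqIO::gcg code, and `parse_gcg` and `gcg_checksum` are hypothetical names. It assumes, as in the GCG file quoted further down, that every line above the divider (the first line ending in `..`) is free-text header, and that the divider line carries the name, `Length:` and `Check:` fields. A reader that chomps those header lines and concatenates them into the description would produce exactly the one long FASTA header line above. The checksum follows the usual GCG formula: each residue's uppercase character code is weighted by a position counter that cycles 1..57, and the sum is taken modulo 10000.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: GCG checksum. Each residue's uppercase character
# code is weighted by a position counter that cycles 1..57; the sum is
# taken modulo 10000.
sub gcg_checksum {
    my ($seq) = @_;
    my ( $index, $sum ) = ( 0, 0 );
    for my $char ( split //, uc $seq ) {
        $index++;
        $sum += $index * ord($char);
        $index = 0 if $index == 57;
    }
    return $sum % 10000;
}

# Hypothetical parser for a single-entry GCG file passed in as a string.
# Lines above the divider (the first line ending in "..") form the header;
# the divider supplies name, length and checksum; the rest is sequence.
sub parse_gcg {
    my ($text) = @_;
    my ( @header, $divider, @body );
    for my $line ( split /\n/, $text ) {
        if ( defined $divider ) {
            push @body, $line;
        }
        elsif ( $line =~ /\.\.\s*$/ ) {
            # Anchored at end of line, so a location like "1..2679"
            # in the middle of a header line does not match.
            $divider = $line;
        }
        else {
            push @header, $line;
        }
    }
    die "no '..' divider line found\n" unless defined $divider;

    my ($name)   = $divider =~ /(\S+)\s+Length:/;
    my ($length) = $divider =~ /Length:\s*(\d+)/;
    my ($check)  = $divider =~ /Check:\s*(\d+)/;

    # Keep only residue letters: this drops position numbers, spaces,
    # blank lines, and also any '.' or '-' gap characters.
    my $seq = join '', @body;
    $seq =~ s/[^A-Za-z]//g;

    return {
        name   => $name,
        length => $length,
        check  => $check,
        header => \@header,    # header lines kept separate, not joined
        seq    => $seq,
    };
}

# Usage on a toy entry (not the AF317472 file quoted below)
my $toy = join "\n",
    '!!NA_SEQUENCE 1.0',
    'DEFINITION  Toy entry.',
    '',
    '  TOY  Length: 4  Type: N  Check: 748  ..',
    '',
    '       1  ACGT';
my $entry = parse_gcg($toy);
print "$entry->{name} $entry->{length} $entry->{check} $entry->{seq}\n";    # prints: TOY 4 748 ACGT
print "checksum ok\n" if gcg_checksum( $entry->{seq} ) == $entry->{check};
```

Returning the header as a list of lines, rather than one joined string, leaves the caller free to choose what goes into a FASTA description, for example just the DEFINITION line.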
Thanks for the help,
Tex Thompson
RIT Bioinformatics
On Mon, 16 Feb 2004, Hilmar Lapp wrote:
> Rule #1: If your code doesn't work the way you think it should, or
> fails with an exception, and you do want help from the mailing list,
> then be sure to send along the *complete* output, in particular the
> stack trace if there was any.
>
> Rule #2: Double check that you followed rule #1.
>
> Rule #3: Check again that you followed rule #1.
>
> There really aren't any other rules here. If you choose not to follow
> rule #1 you indicate that you're not actually interested in getting
> help.
>
> -hilmar
>
> On Monday, February 16, 2004, at 02:49 PM, Tex Thompson wrote:
>
> > Hello Mailing List,
> >
> > I have a user complaining that the following code isn't working on his
> > GCG-formatted sequence files:
> >
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> > use Bio::SeqIO;
> >
> > # Read GCG-formatted entries and write each one to STDOUT as FASTA
> > my $io  = Bio::SeqIO->new( -file => "af317472.gbpln3", -format => "gcg" );
> > my $out = Bio::SeqIO->new( -fh   => \*STDOUT,          -format => "fasta" );
> >
> > while ( my $seq = $io->next_seq ) {
> >     $out->write_seq( $seq );
> > }
> >
> > Here's an example sequence file:
> >
> > !!NA_SEQUENCE 1.0
> > LOCUS AF317472 2679 bp DNA linear PLN 07-DEC-2000
> > DEFINITION Candida albicans cAMP-dependent protein kinase regulatory subunit
> > (PKA-R) gene, complete cds.
> > ACCESSION AF317472
> > VERSION AF317472.1 GI:11596392
> > KEYWORDS .
> > SOURCE Candida albicans
> > ORGANISM Candida albicans
> > Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
> > Saccharomycetales; mitosporic Saccharomycetales; Candida.
> > REFERENCE 1 (bases 1 to 2679)
> > AUTHORS Giasson,L. and Parrot,M.
> > TITLE Sequence of the Candida albicans cAMP-dependent protein kinase
> > regulatory subunit
> > JOURNAL Unpublished
> > REFERENCE 2 (bases 1 to 2679)
> > AUTHORS Giasson,L. and Parrot,M.
> > TITLE Direct Submission
> > JOURNAL Submitted (27-OCT-2000) School of Dentistry, Laval University,
> > GREB, Ste-Foy, Quebec G1K 7P4, Canada
> > FEATURES Location/Qualifiers
> > source 1. .2679
> > /organism="Candida albicans"
> > /mol_type="genomic DNA"
> > /strain="CAI4"
> > /db_xref="taxon:5476"
> > gene <977. .>2356
> > /gene="PKA-R"
> > mRNA <977. .>2356
> > /gene="PKA-R"
> > /product="cAMP-dependent protein kinase regulatory
> > subunit"
> > CDS 977. .2356
> > /gene="PKA-R"
> > /codon_start=1
> > /transl_table=12
> > /product="cAMP-dependent protein kinase regulatory
> > subunit"
> > /protein_id="AAG38599.1"
> > /db_xref="GI:11596393"
> > /translation="MSNPQQQFISDELSQLQKEIISKNPQDVLQFCANYFNTKLQAQR
> > SELWSQQAKAEAAGIDLFPSVDHVNVNSSGVSIVNDRQPSFKSPFGVNDPHSNHDEDP
> > HAKDTKTDTAAAAVGGGIFKSNFDVKKSASNPPTKEVDPDDPSKPSSSSQPNQQSASA
> > SSKTPSSKIPVAFNANRRTSVSAEALNPAKLKLDSWKPPVNNLSITEEETLANNLKNN
> > FLFKQLDANSKKTVIAALQQKSFAKDTVIIQQGDEGDFFYIIETGTVDFYVNDAKVSS
> > SSEGSSFGELALMYNSPRAATAVAATDVVCWALDRLTFRRILLEGTFNKRLMYEDFLK
> > DIEVLKSLSDHARSKLADALSTEMYHKGDKIVTEGEQGENFYLIESGNCQVYNEKLGN
> > IKQLTKGDYFGELALIKDLPRQATVEALDNVIVATLGKSGFQRLLGPVVEVLKEQDPT
> > KSQDPTAGH"
> > ORIGIN
> >
> > AF317472 Length: 2679 February 16, 2004 17:02 Type: N Check: 9369 ..
> >
> > 1 GAATTCAAAA AATCAAAAAA ATCAAAAAAA AACCGTGGAA GGTAAGTTGT
> >
> > 51 ATATTTATAA ATCAACGTGA ATAATTTTCA ACACTGTGTC AACATCTGTG
> >
> > 101 AAAAAAACCT GTGTGTACTG CATATAGGAC CTCACCTATT ACGTAGAATA
> >
> > 151 TACTAGAAAT AGTTACAACC ATAAAAAGAT TAATTGTGCT TACGTGGCAA
> >
> > 201 CTTTGAGATT TTTCTTTTTT CTGTTTCTTT CTTTCTTTTT TTGGCTTAAA
> >
> > 251 CAACAAATGT CGCAAATTAT ACAAACGACA TTTGCTGCCC ATGTCATTTT
> >
> > 301 GTCGTTATCA CGTGAAGTGT CGCAGATTTA TGTATTCTCA CTTCATTTCT
> >
> > 351 ATGGTCATCA ATTGTTCATT CATTCTCTAT CTTCAAAAAT CTGTGATTTG
> >
> > 401 ATGATTTTGA TTAAAAGAAA GCAAAGAGAA TACTGAAAAA AAGCAAAGAG
> >
> > 451 AATATAGAAA AGAAACAATA AAAGAATAGT TTCTAAGTTA CTTTGGAGTC
> >
> > 501 TGCTATTACC ATGTATCTAT GTGATTGCCC TATCAAATTG GACAATACGG
> >
> > 551 GTTTTTGTTT AGTCACGATA ATCACAAACT TCCCCCAGCA ATGACATACG
> >
> > 601 TAGCAAGTAA TATTTATATC TCTTCTATTT TTTTGATCTT ACATAATCTG
> >
> > 651 TCGTGTTTTT TTAAGTTGTT GTTATGAAGA AGTAATTTCA TAATGATCAA
> >
> > 701 GTGTGTAACT GAAATTTCAT CGCAATTTTA AACAAACAAG CTAATAATTA
> >
> > 751 TTATTATTAA TAGTTAATTT GCTAAGTTGA GTAAAATTTG CTTTTCTTGA
> >
> > 801 GAAAAAGGAG AAATTACTTT GGGAGTGAGT TTGAAGAGAG AAACTAAAGT
> >
> > 851 AAGTAAATGA GTGAGAGGGA GAGACAGAGA GCGAGAGGGG GAGTAAAAAA
> >
> > 901 AAAAGTTGCC CACAAACAAA TTGTGATACC GGTCTTTTAG CATATATCTT
> >
> > 951 CTACTCTTCA ATCAACATCT TTACCAATGT CTAATCCTCA ACAACAATTC
> >
> > 1001 ATATCTGATG AATTGTCGCA GTTACAGAAA GAAATAATTT CCAAAAACCC
> >
> > 1051 GCAAGATGTC TTACAGTTTT GCGCCAACTA TTTCAACACC AAGTTACAAG
> >
> > 1101 CTCAAAGAAG TGAGTTATGG TCGCAACAAG CTAAAGCAGA AGCCGCAGGC
> >
> > 1151 ATCGACTTAT TCCCATCTGT TGATCATGTG AATGTTAATT CTAGTGGTGT
> >
> > 1201 GAGCATTGTG AATGATAGAC AACCAAGTTT TAAATCACCT TTTGGTGTTA
> >
> > 1251 ATGATCCACA TCTGAATCAC GACGAAGATC CCCATGCCAA AGATACCAAA
> >
> > 1301 ACAGATACTG CTGCTGCTGC TGTTGGTGGG GGTATTTTCA AATCAAATTT
> >
> > 1351 TGATGTTAAA AAGAGTGCTT CTAATCCTCC AACCAAGGAA GTAGATCCAG
> >
> > 1401 ATGACCCATC AAAACCATCG TCATCGAGCC AACCAAATCA ACAATCAGCA
> >
> > 1451 TCAGCATCAT CAAAAACGCC ATCATCAAAG ATCCCAGTTG CTTTCAACGC
> >
> > 1501 TAATAGAAGA ACATCTGTAT CTGCTGAAGC CTTGAATCCA GCAAAATTGA
> >
> > 1551 AATTAGATAG TTGGAAACCT CCAGTTAATA ATTTGAGCAT TACCGAAGAA
> >
> > 1601 GAAACATTAG CCAACAATTT AAAGAACAAT TTCCTTTTCA AACAATTGGA
> >
> > 1651 CGCAAACTCT AAGAAAACTG TGATTGCTGC TTTACAACAA AAATCATTTG
> >
> > 1701 CTAAAGATAC AGTAATTATC CAACAAGGTG ATGAAGGGGA CTTTTTTTAC
> >
> > 1751 ATTATTGAAA CTGGTACAGT TGATTTCTAT GTTAATGATG CTAAAGTAAG
> >
> > 1801 TTCCAGTAGC GAAGGGTCAT CTTTTGGGGA ATTGGCTTTG ATGTATAATT
> >
> > 1851 CACCAAGAGC TGCTACGGCA GTTGCTGCCA CCGATGTTGT CTGTTGGGCA
> >
> > 1901 TTGGACCGTT TGACATTCCG TCGAATTCTT TTGGAAGGTA CTTTTAACAA
> >
> > 1951 GAGATTGATG TACGAGGATT TCTTAAAAGA TATTGAGGTT TTGAAATCTC
> >
> > 2001 TTTCGGATCA TGCACGTTCA AAATTGGCAG ATGCATTGAG CACAGAAATG
> >
> > 2051 TATCACAAGG GTGATAAAAT AGTCACTGAA GGTGAACAAG GAGAGAACTT
> >
> > 2101 TTATTTAATA GAAAGTGGAA ACTGTCAAGT TTACAATGAA AAGTTGGGCA
> >
> > 2151 ATATCAAACA ATTAACAAAA GGTGATTATT TTGGTGAGCT TGCATTAATA
> >
> > 2201 AAAGACTTAC CAAGACAAGC TACTGTGGAA GCATTGGATA ATGTAATCGT
> >
> > 2251 TGCCACATTA GGTAAATCCG GGTTCCAAAG ATTATTGGGT CCTGTTGTGG
> >
> > 2301 AGGTATTGAA AGAACAAGAC CCTACAAAGA GTCAAGACCC AACTGCTGGT
> >
> > 2351 CATTAAGTGT ACAATAAGTA GTTGTTTATT ATCTTATATT GTTTTATGTT
> >
> > 2401 AGTATATTCT ATCTTTTTTT TTTTGGCTTA CTCACCTTCT GGTGTTTTCG
> >
> > 2451 TTGCGATTTT GATAATGGAT GGTTGGTGCA AAAGTTCAAC TACATTTCTT
> >
> > 2501 GTTGTCAGGT ATATACGAGA TGGCAGCATG AACGAGCTCA CCATGGGTTG
> >
> > 2551 AACATTATTG AAGTTATCCG GCCGTGCCTT TTGCGAAACA TGGTAACTAA
> >
> > 2601 TATATTGCAA ACTTGGCTTC TACAGAAAAT ATACAATCTA ATACCTTGAG
> >
> > 2651 GAATTTCCTC TATATATAAT AGAGAATTC
> >
> > I'm not a GCG expert, but is this a correctly formatted GCG file in the
> > first place? If so, is this an error in the SeqIO parser? I've found this
> > behavior to be the same on Solaris 8 and on Linux, both running BioPerl 1.4
> > and Perl 5.8.1.
> >
> > Thanks a bunch,
> >
> > Tex Thompson
> > RIT Bioinformatics
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>