[Bioperl-l] Bug in GCG SeqIO Formatting?
Hilmar Lapp
hlapp at gmx.net
Tue Feb 17 00:03:45 EST 2004
Rule #1: If your code doesn't work the way you think it should, or
fails with an exception, and you do want help from the mailing list,
then be sure to send along the *complete* output, in particular the
stack trace, if there is one.
Rule #2: Double check that you followed rule #1.
Rule #3: Check again that you followed rule #1.
There really aren't any other rules here. If you choose not to follow
rule #1 you indicate that you're not actually interested in getting
help.
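
One way to make sure nothing gets lost is to catch the error yourself and
print all of it. The following is only a minimal sketch using core Perl:
capture_full_error is a hypothetical helper name, not part of BioPerl, and
the failing code reference is a stand-in for your own Bio::SeqIO loop. It
assumes the error is either a plain string (which gets a Carp backtrace
appended) or an exception object that stringifies to something useful.

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical helper (not part of BioPerl): run a code reference and
# return the complete error text, or undef if it finished cleanly.
sub capture_full_error {
    my ($code) = @_;

    # Append a full backtrace to plain string errors; pass exception
    # objects through untouched so their own trace is preserved.
    local $SIG{__DIE__} = sub {
        die @_ if ref $_[0];
        Carp::confess(@_);
    };

    my $ok = eval { $code->(); 1 };
    return if $ok;
    return "$@";
}

# Stand-in for the real work; replace the body with your own script.
my $err = capture_full_error( sub {
    die "could not parse record\n";
} );

# Print everything, untruncated, so it can be pasted into the email.
print STDERR $err if defined $err;
```

Redirecting STDERR to a file (for example with 2> error.log in the shell)
and attaching that file works just as well.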
-hilmar
On Monday, February 16, 2004, at 02:49 PM, Tex Thompson wrote:
> Hello Mailing List,
>
> I have a user complaining that the following code isn't working on his
> GCG-formatted sequence files:
>
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::SeqIO;
> my $io = Bio::SeqIO->new( -file => "af317472.gbpln3", -format => "gcg" );
> my $out = Bio::SeqIO->new( -fh => \*STDOUT, -format => "fasta" );
>
> while ( my $seq = $io->next_seq ) {
>     $out->write_seq( $seq );
> }
>
> Here's an example sequence file:
>
> !!NA_SEQUENCE 1.0
> LOCUS AF317472 2679 bp DNA linear PLN
> 07-DEC-2000
> DEFINITION Candida albicans cAMP-dependent protein kinase regulatory
> subunit
> (PKA-R) gene, complete cds.
> ACCESSION AF317472
> VERSION AF317472.1 GI:11596392
> KEYWORDS .
> SOURCE Candida albicans
> ORGANISM Candida albicans
> Eukaryota; Fungi; Ascomycota; Saccharomycotina;
> Saccharomycetes;
> Saccharomycetales; mitosporic Saccharomycetales; Candida.
> REFERENCE 1 (bases 1 to 2679)
> AUTHORS Giasson,L. and Parrot,M.
> TITLE Sequence of the Candida albicans cAMP-dependent protein
> kinase
> regulatory subunit
> JOURNAL Unpublished
> REFERENCE 2 (bases 1 to 2679)
> AUTHORS Giasson,L. and Parrot,M.
> TITLE Direct Submission
> JOURNAL Submitted (27-OCT-2000) School of Dentistry, Laval
> University,
> GREB, Ste-Foy, Quebec G1K 7P4, Canada
> FEATURES Location/Qualifiers
> source 1. .2679
> /organism="Candida albicans"
> /mol_type="genomic DNA"
> /strain="CAI4"
> /db_xref="taxon:5476"
> gene <977. .>2356
> /gene="PKA-R"
> mRNA <977. .>2356
> /gene="PKA-R"
> /product="cAMP-dependent protein kinase regulatory
> subunit"
> CDS 977. .2356
> /gene="PKA-R"
> /codon_start=1
> /transl_table=12
> /product="cAMP-dependent protein kinase regulatory
> subunit"
> /protein_id="AAG38599.1"
> /db_xref="GI:11596393"
>
> /translation="MSNPQQQFISDELSQLQKEIISKNPQDVLQFCANYFNTKLQAQR
>
> SELWSQQAKAEAAGIDLFPSVDHVNVNSSGVSIVNDRQPSFKSPFGVNDPHSNHDEDP
>
> HAKDTKTDTAAAAVGGGIFKSNFDVKKSASNPPTKEVDPDDPSKPSSSSQPNQQSASA
>
> SSKTPSSKIPVAFNANRRTSVSAEALNPAKLKLDSWKPPVNNLSITEEETLANNLKNN
>
> FLFKQLDANSKKTVIAALQQKSFAKDTVIIQQGDEGDFFYIIETGTVDFYVNDAKVSS
>
> SSEGSSFGELALMYNSPRAATAVAATDVVCWALDRLTFRRILLEGTFNKRLMYEDFLK
>
> DIEVLKSLSDHARSKLADALSTEMYHKGDKIVTEGEQGENFYLIESGNCQVYNEKLGN
>
> IKQLTKGDYFGELALIKDLPRQATVEALDNVIVATLGKSGFQRLLGPVVEVLKEQDPT
> KSQDPTAGH"
> ORIGIN
>
> AF317472 Length: 2679 February 16, 2004 17:02 Type: N Check: 9369
> ..
>
> 1 GAATTCAAAA AATCAAAAAA ATCAAAAAAA AACCGTGGAA GGTAAGTTGT
>
> 51 ATATTTATAA ATCAACGTGA ATAATTTTCA ACACTGTGTC AACATCTGTG
>
> 101 AAAAAAACCT GTGTGTACTG CATATAGGAC CTCACCTATT ACGTAGAATA
>
> 151 TACTAGAAAT AGTTACAACC ATAAAAAGAT TAATTGTGCT TACGTGGCAA
>
> 201 CTTTGAGATT TTTCTTTTTT CTGTTTCTTT CTTTCTTTTT TTGGCTTAAA
>
> 251 CAACAAATGT CGCAAATTAT ACAAACGACA TTTGCTGCCC ATGTCATTTT
>
> 301 GTCGTTATCA CGTGAAGTGT CGCAGATTTA TGTATTCTCA CTTCATTTCT
>
> 351 ATGGTCATCA ATTGTTCATT CATTCTCTAT CTTCAAAAAT CTGTGATTTG
>
> 401 ATGATTTTGA TTAAAAGAAA GCAAAGAGAA TACTGAAAAA AAGCAAAGAG
>
> 451 AATATAGAAA AGAAACAATA AAAGAATAGT TTCTAAGTTA CTTTGGAGTC
>
> 501 TGCTATTACC ATGTATCTAT GTGATTGCCC TATCAAATTG GACAATACGG
>
> 551 GTTTTTGTTT AGTCACGATA ATCACAAACT TCCCCCAGCA ATGACATACG
>
> 601 TAGCAAGTAA TATTTATATC TCTTCTATTT TTTTGATCTT ACATAATCTG
>
> 651 TCGTGTTTTT TTAAGTTGTT GTTATGAAGA AGTAATTTCA TAATGATCAA
>
> 701 GTGTGTAACT GAAATTTCAT CGCAATTTTA AACAAACAAG CTAATAATTA
>
> 751 TTATTATTAA TAGTTAATTT GCTAAGTTGA GTAAAATTTG CTTTTCTTGA
>
> 801 GAAAAAGGAG AAATTACTTT GGGAGTGAGT TTGAAGAGAG AAACTAAAGT
>
> 851 AAGTAAATGA GTGAGAGGGA GAGACAGAGA GCGAGAGGGG GAGTAAAAAA
>
> 901 AAAAGTTGCC CACAAACAAA TTGTGATACC GGTCTTTTAG CATATATCTT
>
> 951 CTACTCTTCA ATCAACATCT TTACCAATGT CTAATCCTCA ACAACAATTC
>
> 1001 ATATCTGATG AATTGTCGCA GTTACAGAAA GAAATAATTT CCAAAAACCC
>
> 1051 GCAAGATGTC TTACAGTTTT GCGCCAACTA TTTCAACACC AAGTTACAAG
>
> 1101 CTCAAAGAAG TGAGTTATGG TCGCAACAAG CTAAAGCAGA AGCCGCAGGC
>
> 1151 ATCGACTTAT TCCCATCTGT TGATCATGTG AATGTTAATT CTAGTGGTGT
>
> 1201 GAGCATTGTG AATGATAGAC AACCAAGTTT TAAATCACCT TTTGGTGTTA
>
> 1251 ATGATCCACA TCTGAATCAC GACGAAGATC CCCATGCCAA AGATACCAAA
>
> 1301 ACAGATACTG CTGCTGCTGC TGTTGGTGGG GGTATTTTCA AATCAAATTT
>
> 1351 TGATGTTAAA AAGAGTGCTT CTAATCCTCC AACCAAGGAA GTAGATCCAG
>
> 1401 ATGACCCATC AAAACCATCG TCATCGAGCC AACCAAATCA ACAATCAGCA
>
> 1451 TCAGCATCAT CAAAAACGCC ATCATCAAAG ATCCCAGTTG CTTTCAACGC
>
> 1501 TAATAGAAGA ACATCTGTAT CTGCTGAAGC CTTGAATCCA GCAAAATTGA
>
> 1551 AATTAGATAG TTGGAAACCT CCAGTTAATA ATTTGAGCAT TACCGAAGAA
>
> 1601 GAAACATTAG CCAACAATTT AAAGAACAAT TTCCTTTTCA AACAATTGGA
>
> 1651 CGCAAACTCT AAGAAAACTG TGATTGCTGC TTTACAACAA AAATCATTTG
>
> 1701 CTAAAGATAC AGTAATTATC CAACAAGGTG ATGAAGGGGA CTTTTTTTAC
>
> 1751 ATTATTGAAA CTGGTACAGT TGATTTCTAT GTTAATGATG CTAAAGTAAG
>
> 1801 TTCCAGTAGC GAAGGGTCAT CTTTTGGGGA ATTGGCTTTG ATGTATAATT
>
> 1851 CACCAAGAGC TGCTACGGCA GTTGCTGCCA CCGATGTTGT CTGTTGGGCA
>
> 1901 TTGGACCGTT TGACATTCCG TCGAATTCTT TTGGAAGGTA CTTTTAACAA
>
> 1951 GAGATTGATG TACGAGGATT TCTTAAAAGA TATTGAGGTT TTGAAATCTC
>
> 2001 TTTCGGATCA TGCACGTTCA AAATTGGCAG ATGCATTGAG CACAGAAATG
>
> 2051 TATCACAAGG GTGATAAAAT AGTCACTGAA GGTGAACAAG GAGAGAACTT
>
> 2101 TTATTTAATA GAAAGTGGAA ACTGTCAAGT TTACAATGAA AAGTTGGGCA
>
> 2151 ATATCAAACA ATTAACAAAA GGTGATTATT TTGGTGAGCT TGCATTAATA
>
> 2201 AAAGACTTAC CAAGACAAGC TACTGTGGAA GCATTGGATA ATGTAATCGT
>
> 2251 TGCCACATTA GGTAAATCCG GGTTCCAAAG ATTATTGGGT CCTGTTGTGG
>
> 2301 AGGTATTGAA AGAACAAGAC CCTACAAAGA GTCAAGACCC AACTGCTGGT
>
> 2351 CATTAAGTGT ACAATAAGTA GTTGTTTATT ATCTTATATT GTTTTATGTT
>
> 2401 AGTATATTCT ATCTTTTTTT TTTTGGCTTA CTCACCTTCT GGTGTTTTCG
>
> 2451 TTGCGATTTT GATAATGGAT GGTTGGTGCA AAAGTTCAAC TACATTTCTT
>
> 2501 GTTGTCAGGT ATATACGAGA TGGCAGCATG AACGAGCTCA CCATGGGTTG
>
> 2551 AACATTATTG AAGTTATCCG GCCGTGCCTT TTGCGAAACA TGGTAACTAA
>
> 2601 TATATTGCAA ACTTGGCTTC TACAGAAAAT ATACAATCTA ATACCTTGAG
>
> 2651 GAATTTCCTC TATATATAAT AGAGAATTC
>
> I'm not a GCG expert, but is this a correctly formatted GCG file in the
> first place? If not, is this an error in the SeqIO parser? I've found this
> behavior to be the same on Solaris 8 and on Linux, both running BioPerl 1.4
> and Perl 5.8.1.
>
> Thanks a bunch,
>
> Tex Thompson
> RIT Bioinformatics
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------