[Bioperl-l] Bug in GCG SeqIO Formatting?

Hilmar Lapp hlapp at gmx.net
Tue Feb 17 00:03:45 EST 2004


Rule #1: If your code doesn't work the way you think it should, or 
fails with an exception, and you do want help from the mailing list, 
then be sure to send along the *complete* output, in particular the 
stack trace if there was one.

Rule #2: Double check that you followed rule #1.

Rule #3: Check again that you followed rule #1.

There really aren't any other rules here. If you choose not to follow 
rule #1, you indicate that you're not actually interested in getting 
help.
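
As a minimal sketch of what following rule #1 can look like in practice, 
the snippet below wraps the code in eval and prints the error text 
unedited, together with the Perl version and platform. The helper name 
run_with_report and the file-reading closure are hypothetical stand-ins 
for illustration, not part of BioPerl; in a real report the closure would 
hold the actual SeqIO loop. BioPerl exceptions, as far as I know, usually 
carry their own stack trace in the message text, so printing $@ in full 
should preserve it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a piece of code and return (ok, report). The report carries the
# interpreter version, the platform, and the unedited error text.
# run_with_report is a hypothetical helper, not a BioPerl function.
sub run_with_report {
    my ($code) = @_;
    my $report = sprintf "perl %vd on %s\n", $^V, $^O;
    my $ok = eval { $code->(); 1 };
    $report .= $ok
        ? "status: ok\n"
        : "status: FAILED\ncomplete error text:\n$@";
    return ( $ok ? 1 : 0, $report );
}

# Hypothetical stand-in for the failing code, e.g. the SeqIO loop.
my ( $ok, $report ) = run_with_report( sub {
    my $file = 'af317472.gbpln3';
    open my $in, '<', $file or die "cannot open '$file': $!";
    my $lines = 0;
    $lines++ while <$in>;
    close $in;
    print "read $lines lines\n";
} );

# Send the report to STDERR so it sits next to any warnings.
print STDERR $report;
```

Running the script with both output streams redirected into one file 
(for example with "> report.txt 2>&1" in a Bourne-style shell) and 
pasting that file into the message gives the list everything the script 
printed, not a summary of it.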

	-hilmar

On Monday, February 16, 2004, at 02:49  PM, Tex Thompson wrote:

> Hello Mailing List,
>
> I have a user complaining that the following code isn't working on his
> GCG-formatted sequence files:
>
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::SeqIO;
> my $io  = Bio::SeqIO->new( -file => "af317472.gbpln3", -format => "gcg" );
> my $out = Bio::SeqIO->new( -fh => \*STDOUT, -format => "fasta" );
>
> while ( my $seq = $io->next_seq ) {
>    $out->write_seq( $seq );
> }
>
> Here's an example sequence file:
>
> !!NA_SEQUENCE 1.0
> LOCUS       AF317472                2679 bp    DNA     linear   PLN 07-DEC-2000
> DEFINITION  Candida albicans cAMP-dependent protein kinase regulatory subunit
>             (PKA-R) gene, complete cds.
> ACCESSION   AF317472
> VERSION     AF317472.1  GI:11596392
> KEYWORDS    .
> SOURCE      Candida albicans
>   ORGANISM  Candida albicans
>             Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
>             Saccharomycetales; mitosporic Saccharomycetales; Candida.
> REFERENCE   1  (bases 1 to 2679)
>   AUTHORS   Giasson,L. and Parrot,M.
>   TITLE     Sequence of the Candida albicans cAMP-dependent protein kinase
>             regulatory subunit
>   JOURNAL   Unpublished
> REFERENCE   2  (bases 1 to 2679)
>   AUTHORS   Giasson,L. and Parrot,M.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (27-OCT-2000) School of Dentistry, Laval University,
>             GREB, Ste-Foy, Quebec G1K 7P4, Canada
> FEATURES             Location/Qualifiers
>      source          1. .2679
>                      /organism="Candida albicans"
>                      /mol_type="genomic DNA"
>                      /strain="CAI4"
>                      /db_xref="taxon:5476"
>      gene            <977. .>2356
>                      /gene="PKA-R"
>      mRNA            <977. .>2356
>                      /gene="PKA-R"
>                      /product="cAMP-dependent protein kinase regulatory
>                      subunit"
>      CDS             977. .2356
>                      /gene="PKA-R"
>                      /codon_start=1
>                      /transl_table=12
>                      /product="cAMP-dependent protein kinase regulatory
>                      subunit"
>                      /protein_id="AAG38599.1"
>                      /db_xref="GI:11596393"
>                      /translation="MSNPQQQFISDELSQLQKEIISKNPQDVLQFCANYFNTKLQAQR
>                      SELWSQQAKAEAAGIDLFPSVDHVNVNSSGVSIVNDRQPSFKSPFGVNDPHSNHDEDP
>                      HAKDTKTDTAAAAVGGGIFKSNFDVKKSASNPPTKEVDPDDPSKPSSSSQPNQQSASA
>                      SSKTPSSKIPVAFNANRRTSVSAEALNPAKLKLDSWKPPVNNLSITEEETLANNLKNN
>                      FLFKQLDANSKKTVIAALQQKSFAKDTVIIQQGDEGDFFYIIETGTVDFYVNDAKVSS
>                      SSEGSSFGELALMYNSPRAATAVAATDVVCWALDRLTFRRILLEGTFNKRLMYEDFLK
>                      DIEVLKSLSDHARSKLADALSTEMYHKGDKIVTEGEQGENFYLIESGNCQVYNEKLGN
>                      IKQLTKGDYFGELALIKDLPRQATVEALDNVIVATLGKSGFQRLLGPVVEVLKEQDPT
>                      KSQDPTAGH"
> ORIGIN
>
> AF317472  Length: 2679  February 16, 2004 17:02  Type: N  Check: 9369  ..
>
>        1  GAATTCAAAA AATCAAAAAA ATCAAAAAAA AACCGTGGAA GGTAAGTTGT
>
>       51  ATATTTATAA ATCAACGTGA ATAATTTTCA ACACTGTGTC AACATCTGTG
>
>      101  AAAAAAACCT GTGTGTACTG CATATAGGAC CTCACCTATT ACGTAGAATA
>
>      151  TACTAGAAAT AGTTACAACC ATAAAAAGAT TAATTGTGCT TACGTGGCAA
>
>      201  CTTTGAGATT TTTCTTTTTT CTGTTTCTTT CTTTCTTTTT TTGGCTTAAA
>
>      251  CAACAAATGT CGCAAATTAT ACAAACGACA TTTGCTGCCC ATGTCATTTT
>
>      301  GTCGTTATCA CGTGAAGTGT CGCAGATTTA TGTATTCTCA CTTCATTTCT
>
>      351  ATGGTCATCA ATTGTTCATT CATTCTCTAT CTTCAAAAAT CTGTGATTTG
>
>      401  ATGATTTTGA TTAAAAGAAA GCAAAGAGAA TACTGAAAAA AAGCAAAGAG
>
>      451  AATATAGAAA AGAAACAATA AAAGAATAGT TTCTAAGTTA CTTTGGAGTC
>
>      501  TGCTATTACC ATGTATCTAT GTGATTGCCC TATCAAATTG GACAATACGG
>
>      551  GTTTTTGTTT AGTCACGATA ATCACAAACT TCCCCCAGCA ATGACATACG
>
>      601  TAGCAAGTAA TATTTATATC TCTTCTATTT TTTTGATCTT ACATAATCTG
>
>      651  TCGTGTTTTT TTAAGTTGTT GTTATGAAGA AGTAATTTCA TAATGATCAA
>
>      701  GTGTGTAACT GAAATTTCAT CGCAATTTTA AACAAACAAG CTAATAATTA
>
>      751  TTATTATTAA TAGTTAATTT GCTAAGTTGA GTAAAATTTG CTTTTCTTGA
>
>      801  GAAAAAGGAG AAATTACTTT GGGAGTGAGT TTGAAGAGAG AAACTAAAGT
>
>      851  AAGTAAATGA GTGAGAGGGA GAGACAGAGA GCGAGAGGGG GAGTAAAAAA
>
>      901  AAAAGTTGCC CACAAACAAA TTGTGATACC GGTCTTTTAG CATATATCTT
>
>      951  CTACTCTTCA ATCAACATCT TTACCAATGT CTAATCCTCA ACAACAATTC
>
>     1001  ATATCTGATG AATTGTCGCA GTTACAGAAA GAAATAATTT CCAAAAACCC
>
>     1051  GCAAGATGTC TTACAGTTTT GCGCCAACTA TTTCAACACC AAGTTACAAG
>
>     1101  CTCAAAGAAG TGAGTTATGG TCGCAACAAG CTAAAGCAGA AGCCGCAGGC
>
>     1151  ATCGACTTAT TCCCATCTGT TGATCATGTG AATGTTAATT CTAGTGGTGT
>
>     1201  GAGCATTGTG AATGATAGAC AACCAAGTTT TAAATCACCT TTTGGTGTTA
>
>     1251  ATGATCCACA TCTGAATCAC GACGAAGATC CCCATGCCAA AGATACCAAA
>
>     1301  ACAGATACTG CTGCTGCTGC TGTTGGTGGG GGTATTTTCA AATCAAATTT
>
>     1351  TGATGTTAAA AAGAGTGCTT CTAATCCTCC AACCAAGGAA GTAGATCCAG
>
>     1401  ATGACCCATC AAAACCATCG TCATCGAGCC AACCAAATCA ACAATCAGCA
>
>     1451  TCAGCATCAT CAAAAACGCC ATCATCAAAG ATCCCAGTTG CTTTCAACGC
>
>     1501  TAATAGAAGA ACATCTGTAT CTGCTGAAGC CTTGAATCCA GCAAAATTGA
>
>     1551  AATTAGATAG TTGGAAACCT CCAGTTAATA ATTTGAGCAT TACCGAAGAA
>
>     1601  GAAACATTAG CCAACAATTT AAAGAACAAT TTCCTTTTCA AACAATTGGA
>
>     1651  CGCAAACTCT AAGAAAACTG TGATTGCTGC TTTACAACAA AAATCATTTG
>
>     1701  CTAAAGATAC AGTAATTATC CAACAAGGTG ATGAAGGGGA CTTTTTTTAC
>
>     1751  ATTATTGAAA CTGGTACAGT TGATTTCTAT GTTAATGATG CTAAAGTAAG
>
>     1801  TTCCAGTAGC GAAGGGTCAT CTTTTGGGGA ATTGGCTTTG ATGTATAATT
>
>     1851  CACCAAGAGC TGCTACGGCA GTTGCTGCCA CCGATGTTGT CTGTTGGGCA
>
>     1901  TTGGACCGTT TGACATTCCG TCGAATTCTT TTGGAAGGTA CTTTTAACAA
>
>     1951  GAGATTGATG TACGAGGATT TCTTAAAAGA TATTGAGGTT TTGAAATCTC
>
>     2001  TTTCGGATCA TGCACGTTCA AAATTGGCAG ATGCATTGAG CACAGAAATG
>
>     2051  TATCACAAGG GTGATAAAAT AGTCACTGAA GGTGAACAAG GAGAGAACTT
>
>     2101  TTATTTAATA GAAAGTGGAA ACTGTCAAGT TTACAATGAA AAGTTGGGCA
>
>     2151  ATATCAAACA ATTAACAAAA GGTGATTATT TTGGTGAGCT TGCATTAATA
>
>     2201  AAAGACTTAC CAAGACAAGC TACTGTGGAA GCATTGGATA ATGTAATCGT
>
>     2251  TGCCACATTA GGTAAATCCG GGTTCCAAAG ATTATTGGGT CCTGTTGTGG
>
>     2301  AGGTATTGAA AGAACAAGAC CCTACAAAGA GTCAAGACCC AACTGCTGGT
>
>     2351  CATTAAGTGT ACAATAAGTA GTTGTTTATT ATCTTATATT GTTTTATGTT
>
>     2401  AGTATATTCT ATCTTTTTTT TTTTGGCTTA CTCACCTTCT GGTGTTTTCG
>
>     2451  TTGCGATTTT GATAATGGAT GGTTGGTGCA AAAGTTCAAC TACATTTCTT
>
>     2501  GTTGTCAGGT ATATACGAGA TGGCAGCATG AACGAGCTCA CCATGGGTTG
>
>     2551  AACATTATTG AAGTTATCCG GCCGTGCCTT TTGCGAAACA TGGTAACTAA
>
>     2601  TATATTGCAA ACTTGGCTTC TACAGAAAAT ATACAATCTA ATACCTTGAG
>
>     2651  GAATTTCCTC TATATATAAT AGAGAATTC
>
> I'm not a GCG expert, but is this a correctly formatted GCG file in the
> first place? If not, is this an error in the SeqIO parser?  I've found this
> behavior to be the same on Solaris 8 and on Linux, both running BioPerl 1.4
> and Perl 5.8.1.
>
> Thanks a bunch,
>
> Tex Thompson
> RIT Bioinformatics
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------