[Bioperl-l] Input for Bio::CodonUsage::IO
Brian Osborne
osborne1 at optonline.net
Fri Apr 21 22:00:11 UTC 2006
Marc,
I spoke too soon. Looking at IO.pm's _parse method shows clearly that it's
meant to parse the same format it writes, something like what's shown below.
As you noted, this doesn't look anything like the *.codon files, which have a
fasta-style header followed by 64 numbers; clearly these are the counts of
codons in the sequence that's referenced.
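
For illustration only, here is a rough sketch of reading records of that shape in plain Perl. It assumes each header line starts with ">" and that the counts are whitespace-separated on the following line(s); parse_codon_records is a made-up helper name, not anything in BioPerl, and the sketch does not assume which codon each of the 64 positions stands for.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split text into records made of
# a ">" header line plus the whitespace-separated numbers that follow it.
sub parse_codon_records {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;              # skip blank lines
        if ($line =~ /^>(.*)/) {
            # a new record starts at every header line
            push @records, { header => $1, counts => [] };
        }
        elsif (@records) {
            # numbers belong to the most recent header
            push @{ $records[-1]{counts} }, split ' ', $line;
        }
    }
    return \@records;
}

# Made-up input: an invented header and 64 zero counts.
my $sample  = ">example header (made up)\n" . join(' ', (0) x 64) . "\n";
my $records = parse_codon_records($sample);

for my $rec (@$records) {
    my $n = scalar @{ $rec->{counts} };
    warn "expected 64 counts, got $n\n" unless $n == 64;
    printf "%s: %d counts\n", $rec->{header}, $n;
}
```

Mapping the 64 positions to codons would need the column order documented with the CUTG files, which this sketch deliberately leaves out.
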
In addition, IO.pm wouldn't have worked anyway: it was missing a basic "use"
statement, which is fixed now. Like you I can't connect to Codon at Kazusa.
When I can, I'll see if it can provide us with a file formatted something
like the text below.
Brian O.
WDW311031#4\AJ311031\complement(1717..2511)\795\CAC84661.1\Wheat dwarf virus
- [unknown] 1 CDS's
AmAcid Codon Number /1000 Fraction
Gly GGG 0.00 0.00 0.00
Gly GGA 0.00 0.00 0.00
Gly GGT 0.00 0.00 0.00
Gly GGC 0.00 0.00 0.00
Glu GAG 0.00 0.00 0.00
Glu GAA 0.00 0.00 0.00
Asp GAT 0.00 0.00 0.00
Asp GAC 0.00 0.00 0.00
Val GTG 0.00 0.00 0.00
Val GTA 0.00 0.00 0.00
Val GTT 0.00 0.00 0.00
Val GTC 0.00 0.00 0.00
Ala GCG 0.00 0.00 0.00
Ala GCA 0.00 0.00 0.00
Ala GCT 0.00 0.00 0.00
Ala GCC 0.00 0.00 0.00
Arg AGG 0.00 0.00 0.00
Arg AGA 0.00 0.00 0.00
Ser AGT 0.00 0.00 0.00
Ser AGC 0.00 0.00 0.00
Lys AAG 0.00 0.00 0.00
Lys AAA 0.00 0.00 0.00
Asn AAT 0.00 0.00 0.00
Asn AAC 0.00 0.00 0.00
Met ATG 0.00 0.00 0.00
Ile ATA 0.00 0.00 0.00
Ile ATT 0.00 0.00 0.00
Ile ATC 0.00 0.00 0.00
Thr ACG 0.00 0.00 0.00
Thr ACA 0.00 0.00 0.00
Thr ACT 0.00 0.00 0.00
Thr ACC 0.00 0.00 0.00
Trp TGG 0.00 0.00 0.00
Ter TGA 0.00 0.00 0.00
Cys TGT 0.00 0.00 0.00
Cys TGC 0.00 0.00 0.00
Ter TAG 0.00 0.00 0.00
Ter TAA 0.00 0.00 0.00
Tyr TAT 0.00 0.00 0.00
Tyr TAC 0.00 0.00 0.00
Leu TTG 0.00 0.00 0.00
Leu TTA 0.00 0.00 0.00
Phe TTT 0.00 0.00 0.00
Phe TTC 0.00 0.00 0.00
Ser TCG 0.00 0.00 0.00
Ser TCA 0.00 0.00 0.00
Ser TCT 0.00 0.00 0.00
Ser TCC 0.00 0.00 0.00
Arg CGG 0.00 0.00 0.00
Arg CGA 0.00 0.00 0.00
Arg CGT 0.00 0.00 0.00
Arg CGC 0.00 0.00 0.00
Gln CAG 0.00 0.00 0.00
Gln CAA 0.00 0.00 0.00
His CAT 0.00 0.00 0.00
His CAC 0.00 0.00 0.00
Leu CTG 0.00 0.00 0.00
Leu CTA 0.00 0.00 0.00
Leu CTT 0.00 0.00 0.00
Leu CTC 0.00 0.00 0.00
Pro CCG 0.00 0.00 0.00
Pro CCA 0.00 0.00 0.00
Pro CCT 0.00 0.00 0.00
Pro CCC 0.00 0.00 0.00
Coding GC 0%
1st letter GC 0%
2nd letter GC 0%
3rd letter GC 0%
Genetic code 1
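
As a rough sketch of how rows like the ones above could be read outside of BioPerl, the snippet below pulls the five columns (AmAcid, Codon, Number, /1000, Fraction) into a hash keyed by codon. It is not the _parse method from IO.pm; parse_codon_table and the hash field names are invented for this example, and the row layout is assumed from the sample text only.

```perl
use strict;
use warnings;

# Hypothetical helper (not the IO.pm _parse method): read rows such as
# "Gly GGG 0.00 0.00 0.00" into a hash keyed by codon.
sub parse_codon_table {
    my ($text) = @_;
    my %table;
    for my $line (split /\n/, $text) {
        # three-letter amino acid, three-base codon, then three numbers;
        # header, GC summary and "Genetic code" lines do not match
        next unless $line =~ /^\s*([A-Za-z]{3})\s+([ACGTU]{3})\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$/;
        $table{$2} = {
            aa           => $1,
            number       => $3,
            per_thousand => $4,
            fraction     => $5,
        };
    }
    return \%table;
}

# A few lines copied from the sample text.
my $sample = join "\n",
    'AmAcid Codon Number /1000 Fraction',
    'Gly GGG 0.00 0.00 0.00',
    'Ter TGA 0.00 0.00 0.00',
    'Coding GC 0%',
    'Genetic code 1';

my $table = parse_codon_table($sample);
for my $codon (sort keys %$table) {
    printf "%s %s %s\n", $codon, $table->{$codon}{aa}, $table->{$codon}{number};
}
```
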
On 4/21/06 9:26 AM, "Marc Logghe" <Marc.Logghe at DEVGEN.com> wrote:
> Hi Brian
> Thanks for the reply.
> I might be overlooking something but I downloaded this last week. The
> tarball contained *.codon and *.spsum files, which did not look at all
> like a codon usage table (kind of pseudo fasta). For that reason, I used
> EMBOSS cutgextract that produced *.cut files starting from the CUTG
> *.codon files.
>
> I finally managed to parse these *.cut files.
> In order to do that I created a Bio::CodonUsage::IO::emboss module that
> only contains the private _parse() method. The setup I used is a copycat
> from Bio::SeqIO.
> Meaning, now you can do:
> my $io = Bio::CodonUsage::IO->new( -file => shift, -format => 'emboss'
> );
>
> In case no format option is given it defaults to the
> Bio::CodonUsage::IO::default module that contains the _parse() method
> from the original Bio::CodonUsage::IO module. Actually, this should be
> changed to a name that makes more sense but I did not know what this
> default format looks like and/or where it comes from. My guess is that it
> comes from http://www.kazusa.or.jp but the site seems to be broken, at
> least today.
> Currently I continue with this setup in house, but in case you think it
> is useful to commit, just let me know.
> Cheers,
> Marc
>
>
>
>> -----Original Message-----
>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>> Sent: Friday, April 21, 2006 3:09 PM
>> To: Marc Logghe; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Input for Bio::CodonUsage::IO
>>
>> Marc,
>>
>> It wants a file from the database CUTG. You can ftp them from
>> this mirror:
>>
>> ftp://ftp.ebi.ac.uk/pub/databases/cutg
>>
>>
>> Brian O.
>>
>>
>> On 4/21/06 4:56 AM, "Marc Logghe" <Marc.Logghe at DEVGEN.com> wrote:
>>
>>> Hi,
>>> I was wondering what format Bio::CodonUsage::IO expects as input for
>>> the -file option.
>>> I tried to pass it a *.cut file generated by EMBOSS' cutgextract that
>>> looks like this:
>>> #Species: Oryza sativa
>>> #Division: gbpln
>>> #Release: CUTG
>>> #CdsCount: 70050
>>>
>>> #Coding GC 55.34%
>>> #1st letter GC 58.41%
>>> #2nd letter GC 46.34%
>>> #3rd letter GC 61.29%
>>>
>>> #Codon AA Fraction Frequency Number
>>> GCA A 0.185 17.382 431151
>>> <skipped>
>>> TGA * 0.435 1.228 30463
>>>
>>> Looking into the _parse() method of Bio::CodonUsage::IO it appears
>>> that the table resembles this kind of format but is actually not
>>> exactly what it expects. My question is: what should it really look
>>> like? I could not find an example in t/data.
>>> Any clues?
>>> Thanks,
>>> Marc
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>