[BioRuby] Calculation of Conserved residues.

Toshiaki Katayama ktym at hgc.jp
Mon Apr 9 07:19:32 UTC 2007


Hi,

On 2007/04/08, at 8:31, Yonatan Gross wrote:
> seqs = []
> seqs << 	Bio::Sequence::AA.new(arabidopsis)
> seqs << 	Bio::Sequence::AA.new(tobacco)
   :

The point is that Bio::Sequence::AA.new does not accept a FASTA-formatted string.
(Should I change this behavior?)
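
To illustrate why the raw string is a problem: a FASTA entry carries a '>'
definition line that is not part of the sequence, so it has to be separated
out before the residues are used. A minimal plain-Ruby sketch follows; the
helper name split_fasta is hypothetical and only for illustration (in BioRuby
itself this parsing is done by Bio::FastaFormat and Bio::FlatFile).

```ruby
# Hypothetical helper (not part of BioRuby): split FASTA text into
# [definition, sequence] pairs using only the Ruby standard library.
def split_fasta(text)
  text.split(/^>/).reject { |entry| entry.strip.empty? }.map do |entry|
    # First line is the definition; the remaining lines are sequence data.
    header, *lines = entry.lines.map(&:strip)
    [header, lines.join]
  end
end

fasta = <<~FASTA
  >seq1 example
  MASSSL
  SPATQL
  >seq2 example
  MASSTL
FASTA

split_fasta(fasta).each do |definition, sequence|
  puts "#{definition}: #{sequence}"
end
```

Only the second element of each pair would be suitable as input for
Bio::Sequence::AA.new.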

> factory = Bio::ClustalW.new
> report = factory.query_align(seqs)

Also, factory.query(seqs) eventually passes seqs to Bio::OriginalAlignment.new(seqs),
and it looks like this method also expects an array of Bio::Sequence::AA objects
or an array of objects that respond to one of the 'seq', 'naseq' or 'aaseq' methods.

In your case, I recommend keeping your sequences in a separate file
and reading it through the Bio::FlatFile interface (which recognizes a FASTA
formatted file and iterates over each sequence, creating a Bio::FastaFormat
object for each), then running clustalw through BioRuby.

  % ruby run_clustalw.rb sequences.txt
                   *** :** .:   **  *: .:      *     ::               * *:* ****  **** **: :*******::***  **:**.:** *.* * . .*  ***  ****:. ****** *. ***************** * ****:**************:* *****:********* **:************.*.*:*: ***:****.*******: ***:

The revised version of your script is as follows:

run_clustalw.rb
------------------------------------------------------------
#!/usr/bin/env ruby

require 'bio'

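seqs = []

# Read FASTA entries from the file(s) given on the command line (or stdin)
# and collect the sequence of each entry.
Bio::FlatFile.auto(ARGF).each do |fasta|
  seqs << fasta.seq
end

# Run clustalw on the collected sequences and print the match line
# of the resulting alignment.
clustalw = Bio::ClustalW.new
report = clustalw.query(seqs)
puts report.alignment.match_line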
------------------------------------------------------------

sequences.txt:
------------------------------------------------------------
>gi|9843639|emb|CAC03598.1| Rieske FeS protein [Arabidopsis thaliana] (arabidopsis)
MASSSLSPATQLGSSRSALMAMSSGLFVKPTKMNHQMVRKEKIGLRIACQASSIPADRVPDMEKRKTLNL
LLLGALSLPTGYMLVPYATFFVPPGTGGGGGGTPAKDALGNDVVAAEWLKTHGPGDRTLTQGLKGDPTYL
VVENDKTLATYGINAVCTHLGCVVPWNKAENKFLCPCHGSQYNAQGRVVRGPAPLSLALAHADIDEAGKV
LFVPWVETDFRTGDAPWWS
>gi|19995|emb|CAA46808.1| Rieske FeS [Nicotiana tabacum] (tobacco)
MASSTLSPVTQLCSSKSGLSSVSQCLLVKPMKINSHGLGKDKRMKVKCMATSIPADDRVPDMEKRNLMNL
LLLGALSLPTAGMLVPYGTFFVPPGSGGGSGGTPAKDALGNDVIASEWLKTHPPGNRTLTQGLKGDPTYL
VVENDGTLATYGINAVCTHLGCVVPFNAAENKFICPCHGSQYNNQGRVVRGPAPLSLALAHADIDDGKVV
FVPWVETDFRTGEDPWWA
>gi|226151|prf||1412276A rieske FeS precursor protein [spinach] (spinach)
MIISIFNQLHLTENSSLMASFTLSSATPSQLCSSKNGMFAPSLALAKAGRVNVLISKERIRGMKLTCQAT
SIPADNVPDMQKRETLNLLLLGALSLPTGYMLLPYASFFVPPGGGAGTGGTIAKDALGNDVIAAEWLKTH
APGDRTLTQGLKGDPTYLVVESDKTLATFGINAVCTHLGCVVPFNAAENKFICPCHGSQYNNQGRVVRGP
APLSLALAHCDVDDGKVVFVPWTETDFRTGEAPWWSA
>gi|115472727|ref|NP_001059962.1| Os07g0556200 [Oryza sativa (japonica cultivar-group)] (rice)
MASTALSTASNPTQLCRSRASLGKPVKGLGFGRERVPRTATTITCQAASSIPADRVPDMGKRQLMNLLLL
GAISLPTVGMLVPYGAFFIPAGSGNAGGGQVAKDKLGNDVLAEEWLKTHGPNDRTLTQGLKGDPTYLVVE
ADKTLATYGINAVCTHLGCVVPWNAAENKFICPCHGSQYNNQGRVVRGPAPLSLALVHADVDDGKVLFVP
WVETDFRTGDNPWWA
>gi|37222949|gb|AAQ90151.1| putative Rieske Fe-S protein precursor [Solanum tuberosum] (potato)
MASSTLSHVTPSQLCSSKSGVSSVSQALLVKPMKINGHGMGKDKRMKAKCMAASIPADDRVPDMEKRNLM
NLLLLGALALPTGGMLVPYATFFAPPGSGGGSSGTIAKDANGNDVVVTEWLKTHSPGTRTLTQGLKGDPT
YLVVENDGTLATYGINAVCTHLGCVVPWNTAENKFICPCHGSQYNNQGKVVRGPAPLSLALAHADIDDGK
VVFVPWVETDFRTGDSPWWA
------------------------------------------------------------
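
Once you have the match line as a string, counting the conserved residues
could look something like the sketch below. It assumes the usual ClustalW
symbols ('*' for a column identical in all sequences, ':' for strongly
similar, '.' for weakly similar). The method names conservation_summary and
conserved_columns are made up for this example and are not BioRuby methods.

```ruby
# Tally the symbols of a ClustalW-style match line (assumed convention:
# '*' identical, ':' strongly similar, '.' weakly similar).
def conservation_summary(match_line)
  {
    identical: match_line.count('*'),
    strong:    match_line.count(':'),
    weak:      match_line.count('.'),
    columns:   match_line.length
  }
end

# 0-based alignment columns in which all sequences share the same residue.
def conserved_columns(match_line)
  match_line.each_char.with_index.select { |ch, _| ch == '*' }.map { |_, i| i }
end

# Short made-up match line; in the script above this would be
# report.alignment.match_line.
match_line = ' *** :** .:  *'
summary = conservation_summary(match_line)
puts "identical columns: #{summary[:identical]} of #{summary[:columns]}"
puts "conserved at: #{conserved_columns(match_line).join(', ')}"
```

Note that the column numbers refer to positions in the alignment (gaps
included), not to residue numbers in any single sequence.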

Hope this helps.

Toshiaki





