[Bioperl-l] automation of translation based on alignment
Chris Fields
cjfields at illinois.edu
Tue Mar 23 04:43:03 UTC 2010
On Mar 22, 2010, at 8:32 PM, Ross KK Leung wrote:
> Chris L,
>
> Your comment is insightful and as a non-virologist, I have never known that
> before. My strategy is just to extract the genomic fragments encoding
> proteins and derive the putative translated sequences. I'll do another round
> of MSA for the protein sequences in order to discover any outliers. There
> may be truncations, but as long as the protease acts post-translationally,
> it's acceptable.
>
> Chris F,
>
> What makes me feel frustrated is the verisimilar data structures and naming
> of Bio objects in Bioperl. If I want to retrieve a genbank file over the
> internet by:
>
> $gb = new Bio::DB::GenBank;
>
> $seq = $gb->get_Seq_by_acc('J00522');
>
> And from:
> http://doc.bioperl.org/releases/bioperl-1.4/Bio/DB/GenBank.html
>
> it says it returns a Bio::Seq object, but in fact it's a Bio::Seq::RichSeq
> so I can't do something like:
A Bio::Seq::RichSeq is-a Bio::Seq (it inherits from Bio::Seq and augments it). I believe 'Bio::Seq' in the documentation refers to the fact that one can retrieve either FASTA sequence data (which returns a simple Bio::Seq) or richer records, such as a GenBank record (which returns a Bio::Seq::RichSeq). In this case, the documentation should probably read 'Bio::SeqI' to be more accurate, since both classes implement the Bio::SeqI interface.
Beyond the addition of a few accessor methods, they are essentially the same, in that they both have annotation, features, etc.
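To make the is-a relationship concrete, here is a minimal sketch that builds a Bio::Seq::RichSeq in memory (no network access) and checks which classes it belongs to. The sequence string and the display id 'demo1' are made up for illustration, and the sketch assumes BioPerl is installed.

```perl
use strict;
use warnings;
use Bio::Seq::RichSeq;

# Toy object; the sequence and id are invented for this illustration.
my $rich = Bio::Seq::RichSeq->new(
    -seq        => 'ATGGCCAAA',
    -display_id => 'demo1',
);

# A RichSeq should answer "yes" for its own class, its parent class,
# and the Bio::SeqI interface.
for my $class (qw(Bio::Seq::RichSeq Bio::Seq Bio::SeqI)) {
    printf "%-18s %s\n", $class, ($rich->isa($class) ? 'yes' : 'no');
}

# Feature access is inherited from Bio::Seq, so it is available here too.
my $has_feature_api = $rich->can('get_SeqFeatures') ? 1 : 0;
print "get_SeqFeatures available: $has_feature_api\n";
```

So anything written against Bio::Seq (or, better, Bio::SeqI) should accept the object that get_Seq_by_acc() hands back for a GenBank record.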
> my $seqobj = $seq->next_seq;
You're either not reading the demos or the relevant documentation correctly, or there is a spot in the docs that needs to be fixed (if the latter, please let us know). Bio::Seq does not implement a next_seq() method, but sequence *streams* (à la Bio::SeqIO) do. You are probably thinking of something like this:
# get_Stream_by_acc() expects a reference to an array of accessions
my $stream = $gb->get_Stream_by_acc(\@ids);
while (my $seqobj = $stream->next_seq) {
    # do stuff here
}
The above retrieves a stream of Bio::Seq objects (specifically, a Bio::SeqIO stream). '$stream->next_seq()' iterates through them one at a time. Unless you create a stream in some way, calling next_seq() will not work. If you call the methods below directly on the *sequence* object (the one returned by get_Seq_by_*, i.e. $seq in your code), NOT on a *stream* object (the one returned by get_Stream_by_*), it should work.
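To show the stream/sequence distinction without going over the network, here is a small sketch that reads two made-up FASTA records from an in-memory string via Bio::SeqIO. The record names (s1, s2) and sequences are invented, and it assumes BioPerl is installed. A stream returned by get_Stream_by_acc() should behave the same way in the loop.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two toy FASTA records held in a string (invented data).
my $fasta = ">s1 first toy record\nATGGCCAAA\n>s2 second toy record\nATGAAATTT\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";

# The *stream* object: this is what knows about next_seq().
my $stream = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @ids;
my $last_seq;
while (my $seqobj = $stream->next_seq) {
    # Each $seqobj is a *sequence* object, not a stream.
    push @ids, $seqobj->display_id;
    $last_seq = $seqobj;
}

print "ids: @ids\n";
print "stream can next_seq:   ", ($stream->can('next_seq')   ? 'yes' : 'no'), "\n";
print "sequence can next_seq: ", ($last_seq->can('next_seq') ? 'yes' : 'no'), "\n";
```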
> for my $feat_object ($seqobj->get_SeqFeatures) {
>
> if ($feat_object->primary_tag eq "CDS") {
>
> print $feat_object->spliced_seq->seq,"\n";
>
> if ($feat_object->has_tag('gene')) {
>
> for my $val ($feat_object->get_tag_values('gene')){
>
> print "gene: ",$val,"\n";
>
> }
>
> }
>
> }
>
> }
>
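For reference, here is a self-contained sketch of the same feature loop run against a toy sequence built in memory, with translation of the CDS added since that is the goal of this thread. The sequence, the feature coordinates, and the gene name 'toyA' are all invented for illustration, and it assumes BioPerl is installed. With a real record, $seqobj would simply be what get_Seq_by_acc() returns.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Toy sequence (invented): the CDS occupies positions 3..11 (ATGGCCAAA).
my $seqobj = Bio::Seq->new(
    -seq        => 'GGATGGCCAAAGG',
    -display_id => 'toy1',
    -alphabet   => 'dna',
);

# Attach a single CDS feature carrying a 'gene' tag.
my $cds = Bio::SeqFeature::Generic->new(
    -start   => 3,
    -end     => 11,
    -strand  => 1,
    -primary => 'CDS',
    -tag     => { gene => 'toyA' },
);
$seqobj->add_SeqFeature($cds);

my ($protein, @genes);
for my $feat ($seqobj->get_SeqFeatures) {
    next unless $feat->primary_tag eq 'CDS';

    # spliced_seq() returns a sequence object covering the feature's location.
    my $cds_seq = $feat->spliced_seq;
    $protein = $cds_seq->translate->seq;

    push @genes, $feat->get_tag_values('gene') if $feat->has_tag('gene');

    print "CDS:     ", $cds_seq->seq, "\n";
    print "protein: $protein\n";
}
print "gene: $_\n" for @genes;
```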
> From http://doc.bioperl.org/releases/bioperl-1.4/Bio/Seq/RichSeq.html, the
> methods there mention nothing about how to get the features or inter-convert
> among the object types.
Just a note, but make sure to read up-to-date documentation, particularly if you are using the latest code. Here is the pdoc for the latest release:
http://doc.bioperl.org/releases/bioperl-1.6.1/Bio/Seq/RichSeqI.html
This is definitely worth pointing out, and is a good example where we can improve our documentation; I've added some links to classes that would explain more. In the meantime, the best thing to do in this case is to point you to the online documentation (which I think I did already, but just in case):
http://www.bioperl.org/wiki/HOWTO:Beginners
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
chris