[Bioperl-l] Re: help extracting CDS
Hilmar Lapp
hlapp@gnf.org
Wed, 18 Dec 2002 09:34:35 -0800
On Wednesday, December 18, 2002, at 09:27 AM, Hilmar Lapp wrote:
> In case someone was interested, this in straight bioperl does just
> that.
>
> #!/usr/bin/perl
> use Bio::SeqIO;
> use Bio::Seq;
> $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'genbank');
> $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
> while($seq = $in->next_seq()) {
> @ids = ();
Bummer. This line needs to be at the beginning of the foreach loop, not
here ...
> foreach $cds (grep { $_->primary_tag eq 'CDS'; }
> $seq->get_SeqFeatures()) {
@ids = ();
-hilmar
> push(@ids, grep { /^GI:/; } $cds->get_tag_values('db_xref'));
> push(@ids,$cds->get_tag_values('product')) if
> $cds->has_tag('product');
> push(@ids,$cds->get_tag_values('name')) if $cds->has_tag('name');
> push(@ids,$cds->get_tag_values('protein_id'));
> ($seq) = $cds->get_tag_values('translation');
> $out->write_seq(Bio::Seq->new(-seq => $seq,
> -display_id => join('|',@ids)));
> }
> }
>
> Some remarks: not every CDS feature has a translation tag. The above
> code will fail on these in a non-silent manner, but you can protect
> against that by appending "if $cds->has_tag('translation')" to the
> respective line. Also, double-quotes also aren't always there.
>
> One of the problems with genbank is that it would be relatively
> trivial to parse if they followed their format strictly.
>
> -hilmar
>
> On Wednesday, December 18, 2002, at 07:55 AM, Pedro Antonio Reche
> wrote:
>
>> Hi, the ways of programing with bioperl remains misterious to me.
>> Despite the always excellent help of Jason Stajich I could not find
>> simple way to get the CDS from a genbank record using bioperl. At
>> least
>> for now.
>> Anyway, in case someone was interested in my original post, this code
>> in
>> straight perl does just that.
>> Regards.
>> #!/usr/sbin/perl -w
>> use strict;
>> $/ = "\n CDS";
>> <>; # to skeep header
>> while ( <> ) {
>> my ($gname) = /product="([^"]+)"/;#sometimes /product= is replace
>> by
>> /name=
>> $gname =~ s/\s+//g;
>> my ($ref) = /protein_id="([\w.]+)"/;
>> my ($gid) = /db_xref="(GI:\w+)"/;
>> my ($seq) = /translation="([A-Z\s]+)"/;
>> $seq =~ s/\s+//g;
>>
>> print ">$gid|$gname|$ref\n$seq\n";
>> }
>>>
>>> Hi, I need to extract the CDS from a genbank genome record, saving
>>> them
>>> into file in fasta format, and I wonder if someone can let me know
>>> how
>>> to do this using bioperl.
>>> Tanks in advance for any positive consideration.
>>>
>>> pedro
>>>
>>> *******************************************************************
>>> PEDRO A. RECHE , pHD TL: 617 632 3824
>>> Dana-Farber Cancer Institute, FX: 617 632 4569
>>> Harvard Medical School, EM: reche@research.dfci.harvard.edu
>>> 44 Binney Street, D1510A, EM: reche@mifoundation.org
>>> Boston, MA 02115 URL: http://www.reche.org
>>> *******************************************************************
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@bioperl.org
>>> http://bioperl.org/mailman/listinfo/bioperl-l
>>
>> --
>> *******************************************************************
>> PEDRO A. RECHE , pHD TL: 617 632 3824
>> Dana-Farber Cancer Institute, FX: 617 632 4569
>> Harvard Medical School, EM: reche@research.dfci.harvard.edu
>> 44 Binney Street, D1510A, EM: reche@mifoundation.org
>> Boston, MA 02115 URL: http://www.reche.org
>> *******************************************************************
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@bioperl.org
>> http://bioperl.org/mailman/listinfo/bioperl-l
>>
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------