[Bioperl-l] genbank to cds

Jason Stajich jason.stajich at duke.edu
Tue Sep 28 11:49:14 EDT 2004


Escape the STDIN handle
$seqin = Bio::SeqIO->new( '-format' => 'GenBank', -fh => \*STDIN);

You can also use the special handle ARGV to allow both files to be on 
the cmdline or from STDIN
$seqin = Bio::SeqIO->new( '-format' => 'GenBank', -fh => \*ARGV);

This is how you get access to the magic empty diamond (<>) as a 
filehandle you can pass around.

Also if you wanted the protein sequence length for the annotated 
feature (excluding the potential introns) you can call
my $cds_seq = $feat->spliced_seq;
my $cds_len  = $cdsseq->length;
my $pep_len = $cdsseq->translate->length;

n.b. there have been some improvements/bugfixes to spliced_seq since 
the 1.4 release that are only in the CVS tree (http://cvs.open-bio.org) 
until we get the 1.5 release done.

-jason
On Sep 28, 2004, at 11:07 AM, rmaps005 at cib.csic.es wrote:

> hello,
> i was using this program to covert genbank file (actually a human
> chromosome) to cds but even if it does work and generate some cds,
> always comes out with this error:
>
> Argument "*main::STDIN" isn't numeric in numeric eq (==) at
> /usr/lib/perl5/site_perl/5.8.3/Bio/Root/IO.pm line 483.
>
> don't know if it is because runs out of memory or i made a mistake....
> can somebody help me with this?
>
> Thanks
>
> $seqin = Bio::SeqIO->new( '-format' => 'GenBank', -fh => *STDIN);
> while((my $seqobj = $seqin->next_seq()))
> {
>     $genomic_id = $seqobj->display_id;
>     foreach $feat ( $seqobj->top_SeqFeatures() ) {
>         $length = $seqobj->length;
>         if ($feat->primary_tag eq "CDS"){
>             # Check if protein_id tag exists.
>             if (join(" ",$feat->all_tags()) =~ m/db\_xref/){
>                 $protein_id =
> join("",$feat->each_tag_value('db_xref'));
>             }
>             else {
>                 $protein_id = "UNKNOWN";
>             }
>             $cds = $feat->location->to_FTstring;
>             if ($cds =~ m/complement/){
>                 $strand = "-";
>             }
>             else {
>                 $strand = "+";
>             }
>             $cds =~ s/\.\.|,|join\(|\)|complement\(|\(/ /g;
>             print ("$genomic_id\_$protein_id $length $strand  $cds
> \n");
>         }
>     }
> }
>
>
> IlohaCIB
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/



More information about the Bioperl-l mailing list