[Bioperl-l] Re: Bioperl-l Digest, Vol 22, Issue 8

Todd Naumann tan10 at psu.edu
Fri Feb 11 16:55:39 EST 2005


Ryan,

I can't help you as to the logic of behavior when incorrect fasta
files are specified as fasta. I'm sure there are many ways that it
could be handled and the best one depends on your application.

Your attempt to use raw format failed because your files have newlines
which are not allowed. If you replace:

my $seq = $seq_in->next_seq();
print $seq->length;

with a loop:

while (my $inseq = $seq_in->next_seq) {
  print $inseq->length, "\n";
}

you will see that each line is treated as a seperate sequence. If all
you want is the total length of your file, you could add a counter
like:

my $counter;

while (my $inseq = $seq_in->next_seq) {
  $counter += $inseq->length;
}
print $counter, "\n";

should give you the correct length of the sequence. If you want to
use bioperl Sequence object for other functionality you can
1) add a headers to the files
2) use perl to convert your files so they don't have newlines
3) open file with 'normal' perl, save the sequence as a string and
manually construct a Bio::PrimarySeq object:

#!/usr/bin/perl
use Bio::PrimarySeq;

my $usage = "perl bioperl.pl infile";
my $infile = shift or die $usage;
open(IN, $infile) or die $usage;
my $sequence;

while (<IN>){
chomp;
$sequence .= $_
}

my $seq = Bio::PrimarySeq -> new(-seq => $sequence,
				    -id => $infile
				);

print $seq->length, "\n";
exit;


-Todd

Todd Naumann
Post-Doc, Stephen Benkovic lab
Dept. of Chemistry
Penn State University
University Park, PA  16802

>
> Message: 16
> Date: Fri, 11 Feb 2005 14:31:01 -0500
> From: "Ryan Golhar" <golharam at umdnj.edu>
> Subject: [Bioperl-l] SeqIO bug?
> To: "'Bioperl List'" <bioperl-l at bioperl.org>
> Message-ID: <007501c51070$38d34a80$cd028a0a at GOLHARMOBILE1>
> Content-Type: text/plain;	charset="US-ASCII"
>
> I have a bunch of cDNA sequences that I'm trying to process.  The
> sequences are in FASTA format, but they are all missing the FASTA 
> header
> ie that just contain the sequence.  As a test to make sure I'm reading
> them in correctly, I doing the following:
>
>         my $seq_in = Bio::SeqIO->new(-file => "<myseqfile",
> -format => 'fasta');
>         my $seq = $seq_in->next_seq();
>         print $seq->length;
>
> It prints out a number, but reads the first line as the FASTA header
> even though its not there.  Wouldn't it make more sense to either print
> out an error message about the missing FASTA header, or read in the 
> file
> as just the sequence regardless of specifying the FASTA format?
>
> If I try to read the sequence in as "raw", the length is always printed
> out as 70...
>
> Ryan
>
>
>



More information about the Bioperl-l mailing list