[Bioperl-l] How to read in FASTA formatted sequence without fastaheader?
Jason Stajich
jason.stajich at duke.edu
Fri Sep 30 23:12:54 EDT 2005
On Sep 30, 2005, at 7:16 PM, Ryan Golhar wrote:
> True. Ok, so I have a raw sequence instead of fasta...when I try to
> read in the sequence using raw format, it only reads in the first
> line.
>
> I'm thinking of modifying the raw module and making a multilineraw
> module that will stop reading on a newline or EOF.
>
Well, technically it will have to detect the presence of multiple
consecutive newlines, since it currently separates on single newlines,
hence your problem.

Seems like it is easier to use a standard file format in the future
(and dare I say *standard* for anyone who might come along after you
on a project), but you could probably modify raw.pm locally to
separate on multiple consecutive newlines.

Thinking about this, I'm not sure how much help SeqIO is. You just
need a function that will give you back Bio::PrimarySeq objects, which
isn't much more complicated than the code below.

If you just add this to your perl script you will be able to split
sequences on double newlines and use the 'raw' format.
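
use strict;
use Bio::SeqIO;
use Bio::SeqIO::raw;

# Override raw's next_seq so that a record ends at a blank line
# (two consecutive newlines) rather than at a single newline.
sub Bio::SeqIO::raw::next_seq {
    my ($self, @args) = @_;
    local $/ = "\n\n";    # input record separator, restored on return
    my $nextline = $self->_readline();
    if( !defined $nextline ){ return undef; }
    my $sequence = uc($nextline);
    $sequence =~ s/\W//g;    # strip newlines and other non-word characters
    return $self->sequence_factory->create(-seq => $sequence);
}

# your perl code now that will eventually do a
# Bio::SeqIO->new(-format => 'raw', .... );
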
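
If you would rather not go through SeqIO at all, the same idea can be sketched in plain Perl. This is only an illustration, not anything that ships with BioPerl: the helper name read_raw_records is made up, and it assumes records are separated by one or more blank lines. It uses Perl's paragraph mode ($/ = ""), which treats any run of blank lines as a single separator.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read blank-line-separated
# raw sequence records from a filehandle and return them as strings.
sub read_raw_records {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @records;
    while ( defined( my $chunk = <$fh> ) ) {
        $chunk =~ s/\s+//g;           # join wrapped lines, drop all whitespace
        next unless length $chunk;    # skip records that were only whitespace
        push @records, uc $chunk;
    }
    return @records;
}

# Example input: two multi-line records separated by blank lines,
# read from an in-memory filehandle instead of a real file.
my $data = "acgt\nacgt\n\n\nttga\ncc\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @seqs = read_raw_records($fh);
close $fh;
print "$_\n" for @seqs;
```

Each returned string could then be handed to Bio::PrimarySeq->new(-seq => $string) if you need sequence objects rather than plain strings.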
> I don't want to modify the actual files because that might screw up
> all my other scripts. I could write one to insert the fasta header in
> a tmp file then concatenate the sequence to the file, but it just
> doesn't seem like a clean solution to me.
>
>
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Richard
> Sucgang, PhD
> Sent: Friday, September 30, 2005 5:59 PM
> To: golharam at umdnj.edu
> Cc: 'Bioperl List'
> Subject: Re: [Bioperl-l] How to read in FASTA formatted sequence
> without fastaheader?
>
>
>
> Well, maybe I am mistaken, but isn't the header line the item that
> makes a FASTA file a FASTA file?
> As in, now you have a raw sequence.
>
>
> On Sep 30, 2005, at 3:43 PM, Ryan Golhar wrote:
>
>
>> I'm looking for the easier way to read in a fasta file that doesn't
>> contain the fasta header, ie the ">..." line.
>>
>> I tried just specifying fasta, but then the first line of the
>> sequence is taken as the name of the sequence. I also tried
>> specifying raw, but then only the first line is read.
>>
>> Is there any (easy) way to do this without reformatting the fasta
>> file or creating a new one? Thanks,
>>
>> Ryan
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/