[Bioperl-l] use Bio::SeqIO to read Fasta sequence from pipe, or @ARGV, like "while (<>) {....}"
Mark A. Jensen
maj at fortinbras.us
Sun Jul 13 02:14:57 UTC 2014
Haiyan - I think you want the following--
##########
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Bio::SeqIO ;
use Getopt::Long;
use Statistics::Descriptive ;
no warnings qw/once/;
my %opt = () ;
GetOptions(
"sum"=>\$opt{sum},
);
my $sta = Statistics::Descriptive::Full->new();
my $in = Bio::SeqIO->new(-format=>"Fasta",-fh=>\*DATA);
while(my $s = $in->next_seq()){
$sta->add_data($s->length()) ;
}
#`rm $tf` ;
unlink $tf;
if ( $opt{sum} ) {
print $sta->sum(), "\n" ;
}
# the DATA goes here:
__END__
>Contig000001
CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>hello
ATATATTTTT
############
On 2014-07-12 21:56, Haiyan Lin wrote:
> Thanks to Mark and Torsten for their advice.
>
> I can got what I want using both "less" and "cat" by writing data
> form
> upstream pipeline into a temporary file, followed by processing and
> removing. But, it is not efficient. I'm looking for a better method.
> Whether can I avoiding writting and removinf the file? Thanks for any
> advic
>
> ##### Following is MyData
> [linhy at bioinfo1 Script]$ more try.fa
>>Contig000001
> CCACGTAAGAGCACCTGGGTCCCCGCCCGCCAAGCGCCGCGAGCGCCAGCAGCAGCTCGC
>>hello
> ATATATTTTT
>
> ##### Following is my code
> [linhy at bioinfo1 Script]$ more fastaLen.pl
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
> use utf8;
> use Bio::SeqIO ;
> use Getopt::Long;
> use Statistics::Descriptive ;
>
> my %opt = () ;
> GetOptions(
> "sum"=>\$opt{sum},
> );
>
> my $tf = "/tmp/subFasta.len.PID$$.fa";
> open WTF,">$tf" or die $!;
> while(<>){
> print WTF ;
> }
> close WTF;
>
> my $sta = Statistics::Descriptive::Full->new();
>
> my $in = Bio::SeqIO->new(-format=>"Fasta",-file=>$tf);
>
> #### I'm failed with the following commented line
> #my $in = Bio::SeqIO->new(-format=>"Fasta",-fh=>\*DATA);
>
> while(my $s = $in->next_seq()){
> $sta->add_data($s->length()) ;
> }
> `rm $tf` ;
>
> if ( $opt{sum} ) {
> print $sta->sum(), "\n" ;
> }
>
> exit;
>
>
> ##### The output by above code and data using by both "cat" and
> "less"
> [linhy at bioinfo1 Script]$ cat try.fa | ./fastaLen.pl -s
> 70
> [linhy at bioinfo1 Script]$ less try.fa | ./fastaLen.pl -s
> 70
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Sat, 2014-07-12 at 20:57 -0400, Mark A Jensen wrote:
>> You should do
>> cat try.fa | perl ..
>> rather than less,IMO. less formats things.
>>
>> You can add
>> no warnings qw/once/;
>> to get rid of that warning, it shouldn't affect
>> the program.
>>
>> MAJ
>>
>>
>> On Sat, Jul 12, 2014 at 8:03 PM, Haiyan Lin <linhy0120 at gmail.com>
>> wrote:
>>
>>
>> Hi, Paul,
>>
>> Thanks for your quick replay and advice. Soryy for my late
>> feedback, I can't send maill in my location yesterday. I
>> have
>> tried your method. But
>> it complains that
>>
>> ------------------------------------
>> [linhy at bioinfo1 Script]$ less try.fa | perl ./fastaLen.pl
>> Name "main::DATA" used only once: possible typo
>> at ./fastaLen.pl line
>> 42.
>> Use of uninitialized value in print at ./fastaLen.pl line
>> 49.
>> ------------------------------------
>>
>>
>> I had also tried the "-fh=>\*STDIN", and "-fh=><>",according
>> to the
>> documentation returned by "perldoc Bio::SeqIO" at termminal,
>> but I still
>> haven't got I want.Thanks again.
>>
>> ... ...
>>
>> Bio::SeqIO->new()
>> $seqIO = Bio::SeqIO->new(-file => 'filename',
>> -format=>$format);
>> $seqIO = Bio::SeqIO->new(-fh => \*FILEHANDLE,
>> -format=>$format);
>> $seqIO = Bio::SeqIO->new(-format => $format);
>>
>> ... ...
>>
>> -fh You may provide new() with a previously-opened
>> filehandle.
>> For example, to read from STDIN:
>>
>> $seqIO = Bio::SeqIO->new(-fh => \*STDIN);
>>
>> Note that you must pass filehandles as
>> references
>> to globs.
>>
>> If neither a filehandle nor a filename is
>> specified, then
>> the module will read from the @ARGV array or STDIN, using
>> the
>> familiar <> semantics.
>>
>> -------------------------------
>>
>>
>>
>> Regards
>>
>> Haiyan
>>
>>
>>
>>
>> On Sat, 2014-07-12 at 10:22 -0400, Paul Cantalupo wrote:
>> > Hi Haiyan,
>> >
>> >
>> > You need to use the '-fh' option in Bio::SeqIO new and
>> have
>> it use
>> > Perl's DATA filehandle like so:
>> > my $in = Bio::SeqIO->new(-fh => \*DATA, -format =>
>> 'fasta');
>> >
>> > This was taken from
>> > http://perldoc.perl.org/perldata.html#Special-Literals:
>> > "Text after __DATA__ may be read via the filehandle
>> PACKNAME::DATA ,
>> > where PACKNAME is the package that was current when the
>> __DATA__ token
>> > was encountered."
>> >
>> >
>> >
>> > Good luck,
>> >
>> > Paul
>> >
>> >
>> >
>> >
>> > Paul Cantalupo
>> > University of Pittsburgh
>> >
>> >
>> >
>> > On Sat, Jul 12, 2014 at 8:35 AM, Haiyan Lin
>> <linhy0120 at gmail.com>
>> > wrote:
>> > Hill, dear perlers,
>> >
>> > I‘m trying to use Bio::SeqIO to read Fasta
>> sequence
>> from pipe,
>> > or @ARGV,
>> > like "while (<>) {....}". After several trier and
>> error, I'm
>> > failed and
>> > need to ask for herp from you. Could you please
>> help
>> me to
>> > check or try
>> > following code?
>> >
>> > Thanks in advance.
>> >
>> > ---------------------------------------
>> > use Bio::SeqIO ;
>> > use Statistics::Descriptive ;
>> >
>> > my %opt = () ;
>> > my $sta = Statistics::Descriptive::Full->new();
>> >
>> > ##### here is the key, I think.
>> > my $in = Bio::SeqIO->new(-format=>"Fasta");
>> > while(my $s = $in->next_seq()){
>> > $sta->add_data($s->length()) ;
>> > }
>> > print $sta->sum() if $opt{sum} ;
>> >
>> > __DATA__
>> > >ct1
>> > AGAGAGAGA
>> > >ctg2
>> > ATATATAT
>> > -----------------------------------------------
>> >
>> > Regards
>> >
>> > Haiyan
>> >
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at mailman.open-bio.org
>> >
>> http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>> >
>> >
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/bioperl-l
>>
More information about the Bioperl-l
mailing list