[Bioperl-l] Bio::SeqIO can't guess the format of data from a pipe

Florent Angly florent.angly at gmail.com
Sun Aug 28 09:08:32 UTC 2011


Yes indeed, that's a very convenient way to implement a format() method
that gets the format of the file. I'll try to implement it today. More
logic may be involved because some formats take variants, e.g. the FASTQ
format (Bio::SeqIO::fastq module,
http://www.bioperl.org/wiki/Module:Bio::SeqIO::fastq) has 'sanger',
'illumina' and 'solexa' variants.
Florent
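
A rough sketch of what such a format() lookup might do, assuming the
format name is simply the last component of the driver class name (as
Hilmar's ref($in) suggestion implies) and that variant-aware drivers
expose a variant() accessor. The My::SeqIO::* classes and the
format_of() helper are hypothetical stand-ins so the snippet runs
without BioPerl installed; they are not the actual Bio::SeqIO code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Bio::SeqIO driver classes.
{
    package My::SeqIO::fasta;
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
}
{
    package My::SeqIO::fastq;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    # Assumed accessor for the variant ('sanger', 'illumina', 'solexa').
    sub variant { my ($self) = @_; return $self->{variant} }
}

package main;

# Hypothetical helper: derive (format, variant) from a driver object.
sub format_of {
    my ($io) = @_;
    my $class = ref($io) or die "format_of() needs an object\n";
    # The format is taken to be the last component of the class name.
    my ($format) = $class =~ /::(\w+)\z/;
    # Only some drivers have variants; others yield undef.
    my $variant = $io->can('variant') ? $io->variant : undef;
    return ($format, $variant);
}

my $in = My::SeqIO::fastq->new(variant => 'illumina');
my ($format, $variant) = format_of($in);
print defined $variant ? "$format-$variant\n" : "$format\n";
```

The point of returning the variant separately is that the class name
alone cannot distinguish between the variants of one format, which is
the extra logic mentioned above.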


On 27/08/11 13:43, Hilmar Lapp wrote:
> The format is already available - it is in essence the class of the SeqIO instance:
>
> my $format = ref($in);
>
> Rather than passing that into SeqIO->new(), you can directly instantiate a new object from it:
>
> my $out = ref($in)->new(-file =>  ...);
>
> Would that address what you are trying to accomplish?
>
> -hilmar
>
> Sent with a tap.
>
> On Aug 27, 2011, at 8:12 PM, Florent Angly<florent.angly at gmail.com>  wrote:
>
>> My proposal would be to store the format of a file somewhere in the Bio::SeqIO object and create a new get/set method in Bio::SeqIO called format() to store or access its value. The idea would be that the example code above could be rewritten as:
>>
>>     # Open the file and let BioPerl guess its format
>>     my $in = Bio::SeqIO->new( -file =>  $input_seqfile );
>>
>>     # Retrieve the format guessed by BioPerl
>>     my $format = $in->format( );
>>
>>     # Open the output file using the same format as the input file
>>     my $out = Bio::SeqIO->new( -file =>  ">".$output_seqfile , -format =>  $format );
>>
>>     # Now do the work...
>>
>> I think this is more elegant since it is more readable, requires less computation (the file format is guessed only once), and is more consistent with other Bio::SeqIO methods such as alphabet(), which guesses the alphabet but also provides a get/set method to access it.



