[Bioperl-l] Bio::DB::Fasta problem: unable to fetch all sequences via get_PrimarySeq_stream

Tue Nov 15 10:15:29 UTC 2016

Dear Chris,
thank you for your quick replies :)!

I am having a look at the link you mentioned right now!

I have attached a script and an example FASTA file!

For information:
perl --version:
This is perl 5, version 22, subversion 1 (v5.22.1)

&

BioPerl: 1.6.924-3

Thanks again for your answer!

Best regards,

Helene

On 14/11/2016 at 18:31, Fields, Christopher J wrote:
>
> We would probably need a list of IDs, but this has happened before a 
> few times.  In some cases it’s an issue of line ending mismatches, 
> which can be normalized using a tool like dos2unix.  However if you 
> have IDs that could be evaluated as False the issue is trickier and 
> not so easy to fix, primarily because the returned value is 
> stringified to the display ID (which is one reason I hate object 
> stringification).
>
> For example, the following would likely short-circuit without showing 
> sequence IDs, as having a seq ID of ‘0’ (note this does not include 
> the description, which is separate) will evaluate as False and kill 
> the while loop:
>
> >0 desc1
>
> ATATATGTGC
>
> >1 desc2
>
> CGCGCCGCGC
>
> The issue, the problems with a fix, and a workaround are described 
> here: https://github.com/bioperl/bioperl-live/issues/170
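
To illustrate the point above about an ID of '0' ending the loop, here is a minimal, self-contained sketch. It does not use BioPerl: MockSeq and MockStream are hypothetical stand-ins, written on the assumption that the real sequence objects overload stringification to return the display ID, as described above. The defined() variant is a general Perl idiom and is not necessarily the workaround proposed in the linked issue.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object whose string form is its ID.
# Only '""' is overloaded, so Perl derives the boolean value from the string.
package MockSeq;
use overload '""' => sub { $_[0]->{id} };
sub new        { my ($class, $id) = @_; return bless { id => $id }, $class }
sub display_id { return $_[0]->{id} }

# Hypothetical stand-in for a sequence stream: returns one object per call,
# then undef once the queue is empty.
package MockStream;
sub new {
    my ($class, @ids) = @_;
    my @queue = map { MockSeq->new($_) } @ids;
    return bless { queue => \@queue }, $class;
}
sub next_seq { my $self = shift; return shift @{ $self->{queue} } }

package main;

# Loop that tests the truth value of the returned object.
sub count_truthy {
    my $stream = MockStream->new(@_);
    my $n = 0;
    while (my $seq = $stream->next_seq) { $n++ }
    return $n;
}

# Loop that only tests whether something was returned at all.
sub count_defined {
    my $stream = MockStream->new(@_);
    my $n = 0;
    while (defined(my $seq = $stream->next_seq)) { $n++ }
    return $n;
}

my @ids = ('1', '0', '2');
printf "truthiness test: %d of %d\n", count_truthy(@ids),  scalar @ids;  # 1 of 3
printf "defined test:    %d of %d\n", count_defined(@ids), scalar @ids;  # 3 of 3
```

With the IDs '1', '0', '2', the first loop stops as soon as it reaches the object that stringifies to '0', while the defined() loop visits all three. Whether this explains a given FASTA file depends on its IDs.
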
>
> chris
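
For the line-ending case mentioned above, a quick check before reaching for dos2unix could look like the sketch below. It only counts line-ending types in a file read in raw mode; count_line_endings is a hypothetical helper name, and the script makes no claim about how the indexer itself treats mixed endings.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count CRLF, bare LF and bare CR line endings in a file.
# Hypothetical helper for illustration; reads the whole file into memory.
sub count_line_endings {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                       # slurp mode
    my $data = <$fh>;
    close $fh;
    $data = '' unless defined $data;

    my $crlf = () = $data =~ /\r\n/g;        # Windows-style endings
    my $lf   = () = $data =~ /(?<!\r)\n/g;   # Unix-style endings
    my $cr   = () = $data =~ /\r(?!\n)/g;    # old Mac-style endings
    return { crlf => $crlf, lf => $lf, cr => $cr };
}

# Report on every file given on the command line.
for my $path (@ARGV) {
    my $c = count_line_endings($path);
    printf "%s: CRLF=%d LF=%d CR=%d\n", $path, @{$c}{qw(crlf lf cr)};
}
```

Run it as `perl check_endings.pl test2.fna` (the script name is arbitrary). A file that reports more than one non-zero count has mixed line endings and may be worth normalizing.
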
>
> *From: *Bioperl-l 
> <bioperl-l-bounces+cjfields=illinois.edu at mailman.open-bio.org> on 
> behalf of Helene RIMBERT <helene.rimbert at inra.fr>
> *Date: *Monday, November 14, 2016 at 10:16 AM
> *To: *"bioperl-l at mailman.open-bio.org" <bioperl-l at mailman.open-bio.org>
> *Subject: *[Bioperl-l] Bio::DB::Fasta problem: unable to fetch all 
> sequences via get_PrimarySeq_stream
>
> Dear BioPerl developers,
>
> I have a question regarding get_PrimarySeq_stream!
>
> I am using the Bio::DB::Fasta module to access my FASTA sequences and I 
> am facing a problem with get_PrimarySeq_stream().
> When I check the content of the db object, all the sequences are 
> indexed (I mean that I can see all the sequence IDs in the offsets hash).
>
> I then use get_PrimarySeq_stream to loop over all my sequences, 
> but only 1 sequence is retrieved from the stream object.
> I tried to look for an explanation, and the only thing I could find 
> is that my seq_ids seem to be treated as undef during the 
> while($dbstream->next_seq()) statement when reaching
> IndexedBase.pm line 1116.
>
> I tried to loop over all sequence ids using my @seq_ids = 
> $self->{fastaObj}->get_all_primary_ids; and it works very well.
>
> I don't understand why the stream object does not retrieve all the 
> sequences whereas get_all_primary_ids does!
> Is there something wrong with my input FASTA (my IDs are very long...) 
> or am I missing something?
>
> I am really interested in finding out why I am not able to use 
> get_PrimarySeq_stream!
>
> Many thanks in advance :)
>
> Regards,
>
> Helene
>
> #----------------------------------
> # here is the part of code that causes problem:
> # initialize db::fasta object
> $self->{fastaObj} = Bio::DB::Fasta->new("test2.fna", -reindex => 1);
>
> # create stream object
> my $seq_stream = $self->{fastaObj}->get_PrimarySeq_stream();
> $self->{nbSeqFetchedInStream}=0;
>
> # loop over all seq in BioDBFasta obj using stream obj.
> while ($self->{seq} = $seq_stream->next_seq()){
> #foreach my $seq_id (@seq_ids){
>     #$self->{seq} = $self->{fastaObj}->get_Seq_by_id($seq_id); # to use with foreach loop
>
>     print (" New sequence: ", Dumper $self->{seq});
>     $self->{nbSeqFetchedInStream}++;
> }
> print (" Fetched sequences in _PrimarySeq_stream: $self->{nbSeqFetchedInStream}");
> #----------------------------------
>
>
>
>
> -- 
>
> *--> New e-mail address: helene.rimbert at inra.fr <--*
>
> Hélène RIMBERT
> Bioinformatic Engineer
> helene.rimbert at inra.fr
> UMR 1095 INRA/UBP – Site de Crouel
> Tel.: +33 (0)4 73 62 43 49
> 5 chemin de beaulieu
> 63039 Clermont-Ferrand Cedex 2
> France
> https://www6.ara.inra.fr/umr1095_eng/

-- 

*--> New e-mail address: helene.rimbert at inra.fr <--*

Hélène RIMBERT
Bioinformatic Engineer
helene.rimbert at inra.fr
UMR 1095 INRA/UBP – Site de Crouel
Tel.: +33 (0)4 73 62 43 49
5 chemin de beaulieu
63039 Clermont-Ferrand Cedex 2
France
https://www6.ara.inra.fr/umr1095_eng/

-------------- next part --------------
>scaffold8141_chunk_0_EXONERATE_BLASTX_RefProtMethForWheatPipe_12682_15840_LOC_Os02g53200.2_Match_0025 .
ATGTGTTTGGTTGCAGACGCGGGCACCGTGGGCGTGAACTGGGGGCGGGTGGCGAACGAC
CTGCCCAGCCCGGCGGCGGTGGTGCAGCTGCTCAAGCAGCACGGCATCGCGCAGGTCAAG
>scaffold8141_chunk_0_EXONERATE_BLASTX_RefProtMethForWheatPipe_19160_19327_Bradi2g26810.1.p_Match_0006 .
ATGAGTGATGGAGGTCAAAACAGGAAGGCCATTGCTTCAGTTATGATGCTTGTCTCATGG
AAAATCTAAAAGGACCGCAACGCAAGAATTTTTCGCAACACTGCTGCTCTGACCTCCATT
>scaffold8141_chunk_0_SIMsearch_CAT01_19443_24928_gene_0002 .
ATGCGGACAAGCGCGGCCACGTCGTCGCCACGGGATGGCGGCTCCGGGGCTCTGACCCCC
GCCCAGGGCGACATGGGGAGCGGCGGCGCCAGGCGGCATTTCTTCCCGCTCACCAGCTTG
>scaffold8141_chunk_0_SIMsearch_CAT01_28795_35559_gene_0001 .
ATGTCGCGCCGCGGCCGCGAGGAGGAGGAGGAGGAGGAGGAGTTCGAGGAGGTAGAGGAG
GAGGAGGAGGCCGACGAGTCGGAGGTGGAGGAGGAGCAGGTGGAGGAGAGGGGCGGCAAG
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bugged.pl
Type: application/x-perl
Size: 1495 bytes
Desc: not available
URL: <http://mailman.open-bio.org/pipermail/bioperl-l/attachments/20161115/f4934d72/attachment.pl>