<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<tt>Dear Chris,<br>
<tt>thank you for your quick replies :)!<br>
<tt><tt><tt><br>
<tt>I am having a look at the link you mentioned right
now!<br>
<br>
</tt><tt><tt>I <tt>a</tt><tt>ttach<tt>ed <tt>a script
and the FASTA example!<br>
<br>
<tt>Just for information:<br>
<tt>perl --version: </tt><br>
</tt>This is perl 5, version 22, subversion 1
(v5.22.1)<br>
<tt><br>
&amp;</tt><br>
<br>
Bio<tt>P</tt>erl: 1.6.924-3<br>
<br>
<tt>Thanks again for your answer!<br>
<br>
<tt>Best re<tt>gards,</tt></tt><br>
</tt><br>
<tt>Helene</tt><br>
<br>
</tt></tt></tt></tt></tt><br>
</tt></tt></tt></tt></tt><br>
<div class="moz-cite-prefix">On 14/11/2016 at 18:31, Fields,
Christopher J wrote:<br>
</div>
<blockquote
cite="mid:58A99883-47C9-4D4F-AC30-DC0A4EDEBC86@illinois.edu"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Title" content="">
<meta name="Keywords" content="">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Courier New";
panose-1:2 7 3 9 2 2 5 2 4 4;}
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman";}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
tt
{mso-style-priority:99;
font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Courier;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:Calibri;
color:windowtext;}
span.msoIns
{mso-style-type:export-only;
mso-style-name:"";
text-decoration:underline;
color:teal;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:111747589;
mso-list-type:hybrid;
mso-list-template-ids:555136874 2090658488 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;
mso-fareast-font-family:Calibri;
mso-bidi-font-family:"Times New Roman";}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">We would
probably need a list of IDs, but this has happened a few
times before. In some cases it’s an issue of line-ending
mismatches, which can be normalized with a tool like
dos2unix. However, if you have IDs that could be evaluated
as false, the issue is trickier and not so easy to fix,
primarily because the returned value is stringified to the
display ID (which is one reason I hate object
stringification).<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
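<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">A minimal sketch of the
line-ending case, assuming the file uses Windows (CRLF) line endings.
normalize_line_endings is a hypothetical helper written for this
illustration, not a BioPerl function, and it only approximates what
dos2unix does:<o:p></o:p></span></p>
<pre>
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): convert CRLF line endings
# to LF, roughly what dos2unix does for a text file.
sub normalize_line_endings {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# A two-record FASTA string with Windows (CRLF) line endings.
my $fasta = ">seq1 desc1\r\nATATATGTGC\r\n>seq2 desc2\r\nCGCGCCGCGC\r\n";
my $clean = normalize_line_endings($fasta);

# Count the carriage returns that remain after normalization.
print "carriage returns left: ", ($clean =~ tr/\r//), "\n";
</pre>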
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">For example,
the following would likely short-circuit without showing
sequence IDs, because a seq ID of ‘0’ (note this does not
include the description, which is separate) evaluates as
false and ends the while loop:<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">>0 desc1<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">ATATATGTGC<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">>1 desc2<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">CGCGCCGCGC<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">The issue, the
problems with a fix, and a workaround are described here:</span>
<span style="font-size:11.0pt;font-family:Calibri"><a class="moz-txt-link-freetext" href="https://github.com/bioperl/bioperl-live/issues/170">https://github.com/bioperl/bioperl-live/issues/170</a><o:p></o:p></span></p>
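<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">The short-circuit can be
reproduced without BioPerl. This is only a sketch under stated
assumptions: ToySeq and make_stream are hypothetical stand-ins for a
sequence object that stringifies to its ID and for a stream that
returns undef when exhausted. It compares a plain truth test in the
while condition with a defined() test:<o:p></o:p></span></p>
<pre>
use strict;
use warnings;

# Hypothetical stand-in for a sequence object that stringifies to its ID.
package ToySeq;
use overload '""' => sub { $_[0]->{id} }, fallback => 1;
sub new { my ($class, $id) = @_; return bless { id => $id }, $class }

package main;

# Hypothetical stream: returns one ToySeq per ID, then undef.
sub make_stream {
    my @ids = @_;
    return sub { return @ids ? ToySeq->new(shift @ids) : undef };
}

# Truth test: stops as soon as an object stringifies to a false value.
sub count_with_truth_test {
    my $next = make_stream(@_);
    my $n = 0;
    while (my $seq = $next->()) { $n++ }
    return $n;
}

# defined() test: stops only when the stream returns undef.
sub count_with_defined_test {
    my $next = make_stream(@_);
    my $n = 0;
    while (defined(my $seq = $next->())) { $n++ }
    return $n;
}

print "truth test:   ", count_with_truth_test('1', '0', '2'), "\n";
print "defined test: ", count_with_defined_test('1', '0', '2'), "\n";
</pre>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">With the IDs '1', '0', '2'
the truth test stops at the second object, while the defined() test
visits all three. Whether this matches a given FASTA file depends on
its actual IDs.<o:p></o:p></span></p>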
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri">chris<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:Calibri"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-family:Calibri;color:black">From: </span>
</b><span style="font-family:Calibri;color:black">Bioperl-l
<a class="moz-txt-link-rfc2396E" href="mailto:bioperl-l-bounces+cjfields=illinois.edu@mailman.open-bio.org">&lt;bioperl-l-bounces+cjfields=illinois.edu@mailman.open-bio.org&gt;</a> on
behalf of Helene RIMBERT <a class="moz-txt-link-rfc2396E" href="mailto:helene.rimbert@inra.fr">&lt;helene.rimbert@inra.fr&gt;</a><br>
<b>Date: </b>Monday, November 14, 2016 at 10:16 AM<br>
<b>To: </b><a class="moz-txt-link-rfc2396E" href="mailto:bioperl-l@mailman.open-bio.org">"bioperl-l@mailman.open-bio.org"</a>
<a class="moz-txt-link-rfc2396E" href="mailto:bioperl-l@mailman.open-bio.org">&lt;bioperl-l@mailman.open-bio.org&gt;</a><br>
<b>Subject: </b>[Bioperl-l] Bio::DB::Fasta problem:
unable to fetch all sequences via get_PrimarySeq_stream<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><tt><span
style="font-size:10.0pt">Dear BioPerl developers,</span></tt><span
style="font-size:10.0pt;font-family:'Courier New'"><br>
<br>
<tt>I have a question about
get_PrimarySeq_stream.</tt><br>
<br>
<tt>I am using the Bio::DB::Fasta module to access my FASTA
sequences, and I am having a problem with
get_PrimarySeq_stream().</tt></span><br>
<tt><span style="font-size:10.0pt">When I check the content of
the db object, all the sequences are indexed (I mean that
I can see all the sequence IDs in the offsets hash).</span></tt><span
style="font-size:10.0pt;font-family:'Courier New'"><br>
<br>
<tt>I then use get_PrimarySeq_stream to loop over all my
sequences, but only 1 sequence is retrieved from the
stream object.</tt><br>
<tt>I tried to look for an explanation, and the only
thing I could find is that my seq_ids seem to be
treated as undef during the
while($dbstream->next_seq()) statement, on reaching</tt><br>
<tt>IndexedBase.pm line 1116.</tt><br>
<br>
<tt>I also tried looping over all sequence IDs using my @seq_ids
= $self->{fastaObj}->get_all_primary_ids; and that
works very well.</tt><br>
<br>
<tt>I don't understand why the stream object does not
retrieve all the sequences whereas get_all_primary_ids
does!</tt><br>
<tt>Is there something wrong with my input FASTA (my IDs are
very long...) or am I missing something?</tt><br>
<br>
<tt>I would really like to find out why I am not able
to use get_PrimarySeq_stream!</tt><br>
<br>
<tt>Many thanks in advance :)</tt><br>
<br>
<tt>Regards,</tt><br>
<br>
<tt>Helene</tt><br>
<br>
<tt>#----------------------------------</tt><br>
<tt># here is the part of the code that causes the problem:</tt><br>
<tt># initialize db::fasta object</tt><br>
<tt>$self->{fastaObj} =
Bio::DB::Fasta->new("test2.fna", -reindex => 1);</tt><br>
<br>
<tt># create stream object</tt><br>
<tt>my $seq_stream =
$self->{fastaObj}->get_PrimarySeq_stream();</tt><br>
<tt>$self->{nbSeqFetchedInStream}=0;</tt><br>
<br>
<tt># loop over all seq in BioDBFasta obj using stream obj.</tt><br>
<tt>while ($self->{seq} = $seq_stream->next_seq()){</tt><br>
<tt>#foreach my $seq_id (@seq_ids){</tt><br>
<tt> #$self->{seq} =
$self->{fastaObj}->get_Seq_by_id($seq_id); # to use
with foreach loop</tt><br>
<br>
<tt> print (" New sequence: ", Dumper $self->{seq});</tt><br>
<tt> $self->{nbSeqFetchedInStream}++;</tt><br>
<tt>}</tt><br>
<tt>print (" Fetched sequences in _PrimarySeq_stream:
$self->{nbSeqFetchedInStream}");</tt><br>
<tt>#----------------------------------</tt><br>
<br>
<br>
<br>
<br>
</span><o:p></o:p></p>
<div>
<p class="MsoNormal">-- <o:p></o:p></p>
<p><b>--> New e-mail address: <a
moz-do-not-send="true"
href="mailto:helene.rimbert@inra.fr">helene.rimbert@inra.fr</a>
<--</b><o:p></o:p></p>
<pre>Hélène RIMBERT<o:p></o:p></pre>
<pre>Bioinformatic Engineer<o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="mailto:helene.rimbert@inra.fr">helene.rimbert@inra.fr</a><o:p></o:p></pre>
<pre>UMR 1095 INRA/UBP – Site de Crouel<o:p></o:p></pre>
<pre>Tèl. : +33 (0)4 73 62 43 49<o:p></o:p></pre>
<pre>5 chemin de beaulieu<o:p></o:p></pre>
<pre>63039 Clermont-Ferrand Cedex 2<o:p></o:p></pre>
<pre>France<o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="https://urldefense.proofpoint.com/v2/url?u=https-3A__www6.ara.inra.fr_umr1095-5Feng_&d=DQMDaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=fbHa8Njtvh9VmSnzJxiEUTW9NWDwMMwQAzhgZDO41GQ&m=iAuK-qAsrrjM_h3E9YA-ujqtTSn1yoLk7cNZJ6SUYjE&s=5CzTn2cwr47V7x_FBW4PWVEZ_mB6nyuGjo1LgBYcG7U&e=">https://www6.ara.inra.fr/umr1095_eng/</a><o:p></o:p></pre>
</div>
</div>
</blockquote>
<br>
<div class="moz-signature">-- <br>
<p><b>--> New e-mail address: <a class="moz-txt-link-abbreviated" href="mailto:helene.rimbert@inra.fr">helene.rimbert@inra.fr</a>
<--</b></p>
<pre>Hélène RIMBERT
Bioinformatic Engineer
<a class="moz-txt-link-abbreviated" href="mailto:helene.rimbert@inra.fr">helene.rimbert@inra.fr</a>
UMR 1095 INRA/UBP – Site de Crouel
Tèl. : +33 (0)4 73 62 43 49
5 chemin de beaulieu
63039 Clermont-Ferrand Cedex 2
France
<a class="moz-txt-link-freetext" href="https://www6.ara.inra.fr/umr1095_eng/">https://www6.ara.inra.fr/umr1095_eng/</a></pre>
</div>
</body>
</html>