[Bioperl-l] [Fwd: Re: A question about iBio::Index: and itscorrect use]

Tue Nov 10 18:47:02 UTC 2009

Page 44 has the custom ID info or look at documentation for  
Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if  
you read the perldoc for the module.

  http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf

Don't re-opening SeqIO each time just do it once at the beginning  
outside of the loop and then call write_seq within the loop.

This is one nuance of doing OO programming vs procedural is that there  
is some outside state information that can persist in an object, but  
conceptually, you want to open a filehandle once and just keep writing  
to it.

-jason
On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote:

> Hello again,
>
> I tried what Mark told me modifying the code line he told me but  
> there´s
> still a problem that I believe must be due to the sequences name.
> My secuences header on the Fasta file have this format:
>
>> PleosPC9_1_103820|fgenesh1_pg.3_#_1
>
> Th part on the right of the pipe changes depending on the program  
> used to
> create the gene model, for example:
>
>> PleosPC9_1_103820|fgenesh1_pg.3_#_1
>> PleosPC9_1_123413|genemark.2731_g
>> PleosPC9_1_52065|e_gw1.3.64.1
>
> So I guess I need to parse my ids somehow for thr program to detect  
> only
> the first part of the fasta header (the "protein name") and not to get
> messed with the other side of the pipe...
>
> This is the corrected code I wrote following Mark´s indications, but I
> still don´t have any idea about the parsing issue...
>
> #!/c:/Perl -w
> use Bio::Index::Fasta;
> use strict;
> #PC9.fasta is my genomic file
> my $Index_File_Name ="PC9.fasta";
> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
> #LCS.txt is my sequences list
> @ARGV = <LCS.txt>;
> foreach  my $id (@ARGV) {
> if ($id eq ''){
> die ("empty list")
> }
> else {
> my $seqobj = $inx->fetch($id);
> my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta",
> -format => 'fasta');
> $out->write_seq($seqobj);
> }
> }
> exit;
> }
>
> Thanks in advance
>
> PD. May it be a faster way of extracting those sequences using plain  
> PERL?
>
>
>
>
> El Jue, 5 de Noviembre de 2009, 17:39, Mark A. Jensen escribió:
>> Yes, these are files created by the SDBM, Perl's internal db  
>> manager. You
>> should
>> be able to
>> open the index by simply
>> $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
>> and the dbm will know what to do--
>> cheers MAJ
>> ----- Original Message -----
>> From: <jluis.lavin at unavarra.es>
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: <jluis.lavin at unavarra.es>; <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, November 05, 2009 11:21 AM
>> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index:  
>> and its
>> correct
>> use]
>>
>>
>>> Thank you very much Mark, that´s a good point :$
>>> I guess your correction is referred to the second script, isn´t it?
>>>
>>> If it is so, there is still a problem with the first script, it  
>>> doesn´t
>>> create the PC9.fasta.idx file, instead it creates two files named:
>>> -PC9.fasta.idx.pag
>>> -PC9.fasta.idx.dir
>>>
>>> which seem to be clearly related with some kind of indexing
>>> process...but,
>>> unless the PC9.fasta.idx file is only virtual or remains hidden, I  
>>> can´t
>>> find it anywhere...
>>> Forgive me if I´m talking nosense...
>>>
>>> Thank you very much again for your help ;)
>>>
>>>
>>> El Jue, 5 de Noviembre de 2009, 17:02, Mark A. Jensen escribió:
>>>> Hey José,
>>>> The first thing that jumps out it the index file name. Looks
>>>> like you create it as
>>>> PC9.fasta.idx
>>>> But you read it as
>>>> PC9.fasta
>>>> Not an unusual mistake. Do
>>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
>>>> and see if it works.
>>>> MAJ
>>>> ----- Original Message -----
>>>> From: <jluis.lavin at unavarra.es>
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Thursday, November 05, 2009 10:46 AM
>>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and  
>>>> its
>>>> correct
>>>> use]
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------- Mensaje original
>>>> ----------------------------
>>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its  
>>>> correct
>>>> use
>>>> From:    jluis.lavin at unavarra.es
>>>> Fecha:   Jue, 5 de Noviembre de 2009, 16:46
>>>> To:      "Mark A. Jensen" <maj at fortinbras.us>
>>>> --------------------------------------------------------------------------
>>>>
>>>> Hi Mark,
>>>>
>>>> I´ve actually got two scripts, the first one is to create the  
>>>> index and
>>>> the second one is to retrieve the sequence lis from the indexed  
>>>> file.
>>>>
>>>> 1)Here is the Index creation script:
>>>>
>>>> #!/c:/Perl -w
>>>> use strict;
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>>
>>>> print "Enter file for indexing: \n";
>>>> my $Index_File_Name = <STDIN>;
>>>> my $inx = Bio::Index::Fasta->new(-filename =>  
>>>> $Index_File_Name.".idx",
>>>>    -write_flag => 1);
>>>> $inx->make_index(my $File_Name);
>>>>
>>>> 2)And here is the sequence retrieval script:
>>>>
>>>> #!/c:/Perl -w
>>>> use Bio::Index::Fasta;
>>>> use strict;
>>>> #PC9.fasta is my genomic file
>>>> my $Index_File_Name ="PC9.fasta";
>>>> my $inx = Bio::Index::Fasta->new($Index_File_Name);
>>>> #LCS.txt is my sequences list
>>>> @ARGV = <lCS.txt>;
>>>> foreach  my $id (@ARGV) {
>>>> if ($id eq ''){
>>>> die ("empty list")
>>>> }
>>>> else {
>>>> my $seqobj = $inx->fetch($id);
>>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
>>>> -format => 'fasta');
>>>> $out->write_seq($seqobj);
>>>> }
>>>> }
>>>> exit;
>>>> }
>>>>
>>>> I hope this code is not a total scum...
>>>>
>>>> Thanks in advance ;)
>>>>
>>>>
>>>>
>>>> El Jue, 5 de Noviembre de 2009, 16:39, Mark A. Jensen escribió:
>>>>> José -- It looks like this is a good solution to your problem.  
>>>>> Please
>>>>> send
>>>>> you
>>>>> script so we can look at it-
>>>>> cheers Mark
>>>>> ----- Original Message -----
>>>>> From: <jluis.lavin at unavarra.es>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Thursday, November 05, 2009 10:28 AM
>>>>> Subject: [Bioperl-l] A question about iBio::Index: and its  
>>>>> correct use
>>>>>
>>>>>
>>>>>
>>>>> Hello to all,
>>>>>
>>>>> I´m trying to write a script to retrieve a list of sequences  
>>>>> from a
>>>>> local
>>>>> FASTA file (for example a fasta archive where all the protein  
>>>>> models
>>>>> of
>>>>> an
>>>>> organism are stored). This file would be used by me as some kind
>>>>> "local
>>>>> database" (sorry if I mistake a few concepts...)
>>>>> I´ve been reading the BioPerl HOWTOs and I came across the
>>>>> Bio::Index::Fasta tool.
>>>>> If I didn´t misunderstood what I read (which can be easy because  
>>>>> my
>>>>> low
>>>>> level on programming) this Indexing tool should do the job.
>>>>> I wrote a couple of scripts based on the documentation i read  
>>>>> about
>>>>> this
>>>>> tool, but I don´t seem to be able to create the index file to be  
>>>>> used
>>>>> later (to retrieve the sequences from).
>>>>> -First of all, I want to ask the people in this forum if the
>>>>> Bio::Index::Fasta is the right one to chose for this tasks.
>>>>> -Then I´ll beg you to take a look at my scripts, because I don´t  
>>>>> seem
>>>>> to
>>>>> catch the bug...
>>>>>
>>>>> Best wishes to you all and thanks in advance ;)
>>>>>
>>>>> --
>>>>> José Luis Lavín Trueba, PhD
>>>>>
>>>>> Dpto. de Producción Agraria
>>>>> Grupo de Genética y Microbiología
>>>>> Universidad Pública de Navarra
>>>>> 31006 Pamplona
>>>>> Navarra
>>>>> SPAIN
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Dr. José Luis Lavín Trueba
>>>>
>>>> Dpto. de Producción Agraria
>>>> Grupo de Genética y Microbiología
>>>> Universidad Pública de Navarra
>>>> 31006 Pamplona
>>>> Navarra
>>>> SPAIN
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. José Luis Lavín Trueba
>>>>
>>>> Dpto. de Producción Agraria
>>>> Grupo de Genética y Microbiología
>>>> Universidad Pública de Navarra
>>>> 31006 Pamplona
>>>> Navarra
>>>> SPAIN
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Dr. José Luis Lavín Trueba
>>>
>>> Dpto. de Producción Agraria
>>> Grupo de Genética y Microbiología
>>> Universidad Pública de Navarra
>>> 31006 Pamplona
>>> Navarra
>>> SPAIN
>>>
>>>
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
> -- 
> Dr. José Luis Lavín Trueba
>
> Dpto. de Producción Agraria
> Grupo de Genética y Microbiología
> Universidad Pública de Navarra
> 31006 Pamplona
> Navarra
> SPAIN
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org