[Bioperl-l] [Fwd: Re: A question about iBio::Index: anditscorrect use]

Wed Nov 11 15:01:18 UTC 2009

Hi once again,
I have modified the script following the instructions Jason gave me (at
last what I understood, remember it is my first time trying to learn a
programming language...and I´m not the smartest guy in the class, hehe)but
it seems I didn´t fix the problem...
Here´s the new code I wrote:

#!/c:/Perl -w
	use strict;
        use Bio::Index::Fasta;
	use Bio::DB::Fasta;
	use Bio::SeqIO;
	use IO::File;

# assign files to scalars
my $index_file = 'PC91.fasta';
my $id_list = 'LCS2.txt';

# open index file
my $db = Bio::DB::Fasta->new($index_file) or die;

# open the id list
my $in = IO::File->new($id_list) or die;

# open FASTA to write
my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta",
-format => 'fasta');

# retrieve ids loop
foreach my $id ($in) {
if ($id eq ''){
die ("empty list")
}
else {
my $seqobj = my $inx->fetch($id);
$out->write_seq($seqobj);
}
}

# parse fasta headers
sub my_makeid {
my $id = shift;
if ( $id =~ /^>[^:]+:(\S+)/ ) {
return $1;
} elsif ($id =~ /^>(\S+)/) {
return $1;
} else {
warn("cannot parse ID for $id\n");
}
}
exit;

Would anyone, please take a look at it ...

Thanks in advance ;)

El Mar, 10 de Noviembre de 2009, 19:47, Jason Stajich escribió:
> Page 44 has the custom ID info or look at documentation for
> Bio::DB::Fasta - there is a similar syntax for Bio::Index::Fasta if
> you read the perldoc for the module.
>
>   http://jason.open-bio.org/Bioperl_Tutorials/ProgrammingBiology2008/ProgBiology_BioPerl_I.pdf
>
> Don't re-opening SeqIO each time just do it once at the beginning
> outside of the loop and then call write_seq within the loop.
>
> This is one nuance of doing OO programming vs procedural is that there
> is some outside state information that can persist in an object, but
> conceptually, you want to open a filehandle once and just keep writing
> to it.
>
> -jason
> On Nov 10, 2009, at 2:43 AM, jluis.lavin at unavarra.es wrote:
>
>> Hello again,
>>
>> I tried what Mark told me modifying the code line he told me but
>> there´s
>> still a problem that I believe must be due to the sequences name.
>> My secuences header on the Fasta file have this format:
>>
>>> PleosPC9_1_103820|fgenesh1_pg.3_#_1
>>
>> Th part on the right of the pipe changes depending on the program
>> used to
>> create the gene model, for example:
>>
>>> PleosPC9_1_103820|fgenesh1_pg.3_#_1
>>> PleosPC9_1_123413|genemark.2731_g
>>> PleosPC9_1_52065|e_gw1.3.64.1
>>
>> So I guess I need to parse my ids somehow for thr program to detect
>> only
>> the first part of the fasta header (the "protein name") and not to get
>> messed with the other side of the pipe...
>>
>> This is the corrected code I wrote following Mark´s indications, but I
>> still don´t have any idea about the parsing issue...
>>
>> #!/c:/Perl -w
>> use Bio::Index::Fasta;
>> use strict;
>> #PC9.fasta is my genomic file
>> my $Index_File_Name ="PC9.fasta";
>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
>> #LCS.txt is my sequences list
>> @ARGV = <LCS.txt>;
>> foreach  my $id (@ARGV) {
>> if ($id eq ''){
>> die ("empty list")
>> }
>> else {
>> my $seqobj = $inx->fetch($id);
>> my $out = new Bio::SeqIO (-file => ">>index_extracted.fasta",
>> -format => 'fasta');
>> $out->write_seq($seqobj);
>> }
>> }
>> exit;
>> }
>>
>> Thanks in advance
>>
>> PD. May it be a faster way of extracting those sequences using plain
>> PERL?
>>
>>
>>
>>
>> El Jue, 5 de Noviembre de 2009, 17:39, Mark A. Jensen escribió:
>>> Yes, these are files created by the SDBM, Perl's internal db
>>> manager. You
>>> should
>>> be able to
>>> open the index by simply
>>> $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
>>> and the dbm will know what to do--
>>> cheers MAJ
>>> ----- Original Message -----
>>> From: <jluis.lavin at unavarra.es>
>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>> Cc: <jluis.lavin at unavarra.es>; <bioperl-l at lists.open-bio.org>
>>> Sent: Thursday, November 05, 2009 11:21 AM
>>> Subject: Re: [Bioperl-l] [Fwd: Re: A question about iBio::Index:
>>> and its
>>> correct
>>> use]
>>>
>>>
>>>> Thank you very much Mark, that´s a good point :$
>>>> I guess your correction is referred to the second script, isn´t it?
>>>>
>>>> If it is so, there is still a problem with the first script, it
>>>> doesn´t
>>>> create the PC9.fasta.idx file, instead it creates two files named:
>>>> -PC9.fasta.idx.pag
>>>> -PC9.fasta.idx.dir
>>>>
>>>> which seem to be clearly related with some kind of indexing
>>>> process...but,
>>>> unless the PC9.fasta.idx file is only virtual or remains hidden, I
>>>> can´t
>>>> find it anywhere...
>>>> Forgive me if I´m talking nosense...
>>>>
>>>> Thank you very much again for your help ;)
>>>>
>>>>
>>>> El Jue, 5 de Noviembre de 2009, 17:02, Mark A. Jensen escribió:
>>>>> Hey José,
>>>>> The first thing that jumps out it the index file name. Looks
>>>>> like you create it as
>>>>> PC9.fasta.idx
>>>>> But you read it as
>>>>> PC9.fasta
>>>>> Not an unusual mistake. Do
>>>>> my $inx = Bio::Index::Fasta->new('PC9.fasta.idx');
>>>>> and see if it works.
>>>>> MAJ
>>>>> ----- Original Message -----
>>>>> From: <jluis.lavin at unavarra.es>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Thursday, November 05, 2009 10:46 AM
>>>>> Subject: [Bioperl-l] [Fwd: Re: A question about iBio::Index: and
>>>>> its
>>>>> correct
>>>>> use]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------- Mensaje original
>>>>> ----------------------------
>>>>> Subject: Re: [Bioperl-l] A question about iBio::Index: and its
>>>>> correct
>>>>> use
>>>>> From:    jluis.lavin at unavarra.es
>>>>> Fecha:   Jue, 5 de Noviembre de 2009, 16:46
>>>>> To:      "Mark A. Jensen" <maj at fortinbras.us>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> Hi Mark,
>>>>>
>>>>> I´ve actually got two scripts, the first one is to create the
>>>>> index and
>>>>> the second one is to retrieve the sequence lis from the indexed
>>>>> file.
>>>>>
>>>>> 1)Here is the Index creation script:
>>>>>
>>>>> #!/c:/Perl -w
>>>>> use strict;
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>>
>>>>> print "Enter file for indexing: \n";
>>>>> my $Index_File_Name = <STDIN>;
>>>>> my $inx = Bio::Index::Fasta->new(-filename =>
>>>>> $Index_File_Name.".idx",
>>>>>    -write_flag => 1);
>>>>> $inx->make_index(my $File_Name);
>>>>>
>>>>> 2)And here is the sequence retrieval script:
>>>>>
>>>>> #!/c:/Perl -w
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> #PC9.fasta is my genomic file
>>>>> my $Index_File_Name ="PC9.fasta";
>>>>> my $inx = Bio::Index::Fasta->new($Index_File_Name);
>>>>> #LCS.txt is my sequences list
>>>>> @ARGV = <lCS.txt>;
>>>>> foreach  my $id (@ARGV) {
>>>>> if ($id eq ''){
>>>>> die ("empty list")
>>>>> }
>>>>> else {
>>>>> my $seqobj = $inx->fetch($id);
>>>>> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
>>>>> -format => 'fasta');
>>>>> $out->write_seq($seqobj);
>>>>> }
>>>>> }
>>>>> exit;
>>>>> }
>>>>>
>>>>> I hope this code is not a total scum...
>>>>>
>>>>> Thanks in advance ;)
>>>>>
>>>>>
>>>>>
>>>>> El Jue, 5 de Noviembre de 2009, 16:39, Mark A. Jensen escribió:
>>>>>> José -- It looks like this is a good solution to your problem.
>>>>>> Please
>>>>>> send
>>>>>> you
>>>>>> script so we can look at it-
>>>>>> cheers Mark
>>>>>> ----- Original Message -----
>>>>>> From: <jluis.lavin at unavarra.es>
>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>> Sent: Thursday, November 05, 2009 10:28 AM
>>>>>> Subject: [Bioperl-l] A question about iBio::Index: and its
>>>>>> correct use
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hello to all,
>>>>>>
>>>>>> I´m trying to write a script to retrieve a list of sequences
>>>>>> from a
>>>>>> local
>>>>>> FASTA file (for example a fasta archive where all the protein
>>>>>> models
>>>>>> of
>>>>>> an
>>>>>> organism are stored). This file would be used by me as some kind
>>>>>> "local
>>>>>> database" (sorry if I mistake a few concepts...)
>>>>>> I´ve been reading the BioPerl HOWTOs and I came across the
>>>>>> Bio::Index::Fasta tool.
>>>>>> If I didn´t misunderstood what I read (which can be easy because
>>>>>> my
>>>>>> low
>>>>>> level on programming) this Indexing tool should do the job.
>>>>>> I wrote a couple of scripts based on the documentation i read
>>>>>> about
>>>>>> this
>>>>>> tool, but I don´t seem to be able to create the index file to be
>>>>>> used
>>>>>> later (to retrieve the sequences from).
>>>>>> -First of all, I want to ask the people in this forum if the
>>>>>> Bio::Index::Fasta is the right one to chose for this tasks.
>>>>>> -Then I´ll beg you to take a look at my scripts, because I don´t
>>>>>> seem
>>>>>> to
>>>>>> catch the bug...
>>>>>>
>>>>>> Best wishes to you all and thanks in advance ;)
>>>>>>
>>>>>> --
>>>>>> José Luis Lavín Trueba, PhD
>>>>>>
>>>>>> Dpto. de Producción Agraria
>>>>>> Grupo de Genética y Microbiología
>>>>>> Universidad Pública de Navarra
>>>>>> 31006 Pamplona
>>>>>> Navarra
>>>>>> SPAIN
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dr. José Luis Lavín Trueba
>>>>>
>>>>> Dpto. de Producción Agraria
>>>>> Grupo de Genética y Microbiología
>>>>> Universidad Pública de Navarra
>>>>> 31006 Pamplona
>>>>> Navarra
>>>>> SPAIN
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dr. José Luis Lavín Trueba
>>>>>
>>>>> Dpto. de Producción Agraria
>>>>> Grupo de Genética y Microbiología
>>>>> Universidad Pública de Navarra
>>>>> 31006 Pamplona
>>>>> Navarra
>>>>> SPAIN
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Dr. José Luis Lavín Trueba
>>>>
>>>> Dpto. de Producción Agraria
>>>> Grupo de Genética y Microbiología
>>>> Universidad Pública de Navarra
>>>> 31006 Pamplona
>>>> Navarra
>>>> SPAIN
>>>>
>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>> --
>> Dr. José Luis Lavín Trueba
>>
>> Dpto. de Producción Agraria
>> Grupo de Genética y Microbiología
>> Universidad Pública de Navarra
>> 31006 Pamplona
>> Navarra
>> SPAIN
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
Dr. José Luis Lavín Trueba

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN