[Bioperl-l] need help ??parse AcNum from fasta?

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue Oct 2 21:34:20 UTC 2007


I know this is the Bioperl list, but how about just doing it with grep?

	grep -P '^>.*XM_001666470\b' sequences.fasta
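Note that a grep like this prints only the matching header lines, not the sequence lines that follow them. Below is a minimal core-Perl sketch (no BioPerl needed) that streams the FASTA file and copies every whole record whose IPI accession is in the wanted list. It assumes headers of the form `>IPI:IPI00453473.1|...` as in the messages quoted below, and the sub name `filter_fasta` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every FASTA record (header plus all sequence
# lines) whose IPI accession is a key of %$wanted from $in_fh to $out_fh.
# Assumes headers like ">IPI:IPI00453473.1|REFSEQ_XP:..."; the version
# suffix (".1") is ignored when matching.
sub filter_fasta {
    my ($in_fh, $out_fh, $wanted) = @_;
    my $keep = 0;    # are we inside a record we want to copy?
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            # New record: decide from its header whether to keep it
            my ($acc) = $line =~ /^>IPI:(IPI\d+)/;
            $keep = defined $acc && exists $wanted->{$acc};
        }
        print {$out_fh} $line if $keep;
    }
}

# Demo with in-memory filehandles standing in for real files
my $fasta = <<'END';
>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU
>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU
>IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
MMMMM
END

my %wanted = (IPI00177321 => 1, IPI00453473 => 1);
open my $in, '<', \$fasta or die $!;
my $result = '';
open my $out, '>', \$result or die $!;
filter_fasta($in, $out, \%wanted);
close $out;
print $result;
```

With real files you would open the accession list, the IPI FASTA file and the output file with `open` as usual and pass those handles in instead.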



> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of outaleb Issame
> Sent: Wednesday, 3 October 2007 3:51 a.m.
> To: outaleb Issame
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] need help ??parse AcNum from fasta?
> 
> Hi again,
> I think I can solve this problem with the id_parser() method.
> How can I do that?
> Any suggestions or experience?
> Thanks again
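For reference, `id_parser()` is a method of BioPerl's `Bio::Index::Fasta`: you pass it a reference to a sub that receives each FASTA header line and returns the ID(s) under which that record should be indexed, and you set it before calling `make_index`. Here is a minimal sketch of such a sub for the IPI headers shown in this thread, exercised in plain Perl. The name `get_ipi_id` and the file names in the comments are hypothetical, and the BioPerl calls are shown only as comments.

```perl
use strict;
use warnings;

# Hypothetical ID parser for IPI headers such as
#   >IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 ...
# Returns the accession without its version suffix, e.g. "IPI00453473",
# or an empty list when the header does not match.
sub get_ipi_id {
    my $header = shift;
    return $header =~ /^>IPI:(IPI\d+)/ ? ($1) : ();
}

# With BioPerl installed it would be registered roughly like this
# (not executed here; file names are placeholders):
#   my $inx = Bio::Index::Fasta->new(-filename   => 'ipi.idx',
#                                    -write_flag => 1);
#   $inx->id_parser(\&get_ipi_id);
#   $inx->make_index('ipi.HUMAN.fasta');
#   my $seq = $inx->fetch('IPI00177321');   # a Bio::Seq object

my @ids = map { get_ipi_id($_) } (
    '>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein',
    '>IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein',
    '>not_an_ipi_header',
);
print "$_\n" for @ids;
```

Once the index is built with this parser, each accession from the list can be looked up directly with `fetch` and written out with a `Bio::SeqIO` output stream.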
> 
> 
> 
> outaleb Issame wrote:
> 
> >Thanks for the help, but I got an empty output file.
> >I think it's a problem with matching the accession number. My FASTA file
> >looks like:
> >
> >  >IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >  DDHHHU...
> >  >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >  DDHHHU..
> >  >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
> >  MMMMM..
> >
> >and my accession number file looks like:
> >
> >  IPI00177321
> >  IPI00453473
> >
> >I hope this helps to explain the problem.
> >
> >
> >Nathan S. Haigh wrote:
> >
> >
> >
> >>outaleb Issame wrote:
> >>
> >>
> >>
> >>
> >>>Hi,
> >>>By "this file" I mean the following: I picked out these accession
> >>>numbers from the IPI-Human database. They come from a FASTA file and are
> >>>now listed one per line, like a table, in a separate file.
> >>>What I want is to check whether they are really present in the FASTA
> >>>file (the IPI-Human FASTA file), and if so, copy the entire entry for
> >>>each number (the ">..." header line and the sequence) into a new FASTA
> >>>file, so that at the end I get a new FASTA file containing just these
> >>>IPI accession numbers.
> >>>Thanks, and I hope this was clear.
> >>>
> >>>
> >>>
> >>>
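The first part of this request, checking whether each listed accession is actually present in the FASTA file, can be sketched in plain Perl by collecting the IPI accessions seen in the headers and comparing the list against them. This assumes one accession per line in the list file and headers of the form `>IPI:IPI00453473.1|...`; the sub name `missing_accessions` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the accessions from $acc_fh (one per line)
# that do not occur in any header of the FASTA data in $fasta_fh.
# Assumes IPI-style headers; the version suffix (".1") is dropped.
sub missing_accessions {
    my ($acc_fh, $fasta_fh) = @_;

    # Collect the IPI accessions present in the FASTA headers
    my %seen;
    while (my $line = <$fasta_fh>) {
        $seen{$1} = 1 if $line =~ /^>IPI:(IPI\d+)/;
    }

    # Compare the requested list against them, skipping blank lines
    my @missing;
    while (my $acc = <$acc_fh>) {
        $acc =~ s/^\s+|\s+$//g;    # trim whitespace, including "\r\n"
        next if $acc eq '';
        push @missing, $acc unless $seen{$acc};
    }
    return @missing;
}

# Demo with in-memory filehandles standing in for real files
my $fasta = ">IPI:IPI00453473.1|REFSEQ_XP:XP_168060\nDDHHHU\n"
          . ">IPI:IPI00027547.1|REFSEQ_XP:XP_168060\nMMMMM\n";
my $list  = "IPI00177321\nIPI00453473\n\n";
open my $fasta_fh, '<', \$fasta or die $!;
open my $acc_fh,   '<', \$list  or die $!;
my @missing = missing_accessions($acc_fh, $fasta_fh);
print "Not found: @missing\n";
```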
> >>Ok, first of all, I'd read the contents of your accession number file
> >>into a hash, something like the following (this could be written in a
> >>shorter form, but since you're a newbie I'll leave it in a longer form so
> >>you can follow it more easily).
> >>
> >>-- start script --
> >>use strict;
> >>use warnings;
> >>use Bio::SeqIO;
> >>
> >># change the following three lines to point to the relevant paths
> >># of your list of accessions file, your fasta file and your output
> >># fasta file
> >>my $acc_file = "/path/to/your/file";
> >>my $fasta_file_in = "/path/to/your/fasta/file";
> >>my $fasta_file_out = "/path/to/your/fasta/output/file";
> >>
> >># Use a hash to keep a record of accessions we want to find
> >>my %hash_of_req_acc;
> >>
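> >># read all the required accessions from the file into the hash as keys
> >>open (ACC_FILE, $acc_file) or die "Couldn't open file: $!\n";
> >>while (<ACC_FILE>) {
> >> my $line = $_;
> >> chomp $line;
> >> $line =~ s/\s+$//;              # also strip trailing spaces or a "\r"
> >> next unless length $line;       # skip blank lines
> >> $hash_of_req_acc{$line} = 1;    # use the cleaned line, not $_
> >>}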
> >>close ACC_FILE;
> >>
> >>my $seqio_object_in = Bio::SeqIO->new(
> >> -file => $fasta_file_in,
> >> -format => 'fasta'
> >>);
> >>my $seqio_object_out = Bio::SeqIO->new(
> >> -file => ">$fasta_file_out",   # the ">" opens the file for writing
> >> -format => 'fasta'
> >>);
> >>
> >># loop through all the sequences in the fasta file
> >>while (my $seq_object = $seqio_object_in->next_seq) {
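> >> # get the IPI accession for matching. For headers like
> >> # ">IPI:IPI00453473.1|REFSEQ_XP:...", accession_number() will probably
> >> # not return "IPI00453473", so extract it from the display ID (the first
> >> # word of the header) instead
> >> my ($seq_acc) = $seq_object->display_id =~ /^IPI:(IPI\d+)/;
> >> next unless defined $seq_acc;
> >>
> >> # write the sequence object to the output FASTA file if we have a
> >> # matching accession
> >> $seqio_object_out->write_seq($seq_object) if exists $hash_of_req_acc{$seq_acc};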
> >>}
> >>-- end script --
> >>
> >>I haven't tested this, but it should at least get you started. Also, the
> >>FASTA description line in the output file may not be exactly as it was
> >>in the input FASTA file; if this really matters, you may need to get back
> >>to us. Finally, if the input FASTA file is huge (many thousands of
> >>sequences), it may be wise to create an index of the FASTA file in order
> >>to speed up retrieval.
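The indexing idea mentioned above is what BioPerl's `Bio::Index::Fasta` provides. To show the principle only, here is a minimal plain-Perl sketch that records the byte offset of each header with `tell` and later jumps straight to a record with `seek`. The sub names are hypothetical, and it assumes headers of the form `>IPI:IPI00453473.1|...`.

```perl
use strict;
use warnings;

# Hypothetical helper: map each IPI accession to the byte offset of its
# header line, so records can be fetched later without rescanning.
sub build_offset_index {
    my ($fh) = @_;
    my %offset;
    my $pos = tell $fh;                 # offset of the line about to be read
    while (my $line = <$fh>) {
        $offset{$1} = $pos if $line =~ /^>IPI:(IPI\d+)/;
        $pos = tell $fh;
    }
    return \%offset;
}

# Hypothetical helper: seek to a stored offset and read one whole record
sub fetch_record {
    my ($fh, $index, $acc) = @_;
    return unless exists $index->{$acc};
    seek $fh, $index->{$acc}, 0 or die "seek failed: $!";
    my $record = <$fh>;                 # the header line
    while (my $line = <$fh>) {
        last if $line =~ /^>/;          # stop at the next header
        $record .= $line;
    }
    return $record;
}

# Demo with an in-memory filehandle standing in for the FASTA file
my $fasta = ">IPI:IPI00453473.1|REFSEQ_XP:XP_168060\nDDHHHU\n"
          . ">IPI:IPI00177321.1|REFSEQ_XP:XP_168060\nDDHHHU\n";
open my $fh, '<', \$fasta or die $!;
my $index  = build_offset_index($fh);
my $record = fetch_record($fh, $index, 'IPI00177321');
print $record;
```

For a real multi-gigabyte file the offsets would also need to be stored on disk so the index survives between runs, which is the bookkeeping `Bio::Index::Fasta` handles for you.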
> >>
> >>You may find this page helpful:
> >>http://www.bioperl.org/wiki/HOWTO:SeqIO
> >>
> >>Anyway, hope this helps to get you started.
> >>Nath
> >>
> >>
> >>_______________________________________________
> >>Bioperl-l mailing list
> >>Bioperl-l at lists.open-bio.org
> >>http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================



