[Bioperl-l] need help ??parse AcNum from fasta?

Tue Oct 2 10:50:32 UTC 2007

thx for the help, but i got a empty output file,
i think its problem with matching the acc number, my fasta file look like:

*>IPI:IPI00453473.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU...
 >IPI:IPI00177321.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
DDHHHU..
 >IPI:IPI00027547.1|REFSEQ_XP:XP_168060 Tax_Id=9606 similar to NOD3 protein
MMMMM..*

and my i Accnum File look like:
*IPI00177321
IPI00453473

*i hopt it helps to understand.*
*.

Nathan S. Haigh wrote:

>outaleb Issame wrote:
>  
>
>>hi,
>>with this file i mean, i picked out this Accession Number from
>>IPI-Human Dbase,they come from a fasta file,
>>so they re under eachother like a i a table in separate file now.
>>what i want is how how can i check it in the fasta File (so in the
>>IPI-Human FAsta File), i they re really there;
>>if yes please copy the entire entry of this Number (>....the sequence
>>also)in new fasta file.so that i get at the end a new
>>FASTA file with jus this IPI Accession Number.
>>thx and hope was clearly.
>>    
>>
>
>Ok, first of all, I'd read the contents of your Accession numbers into a
>hash, something like the following (this could be written in a shorter
>form, but since you're a newbie I'll leave it in a longer form so you
>can follow easier).
>
>-- start script --
>use strict;
>use Bio::SeqIO;
>
># change the following three lines to point to the relevant paths
># of your list of accessions file, your fasta file and your output
># fasta file
>my $acc_file = "/path/to/your/file";
>my $fasta_file_in = "/path/to/your/fasta/file";
>my $fasta_file_out = "/path/to/your/fasta/output/file";
>
># Use a hash to keep a record of accessions we want to find
>my %hash_of_req_acc;
>
># read all the required accessions from the file into the hash as keys
>open (ACC_FILE, $acc_file) or die "Couldn't open file: $!\n";
>while (<ACC_FILE>) {
>  my $line = $_;
>  chomp $line;
>  $hash_of_req_acc{$_} = 1;
>}
>close ACC_FILE;
>
>my $seqio_object_in = Bio::SeqIO->new(
>  -file => $fasta_file_in,
>  -format => 'fasta'
>);
>my $seqio_object_out = Bio::SeqIO->new(
>  -file => $fasta_file_out,
>  -format => 'fasta'
>);
>
># loop through all the sequences in the fasta file
>while (my $seq_object = $seqio_object_in->next_seq) {
>  # get the sequence accession for easy matching
>  my $seq_acc = $seq_object->accession_number;
>
>  # write the sequence object to the output fasta file if we have a
>matching accession
>  $seqio_object_out->write_seq($seq_object) if exists
>$hash_of_req_acc{$seq_acc};
>}
>-- end script --
>
>I haven't tested this, but it should at least get you started. Also, the
>fasta description line in the output file may not be exactly as it was
>in the input fasta file - if this really matters, you may need to get
>back to us. Also, if the input fasta file is huge (many thousands of
>sequences) it may be wise to create an index of the fasta file in order
>to speed up retrieval.
>
>You may find this page helpful:
>http://www.bioperl.org/wiki/HOWTO:SeqIO
>
>Anyway, hope this helps to get you started.
>Nath
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>  
>