[BioRuby] How to get organism name from a fasta file?

Francesco Strozzi francesco.strozzi at gmail.com
Mon Oct 15 12:30:21 UTC 2007


Hi,
I'm new here, but I think that this can be done using regular expressions,
applied to the output of the "definition()" method. This should work with
your example:

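require 'bio'

# Capture the second whitespace-delimited field of the definition line.
re = /\S+\s(\S+)\s.*/
Bio::FlatFile.open(Bio::FastaFormat, ARGF) do |file|
  file.each do |f|
    m = re.match(f.definition)
    # m is nil if the definition does not match the pattern; skip it then.
    puts m[1] if m
  end
end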

This prints only the part you are interested in from the definition line of
each FASTA sequence. I took a look at the BioRuby RDoc and didn't find any
method that grabs this information directly. I've seen the "identifiers()"
method, which extracts information such as NCBI IDs, but I don't think it
applies to your case. Basically, you have to parse the definition of the
FASTA sequence and extract everything between the first and the second
space characters. If there is a simpler and more elegant way, please let me
know!
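
As a possible alternative to the regular expression, here is a minimal
sketch in plain Ruby that splits the definition string on whitespace. The
helper names second_field and organism_name are hypothetical (they are not
part of BioRuby), and treating the text after the "|" as the organism name
is only an assumption based on the example header:

```ruby
# Hypothetical helper, not a BioRuby method: returns the second
# whitespace-separated field of a definition string, or nil if absent.
def second_field(definition)
  definition.split[1]
end

# Hypothetical helper: assumes the second field looks like
# "Proteome:37|P_abyssi_Orsay" and returns the part after the first "|".
def organism_name(definition)
  field = second_field(definition)
  return nil unless field
  field.split("|", 2)[1]
end

definition = "Gene:IGI00206306|PYRAB16740 Proteome:37|P_abyssi_Orsay ProteinIDs:CAB50578"
puts second_field(definition)
puts organism_name(definition)
```

Inside the loop above, the string passed in would be f.definition. Both
helpers return nil instead of raising when the field is missing.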

Cheers
Francesco

2007/10/15, Kristen <revalia at gmail.com>:
>
> Hello,
>
> I have many entries that look similar to this in one big fasta file:
>
> >Gene:IGI00206306|PYRAB16740 Proteome:37|P_abyssi_Orsay
> ProteinIDs:CAB50578
> Product:Q9UY34|N-terminal acetyltransferase
> atggaagacatcctcgaaaacaaaggcgaagtcaagaagaaaattccgatttccttgata
> actataaggagtgcaaaactgtttgatattccctatattatgaggatagagcaggcatcg
>
>
> I would like to retrieve the part that says "Proteome:37|P_abyssi_Orsay",
> but I'm not sure how to do this. The tutorial shows how to loop through all
> the entries in the fasta file, but this doesn't help me.
> Is there an easy way to retrieve this information from a fasta object?
> Or is there a way to output the definition info of the first fasta entry?
> Maybe something like:
> ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
> puts ff[1].definition
>
> Thanks in advance,
> Kristen


