[BioRuby] Fasta and Bio::Sequence

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Wed May 21 11:00:24 UTC 2008


Hi Ra,

On Tue, 20 May 2008 16:37:22 +0200
Raoul Jean Pierre Bonnal <raoul.bonnal at itb.cnr.it> wrote:

> Dear list,
> converting a "CUSTOM" fasta entry to Bio::Sequence with to_seq a lot of
> things are missing: primary_accession, sequence_version and entry_id,
> probably setting accession and entry to entry.entry_id and
> sequence_version to Zero by default would be a good choice. 
> What do you think ?

To be able to tell whether a value comes from the data, was set by
the user, or is just a default, I prefer not to set default values
where possible.
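To illustrate the point, here is a minimal sketch in plain Ruby, assuming a simple record type. `SeqInfo` and `describe` are hypothetical names for illustration only and are not part of the BioRuby API. Fields that are never assigned stay nil, so "not given" remains distinguishable from a real value such as 0 set by the user.

```ruby
# Hypothetical sketch: SeqInfo and describe are illustrative names, not BioRuby API.
# Struct members that are never assigned stay nil, so "not given" can be
# told apart from a value that came from the data or was set by the user.
SeqInfo = Struct.new(:entry_id, :primary_accession, :sequence_version)

# Report each field, marking nil ones as "not given".
def describe(info)
  info.each_pair.map do |field, value|
    value.nil? ? "#{field}: not given" : "#{field}: #{value.inspect}"
  end
end

info = SeqInfo.new("my_custom_seq")  # entry_id taken from the fasta data
info.sequence_version = 0            # explicitly set by the user
puts describe(info)
```

If every missing field were filled with a default such as 0, the last line could no longer show whether the 0 came from the user or from the default.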

For entry_id, I think it is given by the fasta definition line.

As for "accession": it is mainly used in GenBank/EMBL/DDBJ data and
their variants, and other data may have no accession at all, so I think
nil is the best default. It would be good to parse
"gi|gi-number|gb|accession|locus" in the fasta defline and set accession
from the result (a parser for this is already available in BioRuby).
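A rough, self-contained sketch of that idea follows. It is a hypothetical helper written for illustration, not BioRuby's own defline parser, and it assumes an NCBI-style first word such as "gi|5524211|gb|AAD44166.1|". It returns nil when no GenBank/EMBL/DDBJ tag is found, matching the nil-by-default behaviour suggested above.

```ruby
# Hypothetical sketch, not BioRuby's own parser: pick the accession out of an
# NCBI-style "gi|gi-number|gb|accession|locus" fasta defline.
# The accession field is returned verbatim (including any ".version" suffix);
# nil is returned when no GenBank/EMBL/DDBJ tag is present.
DATABANK_TAGS = %w[gb emb dbj].freeze

def accession_from_defline(defline)
  # Only the first word of the defline carries the "|"-separated IDs.
  first_word = defline.sub(/\A>/, "").strip.split(/\s+/, 2).first.to_s
  fields = first_word.split("|")
  fields.each_with_index do |tag, i|
    next unless DATABANK_TAGS.include?(tag)
    acc = fields[i + 1]
    return (acc.nil? || acc.empty?) ? nil : acc
  end
  nil
end

p accession_from_defline(">gi|5524211|gb|AAD44166.1| cytochrome b")
p accession_from_defline(">my_custom_seq some description")
```

For a custom entry without databank tags, such as the second call, the accession simply stays nil.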

As for sequence_version, it may not be a number in some cases
(for example, "hg18" and "mm9" for the human and mouse genomes),
so I think nil is also a good default.
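The following minimal sketch shows why coercing the version to an integer would be a problem. `normalize_version` is a hypothetical helper, not BioRuby API; it assumes the version is kept as a String as given, or nil when absent. In Ruby, `"hg18".to_i` and `"mm9".to_i` both return 0, so a numeric default of 0 would be indistinguishable from these assembly names.

```ruby
# Hypothetical sketch: normalize_version is an illustrative helper, not BioRuby API.
# Keep sequence_version as given (a String), or nil when absent, instead of
# coercing it to an Integer: "hg18".to_i and "mm9".to_i both return 0.
def normalize_version(raw)
  return nil if raw.nil?
  s = raw.to_s.strip
  s.empty? ? nil : s
end

p normalize_version("hg18")
p normalize_version(nil)
```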

-- 
Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp

