[BioRuby] EMBL / ENA parser error

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Wed Dec 7 16:32:49 UTC 2011


Hi Michael,

We should first check the official documentation provided by EMBL-EBI.

EMBL User Manual:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html

In "3.4.7 The OS Line", two examples that don't start with
an uppercase letter are shown.

> OS   unidentified bacterium B8
> OS   uncultured proteobacterium

Therefore, the issue should be treated as a bug in BioRuby.
The regexp and/or the logic for OS lines will be changed.
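
As a rough sketch only (this is not the actual fix; the method name
parse_os_line and the relaxed pattern are assumptions made up for
illustration), the check could accept a lowercase first letter like this:

```ruby
# Hypothetical relaxed pattern: same shape as the regexp quoted from
# embl/common.rb, but the first character may be upper- or lowercase.
OS_NAME_PATTERN = /([A-Za-z][a-z]* *[\w :'+\-]+\w)/

# Hypothetical helper (not part of BioRuby): takes the text of an OS
# line without the "OS   " prefix and returns the organism name plus
# the optional common name given in parentheses.
def parse_os_line(tmp)
  if tmp =~ OS_NAME_PATTERN
    org = $1
    common = tmp[/(\(.+\))/, 1]   # nil when no "(...)" part is present
    { 'name' => common, 'os' => org }
  else
    raise "Error: OS Line.\n#{tmp}\n"
  end
end

p parse_os_line("uncultured nematode")
p parse_os_line("unidentified bacterium B8")
p parse_os_line("Homo sapiens (human)")
```

With this pattern, the two examples from the manual and the
"uncultured nematode" line are accepted, and names starting with an
uppercase letter are handled as before.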

Thank you for reporting the issue,

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


On Thu, 01 Dec 2011 16:30:22 +0000
Michael Paulini <mh6 at sanger.ac.uk> wrote:

> Hi fellow biorubysts,
> 
> I tried to parse EMBL/ENA entry DQ471885 with the bioruby EMBL parser,
> and it dies when it tries to parse:
> OS   uncultured nematode
> 
> due to the regexp in embl/common.rb being:
> ==================================
> if tmp =~ /([A-Z][a-z]* *[\w\d \:\'\+\-]+[\w\d])/
>          org = $1
>          tmp =~ /(\(.+\))/
>          os.push({'name' => $1, 'os' => org})
> else
>          raise "Error: OS Line. #{$!}\n#{fetch('OS')}\n"
> end
> ================================
> as it doesn't start with an uppercase letter.
> 
> Should we change the regexp, or file a bug with ENA?
> 
> thanks,
> 
> Michael
> 
> 
> -- 
>  The Wellcome Trust Sanger Institute is operated by Genome Research 
>  Limited, a charity registered in England with number 1021457 and a 
>  company registered in England with number 2742969, whose registered 
>  office is 215 Euston Road, London, NW1 2BE. 
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby



