[Bioperl-l] ASN.1 and BioPerl ?

Peter.Robinson at t-online.de Peter.Robinson at t-online.de
Sat Feb 12 16:37:56 EST 2005


On Sat, Feb 12, 2005 at 01:20:30PM -0800, Hilmar Lapp wrote:
> The ASN.1 parser would be very useful, in particular for implementing 
> the NCBI Gene parser I suppose.
> 
> I do suggest though that you publish this as a separate module on CPAN, 
> as supposedly it is (or meant to be?) generically useful, so I 
> completely agree with Chris on this.


I also agree that it would be better to have the module on CPAN. If you 
have been inspired to use the module to incorporate Entrez Gene into 
BioPerl, I would be happy to help out as I can. My initial experiences 
with this suggest that it will not be easy.


> 
> I need an NCBI Gene parser implemented in the Bio::SeqIO framework 
> returning compatible Bio::SeqI objects within the next few weeks. The 
> speed needs to be at least several records per second, ideally 10/s or 
> higher.
> 
> My understanding is that Peter has a grammar-based parser in Java 
> (speed I don't know), and Steve has a Parse::RecDescent-based parser in 
> perl (not bioperl) which is (expectedly) slow.
> 
> I've seen Graham Barr's module on CPAN but haven't tried it yet; it 
> seemed to me that you need the ASN model definition to start with, 
> which I haven't seen at any obvious or not-so-obvious place on the NCBI 
> ftp site, so I either missed something or you have to download the 
> entire toolkit or something else.


You might want to take a look at this:

http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/entrezgene/entrezgene.asn

Note that there appear to be some inconsistencies between some Entrez 
Gene records and this specification (or I have misunderstood something).

After having played around with Perl, BioPerl, lex/yacc and more 
recently ANTLR, I have the impression that this is a doable task using 
ANTLR and a modest amount of Java code. (Doable meaning that it is 
possible to extract the information one wants from a species-specific 
ASN.1 Gene file.) Given my schedule I don't know when I will be able to 
finish this, but I will send the list a mail, presuming there is no 
BioPerl tool to do this by then.
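
To give a rough idea of what "extracting the information one wants" 
involves, here is a minimal sketch in Python. It is not the ANTLR/Java 
parser and not any BioPerl API; it is a hand-written, schema-less reader 
that ignores entrezgene.asn. It assumes the text (print) form of ASN.1 
values: nested { } blocks, comma-separated "name value" pairs, and 
double-quoted strings in which a quote is written as "". The sample 
record and the names tokenize, parse_block, parse_record and field are 
made up for illustration.

```python
import re

# One token is a quoted string, "::=", a brace or comma, or a bare word/number.
TOKEN = re.compile(r'"(?:[^"]|"")*"|::=|[{},]|[^\s{},"]+')


def tokenize(text):
    """Split ASN.1 text-form input into a flat list of tokens."""
    return TOKEN.findall(text)


def parse_block(tokens, pos):
    """Parse the tokens following an opening '{'.

    Returns (items, position just after the matching '}').
    Each item is a list such as ['geneid', '1234'] or ['gene', [...]].
    """
    items, current = [], []
    while tokens[pos] != '}':
        tok = tokens[pos]
        if tok == '{':
            # Nested block: recurse, then continue after its closing brace.
            value, pos = parse_block(tokens, pos + 1)
            current.append(value)
            continue
        if tok == ',':
            items.append(current)
            current = []
        else:
            if tok.startswith('"'):
                # Strip the quotes and undo the doubled-quote escape.
                tok = tok[1:-1].replace('""', '"')
            current.append(tok)
        pos += 1
    if current:
        items.append(current)
    return items, pos + 1


def parse_record(text):
    """Parse 'TypeName ::= { ... }' into (type name, items)."""
    tokens = tokenize(text)
    if tokens[1:3] != ['::=', '{']:
        raise ValueError("expected 'TypeName ::= {' at the start of the record")
    items, _ = parse_block(tokens, 3)
    return tokens[0], items


def field(items, name):
    """Return the value of the first item called `name`, or None."""
    for item in items:
        if item and item[0] == name:
            return item[1] if len(item) == 2 else item[1:]
    return None


# Hypothetical, heavily shortened record; real records are far larger.
SAMPLE = '''
Entrezgene ::= {
  track-info {
    geneid 1234,
    status live
  },
  type protein-coding,
  gene {
    locus "ABC1",
    desc "a ""made-up"" gene"
  }
}
'''

type_name, record = parse_record(SAMPLE)
gene_id = field(field(record, 'track-info'), 'geneid')
locus = field(field(record, 'gene'), 'locus')
```

Every value comes back as a string, and truncated input ends in an 
IndexError rather than a useful message. A real parser would need the 
specification to tell integers from enumerated values and to know which 
fields repeat.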

-Peter

