[Biojava-l] Different implementation of Sequence?

Simon Foote simon.foote at nrc-cnrc.gc.ca
Mon Jun 9 09:01:46 EDT 2003


Hi George,

Here are the results of testing my database-searcher webapp against
several genomes individually. Unfortunately, it's not a pure CDS search:
I recently modified the app to also return the DNA sequence for each
matching feature, which adds extra time that depends on the size of the genome.

The search gene is dnaA, and the search filter is:

            // CDS features whose "gene", "ibs_id" or "gene_id" annotation contains geneId
            FeatureFilter ff1 = new FeatureFilter.ByType("CDS");
            FeatureFilter ff2 = new FeatureFilter.AnnotationContains("gene", geneId);
            FeatureFilter ff3 = new FeatureFilter.AnnotationContains("ibs_id", geneId);
            FeatureFilter ff5 = new FeatureFilter.AnnotationContains("gene_id", geneId);
            FeatureFilter ff4 = new FeatureFilter.Or(ff2, ff3);
            FeatureFilter ff6 = new FeatureFilter.Or(ff4, ff5);
            FeatureFilter ff7 = new FeatureFilter.And(ff1, ff6);
            // false = do not recurse into sub-features, search top-level features only
            FeatureHolder fh = seq.filter(ff7, false);
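To make the boolean structure of ff7 explicit, here is a small self-contained sketch that mirrors it with java.util.function.Predicate instead of BioJava. The Feature record and the byType, annotationContains and geneQuery helpers are hypothetical stand-ins, not BioJava API, and annotationContains is assumed to mean "the property's list of values includes geneId". It only illustrates the logic ff7 expresses, CDS AND (gene OR ibs_id OR gene_id); it says nothing about how BioSQL executes the query or how fast it runs.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilterSketch {
    // Hypothetical stand-in for a sequence feature: a type plus multi-valued annotations
    record Feature(String type, Map<String, List<String>> annotations) {}

    // Rough analogue of FeatureFilter.ByType (assumption, not the BioJava class)
    static Predicate<Feature> byType(String type) {
        return f -> f.type().equals(type);
    }

    // Rough analogue of FeatureFilter.AnnotationContains:
    // true if the values stored under key include value (assumed semantics)
    static Predicate<Feature> annotationContains(String key, String value) {
        return f -> f.annotations().getOrDefault(key, List.of()).contains(value);
    }

    // Same shape as ff7: CDS AND (gene OR ibs_id OR gene_id)
    static Predicate<Feature> geneQuery(String geneId) {
        Predicate<Feature> anyId = annotationContains("gene", geneId)
                .or(annotationContains("ibs_id", geneId))
                .or(annotationContains("gene_id", geneId));
        return byType("CDS").and(anyId);
    }

    public static void main(String[] args) {
        List<Feature> features = List.of(
                new Feature("CDS", Map.of("gene", List.of("dnaA"))),
                new Feature("gene", Map.of("gene", List.of("dnaA"))),
                new Feature("CDS", Map.of("gene_id", List.of("dnaA"))),
                new Feature("CDS", Map.of("gene", List.of("dnaN"))));
        long hits = features.stream().filter(geneQuery("dnaA")).count();
        System.out.println(hits); // prints 2
    }
}
```

Building the OR of the three identifier checks first and then ANDing it with the type check matches the ff4/ff6/ff7 nesting above.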

Bacteria            Search time
E. coli K12         6 seconds
H. influenzae       4 seconds
C. jejuni           6 seconds
H. pylori J99       6 seconds

Also, which versions of the BioSQL schema and of BioJava are you using?

Regards,
Simon

Y D Sun wrote:

>  
>
>>-----Original Message-----
>>From: Simon Foote [mailto:simon.foote at nrc-cnrc.gc.ca] 
>>Sent: 05 June 2003 12:59
>>To: Y D Sun
>>Cc: biojava-l at biojava.org
>>Subject: Re: [Biojava-l] Different implementation of Sequence?
>>
>>
>>Just to add my 2 cents worth.
>>
>>I'm using the latest version of the BioSQL schema within MySQL, and the
>>filters are quite fast. On a database containing 18 complete bacterial
>>genomes, fetching a given gene by name, which in my case uses a
>>combination of 5 filters, takes approx. 1-2 seconds.
>>
>>    
>>
>
>Simon,
>
>Have you tried filtering all CDS features of a complete bacterial
>genome? In my experience with PostgreSQL, filtering a simple feature
>takes only a few seconds, but filtering the thousands of CDSs in a
>genome takes more than a minute.
>
>George
>  
>

-- 
Bioinformatics Specialist
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561  [F] 613-952-9092
simon.foote at nrc-cnrc.gc.ca
