[Biojava-dev] org.biojava3.genome.parsers.gff.GFF3Reader.java

Ryan Golhar ngsbioinformatics at gmail.com
Thu May 31 02:49:35 UTC 2012


I'm reading in the hg19 refGene file from UCSC.  I timed both versions using
System.currentTimeMillis(): the old version ran in about 5,950
milliseconds and the revised version in about 6,385 milliseconds.  A
difference of roughly 435 ms (about 7%) seems negligible to me.
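A small timing harness along these lines could make such comparisons more repeatable. It is a sketch, not the code used for the numbers above. `ParseTimer`, `timeMillis` and `medianMillis` are hypothetical names, and the workload is a synthetic line-splitting loop standing in for the real parser. The sketch also uses `System.nanoTime()`, which is intended for measuring elapsed time, rather than `System.currentTimeMillis()`. Warm-up runs plus a median over several timed runs reduce the influence of JIT compilation and garbage collection on any single measurement.

```java
import java.util.Arrays;

// Hypothetical timing harness for comparing two parser versions on the same input.
class ParseTimer {

    // Runs the task once and returns the elapsed time in milliseconds.
    // System.nanoTime() is monotonic, so it suits interval measurement better
    // than System.currentTimeMillis(), which can jump if the wall clock changes.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Runs the task `warmups` times untimed (giving the JIT a chance to compile
    // hot code), then `runs` times timed, and returns the median timed run.
    static long medianMillis(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            times[i] = timeMillis(task);
        }
        Arrays.sort(times);
        return times[runs / 2];
    }

    public static void main(String[] args) {
        // Synthetic stand-in for "parse the refGene file"; swap in the real parser call.
        String line = "chr1\trefGene\texon\t11874\t12227\t.\t+\t.\tgene_id \"DDX11L1\";";
        Runnable workload = () -> {
            long fields = 0;
            for (int i = 0; i < 100_000; i++) {
                fields += line.split("\t").length;
            }
            // Use the result so the loop has an observable effect.
            if (fields == 0) {
                throw new IllegalStateException("no fields parsed");
            }
        };
        System.out.println("median: " + medianMillis(workload, 3, 5) + " ms");
    }
}
```

To compare the old and revised parsers, wrap each one in a `Runnable` that parses the same file and call `medianMillis` on both.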

The bigger problem I see is that the entire file is loaded into memory at
once, instead of being read on demand or indexing the GTF file.
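Reading on demand could look roughly like the following minimal sketch, assuming the caller can process one feature at a time. `StreamingGffScanner` and `forEachFeature` are hypothetical names, not part of BioJava. Each line is split and handed to a callback as soon as it is read, so memory use stays proportional to a single line rather than to the whole file. The scanner skips comments and directives and stops at a `##FASTA` directive, after which a GFF3 file contains sequence data instead of features.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical streaming scanner: hands each feature line to a callback as it is
// read, so memory use does not grow with the size of the file.
class StreamingGffScanner {

    // Delivers the tab-separated columns of each feature line to `onFeature` and
    // returns how many feature lines were delivered. Blank lines and lines starting
    // with '#' (comments and ## directives) are skipped; reading stops at ##FASTA.
    static long forEachFeature(Reader source, Consumer<String[]> onFeature) {
        long count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("##FASTA")) {
                    break;
                }
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                // Limit -1 keeps trailing empty columns instead of dropping them.
                onFeature.accept(line.split("\t", -1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String gff = "##gff-version 3\n"
                + "chr1\trefGene\texon\t11874\t12227\t.\t+\t.\tID=exon1\n"
                + "chr1\trefGene\texon\t12613\t12721\t.\t+\t.\tID=exon2\n";
        // A real file would use e.g. Files.newBufferedReader(path) instead of StringReader.
        long n = forEachFeature(new StringReader(gff),
                cols -> System.out.println(cols[0] + ":" + cols[3] + "-" + cols[4]));
        System.out.println(n + " features");
    }
}
```

The callback design keeps the scanner independent of what is done with each feature. Building an index, by contrast, would mean recording byte offsets per sequence or region and seeking to them later, which this sketch does not attempt.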


In any case, I've attached my patch file.

Ryan


On Wed, May 30, 2012 at 4:42 PM, Scooter Willis <HWillis at scripps.edu> wrote:

> Ryan
>
> You can send it to me. Can you see if there is a performance difference
> for parsing a reasonably sized GFF3 file?
>
> Thanks
>
> Scooter
>
>
>
> On 5/30/12 4:38 PM, "Ryan Golhar" <ngsbioinformatics at gmail.com> wrote:
>
> >Hi all - I'm using the GFF3 parser and noticed that the function parseLine
> >contains a note to update the code to use a RegEx split.  I've made the
> >change and all looks good.  Who can I send this patch to?
> >_______________________________________________
> >biojava-dev mailing list
> >biojava-dev at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>
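The patch itself was scrubbed from the archive, so its exact contents are unknown. As a hedged illustration of the kind of change described in the quoted message, the sketch below contrasts a hand-written `indexOf` loop with a precompiled regex split. `GffLineSplitter` and its methods are hypothetical and are not BioJava's actual `parseLine`. Passing a limit of `-1` to `Pattern.split` keeps trailing empty columns, so both approaches return the same array, and the 9-column check reflects the GFF3 rule that feature lines have exactly nine tab-separated columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical comparison of two ways to split one GFF3 line into its columns.
class GffLineSplitter {

    // Compiled once and reused across calls.
    private static final Pattern TAB = Pattern.compile("\t");

    // Manual scan with indexOf, the style a hand-written parser typically uses.
    static String[] splitManually(String line) {
        List<String> cols = new ArrayList<>(9);
        int start = 0;
        int tab;
        while ((tab = line.indexOf('\t', start)) >= 0) {
            cols.add(line.substring(start, tab));
            start = tab + 1;
        }
        cols.add(line.substring(start));
        return cols.toArray(new String[0]);
    }

    // Regex-based split; limit -1 keeps trailing empty columns, matching splitManually.
    static String[] splitWithRegex(String line) {
        return TAB.split(line, -1);
    }

    // GFF3 feature lines must have exactly 9 tab-separated columns.
    static String[] parseColumns(String line) {
        String[] cols = splitWithRegex(line);
        if (cols.length != 9) {
            throw new IllegalArgumentException(
                    "expected 9 columns, got " + cols.length + ": " + line);
        }
        return cols;
    }

    public static void main(String[] args) {
        String line = "chr1\trefGene\texon\t11874\t12227\t.\t+\t.\tID=exon1";
        System.out.println(Arrays.equals(splitManually(line), splitWithRegex(line)));
        System.out.println(String.join(" | ", parseColumns(line)));
    }
}
```

The regex version is shorter and harder to get wrong, while the manual loop avoids regex machinery entirely; a timing harness run on a real file is the way to tell whether the difference matters.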
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GFF3Reader.patch
Type: application/octet-stream
Size: 3301 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20120530/6939905e/attachment-0002.obj>

