[DAS] 1.6 draft 7

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Mon Oct 4 08:54:38 UTC 2010


On 1 Oct 2010, at 20:57, Mitch Skinner wrote:

> In JBrowse, the "schema" can vary by track; my assumption was that the set of populated attributes in an individual track would be pretty uniform.  Some tracks might not use the "phase" field, for example, but if a given track used phase information, then I figured that almost all of the features in that track would populate that field.

Indeed, and so long as your server/data file can determine in advance that phase is not used at all (as you say), it can omit the field entirely. I'd say that will be perfectly possible most of the time. To be honest, I'm not wholly convinced that DAS has significantly less uniformity in the fields used within a data set, but it's worth bearing in mind.
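To illustrate dropping unused fields per track, here is a minimal sketch, assuming a hypothetical layout where each track carries a header naming its columns followed by one array per feature. The field names and the `encode_track`/`used_fields` helpers are illustrative only, not the DAS or JBrowse format. A column such as `phase` that no feature in the track populates is simply left out.

```python
import json

# Hypothetical feature records for one track; field names are illustrative.
features = [
    {"start": 10000, "end": 15000, "strand": 1, "phase": None, "name": "foo"},
    {"start": 20000, "end": 21000, "strand": -1, "phase": None, "name": "bar"},
]

ALL_FIELDS = ["start", "end", "strand", "phase", "name"]

def used_fields(records, fields):
    """Keep only the fields populated by at least one record in the track."""
    return [f for f in fields if any(r.get(f) is not None for r in records)]

def encode_track(records, fields=ALL_FIELDS):
    """Encode a track as a column header plus one positional array per feature."""
    cols = used_fields(records, fields)
    return {"fields": cols, "features": [[r.get(f) for f in cols] for r in records]}

# Compact separators also strip optional whitespace from the output.
print(json.dumps(encode_track(features), separators=(",", ":")))
# {"fields":["start","end","strand","name"],"features":[[10000,15000,1,"foo"],[20000,21000,-1,"bar"]]}
```

This only pays off when fields really are uniform within a track. If a few features do use `phase`, the column has to stay and the remaining features need a placeholder.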

> Also, javascript allows for array entries to be omitted entirely, like:
> 
> [10000, 15000, , "foo"]
> 
> JSON theoretically doesn't allow this; omitted entries become "undefined" in javascript, and the "official" JSON spec disallows "undefined".

Interesting; I must admit my use of JSON is fairly limited, so I wasn't aware of this possibility. It doesn't sound "nice", but when you're trying to squeeze out every bit of speed you can, it doesn't seem like an issue.
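For comparison, a strict JSON parser rejects the elided form, and the standards-compliant alternative is an explicit `null`. This small check uses Python's standard-library `json` module as a stand-in for any strict parser (browsers' native `JSON.parse` rejects it too); `parses_as_json` is a hypothetical helper name.

```python
import json

elided = '[10000, 15000, , "foo"]'         # valid JavaScript array literal, not valid JSON
with_null = '[10000, 15000, null, "foo"]'  # strict-JSON equivalent with a placeholder

def parses_as_json(text):
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses_as_json(elided))     # False
print(parses_as_json(with_null))  # True
print(json.loads(with_null))      # [10000, 15000, None, 'foo']
```

Before compression, each elided entry saves the four bytes of `null`, at the cost of requiring a non-strict parser such as JavaScript `eval`.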

> 
> Also, depending on the use case, I wonder if the difference in (de)compression time between indexed and keyed JSON would matter.  If you have your generated data handy still, I'd be curious to know what the difference is.

I don't have the original files, but I do have the script that generates similar ones (attached). You could also experiment with whitespace if you want to optimise further.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-filesize.pl
Type: text/x-perl-script
Size: 1117 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/das/attachments/20101004/56e48ee3/attachment-0002.bin>
-------------- next part --------------


I'm sure this very question has already been studied extensively. My expectation is that the compression/decompression overhead will be worth it for anything larger than about 100 kB (probably even less), given how dramatically compression shrinks these files and that bandwidth is usually the rate-limiting step. Up to now my rule of thumb has been that if you're worried about speed and file size, you probably need compression.
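Since the attached Perl script was scrubbed from the archive, here is an independent sketch, not that script, for comparing keyed and indexed JSON before and after gzip, including (de)compression time. The synthetic fields, the random data, and the helper names (`make_features`, `keyed`, `indexed`, `measure`) are all assumptions for illustration. Sizes and timings will depend on the data and machine, so it makes no claim about where the break-even point lies.

```python
import gzip
import json
import random
import time

FIELDS = ["id", "start", "end", "strand", "score"]  # hypothetical feature fields

def make_features(n, seed=0):
    """Generate n synthetic features with random coordinates (illustrative data only)."""
    rng = random.Random(seed)
    feats = []
    for i in range(n):
        start = rng.randint(1, 10_000_000)
        feats.append({"id": f"feat{i}", "start": start,
                      "end": start + rng.randint(50, 5000),
                      "strand": rng.choice([1, -1]), "score": round(rng.random(), 3)})
    return feats

def keyed(feats):
    """Encode each feature as an object, repeating the keys in every record."""
    return json.dumps(feats, separators=(",", ":")).encode()

def indexed(feats):
    """Encode a single header of field names plus one positional array per feature."""
    rows = [[f[k] for k in FIELDS] for f in feats]
    return json.dumps({"fields": FIELDS, "features": rows}, separators=(",", ":")).encode()

def measure(payload):
    """Return (raw bytes, gzipped bytes, seconds to compress then decompress)."""
    t0 = time.perf_counter()
    packed = gzip.compress(payload)
    gzip.decompress(packed)
    return len(payload), len(packed), time.perf_counter() - t0

feats = make_features(20_000)
for name, enc in (("keyed", keyed), ("indexed", indexed)):
    raw, packed, secs = measure(enc(feats))
    print(f"{name:8s} raw={raw:>9d} gzip={packed:>8d} time={secs:.4f}s")
```

Dropping `separators=(",", ":")` restores Python's default spacing, which is one way to see how much whitespace costs before and after compression.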

Cheers,
Andy

