[Cubicweb] lgc.cache in cubicweb

Aurélien Campéas aurelien.campeas at logilab.fr
Wed Sep 7 14:21:10 CEST 2011


Hello cubicwebistas,

While looking for potential optimisation targets in the cw code base, I 
came across uses of logilab.common.cache in the querier and the native source.

Reading the lgc.cache code it occurred to me that it may not be optimal 
(for the cw usage at least).

So I've written a replacement that tries to be less ruthlessly 
inefficient and at the same time attempts to not leak memory.

Here's the output of a small benchmark I just made up using pre-recorded 
queries from one big unit test (one that generates a lot of requests). 
The cache size increases from one run to the next.


********** cachesize = 50 **********
scache (time, hits, misses) 33.54 313 31292
fcache (time, hits, misses) 9.79 4744 26861 (actual internal size = 400)
dict   (time, hits, misses) 1.00 29730 1875
********** cachesize = 150 **********
scache (time, hits, misses) 56.28 2478 29127
fcache (time, hits, misses) 10.85 3141 28464 (actual internal size = 300)
dict   (time, hits, misses) 1.00 29730 1875
********** cachesize = 300 **********
scache (time, hits, misses) 96.77 2758 28847
fcache (time, hits, misses) 9.98 3141 28464 (actual internal size = 300)
dict   (time, hits, misses) 1.00 29730 1875
********** cachesize = 600 **********
scache (time, hits, misses) 152.75 5020 26585
fcache (time, hits, misses) 9.08 7958 23647 (actual internal size = 600)
dict   (time, hits, misses) 1.00 29730 1875
********** cachesize = 1200 **********
scache (time, hits, misses) 351.62 5304 26301
fcache (time, hits, misses) 9.38 11393 20212 (actual internal size = 1200)
dict   (time, hits, misses) 1.00 29730 1875
********** cachesize = 2400 **********
scache (time, hits, misses) 137.16 29730 1875
fcache (time, hits, misses) 6.69 29730 1875 (actual internal size = 2400)
dict   (time, hits, misses) 1.00 29730 1875


scache: the original lgc.cache
fcache: a possible replacement
dict: a plain dict

The time column reports the time relative to the plain dict run time.
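
To make the columns above concrete, here is a minimal sketch of how such a replay benchmark could be written. It is not the script that produced the numbers above: the names replay and relative_times, and the convention that a cache returning None counts as a miss, are assumptions made for illustration. It is written for a current Python 3.

```python
import time


def replay(cache_get, cache_set, queries):
    """Replay recorded queries against a cache.

    cache_get(key) is assumed to return None on a miss (hypothetical
    convention); cache_set(key, value) stores a value.
    Returns (elapsed_seconds, hits, misses).
    """
    hits = misses = 0
    start = time.perf_counter()
    for query in queries:
        if cache_get(query) is None:
            misses += 1
            # Stand-in for the real cached value (e.g. a parsed query).
            cache_set(query, query)
        else:
            hits += 1
    return time.perf_counter() - start, hits, misses


def relative_times(timings, baseline="dict"):
    """Express each run time relative to the baseline run (baseline = 1.00)."""
    base = timings[baseline]
    return {name: elapsed / base for name, elapsed in timings.items()}
```

A plain dict can be plugged in directly with replay(d.get, d.__setitem__, queries). Since a dict never evicts anything, its hit count is the upper bound any bounded cache can reach on the same query log.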

Some notes:

* lgc.cache is quite slow, and its performance degrades as the cache 
size grows. This is because it uses an internal Python list as its LRU 
structure, and in practice this list is very frequently traversed 
(almost) completely

* the proposed replacement tries to produce more hits, at the cost of 
more memory consumption. It is specifically tailored for cw, 
where I expect 2 cases to happen:
   * a potentially huge set of (pathological) one-shot queries like 
'Any X WHERE X eid 42'
   * a potentially big, but bounded set of cacheable queries

* the replacement allows itself to grow beyond its initial size; I guess 
there should be some absolute internal maximum size (a safety belt), 
with a warning emitted if it is reached

* the default size of 300 from the cw default configuration (all-in-one) 
template is too low (3000 may be a much more reasonable value)
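
The cost of a list-based LRU, and the "safety belt" idea, can be sketched as follows. Neither class is the actual lgc.cache or fcache code: ListLRU is a simplified model of the behaviour described in the notes above, and OrderedDictLRU is a standard OrderedDict-based LRU swapped in to show constant-time bookkeeping. The class names, the max_size parameter and the warn-once policy are all assumptions for illustration (Python 3).

```python
import warnings
from collections import OrderedDict


class ListLRU:
    """Simplified model of an LRU that tracks recency in a plain list."""

    def __init__(self, size):
        self.size = size
        self.data = {}
        self.order = []  # least recently used key first

    def get(self, key):
        if key not in self.data:
            return None
        self.order.remove(key)  # O(n): scans the list to find the key
        self.order.append(key)
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.order.remove(key)  # O(n) again
        elif len(self.data) >= self.size:
            oldest = self.order.pop(0)  # O(n): shifts every remaining item
            del self.data[oldest]
        self.data[key] = value
        self.order.append(key)


class OrderedDictLRU:
    """LRU with O(1) recency updates and a hard size limit (hypothetical)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.data = OrderedDict()  # least recently used key first
        self._warned = False

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # O(1)
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.max_size:
            if not self._warned:
                # Safety belt: report once that the hard limit was hit.
                warnings.warn("cache reached its maximum size (%d)"
                              % self.max_size)
                self._warned = True
            self.data.popitem(last=False)  # O(1): evict the oldest entry
        self.data[key] = value
```

Both classes evict the same entries for the same access sequence; only the bookkeeping cost differs. In ListLRU every hit pays for a scan of the recency list, so the cost per access grows with the cache size, which matches the direction of the scache timings.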

Regards,
Aurélien.




