[Cubicweb] Persistent sessions: a status report

Aurélien Campéas aurelien.campeas at logilab.fr
Wed May 14 13:22:14 CEST 2014


Dear CubicWebians,


In preparation for the next Logilab-internal "copil" meeting, I've
written a short summary of the current state of things.

The current situation
---------------------

* sessions are not "persistent", i.e. they are in-memory objects and
  may reside in two Python dictionaries, one on the web side and one
  on the repo side (the web side because of the old "web fence" thing)

* because of that, multi-instance deployments must use a frontend
  that supports "sticky sessions" configurations

* also, multi-process WSGI deployments are not really possible (unless
  again some "sticky session" trick is used)

* when instances are stopped for maintenance, the sessions are closed
  and users must log in again after the restart.

I won't expand on how long we have wanted to get past this, and have
CubicWeb single instances use multiple processes and be scalable
out-of-the-box (with respect to CPU usage).

The work done in CubicWeb 3.19 (separation of session and connection)
is solid ground on which to build a solution.



What is going on
----------------

* there exist today two patch sets addressing the "persistent session"
  issue

* to be feature-complete, we probably want a third one (more on that
  below)

The first patch set replaces the in-memory sessions with an
in-database CWSession object. The details are in the patch set. (It
also comes with a slight simplification of the web auth stack
architecture.)
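
To make the idea concrete, here is a minimal sketch of an in-database
session store. It is not the CWSession schema from the patch set: the
`sessions` table, its columns and the `DbSessionStore` class are all
hypothetical, and sqlite3 stands in for the real system database. It
only shows why a session opened by one instance becomes visible to
another one.

```python
import json
import os
import sqlite3
import tempfile
import time
import uuid


class DbSessionStore:
    """Hypothetical store; table and column names are illustrative only."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            " sid TEXT PRIMARY KEY, login TEXT NOT NULL,"
            " data TEXT NOT NULL, last_used REAL NOT NULL)"
        )

    def open(self, login):
        """Create a session row and return its identifier."""
        sid = uuid.uuid4().hex
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO sessions VALUES (?, ?, ?, ?)",
                (sid, login, json.dumps({}), time.time()),
            )
        return sid

    def load(self, sid):
        """Return the session as a dict, or None if it does not exist."""
        row = self.conn.execute(
            "SELECT login, data FROM sessions WHERE sid = ?", (sid,)
        ).fetchone()
        if row is None:
            return None
        return {"login": row[0], "data": json.loads(row[1])}

    def save(self, sid, data):
        """Serialise the session data and refresh its last-used stamp."""
        with self.conn:
            self.conn.execute(
                "UPDATE sessions SET data = ?, last_used = ? WHERE sid = ?",
                (json.dumps(data), time.time(), sid),
            )

    def close(self, sid):
        with self.conn:
            self.conn.execute("DELETE FROM sessions WHERE sid = ?", (sid,))


# Two connections to one database file stand in for two instances.
db_path = os.path.join(tempfile.mkdtemp(), "sessions.db")
instance_a = DbSessionStore(sqlite3.connect(db_path))
instance_b = DbSessionStore(sqlite3.connect(db_path))

sid = instance_a.open("alice")
instance_a.save(sid, {"lang": "fr"})
restored = instance_b.load(sid)  # served by the other instance
```

With such a store, the frontend no longer needs sticky sessions, and a
restart does not lose sessions, since nothing lives only in process
memory.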

The second patch set introduces a cache API and cache backends that
can be used to reduce the (slight) performance impact of the I/O
associated with session serialisation/deserialisation. It is mostly a
POC but should be adaptable to sit on top of the first patch set
without pain.
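
As an illustration only, a cache API with pluggable backends could
look like the sketch below. The names (`CacheBackend`,
`LocalMemoryBackend`, `CachedSessionLoader`) and the TTL-based expiry
are my assumptions for the sake of the example, not the API of the
patch set.

```python
import time
from abc import ABC, abstractmethod


class CacheBackend(ABC):
    """Hypothetical backend interface; a shared backend would implement it too."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value, ttl): ...

    @abstractmethod
    def delete(self, key): ...


class LocalMemoryBackend(CacheBackend):
    """Per-process backend: entries expire after `ttl` seconds."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale entry, drop it
            return None
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._entries.pop(key, None)


class CachedSessionLoader:
    """Read-through cache in front of a database loader function."""

    def __init__(self, backend, load_from_db, ttl=30.0):
        self.backend = backend
        self.load_from_db = load_from_db
        self.ttl = ttl

    def load(self, sid):
        cached = self.backend.get(sid)
        if cached is not None:
            return cached
        session = self.load_from_db(sid)  # cache miss: hit the database
        if session is not None:
            self.backend.set(sid, session, self.ttl)
        return session

    def invalidate(self, sid):
        """Call on logout or session update so stale data is not served."""
        self.backend.delete(sid)
```

Note that a per-process backend like this one can serve a session for
up to `ttl` seconds after another process has closed it, so a short
TTL or a backend shared between processes would be needed in a
multi-process deployment.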



What is to be done
------------------

* baseline distributed task handling in the core

The missing feature is the handling of "tasks", which is currently
provided on a per-instance basis and driven by instance configuration
variables.

For instance (pun intended), session expiration should be handled by
such a task, but we don't want all instances to compete anarchically
for it.

Another one: the `vcsfile` cube should be able to schedule a periodic
Mercurial repository synchronisation task across multiple instances,
without configuration option hacks (which are costly not only in terms
of code complexity but also in sysadmin time) and without chaos
ensuing.

What is proposed is the integration of most of the `worker` cube, plus
some missing bits, within the core. This will provide:

* a database-enforced serialized synchronisation device
* a distributed task API

The missing bits are related to how a "worker" process will be elected
to perform a task (it could be by explicit configuration, for complex
deployments, but a zero-configuration lightweight default should be
provided).
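
One possible shape for a zero-configuration election, sketched here
under my own assumptions, is a lease row in the shared database:
whichever process manages to write the row owns the task until the
lease expires. This is not the `worker` cube API; the `task_lease`
table and the `try_acquire` function are hypothetical, and sqlite3
stands in for the real system database.

```python
import sqlite3
import time


def init_lease_table(conn):
    """One row per task name; the primary key enforces a single owner."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_lease ("
        " task TEXT PRIMARY KEY, owner TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )


def try_acquire(conn, task, owner, ttl, now=None):
    """Return True if `owner` holds the lease on `task` after this call.

    The lease is taken when it is free or expired, and renewed when the
    caller already owns it. The database serialises competing writers.
    """
    now = time.time() if now is None else now
    with conn:  # one transaction: commit on success, rollback on error
        cur = conn.execute(
            "UPDATE task_lease SET owner = ?, expires_at = ? "
            "WHERE task = ? AND (expires_at <= ? OR owner = ?)",
            (owner, now + ttl, task, now, owner),
        )
        if cur.rowcount == 1:
            return True  # took over an expired lease or renewed our own
        try:
            conn.execute(
                "INSERT INTO task_lease (task, owner, expires_at) "
                "VALUES (?, ?, ?)",
                (task, owner, now + ttl),
            )
        except sqlite3.IntegrityError:
            return False  # row exists and is held by another live owner
    return True
```

Each process would call `try_acquire` before running a periodic task
such as session expiration, and only the winner would do the work. If
the winner dies, its lease expires and another process takes over at
the next attempt, which is good enough for low-frequency tasks.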

This task implementation will be suitable for default deployments with
low-frequency task requirements. Using `celery` or the task handling
lib du jour, or direct zmq interprocess communications, is left for
later. I'd like us to start with a very simple design that is known to
work today.

I look forward to your input on all of this.

Regards,
Aurélien.





