[Cubicweb] massive deletion performance
alexandre.fayolle at logilab.fr
Mon Jan 17 18:55:07 CET 2011
This mail is mainly a reminder for myself for tomorrow, when I resume working
on this, but I'll be happy if anyone cares to share their insight on the topic.
I believe I'm not the first one to be hit by the dreadful performance of
massive deletions in CW. The current example I have at hand involves deleting
~2000 entities, which, through composite relations, causes the cascade
deletion of ~45000 entities. The first DELETE statement takes ~300s on my
computer, and the commit (during which the 45k related entities are deleted)
takes ages.
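For context, the cascade comes from relations declared composite in the
schema. A minimal made-up yams example (Project and Ticket are purely
illustrative names, not my actual schema):

    from yams.buildobjs import EntityType, RelationDefinition, String

    class Project(EntityType):
        name = String(required=True)

    class Ticket(EntityType):
        title = String(required=True)

    class concerns(RelationDefinition):
        subject = 'Ticket'
        object = 'Project'
        # 'object' marks the Project as the composite side: deleting a
        # Project cascades to the deletion of all its Tickets
        composite = 'object'

Delete ~2000 such composite entities and the cascade fans out from there.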
I would really like to avoid processing these 45k entities one by one,
because I have the feeling that batching them would improve the situation
considerably, so I've started working on a patch.
The first step is changing the hooks.integrity._DelayedDeleteOp
implementation to give it a chance to process the entities in chunks of
reasonable size.
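Something along these lines is the kind of chunking I have in mind (the
chunks() helper and CHUNK_SIZE are my own sketch, not existing CubicWeb
code):

    CHUNK_SIZE = 500

    def chunks(seq, size=CHUNK_SIZE):
        """Yield successive slices of seq holding at most size items."""
        for start in range(0, len(seq), size):
            yield seq[start:start + size]

    # usage idea in the operation's commit-time callback:
    #   for batch in chunks(sorted(eids_to_delete)):
    #       delete_entities(session, batch)  # batched variant to be written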
Then I hit the implementation of ssplanner.DeleteEntityStep, which calls
glob_delete_entity once for each entity. This is easy to change, but things
get a bit trickier in glob_delete_entity, and here comes the first question:
Q1: will I break anything if I delete all the entities first, and only then
call the after_delete_entity hooks for all of them?
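To make Q1 concrete, here is the reordering I have in mind, as a schematic
sketch only: glob_delete_entities, source and hook_manager are placeholder
names (passed as parameters just to keep the sketch self-contained), not the
real API.

    def glob_delete_entities(session, entities, source, hook_manager):
        # before hooks still run entity per entity, as today
        for entity in entities:
            hook_manager.call_hooks('before_delete_entity', session, entity)
        # then one batched DELETE instead of one statement per entity
        source.delete_entities(session, entities)
        # and only once everything is gone, the after hooks (this is Q1)
        for entity in entities:
            hook_manager.call_hooks('after_delete_entity', session, entity)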
Assuming the answer is no, I will need to change source/native.py to allow
delete_entity (or a variant of that method) to delete more than one entity
at a time.
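The batched statement I have in mind looks like this (delete_entities_sql is
a made-up name; it assumes a plain DB-API cursor, entities of a single type,
and the usual cw_<etype>/cw_eid/entities naming, so take it as a sketch
rather than the patch):

    def delete_entities_sql(cursor, etype, eids):
        # eids come from the repository as integers, so inlining them is
        # safe enough for a sketch; a real patch may prefer placeholders
        eid_list = ','.join(str(eid) for eid in eids)
        cursor.execute('DELETE FROM cw_%s WHERE cw_eid IN (%s)'
                       % (etype, eid_list))
        cursor.execute('DELETE FROM entities WHERE eid IN (%s)' % eid_list)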
Changing the SQL itself is easy, but I'm a bit lost by the
with self._storage_handler block, hence the second question:
Q2: is the _storage_handler context manager only useful for the
undoable_action support? If not, what does this piece of code do in this
context?
Alexandre Fayolle LOGILAB, Paris (France)
Python, CubicWeb and Debian training courses: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
Scientific computing: http://www.logilab.fr/science