[Cubicweb] massive deletion performance

Alexandre Fayolle alexandre.fayolle at logilab.fr
Mon Jan 17 18:55:07 CET 2011


This mail is mainly a reminder to myself for tomorrow, when I resume working 
on this, but I'd be happy if anyone cares to share their insight on the topic. 

I believe I'm not the first one to be hit by the dreadful performance when 
massive deletions are performed in CW. The current example I have at hand 
involves deleting ~2000 entities, which, through composite relations, causes 
the cascading deletion of ~45000 entities. The first DELETE statement takes 
~300s on my computer and the commit (during which the 45k related entities are 
deleted) takes ages. 

I would really like to avoid processing these 45k entities one by one, because 
I have the feeling that handling them in batches would improve the situation, 
so I've started working on a patch. 

The first step is changing the hooks.integrity._DelayedDeleteOp implementation 
to give it a chance to process the entities in chunks of reasonable size. 
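
A minimal sketch of the chunking idea, in plain Python. The helper name 
iter_chunks and the default size of 500 are assumptions made for illustration; 
nothing here is taken from the actual _DelayedDeleteOp code.

```python
from itertools import islice


def iter_chunks(eids, size=500):
    """Yield successive lists of at most `size` items taken from `eids`.

    Hypothetical helper: the name and the default size are illustrative only.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(eids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Example: 5 eids in chunks of 2.
chunks = list(iter_chunks(range(5), size=2))
```

An operation that collects eids during the transaction could then loop over 
iter_chunks(collected_eids) and hand each chunk to a bulk deletion routine, 
instead of issuing one deletion per entity.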

Then I hit the implementation of ssplanner.DeleteEntityStep, which calls 
glob_delete_entity once for each entity. This is easy to change, but things 
get a bit trickier in glob_delete_entity, and here comes the first question:

Q1: will I break anything if I delete all the entities I want to delete 
before calling the after_delete_entity hooks for all the entities?
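
To make the ordering in Q1 concrete, here is a toy sketch of the two-phase 
approach. The callables delete_many and after_delete_hook are hypothetical 
stand-ins, not CubicWeb APIs; the point is only to show what a hook would 
observe when it runs after the whole batch has been removed.

```python
def delete_in_two_phases(eids, delete_many, after_delete_hook):
    """Delete every entity first, then fire the per-entity hooks.

    `delete_many` and `after_delete_hook` are hypothetical callables standing
    in for the storage-level deletion and the hook dispatch.
    """
    eids = list(eids)   # freeze the batch so both phases see the same eids
    delete_many(eids)   # phase 1: one bulk deletion
    for eid in eids:    # phase 2: hooks run once the whole batch is gone
        after_delete_hook(eid)


# Toy usage: a set plays the role of the store, a list records the events.
events = []
store = {1, 2, 3, 4}


def fake_delete_many(eids):
    store.difference_update(eids)
    events.append(("delete", tuple(eids)))


def fake_hook(eid):
    # Records whether the entity is still present when its hook runs.
    events.append(("hook", eid, eid in store))


delete_in_two_phases([1, 2, 3], fake_delete_many, fake_hook)
```

In this ordering, a hook for one entity can no longer see any other entity of 
the same batch, whereas with one-by-one processing the entities not yet 
deleted would still be visible. Whether any existing hook relies on that is 
exactly what Q1 asks.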

Assuming the answer is no, I will need to change source/native.py to allow 
delete_entity (or a variant of that method) to delete more than one entity. 
Changing the SQL is easy, but I'm a bit lost with the 
"with self._storage_handler" bits. 
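
For the SQL side, a bulk deletion could look roughly like the sketch below. It 
uses the sqlite3 module from the standard library as a stand-in for the real 
backend, and the table name demo_entities, the column name eid and the 
function delete_eids are assumptions made for the example, not the code in 
source/native.py.

```python
import sqlite3


def delete_eids(cursor, table, eids):
    """Delete all rows of `table` whose eid is in `eids` with one statement.

    Hypothetical helper. `table` must come from trusted code, since it is
    interpolated into the SQL; the eids themselves are bound as parameters.
    """
    eids = list(eids)
    if not eids:
        return 0  # "IN ()" is not valid SQL, so skip empty batches
    placeholders = ", ".join("?" for _ in eids)
    cursor.execute(
        "DELETE FROM %s WHERE eid IN (%s)" % (table, placeholders), eids)
    return cursor.rowcount


# Toy usage on an in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo_entities (eid INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO demo_entities (eid) VALUES (?)",
                [(i,) for i in range(10)])
deleted = delete_eids(cur, "demo_entities", [2, 3, 5])
conn.commit()
remaining = [row[0] for row in
             cur.execute("SELECT eid FROM demo_entities ORDER BY eid")]
```

Databases often cap the number of bound parameters per statement, which is one 
more reason to feed such a routine chunks of bounded size rather than all 45k 
eids at once.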

Q2: is the _storage_handler context manager only useful for the 
undoable_action support? If not, what does this piece of code do in the 
context of delete_entity?


-- 
Alexandre Fayolle                              LOGILAB, Paris (France)
Python, CubicWeb, Debian training:     http://www.logilab.fr/formations
Custom software development:           http://www.logilab.fr/services
Scientific computing:                  http://www.logilab.fr/science


