[Cubicweb] work on dataimport stores

aurélien campéas aurelien.campeas at gmail.com
Thu Feb 26 16:01:57 CET 2015


I just added something to stop the big leak in fastimport.
I don't think it's a general protection against such problems in the
absolute, though; for cnx.user we just skip the cache updating part.

Given that, and disabling a bunch of extremely costly hooks, it gets
interesting:

Auc:
==========  ========== =========
Store       time.clock time.time
==========  ========== =========
massive          24.64     34.86
fi nohooks       97.05    183.74
sqlgen          161.51    204.00
fi hooks        177.54    292.10
nohook             Nan       Nan
==========  ========== =========
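
For readers comparing the two columns: assuming time.clock reported CPU
time here (as it did on Unix), the gap between time.clock and time.time is
time the Python process spent off the CPU, presumably waiting on the
database. The benchmark's own `timed` decorator is not shown in this
thread, so the version below is only a minimal sketch of how such a pair of
numbers could be collected. It uses time.process_time, the modern
replacement for time.clock, and the `last_cpu` / `last_wall` attributes are
hypothetical names chosen for the illustration.

```python
import functools
import time


def timed(func):
    """Record CPU time and wall-clock time for the latest call of func.

    Hypothetical stand-in for the benchmark's own decorator.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        cpu_start = time.process_time()  # CPU time used by this process
        wall_start = time.time()         # wall-clock time
        try:
            return func(*args, **kwargs)
        finally:
            # Store both measurements even if func raised.
            wrapper.last_cpu = time.process_time() - cpu_start
            wrapper.last_wall = time.time() - wall_start
    wrapper.last_cpu = None
    wrapper.last_wall = None
    return wrapper


@timed
def wait_on_something():
    # Stand-in for an import that mostly waits on an external service.
    time.sleep(0.2)
    return 'done'


result = wait_on_something()
print('%-10s %10.2f %9.2f' % ('sleep',
                              wait_on_something.last_cpu,
                              wait_on_something.last_wall))
```

A call that only sleeps shows almost no CPU time but a full wall-clock
duration, which is the same shape as the difference between the two columns
above.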


That'll be all for today ....

Regards,
Aurélien.


2015-02-26 13:03 GMT+01:00 aurélien campéas <aurelien.campeas at gmail.com>:

> A patch to fix hooks handling for fastimport.
>
> Also trying to run the big test ends with:
>
> LOG:  unexpected EOF on client connection with an open transaction
> Killed
>
>
> 2015-02-26 12:38 GMT+01:00 aurélien campéas <aurelien.campeas at gmail.com>:
>
>>
>>
>> 2015-02-26 9:38 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>>
>>> On 26 February 09:10, aurélien campéas wrote:
>>> > 2015-02-25 23:30 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>>> > > This is probably related to the fact that I've updated the overall
>>> > > benchmark code to use the latest skos cube. There was a severe issue
>>> > > before that: hooks were actually not activated!
>>> > >
>>> >
>>> > Yes indeed. Wasn't it by design? That's what I thought anyway.
>>>
>>> No, that was because of a misunderstanding. If we want to compare the
>>> stores' performance and the pros and cons of each one, we should imo run
>>> the test with hooks activated.
>>>
>>
>> If we want to compare comparable things, we should have at least a run
>> with hooks explicitly disabled. That's what I do in the first attached diff.
>>
>> Note that fastimport may appear faster than it should because it is
>> missing symmetric relation handling (I haven't quantified its impact, but
>> it should not be huge).
>>
>> With hooks activated (followup diff) the run time explodes. It shows
>> how ruthlessly inefficient the cubicweb hook system currently is... but
>> nothing that we didn't know beforehand.
>>
>> Note that fastimport precisely gives tools to fine-tune the runnable hooks
>> (decide which to skip or which to defer in later transactions) to work
>> around this general issue.
>>
>>
>>>
>>> > > > Note that to make the test pass I had to de-inline the pref_label
>>> > > > relation.
>>> > > >
>>> > > > I'm seriously considering adding inlined relation support to
>>> > > > .insert_relations.
>>> > >
>>> > > If not, this deserves a note in the results.txt file. Notice that iirc
>>> > > other stores rely on inlined relations being set through insert_entity
>>> > > or similar.
>>> >
>>> > Maybe but that relation is reported missing in the check after the
>>> > fastimport test, even though it now uses the standard skos import code.
>>>
>>> Though this used to work on my first implementation using the fastimport
>>> cube. It's unclear to me what's changed that may cause that.
>>>
>>
>> Fixed now.
>>
>>
>
-------------- next part --------------
# HG changeset patch
# User Aurelien Campeas <aurelien.campeas at logilab.fr>
# Date 1424961920 -3600
#      Thu Feb 26 15:45:20 2015 +0100
# Node ID abf4d559e41147be6104001c689c51df86c95141
# Parent  d5a5989767f12e53efbcaa24a016eda773285e88
fastimports + hooks for eurovoc

diff --git a/benchmark.py b/benchmark.py
--- a/benchmark.py
+++ b/benchmark.py
@@ -69,13 +69,13 @@ class PostgresImportTC(CubicWebTC):
     #data_file, check_imported = 'siaf_matieres.xml', check_siaf_shortened
 
     @timed
-    def _test_nohook(self):
+    def test_nohook(self):
         with self.admin_access.repo_cnx() as cnx:
             _skos_import(cnx, self.data_file, 'nohookos')
         self.check_imported()
 
     @timed
-    def _test_sqlgen(self):
+    def test_sqlgen(self):
         with self.admin_access.repo_cnx() as cnx:
             _skos_import(cnx, self.data_file, 'sqlgenos')
         self.check_imported()
@@ -87,7 +87,7 @@ class PostgresImportTC(CubicWebTC):
         self.check_imported()
 
     @timed
-    def test_amassive(self):
+    def test_massive(self):
         with self.admin_access.repo_cnx() as cnx:
             _skos_import(cnx, self.data_file, 'massive')
         self.check_imported()
@@ -166,27 +166,34 @@ def _skos_import(cnx, fpath, impl):
             if extentity.extid not in extid2eid:
                 extentity.values['cwuri'] = set([unicode(extentity.extid)])
             return extentity
-        stats = importer.import_entities(imap(set_cwuri_if_needed, entities))
-        if impl in 'sqlgenos':
-            store.flush()
-        elif impl in 'fastimport':
-            from cubes.worker.testutils import run_all_tasks
-            errors = []
-            store.flush()
-            assert not errors
-            cnx.commit()
-            print 'run deferred hooks'
-            store.fc.run_deferred_hooks(errors)
-            cnx.commit()
-            run_all_tasks(cnx)
-            cnx.commit()
-        elif impl == 'massive':
-            store.flush_meta_data()
-            for rdef in iter_rdef(cnx.vreg.schema):
-                store.convert_relations(*rdef)
-            store.commit()
-            store.cleanup()
-        cnx.commit()
+        from cubicweb.server import debugged
+        with debugged('DBG_HOOKS'):
+            with cnx.allow_all_hooks_but('security', 'notification', 'metadata'):
+                stats = importer.import_entities(imap(set_cwuri_if_needed, entities))
+                if impl in 'sqlgenos':
+                    store.flush()
+                elif impl in 'fastimport':
+                    from cubes.worker.testutils import run_all_tasks
+                    errors = []
+                    print 'flush'
+                    store.flush()
+                    assert not errors
+                    cnx.commit()
+                    print 'run deferred hooks'
+                    store.fc.run_deferred_hooks(errors)
+                    print 'commit'
+                    cnx.commit()
+                    print 'run tasks'
+                    run_all_tasks(cnx)
+                    print 'commit'
+                    cnx.commit()
+                elif impl == 'massive':
+                    store.flush_meta_data()
+                    for rdef in iter_rdef(cnx.vreg.schema):
+                        store.convert_relations(*rdef)
+                    store.commit()
+                    store.cleanup()
+                cnx.commit()
     return stats
 
 
diff --git a/results.txt b/results.txt
--- a/results.txt
+++ b/results.txt
@@ -59,16 +59,19 @@ sqlgen          197.90    306.66
 nohook          369.23   4947.68
 ==========  ========== =========
 
-Auc: (all without hooks)
+Auc:
 ==========  ========== =========
 Store       time.clock time.time
 ==========  ========== =========
 massive          24.64     34.86
-fastimport       97.05    183.74
+fi nohooks       97.05    183.74
 sqlgen          161.51    204.00
+fi hooks*       177.54    292.10
 nohook             Nan       Nan
 ==========  ========== =========
 
+* the repo constraints for skos/schema/pref_label & friends were disabled
+
 
 API
 ===
