[Cubicweb] work on dataimport stores
aurélien campéas
aurelien.campeas at gmail.com
Thu Feb 26 16:01:57 CET 2015
I just added something to stop the big leak in fastimport.
I don't think it's a general protection against such problems though,
but for cnx.user we just skip the cache-updating part.
Given that, and disabling a bunch of extremely costly hooks, it gets
interesting:
Auc:
========== ========== =========
Store      time.clock time.time
========== ========== =========
massive    24.64      34.86
fi nohooks 97.05      183.74
sqlgen     161.51     204.00
fi hooks   177.54     292.10
nohook     Nan        Nan
========== ========== =========
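
For readers comparing the two columns: on Unix, Python 2's time.clock reports CPU time used by the current process, while time.time is wall-clock time, so the gap between them is roughly the time spent outside the Python process (for example waiting on the database). Below is a minimal sketch of a timing decorator that records both. It is not the benchmark's actual `timed` helper, whose code is not shown here; `wait_on_io` is a hypothetical workload, and `time.process_time` is used as a stand-in because time.clock was removed in Python 3.8.

```python
import functools
import time


def timed(func):
    """Record CPU time and wall-clock time spent in func (illustrative only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # process_time stands in for Python 2's time.clock on Unix
        cpu_start = time.process_time()
        wall_start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.cpu = time.process_time() - cpu_start
            wrapper.wall = time.time() - wall_start
            print('%s: cpu %.2f wall %.2f'
                  % (func.__name__, wrapper.cpu, wrapper.wall))
    return wrapper


@timed
def wait_on_io():
    # hypothetical workload: sleeping models time spent waiting on I/O,
    # which counts as wall-clock time but not as CPU time
    time.sleep(0.2)
    return 'done'
```

Calling `wait_on_io()` should show a wall-clock figure near 0.2 s and a much smaller CPU figure, which is the same kind of gap seen between the two columns above.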
That'll be all for today ....
Regards,
Aurélien.
2015-02-26 13:03 GMT+01:00 aurélien campéas <aurelien.campeas at gmail.com>:
> A patch to fix hooks handling for fastimport.
>
> Also trying to run the big test ends with:
>
> LOG: unexpected EOF on client connection with an open transaction
> Process killed
>
>
> 2015-02-26 12:38 GMT+01:00 aurélien campéas <aurelien.campeas at gmail.com>:
>
>>
>>
>> 2015-02-26 9:38 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>>
>>> On 26 February 09:10, aurélien campéas wrote:
>>> > 2015-02-25 23:30 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>>> > > This is probably related to the fact that I've updated the overall
>>> > > benchmark code to use the latest skos cube. There was a severe issue
>>> > > before that: hooks were actually not activated!
>>> > >
>>> >
>>> > Yes indeed. Wasn't it by design? That's what I thought anyway.
>>>
>>> No, that was because of a misunderstanding. If we want to compare the
>>> store performance against the pros and cons of each one, we should imo
>>> run the test with hooks activated.
>>>
>>
>> If we want to compare comparable things, we should have at least a run
>> with hooks explicitly disabled. That's what I do in the first attached diff.
>>
>> Note that fastimport may appear faster than it should because it is
>> missing symmetric relation handling (I haven't quantified its impact,
>> but it should not be huge).
>>
>> With hooks activated (followup diff) the run time explodes. It shows
>> how ruthlessly inefficient the cubicweb hook system currently is... but
>> that is nothing we didn't know beforehand.
>>
>> Note that fastimport gives tools precisely to fine-tune the runnable
>> hooks (decide which to skip and which to defer to later transactions)
>> to work around this general issue.
>>
>>
>>>
>>> > > > Note that to make the test pass I had to de-inline the
>>> > > > pref_label relation.
>>> > > >
>>> > > > I'm seriously considering adding inlined relation support to
>>> > > > .insert_relations.
>>> > >
>>> > > If not, this deserves a note in the results.txt file. Note that
>>> > > iirc other stores rely on inlined relations being set through
>>> > > insert_entity or similar.
>>> >
>>> > Maybe but that relation is reported missing in the check after the
>>> > fastimport test, even though it now uses the standard skos import code.
>>>
>>> Though this used to work on my first implementation using the
>>> fastimport cube. It's unclear to me what's changed that may cause that.
>>>
>>
>> Fixed now.
>>
>>
>
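
The skip-or-defer idea discussed in the quoted thread can be illustrated with a small, self-contained sketch. This is not the fastimport API: `Hook`, `HookPolicy`, `fire` and `run_deferred` are hypothetical names invented for the example, and the hook categories are only borrowed from the ones named in the patch below. The point is just the shape of the technique: some hook categories are dropped, some are queued for a later pass, and the rest run immediately.

```python
from collections import namedtuple

# Hypothetical stand-in for a hook: a category name plus a callable.
Hook = namedtuple('Hook', 'category callback')


class HookPolicy(object):
    """Decide, per hook category, whether to run now, skip, or defer."""

    def __init__(self, skip=(), defer=()):
        self.skip = frozenset(skip)
        self.defer = frozenset(defer)
        self.deferred = []  # (hook, entity) pairs kept for a later pass

    def fire(self, hooks, entity):
        for hook in hooks:
            if hook.category in self.skip:
                continue  # never run during the import
            if hook.category in self.defer:
                self.deferred.append((hook, entity))
            else:
                hook.callback(entity)

    def run_deferred(self, errors):
        """Run queued hooks, collecting failures instead of aborting."""
        pending, self.deferred = self.deferred, []
        for hook, entity in pending:
            try:
                hook.callback(entity)
            except Exception as exc:
                errors.append((hook.category, entity, exc))


# Usage: 'notification' is skipped, 'integrity' is deferred,
# 'metadata' runs immediately for each imported entity.
log = []
hooks = [Hook('metadata', lambda e: log.append(('metadata', e))),
         Hook('notification', lambda e: log.append(('notification', e))),
         Hook('integrity', lambda e: log.append(('integrity', e)))]
policy = HookPolicy(skip=['notification'], defer=['integrity'])
for entity in ['concept-1', 'concept-2']:
    policy.fire(hooks, entity)
errors = []
policy.run_deferred(errors)  # would follow a first commit in a real import
```

In a real import the deferred pass would presumably run in its own transaction after the bulk insert has been committed, which is the ordering the patch below follows (flush, commit, run deferred hooks, commit).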
-------------- next part --------------
# HG changeset patch
# User Aurelien Campeas <aurelien.campeas at logilab.fr>
# Date 1424961920 -3600
# Thu Feb 26 15:45:20 2015 +0100
# Node ID abf4d559e41147be6104001c689c51df86c95141
# Parent d5a5989767f12e53efbcaa24a016eda773285e88
fastimports + hooks for eurovoc
diff --git a/benchmark.py b/benchmark.py
--- a/benchmark.py
+++ b/benchmark.py
@@ -69,13 +69,13 @@ class PostgresImportTC(CubicWebTC):
#data_file, check_imported = 'siaf_matieres.xml', check_siaf_shortened
@timed
- def _test_nohook(self):
+ def test_nohook(self):
with self.admin_access.repo_cnx() as cnx:
_skos_import(cnx, self.data_file, 'nohookos')
self.check_imported()
@timed
- def _test_sqlgen(self):
+ def test_sqlgen(self):
with self.admin_access.repo_cnx() as cnx:
_skos_import(cnx, self.data_file, 'sqlgenos')
self.check_imported()
@@ -87,7 +87,7 @@ class PostgresImportTC(CubicWebTC):
self.check_imported()
@timed
- def test_amassive(self):
+ def test_massive(self):
with self.admin_access.repo_cnx() as cnx:
_skos_import(cnx, self.data_file, 'massive')
self.check_imported()
@@ -166,27 +166,34 @@ def _skos_import(cnx, fpath, impl):
if extentity.extid not in extid2eid:
extentity.values['cwuri'] = set([unicode(extentity.extid)])
return extentity
- stats = importer.import_entities(imap(set_cwuri_if_needed, entities))
- if impl in 'sqlgenos':
- store.flush()
- elif impl in 'fastimport':
- from cubes.worker.testutils import run_all_tasks
- errors = []
- store.flush()
- assert not errors
- cnx.commit()
- print 'run deferred hooks'
- store.fc.run_deferred_hooks(errors)
- cnx.commit()
- run_all_tasks(cnx)
- cnx.commit()
- elif impl == 'massive':
- store.flush_meta_data()
- for rdef in iter_rdef(cnx.vreg.schema):
- store.convert_relations(*rdef)
- store.commit()
- store.cleanup()
- cnx.commit()
+ from cubicweb.server import debugged
+ with debugged('DBG_HOOKS'):
+ with cnx.allow_all_hooks_but('security', 'notification', 'metadata'):
+ stats = importer.import_entities(imap(set_cwuri_if_needed, entities))
+ if impl in 'sqlgenos':
+ store.flush()
+ elif impl in 'fastimport':
+ from cubes.worker.testutils import run_all_tasks
+ errors = []
+ print 'flush'
+ store.flush()
+ assert not errors
+ cnx.commit()
+ print 'run deferred hooks'
+ store.fc.run_deferred_hooks(errors)
+ print 'commit'
+ cnx.commit()
+ print 'run tasks'
+ run_all_tasks(cnx)
+ print 'commit'
+ cnx.commit()
+ elif impl == 'massive':
+ store.flush_meta_data()
+ for rdef in iter_rdef(cnx.vreg.schema):
+ store.convert_relations(*rdef)
+ store.commit()
+ store.cleanup()
+ cnx.commit()
return stats
diff --git a/results.txt b/results.txt
--- a/results.txt
+++ b/results.txt
@@ -59,16 +59,19 @@ sqlgen 197.90 306.66
 nohook     369.23     4947.68
 ========== ========== =========
-Auc: (all without hooks)
+Auc:
 ========== ========== =========
 Store      time.clock time.time
 ========== ========== =========
 massive    24.64      34.86
-fastimport 97.05      183.74
+fi nohooks 97.05      183.74
 sqlgen     161.51     204.00
+fi hooks*  177.54     292.10
 nohook     Nan        Nan
 ========== ========== =========
+* the repo constraints for skos/schema/pref_label & friends were disabled
+
 API
 ===