[Cubicweb] work on dataimport stores
aurélien campéas
aurelien.campeas at gmail.com
Thu Feb 26 13:03:49 CET 2015
A patch to fix hooks handling for fastimport.
Also trying to run the big test ends with:
LOG: unexpected EOF on client connection with an open transaction
Process killed
2015-02-26 12:38 GMT+01:00 aurélien campéas <aurelien.campeas at gmail.com>:
>
>
> 2015-02-26 9:38 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>
>> On 26 February 09:10, aurélien campéas wrote:
>> > 2015-02-25 23:30 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>> > > This is probably related to the fact that I've updated the overall
>> > > benchmark code to use the latest skos cube. There was a severe issue
>> > > before that: hooks were actually not activated!
>> > >
>> >
>> > Yes indeed. Wasn't it by design? That's what I thought anyway.
>>
>> No, that was because of a misunderstanding. If we want to compare store
>> performance and the pros and cons of each one, we should imo run the test
>> with hooks activated.
>>
>
> If we want to compare comparable things, we should have at least one run
> with hooks explicitly disabled. That's what I do in the first attached diff.
>
> Note that fastimport may appear faster than it should because it is missing
> symmetric relation handling (I haven't quantified its impact, but it should
> not be huge).
>
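
To make the symmetric-relation caveat concrete, here is a rough sketch. It
assumes a store that materializes a symmetric relation as one row per
direction; `expand_symmetric` and the sample eids are hypothetical and are
not part of fastimport or cubicweb.

```python
def expand_symmetric(pairs):
    """Return the (subject, object) rows to insert for a symmetric relation.

    Assumption: each direction is stored as its own row, so every pair
    also needs its mirror. Duplicates are emitted only once.
    """
    rows = set()
    for subj, obj in pairs:
        rows.add((subj, obj))
        rows.add((obj, subj))
    return sorted(rows)


# Hypothetical eids for three concepts linked by a symmetric relation.
related = [(1, 2), (2, 3), (2, 1)]
rows = expand_symmetric(related)
```

Under that assumption a store that skips this step writes up to half as many
rows for such relations, which is where the missing cost would come from.
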
> With hooks activated (follow-up diff) the run time explodes. It shows
> how ruthlessly inefficient the cubicweb hook system currently is... but
> nothing that we didn't know beforehand.
>
> Note that fastimport precisely gives tools to fine-tune the runnable hooks
> (decide which to skip or which to defer to later transactions) to work
> around this general issue.
>
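
The skip/defer idea can be illustrated with a toy dispatcher. This is only a
sketch of the general technique: `HookRunner`, the hook names and the
entities are all made up for the example and do not reflect the fastimport
API.

```python
class HookRunner(object):
    """Toy dispatcher: run, skip or defer hooks by name (names hypothetical)."""

    def __init__(self, hooks, skip=(), defer=()):
        self.hooks = hooks          # name -> callable(entity)
        self.skip = set(skip)       # never run during the import
        self.defer = set(defer)     # queued, run after the bulk insert
        self.deferred = []          # (name, entity) pairs waiting to run

    def fire(self, entity):
        for name in sorted(self.hooks):
            if name in self.skip:
                continue
            if name in self.defer:
                self.deferred.append((name, entity))
            else:
                self.hooks[name](entity)

    def run_deferred(self):
        """Run queued hooks, e.g. in a later transaction; return how many ran."""
        pending, self.deferred = self.deferred, []
        for name, entity in pending:
            self.hooks[name](entity)
        return len(pending)


calls = []
hooks = {
    'metadata': lambda e: calls.append(('metadata', e)),
    'fulltext': lambda e: calls.append(('fulltext', e)),
    'notify': lambda e: calls.append(('notify', e)),
}
runner = HookRunner(hooks, skip=['notify'], defer=['fulltext'])
for entity in ('c1', 'c2'):
    runner.fire(entity)
immediate = list(calls)            # only 'metadata' ran during the import
ran_later = runner.run_deferred()  # 'fulltext' runs once the import is done
```

The point of the split is that the expensive hooks run once over the whole
batch after the bulk insert is committed, instead of once per entity inside
the import transaction.
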
>
>>
>> > > > Note that to make the test pass I had to de-inline the pref_label
>> > > > relation.
>> > > >
>> > > > I'm seriously considering adding inlined relation support to
>> > > > .insert_relations.
>> > >
>> > > If not, this deserves a note in the results.txt file. Notice that iirc
>> > > other stores rely on inlined relations being set through insert_entity
>> > > or similar.
>> >
>> > Maybe, but that relation is reported missing in the check after the
>> > fastimport test, even though it now uses the standard skos import code.
>>
>> Though this used to work in my first implementation using the fastimport
>> cube. It's unclear to me what has changed that may cause that.
>>
>
> Fixed now.
>
>
-------------- next part --------------
diff --git a/benchmark.py b/benchmark.py
--- a/benchmark.py
+++ b/benchmark.py
@@ -65,8 +65,8 @@ class PostgresImportTC(CubicWebTC):
self.assertEqual(label.label, u'communications')
self.failIf(cnx.execute('Any L WHERE NOT EXISTS(L pref_label_of X) AND NOT EXISTS(L alt_label_of Y) AND NOT EXISTS(L hidden_label_of Z)'))
- #data_file, check_imported = 'eurovoc_skos.rdf', lambda *args: None
- data_file, check_imported = 'siaf_matieres.xml', check_siaf_shortened
+ data_file, check_imported = 'eurovoc_skos.rdf', lambda *args: None
+ #data_file, check_imported = 'siaf_matieres.xml', check_siaf_shortened
@timed
def _test_nohook(self):
@@ -75,7 +75,7 @@ class PostgresImportTC(CubicWebTC):
self.check_imported()
@timed
- def test_sqlgen(self):
+ def _test_sqlgen(self):
with self.admin_access.repo_cnx() as cnx:
_skos_import(cnx, self.data_file, 'sqlgenos')
self.check_imported()
@@ -87,7 +87,7 @@ class PostgresImportTC(CubicWebTC):
self.check_imported()
@timed
- def test_massive(self):
+ def test_amassive(self):
with self.admin_access.repo_cnx() as cnx:
_skos_import(cnx, self.data_file, 'massive')
self.check_imported()
@@ -175,7 +175,9 @@ def _skos_import(cnx, fpath, impl):
store.flush()
assert not errors
cnx.commit()
+ print 'run deferred hooks'
store.fc.run_deferred_hooks(errors)
+ cnx.commit()
run_all_tasks(cnx)
cnx.commit()
elif impl == 'massive':
@@ -212,6 +214,7 @@ class FastExtEntitiesImporter(ExtEntitie
def _import_entities(self, *args):
values = super(FastExtEntitiesImporter, self)._import_entities(*args)
+ print 'flush'
self.store.flush()
return values
diff --git a/results.txt b/results.txt
--- a/results.txt
+++ b/results.txt
@@ -38,7 +38,7 @@ Store time.clock time.time
========== ========== =========
massive 0.82 1.03
fi nohooks 1.74 2.85
-fi hooks 96.59 102.50
+fi hooks 34.35 37.60
sqlgen 2.33 2.95
nohook 4.06 6.78
========== ========== =========
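
For readers wondering how the two columns relate: the first is CPU time and
the second wall-clock time. The benchmark's own `timed` decorator is not
shown in this diff, so the following is only a guess at its general shape.
`time.clock` was removed in Python 3.8, so the sketch falls back on
`time.process_time` where available; `busy` is a hypothetical workload.

```python
import functools
import time

# time.clock no longer exists on Python >= 3.8; process_time is the closest
# stand-in for CPU time there.
cpu_clock = getattr(time, 'process_time', None) or getattr(time, 'clock')


def timed(func):
    """Record (name, cpu seconds, wall seconds) for each call of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        cpu0, wall0 = cpu_clock(), time.time()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.timings.append(
                (func.__name__, cpu_clock() - cpu0, time.time() - wall0))
    wrapper.timings = []
    return wrapper


@timed
def busy(n):
    return sum(i * i for i in range(n))


result = busy(1000)
name, cpu, wall = busy.timings[-1]
```

A large gap between the two figures usually means time spent outside the
Python process, for instance waiting on the database server.
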