[Cubicweb] work on dataimport stores

aurélien campéas aurelien.campeas at gmail.com
Thu Feb 26 13:03:49 CET 2015


A patch to fix hooks handling for fastimport.

Also, trying to run the big test ends with:

LOG:  unexpected EOF on client connection with an open transaction
Process stopped


2015-02-26 12:38 GMT+01:00 aurélien campéas <aurelien.campeas at gmail.com>:

>
>
> 2015-02-26 9:38 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>
>> On 26 February 09:10, aurélien campéas wrote:
>> > 2015-02-25 23:30 GMT+01:00 Sylvain Thénault <sylvain.thenault at logilab.fr>:
>> > > This is probably related to the fact that I've updated the overall
>> > > benchmark code to use the latest skos cube. There was a severe issue
>> > > before that: hooks were actually not activated!
>> > >
>> >
>> > Yes indeed. Wasn't it by design? That's what I thought anyway.
>>
>> No, that was because of a misunderstanding. If we want to weigh each
>> store's performance against its pros and cons, we should imo run the test
>> with hooks activated.
>>
>
> If we want to compare comparable things, we should have at least one run
> with hooks explicitly disabled. That's what I do in the first attached diff.
>
> Note that fastimport may appear faster than it should because it is missing
> symmetric relation handling (I haven't quantified its impact, but it should
> not be huge).
>
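
To make the symmetric relation point concrete, here is a minimal sketch of
the extra work involved. It assumes that a symmetric relation is stored as
both (subject, object) and (object, subject) rows; the function name and the
tuple layout are hypothetical and are not taken from fastimport or cubicweb.

```python
def with_symmetric_pairs(pairs):
    """Return the given (subject, object) pairs plus their mirrored
    counterparts, preserving order and dropping duplicates."""
    seen = set()
    result = []
    for subj, obj in pairs:
        # Emit the pair itself, then its mirror image.
        for pair in ((subj, obj), (obj, subj)):
            if pair not in seen:
                seen.add(pair)
                result.append(pair)
    return result


# Three input rows, one of which is already the mirror of another.
rows = with_symmetric_pairs([(1, 2), (2, 3), (2, 1)])
```

A store that skips this step writes at most half as many rows for such a
relation, which is why its timings may look slightly better than they should.
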
> With hooks activated (followup diff) the run time explodes. It shows
> how ruthlessly inefficient the cubicweb hook system currently is... but
> nothing that we didn't know beforehand.
>
> Note that fastimport provides tools precisely to fine-tune which hooks run
> (decide which to skip and which to defer to later transactions) in order to
> work around this general issue.
>
>
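
The skip/defer idea can be sketched as follows. This is only an illustration
of the principle under my own assumptions: the class, the policy dictionary
and the hook names are all hypothetical, and this is not the fastimport API.

```python
RUN, SKIP, DEFER = 'run', 'skip', 'defer'


class HookRunner(object):
    """Dispatch hooks according to a per-hook policy (hypothetical sketch)."""

    def __init__(self, policy, default=RUN):
        self.policy = policy      # maps hook name -> RUN / SKIP / DEFER
        self.default = default    # action for hooks missing from the policy
        self.deferred = []        # (hook, args) pairs waiting to be run

    def fire(self, name, hook, *args):
        action = self.policy.get(name, self.default)
        if action == RUN:
            hook(*args)
        elif action == DEFER:
            self.deferred.append((hook, args))
        # SKIP: the hook is deliberately not called.

    def run_deferred(self):
        """Run the postponed hooks, e.g. in a later transaction."""
        pending, self.deferred = self.deferred, []
        for hook, args in pending:
            hook(*args)
        return len(pending)


calls = []
runner = HookRunner({'metadata': SKIP, 'fulltext': DEFER})
for name in ('integrity', 'metadata', 'fulltext'):
    runner.fire(name, calls.append, name)
before_deferred = list(calls)   # only 'integrity' has run so far
runner.run_deferred()           # 'fulltext' runs now; 'metadata' never does
```

The bulk insert then pays only for the hooks that must run inline, and the
deferred ones can be committed separately afterwards.
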
>>
>> > > > Note that to make the test pass I had to de-inline the pref_label
>> > > > relation.
>> > > >
>> > > > I'm seriously considering adding inlined relation support to
>> > > > .insert_relations.
>> > >
>> > > If not, this deserves a note in the results.txt file. Notice that iirc
>> > > other stores rely on inlined relations being set through insert_entity
>> > > or similar.
>> >
>> > Maybe, but that relation is reported missing in the check after the
>> > fastimport test, even though it now uses the standard skos import code.
>>
>> Though this used to work in my first implementation using the fastimport
>> cube. It's unclear to me what's changed that may cause that.
>>
>
> Fixed now.
>
>
-------------- next part --------------
diff --git a/benchmark.py b/benchmark.py
--- a/benchmark.py
+++ b/benchmark.py
@@ -65,8 +65,8 @@ class PostgresImportTC(CubicWebTC):
             self.assertEqual(label.label, u'communications')
             self.failIf(cnx.execute('Any L WHERE NOT EXISTS(L pref_label_of X) AND NOT EXISTS(L alt_label_of Y) AND NOT EXISTS(L hidden_label_of Z)'))
 
-    #data_file, check_imported = 'eurovoc_skos.rdf', lambda *args: None
-    data_file, check_imported = 'siaf_matieres.xml', check_siaf_shortened
+    data_file, check_imported = 'eurovoc_skos.rdf', lambda *args: None
+    #data_file, check_imported = 'siaf_matieres.xml', check_siaf_shortened
 
     @timed
     def _test_nohook(self):
@@ -75,7 +75,7 @@ class PostgresImportTC(CubicWebTC):
         self.check_imported()
 
     @timed
-    def test_sqlgen(self):
+    def _test_sqlgen(self):
         with self.admin_access.repo_cnx() as cnx:
             _skos_import(cnx, self.data_file, 'sqlgenos')
         self.check_imported()
@@ -87,7 +87,7 @@ class PostgresImportTC(CubicWebTC):
         self.check_imported()
 
     @timed
-    def test_massive(self):
+    def test_amassive(self):
         with self.admin_access.repo_cnx() as cnx:
             _skos_import(cnx, self.data_file, 'massive')
         self.check_imported()
@@ -175,7 +175,9 @@ def _skos_import(cnx, fpath, impl):
             store.flush()
             assert not errors
             cnx.commit()
+            print 'run deferred hooks'
             store.fc.run_deferred_hooks(errors)
+            cnx.commit()
             run_all_tasks(cnx)
             cnx.commit()
         elif impl == 'massive':
@@ -212,6 +214,7 @@ class FastExtEntitiesImporter(ExtEntitie
 
     def _import_entities(self, *args):
         values = super(FastExtEntitiesImporter, self)._import_entities(*args)
+        print 'flush'
         self.store.flush()
         return values
 
diff --git a/results.txt b/results.txt
--- a/results.txt
+++ b/results.txt
@@ -38,7 +38,7 @@ Store      time.clock time.time
 ========== ========== =========
 massive          0.82      1.03
 fi nohooks       1.74      2.85
-fi hooks        96.59    102.50
+fi hooks        34.35     37.60
 sqlgen           2.33      2.95
 nohook           4.06      6.78
 ========== ========== =========
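
For readers wondering how the two columns of the table can be produced, here
is a minimal sketch of a timing decorator. The `timed` decorator actually
used in benchmark.py is not shown in this diff, so this is a guess at its
shape rather than its code. It also assumes a recent Python 3, where
time.clock no longer exists (it was removed in Python 3.8) and
time.process_time is used for the CPU figure instead.

```python
import functools
import time


def timed(func):
    """Record CPU time and wall-clock time of the last call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        cpu_start = time.process_time()   # CPU time of this process only
        wall_start = time.time()          # elapsed real time
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.cpu = time.process_time() - cpu_start
            wrapper.wall = time.time() - wall_start
    return wrapper


@timed
def busy(n):
    return sum(i * i for i in range(n))


result = busy(1000)
# busy.cpu and busy.wall now hold the two measurements for that call.
```

Since the CPU figure only covers the client process, time spent inside the
PostgreSQL server shows up in the wall-clock column only, which would be
consistent with the second column being larger in every row.
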

