[Cubicweb] moving MassiveObjectStore from dataio to cubicweb

Julien Cristau julien.cristau at logilab.fr
Wed Nov 4 10:54:17 CET 2015


There's been some work recently towards moving the MassiveObjectStore
from cubicweb-dataio
(http://hg.logilab.org/master/cubes/dataio/file/tip/dataimport.py) to
cubicweb itself (see https://www.cubicweb.org/ticket/5414760).

I have a few questions about the API before including that work, for
people who have used dataio so far.  My main issue is the number of
parameters for MassiveObjectStore.__init__:

- is drop_index ever set to False?  (it seems like not dropping
  constraints can't possibly work, at least as of cubicweb 3.21)
- same question about the autoflush_metadata / commit_at_flush options
- is there any point in replace_sep being part of the API?
- can't the pg_schema parameter be removed, by looking at
- I feel like eids_seq_range / eids_seq_start / iid_maxsize, if they're
  needed, could be class attributes rather than __init__ parameters
- uri_param_name and its default 'rdf:about' value seem out of place in
  a generic API; couldn't this easily be handled in the caller, or in
  an application-specific subclass or importer?

If all of those can be removed, that would leave cnx, the optional commit
and rollback callbacks, the slave_mode boolean and an optional source as
parameters, which feels slightly more reasonable.
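
To make that concrete, here is a rough sketch of what the slimmed-down
constructor could look like.  It is not the actual dataio code: the class
names, the callback parameter names and every default value below are
placeholders chosen for illustration, and the store does nothing beyond
recording its arguments.

```python
class MassiveObjectStoreSketch(object):
    """Hypothetical slimmed-down store, NOT the real dataio class."""

    # Tunables moved from __init__ parameters to class attributes.
    # The values are placeholders, not the defaults used by dataio.
    eids_seq_range = 10000
    eids_seq_start = None
    iid_maxsize = 1024

    def __init__(self, cnx, on_commit_callback=None,
                 on_rollback_callback=None, slave_mode=False, source=None):
        # Only the five remaining parameters are kept on the instance.
        self._cnx = cnx
        self.on_commit_callback = on_commit_callback
        self.on_rollback_callback = on_rollback_callback
        self.slave_mode = slave_mode
        self.source = source


class RdfStoreSketch(MassiveObjectStoreSketch):
    """Application-specific subclass holding the RDF-only setting."""

    uri_param_name = 'rdf:about'
    # A subclass can override a tunable without touching __init__.
    iid_maxsize = 2048


# object() stands in for a real cubicweb connection here.
store = RdfStoreSketch(object(), slave_mode=True)
```

An application that needs different sequence settings would subclass and
override the attribute, so the common case keeps a short call signature.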

My other gripes so far are with:

- "except ProgrammingError: pass" kind of error handling
- silent data mangling in the apply_size_constraints method
- lack of tests and docs for relate_by_iid.  Although generally the
  business with "iid" seems out of place, from what I understand of the
  intended importer vs store split, turning external identifiers into
  cubicweb eids seems like the importer's job?
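
For the first two points, the kind of change I have in mind is sketched
below.  Everything here is an assumption made for illustration: sqlite3
stands in for the PostgreSQL driver (so the exception caught is
sqlite3.OperationalError rather than ProgrammingError), the function names
are invented, and the size check assumes the mangling in question is
truncation of over-long strings.

```python
import logging
import sqlite3

logger = logging.getLogger('massive_store_sketch')


def drop_index(cnx, name):
    """Drop an index, tolerating only the "already gone" case.

    Returns True if the index was dropped, False if it did not exist.
    Any other database error is re-raised instead of being swallowed.
    """
    try:
        # Identifiers cannot be bound as query parameters, so `name`
        # must come from trusted code, never from imported data.
        cnx.execute('DROP INDEX %s' % name)
    except sqlite3.OperationalError as exc:
        if 'no such index' not in str(exc):
            raise
        logger.info('index %s does not exist, nothing to drop', name)
        return False
    return True


def check_size_constraints(entity, maxsizes):
    """Raise instead of silently altering over-long string values.

    `entity` maps attribute names to values and `maxsizes` maps
    attribute names to their maximum length.
    """
    for attr, maxsize in maxsizes.items():
        value = entity.get(attr)
        if isinstance(value, str) and len(value) > maxsize:
            raise ValueError('%s is %d characters long, limit is %d'
                             % (attr, len(value), maxsize))
    return entity
```

Whether an over-long value should be an error or a logged truncation is
debatable; the point is only that the caller gets to find out.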

I suppose those can be fixed up later.  Maybe.

Julien Cristau          <julien.cristau at logilab.fr>
Logilab		        http://www.logilab.fr/
Informatique scientifique & gestion de connaissances
