[Cubicweb] Multisource in CW

Sylvain Thénault sylvain.thenault at logilab.fr
Wed May 23 10:24:32 CEST 2012

On 21 May 17:56, Vincent Michel wrote:
> ------------------
> In my instance I have an entity class "Article", with a relation
> "contains_reference" that stores URIs pointing to external databases (e.g.
> http://dbpedia.org/foobar).
> For this, I have modified the entities table by dropping the NOT NULL
> constraints on "source" and "asource", and I have added an "exturi
> VARCHAR(256)" column. As I don't want to store external references in a
> specific entity type with its own table, I have introduced a new base entity
> type "Thing".
> The classical API is still available:
>                         relate(1234, "contains_reference", 4567)
> but now, one can execute:
>                         relate(1234, "contains_reference", "http://dbpedia.org/foobar")
I'm not aware of this 'relate' API, which is a bit surprising :p
> This will create (if it does not already exist) an entry in the "entities"
> table, with the following data:
> EID      8910
> type     Thing
> source   NULL
> asource  NULL
> mtime    XX:YY:ZZ
> extid    NULL
> exturi   http://dbpedia.org/foobar
> and it will insert the following row into the
> "contains_reference_relation" table:
>                         1234        8910
> This behavior relies on very slight code modifications in the "related()"
> and "add_info()" functions. CubicWeb's relation management stays
> unchanged.
> The main idea here is to keep as little information as possible in the
> database; the reference is a URI, which is universal and does not rely on a
> specific eid in a distant instance.
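A minimal sketch of the behavior described in the quoted message, assuming an SQLite stand-in for the modified `entities` table and a hypothetical `relate(cnx, subject_eid, rtype, target)` helper (the thread shows neither the real signature nor the changes to `related()`/`add_info()`; the `eid_from`/`eid_to` column names are also an assumption). Integer targets are used as eids directly; a URI target is looked up by `exturi`, and a `Thing` row is created only if none exists yet.

```python
import sqlite3
import time

# Hypothetical, simplified version of the modified "entities" table: "source"
# and "asource" are nullable and an "exturi" column has been added.
SCHEMA = """
CREATE TABLE entities (
    eid INTEGER PRIMARY KEY,
    type VARCHAR(64) NOT NULL,
    source VARCHAR(128),
    asource VARCHAR(128),
    mtime TIMESTAMP NOT NULL,
    extid VARCHAR(256),
    exturi VARCHAR(256) UNIQUE
);
CREATE TABLE contains_reference_relation (
    eid_from INTEGER NOT NULL,
    eid_to INTEGER NOT NULL
);
"""


def eid_for(cnx, target):
    """Return an eid for `target`: ints are eids, strings are external URIs."""
    if isinstance(target, int):
        return target
    row = cnx.execute(
        "SELECT eid FROM entities WHERE exturi = ?", (target,)
    ).fetchone()
    if row is not None:
        return row[0]  # reuse the existing Thing entity for this URI
    # No entry yet: create a Thing with NULL source/asource/extid.
    cur = cnx.execute(
        "INSERT INTO entities (type, source, asource, mtime, extid, exturi) "
        "VALUES ('Thing', NULL, NULL, ?, NULL, ?)",
        (time.strftime("%Y-%m-%d %H:%M:%S"), target),
    )
    return cur.lastrowid


def relate(cnx, subject_eid, rtype, target):
    """Hypothetical relate(): add a (subject, object) row to <rtype>_relation.

    `rtype` is interpolated into SQL, so it must be a trusted schema name.
    """
    object_eid = eid_for(cnx, target)
    cnx.execute(
        f"INSERT INTO {rtype}_relation (eid_from, eid_to) VALUES (?, ?)",
        (subject_eid, object_eid),
    )
    return object_eid
```

Relating two articles to the same URI yields the same `Thing` eid, so the database holds a single row per external reference.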

> But now, we can join with distant databases, using the following API:
> ' |<appid>-<variable used for join> <DISTANT QUERY>'
> For example:
> rset = rql('Any X, L, D WHERE X contains_reference Y|dbpedia-Y Y label L, Y 
> depiction D')
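The thread does not say whether this syntax was implemented, so the following is only an illustration of how a query in the proposed form could be split into its local part, the distant application id, the join variable and the distant restrictions. `split_query` and `SplitQuery` are hypothetical names; the sketch assumes the app id contains no hyphen and that the join variable is an uppercase RQL variable that also appears in the local part.

```python
import re
from typing import NamedTuple, Optional


class SplitQuery(NamedTuple):
    local: str                # RQL to run on the local instance
    appid: Optional[str]      # distant application id (hypothetical notion)
    join_var: Optional[str]   # variable shared by the local and distant parts
    distant: Optional[str]    # restrictions to send to the distant application


# "<appid>-<join variable> <distant query>"; appid assumed hyphen-free.
_DISTANT = re.compile(r"^([\w.]+)-([A-Z]\w*)\s+(.+)$", re.S)


def split_query(rql: str) -> SplitQuery:
    """Split a query written in the proposed '|<appid>-<var> ...' form."""
    local, sep, rest = rql.partition("|")
    if not sep:
        # Plain RQL: nothing to delegate to a distant application.
        return SplitQuery(rql.strip(), None, None, None)
    match = _DISTANT.match(rest.strip())
    if match is None:
        raise ValueError(f"malformed distant part: {rest!r}")
    appid, join_var, distant = match.groups()
    if not re.search(rf"\b{join_var}\b", local):
        raise ValueError(f"join variable {join_var} not used in local part")
    return SplitQuery(local.strip(), appid, join_var, distant.strip())
```

For the example above, the local part is `Any X, L, D WHERE X contains_reference Y`, the app id is `dbpedia`, the join variable is `Y`, and `Y label L, Y depiction D` is what would be evaluated remotely.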

Have you actually implemented this?
Are 'label' and 'depiction' defined in the schema of 'Thing'?
> Information from DBpedia, GeoNames, etc. can now be shared across
> instances, and the queries remain valid even if the internal eids of these
> databases change.
I'm not sure I get your point here.
Here are my points:

* the source abstraction was introduced so that applications can be coded
  independently of their data sources. IMO this is valuable and should be kept
  in mind, even if we may need a specific API/RQL syntax to allow
  application-specific optimizations

* I'm not sure we need all that specific machinery instead of reusing the
  existing one:

  - provided you have a source (e.g. a GeoNames source) which is able to
    fetch attributes from a URL,

  - the extid column is the same thing as your exturi (except that it is
    base64-encoded, which should not be a problem),

  - no data is stored in entity type tables,

  then everything should work transparently.

    Any X,L,D WHERE X contains_reference Y, Y label L, Y depiction D

  would work, provided we have a new query planner that first executes
  "X contains_reference Y", then calls the source to fetch L and D.

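A hedged sketch of that two-phase evaluation: phase one (the local `X contains_reference Y`) is assumed to have already run, and phase two decodes each `Y`'s base64 `extid` back into a URI and asks a source for the remaining attributes, once per distinct `Y`. `two_phase_query`, `encode_extid`/`decode_extid` and the `fetch_attrs` callable are hypothetical stand-ins, not CubicWeb's planner or source API; UTF-8 is assumed for the URI encoding.

```python
import base64
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for a source (e.g. GeoNames or DBpedia) that can fetch
# attribute values given an entity URI and a tuple of attribute names.
FetchAttrs = Callable[[str, Tuple[str, ...]], Dict[str, object]]


def encode_extid(uri: str) -> str:
    """Encode a URI the way it would be stored in the extid column (assumed)."""
    return base64.b64encode(uri.encode("utf-8")).decode("ascii")


def decode_extid(extid: str) -> str:
    """Recover the URI from a base64-encoded extid."""
    return base64.b64decode(extid).decode("utf-8")


def two_phase_query(
    local_rows: Iterable[Tuple[int, int]],  # (X, Y) from "X contains_reference Y"
    extids: Dict[int, str],                 # Y eid -> extid from the entities table
    fetch_attrs: FetchAttrs,
    attrs: Tuple[str, ...] = ("label", "depiction"),
) -> List[tuple]:
    """Sketch of 'Any X,L,D WHERE X contains_reference Y, Y label L, Y depiction D'.

    Phase 1 (local) is assumed done: `local_rows` holds its result.
    Phase 2 asks the source for the other attributes, caching per Y.
    """
    cache: Dict[int, Dict[str, object]] = {}
    rows = []
    for x, y in local_rows:
        if y not in cache:
            cache[y] = fetch_attrs(decode_extid(extids[y]), attrs)
        values = cache[y]
        # Missing attributes come back as None rather than failing the row.
        rows.append((x,) + tuple(values.get(a) for a in attrs))
    return rows
```

Caching per `Y` keeps the number of remote calls proportional to the number of distinct external references rather than to the number of result rows.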
Sylvain Thénault, LOGILAB, Paris - Toulouse
Python, Debian, Agile methods training:  http://www.logilab.fr/formations
Custom software development:             http://www.logilab.fr/services
CubicWeb, the semantic web framework:    http://www.cubicweb.org
