[Cubicweb] Multisource in CW

Nicolas Chauvat nicolas.chauvat at logilab.fr
Thu May 24 12:09:59 CEST 2012

Hi All,

On Thu, May 24, 2012 at 09:26:54AM +0200, Vincent Michel wrote:
> - we do not include the schema at all, and let the user deal with the remote
> schema within the RQL request. I think that this is an interesting option if
> we consider that this multisource is dedicated to quick, on-the-fly joins
> to remote instances (with schemas that may change...), and that we do not
> want to migrate the local instance.

The key point is that in order to use CW's "base ui framework", which
generates a large part of the UI, we need a datamodel/schema. Quick
joins and on-the-fly queries without prior knowledge of the schema
mean that you have to write a lot of code by hand in the view that
fetches this external data.

I am not saying I have the answer that strikes the right balance here,
I am just trying to have a clear understanding of the problem.
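
To make the trade-off concrete, here is a rough sketch in plain Python.
It is not CubicWeb's API: SCHEMA, render_with_schema and
render_remote_city are names made up for this illustration. With a
schema, one generic function can display any entity type; without one,
each view has to hard-code what the remote rows look like.

```python
# Hypothetical schema description: entity type -> attribute names.
# A toy stand-in for illustration, not CubicWeb's schema API.
SCHEMA = {"City": ["name", "population"]}


def render_with_schema(etype, entity, schema=SCHEMA):
    """Generic view: the schema says which attributes to display."""
    return [f"{attr}: {entity.get(attr, '')}" for attr in schema[etype]]


def render_remote_city(row):
    """Hand-written view for schema-less data: column order is hard-coded."""
    name, population = row[0], row[1]
    return [f"name: {name}", f"population: {population}"]
```

The second function has to be rewritten for every remote query shape,
which is the "code a lot by hand" cost mentioned above.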

> - we include the schema of the remote instance WITHOUT creating tables.
> Indeed, storing the remote entities in the local database may be interesting
> for a few thousand entities, but with DBpedia or Geonames, one may pollute
> the local instance with hundreds of thousands of entities.
> ...
> > Could it be interesting to allow any entity to be related to a Thing
> > (defined by a URL) and have some kind of Datafeed fetch the
> > information in the background and make a local copy (reading the
> > schema of the remote instance and creating cw_* tables when needed) ?
> Making a local copy will be painful as soon as we use huge remote
> instances. Moreover, it will depend on the schema of the distant instance.
> ...
> - a local cache handling system that avoids performing multiple similar
> queries (but which is not stored in the database).
> - to allow any entity to be related to a Thing (defined by a URL).
> Thus, any relation may have Thing as object.

That's more or less what I was suggesting: let the user link to a
Thing defined by a URL and have a background task that will look up the
URL and cache a local copy of the data to make it available for
subsequent queries.
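
A minimal sketch of that idea, assuming an in-memory cache keyed by URL
and a fetch callable supplied by the caller (for instance an HTTP GET).
ThingCache and its methods are hypothetical names for this example,
not an existing CubicWeb or Datafeed API.

```python
import threading
import time


class ThingCache:
    """In-memory cache of remote 'Thing' data, keyed by URL (hypothetical)."""

    def __init__(self, fetch, max_age=3600.0):
        self._fetch = fetch        # assumed callable: fetch(url) -> data
        self._max_age = max_age    # seconds before an entry counts as stale
        self._entries = {}         # url -> (timestamp, data)
        self._lock = threading.Lock()

    def get(self, url):
        """Return the cached data for url, or None if missing or stale."""
        with self._lock:
            entry = self._entries.get(url)
        if entry is None or time.monotonic() - entry[0] > self._max_age:
            return None
        return entry[1]

    def refresh(self, url):
        """Fetch url and store the result; meant to run in the background."""
        data = self._fetch(url)
        with self._lock:
            self._entries[url] = (time.monotonic(), data)
        return data

    def refresh_in_background(self, url):
        """Start the lookup in a daemon thread and return that thread."""
        thread = threading.Thread(target=self.refresh, args=(url,), daemon=True)
        thread.start()
        return thread
```

A query that finds nothing with get() could fall back to whatever is
available locally and call refresh_in_background(), so that the next
query sees the cached copy. Nothing here is written to the database.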

> I don't have a Geoname source, I have an instance with a Geoname cube and
> Geoname data. Thus, it may be fetched from a URL (but for now, it is only
> in_memory connections). A future improvement might even allow querying
> SPARQL endpoints or JSONp endpoints, rather than CubicWeb instances.

Allowing the query planner to mix and match CubicWeb, SPARQL and
JSONp... sounds interesting, but difficult.

Please read http://www.w3.org/TR/2010/WD-sparql11-federated-query-20100601/

> Thanks again for the comments! I will take a look at the RQL querier to see
> what I can do with it!

That's definitely a good read.

Nicolas Chauvat

logilab.fr - services en informatique scientifique et gestion de connaissances  
