[Cubicweb] Multisource in CW
nicolas.chauvat at logilab.fr
Thu May 24 12:09:59 CEST 2012
On Thu, May 24, 2012 at 09:26:54AM +0200, Vincent Michel wrote:
> - we do not include at all the schema, and let the user deal with the remote
> schema within the RQL request. I think that this is an interesting option if
> we consider that this multisource is dedicated to quick and on-the-fly joins
> to remote instances (with schemas that may change...), and that we do not
> want to migrate the local instance.
The key point is that in order to use CW's "base ui framework", which
generates a large part of the UI, we need a datamodel/schema. Quick
joins and on-the-fly queries without prior knowledge of the schema
mean that you have to code a lot by hand in the view that fetches
this external data.
I am not saying I have the answer that strikes the right balance here,
I am just trying to have a clear understanding of the problem.
> - we include the schema of the remote instance WITHOUT creating tables.
> Indeed, storing the remote entities in the local database may be interesting
> for a few thousand entities, but with Dbpedia or Geonames, one may pollute
> the local instance with hundreds of thousands of entities.
> > Could it be interesting to allow any entity to be related to a Thing
> > (defined by a URL) and have some kind of Datafeed fetch the
> > information in the background and make a local copy (reading the
> > schema of the remote instance and creating cw_* tables when needed) ?
> Making a local copy will be painful as soon as we use huge remote
> instances. Moreover, it will depend on the schema of the distant instance.
> - a local cache handling system that avoids performing multiple similar
> queries (but which is not stored in the database).
> - to allow any entity to be related to a Thing (defined by a URL).
> Thus, any relation may have Thing as object.
That's more or less what I was suggesting: let the user link to a
Thing defined by a URL and have a background task that will look up the
URL and cache a local copy of the data to make it available for the
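
The idea of linking to a Thing by URL and letting a background task fill a
cache could be sketched roughly as follows. This is only an illustration:
`ThingCache`, `link`, `refresh` and the injected `fetch` callable are
hypothetical names, not part of CubicWeb, and the cache lives in memory
rather than in the database.

```python
from typing import Callable, Dict, Optional, Set


class ThingCache:
    """In-memory cache of remote 'Thing' data keyed by URL (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        # `fetch` is an assumption: any callable that returns the remote data
        # for a URL (HTTP request, RQL query on a remote instance, ...).
        self._fetch = fetch
        self._data: Dict[str, dict] = {}
        self._pending: Set[str] = set()

    def link(self, url: str) -> None:
        """Record that a local entity points at `url`; the fetch is deferred."""
        if url not in self._data:
            self._pending.add(url)

    def refresh(self) -> int:
        """What a background task would run: fetch each pending URL once."""
        fetched = 0
        for url in sorted(self._pending):
            self._data[url] = self._fetch(url)
            fetched += 1
        self._pending.clear()
        return fetched

    def get(self, url: str) -> Optional[dict]:
        """Return the cached copy, or None if it has not been fetched yet."""
        return self._data.get(url)
```

Linking the same URL several times triggers a single fetch, and views only
ever read from the cache, so a slow or unreachable remote instance does not
block the UI.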
> I don't have a source Geoname, I have an instance with a Geoname cube and
> Geoname data. Thus, it may be fetched from a URL (but for now, it is only
> in_memory connections). It may even be possible to think of a future improvement
> that allows querying SPARQL endpoints or JSONp endpoints, rather than CubicWeb
Allowing the query planner to mix and match CubicWeb, SPARQL and
JSONp... sounds interesting, but difficult.
Please read http://www.w3.org/TR/2010/WD-sparql11-federated-query-20100601/
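
For readers who have not seen it, the federated query draft delegates part of
a query to a remote endpoint with a SERVICE clause. The small helper below
only builds such a query string; `federated_query` is a hypothetical name, and
the endpoint URL and triple patterns in the usage note are illustrative
assumptions, not something taken from this thread.

```python
from typing import Iterable


def federated_query(variables: Iterable[str], local_pattern: str,
                    endpoint: str, remote_pattern: str) -> str:
    """Build a SPARQL 1.1 query whose second pattern runs on a remote endpoint."""
    select = " ".join("?" + name for name in variables)
    return (
        "SELECT %s WHERE {\n"
        "  %s\n"
        "  SERVICE <%s> {\n"
        "    %s\n"
        "  }\n"
        "}" % (select, local_pattern, endpoint, remote_pattern)
    )
```

For instance, `federated_query(["city", "name"], "?place <http://www.w3.org/2002/07/owl#sameAs> ?city .", "http://dbpedia.org/sparql", "?city <http://www.w3.org/2000/01/rdf-schema#label> ?name .")`
yields a query whose label lookup is evaluated by the remote endpoint while
the first pattern is matched locally. A planner mixing RQL and SPARQL would
have to make the same kind of split automatically.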
> Thanks again for the comments! I will take a look at the RQL querier to see
> what I can do with it!
That's definitely a good read.
logilab.fr - services en informatique scientifique et gestion de connaissances