[Cubicweb] Multisource in CW

Sylvain Thénault sylvain.thenault at logilab.fr
Thu May 24 18:19:19 CEST 2012

On 24 May 12:09, Nicolas Chauvat wrote:
> Hi All,
> On Thu, May 24, 2012 at 09:26:54AM +0200, Vincent Michel wrote:
> > - we do not include the schema at all, and let the user deal with the remote
> > schema within the RQL request. I think that this is an interesting option if
> > we consider that this multisource is dedicated to quick and on-the-fly joins
> > to remote instances (with schemas that may change...), and that we do not
> > want to migrate the local instance.
> The key point is that in order to use CW's "base ui framework", which
> generates a large part of the UI, we need a datamodel/schema. Quick
> joins and on-the-fly queries without prior knowledge of the schema
> mean that you have to code a lot by hand in the view that fetches
> this external data.

That's a fundamental question: should we consider that having to define
a schema for the external source is a problem or not? CW without schema
information doesn't sound like CW anymore, so I hope the answer is no :)

> > I don't have a Geoname source, I have an instance with a Geoname cube and
> > Geoname data. Thus, it may be fetched from a URL (but for now, only
> > in_memory connections are used). It may even be possible to think of a future
> > improvement that allows querying SPARQL endpoints or JSONp endpoints, rather
> > than CubicWeb instances.

So I guess you're doing this for experimental purposes, since once you have
a geoname CW instance, you can use the regular pyrorql source, right?

> Allowing the query planner to mix and match CubicWeb, SPARQL and
> JSONp... sounds interesting, but difficult.

Writing a SPARQL/JSONp source shouldn't be that hard.
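To give an idea of the size of the job, here is a minimal sketch of the data-access half of a JSONp source, assuming the remote endpoint answers a per-entity URL with a `callback({...})` payload. `parse_jsonp`, `JsonpSource`, `get_attribute` and the URL template are all hypothetical names, not CubicWeb's source API; the HTTP call is injected as a plain callable so the sketch runs offline. Plugging such a source into the multi-source query planner is the harder part and is not shown.

```python
import json
import re

# A JSONp payload is assumed to look like:  callbackName({...});
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(payload):
    """Strip the JSONp callback wrapper and decode the JSON body."""
    match = _JSONP_RE.match(payload)
    if match is None:
        raise ValueError("not a JSONp payload")
    return json.loads(match.group(1))


class JsonpSource:
    """Hypothetical external source answering 'attribute of entity' lookups."""

    def __init__(self, url_template, fetch):
        # url_template: e.g. "http://example.org/entity/{eid}?callback=cb"
        # fetch: any callable taking a URL and returning the raw payload text
        self.url_template = url_template
        self.fetch = fetch

    def get_attribute(self, eid, attr):
        payload = self.fetch(self.url_template.format(eid=eid))
        return parse_jsonp(payload).get(attr)


# Offline usage with a fake fetcher instead of a real HTTP call.
def fake_fetch(url):
    return 'cb({"name": "Paris"});'


src = JsonpSource("http://example.org/entity/{eid}?callback=cb", fake_fetch)
print(src.get_attribute(42, "name"))  # Paris
```

Injecting `fetch` keeps the parsing logic testable and leaves the choice of HTTP library (and of batching several eids per request) to a real implementation.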

I still suspect Vincent's problem lies in the way the multi-source query
planner is currently implemented, e.g.:

 "Any X,XA WHERE Y linked_to X, X attribute XA"

where X could come from an external source (but Y could not), is currently
executed with the following steps:

1. fetch "Any X,XA WHERE X attribute XA" from the external source and store
   the results in a temporary table, along with the records for this query
   from the system source

2. execute "Any X,XA WHERE Y linked_to X, X attribute XA" on the system source
   using the temporary table for X/XA
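The current strategy can be simulated with an in-memory SQLite database standing in for the system source and a dict standing in for the external source. This is a toy model under assumed data (the `EXTERNAL`, `SYSTEM_ATTRS` and `LINKED_TO` contents and the `current_plan` helper are hypothetical), not CubicWeb's planner code; it only shows that step 1 copies every external (X, XA) row, whether or not any Y is linked to it.

```python
import sqlite3

# Hypothetical toy data.
# External source: X eid -> value of its `attribute` (XA).
EXTERNAL = {100: "Paris", 101: "Lyon", 102: "Nantes", 103: "Lille"}
# X entities stored in the system source, with their attribute.
SYSTEM_ATTRS = {1: "local thing"}
# (Y, X) pairs for the linked_to relation, stored in the system source.
LINKED_TO = [(10, 100), (11, 1)]


def current_plan(db):
    """Temporary-table strategy; returns (rset, rows copied into tmp table)."""
    # Step 1: "Any X,XA WHERE X attribute XA" on the external source. Nothing
    # restricts X, so every external row is fetched, then merged with the
    # system source records and stored in a temporary table.
    db.execute("CREATE TEMP TABLE tmp_x (x INTEGER, xa TEXT)")
    rows = list(EXTERNAL.items()) + list(SYSTEM_ATTRS.items())
    db.executemany("INSERT INTO tmp_x VALUES (?, ?)", rows)
    # Step 2: the full query on the system source, joining linked_to with
    # the temporary table for X/XA.
    rset = db.execute(
        "SELECT t.x, t.xa FROM linked_to AS l JOIN tmp_x AS t ON l.x = t.x"
        " ORDER BY t.x"
    ).fetchall()
    return rset, len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE linked_to (y INTEGER, x INTEGER)")
db.executemany("INSERT INTO linked_to VALUES (?, ?)", LINKED_TO)
rset, transferred = current_plan(db)
print(rset)         # [(1, 'local thing'), (100, 'Paris')]
print(transferred)  # 5
```

Only one external row ends up in the result, yet all four are transferred; with a source the size of geonames, that copy dominates the cost.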

whereas we might want:

1. execute "Any X WHERE Y linked_to X" on the system source

2. retrieve XA from external sources for each X returned by the previous step

3. build rset for X,XA
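A toy model of the proposed strategy, with SQLite standing in for the system source and a dict for the external one, shows that only the X values returned by step 1 are ever requested remotely. The data and the helpers `fetch_external` and `proposed_plan` are hypothetical; `fetch_external` stands in for whatever per-entity or batched request a real source would issue.

```python
import sqlite3

# Hypothetical toy data.
EXTERNAL = {100: "Paris", 101: "Lyon", 102: "Nantes", 103: "Lille"}  # remote X -> XA
SYSTEM_ATTRS = {1: "local thing"}  # system-source X -> XA
LINKED_TO = [(10, 100), (11, 1)]   # (Y, X) pairs in the system source


def fetch_external(eids):
    """Stand-in for a (batched) request to the remote source for XA."""
    return {eid: EXTERNAL[eid] for eid in eids if eid in EXTERNAL}


def proposed_plan(db):
    """Returns (rset, number of rows fetched from the external source)."""
    # Step 1: "Any X WHERE Y linked_to X" on the system source only.
    xs = [x for (x,) in db.execute("SELECT x FROM linked_to ORDER BY x")]
    # Step 2: retrieve XA only for the X returned by step 1 that are not local.
    remote = fetch_external([x for x in xs if x not in SYSTEM_ATTRS])
    # Step 3: build the result set for X,XA; an X with no XA anywhere is
    # dropped, as the "X attribute XA" restriction requires.
    rset = []
    for x in xs:
        if x in SYSTEM_ATTRS:
            rset.append((x, SYSTEM_ATTRS[x]))
        elif x in remote:
            rset.append((x, remote[x]))
    return rset, len(remote)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE linked_to (y INTEGER, x INTEGER)")
db.executemany("INSERT INTO linked_to VALUES (?, ?)", LINKED_TO)
rset, fetched = proposed_plan(db)
print(rset)     # [(1, 'local thing'), (100, 'Paris')]
print(fetched)  # 1
```

On this data the result set matches what the temporary-table plan yields, but a single remote row crosses the wire instead of the whole external table; the gap grows with the size of the external source.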

which would be more practical in situations where the external source is
large, e.g. geonames or dbpedia, while the system source isn't.

Sylvain Thénault, LOGILAB, Paris - Toulouse
Python, Debian, Agile Methods training:  http://www.logilab.fr/formations
Custom software development:             http://www.logilab.fr/services
CubicWeb, the semantic web framework:    http://www.cubicweb.org