[Cubicweb] Multisource in CW

Nicolas Chauvat nicolas.chauvat at logilab.fr
Thu May 24 19:13:14 CEST 2012

On Thu, May 24, 2012 at 06:19:19PM +0200, Sylvain Thénault wrote:
> > 
> > The key point is that in order to use CW's "base ui framework" that
> > generates a large part of the UI, we need a datamodel/schema. Quick
> > joins and on-the-fly queries without prior knowledge of the schema
> > mean that you have to code a lot by hand in the view that fetches
> > this external data.
> That's a basic question: should we consider that having to define
> a schema for the external source is a problem or not? CW without schema
> information doesn't sound like CW anymore, so I hope the answer is no :)

Maybe we need to get back to the basics.

The semantic web is about:

1/ using URLs to JOIN records over distributed (web)sites

2/ having some kind of laxity/extensibility regarding the data
model/schema (think of browsers that ignore the HTML tags they do not
know about but still show the rest of the page, then try to apply
that idea to databases)

[...add your own...]
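
To make points 1/ and 2/ concrete, here is a minimal sketch in plain
Python. It is not CubicWeb code: the two dictionaries stand in for two
web sites, fetch() stands in for an HTTP GET, and every name and value
in it (LOCAL_SITE, REMOTE_SITE, KNOWN_ATTRIBUTES, join_by_url, the
records themselves) is made up for illustration.

```python
# Hypothetical data: two "sites", each exposing records keyed by URL.
LOCAL_SITE = {
    "http://example.org/person/1": {
        "name": "Alice",
        "lives_in": "http://www.geonames.org/1234",  # a link, not a local copy
    },
}
REMOTE_SITE = {
    "http://www.geonames.org/1234": {
        "name": "Some place",
        "population": 1000,
        "unknown_extension": "whatever",  # attribute our model does not know
    },
}

# What the local data model knows about (assumption for this sketch).
KNOWN_ATTRIBUTES = {"name", "population"}


def fetch(url):
    """Stand-in for an HTTP GET; returns None if the URL resolves nowhere."""
    return REMOTE_SITE.get(url)


def join_by_url(record, link_attr):
    """Follow the URL stored in `link_attr`, keeping only known attributes.

    Unknown attributes are skipped instead of raising an error, the way a
    browser skips tags it does not understand.
    """
    target = fetch(record[link_attr])
    if target is None:
        return {}
    return {k: v for k, v in target.items() if k in KNOWN_ATTRIBUTES}


person = LOCAL_SITE["http://example.org/person/1"]
place = join_by_url(person, "lives_in")
print(person["name"], "lives in", place)
```

The JOIN key is the URL itself, and the unknown attribute is dropped
silently rather than breaking the whole lookup.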

An early design decision for CW was that in order to usefully display
information, you need either a developer who knows the data and writes
the code, or a description of the data model and a generic way to
display the information based on that description.
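
A rough sketch of the two options, again in plain Python rather than
with CW's actual view and schema classes. CITY_SCHEMA, generic_view and
handwritten_city_view are hypothetical names, and the city record is
invented.

```python
# Hypothetical schema description: attribute name -> (label, formatter).
CITY_SCHEMA = {
    "name": ("Name", str),
    "population": ("Population", lambda n: f"{n:,}"),
}


def generic_view(entity, schema):
    """Render any entity using only the description of its data model."""
    lines = []
    for attr, (label, fmt) in schema.items():
        if attr in entity:
            lines.append(f"{label}: {fmt(entity[attr])}")
    return "\n".join(lines)


def handwritten_city_view(entity):
    """The alternative: a developer who knows the data writes the view."""
    return f"{entity['name']} ({entity['population']:,} inhabitants)"


city = {"name": "Exampleville", "population": 12345}
print(generic_view(city, CITY_SCHEMA))
print(handwritten_city_view(city))
```

The generic view works for any entity type as long as a schema exists
for it; the hand-written one needs no schema but has to be coded again
for every new kind of data.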

We have several different use cases here:

1/ use a URL to link to remote entities that are not stored locally

2/ for performance's sake, do not duplicate large reference
databases/catalogs like dbpedia, geonames, databnf (if you have 4 apps
that want to use dbpedia data, you do not want to have 4 copies of the
data stored in your db cluster)

3/ figure out what to do regarding that data model
laxity/extensibility that everyone wants (I do not have an opinion yet)

> So I guess you're doing things for experimental purposes, since once you have
> a geonames CW instance, you can use a regular pyrorql source, right?

Yes... but we would need the reference to the remote geonames instance
to be a URL like http://www.geonames.org/1234 and not the remote eid.
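
To illustrate why the URL matters, here is a small sketch. The eid
values and the two mapping tables are invented for the example, and
to_reference is a hypothetical helper, not part of pyrorql or CW.

```python
# Hypothetical: the same place gets a different eid in each instance,
# because an eid is only meaningful inside the instance that assigned it.
INSTANCE_A_EIDS = {4217: "http://www.geonames.org/1234"}
INSTANCE_B_EIDS = {98: "http://www.geonames.org/1234"}


def to_reference(eid, eid_to_url):
    """Translate an instance-local eid into a URL usable from anywhere."""
    return eid_to_url[eid]


ref_a = to_reference(4217, INSTANCE_A_EIDS)
ref_b = to_reference(98, INSTANCE_B_EIDS)

# The eids differ, but both instances agree on the URL.
print(ref_a == ref_b)
```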

> > Allowing the query planner to mix and match CubicWeb, SPARQL and
> > JSONp... sounds interesting, but difficult.
> Writing SPARQL/JSONp source shouldn't be that hard.

Too bad you do not have time to do it :(

> I still suspect Vincent's problem lies in the way the multi-sources query
> planner is implemented currently, eg :
> ...

I am afraid of getting into the development of a full-blown db querier
that generates plans, picks the best one after running a genetic
algorithm, collects statistics during execution and vacuum cycles
and feeds back that info to the plan generation part...

Nicolas Chauvat

logilab.fr - services en informatique scientifique et gestion de connaissances  
