[Cubicweb] Multisource in CW
sylvain.thenault at logilab.fr
Wed May 23 10:24:32 CEST 2012
On 21 May 17:56, Vincent Michel wrote:
> STORING REFERENCES
> I have in my instance an entity class "Article", with a relation
> "contains_reference" that stores some URIs to external databases (e.g.
> For this, I have modified the entities table by dropping the NOT NULL
> constraints on "source", "asource", and I have added a column "exturi
> VARCHAR(256)". As I don't want to store external references in a specific
> entity type with its own table, I have introduced a new base entity type
> The classical API is still available:
> relate(1234, "contains_reference", 4567)
> but now, one can execute:
> relate(1234, "contains_reference", "http://dbpedia.org/foobar")
I'm not aware of this 'relate' api, which is a bit surprising :p
> This will create (if it does not already exist) an entry in the "entities" table,
> with the following data:
> EID 8910
> type Thing
> source NULL
> asource NULL
> mtime XX:YY:ZZ
> extid NULL
> exturi http://dbpedia.org/foobar
> and it will push the following line in the table
> 1234 8910
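To make the behaviour described above concrete, here is a minimal sketch of the "create if not already existing" step. It is not the actual CubicWeb code: it uses an in-memory SQLite database, the helper `get_or_create_external` and the relation table name `contains_reference_relation` are hypothetical, and the `entities` columns simply mirror the ones listed in the quoted message.

```python
import sqlite3


def get_or_create_external(conn, uri, etype="Thing"):
    """Return the eid of the entities row whose exturi is `uri`.

    The row is created if it does not exist yet (hypothetical helper).
    """
    row = conn.execute(
        "SELECT eid FROM entities WHERE exturi = ?", (uri,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO entities (type, source, asource, extid, exturi) "
        "VALUES (?, NULL, NULL, NULL, ?)",
        (etype, uri),
    )
    return cur.lastrowid


def relate(conn, subject_eid, target):
    """Link subject_eid to target, where target is an eid (int) or a URI (str)."""
    if isinstance(target, str):
        target = get_or_create_external(conn, target)
    conn.execute(
        "INSERT INTO contains_reference_relation (eid_from, eid_to) VALUES (?, ?)",
        (subject_eid, target),
    )
    return target


# Assumed, simplified schema: "source" and "asource" are nullable, "exturi" is added.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities ("
    "eid INTEGER PRIMARY KEY, type TEXT NOT NULL, source TEXT, asource TEXT, "
    "mtime TEXT DEFAULT CURRENT_TIMESTAMP, extid TEXT, exturi VARCHAR(256) UNIQUE)"
)
conn.execute(
    "CREATE TABLE contains_reference_relation ("
    "eid_from INTEGER NOT NULL, eid_to INTEGER NOT NULL)"
)
```

Calling `relate(conn, 1234, "http://dbpedia.org/foobar")` twice reuses the same `entities` row, so each URI is stored only once.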
> This behavior relies on very slight code modifications in the functions
> "related()" and "add_info()". The relations management of CubicWeb stay
> The main idea here is to keep as little information as possible in the database,
> and the reference is a URI, which is universal and does not rely on a
> specific eid in a distant instance.
> But now, we can join with distant databases, using the following API:
> ' |<appid>-<variable used for join> <DISTANT QUERY>'
> For example:
> rset = rql('Any X, L, D WHERE X contains_reference Y|dbpedia-Y Y label L, Y
> depiction D')
Have you actually implemented this?
Are 'label' and 'depiction' defined in the schema of 'Thing'?
> Information from Dbpedia, Geonames, etc. can now be shared across
> instances, and, even if the internal eids of these databases change, the
> queries are still valid.
I'm not sure I get your point here.
Here are my points:
* the source abstraction has been introduced to be able to code applications
independently of their data sources. This is imo valuable and should be kept in
mind, even if we may need specific api/rql syntax to allow application
* I'm not sure we need all that specific stuff and not reusing existing
- provided you have e.g. a geoname source which is able to fetch attributes
from a URL
- the extid column is the same thing as your exturi (but base64 encoded, which
should not be a problem)
- no data stored in entity type tables
then everything should work transparently.
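As an illustration of the extid point above, a URI can be round-tripped through base64 with the standard library alone. The helper names `uri_to_extid` and `extid_to_uri` are made up for this sketch and are not CubicWeb functions.

```python
from base64 import b64decode, b64encode


def uri_to_extid(uri):
    """Encode a URI as a base64 ASCII string, as it could be stored in extid."""
    return b64encode(uri.encode("utf-8")).decode("ascii")


def extid_to_uri(extid):
    """Decode a base64 extid back to the original URI."""
    return b64decode(extid.encode("ascii")).decode("utf-8")
```

Since the encoding is reversible, storing the URI in extid loses no information compared to a separate exturi column.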
Any X,L,D WHERE X contains_reference Y, Y label L, Y depiction D
would work provided we have a new query planner that first executes
"X contains_reference Y", then calls the source to fetch L and D.
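The two-step execution described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the function names, the in-memory dictionaries standing in for the local database and for the remote source, and the sample label and depiction values. A real planner would work on the RQL syntax tree and call an actual source.

```python
def fetch_from_source(source, uri, attributes):
    """Hypothetical source adapter: return the requested attributes for a URI."""
    record = source.get(uri, {})
    return tuple(record.get(attr) for attr in attributes)


def execute_two_step(local_relations, uri_by_eid, source, attributes):
    """Resolve the relation locally, then ask the source for each attribute."""
    rows = []
    # Step 1: "X contains_reference Y" is answered from local data only.
    for x_eid, y_eid in local_relations:
        # Step 2: the external source provides L and D for each Y.
        uri = uri_by_eid[y_eid]
        rows.append((x_eid,) + fetch_from_source(source, uri, attributes))
    return rows


# Made-up sample data standing in for the local tables and the remote source.
local_relations = [(1234, 8910)]
uri_by_eid = {8910: "http://dbpedia.org/foobar"}
fake_source = {
    "http://dbpedia.org/foobar": {
        "label": "Foobar",
        "depiction": "http://example.org/foobar.png",
    }
}

rows = execute_two_step(
    local_relations, uri_by_eid, fake_source, ("label", "depiction")
)
```

With this split, nothing about the external entity has to be stored locally apart from its identifier.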
Sylvain Thénault, LOGILAB, Paris (01.45.32.03.12) - Toulouse (09.54.03.55.76)
Python, Debian, Agile methods training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
CubicWeb, the semantic web framework: http://www.cubicweb.org