[Cubicweb] CWEP-0002 - RQL rewriting

Léa Capgen lea.capgen at logilab.fr
Wed Dec 18 15:18:04 CET 2013


Hi Cubicweb users,

Here is the updated CWEP-0002, revised according to your feedback.

Regards,
Léa




updated 2013/12/18

Rationale
==========


Logilab has been thinking about rules and RQL rewriting for years, but
never found the time to work on them. Using rules in database queries is
not a new topic: it dates back to the 1970s (read about Datalog_ for example).

.. _Datalog: http://en.wikipedia.org/wiki/Datalog

With rules and RQL rewriting, we could get syntactic sugar almost for
free and have more flexible schemas that define relations and attributes
that we can choose not to materialize at the SQL table level.

Here are a few examples of use cases. A more complete document can be 
found here:

http://hg.logilab.org/users/lcapgen/cw_rules/file/44522d5906d4/cw-regles-utilisations.rst

Proposal
========

Use cases
-----------

Relation rewriting
........................

Instead of::

   Any A,B WHERE C is Contribution, C contributor A, C manifestation B,
                 C role R, R name "illustrator"

we would like to write::

   Any A,B WHERE A illustrator_of B

after adding a rule to the schema as follows:

.. sourcecode:: python

     class illustrator_of(RelationDefinition):
         subject = 'Person'
         object = 'Manifestation'
         rule  = ('C is Contribution, C contributor S, C manifestation O,'
                  'C role R, R name "illustrator"')
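
To make the intended rewriting concrete, here is a minimal sketch in plain
Python. It is not the proposed implementation: the ``RULES`` table and the
``expand`` and ``rewrite`` helpers are hypothetical names, and the query is
handled with naive string operations instead of the RQL syntax tree. It
assumes that ``S`` and ``O`` in a rule stand for the subject and object of the
rewritten relation, and that every other rule variable only needs a name that
does not clash with the query (here obtained by appending a suffix).

```python
import re

# Hypothetical rule table: relation name -> rule body, where S and O stand
# for the subject and object of the rewritten relation.
RULES = {
    "illustrator_of": ('C is Contribution, C contributor S, C manifestation O, '
                       'C role R, R name "illustrator"'),
}

# Assumption: variables are upper-case identifiers such as A, SA or C1.
VARIABLE = re.compile(r"\b[A-Z][A-Z0-9]*\b")
STRING = re.compile(r'("[^"]*")')


def expand(subject, relation, obj, suffix):
    """Return the rule body with S and O bound to the query variables."""
    mapping = {"S": subject, "O": obj}

    def rename(match):
        name = match.group(0)
        # Variables local to the rule get a suffixed name.
        return mapping.setdefault(name, "%s%d" % (name, suffix))

    # Split out string literals so that their content is left untouched.
    parts = STRING.split(RULES[relation])
    return "".join(part if part.startswith('"') else VARIABLE.sub(rename, part)
                   for part in parts)


def rewrite(query):
    """Expand every 'X relation Y' clause whose relation has a rule."""
    head, where = query.split(" WHERE ", 1)
    clauses = []
    # Toy parsing: breaks on commas inside string literals.
    for index, clause in enumerate(where.split(",")):
        subject, relation, obj = clause.split(None, 2)
        if relation in RULES:
            clauses.append(expand(subject, relation, obj, index))
        else:
            clauses.append(clause.strip())
    return head + " WHERE " + ", ".join(clauses)


print(rewrite("Any A,B WHERE A illustrator_of B"))
```

A real rewriter would work on the parsed query and would have to check that
the generated variable names are not already used.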



Computed attribute
..........................

Instead of::

   Any SUM(SA) GROUPBY S WHERE P works_for S, P salary SA

we would like to write::

   Any A WHERE S total_salary A

after adding to the schema a rule for the computed attribute as follows:

.. sourcecode:: python

     class Company(EntityType):
         name = String()
         total_salary = Int(computed=('Any SUM(SA) GROUPBY S WHERE '
                                      'P works_for S, P salary SA'))


Further use cases
----------------------

Map RDF to Yams
...................

Given the following XY equivalences:

.. sourcecode:: python

    xy.add_equivalence('CWUser', 'foaf:Person')
    xy.add_equivalence('Person', 'foaf:Person')

when writing::

     Any S WHERE S is foaf:Person

we would like the query to be rewritten as::

     Any S WHERE S is IN(Person, CWUser)
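
The expansion of an RDF class into the matching Yams entity types could be
sketched as follows. The ``Equivalences`` class is a hypothetical stand-in
written for this example, not the actual ``xy`` module: it only mimics the
``add_equivalence(yams_type, rdf_class)`` calls shown above and rewrites the
``is`` clause with a regular expression.

```python
import re


class Equivalences:
    """Toy registry mapping an RDF class to the Yams entity types."""

    # Assumption: an RDF class is written prefix:Name after "is".
    IS_RDF_CLASS = re.compile(r"\bis\s+(\w+:\w+)")

    def __init__(self):
        self._types = {}

    def add_equivalence(self, yams_type, rdf_class):
        self._types.setdefault(rdf_class, []).append(yams_type)

    def rewrite(self, query):
        def expand(match):
            types = self._types.get(match.group(1))
            if not types:
                # Unknown RDF class: leave the clause unchanged.
                return match.group(0)
            if len(types) == 1:
                return "is %s" % types[0]
            return "is IN(%s)" % ", ".join(types)

        return self.IS_RDF_CLASS.sub(expand, query)


xy = Equivalences()
xy.add_equivalence('CWUser', 'foaf:Person')
xy.add_equivalence('Person', 'foaf:Person')
print(xy.rewrite("Any S WHERE S is foaf:Person"))
```

The entity types appear in registration order; the order inside ``IN(...)``
does not change the meaning of the query.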




Choice of materializing at the SQL table level
-----------------------------------------------
Relation rewriting
....................

By default, a rewritten relation is not materialized: no matching SQL
table is created, and each query that uses the relation is rewritten.
To materialize the relation at the SQL table level, add an annotation
to the schema as follows:

.. sourcecode:: python

     class illustrator_of(RelationDefinition):
         subject = 'Person'
         object = 'Manifestation'
         rule  = ('C is Contribution, C contributor S, C manifestation O,'
                  'C role R, R name "illustrator"')
         __annotations__ = {'materialized' : True}

The SQL tables are then created and filled by a hook that uses the rule.

Computed attribute rewriting
.............................

By default, a computed attribute is materialized: the matching SQL tables
are created and filled by a hook that uses the computed rule.
To avoid materializing the computed attribute at the SQL table level,
add an annotation to the computed attribute as follows:

.. sourcecode:: python

     class Company(EntityType):
         name = String()
         total_salary = Int(computed=('Any SUM(SA) GROUPBY S WHERE '
                                      'P works_for S, P salary SA'),
                            __annotations__={'virtual': True})

Each time a query uses the computed attribute, the query is rewritten.
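
The two defaults described in this section are opposite, which is easy to
get wrong. The following sketch only restates them as a decision function;
``is_materialized`` and the ``kind`` strings are names made up for this
example and are not part of yams or CubicWeb.

```python
def is_materialized(kind, annotations=None):
    """Tell whether a definition gets SQL storage under the proposal.

    kind is 'relation' for a rewritten relation or 'computed_attribute'
    for a computed attribute (hypothetical labels used by this sketch).
    """
    annotations = annotations or {}
    if kind == 'relation':
        # Rewritten relations are virtual unless explicitly materialized.
        return bool(annotations.get('materialized', False))
    if kind == 'computed_attribute':
        # Computed attributes are materialized unless explicitly virtual.
        return not annotations.get('virtual', False)
    raise ValueError('unknown kind: %r' % (kind,))


print(is_materialized('relation'))
print(is_materialized('relation', {'materialized': True}))
print(is_materialized('computed_attribute'))
print(is_materialized('computed_attribute', {'virtual': True}))
```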

Implementation
===============

A first implementation is available here:

   http://hg.logilab.org/review/rql/rev/3a74699faa12

   http://hg.logilab.org/review/yams/rev/501931830b3a

   http://hg.logilab.org/review/cubicweb/rev/972a53c416a8




Limitations
-------------

Limitations for relation rewriting
..........................................

+ In the SET tree case, for the moment the RQL rewriting cannot rewrite
  relations that the query is going to modify; it can only rewrite
  relations in the WHERE node.

+ The rules are written in a dictionary like {premise : conclusions}.
  The premise is a single relation.

+ A relation cannot be in both premise and conclusions.

+ The rewriting does not support a subquery inside another subquery
  (nested subquery).

Limitations for computed attribute rewriting
..................................................

+ In the SET tree case, for the moment the RQL rewriting cannot rewrite
  computed attributes that the query is going to modify.

+ To guarantee termination, we cannot have both

  * a premise of a rule in the definition of a computed attribute AND
  * a computed attribute in a conclusion of a rule
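
The termination-related restrictions listed in this section can be checked
mechanically. The sketch below is one reading of them, not the checks done by
the implementation: it assumes the rules are available as a dictionary mapping
each premise (the relation being rewritten) to the relations used in its
conclusions, and computed attributes as a dictionary mapping each attribute to
the relations used in its definition. Both function names are hypothetical.

```python
def conflicting_relations(rules):
    """Return the relations used both as a premise and in a conclusion."""
    premises = set(rules)
    conclusions = {rel for body in rules.values() for rel in body}
    return sorted(premises & conclusions)


def computed_attributes_terminate(rules, computed):
    """Return False when both forbidden situations occur together."""
    premises = set(rules)
    attributes = set(computed)
    # (a) a premise of a rule appears in a computed attribute definition
    premise_in_definition = any(premises & set(body)
                                for body in computed.values())
    # (b) a computed attribute appears in a conclusion of a rule
    attribute_in_conclusion = any(attributes & set(body)
                                  for body in rules.values())
    return not (premise_in_definition and attribute_in_conclusion)


rules = {'illustrator_of': ['contributor', 'manifestation', 'role', 'name']}
computed = {'total_salary': ['works_for', 'salary']}
print(conflicting_relations(rules))
print(computed_attributes_terminate(rules, computed))
```

With the rule and the computed attribute used as examples in this document,
no conflict is reported.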






Discussion
==========

Please comment.


Materializing formalism
------------------------

What exists now
....................
For the moment, we have two __annotations__:

     + __annotations__ = {'materialized' : True} to force a rewritten
       relation to be materialized at the SQL table level
     + __annotations__ = {'virtual' : True} to force a computed attribute
       not to be materialized at the SQL table level

Our possible choices
.....................
1) We prefer the consistency of the formalism:
     - In this case, we have to choose between __annotations__ =
       {'materialized' : True} and __annotations__ = {'virtual' : True}.
     - It implies that relations and computed attributes have **the same
       default behavior**.

2) We consider that computed attributes and relations have distinct behaviors:
     - This is what exists now.


