[Cubicweb] experimenting with a DataFeed entity

Nicolas Chauvat nicolas.chauvat at logilab.fr
Mon Feb 15 13:17:22 CET 2010


Hi,

On Mon, Feb 15, 2010 at 09:44:17AM +0100, Florent Cayré wrote:
> this experimentation is a good start towards effectively using LinkedData,
> which is a challenge because :
> 
> * we need to reference entities in someone else's database and to use it
> effectively (there are of course performance issues => caching is needed) ;
> 
> * we want to keep data as up-to-date as possible, the best being our copy
> being identical to the original (=> caching becomes a problem).

Yes, caching is an important issue when it comes to using data stored
in remote databases. I am not sure how others are doing it at the
moment. I will try to find something useful along the lines of
http://www.google.com/search?&q=sparql%20federation

For now, since we are mostly manipulating text, I was planning on
storing a local copy of everything the app downloads from third parties.

The time for which the data stays valid should probably be inferred
from the HTTP headers.
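
As a rough sketch of what "inferred from the HTTP headers" could mean,
the snippet below derives a remaining validity period from the
Cache-Control and Expires headers. The function name freshness_lifetime
and the plain-dict representation of the headers are assumptions made
for illustration, not part of CubicWeb. It also ignores the Age and Date
headers, which a complete RFC 2616 implementation would take into
account.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, now=None):
    """Return how many seconds a cached response may still be used.

    `headers` is assumed to be a plain dict using canonical header names.
    Returns None when the headers give no explicit expiration information.
    """
    now = now or datetime.now(timezone.utc)
    cache_control = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in cache_control.split(",")]

    # The server asked us not to reuse the response without checking.
    if "no-store" in directives or "no-cache" in directives:
        return 0

    # max-age takes precedence over Expires.
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))

    expires = headers.get("Expires")
    if expires:
        try:
            expires_at = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return 0  # an unparsable Expires is treated as already expired
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        return max(0, int((expires_at - now).total_seconds()))

    return None
```

When the function returns None, the application would have to fall back
on its own default, for example a fixed polling interval.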

> As far as I understand, you are trying to address performance issues using
> caching, also trying to keep data up-to-date through regular polling.

http://www.w3.org/Submission/SPARQL-Update/ is the 'push' option that
was advertised lately on the semantic-web at w3c mailing list.

Polling is bad for performance from a global point of view, but it is
also very easy to get working as a first step.

> I would propose two other *complementary* approaches regarding data
> freshness issue :
> 
> * web hooks (http://www.webhooks.org/) : in the long term, we need a way to
> be notified when the distant data changes ; the problem is we need a
> standard to do so, thus it is probably not a short term solution, although
> necessary to promote LinkedData usage I think ;

I will look at it, but the wiki and blog links on the home page are
dead.

> * browser data freshness check : each time we use a distant data cache, we
> could ask the user's browser (through a javascript snippet) to check data
> freshness by querying the original entity last modification date (or such)
> on the original website, and use this information to eventually refresh our
> copy. We just need a way to query an entity modification date effectively.

Is this the same as looking at the HTTP headers?

http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
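
To make the question concrete, here is a minimal sketch of a freshness
check done with a conditional GET, using the validators (ETag and
Last-Modified) described in that section of RFC 2616. The helper names
conditional_headers and revalidate are hypothetical, and the cached
headers are assumed to be stored as a plain dict. The sketch runs on the
server side with the standard urllib module; whether the same check can
be done from the user's browser is a separate matter.

```python
import urllib.error
import urllib.request


def conditional_headers(cached_headers):
    """Build request headers from the validators saved with the cached copy."""
    headers = {}
    etag = cached_headers.get("ETag")
    if etag:
        headers["If-None-Match"] = etag
    last_modified = cached_headers.get("Last-Modified")
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def revalidate(url, cached_body, cached_headers, opener=urllib.request.urlopen):
    """Return (body, headers, changed) after asking the origin server.

    A 304 Not Modified answer means the local copy is still current, so
    the cached body and headers are returned unchanged.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(cached_headers)
    )
    try:
        with opener(request) as response:
            return response.read(), dict(response.headers), True
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError rather than a normal response.
        if err.code == 304:
            return cached_body, cached_headers, False
        raise
```

A 304 answer carries no body, so this kind of check is cheap for both
sides compared to downloading the entity again.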

-- 
Nicolas Chauvat

logilab.fr - services en informatique scientifique et gestion de connaissances  


