aurelien.campeas at logilab.fr
Mon Dec 16 10:33:04 CET 2013
On 16/12/2013 09:10, David Douard wrote:
> On 04/12/2013 09:36, Dimitri Papadopoulos Orfanos wrote:
>> I've found a few words on datafeed in the Logilab blog about CubicWeb 3.11:
>> A new 'datafeed' source was introduced, inspired by the
>> soon to be deprecated datafeed cube. It needs polishing
>> but sets the foundation for advanced semantic web
>> applications that import content from other sites using
>> simple HTTP requests.
>> A 'datafeed' source is associated with a parser that
>> analyses the imported data and then creates/updates
>> entities accordingly. There is currently a single parser
>> in the core that imports CubicWeb-generated xml and needs
>> to be configured with mapping information that defines
>> how relations are to be followed. It provides a viable
>> alternative to 'pyrorql' sources. Other parsers to import
>> RDF, RSS, etc should come soon.
>> From what I gather, datafeed can help import data from structured Web
>> sites. It doesn't help with importing data from files (we have to parse
>> at least CSV, XML and DICOM files), does it?
> Not really; datafeed is about importing (part of) an external
> database into a CW instance. datafeed can update an entity as long as
> it can determine a unique identifier for that entity in the remote
> database (typically a URI).
>> In our application we repeatedly scan a directory for new files and data
>> appended to existing files. Can datafeed help in that case? I'm thinking
>> here about the "creates/updates entities accordingly" part of datafeed.
> At update time, it needs to be able to ask the remote database for the
> objects that have been created/modified since the last synchronization.
> You won't (easily) get any of this from files. What you need is a set
> of (smart) import scripts.
You can also remain within the source/datafeed realm and implement
your own datafeed.DataFeedParser subclass to handle sync from files.
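
Whichever route you take, the file-specific part is working out what has
changed since the last run. Below is a minimal, CubicWeb-independent sketch
of that step, assuming line-oriented text files (e.g. CSV) that only ever
grow by appending. The names scan_directory and state are made up for this
illustration and are not part of CubicWeb; hooking the result into a
DataFeedParser subclass or an import script is left out.

```python
import os


def scan_directory(directory, state):
    """Return {path: [new complete lines]} added since the previous scan.

    `state` (hypothetical helper structure) maps each file path to the byte
    offset that has already been imported. It is updated in place and should
    be persisted between runs.
    """
    changes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        offset = state.get(path, 0)
        size = os.path.getsize(path)
        if size < offset:
            # The file shrank, so it was truncated or replaced: start over.
            offset = 0
            state[path] = 0
        if size == offset:
            continue  # nothing new in this file
        with open(path, "rb") as stream:
            stream.seek(offset)
            data = stream.read()
        # Consume complete lines only; a partially written last line is
        # left for the next scan.
        end = data.rfind(b"\n") + 1
        if end == 0:
            continue
        changes[path] = data[:end].decode("utf-8").splitlines()
        state[path] = offset + end
    return changes
```

The first call with an empty state reports every file as new; later calls
report only the appended lines. This covers the "created/modified since last
synchronization" question, but not the unique identifier needed to update
existing entities. That identifier has to come from the data itself, for
instance the file path plus a record key.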