[Cubicweb] Dataimport

Antoine Grigis antoine.grigis at cea.fr
Thu Dec 10 16:26:34 CET 2015


I have the feeling that the CW 3.20.9 machinery supports Binary fields.
We just need to restore the previous behavior of the 
'_create_copyfrom_buffer' function.
Here is a quick test with my modifications in bold:

*from psycopg2.extensions import Binary*

def _create_copyfrom_buffer(data, columns=None, **convert_opts):
     """
     Create a StringIO buffer for 'COPY FROM' command.
     Deals with Unicode, Int, Float, Date... (see ``converters``)

     :data: a sequence/dict of tuples
     :columns: list of columns to consider (default to all columns)
     :converter_opts: keyword arguments given to converters
     """
     # Create a list rather than directly create a StringIO
     # to correctly write lines separated by '\n' in a single step
     rows = []
     if columns is None:
         if isinstance(data[0], (tuple, list)):
             columns = range(len(data[0]))
         elif isinstance(data[0], dict):
             columns = data[0].keys()
         else:
             raise ValueError('Could not get columns: you must provide columns.')
     for row in data:
         # Iterate over the different columns and the different values
         # and try to convert them to a correct datatype.
         # If an error is raised, do not continue.
         formatted_row = []
         for col in columns:
             try:
                 value = row[col]
             except KeyError:
                 warnings.warn(u"Column %s is not accessible in row %s"
                               % (col, row), RuntimeWarning)
                 # XXX 'value' set to None so that the import does not end in
                 # error.
                 # Instead, the extra keys are set to NULL from the
                 # database point of view.
                 value = None
*            if isinstance(value, type(Binary(""))):*
*                return None*
             for types, converter in _COPYFROM_BUFFER_CONVERTERS:
                 if isinstance(value, types):
                     value = converter(value, **convert_opts)
                     break
             else:
                 raise ValueError("Unsupported value type %s" % type(value))
             # We push the value to the new formatted row
             # if the value is not None and could be converted to a string.
             formatted_row.append(value)
         rows.append('\t'.join(formatted_row))
     return StringIO('\n'.join(rows))

Using a 'SQLGenObjectStore' I am then able to insert Binary fields.
Do you think such a modification is valid and could be integrated in a 
future release?
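
To make the idea concrete, here is a minimal standalone sketch of the
fallback described above. It is not the CubicWeb code: the names
'build_copy_buffer' and 'insert_rows' are made up for illustration, and the
cursor is only assumed to offer psycopg2-style 'copy_from' and 'executemany'
methods. The buffer builder returns None as soon as it meets a binary value,
and the caller then falls back to plain inserts.

```python
import io


def build_copy_buffer(rows):
    """Return a text buffer for COPY FROM, or None if a row holds binary data.

    Hypothetical helper, simplified: values are not escaped, so it is only
    safe for values without tabs, newlines or backslashes.
    """
    lines = []
    for row in rows:
        cells = []
        for value in row:
            if isinstance(value, (bytes, bytearray, memoryview)):
                # Binary value: signal the caller that COPY FROM is not usable.
                return None
            # NULL is written as \N, the default marker of COPY's text format.
            cells.append(r'\N' if value is None else str(value))
        lines.append('\t'.join(cells))
    return io.StringIO('\n'.join(lines))


def insert_rows(cursor, statement, table, columns, rows):
    """Use COPY FROM when possible, otherwise fall back to executemany."""
    buf = build_copy_buffer(rows)
    if buf is None:
        # Slower path, but the driver adapts binary parameters itself.
        cursor.executemany(statement, rows)
        return 'executemany'
    cursor.copy_from(buf, table, null=r'\N', columns=columns)
    return 'copy_from'
```

With this shape, rows without binary values keep the 'copy from' speedup and
only the batches containing binary data take the slower route.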

Antoine

On 04/12/2015 14:44, Rémi Cardona wrote:
> On 04/12/2015 13:44, Antoine Grigis wrote:
>> I noticed a change in the 'dataimport' module between CubicWeb (CW)
>> 3.19.6 and CW 3.20.9.
>> The '_create_copyfrom_buffer' function has been refactored and the
>> default return case removed.
>> In the former version, the function returned 'None' by default.
>> This is caught in both CW versions by the '_execmany_thread_copy_from'
>> function, which then executes the thread without the 'copy from' tabular
>> data speedup.
>>
>> Thus in CW 3.20.9, inserting a 'Binary' entity field through the
>> 'SQLGenObjectStore' raises an exception.
>>
>> Is this normal behavior? Is it still possible to insert a 'Binary' field
>> using a store and a schema with inlined relations?
>
> Hi Antoine,
>
> The dataimport module was made stricter in CubicWeb 3.20 and newer 
> releases. Its handling of binary objects is one such case.
>
> Do note that the current code (in 3.20 and all newer releases) relies 
> on psycopg2's copy_from() method [1] which only supports PostgreSQL's 
> "text" format [2].
>
> AIUI, this "text" format cannot be used to deal with binary data, 
> though we haven't given it much thought.
>
> If dealing with binary is possible, please let us know how. Patches 
> will be greatly appreciated (even in an unfinished state).
>
> Cheers,
>
> Rémi
>
> [1] http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from
> [2] http://www.postgresql.org/docs/9.4/static/sql-copy.html#AEN71994
>
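
On the "text" format question quoted above: as far as I can tell, a bytea
value can be written in COPY's text format by using PostgreSQL's hex input
syntax (\x followed by hex digits) with the backslash doubled, since COPY
treats a backslash as an escape character. Below is a small sketch of such
a converter. The function name 'bytea_to_copy_text' is hypothetical and I
have not tried it with the CubicWeb stores.

```python
def bytea_to_copy_text(data):
    """Encode bytes as a bytea field for COPY's text format (a sketch).

    The bytea hex input form is \\x<hexdigits>. The backslash is doubled
    here because COPY's text format unescapes it before bytea parsing.
    """
    return '\\\\x' + bytes(data).hex()
```

For example, the five bytes of b'Hello' become the field \\x48656c6c6f in
the buffer handed to copy_from().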
