[PATCH 08 of 11 saem_ref] [ark] Introduce data declaration for "ark" table and PL/pgSQL function for identifier generation

Denis Laxalde denis.laxalde at logilab.fr
Mon Feb 12 16:56:32 CET 2018


Denis Laxalde a écrit :
> # HG changeset patch
> # User Denis Laxalde <denis.laxalde at logilab.fr>
> # Date 1518430204 -3600
> #      Mon Feb 12 11:10:04 2018 +0100
> # Node ID cdc498fe733dd4496575c874140c10e21e66fd3f
> # Parent  43cf7062f7aa13401eb6e7dc839a621d72e8b7c8
> # Available At http://hg.logilab.org/review/cubes/saem_ref
> #              hg pull http://hg.logilab.org/review/cubes/saem_ref -r cdc498fe733d
> # EXP-Topic ark/postgres-generated
> [ark] Introduce data declaration for "ark" table and  PL/pgSQL function for identifier generation
> 
> We add two SQL scripts in schema/ directory, one to install the
> "pgcrypto" extension (needed for random()) and one containing "ark"
> table definition and a couple of function to generate unique and random
> identifiers:
> 
> * `gen_ark_name` cares about the "name" part of ARK identifiers and
>   returns a random text value ensuring structural constraints are
>   fulfilled (no vowels, no more than 3 consecutive letters, possibly a
>   prefix and a trailing control character)
> * `random_char` is a simpler helper producing a random string from a set
>   of characters
> * `gen_ark` is the main function that will be used to generate records
>   in the "ark" table; it loops by trying to INSERT a record with
>   randomly generated column values (only the "name" part for now) and
>   will return once there is no unique constraint violation
> 
> Python tests added in test_ark.py. As noted in comments of the test case
> class, we use CubicWebTC despite we don't need access to a CubicWeb
> database (nor schema): this is only by convenience to get the postgresql
> cluster setup (and teardown) at the cost for more expensive test
> runtime.
> 
> Related to extranet #46881387.
> 
> diff --git a/MANIFEST.in b/MANIFEST.in
> --- a/MANIFEST.in
> +++ b/MANIFEST.in
> @@ -1,6 +1,7 @@
>  recursive-include cubicweb_saem_ref/data *.gif *.jpg *.png *.ico *.css *.js *.xml
>  recursive-include test/data *.xml *.xsd *.rdf *.csv
>  include test/*.py test/data/bootstrap_cubes
> +include cubicweb_saem_ref/schema/*.sql
>  include cubicweb_saem_ref/i18n/*.po
>  include cubicweb_saem_ref/i18n/*.pot
>  include cubicweb_saem_ref/migration/*.py
> diff --git a/cubicweb_saem_ref/schema/_ext.postgres.sql b/cubicweb_saem_ref/schema/_ext.postgres.sql
> new file mode 100644
> --- /dev/null
> +++ b/cubicweb_saem_ref/schema/_ext.postgres.sql
> @@ -0,0 +1,1 @@
> +CREATE EXTENSION IF NOT EXISTS pgcrypto;
> diff --git a/cubicweb_saem_ref/schema/ark.postgres.sql b/cubicweb_saem_ref/schema/ark.postgres.sql
> new file mode 100644
> --- /dev/null
> +++ b/cubicweb_saem_ref/schema/ark.postgres.sql
> @@ -0,0 +1,73 @@
> +CREATE TABLE ark (
> +    name TEXT NOT NULL,
> +    qualifier TEXT NOT NULL DEFAULT '',
> +    CONSTRAINT qualified_name PRIMARY KEY (name, qualifier)
> +);
> +
> +-- Return a part of an ARK identifier of `len` length with `prefix` prefix and
> +-- `control_char` trailing character.
> +CREATE OR REPLACE
> +FUNCTION gen_ark_name(len INTEGER, prefix TEXT, control_char TEXT)
> +RETURNS text AS $$
> +DECLARE
> +    ark TEXT;
> +    letters TEXT;
> +    numbers TEXT;
> +    characters TEXT;
> +    next_char TEXT;
> +    tail TEXT;
> +BEGIN
> +    ASSERT len > length(prefix) + length(control_char);
> +    letters := 'bcdfghjkmnpqrstvwxz';
> +    numbers := '0123456789';
> +    characters := letters || numbers;
> +    -- Prefix ARK identifier.
> +    ark := prefix;
> +    LOOP
> +        tail := substring(ark, length(ark) - 2, 3);
> +        -- No more than 3 consecutive consonants.
> +        IF tail ~ '([a-z]){3}' THEN
> +            next_char := random_char(numbers);
> +        ELSE
> +            next_char := random_char(characters);
> +        END IF;
> +        ark := ark || next_char;
> +        IF length(ark) = len - 1 THEN
> +            EXIT;
> +        END IF;
> +    END LOOP;
> +    -- Add control character.
> +    RETURN ark || control_char;
> +END;

(Note to myself: numbers in this function are hardcoded, they should be
computed from the length of "prefix" and "control_char".)

> +$$ language 'plpgsql';
> +
> +-- Return an random character within given string
> +CREATE OR REPLACE FUNCTION random_char(chars TEXT) RETURNS text AS $$
> +BEGIN
> +    RETURN substr(chars, (random() * (length(chars) -1) + 1)::INTEGER, 1);
> +END;
> +$$ language 'plpgsql';
> +
> +-- Insert a record in "ark" table ensuring uniqueness constraints are
> +-- fulfilled.
> +CREATE OR REPLACE
> +FUNCTION gen_ark(len INTEGER, prefix TEXT, control_char TEXT)
> +RETURNS TEXT AS $$
> +DECLARE
> +    ark_name TEXT;
> +BEGIN
> +    LOOP
> +        BEGIN
> +            INSERT INTO ark
> +                VALUES (gen_ark_name(len, prefix, control_char), DEFAULT)
> +                RETURNING name INTO ark_name;
> +            RETURN ark_name;
> +        EXCEPTION
> +            WHEN unique_violation THEN
> +                -- Continue and try with another "name".
> +                RAISE WARNING 'unique_violation';
> +                NULL;
> +        END;
> +    END LOOP;
> +END;
> +$$ LANGUAGE plpgsql;



More information about the saem-devel mailing list