Creatio administration

Setting up bulk duplicate search

Bulk duplicate search is a third-party service for bulk deduplication of bpm’online section records.

Attention

Set up the global search service in ElasticSearch to ensure correct operation of the bulk duplicate search. More information about global search setup is available in the “Setting up the global search service (version 1.6)” article.
Basic knowledge of Docker software and Linux OS administration is required to set up the service.

We recommend implementing the operating-system-level virtualization (containerization) to ensure the correct operation of bulk duplicate search. The corresponding system server requirements are available in the “Containerization (operating-system-level virtualization) requirements” article.

Components of the bulk duplicate search service

Prerequisites:

1.Global search components. The list is available in the “Setting up the global search service (version 1.6)” article.

2.Components of the bulk duplicate search service. The list of components is available below.

The following component should be deployed on a separate server:

Mongodb – document-oriented DBMS.

The following components should be deployed on a single server:

dd-web-api – web service for communicating in bpm’online.

dd-data-service – internal service for communication with mongdb.

dd-duplicates-search-worker – duplicate search component.

dd-duplicates-deletion-worker – duplicate deletion component.

dd-duplicates-confirmation-worker – component that performs grouping and filtering of the detected duplicates taking into account their uniqueness.

dd-duplicates-cleaner – component for clearing the duplicates.

dd-deduplication-task-worker – component for setting the deduplication task.

dd-deduplication-preparation-worker – component for preparing the deduplication process, it generates queries for duplicate search according to the rules.

To set up the components, download the source files. Download files.

Algorithm for setting up the bulk duplicate search service

1.Set up global search. Read more >>>

2.Set up mongodb. Read more >>>

3.Download and unzip the necessary setup source files. Copy them to the computer with the installed “docker-compose” software. Download files.

4.Set up the environment variables. Read more >>>

5.Launch the containers. Read more >>>

6.Enable the bulk duplicate search function in bpm’online. Read more >>>

Deploying and setting up mongodb

The install and setup instruction is available on the mongodb official website. Go to the website.

Setting up the environment variables

The environment variables are contained in the compose.env file. Edit this file to set the values of variables.

Variable name

Details

Default value

DEDUPLICATION_TAG

Container tag on

dockerhub.

Version 1.0 if the bulk duplicate search service version is also 1.0.

Version 1.2 if the bulk duplicate search service version is also 1.2.

Version 1.3 if the bulk duplicate search service version is also 1.3.

Version 1.4 if the bulk duplicate search service version is also 1.4.

ELASTICSEARCH_URI

External path to elasticsearch

that was deployed on

the global search setup

step in bpm’online.

http://user:password@external.

elasticsearch:9200/

WEB_API_PORT

The port that will contain

the web-api component

of the bulk duplicate

search.

8086

REDIS_CACHE_OPTIONS_CONNECTION

_STRING

External path to redis

that was deployed on

the global search setup

step in bpm’online.

external.redis:6379,defaultDat

abase=0,syncTimeout=5000

RABBITMQ_URI

External path to rabbitmq

that was deployed on

the global search setup

step in bpm’online.

amqp://gs:gs@external.rebbitmq

:5672

MONGODB_CONNECTION_STRING

External path to mongodb.

mongodb://dd:dd@external.mongo

db:27017?uuidRepresentation=Standard

Launching the containers

To launch the containers, execute the following command:

cd compose # switch to the ‘compose’ folder
docker-compose up -d

Enabling the bulk duplicate search function in bpm’online

Settings on the bpm’online side

1.Set up the “Deduplication service api address” system setting value. Read more >>>

2.Set up the “Duplicates search” operation permissions. Read more >>>

3.Enable the bulk duplicate search functionality in bpm’online. Read more >>>

a.For MS SQL DBMS. Read more >>>

b.For Oracle DBMS. Read more >>>

c.For PostgreSql DBMS. Read more >>>

4.Restart the bpm’online application. Read more >>>

Modifying the “Deduplication service api address” system setting value

In the [System settings] section, find the “Deduplication service api address” system setting (DeduplicationWebApiUrl) and specify the URL to dd-web-api, string of the following type: http://external.deduplication-web-api:8086.

Adding the “Duplicates search” system operation

In the [Operation permissions] section, open the “Duplicates search” system operation (CanSearchDuplicates) and, on the [Operation permission] detail, provide permissions to the necessary users/roles, who will be able to perform the search for duplicates.

Enabling the bulk duplicate search functionality in bpm’online

You can enable the bulk duplicate search (Feature Toggle) functionality (Deduplication, ESDeduplication, BulkESDeduplication) by running a specific SQL script. The script will differ depending on the used DBMS – MS SQL, Oracle or Postgre SQL.

For MS SQL DBMS

DECLARE @DeduplicationFeature NVARCHAR(50) = 'Deduplication';
DECLARE @DeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @DeduplicationFeature);

DECLARE @ESDeduplicationFeature NVARCHAR(50) = 'ESDeduplication';
DECLARE @DeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @ESDeduplicationFeature);

DECLARE @Bulk_ES_DD_Feature NVARCHAR(50) = 'BulkESDeduplication';
DECLARE @Bulk_ES_DD_FeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id
FROM Feature WHERE Code =@Bulk_ES_DD_Feature);

DECLARE @allEmployeesId UNIQUEIDENTIFIER = 'A29A3BA5-4B0D-DE11-9A51-005056C00008';
IF (@DeduplicationFeatureId IS NOT NULL)
BEGIN
      IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @DeduplicationFeatureId)
       UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@DeduplicationFeatureId
      ELSE
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1',
@DeduplicationFeatureId)
END
ELSE
BEGIN
      SET @DeduplicationFeatureId = NEWID()
      INSERT INTO Feature (Id, Name, Code) VALUES
(@DeduplicationFeatureId, @DeduplicationFeature, @DeduplicationFeature)
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @DeduplicationFeatureId)
END

IF (@ESDeduplicationFeatureId IS NOT NULL)
BEGIN
      IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @ESDeduplicationFeatureId)
     UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId = @ESDeduplicationFeatureId
      ELSE
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END
ELSE
BEGIN
      SET @ESDeduplicationFeatureId = NEWID()
      INSERT INTO Feature (Id, Name, Code) VALUES (@ESDeduplicationFeatureId, @ESDeduplicationFeature, @ESDeduplicationFeature)
      INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END

IF (@Bulk_ES_DD_FeatureId IS NOT NULL)
BEGIN
      IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @Bulk_ES_DD_FeatureId)
       UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@Bulk_ES_DD_FeatureId
      ELSE
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState,FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END
ELSE
BEGIN
      SET @Bulk_ES_DD_FeatureId = NEWID()
      INSERT INTO Feature (Id, Name, Code) VALUES (@Bulk_ES_DD_FeatureId, @Bulk_ES_DD_Feature, @Bulk_ES_DD_Feature)
      INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END

For Oracle DBMS

CREATE OR REPLACE FUNCTION
generate_uuid return varchar2 is
       v_uuid varchar2(38);
       v_guid varchar2(32);
BEGIN
       v_guid := sys_guid();
       v_uuid := lower(
'{' ||
       substr(v_guid, 1,8) || '-' ||
       substr(v_guid, 9,4) || '-' ||
       substr(v_guid, 13,4) || '-' ||
       substr(v_guid, 17,4) || '-' ||
       substr(v_guid, 21) ||
       '}'
       );
       RETURN v_uuid;
END;
/
DECLARE
      DeduplicationFeature VARCHAR(50) := 'Deduplication';
      DeduplicationFeatureId VARCHAR(38) := NULL;
      DeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
      ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
      ESDeduplicationFeatureId VARCHAR(38) := NULL;
      ESDeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
      BulkESDeduplicationFeature VARCHAR(50) := 'BulkESDeduplication';
      BulkESDeduplicationFeatureId VARCHAR(38) := NULL;
      Bulk_ES_DD_GUID VARCHAR(38) := generate_uuid();
      allEmployeesId VARCHAR(38) := '{7F3B869F-34F3-4F20-AB4D-7480A5FDF647}';
      State_Deduplication VARCHAR(1) := NULL;
      State_ESDeduplication VARCHAR(1) := NULL;
      State_BulkESDeduplication VARCHAR(1) := NULL;
BEGIN
      SELECT MAX("Id") INTO DeduplicationFeatureId FROM "Feature" WHERE "Code" = DeduplicationFeature AND rownum = 1;
      SELECT MAX("Id") INTO ESDeduplicationFeatureId FROM "Feature" WHERE "Code" = ESDeduplicationFeature AND rownum = 1;
      SELECT MAX("Id") INTO BulkESDeduplicationFeatureId FROM "Feature" WHERE "Code" = BulkESDeduplicationFeature AND rownum = 1;
      SELECT MAX("FeatureState") INTO State_Deduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId AND rownum = 1;
      SELECT MAX("FeatureState") INTO State_ESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum = 1;
      SELECT MAX("FeatureState") INTO State_BulkESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum =1;
      IF (DeduplicationFeatureId IS NULL) THEN
       INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId_GUID, DeduplicationFeature, DeduplicationFeature);
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
       ELSE
       IF (State_Deduplication IS NOT NULL) THEN
       UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
       ELSE
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
       END IF;
END IF;
IF (ESDeduplicationFeatureId IS NULL) THEN
       INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId_GUID, ESDeduplicationFeature, ESDeduplicationFeature);
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
       ELSE
       IF (State_ESDeduplication IS NOT NULL) THEN
       UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
       ELSE
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
       END IF;
END IF;
IF (BulkESDeduplicationFeatureId IS NULL) THEN
       INSERT INTO "Feature" ("Id", "Name", "Code") VALUES(Bulk_ES_DD_GUID, BulkESDeduplicationFeature, BulkESDeduplicationFeature);
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
       ELSE
       IF (State_BulkESDeduplication IS NOT NULL) THEN
       UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = BulkESDeduplicationFeatureId;

       ELSE
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
       END IF;
      END IF;
END;

For PostgreSql DBMS

DO $$
DECLARE
         DeduplicationFeature VARCHAR(50) := 'Deduplication';
         DeduplicationFeatureId uuid;

         ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
         ESDeduplicationFeatureId uuid;

         Bulk_ES_DD_Feature VARCHAR(50) := 'BulkESDeduplication';
         Bulk_ES_DD_FeatureId uuid;

       allEmployeesId uuid := 'A29A3BA5-4B0D-DE11-9A51-005056C00008';

BEGIN

         SELECT "Id" INTO DeduplicationFeatureId FROM "Feature"
         WHERE "Code" = DeduplicationFeature
         LIMIT 1;
         IF (DeduplicationFeatureId IS NOT NULL)
            THEN
               IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId) THEN
                  UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
               ELSE
                  INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
               END IF;
         ELSE
            DeduplicationFeatureId := uuid_generate_v4();
            INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId, DeduplicationFeature, DeduplicationFeature);
            INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
         END IF;

         SELECT "Id" INTO ESDeduplicationFeatureId FROM "Feature"
         WHERE "Code" = ESDeduplicationFeature
         LIMIT 1;
         IF (ESDeduplicationFeatureId IS NOT NULL)
            THEN
               IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = ESDeduplicationFeatureId) THEN
                  UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
               ELSE
                  INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId);
               END IF;
            ELSE
               ESDeduplicationFeatureId := uuid_generate_v4()
               INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId, ESDeduplicationFeature, ESDeduplicationFeature);
               INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1',ESDeduplicationFeatureId);
            END IF;

         SELECT "Id" INTO Bulk_ES_DD_FeatureId FROM "Feature"
         WHERE "Code" = Bulk_ES_DD_Feature
         LIMIT 1;
         IF (Bulk_ES_DD_FeatureId IS NOT NULL
            THEN
               IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = Bulk_ES_DD_FeatureId) THEN
                  UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = Bulk_ES_DD_FeatureId;
               ELSE
                  INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
               END IF;
         ELSE
            Bulk_ES_DD_FeatureId := uuid_generate_v4();
            INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (Bulk_ES_DD_FeatureId, Bulk_ES_DD_Feature, Bulk_ES_DD_Feature);
            INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
         END IF;
END $$;

Restarting the bpm’online application

Clear redis, restart the bpm’online application and log in.

See also

Global search setup

Deduplication

Did you find this information useful?

How can we improve it?