Setting up bulk duplicate search
Bulk duplicate search is a third-party service for bulk deduplication of bpm’online section records.
Attention
Set up the global search service in ElasticSearch to ensure correct operation of the bulk duplicate search. More information about global search setup is available in the “Setting up the global search service (version 1.6)” article.
Basic knowledge of Docker software and Linux OS administration is required to set up the service.
We recommend implementing the operating-system-level virtualization (containerization) to ensure the correct operation of bulk duplicate search. The corresponding system server requirements are available in the “Containerization (operating-system-level virtualization) requirements” article.
Components of the bulk duplicate search service
Prerequisites:
1.Global search components. The list is available in the “Setting up the global search service (version 1.6)” article.
2.Components of the bulk duplicate search service. The list of components is available below.
The following component should be deployed on a separate server:
Mongodb – document-oriented DBMS.
The following components should be deployed on a single server:
dd-web-api – web service for communicating in bpm’online.
dd-data-service – internal service for communication with mongdb.
dd-duplicates-search-worker – duplicate search component.
dd-duplicates-deletion-worker – duplicate deletion component.
dd-duplicates-confirmation-worker – component that performs grouping and filtering of the detected duplicates taking into account their uniqueness.
dd-duplicates-cleaner – component for clearing the duplicates.
dd-deduplication-task-worker – component for setting the deduplication task.
dd-deduplication-preparation-worker – component for preparing the deduplication process, it generates queries for duplicate search according to the rules.
To set up the components, download the source files. Download files.
Algorithm for setting up the bulk duplicate search service
1.Set up global search. Read more >>>
2.Set up mongodb. Read more >>>
3.Download and unzip the necessary setup source files. Copy them to the computer with the installed “docker-compose” software. Download files.
4.Set up the environment variables. Read more >>>
5.Launch the containers. Read more >>>
6.Enable the bulk duplicate search function in bpm’online. Read more >>>
Deploying and setting up mongodb
The install and setup instruction is available on the mongodb official website. Go to the website.
Setting up the environment variables
The environment variables are contained in the compose.env file. Edit this file to set the values of variables.
Variable name | Details | Default value |
---|---|---|
DEDUPLICATION_TAG | Container tag on dockerhub. | Version 1.0 if the bulk duplicate search service version is also 1.0. Version 1.2 if the bulk duplicate search service version is also 1.2. Version 1.3 if the bulk duplicate search service version is also 1.3. Version 1.4 if the bulk duplicate search service version is also 1.4. |
ELASTICSEARCH_URI | External path to elasticsearch that was deployed on the global search setup step in bpm’online. | http://user:password@external. elasticsearch:9200/ |
WEB_API_PORT | The port that will contain the web-api component of the bulk duplicate search. | 8086 |
REDIS_CACHE_OPTIONS_CONNECTION _STRING | External path to redis that was deployed on the global search setup step in bpm’online. | external.redis:6379,defaultDat abase=0,syncTimeout=5000 |
RABBITMQ_URI | External path to rabbitmq that was deployed on the global search setup step in bpm’online. | amqp://gs:gs@external.rebbitmq :5672 |
MONGODB_CONNECTION_STRING | External path to mongodb. | mongodb://dd:dd@external.mongo db:27017?uuidRepresentation=Standard |
To launch the containers, execute the following command:
cd compose # switch to the ‘compose’ folder
docker-compose up -d
Enabling the bulk duplicate search function in bpm’online
Settings on the bpm’online side
1.Set up the “Deduplication service api address” system setting value. Read more >>>
2.Set up the “Duplicates search” operation permissions. Read more >>>
3.Enable the bulk duplicate search functionality in bpm’online. Read more >>>
a.For MS SQL DBMS. Read more >>>
b.For Oracle DBMS. Read more >>>
c.For PostgreSql DBMS. Read more >>>
4.Restart the bpm’online application. Read more >>>
Modifying the “Deduplication service api address” system setting value
In the [System settings] section, find the “Deduplication service api address” system setting (DeduplicationWebApiUrl) and specify the URL to dd-web-api, string of the following type: http://external.deduplication-web-api:8086.
Adding the “Duplicates search” system operation
In the [Operation permissions] section, open the “Duplicates search” system operation (CanSearchDuplicates) and, on the [Operation permission] detail, provide permissions to the necessary users/roles, who will be able to perform the search for duplicates.
Enabling the bulk duplicate search functionality in bpm’online
You can enable the bulk duplicate search (Feature Toggle) functionality (Deduplication, ESDeduplication, BulkESDeduplication) by running a specific SQL script. The script will differ depending on the used DBMS – MS SQL, Oracle or Postgre SQL.
DECLARE @DeduplicationFeature NVARCHAR(50) = 'Deduplication';
DECLARE @DeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @DeduplicationFeature);
DECLARE @ESDeduplicationFeature NVARCHAR(50) = 'ESDeduplication';
DECLARE @DeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @ESDeduplicationFeature);
DECLARE @Bulk_ES_DD_Feature NVARCHAR(50) = 'BulkESDeduplication';
DECLARE @Bulk_ES_DD_FeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id
FROM Feature WHERE Code =@Bulk_ES_DD_Feature);
DECLARE @allEmployeesId UNIQUEIDENTIFIER = 'A29A3BA5-4B0D-DE11-9A51-005056C00008';
IF (@DeduplicationFeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @DeduplicationFeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@DeduplicationFeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1',
@DeduplicationFeatureId)
END
ELSE
BEGIN
SET @DeduplicationFeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES
(@DeduplicationFeatureId, @DeduplicationFeature, @DeduplicationFeature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @DeduplicationFeatureId)
END
IF (@ESDeduplicationFeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @ESDeduplicationFeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId = @ESDeduplicationFeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END
ELSE
BEGIN
SET @ESDeduplicationFeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES (@ESDeduplicationFeatureId, @ESDeduplicationFeature, @ESDeduplicationFeature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END
IF (@Bulk_ES_DD_FeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @Bulk_ES_DD_FeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@Bulk_ES_DD_FeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState,FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END
ELSE
BEGIN
SET @Bulk_ES_DD_FeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES (@Bulk_ES_DD_FeatureId, @Bulk_ES_DD_Feature, @Bulk_ES_DD_Feature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END
CREATE OR REPLACE FUNCTION
generate_uuid return varchar2 is
v_uuid varchar2(38);
v_guid varchar2(32);
BEGIN
v_guid := sys_guid();
v_uuid := lower(
'{' ||
substr(v_guid, 1,8) || '-' ||
substr(v_guid, 9,4) || '-' ||
substr(v_guid, 13,4) || '-' ||
substr(v_guid, 17,4) || '-' ||
substr(v_guid, 21) ||
'}'
);
RETURN v_uuid;
END;
/
DECLARE
DeduplicationFeature VARCHAR(50) := 'Deduplication';
DeduplicationFeatureId VARCHAR(38) := NULL;
DeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
ESDeduplicationFeatureId VARCHAR(38) := NULL;
ESDeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
BulkESDeduplicationFeature VARCHAR(50) := 'BulkESDeduplication';
BulkESDeduplicationFeatureId VARCHAR(38) := NULL;
Bulk_ES_DD_GUID VARCHAR(38) := generate_uuid();
allEmployeesId VARCHAR(38) := '{7F3B869F-34F3-4F20-AB4D-7480A5FDF647}';
State_Deduplication VARCHAR(1) := NULL;
State_ESDeduplication VARCHAR(1) := NULL;
State_BulkESDeduplication VARCHAR(1) := NULL;
BEGIN
SELECT MAX("Id") INTO DeduplicationFeatureId FROM "Feature" WHERE "Code" = DeduplicationFeature AND rownum = 1;
SELECT MAX("Id") INTO ESDeduplicationFeatureId FROM "Feature" WHERE "Code" = ESDeduplicationFeature AND rownum = 1;
SELECT MAX("Id") INTO BulkESDeduplicationFeatureId FROM "Feature" WHERE "Code" = BulkESDeduplicationFeature AND rownum = 1;
SELECT MAX("FeatureState") INTO State_Deduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId AND rownum = 1;
SELECT MAX("FeatureState") INTO State_ESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum = 1;
SELECT MAX("FeatureState") INTO State_BulkESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum =1;
IF (DeduplicationFeatureId IS NULL) THEN
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId_GUID, DeduplicationFeature, DeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
ELSE
IF (State_Deduplication IS NOT NULL) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
END IF;
END IF;
IF (ESDeduplicationFeatureId IS NULL) THEN
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId_GUID, ESDeduplicationFeature, ESDeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
ELSE
IF (State_ESDeduplication IS NOT NULL) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
END IF;
END IF;
IF (BulkESDeduplicationFeatureId IS NULL) THEN
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES(Bulk_ES_DD_GUID, BulkESDeduplicationFeature, BulkESDeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
ELSE
IF (State_BulkESDeduplication IS NOT NULL) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = BulkESDeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
END IF;
END IF;
END;
DO $$
DECLARE
DeduplicationFeature VARCHAR(50) := 'Deduplication';
DeduplicationFeatureId uuid;
ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
ESDeduplicationFeatureId uuid;
Bulk_ES_DD_Feature VARCHAR(50) := 'BulkESDeduplication';
Bulk_ES_DD_FeatureId uuid;
allEmployeesId uuid := 'A29A3BA5-4B0D-DE11-9A51-005056C00008';
BEGIN
SELECT "Id" INTO DeduplicationFeatureId FROM "Feature"
WHERE "Code" = DeduplicationFeature
LIMIT 1;
IF (DeduplicationFeatureId IS NOT NULL)
THEN
IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
END IF;
ELSE
DeduplicationFeatureId := uuid_generate_v4();
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId, DeduplicationFeature, DeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
END IF;
SELECT "Id" INTO ESDeduplicationFeatureId FROM "Feature"
WHERE "Code" = ESDeduplicationFeature
LIMIT 1;
IF (ESDeduplicationFeatureId IS NOT NULL)
THEN
IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = ESDeduplicationFeatureId) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId);
END IF;
ELSE
ESDeduplicationFeatureId := uuid_generate_v4()
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId, ESDeduplicationFeature, ESDeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1',ESDeduplicationFeatureId);
END IF;
SELECT "Id" INTO Bulk_ES_DD_FeatureId FROM "Feature"
WHERE "Code" = Bulk_ES_DD_Feature
LIMIT 1;
IF (Bulk_ES_DD_FeatureId IS NOT NULL
THEN
IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = Bulk_ES_DD_FeatureId) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = Bulk_ES_DD_FeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
END IF;
ELSE
Bulk_ES_DD_FeatureId := uuid_generate_v4();
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (Bulk_ES_DD_FeatureId, Bulk_ES_DD_Feature, Bulk_ES_DD_Feature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
END IF;
END $$;
Restarting the bpm’online application
Clear redis, restart the bpm’online application and log in.
See also