Bulk duplicate search

PDF
Products
All Creatio products

Bulk duplicate search is a third-party service for bulk deduplication of Creatio section records.

Attention. Set up the global search service in ElasticSearch to ensure correct operation of the bulk duplicate search. Learn more about how to set up global search: Set up the global search service (version 2.0)

Basic knowledge of docker-compose and Linux OS administration is required to set up the bulk duplicate search service.

The bulk duplicate search service uses version control functionality. This instruction is universal for all service versions. Versions 1.0–1.5 are compatible with any Creatio versions that use the bulk duplicate search functionality. Version 2.0 is compatible with Creatio 7.15.4 and later.

Use the requirements calculator to check the server requirements.

Components of the bulk duplicate search service 

Prerequisites:

  1. Global search components. View the list: Set up the global search service (version 2.0).

  2. Components of the bulk duplicate search service. The list of components is available below.

MongoDB – document-oriented DBMS.

dd-web-api – web service for communicating in Creatio.

dd-data-service – internal service for communication with MongoDB.

dd-duplicates-search-worker – duplicate search component.

dd-duplicates-deletion-worker – duplicate deletion component.

dd-duplicates-confirmation-worker – component that performs grouping and filtering of the detected duplicates taking into account their uniqueness.

dd-duplicates-cleaner – component for clearing the duplicates.

dd-deduplication-task-worker – component for setting the deduplication task.

dd-deduplication-preparation-worker – component for preparing the deduplication process, generates queries for duplicate search according to the rules.

To set up the components, download the source files. Download files.

  1. Set up global search.

  2. Set up MongoDB.

  3. Download and extract the necessary setup source files. Copy them to the computer with the installed “docker-compose” software. Download files.

  4. Set up the environment variables.

  5. Launch the containers.

  6. Verify successful running of containers.

  7. Verify logging.

  8. Enable the bulk duplicate search function in Creatio.

Set up the environment variables 

The environment variables are contained in the compose.env file. Edit this file to set the values of variables.

Variable name

Details

Default value

ELASTICSEARCH_URI

IP address of the server where ElasticSearch was deployed on

the step of setting up the global search in Creatio.

http://user:password@external.elasticsearch-ip:9200/

Launch the containers 

To launch the containers, execute the following command:

cd compose # switch to the ‘compose’ folder
docker-compose up -d

Verify successful running of containers 

To view the list of all containers that are run, execute the following command in the console:

docker ps --filter "label=service=dd" -a --format "table {{.Names}}\t{{.Ports}}\t{{.Status}}\t{{.RunningFor}}"

The containers that are run will have an “Up” status.

Verify logging 

By default, logging is performed during the “stdout” container command execution. To view the last 100 records from the dd-data-service container, execute the following command:

docker logs --tail 100 dd-data-service

Enable the bulk duplicate search functionality in Creatio 

Settings on Creatio's end

  1. Set up the “Deduplication service api address” system setting value.
  2. Set up the “Duplicates search” operation permissions.
  3. Enable the bulk duplicate search functionality in Creatio. Note that this setting is different for different DBMS.
  4. Restart the Creatio application.

Modify the “Deduplication service api address” system setting value 

In the System settings section, find the “Deduplication service api address” (“DeduplicationWebApiUrl” code) system setting and specify the URL to dd-web-api, string of the following type: http://external.deduplication-web-api:8086.

Add the “Duplicates search” system operation 

In the Operation permissions section, open the “Duplicates search” (“CanSearchDuplicates” code) system operation and, on the Operation permission detail, provide permissions to the necessary users/roles, who will be able to perform the search for duplicates.

Enable the bulk duplicate search functionality in Creatio 

You can enable the bulk duplicate search (Feature Toggle) functionality (Deduplication, ESDeduplication, BulkESDeduplication) by running a specific SQL script. The script will differ depending on the used DBMS: Microsoft SQL, Oracle, or Postgre SQL.

For Microsoft SQL DBMS 

DECLARE @DeduplicationFeature NVARCHAR(50) = 'Deduplication';
DECLARE @DeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @DeduplicationFeature);

DECLARE @ESDeduplicationFeature NVARCHAR(50) = 'ESDeduplication';
DECLARE @ESDeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @ESDeduplicationFeature);

DECLARE @Bulk_ES_DD_Feature NVARCHAR(50) = 'BulkESDeduplication';
DECLARE @Bulk_ES_DD_FeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id 
FROM Feature WHERE Code =@Bulk_ES_DD_Feature);

DECLARE @allEmployeesId UNIQUEIDENTIFIER = 'A29A3BA5-4B0D-DE11-9A51-005056C00008';
IF (@DeduplicationFeatureId IS NOT NULL)
BEGIN
      IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @DeduplicationFeatureId)
       UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@DeduplicationFeatureId
      ELSE
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1',
@DeduplicationFeatureId)
END;
ELSE
BEGIN
      SET @DeduplicationFeatureId = NEWID()
      INSERT INTO Feature (Id, Name, Code) VALUES
(@DeduplicationFeatureId, @DeduplicationFeature, @DeduplicationFeature)
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @DeduplicationFeatureId)
END;

IF (@ESDeduplicationFeatureId IS NOT NULL)
BEGIN
      IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @ESDeduplicationFeatureId)
     UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId = @ESDeduplicationFeatureId
      ELSE
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END;
ELSE
BEGIN
      SET @ESDeduplicationFeatureId = NEWID()
      INSERT INTO Feature (Id, Name, Code) VALUES (@ESDeduplicationFeatureId, @ESDeduplicationFeature, @ESDeduplicationFeature)
      INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END;

IF (@Bulk_ES_DD_FeatureId IS NOT NULL)
BEGIN
      IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @Bulk_ES_DD_FeatureId)
       UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@Bulk_ES_DD_FeatureId
      ELSE
       INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState,FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END;
ELSE
BEGIN
      SET @Bulk_ES_DD_FeatureId = NEWID()
      INSERT INTO Feature (Id, Name, Code) VALUES (@Bulk_ES_DD_FeatureId, @Bulk_ES_DD_Feature, @Bulk_ES_DD_Feature)
      INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END;

For Oracle DBMS 

CREATE OR REPLACE FUNCTION
generate_uuid return varchar2 is
       v_uuid varchar2(38);
       v_guid varchar2(32);
BEGIN
       v_guid := sys_guid();
       v_uuid := lower(
'{' ||
       substr(v_guid, 1,8) || '-' ||
       substr(v_guid, 9,4) || '-' ||
       substr(v_guid, 13,4) || '-' ||
       substr(v_guid, 17,4) || '-' ||
       substr(v_guid, 21) ||
       '}'
       );
       RETURN v_uuid;
END;
/
DECLARE
      DeduplicationFeature VARCHAR(50) := 'Deduplication';
      DeduplicationFeatureId VARCHAR(38) := NULL;
      DeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
      ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
      ESDeduplicationFeatureId VARCHAR(38) := NULL;
      ESDeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
      BulkESDeduplicationFeature VARCHAR(50) := 'BulkESDeduplication';
      BulkESDeduplicationFeatureId VARCHAR(38) := NULL;
      Bulk_ES_DD_GUID VARCHAR(38) := generate_uuid();
      allEmployeesId VARCHAR(38) := '{7F3B869F-34F3-4F20-AB4D-7480A5FDF647}';
      State_Deduplication VARCHAR(1) := NULL;
      State_ESDeduplication VARCHAR(1) := NULL;
      State_BulkESDeduplication VARCHAR(1) := NULL;
BEGIN
      SELECT MAX("Id") INTO DeduplicationFeatureId FROM "Feature" WHERE "Code" = DeduplicationFeature AND rownum = 1;
      SELECT MAX("Id") INTO ESDeduplicationFeatureId FROM "Feature" WHERE "Code" = ESDeduplicationFeature AND rownum = 1;
      SELECT MAX("Id") INTO BulkESDeduplicationFeatureId FROM "Feature" WHERE "Code" = BulkESDeduplicationFeature AND rownum = 1;
      SELECT MAX("FeatureState") INTO State_Deduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId AND rownum = 1;
      SELECT MAX("FeatureState") INTO State_ESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum = 1;
      SELECT MAX("FeatureState") INTO State_BulkESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum =1;
      IF (DeduplicationFeatureId IS NULL) THEN
       INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId_GUID, DeduplicationFeature, DeduplicationFeature);
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
       ELSE
       IF (State_Deduplication IS NOT NULL) THEN
       UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
       ELSE
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
       END IF;
END IF;
IF (ESDeduplicationFeatureId IS NULL) THEN
       INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId_GUID, ESDeduplicationFeature, ESDeduplicationFeature);
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
       ELSE
       IF (State_ESDeduplication IS NOT NULL) THEN
       UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
       ELSE
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
       END IF;
END IF;
IF (BulkESDeduplicationFeatureId IS NULL) THEN
       INSERT INTO "Feature" ("Id", "Name", "Code") VALUES(Bulk_ES_DD_GUID, BulkESDeduplicationFeature, BulkESDeduplicationFeature);
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
       ELSE
       IF (State_BulkESDeduplication IS NOT NULL) THEN
       UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = BulkESDeduplicationFeatureId;

       ELSE
       INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
       END IF;
      END IF;
END;

For PostgreSQL DBMS 

DO $$

DECLARE
	DeduplicationFeature VARCHAR(50) := 'Deduplication';
	DeduplicationFeatureId uuid;

	ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
	ESDeduplicationFeatureId uuid;

	Bulk_ES_DD_Feature VARCHAR(50) := 'BulkESDeduplication';
	Bulk_ES_DD_FeatureId uuid;

  allEmployeesId uuid := 'A29A3BA5-4B0D-DE11-9A51-005056C00008';

BEGIN

	SELECT "Id" INTO DeduplicationFeatureId FROM "Feature"
	WHERE "Code" = DeduplicationFeature
	LIMIT 1;
	IF (DeduplicationFeatureId IS NOT NULL)
		THEN
			IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId) THEN
				UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
			ELSE
				INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
			END IF;
	ELSE
		DeduplicationFeatureId := uuid_generate_v4();
		INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId, DeduplicationFeature, DeduplicationFeature);
		INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
	END IF;

	SELECT "Id" INTO ESDeduplicationFeatureId FROM "Feature"
	WHERE "Code" = ESDeduplicationFeature
	LIMIT 1;
	IF (ESDeduplicationFeatureId IS NOT NULL)
		THEN
			IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = ESDeduplicationFeatureId) THEN
				UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
			ELSE
				INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId);
			END IF;
	ELSE
		ESDeduplicationFeatureId := uuid_generate_v4();
		INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId, ESDeduplicationFeature, ESDeduplicationFeature);
		INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId);
	END IF;

	SELECT "Id" INTO Bulk_ES_DD_FeatureId FROM "Feature"
	WHERE "Code" = Bulk_ES_DD_Feature
	LIMIT 1;
	IF (Bulk_ES_DD_FeatureId IS NOT NULL)
		THEN
			IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = Bulk_ES_DD_FeatureId) THEN
				UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = Bulk_ES_DD_FeatureId;
			ELSE
				INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
			END IF;
	ELSE
		Bulk_ES_DD_FeatureId := uuid_generate_v4();
		INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (Bulk_ES_DD_FeatureId, Bulk_ES_DD_Feature, Bulk_ES_DD_Feature);
		INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
	END IF;
END $$;

Restart the Creatio application 

Clear Redis, restart the Creatio application and log in.

Recommended service maintenance operations 

We recommend performing MongoDB backup once a day to maintain the functionality of the service and enable restoring of data, e. g., in case of a power outage.