Bulk duplicate search
Use bulk duplicate search to deduplicate Creatio section records.
Set up the global search service in ElasticSearch to ensure correct operation of the bulk duplicate search. Learn more in a separate article: Global search. You need repository access to set up the current version of the global search service. Contact Creatio support to verify your license and gain access to the repository.
Basic knowledge of kubernetes or docker-compose and Linux administration is required to set up bulk duplicate search service.
You can deploy the components of global duplicate search using Kubernetes orchestrator and Helm package manager or Docker.
Use the requirements calculator to check the server requirements.
Set up the bulk duplicate search service using Kubernetes
To set up the service, download the source files. Download files.
To install the service:
-
Set up the target environment:
- Kubernetes cluster. Learn more about setting up and managing the cluster in the Kubernetes documentation.
- Helm package manager. Learn more about installing the package manager in the Helm documentation.
- Global search service via Kubernetes. Learn more in a separate article: Global search.
-
Unpack the values-onsite.yaml file from the source file archive and save the file to the same directory as the archive.
-
Open the file. View the main parameters the file uses in the table below. To optimize the load on Redis and RabbitMQ, we recommend using the same service instances for both global search and bulk duplicate search. Bulk duplicate search always uses the same ElasticSearch service as global search.
noteBy default, the Mongodb service is deployed together with the duplicate search service. If you need to change the Mongodb deployment parameters, unpack the source files from the archive and change the parameters in the mongodb section of the values-onsite.yaml file.
-
Set up the connection to services you installed previously, for example, when setting up global search, in the values-onsite.yaml source file.
-
For Redis:
Set up the connection to the previously installed Redis. To do this, specify the connection parameters in the
global.redis
section of the values-onsite.yaml file.# Multi pods global parameters
global:
# Redis server connection parameters
redis:
host: [host]
port: [port]
database: [database]Where
[host] is the address of the Redis server.
[port] is the connection port of the Redis server.
[database] is the name of the Redis database.
-
For RabbitMQ:
Set up the connection to the previously installed RabbitMQ. To do this, specify the connection parameters in the
global.rabbitmq
section of the values-onsite.yaml file.# Multi pods global parameters
global:
# RabbitMQ connection parameters
rabbitmq:
host: [host]
vhost: [vhost]
port: [port]
user: [user]
password: [password]Where
[host] is the address of the RabbitMQ service.
[vhost] is the virtual host address of the RabbitMQ service.
[port] is the amqp connection port of RabbitMQ.
[user] is the RabbitMQ user.
[password] is the RabbitMQ user password.
-
For Mongodb:
-
Disable the Mongodb installation. To do this, specify
enabled: false
in themongodb
section of the values-onsite.yaml file.mongodb:
enabled: false -
Set up the connection to external mongodb. To do this, specify the connection parameters in the
global.mongodb
section of the values-onsite.yaml file.# Multi pods global parameters
global:
# Deduplication database connection parameters
mongodb:
host: [host]
port: [port]
user: [user]
password: [password]Where
[host] is the address of the Mongodb service.
[port] is the port to connect to the service.
[user] is the Mongodb user on whose behalf to connect.
[password] is the Mongodb user password.
-
-
For ElasticSearch:
Set up the connection to the previously installed ElasticSearch. To do this, specify the connection parameters in the
global.elasticsearch
section of the values-onsite.yaml file.# Multi pods global parameters
global:
# Elastic search connection parameters
elasticsearch:
protocol: [protocol]
host: [host]
port: [port]
path: [path]
user: [user]
password: [password]Where
[user] is the ElasticSearch user.
[password] is the ElasticSearch user password.
[port] is the port to connect to ElasticSearch.
[path] is the path parameter of the ElasticSearch service (by default, ).
[protocol] is the ElasticSearch connection protocol (by default, http).
[host] is the address of the ElasticSearch service.
-
-
Run the
helm install gs -f values-onsite.yaml deduplication.tgz
command. As a result, Helm will install the bulk duplicate search service and selected dependencies.noteBy default, Helm deploys services with NodePort type.
The main parameters of the global search service the values.yaml file uses.
Parameter | Parameter description |
---|---|
duplicatesSearchWorker.maxDuplicatesPerRecord | The maximum allowable number of duplicates for a single record. |
log4Net | Logging settings. |
global.elasticsearch | ElasticSearch connection parameters. |
global.rabbitmq | RabbitMQ connection settings. |
global.mongodb | The connection settings of the duplicate search service's internal base. |
global.db | The connection settings of the global search service's internal service database. |
global.redis | Redis connection settings. |
Set up the bulk duplicate search service in Docker
Components of the bulk duplicate search service
Prerequisites:
-
Global search components. View the index in a separate article: Global search.
-
Components of the bulk duplicate search service. View the component index below.
Mongodb. Document-oriented DBMS.
dd-web-api. Web service for communication in Creatio.
dd-data-service. Internal service for communication with mongodb.
dd-duplicates-search-worker. Duplicate search component.
dd-duplicates-deletion-worker. Component for targeted deletion of duplicates.
dd-duplicates-confirmation-worker. Component that groups and filters found duplicates, taking into account their uniqueness.
dd-duplicates-cleaner. Component that clears duplicates.
dd-deduplication-task-worker. Component that sets the deduplication task.
dd-deduplication-preparation-worker. Component that prepares the deduplication process. Generates duplicate search queries according to the rules.
dd-deduplication-task-diagnostic-worker. Component that controls the execution of the duplicate search task.
To set up the components, download the source files. Download files.
- Deploy and set up Creatio global search.
- Download and unpack the necessary source files. Copy them to the computer that has docker, docker-compose software installed. Download files.
- Set up the environment variables.
- Launch the containers.
- Verify that containers are running successfully.
- Verify logging.
- Enable the bulk duplicate search functionality in Creatio.
Set up the environment variables
The compose/.env file stores the environment variables. Edit this file to set the variables.
Variable name | Details | Default value |
---|---|---|
ELASTICSEARCH_URI | The IP address of the server where you deployed ElasticSearch as part of Creatio global search setup. |
|
Launch the containers
To launch the containers, run the following command:
cd compose # go to the compose directory
docker-compose up -d
Verify that containers ran successfully
To view the list of all running containers, run the following command at the console:
docker ps --filter "label=service=dd" -a --format "table {{.Names}}\t{{.Ports}}\t{{.Status}}\t{{.RunningFor}}"
The running containers have an "Up" status.
Verify logging
By default, logging is performed during the "stdout" container command execution. To view the last 100 records from the dd-data-service container, run the following command:
docker logs --tail 100 dd-data-service
Update the bulk duplicate search version
To update the bulk duplicate search version 2.0 to version 3.0, take the following steps.
- Back up Creatio duplicate data. To do this, run the
/api/snapshot/backup/gzip/{indexName}
command via thehttp://[server IP address]:8086/api
swagger web-api. {indexName} is the name (last 64 characters) of the index in the "Global search url address" ("GlobalSearchUrl" code) system setting. - Delete the version 2.0 of docker volumes. To do this, open the docker-compose directory that contains the version 2.0 files and run the docker-compose down -v command.
- Install the version 3.0 of the bulk duplicate search service.
- Upload the duplicate data retrieved on step 1 to the service. To do this, run the
/api/snapshot/restore/gzip
command in the new service version via swagger.
Enable the bulk duplicate search functionality in Creatio
Take the following steps in Creatio.
- Configure the "Deduplication service api address" system setting.
- Set up the "Duplicates search" operation permissions.
- Enable the bulk duplicate search functionality in Creatio. Note that this setting is DBMS-specific.
- Restart the Creatio application.
Configure the "Deduplication service api address" system setting
Go to the System settings section, open the "Deduplication service api address" ("DeduplicationWebApiUrl" code) system setting, and specify the URL to dd-web-api. Use a string of the following type: http://external.deduplication-web-api:8086
.
Set up the "Duplicates search" operation permissions
Go to the Operation permissions section, open the "Duplicates search" ("CanSearchDuplicates" code) system operation, and, on the Operation permission detail, grant permissions to the necessary users/roles, who can search for duplicates.
Enable the bulk duplicate search functionality in Creatio
Toggle the bulk duplicate search (Deduplication, ESDeduplication, BulkESDeduplication) functionality by running an SQL script. The script depends on the DBMS: Microsoft SQL, Oracle, or PostgreSQL.
- For Microsoft SQL DBMS
- For Oracle DBMS
- For PostgreSQL DBMS
DECLARE @DeduplicationFeature NVARCHAR(50) = 'Deduplication';
DECLARE @DeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @DeduplicationFeature);
DECLARE @ESDeduplicationFeature NVARCHAR(50) = 'ESDeduplication';
DECLARE @ESDeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @ESDeduplicationFeature);
DECLARE @Bulk_ES_DD_Feature NVARCHAR(50) = 'BulkESDeduplication';
DECLARE @Bulk_ES_DD_FeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id
FROM Feature WHERE Code =@Bulk_ES_DD_Feature);
DECLARE @allEmployeesId UNIQUEIDENTIFIER = 'A29A3BA5-4B0D-DE11-9A51-005056C00008';
IF (@DeduplicationFeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @DeduplicationFeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@DeduplicationFeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1',
@DeduplicationFeatureId)
END;
ELSE
BEGIN
SET @DeduplicationFeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES
(@DeduplicationFeatureId, @DeduplicationFeature, @DeduplicationFeature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @DeduplicationFeatureId)
END;
IF (@ESDeduplicationFeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @ESDeduplicationFeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId = @ESDeduplicationFeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END;
ELSE
BEGIN
SET @ESDeduplicationFeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES (@ESDeduplicationFeatureId, @ESDeduplicationFeature, @ESDeduplicationFeature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END;
IF (@Bulk_ES_DD_FeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @Bulk_ES_DD_FeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@Bulk_ES_DD_FeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState,FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END;
ELSE
BEGIN
SET @Bulk_ES_DD_FeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES (@Bulk_ES_DD_FeatureId, @Bulk_ES_DD_Feature, @Bulk_ES_DD_Feature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END;
CREATE OR REPLACE FUNCTION
generate_uuid return varchar2 is
v_uuid varchar2(38);
v_guid varchar2(32);
BEGIN
v_guid := sys_guid();
v_uuid := lower(
'{' ||
substr(v_guid, 1,8) || '-' ||
substr(v_guid, 9,4) || '-' ||
substr(v_guid, 13,4) || '-' ||
substr(v_guid, 17,4) || '-' ||
substr(v_guid, 21) ||
'}'
);
RETURN v_uuid;
END;
/
DECLARE
DeduplicationFeature VARCHAR(50) := 'Deduplication';
DeduplicationFeatureId VARCHAR(38) := NULL;
DeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
ESDeduplicationFeatureId VARCHAR(38) := NULL;
ESDeduplicationFeatureId_GUID VARCHAR(38) := generate_uuid();
BulkESDeduplicationFeature VARCHAR(50) := 'BulkESDeduplication';
BulkESDeduplicationFeatureId VARCHAR(38) := NULL;
Bulk_ES_DD_GUID VARCHAR(38) := generate_uuid();
allEmployeesId VARCHAR(38) := '{7F3B869F-34F3-4F20-AB4D-7480A5FDF647}';
State_Deduplication VARCHAR(1) := NULL;
State_ESDeduplication VARCHAR(1) := NULL;
State_BulkESDeduplication VARCHAR(1) := NULL;
BEGIN
SELECT MAX("Id") INTO DeduplicationFeatureId FROM "Feature" WHERE "Code" = DeduplicationFeature AND rownum = 1;
SELECT MAX("Id") INTO ESDeduplicationFeatureId FROM "Feature" WHERE "Code" = ESDeduplicationFeature AND rownum = 1;
SELECT MAX("Id") INTO BulkESDeduplicationFeatureId FROM "Feature" WHERE "Code" = BulkESDeduplicationFeature AND rownum = 1;
SELECT MAX("FeatureState") INTO State_Deduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId AND rownum = 1;
SELECT MAX("FeatureState") INTO State_ESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum = 1;
SELECT MAX("FeatureState") INTO State_BulkESDeduplication FROM "AdminUnitFeatureState" WHERE "FeatureId" = BulkESDeduplicationFeatureId AND rownum =1;
IF (DeduplicationFeatureId IS NULL) THEN
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId_GUID, DeduplicationFeature, DeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
ELSE
IF (State_Deduplication IS NOT NULL) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId_GUID);
END IF;
END IF;
IF (ESDeduplicationFeatureId IS NULL) THEN
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId_GUID, ESDeduplicationFeature, ESDeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
ELSE
IF (State_ESDeduplication IS NOT NULL) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId_GUID);
END IF;
END IF;
IF (BulkESDeduplicationFeatureId IS NULL) THEN
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES(Bulk_ES_DD_GUID, BulkESDeduplicationFeature, BulkESDeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
ELSE
IF (State_BulkESDeduplication IS NOT NULL) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = BulkESDeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_GUID);
END IF;
END IF;
END;
DO $$
DECLARE
DeduplicationFeature VARCHAR(50) := 'Deduplication';
DeduplicationFeatureId uuid;
ESDeduplicationFeature VARCHAR(50) := 'ESDeduplication';
ESDeduplicationFeatureId uuid;
Bulk_ES_DD_Feature VARCHAR(50) := 'BulkESDeduplication';
Bulk_ES_DD_FeatureId uuid;
allEmployeesId uuid := 'A29A3BA5-4B0D-DE11-9A51-005056C00008';
BEGIN
SELECT "Id" INTO DeduplicationFeatureId FROM "Feature"
WHERE "Code" = DeduplicationFeature
LIMIT 1;
IF (DeduplicationFeatureId IS NOT NULL)
THEN
IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = DeduplicationFeatureId) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = DeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
END IF;
ELSE
DeduplicationFeatureId := uuid_generate_v4();
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (DeduplicationFeatureId, DeduplicationFeature, DeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', DeduplicationFeatureId);
END IF;
SELECT "Id" INTO ESDeduplicationFeatureId FROM "Feature"
WHERE "Code" = ESDeduplicationFeature
LIMIT 1;
IF (ESDeduplicationFeatureId IS NOT NULL)
THEN
IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = ESDeduplicationFeatureId) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = ESDeduplicationFeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId);
END IF;
ELSE
ESDeduplicationFeatureId := uuid_generate_v4();
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (ESDeduplicationFeatureId, ESDeduplicationFeature, ESDeduplicationFeature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', ESDeduplicationFeatureId);
END IF;
SELECT "Id" INTO Bulk_ES_DD_FeatureId FROM "Feature"
WHERE "Code" = Bulk_ES_DD_Feature
LIMIT 1;
IF (Bulk_ES_DD_FeatureId IS NOT NULL)
THEN
IF EXISTS (SELECT * FROM "AdminUnitFeatureState" WHERE "FeatureId" = Bulk_ES_DD_FeatureId) THEN
UPDATE "AdminUnitFeatureState" SET "FeatureState" = 1 WHERE "FeatureId" = Bulk_ES_DD_FeatureId;
ELSE
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
END IF;
ELSE
Bulk_ES_DD_FeatureId := uuid_generate_v4();
INSERT INTO "Feature" ("Id", "Name", "Code") VALUES (Bulk_ES_DD_FeatureId, Bulk_ES_DD_Feature, Bulk_ES_DD_Feature);
INSERT INTO "AdminUnitFeatureState" ("SysAdminUnitId", "FeatureState", "FeatureId") VALUES (allEmployeesId, '1', Bulk_ES_DD_FeatureId);
END IF;
END $$;
Restart the Creatio application
Clear redis, restart the Creatio application and log in.
Recommended operations for the service functioning
We recommend performing mongodb backup once a day to support the functionality of the service and enable restoring of data, e. g., in case of electricity breakdowns.