Skip to main content
Version: 8.0All Creatio products

Bulk duplicate search

Use bulk duplicate search to deduplicate Creatio section records.

Important

Set up the global search service in ElasticSearch to ensure correct operation of the bulk duplicate search. Learn more in a separate article: Global search. You need repository access to set up the current version of the global search service. Contact Creatio support to verify your license and gain access to the repository.

Basic knowledge of kubernetes or docker-compose and Linux administration is required to set up bulk duplicate search service.

You can deploy the components of global duplicate search using Kubernetes orchestrator and Helm package manager or Docker.

Use the requirements calculator to check the server requirements.

Set up the bulk duplicate search service using Kubernetes

To set up the service, download the source files. Download files.

To install the service:

  1. Set up the target environment:

    1. Kubernetes cluster. Learn more about setting up and managing the cluster in the Kubernetes documentation.
    2. Helm package manager. Learn more about installing the package manager in the Helm documentation.
    3. Global search service via Kubernetes. Learn more in a separate article: Global search.
  2. Unpack the values-onsite.yaml file from the source file archive and save the file to the same directory as the archive.

  3. Open the file. View the main parameters the file uses in the table below. To optimize the load on Redis and RabbitMQ, we recommend using the same service instances for both global search and bulk duplicate search. Bulk duplicate search always uses the same ElasticSearch service as global search.

    note

    By default, the Mongodb service is deployed together with the duplicate search service. If you need to change the Mongodb deployment parameters, unpack the source files from the archive and change the parameters in the mongodb section of the values-onsite.yaml file.

  4. Set up the connection to services you installed previously, for example, when setting up global search, in the values-onsite.yaml source file.

    • For Redis:

      Set up the connection to the previously installed Redis. To do this, specify the connection parameters in the global.redis section of the values-onsite.yaml file.

      # Multi pods global parameters
      global:
      # Redis server connection parameters
      redis:
      host: [host]
      port: [port]
      database: [database]

      Where

      [host] is the address of the Redis server.

      [port] is the connection port of the Redis server.

      [database] is the name of the Redis database.

    • For RabbitMQ:

      Set up the connection to the previously installed RabbitMQ. To do this, specify the connection parameters in the global.rabbitmq section of the values-onsite.yaml file.

      # Multi pods global parameters
      global:
      # RabbitMQ connection parameters
      rabbitmq:
      host: [host]
      vhost: [vhost]
      port: [port]
      user: [user]
      password: [password]

      Where

      [host] is the address of the RabbitMQ service.

      [vhost] is the virtual host address of the RabbitMQ service.

      [port] is the amqp connection port of RabbitMQ.

      [user] is the RabbitMQ user.

      [password] is the RabbitMQ user password.

    • For Mongodb:

      1. Disable the Mongodb installation. To do this, specify enabled: false in the mongodb section of the values-onsite.yaml file.

        mongodb:
        enabled: false
      2. Set up the connection to external mongodb. To do this, specify the connection parameters in the global.mongodb section of the values-onsite.yaml file.

        # Multi pods global parameters
        global:
        # Deduplication database connection parameters
        mongodb:
        host: [host]
        port: [port]
        user: [user]
        password: [password]

        Where

        [host] is the address of the Mongodb service.

        [port] is the port to connect to the service.

        [user] is the Mongodb user on whose behalf to connect.

        [password] is the Mongodb user password.

    • For ElasticSearch:

      Set up the connection to the previously installed ElasticSearch. To do this, specify the connection parameters in the global.elasticsearch section of the values-onsite.yaml file.

      # Multi pods global parameters
      global:
      # Elastic search connection parameters
      elasticsearch:
      protocol: [protocol]
      host: [host]
      port: [port]
      path: [path]
      user: [user]
      password: [password]

      Where

      [user] is the ElasticSearch user.

      [password] is the ElasticSearch user password.

      [port] is the port to connect to ElasticSearch.

      [path] is the path parameter of the ElasticSearch service (by default, ).

      [protocol] is the ElasticSearch connection protocol (by default, http).

      [host] is the address of the ElasticSearch service.

  5. Run the helm install gs -f values-onsite.yaml deduplication.tgz command. As a result, Helm will install the bulk duplicate search service and selected dependencies.

    note

    By default, Helm deploys services with NodePort type.

The main parameters of the global search service the values.yaml file uses.

Parameter

Parameter description

duplicatesSearchWorker.maxDuplicatesPerRecord

The maximum allowable number of duplicates for a single record.

log4Net

Logging settings.

global.elasticsearch

ElasticSearch connection parameters.

global.rabbitmq

RabbitMQ connection settings.

global.mongodb

The connection settings of the duplicate search service's internal base.

global.db

The connection settings of the global search service's internal service database.

global.redis

Redis connection settings.

Set up the bulk duplicate search service in Docker

Components of the bulk duplicate search service

Prerequisites:

  1. Global search components. View the index in a separate article: Global search.

  2. Components of the bulk duplicate search service. View the component index below.

    Mongodb. Document-oriented DBMS.

    dd-web-api. Web service for communication in Creatio.

    dd-data-service. Internal service for communication with mongodb.

    dd-duplicates-search-worker. Duplicate search component.

    dd-duplicates-deletion-worker. Component for targeted deletion of duplicates.

    dd-duplicates-confirmation-worker. Component that groups and filters found duplicates, taking into account their uniqueness.

    dd-duplicates-cleaner. Component that clears duplicates.

    dd-deduplication-task-worker. Component that sets the deduplication task.

    dd-deduplication-preparation-worker. Component that prepares the deduplication process. Generates duplicate search queries according to the rules.

    dd-deduplication-task-diagnostic-worker. Component that controls the execution of the duplicate search task.

To set up the components, download the source files. Download files.

  1. Deploy and set up Creatio global search.
  2. Download and unpack the necessary source files. Copy them to the computer that has docker, docker-compose software installed. Download files.
  3. Set up the environment variables.
  4. Launch the containers.
  5. Verify that containers are running successfully.
  6. Verify logging.
  7. Enable the bulk duplicate search functionality in Creatio.

Set up the environment variables

The compose/.env file stores the environment variables. Edit this file to set the variables.

Variable name

Details

Default value

ELASTICSEARCH_URI

The IP address of the server where you deployed ElasticSearch as part of Creatio global search setup.

http://user:password@external.elasticsearch-ip:9200/

Launch the containers

To launch the containers, run the following command:

cd compose # go to the compose directory
docker-compose up -d

Verify that containers ran successfully

To view the list of all running containers, run the following command at the console:

docker ps --filter "label=service=dd" -a --format "table {{.Names}}\t{{.Ports}}\t{{.Status}}\t{{.RunningFor}}"

The running containers have an "Up" status.

Verify logging

By default, logging is performed during the "stdout" container command execution. To view the last 100 records from the dd-data-service container, run the following command:

docker logs --tail 100 dd-data-service

Update the bulk duplicate search version

To update the bulk duplicate search version 2.0 to version 3.0, take the following steps.

  1. Back up Creatio duplicate data. To do this, run the /api/snapshot/backup/gzip/{indexName} command via the http://[server IP address]:8086/api swagger web-api. {indexName} is the name (last 64 characters) of the index in the "Global search url address" ("GlobalSearchUrl" code) system setting.
  2. Delete the version 2.0 of docker volumes. To do this, open the docker-compose directory that contains the version 2.0 files and run the docker-compose down -v command.
  3. Install the version 3.0 of the bulk duplicate search service.
  4. Upload the duplicate data retrieved on step 1 to the service. To do this, run the /api/snapshot/restore/gzip command in the new service version via swagger.

Enable the bulk duplicate search functionality in Creatio

Take the following steps in Creatio.

  1. Configure the "Deduplication service api address" system setting.
  2. Set up the "Duplicates search" operation permissions.
  3. Enable the bulk duplicate search functionality in Creatio. Note that this setting is DBMS-specific.
  4. Restart the Creatio application.

Configure the "Deduplication service api address" system setting

Go to the System settings section, open the "Deduplication service api address" ("DeduplicationWebApiUrl" code) system setting, and specify the URL to dd-web-api. Use a string of the following type: http://external.deduplication-web-api:8086.

Set up the "Duplicates search" operation permissions

Go to the Operation permissions section, open the "Duplicates search" ("CanSearchDuplicates" code) system operation, and, on the Operation permission detail, grant permissions to the necessary users/roles, who can search for duplicates.

Enable the bulk duplicate search functionality in Creatio

Toggle the bulk duplicate search (Deduplication, ESDeduplication, BulkESDeduplication) functionality by running an SQL script. The script depends on the DBMS: Microsoft SQL, Oracle, or PostgreSQL.

DECLARE @DeduplicationFeature NVARCHAR(50) = 'Deduplication';
DECLARE @DeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @DeduplicationFeature);

DECLARE @ESDeduplicationFeature NVARCHAR(50) = 'ESDeduplication';
DECLARE @ESDeduplicationFeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id FROM Feature WHERE Code = @ESDeduplicationFeature);

DECLARE @Bulk_ES_DD_Feature NVARCHAR(50) = 'BulkESDeduplication';
DECLARE @Bulk_ES_DD_FeatureId UNIQUEIDENTIFIER = (SELECT TOP 1 Id
FROM Feature WHERE Code =@Bulk_ES_DD_Feature);

DECLARE @allEmployeesId UNIQUEIDENTIFIER = 'A29A3BA5-4B0D-DE11-9A51-005056C00008';
IF (@DeduplicationFeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @DeduplicationFeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@DeduplicationFeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1',
@DeduplicationFeatureId)
END;
ELSE
BEGIN
SET @DeduplicationFeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES
(@DeduplicationFeatureId, @DeduplicationFeature, @DeduplicationFeature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @DeduplicationFeatureId)
END;

IF (@ESDeduplicationFeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @ESDeduplicationFeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId = @ESDeduplicationFeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END;
ELSE
BEGIN
SET @ESDeduplicationFeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES (@ESDeduplicationFeatureId, @ESDeduplicationFeature, @ESDeduplicationFeature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @ESDeduplicationFeatureId)
END;

IF (@Bulk_ES_DD_FeatureId IS NOT NULL)
BEGIN
IF EXISTS (SELECT * FROM AdminUnitFeatureState WHERE FeatureId = @Bulk_ES_DD_FeatureId)
UPDATE AdminUnitFeatureState SET FeatureState = 1 WHERE FeatureId =@Bulk_ES_DD_FeatureId
ELSE
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState,FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END;
ELSE
BEGIN
SET @Bulk_ES_DD_FeatureId = NEWID()
INSERT INTO Feature (Id, Name, Code) VALUES (@Bulk_ES_DD_FeatureId, @Bulk_ES_DD_Feature, @Bulk_ES_DD_Feature)
INSERT INTO AdminUnitFeatureState (SysAdminUnitId, FeatureState, FeatureId) VALUES (@allEmployeesId, '1', @Bulk_ES_DD_FeatureId)
END;

Restart the Creatio application

Clear redis, restart the Creatio application and log in.

Recommended operations for the service functioning

We recommend performing mongodb backup once a day to support the functionality of the service and enable restoring of data, e. g., in case of electricity breakdowns.


See also

Global search setup

Deduplication

System requirements calculator