Machine learning service basics
The machine learning (lookup value prediction) service uses statistical analysis methods for machine learning based on historical data. For example, a history of customer communications with customer support is considered historical data in Creatio. The message text, the date and the account category are used. The result is the Responsible Group field.
Creatio interaction with the prediction service
There are two stages of model processing in Creatio: training and prediction.
A prediction model is the algorithm that builds predictions and enables the system to automatically make decisions based on historical data.
Training
The service is “trained” at this stage. Main training steps:
- Establishing a session for data transfer and training.
- Sequentially selecting a portion of data for the model and uploading it to the service.
- Requesting to include the model in the training queue.
- The training engine processes the queue, trains the model, and saves its parameters to the local database.
- Creatio periodically queries the service to get the model status.
- Once the model status is set to Done, the model is ready for prediction.
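The training steps above can be sketched as a small client loop. This is only an illustration under stated assumptions: FakeTrainingService is a hypothetical in-memory stand-in for the prediction service, the method names are made up, and the status names follow the sequence listed later in this article (DataTransfer, QueuedToTrain, Training, Done).

```python
from enum import Enum


class Status(Enum):
    DATA_TRANSFER = "DataTransfer"
    QUEUED_TO_TRAIN = "QueuedToTrain"
    TRAINING = "Training"
    DONE = "Done"
    ERROR = "Error"


class FakeTrainingService:
    """Hypothetical in-memory stand-in for the prediction service."""

    def __init__(self):
        self.rows = []
        self.status = None

    def start_session(self):
        self.status = Status.DATA_TRANSFER
        return "session-1"  # made-up session identifier

    def upload(self, session_id, batch):
        self.rows.extend(batch)

    def begin_training(self, session_id):
        self.status = Status.QUEUED_TO_TRAIN

    def get_status(self, session_id):
        # Each poll advances the fake service by one step.
        current = self.status
        if current is Status.QUEUED_TO_TRAIN:
            self.status = Status.TRAINING
        elif current is Status.TRAINING:
            self.status = Status.DONE
        return current


def train(service, rows, batch_size=2, max_polls=10):
    """Run one session: upload in portions, queue training, poll the status."""
    session_id = service.start_session()
    for start in range(0, len(rows), batch_size):
        service.upload(session_id, rows[start:start + batch_size])
    service.begin_training(session_id)
    seen = []
    for _ in range(max_polls):  # a real client would wait between polls
        status = service.get_status(session_id)
        seen.append(status)
        if status in (Status.DONE, Status.ERROR):
            break
    return seen
```

Calling train(FakeTrainingService(), rows) returns the list of statuses observed while polling, ending with Done once the fake service finishes.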
Prediction
The prediction task is performed through a call to the cloud service, indicating the Id of the model instance and the data for the prediction. The result of the service operation is a set of values with prediction probabilities, which is stored in Creatio in the MLPrediction table.
If there is a prediction in the MLPrediction table for a particular entity record, the predicted values for the field are automatically displayed on the edit page.
Creatio settings and data types for working with the prediction service
Creatio setup
The following data is provided for working with the prediction service in Creatio.
- The CloudServicesAPIKey system setting authenticates the Creatio instance in cloud services.
- The record in the ML problem types (MLProblemType) lookup with the populated ServiceUrl field is the address of the implemented prediction service.
- The model records in the ML model (MLModel) lookup that contain information about the selected data for the model, the training period, the current training status, etc. For each model, the MLProblemType field must contain a reference to the correct record of the ML problem types lookup.
- The MLModelTrainingPeriodMinutes system setting determines the frequency of model synchronization launch.
Expanding the training model logic
The classes in this chain call one another and create instances of each other through the Terrasoft.Core.Factories.ClassFactory IoC container.
To replace the logic of any component, implement the appropriate interface. When the application starts, bind the interface to your own implementation.
Interfaces for logic expansion:
- IMLModelTrainerJob – implementing this interface lets you change the set of models for training.
- IMLModelTrainer – responsible for the logic of loading data for training and updating the status of models.
- IMLServiceProxy – implementing this interface lets you execute queries to arbitrary predictive services.
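The idea of binding an interface to a replacement implementation can be illustrated with a generic sketch. The Container class below is a tiny hypothetical container written for this example; it does not reproduce the ClassFactory API, and ServiceProxy, DefaultProxy and CustomProxy are made-up names.

```python
from abc import ABC, abstractmethod


class ServiceProxy(ABC):
    """Hypothetical stand-in for an interface such as IMLServiceProxy."""

    @abstractmethod
    def predict(self, model_id, data):
        ...


class DefaultProxy(ServiceProxy):
    def predict(self, model_id, data):
        return "default service"


class CustomProxy(ServiceProxy):
    def predict(self, model_id, data):
        return "custom service"


class Container:
    """Tiny illustrative container; it does not mirror the ClassFactory API."""

    def __init__(self):
        self._bindings = {}

    def bind(self, interface, implementation):
        # A later binding replaces an earlier one for the same interface.
        self._bindings[interface] = implementation

    def get(self, interface):
        return self._bindings[interface]()


container = Container()
container.bind(ServiceProxy, DefaultProxy)
container.bind(ServiceProxy, CustomProxy)  # done once, at application start
proxy = container.get(ServiceProxy)
```

Code that asks the container for ServiceProxy receives the custom implementation without knowing which class is behind the interface.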
Auxiliary classes for forecasting
Auxiliary (utility) classes for forecasting enable you to implement two basic cases:
- Prediction at the time of creating or updating an entity record on the server.
- Prediction when the entity is changed on the edit page.
While predicting on the Creatio server-side, a business process is created that responds to the entity creation/change signal, reads a set of fields, and calls the prediction service. If you get the correct result, it stores the set of field values with probabilities in the MLClassificationResult table. If necessary, the business process records a separate value (for example, with the highest probability) in the corresponding field of the entity.
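As a minimal sketch of the last step, choosing a single value to write to the entity field could look like the following. The result structure (a list of dictionaries with value and probability keys) and the sample values are assumptions made for this example only.

```python
def pick_most_probable(results):
    """Return the value with the highest probability, or None for no results."""
    if not results:
        return None
    return max(results, key=lambda item: item["probability"])["value"]


# Made-up prediction results for a "Responsible Group" style field.
results = [
    {"value": "1st-line support", "probability": 0.71},
    {"value": "2nd-line support", "probability": 0.22},
    {"value": "Billing", "probability": 0.07},
]
best = pick_most_probable(results)
```

The full list of values with probabilities would still be stored; only the selected value would be written to the entity field.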
Creating data queries for the machine learning model
Use a Terrasoft.Core.DB.Select class instance to query training data or prediction data for the machine learning service (see “Machine learning service”). It is dynamically imported by the Terrasoft.Configuration.ML.QueryInterpreter.
Use the provided userConnection variable as an argument of the Terrasoft.Core.UserConnection type in the Select constructor when building the query expression. The column with the “Id” alias (the unique ID of the target object instance) is required in the query expression.
The Select expression can be complex. Use the following practices to simplify it:
- Dynamic adding of types for the interpreter.
- Using local variables.
- Using the Terrasoft.Configuration.QueryExtensions utility class.
Dynamic adding of types for the interpreter
You can dynamically add types for the interpreter. To do this, the QueryInterpreter class provides the RegisterConfigurationType and RegisterType methods. You can use them directly in the expression. For example, instead of using the type ID directly:
you can use the name of a constant from dynamically registered enumeration:
Using local variables
You can use local variables to avoid code duplication and to structure the query more conveniently. Constraint: the type of the variable must be statically inferable, and the variable must be declared with the var keyword.
For example, the following query, which uses delegates repeatedly:
can be written as follows:
Connecting a custom web-service to the machine learning functionality
You can implement typical machine learning problems (classification, scoring, numerical regression) or other similar problems (for example, customer churn forecast) using a custom web-service. This article covers the procedure for connecting a custom web-service implementation of a prediction model to Creatio.
The main principles of the machine learning service operation are covered in the “Machine learning service” article.
The general procedure of connecting a custom web-service to the machine learning service is as follows:
- Create a machine learning web-service engine.
- Expand the problem type list of the machine learning service.
- Implement a machine learning model.
Create a machine learning web-service engine
A custom web-service must implement a service contract for model training and making forecasts based on an existing prediction model. You can find a sample Swagger service contract of the Creatio machine learning service at https://demo-ml.bpmonline.com/swagger/index.html#/MLService
Required methods:
- /session/start – starting a model training session.
- /data/upload – uploading data for an active training session.
- /session/info/get – getting the status of the training session.
- <training start custom method> – a user-defined method to be called by Creatio when the data upload is over. The model training process must not be terminated until the execution of the method is complete. A training session may last for an indefinite period (minutes or even hours). When the training is over, the /session/info/get method will return the training session status: either Done or Error depending on the result. Additionally, if the model is trained successfully, the method will return a model instance summary (ModelSummary): metrics type, metrics value, instance ID, and other data.
- <Prediction custom method> – an arbitrary signature method that will make predictions based on a trained prediction model referenced by the ID.
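The contract above can be sketched as an in-memory dispatcher. This is a rough illustration, not the Creatio service implementation: the payload keys (sessionId, rows, target, modelInstanceId) are assumptions, the custom endpoint paths reuse the fakeScorer names from the example later in this article, and the "model" is simply the mean of the uploaded target values, trained synchronously.

```python
import uuid


class InMemoryMLService:
    """Illustrative service exposing the required endpoints as plain methods."""

    def __init__(self):
        self._sessions = {}
        self._models = {}
        self.routes = {
            "/session/start": self.start_session,
            "/data/upload": self.upload_data,
            "/session/info/get": self.get_session_info,
            "/fakeScorer/beginTraining": self.begin_training,
            "/fakeScorer/predict": self.predict,
        }

    def handle(self, path, payload):
        return self.routes[path](payload)

    def start_session(self, payload):
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = {
            "rows": [], "state": "DataTransfer", "summary": None,
        }
        return {"sessionId": session_id}

    def upload_data(self, payload):
        self._sessions[payload["sessionId"]]["rows"].extend(payload["rows"])
        return {}

    def begin_training(self, payload):
        # Trains immediately; a real service may take minutes or hours.
        session = self._sessions[payload["sessionId"]]
        targets = [row["target"] for row in session["rows"]]
        if not targets:
            session["state"] = "Error"
            return {}
        model_id = str(uuid.uuid4())
        self._models[model_id] = sum(targets) / len(targets)
        session["state"] = "Done"
        session["summary"] = {"modelInstanceId": model_id}
        return {}

    def get_session_info(self, payload):
        session = self._sessions[payload["sessionId"]]
        return {"state": session["state"], "modelSummary": session["summary"]}

    def predict(self, payload):
        return {"score": self._models[payload["modelInstanceId"]]}
```

A client would call the routes in order: start a session, upload data, begin training, poll the session info until the state is Done or Error, and then call the prediction method with the returned model instance ID.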
Web-service development with the Microsoft Visual Studio IDE is covered in the “Developing the configuration server code in the user solution” article.
Expanding the problem type list of the machine learning service
To expand the problem type list of the Creatio machine learning service, add a new record to the MLProblemType lookup. You must specify the following parameters:
- Service endpoint Url – the URL endpoint for the online machine learning service.
- Training endpoint – the endpoint for the training start method.
- Prediction endpoint – the endpoint for the prediction method.
Implementing a machine learning model
To configure and display a machine learning model, you may need to extend the MLModelPage mini-page schema.
Implementing IMLPredictor
Implement the Predict method. The method accepts data exported from the system by object (formatted as Dictionary<string, object>, where key is the field name and value is the field value), and returns the prediction value. This method may use a proxy class that implements the IMLServiceProxy interface to facilitate web-service calls.
Implementing IMLEntityPredictor
Initialize the ProblemTypeId property with the ID of the new problem type record created in the MLProblemType lookup. Additionally, implement the following methods:
- SaveEntityPredictedValues – the method retrieves the prediction value and saves it for the system entity, for which the prediction process is run. If the returned value is of the double type or is similar to classification results, you can use the methods provided in the PredictionSaver auxiliary class.
- SavePrediction (optional) – the method saves the prediction value with a reference to the trained model instance and the ID of the entity (entityId). For basic problems, the system provides the MLPrediction and MLClassificationResult entities.
Extending IMLServiceProxy and MLServiceProxy (optional)
You can extend the existing IMLServiceProxy interface and the corresponding implementations in the prediction method of the current problem type. In particular, the MLServiceProxy class provides the Predict generic method that accepts contracts for input data and for prediction results.
Implementing IMLBatchPredictor
If the web-service is called with a large set of data (500 instances and more), implement the IMLBatchPredictor interface. You must implement the following methods:
- FormatValueForSaving – returns a converted prediction value ready for database storage. In case a batch prediction process is running, the record is updated using the Update method rather than Entity instances to speed up the process.
- SavePredictionResult – defines how the system will store the prediction value per entity. For basic ML problems, the system provides MLPrediction and MLClassificationResult objects.
Using the Terrasoft.Configuration.QueryExtensions utility class
The Terrasoft.Configuration.QueryExtensions utility class provides several extension methods for Terrasoft.Core.DB.Select. They let you build more compact queries.
In all extension methods, the object sourceColumn argument accepts the following types (they are converted to Terrasoft.Core.DB.QueryColumnExpression):
- System.String – the name of the column in the TableAlias.ColumnName as ColumnAlias format (where the TableAlias and ColumnAlias are optional) or “*” – all columns.
- Terrasoft.Core.DB.QueryColumnExpression – will be added without changes.
- Terrasoft.Core.DB.IQueryColumnExpressionConvertible – will be converted.
- Terrasoft.Core.DB.Select – will be considered as subquery.
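To make the string format concrete, the sketch below parses a column string of the form TableAlias.ColumnName as ColumnAlias into its parts. It only illustrates the format described above; the parse_column function and the ColumnRef type are hypothetical and do not reflect how QueryExtensions is implemented.

```python
import re
from typing import NamedTuple, Optional


class ColumnRef(NamedTuple):
    table_alias: Optional[str]
    column_name: str
    column_alias: Optional[str]


# Optional "TableAlias.", then a column name or "*", then optional "as Alias".
_PATTERN = re.compile(
    r"^\s*(?:(?P<table>\w+)\.)?(?P<column>\w+|\*)"
    r"(?:\s+as\s+(?P<alias>\w+))?\s*$",
    re.IGNORECASE,
)


def parse_column(source: str) -> ColumnRef:
    """Split a column string into table alias, column name and column alias."""
    match = _PATTERN.match(source)
    if match is None:
        raise ValueError(f"Unrecognized column expression: {source!r}")
    return ColumnRef(match["table"], match["column"], match["alias"])
```

For example, parse_column("a.Name as AccountName") yields the table alias a, the column Name and the alias AccountName, while parse_column("Id") yields only the column name.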
static Select Cols(this Select select, params object[] sourceColumns)
Adds specified columns or subexpressions to the query.
Using the Cols() extension method, instead of the following expression:
you can write:
static Select Count(this Select select, object sourceColumn)
Adds an aggregation column to calculate the number of non-empty values to the query.
For example, instead of:
you can write:
static Select Coalesce(this Select select, params object[] sourceColumns)
Adds a column with the function of determining the first value not equal to NULL to the query.
For example, instead of:
you can write:
static Select DateDiff(this Select select, DateDiffQueryFunctionInterval interval, object startDateExpression, object endDateExpression)
Adds a column that specifies the date difference to the query.
For example, instead of:
you can write:
static Select IsNull(this Select select, object checkExpression, object replacementValue)
Adds a column with the function replacing NULL value with a replacement expression.
For example, instead of:
you can write:
1. Implement custom predictive scoring
Download and unpack the *.zip archive that includes the web service. The service implements custom predictive scoring in a Microsoft Visual Studio Code project.
To develop a custom web service using a Microsoft Visual Studio Code project, follow the instructions in a separate article: Develop C# code in a configuration project.
The MLService web service implements the following endpoints:
- session/start
- data/upload
- session/info/get
- fakeScorer/beginTraining
- fakeScorer/predict
2. Expand the problem list of the ML service
- Click to open the System Designer.
- Go to the System setup block → Lookups.
- Open the ML problem types lookup.
- Add a problem type.
  - Click New on the lookup toolbar.
  - Fill out the problem type properties.
    - Set Name to "Fake scoring".
    - Set Service endpoint Url to "http://localhost:5000/".
    - Set Training endpoint to "/fakeScorer/beginTraining".
- Find an ID of the problem type. To do this, display the corresponding column in the lookup list.
  - Click View → Select fields to display on the lookup toolbar.
  - Add the column to the lookup list. To do this, click and select the Id column.
  - Click Select → Save.
The ID of the Fake scoring problem type is 19fcfff1-98b9-4933-8f26-457ca45c35ed.
3. Implement a ML model
- Go to the Configuration section and select a custom package to add the schema.
- Click Add → Page view model on the section list toolbar.
- Fill out the schema properties.
  - Set Code to "UsrMLModelPage".
  - Set Title to "MLModelPage".
  - Select "MLModelPage" in the Parent object property.
- Implement the style of the new mini page similar to the mini page for creating a predictive scoring model. To do this, overload the getIsScoring() method.
  View the source code of the view model schema of the page below.
- Click Save on the Module Designer's toolbar.
4. Implement the handling of the ML model results
- Go to the Configuration section and select a custom package to add the schema.
- Click Add → Source code on the section list toolbar.
- Fill out the schema properties.
  - Set Code to "UsrFakeScoringEntityPredictor".
  - Set Title to "FakeScoringEntityPredictor".
  Click Apply to apply the properties.
- Implement the handling of the ML model results.
  - Implement the Predict() method that receives data exported from Creatio by object and returns the prediction value. The method uses a proxy class that implements the IMLServiceProxy interface to facilitate web service calls.
  - Initialize the ProblemTypeId property using the ID of the Fake scoring problem type from the ML problem types lookup.
  - Implement the SaveEntityPredictedValues() and SavePrediction() methods.
  View the source code of the UsrFakeScoringEntityPredictor class below.
- Click Publish on the Source Code Designer’s toolbar to apply the changes on the database level.
5. Add a new problem type to the prediction method
- Go to the Configuration section and select a custom package to add the schema.
- Click Add → Source code on the section list toolbar.
- Fill out the schema properties.
  - Set Code to "UsrFakeScoringProxy".
  - Set Title to "FakeScoringProxy".
  Click Apply to apply the properties.
- Add the Fake scoring problem type to the prediction method. To do this, extend the IMLServiceProxy base interface and implementations in the prediction method of the current problem type. The MLServiceProxy class implements the Predict() method that receives contracts for input data and prediction results.
  View the source code of the UsrFakeScoringProxy class below.
- Click Publish on the Source Code Designer’s toolbar to apply the changes on the database level.
6. Implement the batch prediction functionality
- Go to the Configuration section and select a custom package to add the schema.
- Click Add → Source code on the section list toolbar.
- Fill out the schema properties.
  - Set Code to "UsrFakeBatchScorer".
  - Set Title to "FakeBatchScorer".
  Click Apply to apply the properties.
- Implement the batch prediction functionality.
  - Implement the IMLBatchPredictor interface.
  - Implement the FormatValueForSaving() and SavePredictionResult() methods.
  View the source code of the UsrFakeBatchScorer class below.
- Click Publish on the Source Code Designer’s toolbar to apply the changes on the database level.
Case description
Implement automatic prediction for the AccountCategory column based on the values of the Country, EmployeesNumber and Industry fields when the account record is saved. The following conditions must be met:
- The model must be trained on account records from the last 90 days.
- The model must be retrained every 30 days.
- The lowest permissible prediction accuracy for the model is 0.6.
Case implementation algorithm
1. Model learning
To train the model:
1. Add a record to the ML Model lookup. The values of the record fields are given in Table 1.
Field | Value |
---|---|
Name | Predict account category |
ML problem type | Lookup prediction |
Target schema for prediction | Account |
Quality metric low limit | 0.6 |
Model retrain frequency (days) | 30 |
Training set metadata | |
Training set query | You can find examples of queries in the Creating data queries for the machine learning model article. |
Predictions enabled (checkbox) | Enable |
2. Perform the Execute model training job action on the ML Model lookup field.
Wait until the value of the Model processing status field changes in the following sequence: DataTransfer, QueuedToTrain, Training, Done. The process may take several hours to finish, depending on the amount of data passed and the general workload of the predictive service.
2. Performing the prediction
To start the predictions:
1. Create a business process in the user package. Select the saving of the Account object as a start signal for the process. Check if the required fields are populated (Fig. 1).
2. Add the MLModelId lookup parameter that refers to the ML Model entity. Select the record with the Predict account category model as a value.
3. Add the RecordId lookup parameter that refers to the Account entity. Select a reference for the RecordId parameter of the Signal element as a value.
4. Add a Script task element on the business process diagram and add the following code there:
After saving and compiling the process, the prediction will be performed for new accounts. The prediction will be displayed on the account edit page.
Implements the IJobExecutor and IMLModelTrainerJob interfaces.
It is used for model synchronization tasks.
Orchestrates model processing on the side of Creatio by launching data transfer sessions, starting trainings, and also checking the status of the models processed by the service. Instances are launched by default by the task scheduler through the standard Execute method of the IJobExecutor interface.
Methods
A virtual method that encapsulates the synchronization logic. The base implementation of this method performs the following actions:
- Selecting models for training – the records are selected from MLModel based on the following filter:
  - The MetaData and TrainingSetQuery fields are populated.
  - The Status field is not in the NotStarted, Done or Error state (or not populated at all).
  - TrainFrequency is more than 0.
  - TrainFrequency days have passed since the last training date (TriedToTrainOn).
  For each record of this selection, the data is sent to the service with the help of the predictive model trainer.
- Selecting previously trained models and updating their status (if necessary).
The data transfer session for the selection starts for each suitable model. The data is sent in packages of 1000 records during the session. For each model, the selection size is limited to 75,000 records.
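The batching rule can be sketched as a generator that caps the selection and then cuts it into packages. The numbers 1000 and 75,000 come from the text above; the function name and its parameters are made up for this illustration.

```python
from itertools import islice

BATCH_SIZE = 1000      # records per package, as stated above
MAX_RECORDS = 75_000   # upper limit of the selection per model


def iter_batches(records, batch_size=BATCH_SIZE, max_records=MAX_RECORDS):
    """Yield lists of at most batch_size records, stopping at max_records."""
    limited = islice(iter(records), max_records)
    while True:
        batch = list(islice(limited, batch_size))
        if not batch:
            return
        yield batch
```

Because the input is consumed lazily, the selection never has to be loaded into memory as a whole; records beyond the limit are simply not read.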
Implements the IMLModelTrainer interface.
Trainer of the prediction model.
Responsible for the overall processing of a single model during the training stage. Communication with the service is provided through a proxy to a predictive service.
Methods
Sets the training session for the model.
Transfers the data according to the model selection in packages of 1000 records. The selection is limited to 75,000 records.
Indicates the completion of data transfer and informs the service about the need to put the model in the training queue.
Requests the service for the current state of the model and updates the Status (if necessary).
If the training was successful (Status contains the Done value), the service returns the metadata for the trained instance, particularly the accuracy of the resulting instance. If the accuracy is greater than or equal to the lower threshold (MetricThreshold), the ID of the new instance is written in the ModelInstanceUId field.
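The threshold rule can be expressed as a short function. The field names Status, MetricThreshold and ModelInstanceUId come from the text above; the dictionary-based model record and the summary keys (metric, instanceId) are assumptions made for this sketch.

```python
def apply_training_result(model, status, summary=None):
    """Update the model record; accept the new instance only if it is good enough."""
    model["Status"] = status
    if status != "Done" or summary is None:
        return False
    if summary["metric"] >= model["MetricThreshold"]:
        model["ModelInstanceUId"] = summary["instanceId"]
        return True
    # The metric is below the threshold: keep the previous instance.
    return False
```

With a threshold of 0.6, an instance with a metric of 0.72 replaces the current one, while an instance with a metric of 0.55 leaves ModelInstanceUId unchanged.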
Implements the IMLServiceProxy interface.
A proxy to the prediction service.
A wrapper class for HTTP requests to the prediction service.
Methods
Sends a data package for the training session.
Calls the service for setting up training in the queue.
Requests the current state from the service for the training session.
Calls the prediction service of the field value for a single set of field values for the previously trained model instance. In the Dictionary data parameter, the field name is passed as the key, which must match the name specified in the MetaData field of the model lookup. If the result is successful, the method returns a list of values with the ClassificationResult type.
The ClassificationResult properties
Value | Field value. |
Probability | The probability of a given value in the [0:1] range. The sum of the probabilities for one list of results is close to 1 (values of about 0 can be omitted). |
Significance | The level of importance of this prediction. |
High | This field value has a distinct advantage over other values from the list. Only one element in the prediction list can have this level. |
Medium | The value of the field is close to several other high values in the list. For example, two values in the list have a probability of 0.41 and 0.39, and all the others are significantly smaller. |
None | Irrelevant values with low probabilities. |
A utility class that helps to predict the value of a field based on a particular model (either one or several models) for a particular entity.
Methods
Based on the model Id and entity Id, performs predictions and records the results in the resulting entity field. Works with any machine learning task: classification, scoring, numeric field prediction.
Based on the Id of the model (or of several models created for the same object) and the entity Id, performs classification and returns a dictionary whose keys are the model objects and whose values are the predicted values.
A utility class that helps save the prediction results in the Creatio object.
Methods
Saves the MLEntityPredictor.ClassifyEntityValues classification results in the Creatio object. By default, it saves only the result whose Significance equals High. You can override this behavior using the passed onSetEntityValue delegate. If the delegate returns false, the value will not be recorded in the Creatio object.
Value | Field value. |
Probability | The probability of a given value in the [0:1] range. The sum of the probabilities for one list of results is close to 1 (values of about 0 can be omitted). |
Significance | The level of importance of this prediction. |
High | This field value has a distinct advantage over other values from the list. Only one element in the prediction list can have this level. |
Medium | The value of the field is close to several other high values in the list. For example, two values in the list have a probability of 0.41 and 0.39, and all the others are significantly smaller. |
None | Irrelevant values with low probabilities. |
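The default saving rule and the delegate override can be sketched as follows. The property names Value, Probability and Significance come from the table above; the function name, the dictionary-based entity, and the delegate signature (entity, column, result) are assumptions for this illustration and may differ from the real onSetEntityValue delegate.

```python
def save_classification(entity, column, results, on_set_entity_value=None):
    """Write the first accepted result to the entity; return True if saved."""
    for result in results:
        if on_set_entity_value is not None:
            # The delegate decides; returning False skips this value.
            should_save = on_set_entity_value(entity, column, result)
        else:
            # Default rule: only a High significance result is saved.
            should_save = result["Significance"] == "High"
        if should_save:
            entity[column] = result["Value"]
            return True
    return False
```

With the default rule, a list that contains only Medium results (for example, probabilities of 0.41 and 0.39) leaves the entity unchanged, whereas a delegate such as lambda entity, column, result: result["Probability"] >= 0.4 would accept the first of them.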
MLModel lookup
Contains information about the selected data for the model, the training period, the current training status, etc.
Main MLModel lookup fields
Model name.
The identifier of the current model instance.
The date/time of last training attempt.
The date/time of last training attempt.
Model retraining frequency (days).
Metadata with selection column types.
In this code:
- inputs – a set of incoming columns for the model.
- output – a column, the value of which the model should predict.
Column descriptions support the following attributes:
- name – field name from the TrainingSetQuery expression.
- type – data type for the training engine.
Supported values:
Text | text column |
Lookup | lookup column |
Boolean | logical data type |
Numeric | numeric type |
DateTime | date and time |
- isRequired – mandatory field value (true/false). Default value – false.
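A metadata value built from the attributes above could be checked with a sketch like the one below. The JSON layout (an inputs array and a single output object) is inferred from the attribute descriptions and is an assumption, as are the column types chosen for the example; the column names are taken from the account category case in this article.

```python
import json

SUPPORTED_TYPES = {"Text", "Lookup", "Boolean", "Numeric", "DateTime"}

# Assumed layout; column types here are illustrative guesses.
EXAMPLE_METADATA = """
{
  "inputs": [
    {"name": "Country", "type": "Lookup", "isRequired": true},
    {"name": "EmployeesNumber", "type": "Lookup"},
    {"name": "Industry", "type": "Lookup"}
  ],
  "output": {"name": "AccountCategory", "type": "Lookup"}
}
"""


def validate_metadata(raw):
    """Return a list of problems found in the metadata; empty means valid."""
    metadata = json.loads(raw)
    errors = []
    inputs = metadata.get("inputs") or []
    if not inputs:
        errors.append("inputs must list at least one column")
    if "output" not in metadata:
        errors.append("output column is missing")
    columns = list(inputs)
    if "output" in metadata:
        columns.append(metadata["output"])
    for column in columns:
        name = column.get("name")
        if not name:
            errors.append("column without a name")
        if column.get("type") not in SUPPORTED_TYPES:
            errors.append(f"{name}: unsupported type {column.get('type')!r}")
        # isRequired defaults to false when omitted.
        if not isinstance(column.get("isRequired", False), bool):
            errors.append(f"{name}: isRequired must be true or false")
    return errors
```

Running validate_metadata on the example returns an empty list, while a column with an unsupported type or a missing output column produces a descriptive message.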
C#-expression of the training data selection. This expression should return the Terrasoft.Core.DB.Select class instance.
You can find examples of queries in the “Examples of data queries for the machine learning model” example.
A link to an object schema for which the prediction will be executed.
The status of model processing (data transfer, training, ready for forecasting).
A quality metric for the current model instance.
Lowest threshold of model quality.
A flag that enables prediction for this model.
Current training session.
Machine learning problem (defines the algorithm and service url for model training).