
The RETRAIN statement retrains an already trained predictor on new data, so that the updated predictor can use that data to improve its predictions.

Retraining takes at least as long as the original training did, because the retraining dataset contains new or updated data in addition to the old data.


Here is the syntax:

RETRAIN [MODEL] project_name.predictor_name
[FROM [integration_name | project_name]
    (SELECT column_name, ...
     FROM [integration_name. | project_name.]table_name)
PREDICT target_name
USING engine = 'engine_name',
      tag = 'tag_name',
      active = 0/1];

On execution, we get:

Query OK, 0 rows affected (0.058 sec)


| Name | Description |
| project_name | Name of the project where the model resides. |
| predictor_name | Name of the model to be retrained. |
| integration_name | Optional. Name of the integration created using the CREATE DATABASE statement or file upload. |
| (SELECT column_name, ... FROM table_name) | Optional. Query that selects the data to be used for training and validation. |
| target_name | Optional. Column to be predicted. |
| engine_name | Optional. ML engine on which the retrained model is based. |
| tag_name | Optional. Tag that is visible in the training_options column of the mindsdb.models table. |
| active | Optional. Setting it to 0 makes the retrained version inactive. Setting it to 1 makes the retrained version active. |
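
As a sketch of how the optional clauses fit together, the statement below retrains home_rentals_model on a fresh selection of data and keeps the new version inactive. The integration name example_db, the table demo_data.home_rentals, the target column rental_price, and the tag value are hypothetical placeholders, not names defined on this page, so substitute your own.

```sql
-- Hypothetical names: example_db, demo_data.home_rentals, rental_price, 'with_new_data'.
RETRAIN MODEL mindsdb.home_rentals_model
FROM example_db
    (SELECT *
     FROM demo_data.home_rentals)  -- data used for training and validation
PREDICT rental_price               -- column to be predicted
USING
    tag = 'with_new_data',         -- shown in the training_options column
    active = 0;                    -- the retrained version is created as inactive
```

Because active is set to 0, the retrained version is created but left inactive, and the tag makes it easier to find in the training_options column of the mindsdb.models table.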

Model Versions

Every time a model is retrained, a new version of it is created with an incremented version number.

You can query for all model versions like this:

SELECT *
FROM project_name.models;

For more information on managing model versions, check out our docs here.

When to RETRAIN the Model?

We advise retraining the predictor whenever the update_status column of the mindsdb.models table is set to available.

The update_status column is set to available in two cases:

  • When a new version of MindsDB is available that makes the predictor obsolete.
  • When new data is available in the table that was used to train the predictor.

To find out whether you need to retrain your model, query the mindsdb.models table and look for the update_status column.

Here are the possible values of the update_status column:

| Value | Description |
| available | The model should be retrained. |
| updating | The model is currently being retrained. |
| up_to_date | The model is up to date and does not need to be retrained. |

Let’s run the query.

SELECT name, update_status
FROM mindsdb.models
WHERE name = 'predictor_name';

On execution, we get:

| name             | update_status |
| predictor_name   | up_to_date    |


| Name | Description |
| predictor_name | Name of the model to be retrained. |
| update_status | Column that indicates whether the model needs to be retrained. |

Alternatively, use the DESCRIBE command as below:

DESCRIBE MODEL predictor_name;
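
To check several models at once rather than one name at a time, the same table can be filtered on update_status. This is a minimal sketch that uses only the columns shown above and assumes update_status can be compared as a plain string value:

```sql
-- List every model in the mindsdb project that is flagged for retraining.
SELECT name, update_status
FROM mindsdb.models
WHERE update_status = 'available';
```

Each row returned is a candidate for a RETRAIN statement.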


Let’s look at an example using the home_rentals_model model.

First, we check the status of the predictor.

SELECT name, update_status
FROM mindsdb.models
WHERE name = 'home_rentals_model';

On execution, we get:

| name               | update_status |
| home_rentals_model | available     |

Alternatively, use the DESCRIBE command as below:

DESCRIBE MODEL home_rentals_model;

The available value of the update_status column informs us that we should retrain the model.

RETRAIN mindsdb.home_rentals_model;

On execution, we get:

Query OK, 0 rows affected (0.058 sec)

Now, let’s check the status again.

SELECT name, update_status
FROM mindsdb.models
WHERE name = 'home_rentals_model';

On execution, we get:

| name               | update_status |
| home_rentals_model | updating      |

And after the retraining process is completed:

SELECT name, update_status
FROM mindsdb.models
WHERE name = 'home_rentals_model';

On execution, we get:

| name               | update_status |
| home_rentals_model | up_to_date    |