CREATE PREDICTOR Statement

Description

The CREATE PREDICTOR statement is used to train a new model. The basic syntax for training a model is:

Syntax

CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
    (SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column]

On execution, you should get:

Query OK, 0 rows affected (x.xxx sec)

Where:

| Expressions | Description |
|---|---|
| [predictor_name] | Name of the model to be created. |
| [integration_name] | Name of the data source. |
| (SELECT [column_name, ...] FROM [table_name]) | SELECT statement that selects the data used for training and validation. |
| PREDICT [target_column] | [target_column] is the column name of the target variable. |

Checking the status of the model

After you run the CREATE PREDICTOR statement, you can check the status of the model being trained by querying the mindsdb.predictors table:

SELECT * FROM mindsdb.predictors WHERE name='[predictor_name]';
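
When only the training progress matters, a narrower query can be easier to read than SELECT *. The following is a minimal sketch, assuming the column names that appear in the sample mindsdb.predictors output on this page (name, status, accuracy, error) and using home_rentals_model as a stand-in for your own predictor name:

```sql
-- Sketch: show only the columns that describe training progress.
-- Column names are taken from the sample mindsdb.predictors output on this page;
-- replace home_rentals_model with the name of your own predictor.
SELECT name, status, accuracy, error
FROM mindsdb.predictors
WHERE name='home_rentals_model';
```

In the sample output on this page, a finished model reports a status of complete and a NULL error.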

Example

This example shows how you can train a Machine Learning model called home_rentals_model to predict the rental prices for real estate properties inside the dataset.

CREATE PREDICTOR mindsdb.home_rentals_model
FROM db_integration (SELECT * FROM house_rentals_data) as rentals
PREDICT rental_price as price;

On execution:

Query OK, 0 rows affected (8.878 sec)

To check the predictor status, query the mindsdb.predictors table:

SELECT * FROM mindsdb.predictors WHERE name='home_rentals_model';

On execution:

+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
| name               | status   | accuracy           | predict      | update_status | mindsdb_version | error | select_data_query | training_options |
+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
| home_rentals_model | complete | 0.9991920992432087 | rental_price | up_to_date    | 22.5.1.0        | NULL  |                   |                  |
+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
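
The example above trains on every column through SELECT *, while the syntax also allows an explicit column list. A hedged sketch of that variant follows; number_of_rooms and sqft are hypothetical column names, the model name is made up for illustration, and location and rental_price are the columns mentioned elsewhere on this page:

```sql
-- Sketch: train on an explicit column list instead of SELECT *.
-- number_of_rooms and sqft are hypothetical column names; the model name is also hypothetical.
-- The target column (rental_price) is part of the selected data.
CREATE PREDICTOR mindsdb.home_rentals_subset_model
FROM db_integration
    (SELECT number_of_rooms, sqft, location, rental_price FROM house_rentals_data)
PREDICT rental_price;
```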

USING Statement

Description

In MindsDB, the underlying AutoML models are based on Lightwood. This library generates models automatically based on the data and a declarative problem definition, but the default configuration can be overridden. The USING ... statement provides the option to configure a model to be trained with specific options.

USING Statement Syntax

CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
    (SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column]
USING [parameter_key] = ['parameter_value']

| Parameter key | Description |
|---|---|
| encoders | Configures how each column is encoded. By default, the AutoML engine tries to find the best match for the data. To learn more about how encoders work and their options, see the Lightwood documentation on encoders. |
| model | Specifies the type of Machine Learning algorithm that learns from the encoded data. To learn more about the model options, see the Lightwood documentation. |
| Other keys supported by Lightwood in JsonAI | The most common use cases for configuring predictors are listed and explained in the example below. For all available options in detail, see the Lightwood documentation on JsonAI. |

... USING encoders Key

Configures how each column is encoded. To learn more about how encoders work and their options, see the Lightwood documentation on encoders.

...
USING
encoders.[column_name].module='value';

By default, the AutoML engine will try to get the best match for the data.
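
As an illustration, overriding the encoder for a single column might look like the sketch below, with every other column left to the defaults. The integration, table, and model names are placeholders; the encoder name CategoricalAutoEncoder and the location column are the ones used in the USING example on this page:

```sql
-- Sketch: override the encoder for one column and leave the rest to the AutoML defaults.
-- my_db_integration, home_rentals, and home_rentals_encoder_demo are placeholder names.
CREATE PREDICTOR mindsdb.home_rentals_encoder_demo
FROM my_db_integration
    (SELECT * FROM home_rentals)
PREDICT rental_price
USING
    encoders.location.module='CategoricalAutoEncoder';
```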

... USING model Key

Specifies the type of Machine Learning algorithm that learns from the encoded data. To learn more about the model options, see the Lightwood documentation.

...
USING
model.args='{"key": value}';

USING Example

We will use the home rentals dataset, specifying particular encoders for some of the columns and a LightGBM model.

CREATE PREDICTOR mindsdb.home_rentals_predictor
FROM my_db_integration (
    SELECT * FROM home_rentals
) PREDICT rental_price
USING
    encoders.location.module='CategoricalAutoEncoder',
    encoders.rental_price.module = 'NumericEncoder',
    encoders.rental_price.args.positive_domain = 'True',
    model.args='{"submodels":[
                    {"module": "LightGBM",
                     "args": {
                         "stop_after": 12,
                         "fit_on_dev": true
                     }
                    }
                ]}';

CREATE PREDICTOR From file

Description

To train a model using a file:

Syntax

CREATE PREDICTOR mindsdb.[predictor_name]
FROM files
    (SELECT * FROM [file_name])
PREDICT target_variable;

Where:

| Expressions | Description |
|---|---|
| [predictor_name] | Name of the model to be created. |
| [file_name] | Name of the file uploaded via the MindsDB editor. |
| (SELECT * FROM [file_name]) | SELECT statement that selects the data used for training and validation. |
| PREDICT target_variable | target_variable is the column name of the target variable. |

On execution:

Query OK, 0 rows affected (8.878 sec)

Example

CREATE PREDICTOR mindsdb.home_rentals_model
FROM files
    (SELECT * from home_rentals)
PREDICT rental_price;

CREATE PREDICTOR For Time Series Models

Description

To train a time series model, MindsDB provides additional clauses for the CREATE PREDICTOR statement.

Syntax

CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
(SELECT [sequential_column], [partition_column], [other_column], [target_column] FROM [table_name])
PREDICT [target_column]

ORDER BY [sequential_column]
GROUP BY [partition_column]

WINDOW [int]
HORIZON [int];

Where:

| Expressions | Description |
|---|---|
| ORDER BY [sequential_column] | Defines the column by which the time series is ordered. This can be a date or anything else that defines the sequence of events. |
| GROUP BY [partition_column] | (optional) Groups the rows that make up a partition. For example, to forecast inventory for all items in a store, you can partition the data by product_id, so that each product_id has its own time series. |
| WINDOW [int] | Specifies the number [int] of rows to "look back" at when making a prediction, after the rows are ordered by the ORDER BY column and split into groups. For example, WINDOW 10 can be read as "always use the previous 10 rows". |
| HORIZON [int] | (optional) Specifies the number of future predictions. The default value is 1. |
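
Because the table marks GROUP BY and HORIZON as optional, a statement with only ORDER BY and WINDOW should be the smallest time series form. The sketch below rests on that reading. It reuses the db_integration, inventory, date, and units_in_inventory names from this page's example, and inventory_total_model is a hypothetical model name:

```sql
-- Sketch: time series model with only the clauses not marked optional.
-- Without GROUP BY, the selected rows are presumably treated as a single series.
-- Without HORIZON, the documented default of 1 future prediction applies.
-- inventory_total_model is a hypothetical model name.
CREATE PREDICTOR mindsdb.inventory_total_model
FROM db_integration
    (SELECT date, units_in_inventory FROM inventory)
PREDICT units_in_inventory
ORDER BY date
WINDOW 20;
```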

On execution:

Query OK, 0 rows affected (8.878 sec)

Example

CREATE PREDICTOR mindsdb.inventory_model
FROM db_integration
(SELECT * FROM inventory) as inventory
PREDICT units_in_inventory as predicted_units_in_inventory

ORDER BY date
GROUP BY product_id

WINDOW 20
HORIZON 7;