CREATE PREDICTOR
Statement¶
Description¶
The CREATE PREDICTOR
statement is used to train a new model. The basic syntax for training a model is as follows:
Syntax¶
CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
(SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column];
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Where:
Expressions | Description |
---|---|
[predictor_name] |
Name of the model to be created |
[integration_name] |
is the name of the integration created via CREATE DATABASE or file upload |
(SELECT [column_name, ...] FROM [table_name]) |
SELECT statement for selecting the data to be used for training and validation |
PREDICT [target_column] |
where target_column is the column name of the target variable. |
Checking the status of the model
After you run the CREATE PREDICTOR
statement, you can check the status of the training model by selecting from the mindsdb.predictors
.
SELECT *
FROM mindsdb.predictors
WHERE name='[predictor_name]';
On execution, we get:
+------------------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
|name |status |accuracy |predict |update_status|mindsdb_version|error|select_data_query|training_options|
+------------------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
|predictor_name |generating or training or complete |number depending on the accuracy metric |column_to_be_predicted |up_to_date |22.7.5.0 | | | |
+------------------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
Example¶
This example shows how you can train a Machine Learning model called home_rentals_model to predict the rental prices for real estate properties inside the dataset.
CREATE PREDICTOR mindsdb.home_rentals_model
FROM db_integration
(SELECT * FROM house_rentals_data) AS rentals
PREDICT rental_price AS price;
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
To check the predictor status, query the mindsdb.predictors
table.
SELECT *
FROM mindsdb.predictors
WHERE name='home_rentals_model';
On execution, we get:
+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
| name | status | accuracy | predict | update_status | mindsdb_version | error | select_data_query | training_options |
+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
| home_rentals_model | complete | 0.9991920992432087 | rental_price | up_to_date | 22.5.1.0 | NULL | | |
+--------------------+----------+--------------------+--------------+---------------+-----------------+-------+-------------------+------------------+
... USING
Statement¶
... USING
Description¶
In MindsDB, the underlying AutoML models are based on Lightwood. This library generates models automatically based on the data and a declarative problem definition, but the default configuration can be overridden. The USING ...
statement provides the option to configure a model to be trained with specific options.
... USING
Statement Syntax¶
CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
(SELECT [column_name, ...] FROM [table_name])
PREDICT [target_column]
USING [parameter_key] = ['parameter_value'];
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Where:
parameter key | Description |
---|---|
encoders |
Grants access to configure how each column is encoded.By default, the AutoML engine will try to get the best match for the data. To learn more about how encoders work and their options, go here. |
model |
Allows you to specify what type of Machine Learning algorithm to learn from the encoder data. To learn more about all the model options, go here. |
Other keys supported by lightwood in JsonAI | The most common usecases for configuring predictors will be listed and explained in the example below. To see all options available in detail, you should checkout the lightwood docs about JsonAI |
... USING encoders
Key¶
It grants access to configure how each column is encoded. To learn more about how encoders work and their options, go here.
...
USING encoders.[column_name].module='value';
By default, the AutoML engine will try to get the best match for the data.
... USING model
Key¶
It allows you to specify what type of Machine Learning algorithm to learn from the encoder data.
...
USING model.args='{"key": value}';
Module options:
Module | Description |
---|---|
BaseMixer | Base class for all mixers. |
LightGBM | This mixer configures and uses LightGBM for regression or classification tasks depending on the problem definition. |
LightGBMArray | This mixer is composed of several LightGBM mixers in regression mode aimed at time series forecasting tasks. |
NHitsMixer | Wrapper around a MQN-HITS deep learning model. |
Neural | The Neural mixer trains a fully connected dense network from concatenated encoded outputs of each of the features in the dataset to predicted the encoded output. |
ProphetMixer | This mixer is a wrapper around the popular time series library sktime. |
Regression | The Regression mixer inherits from scikit-learn’s Ridge class |
SkTime | This mixer is a wrapper around the popular time series library sktime. |
Unit | Special mixer that passes along whatever prediction is made by the target encoder without modifications. This is useful for single-column predictive scenarios that may involve complex and/or expensive encoders (e.g. free-form text classification with transformers). |
To learn more about all the model options, please refer to Lightwood's documentation here.
... USING
Example¶
We will use the home rentals dataset, specifying particular encoders for some of the columns and a LightGBM model.
CREATE PREDICTOR mindsdb.home_rentals_predictor
FROM my_db_integration
(SELECT * FROM home_rentals)
PREDICT rental_price
USING
encoders.location.module='CategoricalAutoEncoder',
encoders.rental_price.module = 'NumericEncoder',
encoders.rental_price.args.positive_domain = 'True',
model.args='{"submodels":[
{"module": "LightGBM",
"args": {
"stop_after": 12,
"fit_on_dev": true
}
}
]}';
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
CREATE PREDICTOR
From file¶
In order to create a predictor from a file, you should first upload a file to MindsDB. Follow this guide to see how to do that.
CREATE PREDICTOR
Description¶
This statement is used to create and train a model using a file or a database table.
CREATE PREDICTOR
Syntax¶
CREATE PREDICTOR mindsdb.[predictor_name]
FROM files
(SELECT * FROM [file_name])
PREDICT target_variable;
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Where:
Description | |
---|---|
[predictor_name] |
Name of the model to be created |
[file_name] |
Name of the file uploaded via the MindsDB editor |
(SELECT * FROM [file_name]) |
SELECT statement for selecting the data to be used for training and validation |
target_variable |
target_column is the column name of the target variable. |
CREATE PREDICTOR
Example¶
CREATE PREDICTOR mindsdb.home_rentals_model
FROM files
(SELECT * from home_rentals)
PREDICT rental_price;
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
CREATE PREDICTOR
For Time Series Models¶
CREATE PREDICTOR
For Time Series Models Description¶
To train a timeseries model, MindsDB provides additional statements.
CREATE PREDICTOR
For Time Series Models Syntax¶
CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
(SELECT [sequential_column], [partition_column], [other_column], [target_column]
FROM [table_name])
PREDICT [target_column]
ORDER BY [sequential_column]
GROUP BY [partition_column]
WINDOW [int]
HORIZON [int];
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Where:
Expressions | Description |
---|---|
ORDER BY [sequential_column] |
Defines the column that the time series will be order by. These can be a date, or anything that defines the sequence of events. |
GROUP BY [partition_column] (optional) |
Groups the rows that make a partition, for example, if you want to forecast inventory for all items in a store, you can partition the data by product_id, meaning that each product_id has its own time series. |
WINDOW [int] |
Specifies the number [int] of rows to "look back" into when making a prediction after the rows are ordered by the order_by column and split into groups. This could be interpreted like "Always use the previous 10 rows". |
HORIZON [int] (optional) |
keyword specifies the number of future predictions, default value is 1 |
Getting a prediction of a Time Series model
Due to the nature of Time Series Forecasting you will need to use the JOIN
statement to get results.
CREATE PREDICTOR
For Time Series Models Example¶
CREATE PREDICTOR mindsdb.inventory_model
FROM db_integration
(SELECT * FROM inventory) AS inventory
PREDICT units_in_inventory AS predicted_units_in_inventory
ORDER BY date
GROUP BY product_id
WINDOW 20
HORIZON 7;
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)