Create, Train, and Deploy a Model
Description
The CREATE MODEL
statement creates and trains a machine learning (ML) model.
Please note that the CREATE MODEL
statement is equivalent to the CREATE PREDICTOR
statement.
We are transitioning to the CREATE MODEL
statement, but the CREATE PREDICTOR
statement still works.
Syntax
Overview
Here is the full syntax:
CREATE [OR REPLACE] MODEL [IF NOT EXISTS] project_name.predictor_name
[FROM [integration_name | project_name]
(SELECT [sequential_column,] [partition_column,] column_name, ...
FROM [integration_name. | project_name.]table_name
[JOIN model_name])]
PREDICT target_column
[ORDER BY sequential_column]
[GROUP BY partition_column]
[WINDOW int]
[HORIZON int]
[USING engine = 'engine_name',
tag = 'tag_name',
...];
Where:
Expressions | Description |
---|---|
project_name | Name of the project where the model is created. By default, the mindsdb project is used. |
predictor_name | Name of the model to be created. |
integration_name | Name of the integration created using the CREATE DATABASE statement or file upload. |
(SELECT column_name, ... FROM table_name) | Selecting data to be used for training and validation. |
target_column | Column to be predicted. |
ORDER BY sequential_column | Used in time series models. The column by which time series is ordered. It can be a date or anything that defines the sequence of events. |
GROUP BY partition_column | Used in time series models. It is optional. The column by which rows that make a partition are grouped. For example, if you want to forecast the inventory for all items in the store, you can partition the data by product_id , so each distinct product_id has its own time series. |
WINDOW int | Used in time series models. The number of rows to look back at when making a prediction. It comes after the rows are ordered by the column defined in ORDER BY and split into groups by the column(s) defined in GROUP BY . The WINDOW 10 syntax could be interpreted as “Always use the previous 10 rows”. |
HORIZON int | Used in time series models. It is optional. It defines the number of future predictions (it is 1 by default). However, the HORIZON parameter, besides defining the number of predictions, has an impact on the training procedure when using the Lightwood ML backend. For example, different mixers are selected depending on whether the HORIZON value is one or greater than one. |
engine_name | You can optionally provide an ML engine, based on which the model is created. |
tag_name | You can optionally provide a tag that is visible in the training_options column of the mindsdb.models table. |
Regression Models
Here is the syntax for regression models:
CREATE MODEL project_name.predictor_name
FROM integration_name
(SELECT column_name, ... FROM table_name)
PREDICT target_column
[USING engine = 'engine_name',
tag = 'tag_name'];
Please note that the FROM
clause is mandatory here.
The target_column
that will be predicted is a numerical value. The prediction values are not limited to a defined set of values, but can be any number from the given range of numbers.
Classification Models
Here is the syntax for classification models:
CREATE MODEL project_name.predictor_name
FROM integration_name
(SELECT column_name, ... FROM table_name)
PREDICT target_column
[USING engine = 'engine_name',
tag = 'tag_name'];
Please note that the FROM
clause is mandatory here.
The target_column
that will be predicted is a string value. The prediction values are limited to a defined set of values, such as Yes
and No
.
Time Series Models
Here is the syntax for time series models:
CREATE MODEL project_name.predictor_name
FROM integration_name
(SELECT sequential_column, partition_column,
other_column, target_column
FROM table_name)
PREDICT target_column
ORDER BY sequential_column
[GROUP BY partition_column]
WINDOW int
[HORIZON int]
[USING engine = 'engine_name',
tag = 'tag_name'];
Please note that the FROM
clause is mandatory here.
Due to the nature of time series forecasting, you need to use the JOIN
statement and join the data table with the model table to get predictions.
NLP Models
Here is the syntax for using external models within MindsDB:
CREATE MODEL project_name.model_name
PREDICT PRED
USING
engine = 'engine_name',
task = 'task_name',
model_name = 'hub_model_name',
input_column = 'input_column_name',
labels = ['label1', 'label2'];
Please note that you don’t need to define the FROM
clause here. Instead, the input_column
is defined in the USING
clause.
It allows you to bring an external model, for example, from the Hugging Face model hub, and use it within MindsDB.
Large Language Models (LLM)
MindsDB integrates with numerous LLM providers listed here.
Commonly, LLMs support the prompt_template
parameter that stores the message/instruction to the model.
CREATE MODEL project_name.model_name
PREDICT answer
USING
engine = 'llm_engine_name',
prompt_template = 'answer users questions in a helpful way: {{questions}}';
The prompt_template
parameter instructs the model what output should be generated. It can include variables enclosed in double curly braces, which will be replaced with data values upon joining the model with the input data.
Example
Regression Models
Here is an example for regression models that uses data from a database:
CREATE MODEL mindsdb.home_rentals_model
FROM example_db
(SELECT * FROM demo_data.home_rentals)
PREDICT rental_price
USING engine = 'lightwood',
tag = 'my home rentals model';
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Visit our tutorial on regression models to see the full example.
Classification Models
Here is an example for classification models that uses data from a file:
CREATE MODEL mindsdb.customer_churn_predictor
FROM files
(SELECT * FROM churn)
PREDICT Churn
USING engine = 'lightwood',
tag = 'my customers model';
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Visit our tutorial on classification models to see the full example.
Time Series Models
Here is an example for time series models that uses data from a file:
CREATE MODEL mindsdb.house_sales_predictor
FROM files
(SELECT * FROM house_sales)
PREDICT MA
ORDER BY saledate
GROUP BY bedrooms
-- the target column to be predicted stores one row per quarter
WINDOW 8 -- using data from the last two years to make forecasts (last 8 rows)
HORIZON 4; -- making forecasts for the next year (next 4 rows)
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Visit our tutorial on time series models to see the full example.
NLP Models
Here is an example for the Hugging Face model:
CREATE MODEL mindsdb.spam_classifier
PREDICT PRED
USING
engine = 'huggingface',
task = 'text-classification',
model_name = 'mrm8488/bert-tiny-finetuned-sms-spam-detection',
input_column = 'text_spammy',
labels = ['ham', 'spam'];
On execution, we get:
Query OK, 0 rows affected (x.xxx sec)
Large Language Models (LLM)
Here is an example using the OpenAI engine:
CREATE MODEL sentiment_classifier
PREDICT sentiment
USING
engine = 'openai_engine',
prompt_template = 'analyze customer reviews and assign sentiment as positive or negative or neutral: {{review}}';
Note that the prompt_template
parameter stores instructions that the model will follow to generate output.
Visit our page no how to bring Hugging Face models into MindsDB for more details.
Checking Model Status
After you run the CREATE MODEL
statement, you can check the status of the training process by querying the mindsdb.models
table.
DESCRIBE predictor_name;
On execution, we get:
+---------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
|name |status |accuracy |predict |update_status|mindsdb_version|error|select_data_query|training_options|
+---------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
|predictor_name |generating or training or complete |number depending on the accuracy metric |column_to_be_predicted |up_to_date |22.7.5.0 | | | |
+---------------+-----------------------------------+----------------------------------------+-----------------------+-------------+---------------+-----+-----------------+----------------+
Was this page helpful?