- Nixtla
- Time Series Forecasting with StatsForecast in MindsDB
Nixtla
Time Series Forecasting with StatsForecast in MindsDB
In this tutorial, we introduce Nixtla’s StatsForecast integration which offers numerous univariate time series forecasting models optimized for high performance and scalability. We’ll go through an example to predict monthly expenditures of various categories for the next quarter.
Prerequisites
To follow along, you can sign up for an account at MindsDB Cloud. Alternatively, head to MindsDB Docs and follow the instructions to manually set up a local instance of MindsDB via Docker or pip. You can also set up MindsDB on AWS following this instruction set.
Please note that before using the StatsForecast engine, you should create it with the below command:
CREATE ML_ENGINE statsforecast
FROM statsforecast;
You can check the available engines with this command:
SHOW ML_ENGINES;
If you see the StatsForecast engine on the list, you are ready to follow the tutorials.
Tutorial using SQL
In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine. Nixtla’s StatsForecast offers numerous univariate time series forecasting models optimized for high performance and scalability. Follow along to see how it works.
We use a table from our MySQL public demo database, so let’s start by connecting MindsDB to it:
CREATE DATABASE mysql_demo_db
WITH ENGINE = 'mysql',
PARAMETERS = {
"user": "user",
"password": "MindsDBUser123!",
"host": "db-demo-data.cwoyhfn6bzs0.us-east-1.rds.amazonaws.com",
"port": "3306",
"database": "public"
};
Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example:
SELECT *
FROM mysql_demo_db.historical_expenditures
LIMIT 3;
Here is the output:
+------------+----------+-------------+
| month | category | expenditure |
+------------+----------+-------------+
| 1982-04-01 | food | 1162.6 |
| 1982-05-01 | food | 1150.9 |
| 1982-06-01 | food | 1160 |
+------------+----------+-------------+
The historical_expenditures
table stores monthly expenditure data for various categories, such as food
, clothing
, industry
, and more.
Let’s create a model table to predict the expenditures:
CREATE MODEL quarterly_expenditure_forecaster
FROM mysql_demo_db
(SELECT * FROM historical_expenditures)
PREDICT expenditure
ORDER BY month
GROUP BY category
WINDOW 12
HORIZON 3
USING ENGINE = 'statsforecast';
In practice, the CREATE MODEL
statement triggers MindsDB to generate an AI table called quarterly_expenditure_forecaster
that uses the StatsForecast integration to predict a column named expenditure
. The model lives inside the default mindsdb
project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.
While creating time series forecasting models, we define the following parameters:
- The
ORDER BY
clause specifies a column used to sort data. Here we use themonth
column to order the expenditures data. - The
GROUP BY
clause defines a column used to divide data into groups. The model makes independent predictions for each partition of data. Here we use thecategory
column to group the expenditures data. - The
WINDOW
clause specifies how many historical records to consider while making predictions. Here we defineWINDOW 12
. It is equivalent to using the last 12 months’ data to predict future data. - The
HORIZON
clause specifies how many records are to be predicted. Here we defineHORIZON 3
. It is equivalent to predicting expenditures for the next quarter.
The USING
clause specifies the ML engine used to make predictions.
We can check the training status with the following query:
SELECT *
FROM models
WHERE name = 'quarterly_expenditure_forecaster';
One of the pros of using the StatsForecast engine is that it is fast - it doesn’t take long until the model completes the training process.
Once the model status is complete
, the behavior is the same as with any other AI table – you can query for batch predictions by joining it with a data table:
SELECT m.month as month, m.expenditure as forecasted
FROM mindsdb.quarterly_expenditure_forecaster as m
JOIN mysql_demo_db.historical_expenditures as t
WHERE t.month > LATEST
AND t.category = 'food';
Here is the output data:
+----------------------------+-----------------+
| month | forecasted |
+----------------------------+-----------------+
| 2017-10-01 00:00:00.000000 | 10256.251953125 |
| 2017-11-01 00:00:00.000000 | 10182.58984375 |
| 2017-12-01 00:00:00.000000 | 10316.259765625 |
+----------------------------+-----------------+
The historical_expenditures
table is used to make batch predictions. Upon joining the quarterly_expenditure_forecaster
model with the historical_expenditures
table, we get predictions for the next quarter as defined by the HORIZON 3
clause.
MindsDB provides the LATEST
keyword that marks the latest training data point. In the WHERE
clause, we specify the month > LATEST
condition to ensure the predictions are made for data after the latest training data point.
Let’s consider our quarterly_expenditure_forecaster
model. If we train the model using data from January 2020 until December 2020 (as defined by WINDOW 12
), then the predictions come for the first quarter of 2021 (as defined by HORIZON 3
).
Tutorial using MQL
In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine. Nixtla’s StatsForecast offers numerous univariate time series forecasting models optimized for high performance and scalability. Follow along to see how it works.
Before we start, visit our docs to learn how to connect Mongo Compass and Mongo Shell to MindsDB.
We use a collection from our Mongo public demo database, so let’s start by connecting MindsDB to it from Mongo Compass or Mongo Shell:
> use mindsdb
> db.databases.insertOne({
'name': 'mongo_demo_db',
'engine': 'mongodb',
'connection_args': {
"host": "mongodb+srv://user:MindsDBUser123!@demo-data-mdb.trzfwvb.mongodb.net/",
"database": "public"
}
})
Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example.
> use mongo_demo_db
> db.historical_expenditures.find({}).limit(3)
Here is the output:
{
_id: '63fd2388bee7187f230f56fc',
month: '1982-04-01',
category: 'food',
expenditure: '1162.6'
}
{
_id: '63fd2388bee7187f230f56fd',
month: '1982-05-01',
category: 'food',
expenditure: '1150.9'
}
{
_id: '63fd2388bee7187f230f56fe',
month: '1982-06-01',
category: 'food',
expenditure: '1160'
}
The historical_expenditures
collection stores monthly expenditure data for various categories, such as food
, clothing
, industry
, and more.
Let’s create a model to predict the expenditures:
> use mindsdb
> db.predictors.insertOne({
name: 'quarterly_expenditure_forecaster',
predict: 'expenditure',
connection: 'mongo_demo_db',
select_data_query: 'db.historical_expenditures.find({})',
training_options: {
timeseries_settings: {
order_by: ['month'],
group_by: ['category'],
window: 12,
horizon: 3
},
engine: 'statsforecast'
}
})
In practice, the insertOne
method triggers MindsDB to generate an AI collection called quarterly_expenditure_forecaster
that uses the StatsForecast integration to predict a field named expenditure
. The model lives inside the default mindsdb
project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.
While creating time series forecasting models, we define the following parameters under the timeseries_settings
clause:
- The
order_by
clause specifies a field used to sort data. Here we use themonth
field to order the expenditures data. - The
group_by
clause defines a field used to divide data into groups. The model makes independent predictions for each partition of data. Here we use thecategory
field to group the expenditures data. - The
window
clause specifies how many historical records to consider while making predictions. Here we definewindow: 12
. It is equivalent to using the last 12 months’ data to predict future data. - The
horizon
clause specifies how many records are to be predicted. Here we definehorizon: 3
. It is equivalent to predicting expenditures for the next quarter.
The training_options
clause specifies the ML engine used to make predictions and parameters for the time series models, as outlined above.
We can check the training status with the following query:
> db.getCollection('models').find({
name: 'quarterly_expenditure_forecaster'
})
One of the pros of using the StatsForecast engine is that it is fast - it doesn’t take long until the model completes the training process.
Once the model status is complete
, the behavior is the same as with any other AI collection – you can query for batch predictions by joining it with a data collection:
> db.quarterly_expenditure_forecaster.find({
collection: 'mongo_demo_db.historical_expenditures',
query: {
"$where": "this.month > latest and this.category = 'food'"
}
}).limit(3)
Here is the output data:
{
_id: '63fd2388bee7187f230f58a5',
month: 2017-10-01T00:00:00.000Z,
category: 'food',
expenditure: 10256.251953125
}
{
_id: '63fd2388bee7187f230f58a4',
month: 2017-11-01T00:00:00.000Z,
category: 'food',
expenditure: 10182.58984375
}
{
_id: '63fd2388bee7187f230f58a3',
month: 2017-12-01T00:00:00.000Z,
category: 'food',
expenditure: 10316.259765625
}
The historical_expenditures
collection is used to make batch predictions. Upon joining the quarterly_expenditure_forecaster
model with the historical_expenditures
collection, we get predictions for the next quarter as defined by the horizon: 3
clause.
MindsDB provides the latest
keyword that marks the latest training data point. In the where
clause, we specify the month > latest
condition to ensure the predictions are made for data after the latest training data point.
Let’s consider our quarterly_expenditure_forecaster
model. If we train the model using data from January 2020 until December 2020 (as defined by window: 12
), then the predictions come for the first quarter of 2021 (as defined by horizon: 3
).
Tutorial using House Sales Data
In this tutorial, we take our House Sales tutorial and redo it using the StatsForecast engine.
We use a table from our MySQL public demo database, so let’s start by connecting MindsDB to it:
CREATE DATABASE mysql_demo_db
WITH ENGINE = 'mysql',
PARAMETERS = {
"user": "user",
"password": "MindsDBUser123!",
"host": "db-demo-data.cwoyhfn6bzs0.us-east-1.rds.amazonaws.com",
"port": "3306",
"database": "public"
};
Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example:
SELECT *
FROM mysql_demo_db.house_sales
LIMIT 3;
Here is the output:
+----------+--------------------------+-----+--------+
|saledate |house_price_moving_average|type |bedrooms|
+----------+--------------------------+-----+--------+
|30/09/2007|441854 |house|2 |
|31/12/2007|441854 |house|2 |
|31/03/2008|441854 |house|2 |
+----------+--------------------------+-----+--------+
The house_sales
table stores quarterly house price moving averages per property.
Let’s create a model table to predict the house price moving average values:
CREATE MODEL mindsdb.house_sales_predictor
FROM mysql_demo_db
(SELECT * FROM house_sales)
PREDICT MA
ORDER BY saledate
GROUP BY bedrooms, type
WINDOW 8
HORIZON 4
USING ENGINE = 'statsforecast';
The sytax is the same as in original tutorial. But here, we add the USING
clause that specifies the ML engine used to make predictions.
We can check the training status with the following query:
SELECT *
FROM models
WHERE name = 'house_sales_predictor';
Once the model status is complete
, the behavior is the same as with any other AI table – you can query for batch predictions by joining it with a data table:
SELECT m.saledate AS date, m.house_price_moving_average AS forecast
FROM mindsdb.house_sales_predictor AS m
JOIN mysql_demo_db.house_sales AS t
WHERE t.saledate > LATEST
AND t.type = 'house'
AND t.bedrooms = 2
LIMIT 3;
Here is the output data:
+----------------------------+----------+
| date | forecast |
+----------------------------+----------+
| 2019-12-31 00:00:00.000000 | 510712 |
| 2020-03-31 00:00:00.000000 | 510712 |
| 2020-06-30 00:00:00.000000 | 510712 |
+----------------------------+----------+
What’s Next?
Have fun while trying it out yourself!
- Bookmark MindsDB repository on GitHub.
- Sign up for a free MindsDB account.
- Engage with the MindsDB community on Slack or GitHub to ask questions and share your ideas and thoughts.
If this tutorial was helpful, please give us a GitHub star here.