1. Nixtla
  2. Time Series Forecasting with StatsForecast in MindsDB

In this tutorial, we introduce Nixtla’s StatsForecast integration which offers numerous univariate time series forecasting models optimized for high performance and scalability. We’ll go through an example to predict monthly expenditures of various categories for the next quarter.

Prerequisites

To follow along, you can sign up for an account at MindsDB Cloud. Alternatively, head to MindsDB Docs and follow the instructions to manually set up a local instance of MindsDB via Docker or pip. You can also set up MindsDB on AWS following this instruction set.

Please note that before using the StatsForecast engine, you should create it with the below command:

CREATE ML_ENGINE statsforecast
FROM statsforecast;

You can check the available engines with this command:

SHOW ML_ENGINES;

If you see the StatsForecast engine on the list, you are ready to follow the tutorials.

Tutorial using SQL

In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine. Nixtla’s StatsForecast offers numerous univariate time series forecasting models optimized for high performance and scalability. Follow along to see how it works.

We use a table from our MySQL public demo database, so let’s start by connecting MindsDB to it:

CREATE DATABASE mysql_demo_db
WITH ENGINE = 'mysql',
PARAMETERS = {
    "user": "user",
    "password": "MindsDBUser123!",
    "host": "db-demo-data.cwoyhfn6bzs0.us-east-1.rds.amazonaws.com",
    "port": "3306",
    "database": "public"
};

Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example:

SELECT *
FROM mysql_demo_db.historical_expenditures
LIMIT 3;

Here is the output:

+------------+----------+-------------+
| month      | category | expenditure |
+------------+----------+-------------+
| 1982-04-01 | food     | 1162.6      |
| 1982-05-01 | food     | 1150.9      |
| 1982-06-01 | food     | 1160        |
+------------+----------+-------------+

The historical_expenditures table stores monthly expenditure data for various categories, such as food, clothing, industry, and more.

Let’s create a model table to predict the expenditures:

CREATE MODEL quarterly_expenditure_forecaster
FROM mysql_demo_db
  (SELECT * FROM historical_expenditures)
PREDICT expenditure
ORDER BY month
GROUP BY category
WINDOW 12
HORIZON 3
USING ENGINE = 'statsforecast';

In practice, the CREATE MODEL statement triggers MindsDB to generate an AI table called quarterly_expenditure_forecaster that uses the StatsForecast integration to predict a column named expenditure. The model lives inside the default mindsdb project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.

While creating time series forecasting models, we define the following parameters:

  • The ORDER BY clause specifies a column used to sort data. Here we use the month column to order the expenditures data.
  • The GROUP BY clause defines a column used to divide data into groups. The model makes independent predictions for each partition of data. Here we use the category column to group the expenditures data.
  • The WINDOW clause specifies how many historical records to consider while making predictions. Here we define WINDOW 12. It is equivalent to using the last 12 months’ data to predict future data.
  • The HORIZON clause specifies how many records are to be predicted. Here we define HORIZON 3. It is equivalent to predicting expenditures for the next quarter.

The USING clause specifies the ML engine used to make predictions.

We can check the training status with the following query:

SELECT *
FROM models
WHERE name = 'quarterly_expenditure_forecaster';

One of the pros of using the StatsForecast engine is that it is fast - it doesn’t take long until the model completes the training process.

Once the model status is complete, the behavior is the same as with any other AI table – you can query for batch predictions by joining it with a data table:

SELECT m.month as month, m.expenditure as forecasted
FROM mindsdb.quarterly_expenditure_forecaster as m
JOIN mysql_demo_db.historical_expenditures as t
WHERE t.month > LATEST
AND t.category = 'food';

Here is the output data:

+----------------------------+-----------------+
| month                      | forecasted      |
+----------------------------+-----------------+
| 2017-10-01 00:00:00.000000 | 10256.251953125 |
| 2017-11-01 00:00:00.000000 | 10182.58984375  |
| 2017-12-01 00:00:00.000000 | 10316.259765625 |
+----------------------------+-----------------+

The historical_expenditures table is used to make batch predictions. Upon joining the quarterly_expenditure_forecaster model with the historical_expenditures table, we get predictions for the next quarter as defined by the HORIZON 3 clause.

MindsDB provides the LATEST keyword that marks the latest training data point. In the WHERE clause, we specify the month > LATEST condition to ensure the predictions are made for data after the latest training data point.

Let’s consider our quarterly_expenditure_forecaster model. If we train the model using data from January 2020 until December 2020 (as defined by WINDOW 12), then the predictions come for the first quarter of 2021 (as defined by HORIZON 3).

Tutorial using MQL

In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine. Nixtla’s StatsForecast offers numerous univariate time series forecasting models optimized for high performance and scalability. Follow along to see how it works.

Before we start, visit our docs to learn how to connect Mongo Compass and Mongo Shell to MindsDB.

We use a collection from our Mongo public demo database, so let’s start by connecting MindsDB to it from Mongo Compass or Mongo Shell:

> use mindsdb
> db.databases.insertOne({
        'name': 'mongo_demo_db',
        'engine': 'mongodb',
        'connection_args': {
            "host": "mongodb+srv://user:MindsDBUser123!@demo-data-mdb.trzfwvb.mongodb.net/",
            "database": "public"
        }
   })

Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example.

> use mongo_demo_db
> db.historical_expenditures.find({}).limit(3)

Here is the output:

{
  _id: '63fd2388bee7187f230f56fc',
  month: '1982-04-01',
  category: 'food',
  expenditure: '1162.6'
}
{
  _id: '63fd2388bee7187f230f56fd',
  month: '1982-05-01',
  category: 'food',
  expenditure: '1150.9'
}
{
  _id: '63fd2388bee7187f230f56fe',
  month: '1982-06-01',
  category: 'food',
  expenditure: '1160'
}

The historical_expenditures collection stores monthly expenditure data for various categories, such as food, clothing, industry, and more.

Let’s create a model to predict the expenditures:

> use mindsdb
> db.predictors.insertOne({
       name: 'quarterly_expenditure_forecaster',
       predict: 'expenditure',
       connection: 'mongo_demo_db',
       select_data_query: 'db.historical_expenditures.find({})',
       training_options: {
          timeseries_settings: {                
              order_by: ['month'],               
              group_by: ['category'],                               
              window: 12,
              horizon: 3           
          },
          engine: 'statsforecast'
      }  
  })

In practice, the insertOne method triggers MindsDB to generate an AI collection called quarterly_expenditure_forecaster that uses the StatsForecast integration to predict a field named expenditure. The model lives inside the default mindsdb project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.

While creating time series forecasting models, we define the following parameters under the timeseries_settings clause:

  • The order_by clause specifies a field used to sort data. Here we use the month field to order the expenditures data.
  • The group_by clause defines a field used to divide data into groups. The model makes independent predictions for each partition of data. Here we use the category field to group the expenditures data.
  • The window clause specifies how many historical records to consider while making predictions. Here we define window: 12. It is equivalent to using the last 12 months’ data to predict future data.
  • The horizon clause specifies how many records are to be predicted. Here we define horizon: 3. It is equivalent to predicting expenditures for the next quarter.

The training_options clause specifies the ML engine used to make predictions and parameters for the time series models, as outlined above.

We can check the training status with the following query:

> db.getCollection('models').find({
      name: 'quarterly_expenditure_forecaster'
  })

One of the pros of using the StatsForecast engine is that it is fast - it doesn’t take long until the model completes the training process.

Once the model status is complete, the behavior is the same as with any other AI collection – you can query for batch predictions by joining it with a data collection:

> db.quarterly_expenditure_forecaster.find({
      collection: 'mongo_demo_db.historical_expenditures', 
      query:  { 
          "$where": "this.month > latest and this.category = 'food'" 
      }
  }).limit(3)

Here is the output data:

{
  _id: '63fd2388bee7187f230f58a5',
  month: 2017-10-01T00:00:00.000Z,
  category: 'food',
  expenditure: 10256.251953125
}
{
  _id: '63fd2388bee7187f230f58a4',
  month: 2017-11-01T00:00:00.000Z,
  category: 'food',
  expenditure: 10182.58984375
}
{
  _id: '63fd2388bee7187f230f58a3',
  month: 2017-12-01T00:00:00.000Z,
  category: 'food',
  expenditure: 10316.259765625
}

The historical_expenditures collection is used to make batch predictions. Upon joining the quarterly_expenditure_forecaster model with the historical_expenditures collection, we get predictions for the next quarter as defined by the horizon: 3 clause.

MindsDB provides the latest keyword that marks the latest training data point. In the where clause, we specify the month > latest condition to ensure the predictions are made for data after the latest training data point.

Let’s consider our quarterly_expenditure_forecaster model. If we train the model using data from January 2020 until December 2020 (as defined by window: 12), then the predictions come for the first quarter of 2021 (as defined by horizon: 3).

Tutorial using House Sales Data

In this tutorial, we take our House Sales tutorial and redo it using the StatsForecast engine.

We use a table from our MySQL public demo database, so let’s start by connecting MindsDB to it:

CREATE DATABASE mysql_demo_db
WITH ENGINE = 'mysql',
PARAMETERS = {
    "user": "user",
    "password": "MindsDBUser123!",
    "host": "db-demo-data.cwoyhfn6bzs0.us-east-1.rds.amazonaws.com",
    "port": "3306",
    "database": "public"
};

Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example:

SELECT *
FROM mysql_demo_db.house_sales
LIMIT 3;

Here is the output:

+----------+--------------------------+-----+--------+
|saledate  |house_price_moving_average|type |bedrooms|
+----------+--------------------------+-----+--------+
|30/09/2007|441854                    |house|2       |
|31/12/2007|441854                    |house|2       |
|31/03/2008|441854                    |house|2       |
+----------+--------------------------+-----+--------+

The house_sales table stores quarterly house price moving averages per property.

Let’s create a model table to predict the house price moving average values:

CREATE MODEL mindsdb.house_sales_predictor
FROM mysql_demo_db
  (SELECT * FROM house_sales)
PREDICT MA
ORDER BY saledate
GROUP BY bedrooms, type
WINDOW 8
HORIZON 4
USING ENGINE = 'statsforecast';

The sytax is the same as in original tutorial. But here, we add the USING clause that specifies the ML engine used to make predictions.

We can check the training status with the following query:

SELECT *
FROM models
WHERE name = 'house_sales_predictor';

Once the model status is complete, the behavior is the same as with any other AI table – you can query for batch predictions by joining it with a data table:

SELECT m.saledate AS date, m.house_price_moving_average AS forecast
FROM mindsdb.house_sales_predictor AS m
JOIN mysql_demo_db.house_sales AS t
WHERE t.saledate > LATEST
AND t.type = 'house'
AND t.bedrooms = 2
LIMIT 3;

Here is the output data:

+----------------------------+----------+
| date                       | forecast |
+----------------------------+----------+
| 2019-12-31 00:00:00.000000 | 510712   |
| 2020-03-31 00:00:00.000000 | 510712   |
| 2020-06-30 00:00:00.000000 | 510712   |
+----------------------------+----------+

What’s Next?

Have fun while trying it out yourself!

If this tutorial was helpful, please give us a GitHub star here.