Description

The JOIN clause combines rows from the database table and the model table on a column defined in its implementation. It is used to make batch predictions, as shown in the examples.

Syntax

Here is the syntax that lets you join multiple data tables with multiple models to get all predictions at once.

SELECT d1.column_name,
       d2.column_name,
       m1.column_name,
       m2.column_name,
       ...
FROM integration_name.table_name_1 [AS] d1
[JOIN integration_name.table_name_2 [AS] d2 ON ...]
[JOIN ...]
JOIN project_name.model_name_1 [AS] m1
[JOIN project_name.model_name_2 [AS] m2]
[JOIN ...]
[ON d1.input_data = m1.expected_argument];

Where:

NameDescription
integration_name.table_name_1Name of the data source table used as input for making predictions.
integration_name.table_name_2Optionally, you can join arbitrary number of data source tables.
project_name.model_name_1Name of the model table used to make predictions.
project_name.model_name_2Optionally, you can join arbitrary number of models.

Mapping input data to model arguments

If the input data contains a column named question and the model requires an argument named input, you can map these columns, as explained below.

We have a model that expects to receive input:

CREATE MODEL model_name
PREDICT answer
USING
       engine = 'openai',
       prompt_template = 'provide answers to an input from a user: {{input}}';

We have an input data table that has the following columns:

+----+-------------------------------------------+
| id | question                                  |
+----+-------------------------------------------+
| 1  | How many planets are in the solar system? |
| 2  | How many stars are in the solar system?   |
+----+-------------------------------------------+

Now if you want to get answers to these questions using the model, you need to join the input data table with the model and map the question column onto the input argument.

SELECT *
FROM input_table AS d
JOIN model_name AS m
ON d.question = m.input;

Example 1

Let’s join the home_rentals table with the home_rentals_model model using this statement:

SELECT t.rental_price AS real_price, 
       m.rental_price AS predicted_price,
       t.number_of_rooms,  t.number_of_bathrooms,
       t.sqft, t.location, t.days_on_market 
FROM example_db.demo_data.home_rentals AS t 
JOIN mindsdb.home_rentals_model AS m 
LIMIT 20;

On execution, we get:

+------------+-----------------+-----------------+---------------------+------+----------+----------------+
| real_price | predicted_price | number_of_rooms | number_of_bathrooms | sqft | location | days_on_market |
+------------+-----------------+-----------------+---------------------+------+----------+----------------+
| 3901       | 3886            | 2               | 1                   | 917  | great    | 13             |
| 2042       | 2007            | 0               | 1                   | 194  | great    | 10             |
| 1871       | 1865            | 1               | 1                   | 543  | poor     | 18             |
| 3026       | 3020            | 2               | 1                   | 503  | good     | 10             |
| 4774       | 4748            | 3               | 2                   | 1066 | good     | 13             |
| 4382       | 4388            | 3               | 2                   | 816  | poor     | 25             |
| 2269       | 2272            | 0               | 1                   | 461  | great    | 6              |
| 2284       | 2272            | 1               | 1                   | 333  | great    | 6              |
| 5420       | 5437            | 3               | 2                   | 1124 | great    | 9              |
| 5016       | 4998            | 3               | 2                   | 1204 | good     | 7              |
| 1421       | 1427            | 0               | 1                   | 538  | poor     | 43             |
| 3476       | 3466            | 2               | 1                   | 890  | good     | 6              |
| 5271       | 5255            | 3               | 2                   | 975  | great    | 6              |
| 3001       | 2993            | 2               | 1                   | 564  | good     | 13             |
| 4682       | 4692            | 3               | 2                   | 953  | good     | 10             |
| 1783       | 1738            | 1               | 1                   | 493  | poor     | 24             |
| 1548       | 1543            | 1               | 1                   | 601  | poor     | 47             |
| 1492       | 1491            | 0               | 1                   | 191  | good     | 12             |
| 2431       | 2419            | 0               | 1                   | 511  | great    | 1              |
| 4237       | 4257            | 3               | 2                   | 916  | poor     | 36             |
+------------+-----------------+-----------------+---------------------+------+----------+----------------+

Example 2

Let’s query a time series model using this statement:

SELECT m.saledate as date,
       m.ma AS forecast
FROM mindsdb.house_sales_model AS m 
JOIN example_db.demo_data.house_sales AS t
WHERE t.saledate > LATEST
AND t.type = 'house'
LIMIT 4;

On execution, we get:

+----------+------------------+
|date      |forecast          |
+----------+------------------+
|2019-12-31|517506.31349071994|
|2019-12-31|627822.6592658638 |
|2019-12-31|953426.9545788583 |
|2019-12-31|767252.4205039773 |
+----------+------------------+

Follow this doc page to see examples of joining multiple data table with multiple models.