Skip to content

Predicting Home Rental Prices with MindsDB

Introduction

Follow these steps to create, train and query a machine learning model (predictor) using SQL that predicts the rental_price (label) for new properties given their attributes (features).

The Data

Connecting the data

There are a couple of ways you can get the data to follow trough this tutorial.

We have prepared a demo database you can connect to that contains the data to be used example_db.demo_data.home_rentals

CREATE DATABASE example_db
    WITH ENGINE = "postgres",
    PARAMETERS = {
        "user": "demo_user",
        "password": "demo_password",
        "host": "3.220.66.106",
        "port": "5432",
        "database": "demo"
}

Now you can run queries directly on the demo database. Let's start by previewing the data we will use to train our predictor:

SELECT * 
FROM example_db.demo_data.home_rentals 
LIMIT 10;

You can download the source file as a .CSV here and then upload via MindsDB SQL Editor

log-in
Navigate to the Upload a file button.

log-in
Import the file and name it home_rentals

Now you can run queries directly on the demo file as if it was a database. Let's start by previewing the data we will use to train our predictor:

SELECT *
FROM files.home_rentals
LIMIT 10;

From now onwards we will use the table example_db.demo_data.home_rentals make sure you replace it for files.home_rentals if you are connecting the data as a file.

Understanding the Data

+-----------------+---------------------+------+----------+----------------+----------------+--------------+
| number_of_rooms | number_of_bathrooms | sqft | location | days_on_market | neighborhood   | rental_price |
+-----------------+---------------------+------+----------+----------------+----------------+--------------+
|               2 |                   1 |  917 | great    |             13 | berkeley_hills |         3901 |
|               0 |                   1 |  194 | great    |             10 | berkeley_hills |         2042 |
|               1 |                   1 |  543 | poor     |             18 | westbrae       |         1871 |
|               2 |                   1 |  503 | good     |             10 | downtown       |         3026 |
|               3 |                   2 | 1066 | good     |             13 | thowsand_oaks  |         4774 |
+-----------------+---------------------+------+----------+----------------+----------------+--------------+

Where:

Column Description Data Type Usage
number_of_rooms Number of rooms of a given house [0,1,2,3] integer Feature
number_of_bathrooms Number of bathrooms on a given house [1,2] integer Feature
sqft Area of a given house in square feet integer Feature
location Rating of the location of a given house [poor, great, good] character varying Feature
days_on_market Number of days a given house has been open to be rented integer Feature
neighborhood Neighborhood a given house is in [alcatraz_ave, westbrae, ..., south_side, thowsand_oaks ] character varying Feature
rental_price Price for renting a given house in dollars integer Label

Labels and Features

A label is the thing we're predicting—the y variable in simple linear regression ... A feature is an input variable—the x variable in simple linear regression ..

Training a Predictor Via CREATE PREDICTOR

Let's create and train your first machine learning predictor. For that we are going to use the CREATE PREDICTOR syntax, where we specify what sub-query to train FROM (features) and what we want to learn to PREDICT (labels):

CREATE PREDICTOR mindsdb.home_rentals_model
FROM example_db
  (SELECT * FROM demo_data.home_rentals)
PREDICT rental_price;

Checking the Status of a Predictor

A predictor may take a couple of minutes for the training to complete. You can monitor the status of your predictor by copying and pasting this command into your SQL client:

SELECT status
FROM mindsdb.predictors
WHERE name='home_rentals_predictor';

Here we are selecting the status from the table called mindsdb.predictors and using the where statement to only show the model we have just trained, On execution, you we get:

+----------+
| status   |
+----------+
| training |
+----------+

Or after a the model has been trained:

+----------+
| status   |
+----------+
| complete |
+----------+

Making Predictions

Predictor Status Must be 'complete' Before Making a Prediction

Making Predictions Via SELECT

Once the predictor's status is complete. You can make predictions by querying the predictor as if it was a normal table: The SELECT syntax will allow you to make a prediction based on features.

SELECT rental_price,
       rental_price_explain
FROM mindsdb.home_rentals_model
WHERE sqft = 823
AND location='good'
AND neighborhood='downtown'
AND days_on_market=10;

On execution, you should get:

+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| rental_price | rental_price_explain                                                                                                                          |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| 4394         | {"predicted_value": 4394, "confidence": 0.99, "anomaly": null, "truth": null, "confidence_lower_bound": 4313, "confidence_upper_bound": 4475} |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+

Making Batch Predictions Via JOIN

You can also make bulk predictions by joining a table with your predictor:

SELECT t.rental_price as real_price, 
       m.rental_price as predicted_price,
       t.number_of_rooms,  t.number_of_bathrooms, t.sqft, t.location, t.days_on_market 
FROM example_db.demo_data.home_rentals as t 
JOIN mindsdb.home_rentals_model as m limit 100
+------------+-----------------+-----------------+---------------------+------+----------+----------------+
| real_price | predicted_price | number_of_rooms | number_of_bathrooms | sqft | location | days_on_market |
+------------+-----------------+-----------------+---------------------+------+----------+----------------+
| 3901       | 3886            | 2               | 1                   | 917  | great    | 13             |
| 2042       | 2007            | 0               | 1                   | 194  | great    | 10             |
| 1871       | 1865            | 1               | 1                   | 543  | poor     | 18             |
| 3026       | 3020            | 2               | 1                   | 503  | good     | 10             |
| 4774       | 4748            | 3               | 2                   | 1066 | good     | 13             |
+------------+-----------------+-----------------+---------------------+------+----------+----------------+