Predicting Home Rental Prices with MindsDB¶
Introduction¶
Follow these steps to create, train and query a machine learning model (predictor) using SQL that predicts the rental_price
(label) for new properties given their attributes (features).
The Data¶
Connecting the data¶
There are a couple of ways you can get the data to follow trough this tutorial.
We have prepared a demo database you can connect to that contains the data to be used example_db.demo_data.home_rentals
CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "3.220.66.106",
"port": "5432",
"database": "demo"
}
Now you can run queries directly on the demo database. Let's start by previewing the data we will use to train our predictor:
SELECT *
FROM example_db.demo_data.home_rentals
LIMIT 10;
You can download the source file as a .CSV
here and then upload via MindsDB SQL Editor
Now you can run queries directly on the demo file as if it was a database. Let's start by previewing the data we will use to train our predictor:
SELECT *
FROM files.home_rentals
LIMIT 10;
From now onwards we will use the table example_db.demo_data.home_rentals
make sure you replace it for files.home_rentals
if you are connecting the data as a file.
Understanding the Data¶
+-----------------+---------------------+------+----------+----------------+----------------+--------------+
| number_of_rooms | number_of_bathrooms | sqft | location | days_on_market | neighborhood | rental_price |
+-----------------+---------------------+------+----------+----------------+----------------+--------------+
| 2 | 1 | 917 | great | 13 | berkeley_hills | 3901 |
| 0 | 1 | 194 | great | 10 | berkeley_hills | 2042 |
| 1 | 1 | 543 | poor | 18 | westbrae | 1871 |
| 2 | 1 | 503 | good | 10 | downtown | 3026 |
| 3 | 2 | 1066 | good | 13 | thowsand_oaks | 4774 |
+-----------------+---------------------+------+----------+----------------+----------------+--------------+
Where:
Column | Description | Data Type | Usage |
---|---|---|---|
number_of_rooms |
Number of rooms of a given house [0,1,2,3] |
integer |
Feature |
number_of_bathrooms |
Number of bathrooms on a given house [1,2] |
integer |
Feature |
sqft |
Area of a given house in square feet | integer |
Feature |
location |
Rating of the location of a given house [poor, great, good] |
character varying |
Feature |
days_on_market |
Number of days a given house has been open to be rented | integer |
Feature |
neighborhood |
Neighborhood a given house is in [alcatraz_ave, westbrae, ..., south_side, thowsand_oaks ] |
character varying |
Feature |
rental_price |
Price for renting a given house in dollars | integer |
Label |
Labels and Features
A label is the thing we're predicting—the y variable in simple linear regression ... A feature is an input variable—the x variable in simple linear regression ..
Training a Predictor Via CREATE PREDICTOR
¶
Let's create and train your first machine learning predictor. For that we are going to use the CREATE PREDICTOR
syntax, where we specify what sub-query to train FROM
(features) and what we want to learn to PREDICT
(labels):
CREATE PREDICTOR mindsdb.home_rentals_model
FROM example_db
(SELECT * FROM demo_data.home_rentals)
PREDICT rental_price;
Checking the Status of a Predictor¶
A predictor may take a couple of minutes for the training to complete. You can monitor the status of your predictor by copying and pasting this command into your SQL client:
SELECT status
FROM mindsdb.predictors
WHERE name='home_rentals_predictor';
Here we are selecting the status from the table called mindsdb.predictors and using the where statement to only show the model we have just trained, On execution, you we get:
+----------+
| status |
+----------+
| training |
+----------+
Or after a the model has been trained:
+----------+
| status |
+----------+
| complete |
+----------+
Making Predictions¶
Predictor Status Must be 'complete' Before Making a Prediction
Making Predictions Via SELECT
¶
Once the predictor's status is complete. You can make predictions by querying the predictor as if it was a normal table:
The SELECT
syntax will allow you to make a prediction based on features.
SELECT rental_price,
rental_price_explain
FROM mindsdb.home_rentals_model
WHERE sqft = 823
AND location='good'
AND neighborhood='downtown'
AND days_on_market=10;
On execution, you should get:
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| rental_price | rental_price_explain |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| 4394 | {"predicted_value": 4394, "confidence": 0.99, "anomaly": null, "truth": null, "confidence_lower_bound": 4313, "confidence_upper_bound": 4475} |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
Making Batch Predictions Via JOIN
¶
You can also make bulk predictions by joining a table with your predictor:
SELECT t.rental_price as real_price,
m.rental_price as predicted_price,
t.number_of_rooms, t.number_of_bathrooms, t.sqft, t.location, t.days_on_market
FROM example_db.demo_data.home_rentals as t
JOIN mindsdb.home_rentals_model as m limit 100
+------------+-----------------+-----------------+---------------------+------+----------+----------------+
| real_price | predicted_price | number_of_rooms | number_of_bathrooms | sqft | location | days_on_market |
+------------+-----------------+-----------------+---------------------+------+----------+----------------+
| 3901 | 3886 | 2 | 1 | 917 | great | 13 |
| 2042 | 2007 | 0 | 1 | 194 | great | 10 |
| 1871 | 1865 | 1 | 1 | 543 | poor | 18 |
| 3026 | 3020 | 2 | 1 | 503 | good | 10 |
| 4774 | 4748 | 3 | 2 | 1066 | good | 13 |
+------------+-----------------+-----------------+---------------------+------+----------+----------------+