AI Tables Intro
The modern business world is undergoing a transformational shift: from asking “what happened and why?” based on historical data analysis to asking “what do we predict will happen, and how can we make it happen?” based on machine learning predictive modeling.
The success of your predictions depends both on the data you have available and on the models you train on that data. Data scientists and data engineers need best-in-class tools to prepare data for feature engineering, to train the best models, and to deploy, monitor, and manage these implementations for optimal prediction confidence.
Machine Learning (ML) Lifecycle
The ML lifecycle can be represented as a process that consists of the data preparation phase, modeling phase, and deployment phase. The diagram below presents all the steps included in each of the stages.
Companies looking to implement machine learning have found that their current solutions require substantial amounts of data preparation, cleaning, and labeling, plus hard-to-find machine learning/AI data scientists to conduct feature engineering; build, train, and optimize models; assemble, verify, and deploy them into production; and then monitor in real time, improve, and refine them. Machine learning models require multiple training iterations over existing data. Additionally, extracting, transforming, and loading (ETL) data from one system to another is complicated, leads to multiple copies of information, and is a compliance and tracking nightmare.
A recent study has shown that 64% of companies take anywhere from a month to over a year to deploy a machine learning model into production¹. Automating feature engineering, model building, training, and optimization, along with assembly and deployment into production, while leveraging existing databases, is called AutoML. It has been gaining traction within enterprises because it enables non-experts to use machine learning models for practical applications.
MindsDB brings machine learning to existing SQL databases with a concept called AI Tables. AI Tables integrate machine learning models as virtual tables inside a database; they generate predictions and can be queried with simple SQL statements. Time series, regression, and classification predictions can then be made directly in your database, almost instantly.
Deep Dive into the AI Tables
Let’s consider the following income table that stores the income and debt values.
SELECT income, debt FROM income_table;
A simple visualization of the data present in the income table is as follows.
Querying the income table to get the debt value for a particular income value results in the following.
SELECT income, debt FROM income_table WHERE income = 80000;
But what happens when we query the table for an income value that is not present?
SELECT income, debt FROM income_table WHERE income = 90000;
When the table has no exact match, the query returns no rows (an empty result set). This is where AI Tables come into play!
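The lookup behaviour described above can be reproduced in any SQL engine, since a plain table can only return rows it actually stores. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The sample rows and the helper names (build_income_table, lookup_debt) are invented for illustration; they are not the data behind the original income table.

```python
import sqlite3

# Hypothetical sample rows (income, debt); the real table contents are not given in the text.
SAMPLE_ROWS = [(60000, 20000), (70000, 25000), (80000, 30000), (100000, 40000)]


def build_income_table():
    """Create an in-memory income_table populated with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE income_table (income INTEGER, debt INTEGER)")
    conn.executemany("INSERT INTO income_table VALUES (?, ?)", SAMPLE_ROWS)
    return conn


def lookup_debt(conn, income):
    """Return every (income, debt) row whose income matches exactly."""
    cur = conn.execute(
        "SELECT income, debt FROM income_table WHERE income = ?", (income,)
    )
    return cur.fetchall()


conn = build_income_table()
print(lookup_debt(conn, 80000))  # an exact match returns the stored row
print(lookup_debt(conn, 90000))  # no stored row has this income, so the list is empty
```

The second call is the case that matters here: the table cannot answer for an income it has never seen.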
Let’s create a debt model that allows us to approximate the debt value for any income value. We’ll train this debt model using the income table’s data.
CREATE PREDICTOR mindsdb.debt_model FROM income_table PREDICT debt;
MindsDB provides the CREATE PREDICTOR statement. When we execute this statement, the predictive model works in the background, automatically creating a vector representation of the data that can be visualized as follows.
Let’s now look for the debt value of some random income value. To get the approximated debt value, we query the debt_model and not the income table.
SELECT income, debt FROM mindsdb.debt_model WHERE income = 90120;
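To make the idea of approximating debt for an unseen income concrete, here is a minimal sketch that fits an ordinary least-squares line to a few invented rows and evaluates it at income = 90120. This is a stand-in technique chosen for brevity, not a description of how MindsDB builds its models. The sample data and the helper names (fit_line, predict_debt) are hypothetical.

```python
# Hypothetical sample rows (income, debt); not taken from the original table.
SAMPLE_ROWS = [(60000, 20000), (70000, 25000), (80000, 30000), (100000, 40000)]


def fit_line(rows):
    """Fit debt = slope * income + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_debt(model, income):
    """Evaluate the fitted line at an income that need not be in the table."""
    slope, intercept = model
    return slope * income + intercept


model = fit_line(SAMPLE_ROWS)
print(round(predict_debt(model, 90120)))
```

Unlike the exact-match lookup on the income table, the fitted model returns a value for any income, which is the behaviour the debt_model query relies on.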