Building a new integration¶
This section walks you through adding a new integration to MindsDB, either as a data layer or as a predictive framework.
Prerequisite¶
Make sure you have cloned and installed the latest staging version of the MindsDB repository locally.
What are handlers?¶
At the heart of the MindsDB philosophy lies the belief that predictive insights are best leveraged when produced as close as possible to the data layer. Usually, this "layer" is a SQL-compatible database, but it could also be a non-SQL database, data stream, or any other tool that interacts with data stored somewhere else.
The above description fits an enormous set of tools used across the software industry. The complexity increases further when bringing Machine Learning into the equation, as the set of popular ML tools is similarly huge. We aim to support most technology stacks, which requires a simple integration procedure so that anyone can easily contribute the "glue" needed to enable any predictive system for use within data layers.
This motivates the concept of handlers, which is an abstraction for the two types of entities mentioned above: data layers and predictive systems. Handlers are meant to enforce a common and sufficient set of behaviors that all MindsDB-compatible entities should support. By creating a handler, the target system is effectively integrated into the wider MindsDB ecosystem.
Structure of a handler¶
Technically speaking, a handler is a self-contained Python package that has everything required for MindsDB to interact with it, including aspects like dependencies, unit tests, and continuous integration logic. It is up to the author to determine the nature of the package (e.g. closed or open source, version control, etc.), although we encourage opening pull requests to expand the default set of supported tools.
The entrypoint is a class definition that should inherit from either integrations.libs.base_handler.DatabaseHandler or integrations.libs.base_handler.PredictiveHandler, depending on the type of system being integrated. integrations.libs.base_handler.BaseHandler defines all the common methods that have to be overridden in order to achieve a functional implementation.
Apart from the above, structure is not enforced and the package can be arranged into whatever design the author prefers.
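For orientation, a bare-bones entrypoint might look like the sketch below. All names here are illustrative (not taken from the repository), and the exact base-class constructor may differ; a predictive handler would inherit from PredictiveHandler instead:

# my_handler/my_handler.py -- illustrative sketch, not an actual handler from the repository
from mindsdb.integrations.libs.base_handler import DatabaseHandler


class MyDatabaseHandler(DatabaseHandler):
    """Entrypoint for a hypothetical data integration."""

    name = 'my_database'  # assumed identifier; see "Set the class property name" below

    def __init__(self, name: str, connection_data: dict = None, **kwargs):
        super().__init__(name)
        self.connection_data = connection_data or {}
        self.is_connected = False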
Structure within the MindsDB repository¶
The code for integrations is located in the main MindsDB repository under the /integrations directory.
integrations                 # Contains integrations source code
├─ handlers/                 # Each integration has its own handler directory
│  ├─ mysql_handler/         # MySQL integration code
│  ├─ lightwood_handler/     # Lightwood integration code
│  ├─ .../                   # Other handlers
├─ libs/
│  ├─ base_handler.py        # Base classes that each handler inherits from
│  ├─ storage_handler.py     # Storage classes for each handler
└─ utilities/                # Handler utilities directory
   ├─ install.py             # Installs all handler dependencies
Core methods¶
Apart from __init__(), there are seven core methods that the handler class has to implement: connect(), disconnect(), check_connection(), native_query(), query(), get_tables(), and get_columns(). It is recommended to check actual examples in the codebase to get an idea of what may go into each of these methods, as they can change a bit depending on the nature of the system being integrated. Respectively, their main purposes are:
- connect: perform any necessary steps to connect to the underlying system.
- disconnect: if needed, gracefully close connections established in connect.
- check_connection: evaluate whether the connection is alive and healthy. This method will be called frequently.
- native_query: parse any native statement string and act upon it (e.g. raw SQL-like commands).
- query: take a parsed SQL-like command (in the form of an abstract syntax tree) and execute it. An example would be a CREATE PREDICTOR statement for predictive handlers, which is not native syntax, as databases have no notion of a PREDICTOR entity.
- get_tables: list and return all available tables. Each handler should decide what a table means for the underlying system when interacting with it from the data layer. Typically, this means actual tables for data handlers and machine learning models for predictive handlers.
- get_columns: return the columns (with their respective data types) of each table registered in the handler.
As stated in the above section, authors can opt for adding private methods, new files, folders, or any combination of these to structure all the necessary work that will enable the methods above to work as intended.
Predictor-specific behavior¶
For predictive handlers, there is an additional method that is fundamental:
- join(): triggers a specific model to generate predictions given some input data. This behavior manifests in the SQL API when doing any type of JOIN operation between tables from a predictive handler and a data handler.
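The exact signature varies between handlers, so the sketch below is only a loose illustration: the argument names, the assumed AST layout, and the _predict() helper are all inventions for this example, not the actual PredictiveHandler API.

from mindsdb.integrations.libs.response import HandlerResponse, RESPONSE_TYPE

def join(self, stmt, data_handler, into=None):
    """Generate predictions for rows coming from the data-handler side of a JOIN."""
    # Fetch input rows via the data handler (assumed AST shape for the data side of the JOIN).
    data_response = data_handler.query(stmt.from_table.left)
    # Run the stored model over the input rows (model loading/inference is handler-specific).
    predictions = self._predict(data_response.data_frame)  # hypothetical helper
    return HandlerResponse(RESPONSE_TYPE.TABLE, data_frame=predictions)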
Parsing SQL¶
Whenever a string that contains SQL needs to be parsed, it is strongly recommended to use the mindsdb_sql package, which contains its own parser that fully supports the MindsDB SQL dialect and partially supports the standard SQL dialect. There is also a "render" feature to map other dialects into the supported ones.
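For example, a raw string can be turned into an abstract syntax tree roughly as follows (the query, table, and column names are just placeholders):

from mindsdb_sql import parse_sql

# Parse a raw query string into an ASTNode using the MindsDB dialect.
ast_query = parse_sql(
    "SELECT sqft, rental_price FROM house_sales WHERE beds = 2",
    dialect='mindsdb',
)
print(type(ast_query))  # a Select node produced by mindsdb_sql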
Storing internal state¶
Most handlers need to store internal metadata, ranging from a list of registered tables to implementation-specific details that will greatly vary from one case to another.
The recommendation for storing these bits of information is to use storage handlers (located in integrations.libs.storage_handler). We currently support two options: a SQLite or a Redis backend. In both cases, the premise is the same: a key-value store is set up so that interfaces are kept simple and clean, exposing only get() and set() methods for usage within the data and predictive handlers.
Note: for ML frameworks, store the path to your model weights inside the KV storage, saving the weights themselves in an optimized format preferred by the framework (e.g. .h5).
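A minimal sketch of how that key-value usage could look. The constructor arguments below are assumptions for illustration; only get() and set() are the documented surface, so check storage_handler.py for the real signatures:

from mindsdb.integrations.libs.storage_handler import SqliteStorageHandler

# Assumed constructor arguments; the actual backend classes may differ.
storage = SqliteStorageHandler(context='my_handler', config={'path': './my_handler.db'})

storage.set('registered_tables', ['house_sales'])   # persist handler metadata
tables = storage.get('registered_tables')

# For ML frameworks: store the *path* to the weights, not the weights themselves.
storage.set('my_model_weights', '/tmp/my_model.h5')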
Formatting output¶
When it comes to building the response of the public methods, the output should be wrapped by the HandlerResponse and HandlerStatusResponse classes (located in mindsdb.integrations.libs.response), which are used by the MindsDB executor to orchestrate and coordinate multiple handler instances in parallel.
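Roughly, that looks like the following (the constructor arguments shown here are assumptions based on the response module and may differ slightly in detail):

import pandas as pd
from mindsdb.integrations.libs.response import (
    HandlerResponse,
    HandlerStatusResponse,
    RESPONSE_TYPE,
)

# A tabular result is wrapped in a HandlerResponse...
df = pd.DataFrame({'TABLE_NAME': ['house_sales']})
response = HandlerResponse(RESPONSE_TYPE.TABLE, data_frame=df)

# ...while connection-related methods return a HandlerStatusResponse.
status = HandlerStatusResponse(success=True)
error_status = HandlerStatusResponse(success=False, error_message='connection refused')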
Other common methods¶
Under mindsdb.integrations.libs.utils, contributors can find various methods that may be useful while implementing new handlers, with a focus on predictive handlers.
How to write a handler¶
We wrap up this page by going through all of the above information step by step. Remember, if you are adding a new data integration, you need to extend the DatabaseHandler class. If you are adding a predictive framework integration, extend the PredictiveHandler class instead.
In both cases we need 7 core methods:
1. connect – Set up storage and connection
2. disconnect – Terminate the connection
3. check_connection – Health check
4. native_query – Act upon a raw SQL-like command
5. query – Act upon a parsed SQL-like command
6. get_tables – List all accessible entities within the handler
7. get_columns – Column info for a specific table entity
And additionally for predictive handlers:
8. join – Call other handlers to merge data with predictions
Below, you can find a list of entities required to create a database handler.
Step 1: Create the Handler class:¶
Each database handler should inherit from the DatabaseHandler class.
Set the class property name:¶
It will be used inside MindsDB as the name of the handler. For example, the name is used as the ENGINE in a CREATE DATABASE statement:
CREATE DATABASE integration_name
WITH ENGINE='postgres',
PARAMETERS={
    'host': '127.0.0.1',
    'user': 'root',
    'password': 'password'
};
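In other words, the string assigned to the name class property is exactly what users pass as ENGINE. A minimal sketch (class body shortened):

class PostgresHandler(DatabaseHandler):
    name = 'postgres'   # matches ENGINE='postgres' in the statement above
    ...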
Step 1.1: Implement __init__¶
This method initializes the handler. The connection_data argument will contain the PARAMETERS from the CREATE DATABASE statement, such as user, password, etc.
def __init__(self, name: str, connection_data: Optional[dict], **kwargs):
    """ Initialize the handler
    Args:
        name (str): name of particular handler instance
        connection_data (dict): parameters for connecting to the database
        **kwargs: arbitrary keyword arguments.
    """
Step 1.2: Implement connect¶
The connect method should set up the connection as:
def connect(self) -> HandlerStatusResponse:
    """ Set up any connections required by the handler
    Should return output of check_connection() method after attempting
    connection. Should switch self.is_connected.
    Returns:
        HandlerStatusResponse
    """
Step 1.3: Implement disconnect¶
The disconnect method should close the existing connection as:
def disconnect(self):
    """ Close any existing connections
    Should switch self.is_connected.
    """
    self.is_connected = False
    return self.is_connected
Step 1.4: Implement check_connection¶
The check_connection method is used to perform the health check for the connection:
def check_connection(self) -> HandlerStatusResponse:
    """ Check connection to the handler
    Returns:
        HandlerStatusResponse
    """
Step 1.5: Implement native_query¶
The native_query method is used to run a command in the native database language:
def native_query(self, query: Any) -> HandlerResponse:
    """Receive raw query and act upon it somehow.
    Args:
        query (Any): query in native format (str for sql databases,
            dict for mongo, etc)
    Returns:
        HandlerResponse
    """
Step 1.6: Implement query¶
The query method is used to run a parsed SQL command:
def query(self, query: ASTNode) -> HandlerResponse:
    """Receive query as AST (abstract syntax tree) and act upon it somehow.
    Args:
        query (ASTNode): sql query represented as AST. May be any kind
            of query: SELECT, INSERT, DELETE, etc
    Returns:
        HandlerResponse
    """
Step 1.7: Implement get_tables¶
The get_tables method is used to list tables:
def get_tables(self) -> HandlerResponse:
    """ Return list of entities
    Return list of entities that will be accessible as tables.
    Returns:
        HandlerResponse: should have same columns as information_schema.tables
            (https://dev.mysql.com/doc/refman/8.0/en/information-schema-tables-table.html)
            Column 'TABLE_NAME' is mandatory, others are optional.
    """
Step 1.8: Implement get_columns¶
The get_columns method returns a list of columns for a given table:
def get_columns(self, table_name: str) -> HandlerResponse:
    """ Returns a list of entity columns
    Args:
        table_name (str): name of one of tables returned by self.get_tables()
    Returns:
        HandlerResponse: should have same columns as information_schema.columns
            (https://dev.mysql.com/doc/refman/8.0/en/information-schema-columns-table.html)
            Column 'COLUMN_NAME' is mandatory, others are optional. It is highly
            recommended to also define 'DATA_TYPE': it should be one of the
            Python data types (str by default).
    """
Step 2: Create the connection_args dict:¶
The connection_args dictionary should contain all the arguments required to establish the connection.
Step 3: Create the connection_args_example dict:¶
The connection_args_example dictionary should contain an example of all the arguments required to establish the connection.
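A minimal sketch matching the PARAMETERS used in the CREATE DATABASE example above. Real handlers usually declare a type and a human-readable description for every argument; the plain-string 'type' values below are a simplification for illustration:

connection_args = {
    'host': {'type': 'str', 'description': 'Host name or IP address of the server'},
    'user': {'type': 'str', 'description': 'User name used to authenticate'},
    'password': {'type': 'str', 'description': 'Password used to authenticate'},
}

connection_args_example = {
    'host': '127.0.0.1',
    'user': 'root',
    'password': 'password',
}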
Step 4: Export all required variables:¶
In the __init__.py file, export:
- Handler - the handler class
- version - version of the handler
- name - name of the handler (same as Handler.name)
- type - type of the handler (DATA or ML handler)
- icon_path - path to the file with the database icon
- title - short description of the handler
- description - description of the handler
- connection_args - dict with connection args
- connection_args_example - example of connection args
- import_error - error message, in case it is not possible to import the Handler class
E.g.:
title = 'Trino'
version = 0.1
description = 'Integration for connection to TrinoDB'
name = 'trino'
type = HANDLER_TYPE.DATA
icon_path = 'icon.png'
__all__ = [
    'Handler', 'version', 'name', 'type', 'title',
    'description', 'connection_args_example', 'icon_path'
]
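The Handler class itself is usually imported near the top of the same file inside a try/except so that import_error can be reported when dependencies are missing. A sketch, with the module and class names being illustrative:

from mindsdb.integrations.libs.const import HANDLER_TYPE  # provides HANDLER_TYPE.DATA / HANDLER_TYPE.ML

try:
    from .trino_handler import TrinoHandler as Handler   # illustrative module and class names
    import_error = None
except Exception as e:
    # Keep the package importable even if optional dependencies are missing.
    Handler = None
    import_error = e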
For real examples, we encourage you to inspect the following handlers inside the MindsDB repository: