# Build an Application Handler
Source: https://docs.mindsdb.com/contribute/app-handlers
In this section, you'll find how to add new application integrations to MindsDB.
**Prerequisite**
You should have the latest version of the MindsDB repository installed locally. Follow [this guide](/contribute/install/) to learn how to install MindsDB for development.
## What are API Handlers?
Application handlers act as a bridge between MindsDB and any application that provides APIs. You use application handlers to create databases using the [`CREATE DATABASE`](/sql/create/databases/) statement. This lets you access data from any application that has a handler implemented within MindsDB.
**Database Handlers**
To learn more about handlers and how to implement a database handler, visit our [doc page here](/contribute/data-handlers/).
## Creating an Application Handler
You can create your own application handler within MindsDB by inheriting from the [`APIHandler`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L150) class.
By providing the implementation for some or all of the methods contained in the [`APIHandler`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L150) class, you can interact with the application APIs.
### Core Methods
Apart from the `__init__()` method, there are five core methods that must be implemented. We recommend checking actual examples in the codebase to get an idea of what goes into each of these methods, as they can change a bit depending on the nature of the system being integrated.
Let's review the purpose of each method.
| Method | Purpose |
| ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| [`_register_table()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L164) | It registers the data resource in memory. For example, if you are using the Twitter API, it registers the `tweets` resource from `/api/v2/tweets`. |
| [`connect()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L23) | It performs the necessary steps to connect/authenticate to the underlying system. |
| [`check_connection()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L39) | It evaluates if the connection is alive and healthy. |
| [`native_query()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L47) | It parses any *native* statement string and acts upon it (for example, raw syntax commands). |
| `call_application_api()` | It calls the application API and maps the data to a pandas DataFrame. This method handles the pagination and data mapping. |
Authors can opt for adding private methods, new files and folders, or any combination of these to structure all the necessary work that will enable the core methods to work as intended.
**Other Common Methods**
Under the [`mindsdb.integrations.utilities`](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/utilities) library, contributors can find various methods that may be useful while implementing new handlers.
### API Table
Once the data returned from the API call is registered using the [`_register_table()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L164) method, you can use it to map to the [`APITable`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L93) class.
The [`APITable`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L93) class provides CRUD methods.
| Method          | Purpose                                                                                                              |
| --------------- | -------------------------------------------------------------------------------------------------------------------- |
| `select()`      | It implements the mappings from `ast.Select` and calls the actual API through `call_application_api()`.               |
| `insert()`      | It implements the mappings from `ast.Insert` and calls the actual API through `call_application_api()`.               |
| `update()`      | It implements the mappings from `ast.Update` and calls the actual API through `call_application_api()`.               |
| `delete()`      | It implements the mappings from `ast.Delete` and calls the actual API through `call_application_api()`.               |
| `add()`         | It adds new rows to the data dictionary.                                                                               |
| `list()`        | It lists data based on given conditions by providing `FilterCondition` objects, limits, sorting, and target fields.    |
| `get_columns()` | It maps the data columns returned by the API.                                                                           |
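To make this concrete, below is a minimal, hedged sketch of an `APITable` subclass and how it plugs into a handler. The `tweets` table name, the column list, and the `get_tweets` method name are illustrative assumptions, not a real API contract, and the parser import path may vary between MindsDB versions.
```py theme={null}
import pandas as pd
from mindsdb_sql.parser import ast  # import path may vary across versions
from mindsdb.integrations.libs.api_handler import APITable


class TweetsTable(APITable):
    """Illustrative table that maps SELECT statements to an API resource."""

    def select(self, query: ast.Select) -> pd.DataFrame:
        # Translate the parsed SELECT (filters, limit, targets) into API
        # parameters, then delegate the actual call to the handler.
        params = {}  # extract conditions from `query` here
        return self.handler.call_application_api(
            method_name='get_tweets',  # hypothetical API method
            params=params,
        )

    def get_columns(self) -> list:
        # Columns exposed to the SQL layer; illustrative only.
        return ['id', 'created_at', 'text', 'author_id']
```
In the handler's `__init__()`, an instance of such a class is then registered with `self._register_table('tweets', TweetsTable(self))`.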
### Implementation
Each application handler should inherit from the [`APIHandler`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L150) class.
Here is a step-by-step guide:
* Implementing the [`__init__()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/api_handler.py#L155) method:
This method initializes the handler.
```py theme={null}
def __init__(self, name: str):
    """ constructor
    Args:
        name (str): the handler name
    """
    super().__init__(name)
    self._tables = {}
```
* Implementing the [`connect()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L23) method:
The `connect()` method sets up the connection.
```py theme={null}
def connect(self) -> HandlerStatusResponse:
    """ Set up any connections required by the handler
    Should return output of check_connection() method after attempting
    connection. Should switch self.is_connected.
    Returns:
        HandlerStatusResponse
    """
```
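For reference, a possible implementation for an API handler might look as follows. The `ExampleClient` SDK and the `api_key` argument are hypothetical stand-ins for a real application's SDK and connection parameters.
```py theme={null}
def connect(self) -> HandlerStatusResponse:
    """A sketch: authenticate once and reuse the client afterwards."""
    if self.is_connected is True:
        return self.check_connection()
    # `ExampleClient` is a stand-in for a real application SDK;
    # `self.connection_data` is assumed to hold the CREATE DATABASE parameters.
    self.connection = ExampleClient(api_key=self.connection_data.get('api_key'))
    self.is_connected = True
    return self.check_connection()
```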
* Implementing the [`check_connection()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L39) method:
The `check_connection()` method performs the health check for the connection.
```py theme={null}
def check_connection(self) -> HandlerStatusResponse:
    """ Check connection to the handler
    Returns:
        HandlerStatusResponse
    """
```
* Implementing the [`native_query()`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L47) method:
The `native_query()` method runs commands of the native API syntax.
```py theme={null}
def native_query(self, query: Any) -> HandlerResponse:
    """Receive raw query and act upon it somehow.
    Args:
        query (Any): query in native format (str for sql databases,
            api's json etc)
    Returns:
        HandlerResponse
    """
```
* Implementing the `call_application_api()` method:
This method makes the API calls. It is **not mandatory** to implement this method, but it can help make the code more reliable and readable.
```py theme={null}
def call_application_api(self, method_name: str = None, params: dict = None) -> DataFrame:
    """Call the application API and map the returned data to a DataFrame.
    Args:
        method_name (str): name of the API method to call
        params (dict): parameters to pass to the API call
    Returns:
        DataFrame
    """
```
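As a rough sketch of what this method typically hides, the loop below pages through an API and collects rows into a single DataFrame. The client contract (each call returning a batch plus a next-page token) is a hypothetical assumption, and `pandas` is assumed to be imported as `pd`.
```py theme={null}
def call_application_api(self, method_name: str = None, params: dict = None) -> DataFrame:
    """Sketch only: page through the API and collect rows into one DataFrame."""
    rows = []
    next_page = None
    while True:
        method = getattr(self.connection, method_name)
        # hypothetical contract: each call returns (batch, next_page_token)
        batch, next_page = method(page=next_page, **(params or {}))
        rows.extend(batch)
        if next_page is None:
            break
    return pd.DataFrame(rows)
```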
### Exporting the `connection_args` Dictionary
The `connection_args` dictionary contains all of the arguments used to establish the connection, along with their descriptions, types, labels, and whether they are required or not. Store it in the `connection_args.py` file inside the handler folder. Keeping it in a separate file makes it possible to hide sensitive information, such as passwords or API keys.
By default, when querying for `connection_data` from the `information_schema.databases` table, all sensitive information is hidden. To unhide it, use this command:
```sql theme={null}
set show_secrets=true;
```
Here is an example of the `connection_args.py` file from the [GitHub handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/github_handler) where the API key value is hidden by setting `"secret": True`.
```py theme={null}
from collections import OrderedDict
from mindsdb.integrations.libs.const import HANDLER_CONNECTION_ARG_TYPE as ARG_TYPE
connection_args = OrderedDict(
    repository={
        "type": ARG_TYPE.STR,
        "description": "GitHub repository name.",
        "required": True,
        "label": "Repository",
    },
    api_key={
        "type": ARG_TYPE.PWD,
        "description": "Optional GitHub API key to use for authentication.",
        "required": False,
        "label": "Api key",
        "secret": True
    },
    github_url={
        "type": ARG_TYPE.STR,
        "description": "Optional GitHub URL to connect to a GitHub Enterprise instance.",
        "required": False,
        "label": "Github url",
    },
)
connection_args_example = OrderedDict(
    repository="mindsdb/mindsdb",
    api_key="ghp_xxx",
    github_url="https://github.com/mindsdb/mindsdb"
)
```
### Exporting All Required Variables
The following should be exported in the `__init__.py` file of the handler:
* The `Handler` class.
* The `version` of the handler.
* The `name` of the handler.
* The `type` of the handler, either `DATA` handler or `ML` handler.
* The `icon_path` to the file with the database icon.
* The `title` of the handler or a short description.
* The `description` of the handler.
* The `connection_args` dictionary with the connection arguments.
* The `connection_args_example` dictionary with an example of the connection arguments.
* The `import_error` message that is used if the import of the `Handler` class fails.
A few of these variables are defined in another file called `__about__.py`. This file is imported into the `__init__.py` file.
Here is an example of the `__init__.py` file for the [GitHub handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/github_handler).
```py theme={null}
from mindsdb.integrations.libs.const import HANDLER_TYPE
from .__about__ import __version__ as version, __description__ as description
from .connection_args import connection_args, connection_args_example
try:
    from .github_handler import GithubHandler as Handler
    import_error = None
except Exception as e:
    Handler = None
    import_error = e
title = "GitHub"
name = "github"
type = HANDLER_TYPE.DATA
icon_path = "icon.svg"
__all__ = [
    "Handler", "version", "name", "type", "title", "description",
    "import_error", "icon_path", "connection_args_example", "connection_args",
]
```
The `__about__.py` file for the same [GitHub handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/github_handler) contains the following variables:
```py theme={null}
__title__ = "MindsDB GitHub handler"
__package_name__ = "mindsdb_github_handler"
__version__ = "0.0.1"
__description__ = "MindsDB handler for GitHub"
__author__ = "Artem Veremey"
__github__ = "https://github.com/mindsdb/mindsdb"
__pypi__ = "https://pypi.org/project/mindsdb/"
__license__ = "MIT"
__copyright__ = "Copyright 2023 - mindsdb"
```
## Check out our Application Handlers!
To see some integration handlers that are currently in use, we encourage you to check out the following handlers inside the MindsDB repository:
* [GitHub handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/github_handler)
* [Twitter handler](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/twitter_handler)
And here are [all the handlers available in the MindsDB repository](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers).
# Join our Community
Source: https://docs.mindsdb.com/contribute/community
If you have questions or you want to chat with the MindsDB core team or other community members, you can join our [Slack workspace](https://mindsdb.com/joincommunity)
## MindsDB Newsletter
To get updates on MindsDB's latest announcements, releases, and events, [sign up for our newsletter](https://mindsdb.com/newsletter/).
## Contact Us
If you are interested in MindsDB for large-scale projects, contact us by submitting [this form](https://mindsdb.com/contact-us/).
# How to Contribute to MindsDB
Source: https://docs.mindsdb.com/contribute/contribute
Thank you for your interest in contributing to MindsDB. MindsDB is free, open-source software and all types of contributions are welcome, whether they’re documentation changes, bug reports, bug fixes, or new source code changes.
In order to contribute to MindsDB:
* fork the MindsDB GitHub repository,
* [install MindsDB locally](/contribute/install),
* implement and test your changes,
* **push your changes to the `develop` branch**.
1. Fork the MindsDB repository from [MindsDB GitHub](https://github.com/mindsdb/mindsdb).
2. Clone the MindsDB repository locally from your fork and go inside the repository folder.
```bash theme={null}
cd /path/mindsdb-repo-folder-name
```
3. Fetch all other branches from the MindsDB repository with these commands:
```bash theme={null}
git remote add upstream https://github.com/mindsdb/mindsdb
git fetch upstream
```
4. Switch to the `develop` branch.
```bash theme={null}
git checkout develop
```
5. Create a new branch for your changes from the `develop` branch.
```bash theme={null}
git checkout -b new-branch-name
```
6. Make your changes on this branch.
7. Commit and push your changes to GitHub.
```bash theme={null}
git add *
git commit -m "commit message"
git push --set-upstream origin new-branch-name
```
8. Go to GitHub and create a PR to the `develop` branch of the MindsDB repository.
## MindsDB Release Process
The `main` branch of the [MindsDB repository](https://github.com/mindsdb/mindsdb) contains the latest stable version of MindsDB and represents the GA (General Availability) release. Learn more about [MindsDB release types here](/releases).
MindsDB follows the [Gitflow branching model](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) to manage development and releases as follows.
All code changes are first committed to the `develop` branch.
When a release is approaching, a short-lived `release` branch is created from the `develop` branch.
* This branch is used for final testing and validation.
* Pre-GA artifacts are built at this stage, including both the Python package and the Docker image, and shared for broader testing and feedback.
After successful testing and validation:
* The `release` branch is merged into the `main` branch, making it an official GA release.
* The final GA versions of the Python package and Docker image are released, while the pre-GA versions are removed.
## Contributor Testing Requirements
As a contributor, you are responsible for writing the code according to the [Python Coding Standards](/contribute/python-coding-standards) and thoroughly testing all features or fixes that you implement before they are merged into the `develop` branch.
### Feature Branch Testing
Before merging your changes, the following types of testing must be completed to validate your work in isolation:
* Unit Tests
Verify that individual components or functions behave as expected during development.
* Integration Tests
Ensure that your new code works correctly with existing functionality and doesn't introduce regressions.
### Post-Release Testing
After a release that includes your features or fixes is published, contributors are encouraged to:
* Test their changes in the released environment, and
* Report any issues or unexpected behavior that may arise.
# Build a Database Handler
Source: https://docs.mindsdb.com/contribute/data-handlers
In this section, you'll find how to add new integrations/databases to MindsDB.
**Prerequisite**
You should have the latest version of the MindsDB repository installed locally. Follow [this guide](/contribute/install/) to learn how to install MindsDB for development.
## What are Database Handlers?
Database handlers act as a bridge to any database. You use database handlers to create databases using [the CREATE DATABASE command](/sql/create/databases/). This lets you access data from any database that has a handler implemented within MindsDB.
## Creating a Database Handler
You can create your own database handler within MindsDB by inheriting from the [`DatabaseHandler`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L102) class.
By providing the implementation for some or all of the methods contained in the `DatabaseHandler` class, you can connect with the database of your choice.
### Core Methods
Apart from the `__init__()` method, there are seven core methods that must be implemented. We recommend checking actual examples in the codebase to get an idea of what goes into each of these methods, as they can change a bit depending on the nature of the system being integrated.
Let's review the purpose of each method.
| Method | Purpose |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `connect()` | It performs the necessary steps to connect to the underlying system. |
| `disconnect()` | It gracefully closes connections established in the `connect()` method. |
| `check_connection()` | It evaluates if the connection is alive and healthy. This method is called frequently. |
| `native_query()` | It parses any *native* statement string and acts upon it (for example, raw SQL commands). |
| `query()` | It takes a parsed SQL command in the form of an abstract syntax tree and executes it. |
| `get_tables()` | It lists and returns all the available tables. Each handler decides what a *table* means for the underlying system when interacting with it from the data layer. Typically, these are actual tables. |
| `get_columns()` | It returns columns of a table registered in the handler with the respective data type. |
Authors can opt for adding private methods, new files and folders, or any combination of these to structure all the necessary work that will enable the core methods to work as intended.
**Other Common Methods**
Under the `mindsdb.integrations.libs.utils` library, contributors can find various methods that may be useful while implementing new handlers.
Also, there are wrapper classes for the `DatabaseHandler` instances called [HandlerResponse](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/response.py#L7) and [HandlerStatusResponse](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/response.py#L32). You should use them to ensure proper output formatting.
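For example, a handler typically builds these responses as follows; this is a sketch based on the `response.py` module linked above.
```py theme={null}
import pandas as pd
from mindsdb.integrations.libs.response import (
    HandlerResponse as Response,
    HandlerStatusResponse as StatusResponse,
    RESPONSE_TYPE,
)

# report a healthy connection, e.g. from check_connection()
status = StatusResponse(success=True)

# wrap query results as a tabular response, e.g. from native_query() or query()
df = pd.DataFrame([{'id': 1, 'name': 'example'}])
data = Response(RESPONSE_TYPE.TABLE, data_frame=df)
```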
### Implementation
Each database handler should inherit from the [`DatabaseHandler`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/base.py#L102) class.
Here is a step-by-step guide:
* Setting the `name` class property:
MindsDB uses it internally as the name of the handler.
For example, the `CREATE DATABASE` statement uses the handler's name.
```sql theme={null}
CREATE DATABASE integration_name
WITH ENGINE = 'postgres',  -- here, the handler's name is `postgres`
PARAMETERS = {
    'host': '127.0.0.1',
    'user': 'root',
    'password': 'password'
};
```
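In code, the name is a plain class attribute. A sketch for a hypothetical Postgres-like handler:
```py theme={null}
class PostgresHandler(DatabaseHandler):
    """Handler for PostgreSQL statements."""
    name = 'postgres'
```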
* Implementing the `__init__()` method:
This method initializes the handler. The `connection_data` argument contains the `PARAMETERS` from the `CREATE DATABASE` statement, such as `user`, `password`, etc.
```py theme={null}
def __init__(self, name: str, connection_data: Optional[dict]):
    """ constructor
    Args:
        name (str): the handler name
        connection_data (Optional[dict]): the connection parameters
    """
```
* Implementing the `connect()` method:
The `connect()` method sets up the connection.
```py theme={null}
def connect(self) -> HandlerStatusResponse:
    """ Set up any connections required by the handler
    Should return the output of check_connection() method after attempting
    connection. Should switch self.is_connected.
    Returns:
        HandlerStatusResponse
    """
```
* Implementing the `disconnect()` method:
The `disconnect()` method closes the existing connection.
```py theme={null}
def disconnect(self):
    """ Close any existing connections
    Should switch self.is_connected.
    """
```
* Implementing the `check_connection()` method:
The `check_connection()` method performs the health check for the connection.
```py theme={null}
def check_connection(self) -> HandlerStatusResponse:
    """ Check connection to the handler
    Returns:
        HandlerStatusResponse
    """
```
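A common implementation pattern (a sketch, not the only valid one) attempts to connect and converts any failure into an error message on the status response:
```py theme={null}
def check_connection(self) -> HandlerStatusResponse:
    """Sketch: try to connect and report success or the error message."""
    response = HandlerStatusResponse(False)
    try:
        self.connect()
        response.success = True
    except Exception as e:
        response.error_message = str(e)
    return response
```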
* Implementing the `native_query()` method:
The `native_query()` method runs commands of the native database language.
```py theme={null}
def native_query(self, query: Any) -> HandlerResponse:
    """Receive raw query and act upon it somehow.
    Args:
        query (Any): query in native format (str for sql databases,
            etc)
    Returns:
        HandlerResponse
    """
```
* Implementing the `query()` method:
The query method runs parsed SQL commands.
```py theme={null}
def query(self, query: ASTNode) -> HandlerResponse:
    """Receive query as AST (abstract syntax tree) and act upon it somehow.
    Args:
        query (ASTNode): sql query represented as AST. May be any kind
            of query: SELECT, INSERT, DELETE, etc
    Returns:
        HandlerResponse
    """
```
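Many handlers implement `query()` by rendering the AST back into the SQL dialect of the underlying database and reusing `native_query()`. Here is a sketch of that pattern; the renderer import path may differ between MindsDB versions.
```py theme={null}
from mindsdb_sql.render.sqlalchemy_render import SqlalchemyRender  # path may vary


def query(self, query: ASTNode) -> HandlerResponse:
    """Sketch: render the AST to a SQL string, then delegate to native_query()."""
    renderer = SqlalchemyRender('mysql')  # use your database's dialect
    query_str = renderer.get_string(query, with_failback=True)
    return self.native_query(query_str)
```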
* Implementing the `get_tables()` method:
The `get_tables()` method lists all the available tables.
```py theme={null}
def get_tables(self) -> HandlerResponse:
    """ Return list of entities
    Return a list of entities that will be accessible as tables.
    Returns:
        HandlerResponse: should have the same columns as information_schema.tables
            (https://dev.mysql.com/doc/refman/8.0/en/information-schema-tables-table.html)
            Column 'TABLE_NAME' is mandatory; others are optional.
    """
```
* Implementing the `get_columns()` method:
The `get_columns()` method lists all columns of a specified table.
```py theme={null}
def get_columns(self, table_name: str) -> HandlerResponse:
    """ Returns a list of entity columns
    Args:
        table_name (str): name of one of tables returned by self.get_tables()
    Returns:
        HandlerResponse: should have the same columns as information_schema.columns
            (https://dev.mysql.com/doc/refman/8.0/en/information-schema-columns-table.html)
            Column 'COLUMN_NAME' is mandatory; others are optional. It is highly
            recommended to also define 'DATA_TYPE': it should be one of the
            Python data types (str by default).
    """
```
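For SQL-based systems, this can often be delegated to a native metadata statement. A sketch for a MySQL-like database:
```py theme={null}
def get_columns(self, table_name: str) -> HandlerResponse:
    """Sketch: let the database describe the table for us."""
    return self.native_query(f'DESCRIBE {table_name};')
```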
### Exporting the `connection_args` Dictionary
The `connection_args` dictionary contains all of the arguments used to establish the connection, along with their descriptions, types, labels, and whether they are required or not. Store it in the `connection_args.py` file inside the handler folder. Keeping it in a separate file makes it possible to hide sensitive information, such as passwords or API keys.
By default, when querying for `connection_data` from the `information_schema.databases` table, all sensitive information is hidden. To unhide it, use this command:
```sql theme={null}
set show_secrets=true;
```
Here is an example of the `connection_args.py` file from the [MySQL handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/mysql_handler) where the password value is hidden by setting `'secret': True`.
```py theme={null}
from collections import OrderedDict
from mindsdb.integrations.libs.const import HANDLER_CONNECTION_ARG_TYPE as ARG_TYPE
connection_args = OrderedDict(
    url={
        'type': ARG_TYPE.STR,
        'description': 'The URI-Like connection string to the MySQL server. If provided, it will override the other connection arguments.',
        'required': False,
        'label': 'URL'
    },
    user={
        'type': ARG_TYPE.STR,
        'description': 'The user name used to authenticate with the MySQL server.',
        'required': True,
        'label': 'User'
    },
    password={
        'type': ARG_TYPE.PWD,
        'description': 'The password to authenticate the user with the MySQL server.',
        'required': True,
        'label': 'Password',
        'secret': True
    },
    database={
        'type': ARG_TYPE.STR,
        'description': 'The database name to use when connecting with the MySQL server.',
        'required': True,
        'label': 'Database'
    },
    host={
        'type': ARG_TYPE.STR,
        'description': 'The host name or IP address of the MySQL server. NOTE: use \'127.0.0.1\' instead of \'localhost\' to connect to local server.',
        'required': True,
        'label': 'Host'
    },
    port={
        'type': ARG_TYPE.INT,
        'description': 'The TCP/IP port of the MySQL server. Must be an integer.',
        'required': True,
        'label': 'Port'
    },
    ssl={
        'type': ARG_TYPE.BOOL,
        'description': 'Set it to True to enable ssl.',
        'required': False,
        'label': 'ssl'
    },
    ssl_ca={
        'type': ARG_TYPE.PATH,
        'description': 'Path or URL of the Certificate Authority (CA) certificate file',
        'required': False,
        'label': 'ssl_ca'
    },
    ssl_cert={
        'type': ARG_TYPE.PATH,
        'description': 'Path name or URL of the server public key certificate file',
        'required': False,
        'label': 'ssl_cert'
    },
    ssl_key={
        'type': ARG_TYPE.PATH,
        'description': 'The path name or URL of the server private key file',
        'required': False,
        'label': 'ssl_key',
    }
)
connection_args_example = OrderedDict(
    host='127.0.0.1',
    port=3306,
    user='root',
    password='password',
    database='database'
)
```
### Exporting All Required Variables
The following should be exported in the `__init__.py` file of the handler:
* The `Handler` class.
* The `version` of the handler.
* The `name` of the handler.
* The `type` of the handler, either `DATA` handler or `ML` handler.
* The `icon_path` to the file with the database icon.
* The `title` of the handler or a short description.
* The `description` of the handler.
* The `connection_args` dictionary with the connection arguments.
* The `connection_args_example` dictionary with an example of the connection arguments.
* The `import_error` message that is used if the import of the `Handler` class fails.
A few of these variables are defined in another file called `__about__.py`. This file is imported into the `__init__.py` file.
Here is an example of the `__init__.py` file for the [MySQL handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/mysql_handler).
```py theme={null}
from mindsdb.integrations.libs.const import HANDLER_TYPE
from .__about__ import __version__ as version, __description__ as description
from .connection_args import connection_args, connection_args_example
try:
    from .mysql_handler import MySQLHandler as Handler
    import_error = None
except Exception as e:
    Handler = None
    import_error = e
title = 'MySQL'
name = 'mysql'
type = HANDLER_TYPE.DATA
icon_path = 'icon.svg'
__all__ = [
    'Handler', 'version', 'name', 'type', 'title', 'description',
    'connection_args', 'connection_args_example', 'import_error', 'icon_path'
]
```
The `__about__.py` file for the same [MySQL handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/mysql_handler) contains the following variables:
```py theme={null}
__title__ = 'MindsDB MySQL handler'
__package_name__ = 'mindsdb_mysql_handler'
__version__ = '0.0.1'
__description__ = "MindsDB handler for MySQL"
__author__ = 'MindsDB Inc'
__github__ = 'https://github.com/mindsdb/mindsdb'
__pypi__ = 'https://pypi.org/project/mindsdb/'
__license__ = 'MIT'
__copyright__ = 'Copyright 2022- mindsdb'
```
### Exporting Requirements
If the integration requires other packages to function correctly, list them in a `requirements.txt` file inside the handler folder. Here is an example:
```
mysql-connector-python==9.1.0
...
```
## Check out our Database Handlers!
To see some integration handlers that are currently in use, we encourage you to check out the following handlers inside the MindsDB repository:
* [MySQL](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/mysql_handler)
* [Postgres](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/postgres_handler)
And here are [all the handlers available in the MindsDB repository](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers).
# How to Write MindsDB Documentation
Source: https://docs.mindsdb.com/contribute/docs
This section gets you started on how to contribute to the MindsDB documentation.
MindsDB's documentation is built with Mintlify. If you want to contribute to our docs, follow the steps below to set up the environment locally.
## Running the Docs Locally
**Prerequisite**
You should have installed Git (version 2.30.1 or higher) and Node.js (version 18.10.0 or higher).
Step 1. Clone the MindsDB Git repository:
```console theme={null}
git clone https://github.com/mindsdb/mindsdb.git
```
Step 2. Install Mintlify on your OS:
```console theme={null}
npm i mintlify -g
```
Step 3. Go to the `docs` folder inside the cloned MindsDB Git repository and start Mintlify there:
```console theme={null}
mintlify dev
```
The documentation website is now available at `http://localhost:3000`.
**Getting an Error?**
If you use the Windows operating system, you may get an error saying `no such file or directory: C:/Users/Username/.mintlify/mint/client`. Here are the steps to troubleshoot it:
* Go to the `C:/Users/Username/.mintlify/` directory.
* Remove the `mint` folder.
* Open the Git Bash in this location and run `git clone https://github.com/mintlify/mint.git`.
* Repeat step 3.
## MindsDB Repository Structure
Here is the structure of the MindsDB docs repository:
```
docs                          # All documentation source files
├─ assets/                    # Images and icons used throughout the docs
├─ folders_with_mdx_files/    # All remaining folders that store the .mdx files
├─ mdx_files                  # Some of the .mdx files are stored in the docs directory
└─ mintlify.json              # This JSON file stores navigation and page setup
```
# MindsDB Installation for Development
Source: https://docs.mindsdb.com/contribute/install
If you want to contribute to the development of MindsDB, you need to install from source.
If you do not want to contribute to the development of MindsDB but simply install and use it, then [install MindsDB via Docker](/setup/self-hosted/docker).
## Install MindsDB for Development
Here are the steps to install MindsDB from source.
Before installing MindsDB from source, ensure that you use one of the following Python versions: `3.10.x`, `3.11.x`, `3.12.x`, `3.13.x`.
1. Fork the [MindsDB repository from GitHub](https://github.com/mindsdb/mindsdb).
2. Clone the fork locally:
```bash theme={null}
git clone https://github.com/<your-username>/mindsdb.git
```
3. Create a virtual environment:
```bash theme={null}
python -m venv mindsdb-venv
```
4. Activate the virtual environment:
Windows:
```bash theme={null}
.\mindsdb-venv\Scripts\activate
```
macOS/Linux:
```bash theme={null}
source mindsdb-venv/bin/activate
```
5. Install MindsDB along with its development dependencies:
```bash theme={null}
cd mindsdb
pip install -e .
```
6. Start MindsDB:
```bash theme={null}
python -m mindsdb
```
By default, MindsDB starts the `http` and `mysql` APIs. You can define which APIs to start using the `--api` flag as below.
```bash theme={null}
python -m mindsdb --api http,mysql
```
If you want to start MindsDB without the graphical user interface (GUI), use the `--no_studio` flag as below.
```bash theme={null}
python -m mindsdb --no_studio
```
Alternatively, you can use a makefile to install dependencies and start MindsDB:
```bash theme={null}
make install_mindsdb
make run_mindsdb
```
Now you should see the following message in the console:
```
...
mindsdb.api.http.initialize: - GUI available at http://127.0.0.1:47334/
mindsdb.api.mysql.mysql_proxy.mysql_proxy: Starting MindsDB Mysql proxy server on tcp://127.0.0.1:47335
mindsdb.api.mysql.mysql_proxy.mysql_proxy: Waiting for incoming connections...
mindsdb: mysql API: started on 47335
mindsdb: http API: started on 47334
```
You can access the MindsDB Editor at `localhost:47334`.
## Install Dependencies
The core installation includes everything needed to run the Federated Query Engine and essential database capabilities.
The dependencies for many of the data or ML integrations are not installed by default.
If you need additional features — such as Agents, the Knowledge Base, MCP or A2A protocol — you can enable them through extras, rather than installing everything by default.
### Install Features via Extras
Optional integrations and features can be installed as needed using extras.
| Feature | Install command |
| ------------------------- | --------------------------------------- |
| Agents / LLMs | `pip install ".[agents]"` |
| Knowledge Base | `pip install ".[kb]"` |
| Multiple features at once | `pip install ".[agents,knowledgebase]"` |
| Integrations              | `pip install ".[integration_name]"`     |
You can find all available [handlers here](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers).
## What's Next?
Now that you installed and started MindsDB locally, go ahead and find out how to create and train a model using the [`CREATE MODEL`](/sql/create/model) statement.
Check out the [Use Cases](/use-cases/overview) section to follow tutorials that cover Large Language Models, Chatbots, Time Series, Classification, and Regression models, Semantic Search, and more.
# How to Write Handlers README
Source: https://docs.mindsdb.com/contribute/integrations-readme
The README file is a crucial document that guides users in understanding, using, and contributing to a MindsDB integration. It serves as the first point of contact for anyone interacting with the integration, hence the need for it to be comprehensive, clear, and user-friendly.
## Sections to Include
### Table of Contents
A well-organized table of contents is provided for easy navigation through the document, allowing users to quickly find the information they need.
### About
Explain what specific database, application, or framework the integration targets. Provide a concise overview of the integration’s purpose, highlighting its key features and benefits.
### Handler Implementation
* Setup
* Detail the installation and initial setup process, including any prerequisites.
* Connection
* Describe the steps to establish and manage connections, with clear instructions.
* Include SQL examples for better clarity.
* Required Parameters
* List and describe all essential parameters necessary for the operation of the integration.
* Optional Parameters
* Detail additional, non-mandatory parameters that can enhance the integration's functionality.
### Example Usage
* Practical Examples: Offer detailed examples showing how to use the integration effectively.
* Coverage: Ensure examples encompass a range of functionalities, from basic to advanced operations.
* SQL Examples: Include SQL statements and their expected outputs to illustrate use cases.
### Supported Tables/Tasks
Clearly enumerate the tables, tasks, or operations that the integration supports, possibly in a list or table format.
### Limitations
Transparently outline any limitations or constraints known in the integration.
### TODO
* Future Developments: Highlight areas for future enhancements or improvements.
* GitHub Issues: Link to open GitHub issues tagged as enhancements, indicating ongoing or planned feature additions.
# Python Coding Standards
Source: https://docs.mindsdb.com/contribute/python-coding-standards
# PEP8
Strict adherence to [PEP8](https://peps.python.org/pep-0008/) standards is mandatory for all code contributions to MindsDB.
**Why PEP8?**
[PEP8](https://peps.python.org/pep-0008/) provides an extensive set of guidelines for Python code styling, promoting readability and a uniform coding standard. By aligning with PEP8, we ensure our codebase remains clean, maintainable, and easily understandable for Python developers at any level.
#### Automated Checks
* Upon submission of a Pull Request (PR), an automated process checks the code for PEP8 compliance.
* Non-compliance with PEP8 can result in the failure of the build process. Adherence to PEP8 is not just a best practice but a necessity to ensure smooth integration of new code into the codebase.
* If a PR fails due to PEP8 violations, the contributor is required to review the automated feedback provided.
* Pay special attention to common PEP8 compliance issues such as proper indentation, appropriate line length, correct use of whitespace, and following the recommended naming conventions.
* Contributors are encouraged to iteratively improve their code based on the feedback until full compliance is achieved.
# Logging
Always instantiate a logger using the MindsDB utilities module. This practice ensures a uniform approach to logging across different parts of the application.
Example of Logger Creation:
```python theme={null}
from mindsdb.utilities import log
logger = log.getLogger(__name__)
```
### Setting Logging
* Environment Variable: Use `MINDSDB_LOG_LEVEL` to set the desired logging level. This approach allows for dynamic adjustment of log verbosity without needing code modifications.
* Log Levels: Available levels include:
* `DEBUG`: Detailed information, typically of interest only when diagnosing problems.
* `INFO`: Confirmation that things are working as expected.
* `WARNING`: An indication that something unexpected happened, or indicative of some problem in the near future.
* `ERROR`: Due to a more serious problem, the software has not been able to perform some function.
* `CRITICAL`: A serious error, indicating that the program itself may be unable to continue running.
* Avoid `print()` statements. They lack the flexibility and control offered by logging mechanisms, particularly in terms of output redirection and level-based filtering (see the sketch below).
* The logger name should be `__name__` to automatically reflect the module's name. This convention is crucial for pinpointing the origin of log messages.
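Here is a short usage sketch that ties these points together:
```python theme={null}
from mindsdb.utilities import log

logger = log.getLogger(__name__)  # named after the module

try:
    result = 1 / 0
except ZeroDivisionError as e:
    # emitted when MINDSDB_LOG_LEVEL is ERROR or more verbose
    logger.error("Computation failed: %s", e)

logger.info("Handler initialized")  # suppressed when the level is WARNING or higher
```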
# Docstrings
Docstrings are essential for documenting Python code. They provide a clear explanation of the functionality of classes, functions, modules, etc., making the codebase easier to understand and maintain.
A well-written docstring should include:
* Function's Purpose: Describe what the function/class/module does.
* Parameters: List and explain the parameters it takes.
* Return Value: Describe what the function returns.
* Exceptions: Mention any exceptions that the function might raise.
```python theme={null}
def example_function(param1, param2):
    """This is an example docstring.
    Args:
        param1 (type): Description of param1.
        param2 (type): Description of param2.
    Returns:
        type: Description of the return value.
    Raises:
        ExceptionType: Description of the exception.
    """
    # function body...
```
# Exception Handling
Implementing robust error handling strategies is essential to maintain the stability and reliability of MindsDB. Proper exception management ensures that the application behaves predictably under error conditions, providing clear feedback and preventing unexpected crashes or behavior.
* Utilizing MindsDB Exceptions: To ensure uniformity and clarity in error reporting, always use predefined exceptions from the MindsDB exceptions library.
* Adding New Exceptions: If during development you encounter a scenario where none of the existing exceptions adequately represent the error, consider defining a new, specific exception.
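As an illustration only (check the existing MindsDB exceptions first), defining and raising a new, specific exception can look like this; the exception name and scenario are hypothetical.
```python theme={null}
class ConnectionFailedError(Exception):
    """Hypothetical exception for when a handler cannot authenticate."""


def connect_or_raise(authenticated: bool) -> None:
    if not authenticated:
        raise ConnectionFailedError('Could not authenticate with the API')


try:
    connect_or_raise(False)
except ConnectionFailedError as e:
    # the caller gets a clear, predictable failure mode
    print(f'Handled predictably: {e}')
```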
# Data Catalog for Integrations
Source: https://docs.mindsdb.com/data_catalog/integrations/overview
As of now, the Data Catalog is available for the following integrations:
* [Snowflake](/integrations/data-integrations/snowflake)
* [Salesforce](/integrations/app-integrations/salesforce)
* [BigQuery](/integrations/data-integrations/google-bigquery)
* [MS SQL Server](/integrations/data-integrations/microsoft-sql-server)
* [MySQL](/integrations/app-integrations/mysql)
* [Oracle](/integrations/data-integrations/oracle)
* [PostgreSQL](/integrations/data-integrations/postgresql)
### Enabling the Data Catalog
To enable the Data Catalog feature in MindsDB, update your `config.json` file by setting the `data_catalog` flag to `true`:
```json theme={null}
{
    "data_catalog": {
        "enabled": true
    }
}
```
Follow this doc page to learn how to [start MindsDB with custom configuration](/setup/custom-config).
Note that the data catalog is generated for a data source only after this data source is connected to an agent.
Here is an example:
```sql theme={null}
CREATE DATABASE snowflake_data
WITH
    ENGINE = 'snowflake',
    PARAMETERS = {
        "account": "abc123-xyz987",
        "user": "username",
        "password": "password",
        "database": "database_name",
        "schema": "schema_name",
        "warehouse": "warehouse_name"
    };

CREATE AGENT my_agent
USING
    include_tables = ['snowflake_data.table_name', ...];
```
Now you can [query the data catalog](/data_catalog/integrations/query) generated for the `snowflake_data` integration.
### How It Works
When you create an [agent](/mindsdb_sql/agents/agent) in MindsDB that connects to one of the supported integrations, the Data Catalog automatically:
1. Inspects the data source.
2. Extracts metadata for all accessible tables and columns.
3. Stores this information in a dedicated catalog schema (`DATA_CATALOG`).
4. Makes this metadata available to agents and users via both SQL queries and internal reasoning.
**Current Limitations**
This feature is still evolving and has some known limitations:
* **One-Time Snapshot**: Metadata is generated only once—at the time the agent is created. If the data schema changes (e.g., new columns, renamed tables), the Data Catalog will not automatically update. A refresh mechanism is planned in a future release.
* **No Manual Feedback**: If any metadata appears to be incorrect (e.g., wrong row counts or data types), there is currently no way for users to flag or correct it. A feedback system will be introduced soon.
# Querying Data Catalog for Integrations
Source: https://docs.mindsdb.com/data_catalog/integrations/query
MindsDB exposes collected metadata from connected data sources via virtual tables in the `INFORMATION_SCHEMA` schema. These views allow users to inspect and query the Data Catalog using familiar SQL syntax.
## Available Data Catalog Tables
To filter results for a specific data integration, use `WHERE TABLE_SCHEMA = 'integration_name'`.
### `INFORMATION_SCHEMA.META_TABLES`
Provides high-level metadata about available tables in a given integration.
Here are the available columns:
* `TABLE_NAME` (string): Name of the table.
* `TABLE_TYPE` (string, optional): Type of table (e.g., `BASE TABLE`, `VIEW`).
* `TABLE_SCHEMA` (string, optional): Schema name or integration name.
* `TABLE_DESCRIPTION` (string, optional): Description of the table.
* `ROW_COUNT` (integer, optional): Estimated row count.
Here is how to query it for a specific data integration:
```sql theme={null}
SELECT * FROM INFORMATION_SCHEMA.META_TABLES
WHERE TABLE_SCHEMA = 'integration_name';
```
### `INFORMATION_SCHEMA.META_COLUMNS`
Returns detailed column-level metadata for all tables in the specified integration.
Here are the available columns:
* `TABLE_NAME` (string): Name of the table.
* `COLUMN_NAME` (string): Column name.
* `DATA_TYPE` (string): Data type of the column.
* `COLUMN_DESCRIPTION` (string, optional): Description of the column.
* `IS_NULLABLE` (boolean, optional): Whether nulls are allowed.
* `COLUMN_DEFAULT` (string, optional): Default value, if any.
Here is how to query it for a specific data integration:
```sql theme={null}
SELECT * FROM INFORMATION_SCHEMA.META_COLUMNS
WHERE TABLE_SCHEMA = 'integration_name';
```
### `INFORMATION_SCHEMA.META_COLUMN_STATISTICS`
Provides statistical insights about each column’s values and distribution.
Here are the available columns:
* `TABLE_NAME` (string): Name of the table.
* `COLUMN_NAME` (string): Column name.
* `MOST_COMMON_VALUES` (array of strings, optional)
* `MOST_COMMON_FREQUENCIES` (array of integers, optional)
* `NULL_PERCENTAGE` (float, optional)
* `MINIMUM_VALUE` (string, optional)
* `MAXIMUM_VALUE` (string, optional)
* `DISTINCT_VALUES_COUNT` (integer, optional)
Here is how to query it for a specific data integration:
```sql theme={null}
SELECT * FROM INFORMATION_SCHEMA.META_COLUMN_STATISTICS
WHERE TABLE_SCHEMA = 'integration_name';
```
### `INFORMATION_SCHEMA.META_KEY_COLUMN_USAGE`
Describes the primary key columns for tables in the integration.
Here are the available columns:
* `TABLE_NAME` (string): Name of the table.
* `COLUMN_NAME` (string): Column name.
* `ORDINAL_POSITION` (integer, optional)
* `CONSTRAINT_NAME` (string, optional)
Here is how to query it for a specific data integration:
```sql theme={null}
SELECT * FROM INFORMATION_SCHEMA.META_KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'integration_name';
```
### `INFORMATION_SCHEMA.META_TABLE_CONSTRAINTS`
Lists table-level constraints, including primary and foreign keys.
Here are the available columns:
* `TABLE_NAME` (string): Name of the table.
* `CONSTRAINT_NAME` (string, optional)
* `CONSTRAINT_TYPE` (string): e.g., PRIMARY KEY, FOREIGN KEY
Here is how to query it for a specific data integration:
```sql theme={null}
SELECT * FROM INFORMATION_SCHEMA.META_TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = 'integration_name';
```
### `INFORMATION_SCHEMA.META_HANDLER_INFO`
Returns a textual summary of the integration implementation, including supported SQL features and capabilities.
Here are the available columns:
* `HANDLER_INFO` (string): Description.
Here is how to query it for a specific data integration:
```sql theme={null}
SELECT * FROM INFORMATION_SCHEMA.META_HANDLER_INFO
WHERE TABLE_SCHEMA = 'integration_name';
```
# Data Catalog
Source: https://docs.mindsdb.com/data_catalog/overview
The **Data Catalog** in MindsDB plays a key role in enhancing the context available to [agents](/mindsdb_sql/agents/agent) when querying data sources. By automatically indexing and storing metadata, such as table names, column types, constraints, and statistics, the catalog empowers agents to understand the structure and semantics of the data, leading to more accurate and efficient query generation.
### Why It Matters
When agents interpret natural language questions or generate SQL queries, access to metadata improves their ability to:
* Understand relationships between tables and fields.
* Infer joins, filters, and aggregations more intelligently.
* Avoid syntax errors due to missing or unknown schema information.
This metadata layer provides agents with the necessary context to avoid making uninformed queries.
# Benefits of MindsDB
Source: https://docs.mindsdb.com/faqs/benefits
MindsDB facilitates development of AI-powered apps by bridging the gap between data and AI. Thanks to its numerous integrations with data sources (including databases, vector stores, and applications) and AI frameworks (including LLMs and AutoML), you can mix and match between the available integrations to create custom AI workflows with MindsDB.
Here are some prominent benefits of using MindsDB:
1. **Unified AI Deployment and Management**
MindsDB integrates directly with the database, warehouse, or stream. This eliminates the need to build and maintain custom, complex data pipelines or separate systems for AI/ML deployment.
2. **Automated AI Workflows**
MindsDB automates the entire AI workflow to execute on time-based or event-based triggers. No need to build custom automation logic to get predictions, move data, or (re)train models.
3. **Turn every developer into an AI Engineer**
MindsDB enables developers to leverage their existing SQL skills, accelerating the adoption of AI across teams and departments, turning every developer into an AI Engineer.
4. **Enhanced Scalability and Performance**
Whether in your private cloud or using MindsDB’s managed service, MindsDB enables you to handle large-scale AI/ML workloads efficiently. MindsDB can scale to meet the demands of your use case, ensuring optimal performance and responsiveness.
# Disposable Email Domains and OpenAI
Source: https://docs.mindsdb.com/faqs/disposable-email-doman-and-openai
Disposable email domains can't make use of OpenAI, so users with such domains will encounter errors when using MindsDB's integration with OpenAI.
To check if your email domain is disposable, you can verify it on [QuickEmailVerification](https://quickemailverification.com/tools/disposable-email-address-detector) or [VerifyEmail.IO](https://verifymail.io/domain/ipnuc.com).
# How to Interact with MindsDB from PHP
Source: https://docs.mindsdb.com/faqs/mindsdb-with-php
To get started with MindsDB, you need to install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
There are a few ways you can interact with MindsDB from the PHP code.
1. You can connect to MindsDB using the [PHP Data Objects](https://www.php.net/manual/en/book.pdo.php) and execute statements directly on MindsDB with the `PDO::query` method.
2. You can use the [REST API](/rest/overview) endpoints to interact with MindsDB directly from PHP.
# Missing required CPU features
Source: https://docs.mindsdb.com/faqs/missing-required-cpu-features
Depending on the operating system and its setup, you may encounter this runtime warning when starting MindsDB:
```bash theme={null}
RuntimeWarning: Missing required CPU features.
The following required CPU features were not detected:
avx2, fma, bmi1, bmi2, lzcnt
```
The solution is to install the `polars-lts-cpu` package in the environment where MindsDB runs.
If you are on an Apple ARM machine (e.g. M1), this warning is likely due to running Python under Rosetta. To troubleshoot it, install a native version of Python that does not run under Rosetta x86-64 emulation.
# How to Persist Predictions
Source: https://docs.mindsdb.com/faqs/persist-predictions
MindsDB provides a range of options for persisting predictions and forecasts. Let's explore all possibilities to save the prediction results.
**Reasons to Save Predictions**
Every time you want to get predictions, you need to query the model, usually joined with an input data table, like this:
```sql theme={null}
SELECT input.product_name, input.review, output.sentiment
FROM mysql_demo_db.amazon_reviews AS input
JOIN sentiment_classifier AS output;
```
However, querying the model returns a result set that is not persisted by default. For future use, it is recommended to persist the result set instead of querying the model again with the same data.
MindsDB enables you to save predictions into a view or a table, or download them as a CSV file.
## Creating a View
After creating the model, you can save the prediction results into a view.
```sql theme={null}
CREATE VIEW review_sentiment (
-- querying for predictions
SELECT input.product_name, input.review, output.sentiment
FROM mysql_demo_db.amazon_reviews AS input
JOIN sentiment_classifier AS output
LIMIT 10
);
```
Now the `review_sentiment` view stores sentiment predictions made for all customer reviews.
Here is a [comprehensive tutorial](/nlp/sentiment-analysis-inside-mysql-with-openai) on how to predict sentiment of customer reviews using OpenAI.
## Creating a Table
After creating the model, you can save predictions into a database table.
```sql theme={null}
CREATE TABLE local_postgres.question_answers (
-- querying for predictions
SELECT input.article_title, input.question, output.answer
FROM mysql_demo_db.questions AS input
JOIN question_answering_model AS output
LIMIT 10
);
```
Here, the `local_postgres` database is a PostgreSQL database connected to MindsDB with a user that has write access.
Now the `question_answers` table stores all prediction results.
Here is a [comprehensive tutorial](/nlp/question-answering-inside-mysql-with-openai) on how to answer questions using OpenAI.
## Downloading a CSV File
After executing the `SELECT` statement, you can download the output as a CSV file.
Click the `Export` button and choose the `CSV` option.
# Bring Your Own Model
Source: https://docs.mindsdb.com/integrations/ai-engines/byom
The Bring Your Own Model (BYOM) feature lets you upload your own models in the form of Python code and use them within MindsDB.
## How It Works
You can upload your custom model via the MindsDB editor by clicking `Add` and `Upload custom model`.
A form opens where you provide the files and an engine name to bring your model to MindsDB.
Let's briefly go over the files that need to be uploaded:
* The Python file stores an implementation of your model. It should contain the class with the implementation for the `train` and `predict` methods. Here is the sample format:
```py theme={null}
class CustomPredictor():

    def train(self, df, target_col, args=None):
        return ''

    def predict(self, df):
        return df
```
```py theme={null}
import pandas as pd
from sklearn.cross_decomposition import PLSRegression
from sklearn import preprocessing


class CustomPredictor():

    def train(self, df, target_col, args=None):
        self.target_col = target_col
        y = df[self.target_col]
        x = df.drop(columns=self.target_col)
        x_cols = list(x.columns)

        # standardize the inputs and the target
        x_scaler = preprocessing.StandardScaler().fit(x)
        y_scaler = preprocessing.StandardScaler().fit(y.values.reshape(-1, 1))
        xs = x_scaler.transform(x)
        ys = y_scaler.transform(y.values.reshape(-1, 1))

        # fit a one-component partial least squares regression
        pls = PLSRegression(n_components=1)
        pls.fit(xs, ys)
        T = pls.x_scores_
        P = pls.x_loadings_

        self.x_cols = x_cols
        self.x_scaler = x_scaler
        self.P = P

        def calc_limit(df):
            # per-class control limits: mean +/- 3 std, clipped to [min, max]
            res = None
            for column in df.columns:
                if column == self.target_col:
                    continue
                tbl = df.groupby(self.target_col).agg({column: ['mean', 'min', 'max', 'std']})
                tbl.columns = tbl.columns.get_level_values(1)
                tbl['name'] = column
                tbl['std'] = tbl['std'].fillna(0)
                tbl['lower'] = tbl['mean'] - 3 * tbl['std']
                tbl['upper'] = tbl['mean'] + 3 * tbl['std']
                tbl['lower'] = tbl[['lower', 'min']].max(axis=1)  # lower >= min
                tbl['upper'] = tbl[['upper', 'max']].min(axis=1)  # upper <= max
                tbl = tbl[['name', 'lower', 'mean', 'upper']]
                res = tbl if res is None else pd.concat([res, tbl])
            return res

        trdf = pd.DataFrame()
        trdf[self.target_col] = y.values
        trdf['T1'] = T.squeeze()
        self.limit = calc_limit(trdf).reset_index()
        return 'Trained predictor ready to be stored'

    def predict(self, df):
        yt = df[self.target_col].values
        xt = self.x_scaler.transform(df[self.x_cols])

        excess_cols = list(set(df.columns) - set(self.x_cols))
        pred_df = df[excess_cols].copy()
        pred_df[self.target_col] = yt
        pred_df['T1'] = (xt @ self.P).squeeze()

        # attach the control limits computed during training
        pred_df = pd.merge(pred_df, self.limit[[self.target_col, 'lower', 'upper']], how='left', on=self.target_col)
        return pred_df
```
* The optional requirements file, or `requirements.txt`, stores all dependencies along with their versions. Here is the sample format:
```text theme={null}
dependency_package_1 == version
dependency_package_2 >= version
dependency_package_3 >= version, < version
...
```
```text theme={null}
pandas
scikit-learn
```
Once you upload the above files, provide an engine name. Your custom model is uploaded to MindsDB as an engine, which you can then use to create a model.
## Configuration
The BYOM feature can be configured with the following environment variables:
* `MINDSDB_BYOM_ENABLED`
This environment variable defines whether the BYOM feature is enabled (`MINDSDB_BYOM_ENABLED=true`) or disabled (`MINDSDB_BYOM_ENABLED=false`). Note that when running MindsDB locally, it is enabled by default.
* `MINDSDB_BYOM_DEFAULT_TYPE`
This environment variable defines the modes of operation of the BYOM feature.
* `MINDSDB_BYOM_DEFAULT_TYPE=venv`
When using the `venv` mode, MindsDB creates a virtual environment and installs the packages listed in the `requirements.txt` file into it. This virtual environment is dedicated to the custom model. Note that when running MindsDB locally, it is the default mode.
* `MINDSDB_BYOM_DEFAULT_TYPE=inhouse`
When using the `inhouse` mode, there is no dedicated virtual environment for the custom model. It uses MindsDB's own environment; therefore, the `requirements.txt` file is not used with this mode.
* `MINDSDB_BYOM_INHOUSE_ENABLED`
This environment variable defines whether the `inhouse` mode is enabled (`MINDSDB_BYOM_INHOUSE_ENABLED=true`) or disabled (`MINDSDB_BYOM_INHOUSE_ENABLED=false`). Note that when running MindsDB locally, it is enabled by default.
## Example
We upload the custom model: the `model.py` file that stores an implementation of the model and the `requirements.txt` file that stores all the dependencies.
Once the model is uploaded, it becomes an ML engine within MindsDB. Now we use this `custom_model_engine` to create a model as follows:
```sql theme={null}
CREATE MODEL custom_model
FROM my_integration
(SELECT * FROM my_table)
PREDICT target
USING
ENGINE = 'custom_model_engine';
```
Let's query for predictions by joining the custom model with the data table.
```sql theme={null}
SELECT input.feature_column, model.target_column
FROM my_integration.my_table as input
JOIN custom_model as model;
```
Check out the [BYOM handler folder](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/byom_handler) to see the implementation details.
# MindsDB and MLflow
Source: https://docs.mindsdb.com/integrations/ai-engines/mlflow
MLflow allows you to create, train, and serve machine learning models, apart from other features, such as organizing experiments, tracking metrics, and more.
## How to Use MLflow Models in MindsDB
Here are the prerequisites for using MLflow-served models in MindsDB:
1. Train a model via a wrapper class that inherits from the `mlflow.pyfunc.PythonModel` class. It should expose the `predict()` method that returns the predicted output for some input data when called.
Please ensure that the Python version specified for the Conda environment matches the one used to train the model.
2. Start the MLflow server:
```bash theme={null}
mlflow server -p 5001 --backend-store-uri sqlite:////path/to/mlflow.db --default-artifact-root ./artifacts --host 0.0.0.0
```
3. Serve the trained model:
```bash theme={null}
mlflow models serve --model-uri ./model_folder_name
```
## Example
Let's create a model that registers an MLflow-served model as an AI Table:
```sql theme={null}
CREATE MODEL mindsdb.mlflow_model
PREDICT target
USING
engine = 'mlflow',
model_name = 'model_folder_name', -- replace the model_folder_name variable with a real value
mlflow_server_url = 'http://0.0.0.0:5001/', -- match the port number with the MLflow server (point 2 in the previous section)
mlflow_server_path = 'sqlite:////path/to/mlflow.db', -- replace the path with a real value (here we use the sqlite database)
predict_url = 'http://localhost:5000/invocations'; -- match the port number that serves the trained model (point 3 in the previous section)
```
Here is how to check the model's status:
```sql theme={null}
DESCRIBE mlflow_model;
```
Once the status is `complete`, we can query for predictions.
One way is to query for a single prediction using synthetic data in the `WHERE` clause.
```sql theme={null}
SELECT target
FROM mindsdb.mlflow_model
WHERE text = 'The tsunami is coming, seek high ground';
```
Another way is to query for batch predictions by joining the model with the data table.
```sql theme={null}
SELECT t.text, m.predict
FROM mindsdb.mlflow_model AS m
JOIN files.some_text as t;
```
Here, the data table comes from the `files` integration. It is joined with the model and predictions are made for all the records at once.
**Get More Insights**
Check out the article on [How to bring your own machine learning model to databases](https://medium.com/mindsdb/how-to-bring-your-own-machine-learning-model-to-databases-47a188d6db00) by [Patricio Cerda Mardini](https://medium.com/@paxcema) to learn more.
# Binance
Source: https://docs.mindsdb.com/integrations/app-integrations/binance
In this section, we present how to connect Binance to MindsDB.
[Binance](https://www.binance.com/en) is one of the world's largest cryptocurrency exchanges. It's an online platform where you can buy, sell, and trade a wide variety of cryptocurrencies. Binance offers a range of services beyond just trading, including staking, lending, and various financial products related to cryptocurrencies.
Binance provides real-time trade data that can be utilized within MindsDB to make real-time forecasts.
## Connection
This handler integrates with the [Binance API](https://binance-docs.github.io/apidocs/spot/en/#change-log) to make aggregate trade (kline) data available to use for model training and predictions.
Since there are no parameters required to connect to Binance using MindsDB, you can use the below statement:
```sql theme={null}
CREATE DATABASE my_binance
WITH
ENGINE = 'binance';
```
## Usage
### Select Data
By default, aggregate data (klines) from the latest 1000 trading intervals with a length of one minute (1m) each will be returned.
```sql theme={null}
SELECT *
FROM my_binance.aggregated_trade_data
WHERE symbol = 'BTCUSDT';
```
Here is the sample output data:
```
| symbol | open_time | open_price | high_price | low_price | close_price | volume | close_time | quote_asset_volume | number_of_trades | taker_buy_base_asset_volume | taker_buy_quote_asset_volume |
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ------------------ | ---------------- | --------------------------- | ---------------------------- |
| BTCUSDT | 1678338600 | 21752.65000 | 21761.33000 | 21751.53000 | 21756.7000 | 103.8614100 | 1678338659.999| 2259656.20520700 | 3655 | 55.25763000 | 1202219.60971860 |
```
where:
* `symbol` - Trading pair (BTC to USDT in the above example)
* `open_time` - Start time of interval in seconds since the Unix epoch (default interval is 1m)
* `open_price` - Price of a base asset at the beginning of a trading interval
* `high_price` - The highest price of a base asset during trading interval
* `low_price` - Lowest price of a base asset during a trading interval
* `close_price` - Price of a base asset at the end of a trading interval
* `volume` - Total amount of base asset traded during an interval
* `close_time` - End time of interval in seconds since the Unix epoch
* `quote_asset_volume` - Total amount of quote asset (USDT in the above case) traded during an interval
* `number_of_trades` - Total number of trades made during an interval
* `taker_buy_base_asset_volume` - How much of the base asset volume is contributed by taker buy orders
* `taker_buy_quote_asset_volume` - How much of the quote asset volume is contributed by taker buy orders
To get a customized response, we can pass `open_time`, `close_time`, and `interval`:
```sql theme={null}
SELECT *
FROM my_binance.aggregated_trade_data
WHERE symbol = 'BTCUSDT'
AND open_time > '2023-01-01'
AND close_time < '2023-01-03 08:00:00'
AND interval = '1s'
LIMIT 10000;
```
Supported intervals are [listed here](https://binance-docs.github.io/apidocs/spot/en/#kline-candlestick-data).
### Train a Model
Here is how to create a time series model using the past 10,000 one-minute (1m) trading intervals:
```sql theme={null}
CREATE MODEL mindsdb.btc_forecast_model
FROM my_binance
(
SELECT * FROM aggregated_trade_data
WHERE symbol = 'BTCUSDT'
AND close_time < '2023-01-01'
AND interval = '1m'
LIMIT 10000
)
PREDICT open_price
ORDER BY open_time
WINDOW 100
HORIZON 10;
```
For more accuracy, the limit can be set to a higher value (e.g. 100,000).
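As with any MindsDB model, you can check whether training has completed before querying for predictions:
```sql theme={null}
DESCRIBE mindsdb.btc_forecast_model;
```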
### Making Predictions
First, let's create a view for the most recent BTCUSDT aggregate trade data:
```sql theme={null}
CREATE VIEW recent_btcusdt_data AS (
SELECT * FROM my_binance.aggregated_trade_data
WHERE symbol = 'BTCUSDT'
);
```
Now let's predict the future price of BTC:
```sql theme={null}
SELECT m.*
FROM recent_btcusdt_data AS t
JOIN mindsdb.btc_forecast_model AS m
WHERE m.open_time > LATEST;
```
This will give the predicted BTC price for the next 10 minutes (as the horizon is set to 10) in terms of USDT.
# Confluence
Source: https://docs.mindsdb.com/integrations/app-integrations/confluence
This documentation describes the integration of MindsDB with [Confluence](https://www.atlassian.com/software/confluence), a popular collaboration and documentation tool developed by Atlassian.
The integration allows MindsDB to access data from Confluence and enhance it with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
## Connection
Establish a connection to Confluence from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/confluence_handler) as an engine.
```sql theme={null}
CREATE DATABASE confluence_datasource
WITH
ENGINE = 'confluence',
PARAMETERS = {
"api_base": "https://example.atlassian.net",
"username": "john.doe@example.com",
"password": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6"
};
```
Required connection parameters include the following:
* `api_base`: The base URL for your Confluence instance/server.
* `username`: The email address associated with your Confluence account.
* `password`: The API token generated for your Confluence account.
Refer to this [guide](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/) for instructions on how to create API tokens for your account.
## Usage
Retrieve data from a specified table by providing the integration and table names:
```sql theme={null}
SELECT *
FROM confluence_datasource.table_name
LIMIT 10;
```
The above example utilizes `confluence_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Supported Tables
* `spaces`: The table containing information about the spaces in Confluence.
* `pages`: The table containing information about the pages in Confluence.
* `blogposts`: The table containing information about the blog posts in Confluence.
* `whiteboards`: The table containing information about the whiteboards in Confluence.
* `databases`: The table containing information about the databases in Confluence.
* `tasks`: The table containing information about the tasks in Confluence.
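For example, to preview the pages available in your Confluence instance, query the `pages` table listed above:
```sql theme={null}
SELECT *
FROM confluence_datasource.pages
LIMIT 10;
```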
# Docker Hub
Source: https://docs.mindsdb.com/integrations/app-integrations/dockerhub
In this section, we present how to connect a Docker Hub repository to MindsDB.
[Docker Hub](https://hub.docker.com/) is the world's easiest way to create, manage, and deliver your team's container applications.
Data from Docker Hub can be utilized within MindsDB to train models and make predictions about Docker Hub repositories.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Docker Hub to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Docker Hub.
## Connection
This handler is implemented using the `requests` library, which makes HTTP calls to the [Docker Hub API](https://docs.docker.com/docker-hub/api/latest/#tag/resources).
The required arguments to establish a connection are as follows:
* `username`: Username used to log in to Docker Hub.
* `password`: Password used to log in to Docker Hub.
Read about creating an account [here](https://hub.docker.com/).
Here is how to connect to Docker Hub using MindsDB:
```sql theme={null}
CREATE DATABASE dockerhub_datasource
WITH ENGINE = 'dockerhub',
PARAMETERS = {
"username": "username",
"password": "password"
};
```
## Usage
Now, you can query Docker Hub as follows:
```sql theme={null}
SELECT *
FROM dockerhub_datasource.repo_images_summary
WHERE namespace = "docker"
AND repository = "trusted-registry-nginx";
```
Both the `namespace` and `repository` parameters are required in the WHERE clause.
# Email
Source: https://docs.mindsdb.com/integrations/app-integrations/email
In this section, we present how to connect Email accounts to MindsDB.
By connecting your email account to MindsDB, you can utilize various AI models available within MindsDB to summarize emails, detect spam, or even automate email replies.
Please note that currently you can connect Gmail and Outlook accounts using this integration.
## Connection
This handler was implemented using standard Python libraries: `email`, `imaplib`, and `smtplib`.
The Email handler is initialized with the following required parameters:
* `email` stores an email address used for authentication.
* `password` stores a password used for authentication.
Additionally, the following optional parameters can be passed:
* `smtp_server` used to send emails. Defaults to `smtp.gmail.com`.
* `smtp_port` used to send emails. Defaults to `587`.
* `imap_server` used to receive emails. Defaults to `imap.gmail.com`.
At the moment, the handler has been tested with Gmail and Outlook accounts.
To use the handler on a Gmail account, you must create an app password following [this instruction](https://support.google.com/accounts/answer/185833?hl=en) and use its value for the `password` parameter.
By default, the Email handler connects to Gmail. If you want to use another email provider, such as Outlook, provide values for the `imap_server` and `smtp_server` parameters.
### Gmail
To connect your Gmail account to MindsDB, use the below `CREATE DATABASE` statement:
```sql theme={null}
CREATE DATABASE email_datasource
WITH ENGINE = 'email',
PARAMETERS = {
"email": "youremail@gmail.com",
"password": "yourpassword"
};
```
It creates a database that comes with the `emails` table.
### Outlook
To connect your Outlook account to MindsDB, use the below `CREATE DATABASE` statement:
```sql theme={null}
CREATE DATABASE email_datasource
WITH ENGINE = 'email',
PARAMETERS = {
"email": "youremail@outlook.com",
"password": "yourpassword",
"smtp_server": "smtp.office365.com",
"smtp_port": "587",
"imap_server": "outlook.office365.com"
};
```
It creates a database that comes with the `emails` table.
## Usage
Now you can query for emails like this:
```sql theme={null}
SELECT *
FROM email_datasource.emails;
```
And you can apply filters like this:
```sql theme={null}
SELECT id, body, subject, to_field, from_field, datetime
FROM email_datasource.emails
WHERE subject = 'MindsDB'
ORDER BY id
LIMIT 5;
```
Or, write emails like this:
```sql theme={null}
INSERT INTO email_datasource.emails(to_field, subject, body)
VALUES ("toemail@outlook.com", "MindsDB", "Hello from MindsDB!");
```
# GitHub
Source: https://docs.mindsdb.com/integrations/app-integrations/github
In this section, we present how to connect a GitHub repository to MindsDB.
[GitHub](https://github.com/) is a web-based platform and service that is primarily used for version control and collaborative software development. It provides a platform for developers and teams to host, review, and manage source code for software projects.
Data from GitHub, including issues and PRs, can be utilized within MindsDB to make relevant predictions or automate the issue/PR creation.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect GitHub to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to GitHub.
## Connection
This handler is implemented using the `pygithub` library, a Python library that wraps GitHub API v3.
The required arguments to establish a connection are as follows:
* `repository` is the GitHub repository name.
* `api_key` is an optional GitHub API key to use for authentication.
* `github_url` is an optional GitHub URL to connect to a GitHub Enterprise instance.
Check out [this guide](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) on how to create the GitHub API key.
It is recommended to use the API key to avoid the `API rate limit exceeded` error.
Here is how to connect to the MindsDB GitHub repository:
```sql theme={null}
CREATE DATABASE mindsdb_github
WITH ENGINE = 'github',
PARAMETERS = {
"repository": "mindsdb/mindsdb"
};
```
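To authenticate your requests, pass the optional `api_key` parameter (and `github_url` for GitHub Enterprise instances); the value below is a placeholder:
```sql theme={null}
CREATE DATABASE mindsdb_github
WITH ENGINE = 'github',
PARAMETERS = {
    "repository": "mindsdb/mindsdb",
    "api_key": "your-github-api-key"
};
```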
## Usage
The `mindsdb_github` connection contains two tables: `issues` and `pull_requests`.
Here is how to query for all issues:
```sql theme={null}
SELECT *
FROM mindsdb_github.issues;
```
You can run more advanced queries to fetch specific issues in a defined order:
```sql theme={null}
SELECT number, state, creator, assignees, title, labels
FROM mindsdb_github.issues
WHERE state = 'open'
LIMIT 10;
```
And the same goes for pull requests:
```sql theme={null}
SELECT number, state, title, creator, head, commits
FROM mindsdb_github.pull_requests
WHERE state = 'open'
LIMIT 10;
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/github_handler/README.md).
# GitLab
Source: https://docs.mindsdb.com/integrations/app-integrations/gitlab
In this section, we present how to connect a GitLab repository to MindsDB.
[GitLab](https://about.gitlab.com/) is a DevSecOps Platform. Data from GitLab, including issues and MRs, can be utilized within MindsDB to make relevant predictions or automate the issue/MR creation.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect GitLab to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to GitLab.
## Connection
This handler was implemented using the [python-gitlab](https://github.com/python-gitlab/python-gitlab) library, a Python library that wraps the GitLab API.
The GitLab handler is initialized with the following parameters:
* `repository`: a required name of a GitLab repository to connect to.
* `api_key`: an optional GitLab API key to use for authentication.
Here is how to connect MindsDB to a GitLab repository:
```sql theme={null}
CREATE DATABASE mindsdb_gitlab
WITH ENGINE = 'gitlab',
PARAMETERS = {
"repository": "gitlab-org/gitlab",
"api_key": "api_key", -- optional GitLab API key
};
```
## Usage
The `mindsdb_gitlab` connection contains two tables: `issues` and `merge_requests`.
Now, you can use this established connection to query these tables, for example:
```sql theme={null}
SELECT * FROM mindsdb_gitlab.issues;
```
You can run more advanced queries to fetch specific issues in a defined order:
```sql theme={null}
SELECT number, state, creator, assignee, title, created, labels
FROM mindsdb_gitlab.issues
WHERE state="opened"
ORDER BY created ASC, creator DESC
LIMIT 10;
```
And the same goes for merge requests:
```sql theme={null}
SELECT number, state, creator, reviewers, title, created, has_conflicts
FROM mindsdb_gitlab.merge_requests
WHERE state="merged"
ORDER BY created ASC, creator DESC
LIMIT 10;
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/gitlab_handler/README.md).
# Gmail
Source: https://docs.mindsdb.com/integrations/app-integrations/gmail
In this section, we present how to connect Gmail accounts to MindsDB.
[Gmail](https://gmail.com/) is a widely used and popular email service developed by Google.
By connecting your Gmail account to MindsDB, you can utilize various AI models available within MindsDB to summarize emails, detect spam, or even automate email replies.
Please note that currently you can connect your Gmail account to a local MindsDB installation by providing a path to the credentials file stored locally.
If you want to connect your Gmail account to MindsDB Cloud, you can upload the credentials file, for instance, to your S3 bucket and provide a link to it as a parameter.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Gmail to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Gmail.
## Connection
The required arguments to establish a connection are as follows:
* `credentials_file` is a local path to the `credentials.json` file, or use `credentials_url` in case your file is uploaded to S3. Follow the instructions below to generate the credentials file.
* `scopes` defines the level of access granted. It is optional; by default, the 'https://.../gmail.compose' and 'https://.../gmail.readonly' scopes are used.
In order to make use of this handler and connect the Gmail app to MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE mindsdb_gmail
WITH ENGINE = 'gmail',
parameters = {
"credentials_file": "mindsdb/integrations/handlers/gmail_handler/credentials.json",
"scopes": ['https://.../gmail.compose', 'https://.../gmail.readonly', ...]
};
```
Alternatively, you can provide the credentials file via an S3 [pre-signed URL](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html) passed as the `credentials_url` parameter. For example:
```sql theme={null}
CREATE DATABASE mindsdb_gmail
WITH ENGINE = 'gmail',
parameters = {
"credentials_url": "https://s3.amazonaws.com/your_bucket/credentials.json?response-content-disposition=inline&X-Amz-Security-Token=12312...",
-- "scopes": ['SCOPE_1', 'SCOPE_2', ...] -- Optional scopes. By default 'https://.../gmail.compose' & 'https://.../gmail.readonly' scopes are used
};
```
You need a Google account in order to use this integration. Here is how to get the credentials file:
1. Create a Google Cloud Platform (GCP) Project:
1.1 Go to the GCP Console ([https://console.cloud.google.com/](https://console.cloud.google.com/)).
1.2 If you haven't created a project before, you'll be prompted to do so now.
1.3 Give your new project a name.
1.4 Click `Create` to create the new project.
2. Enable the Gmail API:
2.1 In the GCP Console, select your project.
2.2 Navigate to `APIs & Services` > `Library`.
2.3 In the search bar, search for `Gmail`.
2.4 Click on `Gmail API`, then click `Enable`.
3. Create credentials for the Gmail API:
3.1 Navigate to `APIs & Services` > `Credentials`.
3.2 Click on the `Create Credentials` button and choose `OAuth client ID`.
3.3 If you haven't configured the OAuth consent screen before, you'll be prompted to do so now. Make sure to choose `External` for User Type, and select the necessary scopes. Make sure to save the changes.
Now, create the OAuth client ID. Choose `Web application` for the Application Type and give it a name.
3.4 Add the following MindsDB URL to `Authorized redirect URIs`:
* For local installation, add `http://localhost/verify-auth`
* For Cloud, add `http://cloud.mindsdb.com/verify-auth`.
3.5 Click `Create`.
4. Download the JSON file:
4.1 After creating your credentials, click the download button (an icon of an arrow pointing down) on the right side of your client ID. This downloads a JSON file; use its location as the `credentials_file` parameter.
## Usage
This creates a database called `mindsdb_gmail`. This database ships with a table called `emails` that we can use to search for emails as well as to write emails.
Now you can use your Gmail data, like this:
* searching for email:
```sql theme={null}
SELECT *
FROM mindsdb_gmail.emails
WHERE query = 'alert from:*@google.com'
AND label_ids = "INBOX,UNREAD"
LIMIT 20;
```
* writing emails:
```sql theme={null}
INSERT INTO mindsdb_gmail.emails (thread_id, message_id, to_email, subject, body)
VALUES ('187cbdd861350934d', '8e54ccfd-abd0-756b-a12e-f7bc95ebc75b@Spark', 'test@example2.com', 'Trying out MindsDB',
'This seems awesome. You must try it out whenever you can.');
```
## Example 1: Automating Email Replies
Now that we know how to pull emails into our database and write emails, we can make use of the OpenAI engine to write email replies.
First, create an OpenAI engine, passing your OpenAI API key:
```sql theme={null}
CREATE ML_ENGINE openai_engine
FROM openai
USING
openai_api_key = 'your-openai-api-key';
```
Then, create a model using this engine:
```sql theme={null}
CREATE MODEL mindsdb.gpt_model
PREDICT response
USING
engine = 'openai_engine',
max_tokens = 500,
model_name = 'gpt-3.5-turbo',
prompt_template = 'From input message: {{body}}\
by from_user: {{sender}}\
In less than 500 characters, write an email response to {{sender}} in the following format:\
Start with proper salutation and respond with a short message in a casual tone, and sign the email with my name mindsdb';
```
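To generate replies, join the `emails` table with this model. Here is a sketch, assuming the `emails` table exposes the `body` and `sender` columns referenced in the prompt template:
```sql theme={null}
SELECT t.body, m.response
FROM mindsdb_gmail.emails AS t
JOIN mindsdb.gpt_model AS m
LIMIT 5;
```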
## Example 2: Detecting Spam Emails
You can check if an email is spam by using one of the Hugging Face pre-trained models.
```sql theme={null}
CREATE MODEL mindsdb.spam_classifier
PREDICT PRED
USING
engine = 'huggingface',
task = 'text-classification',
model_name = 'mrm8488/bert-tiny-finetuned-sms-spam-detection',
input_column = 'text_spammy',
labels = ['ham', 'spam'];
```
Then, create a view that contains the snippet or the body of the email.
```sql theme={null}
CREATE VIEW mindsdb.emails_text AS (
SELECT snippet AS text_spammy
FROM mindsdb_gmail.emails
);
```
Finally, you can use the model to classify emails into spam or ham:
```sql theme={null}
SELECT h.PRED, h.PRED_explain, t.text_spammy AS input_text
FROM mindsdb.emails_text AS t
JOIN mindsdb.spam_classifier AS h;
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/gmail_handler/README.md).
# Gong
Source: https://docs.mindsdb.com/integrations/app-integrations/gong
This documentation describes the integration of MindsDB with [Gong](https://www.gong.io/), a conversation intelligence platform that captures, analyzes, and provides insights from customer conversations.
The integration allows MindsDB to access call recordings, transcripts, analytics, and other conversation data from Gong and enhance it with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect Gong to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
3. Obtain a Gong API key from your [Gong API settings page](https://app.gong.io/settings/api-keys).
## Connection
Establish a connection to Gong from MindsDB by executing the following SQL command and providing its handler name as an engine.
### Using Bearer Token (Recommended)
```sql theme={null}
CREATE DATABASE gong_datasource
WITH
ENGINE = 'gong',
PARAMETERS = {
"api_key": "your_gong_api_key_here"
};
```
### Using Basic Authentication
```sql theme={null}
CREATE DATABASE gong_datasource
WITH
ENGINE = 'gong',
PARAMETERS = {
"access_key": "your_access_key",
"secret_key": "your_secret_key"
};
```
Required connection parameters include the following:
**Authentication (choose one method):**
* `api_key`: Bearer token for authentication (recommended)
* `access_key` + `secret_key`: Basic authentication credentials (alternative method)
Optional connection parameters include the following:
* `base_url`: Gong API base URL. This parameter defaults to `https://api.gong.io`.
* `timeout`: Request timeout in seconds. This parameter defaults to `30`.
If both authentication methods are provided, basic auth (`access_key` + `secret_key`) takes precedence.
## Usage
The following usage examples utilize `gong_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
### Available Tables
The Gong handler provides access to the following tables:
* `calls` - Access call recordings and metadata
* `users` - Get user information and permissions
* `analytics` - Access AI-generated conversation insights
* `transcripts` - Get full conversation transcripts
### Basic Queries
Retrieve recent calls with date filters (recommended for best performance):
```sql theme={null}
SELECT *
FROM gong_datasource.calls
WHERE date >= '2024-01-01' AND date < '2024-02-01'
ORDER BY date DESC
LIMIT 20;
```
Get all users in your organization:
```sql theme={null}
SELECT user_id, name, email, role, status
FROM gong_datasource.users
LIMIT 100;
```
Get analytics for calls with high sentiment scores:
```sql theme={null}
SELECT call_id, sentiment_score, key_phrases, topics
FROM gong_datasource.analytics
WHERE sentiment_score > 0.7
AND date >= '2024-01-01'
LIMIT 50;
```
Get transcripts for a specific call:
```sql theme={null}
SELECT speaker, timestamp, text
FROM gong_datasource.transcripts
WHERE call_id = '12345'
ORDER BY timestamp;
```
### Advanced Queries with JOINs
Get calls with their sentiment analysis:
```sql theme={null}
SELECT
c.title,
c.date,
c.duration,
a.sentiment_score,
a.key_phrases
FROM gong_datasource.calls c
JOIN gong_datasource.analytics a ON c.call_id = a.call_id
WHERE c.date >= '2024-01-01' AND c.date < '2024-02-01'
ORDER BY a.sentiment_score DESC
LIMIT 25;
```
Find calls where specific keywords were mentioned:
```sql theme={null}
SELECT
c.title,
c.date,
t.speaker,
t.text
FROM gong_datasource.calls c
JOIN gong_datasource.transcripts t ON c.call_id = t.call_id
WHERE c.date >= '2024-01-01'
AND t.text LIKE '%pricing%'
LIMIT 50;
```
Get user performance with call sentiment:
```sql theme={null}
SELECT
u.name,
u.email,
c.call_id,
c.title,
a.sentiment_score
FROM gong_datasource.users u
JOIN gong_datasource.calls c ON u.user_id = c.user_id
JOIN gong_datasource.analytics a ON c.call_id = a.call_id
WHERE c.date >= '2024-01-01'
AND a.sentiment_score > 0.8
LIMIT 100;
```
## Data Schema
### calls Table
| Column | Description |
| --------------- | -------------------------------------------- |
| `call_id` | Unique identifier for the call (Primary Key) |
| `title` | Call title or description |
| `date` | Call date and time (ISO-8601 format) |
| `duration` | Call duration in seconds |
| `recording_url` | URL to the call recording |
| `call_type` | Type of call (e.g., "sales", "demo") |
| `user_id` | ID of the user who made the call |
| `participants` | Comma-separated list of participants |
| `status` | Call status |
### users Table
| Column | Description |
| ------------- | -------------------------------------------- |
| `user_id` | Unique identifier for the user (Primary Key) |
| `name` | User's full name |
| `email` | User's email address |
| `role` | User's role in the organization |
| `permissions` | Comma-separated list of user permissions |
| `status` | User status |
### analytics Table
| Column | Description |
| ------------------ | ------------------------------------------------------------------ |
| `call_id` | Reference to the call (Primary Key, Foreign Key to calls.call\_id) |
| `sentiment_score` | Sentiment analysis score |
| `topic_score` | Topic detection score |
| `key_phrases` | Comma-separated list of key phrases |
| `topics` | Comma-separated list of detected topics |
| `emotions` | Comma-separated list of detected emotions |
| `confidence_score` | Confidence score for the analysis |
### transcripts Table
| Column | Description |
| ------------ | ---------------------------------------------------------- |
| `segment_id` | Unique identifier for the transcript segment (Primary Key) |
| `call_id` | Reference to the call (Foreign Key to calls.call\_id) |
| `speaker` | Name of the speaker |
| `timestamp` | Timestamp of the transcript segment (ISO-8601 format) |
| `text` | Transcribed text |
| `confidence` | Confidence score for the transcription |
## Troubleshooting
`Authentication Error`
* **Symptoms**: Failure to connect MindsDB with Gong.
* **Checklist**:
1. Verify that your Gong API key is valid and not expired.
2. Ensure you have the necessary permissions in Gong to access the API.
3. Check that your API key has access to the specific data you're querying.
4. If using basic authentication, verify both `access_key` and `secret_key` are correct.
`Empty Results or Missing Data`
* **Symptoms**: Queries return no results or incomplete data.
* **Checklist**:
1. Verify that date filters are included in your query (required for calls, analytics, transcripts).
2. Check that the date range includes data (analytics and transcripts have \~1 hour lag).
3. Ensure call\_id exists when querying transcripts for a specific call.
4. Verify that your Gong account has data for the requested time period.
`Slow Query Performance`
* **Symptoms**: Queries take a long time to execute.
* **Checklist**:
1. Add date filters to limit the data range (essential for large datasets).
2. Use LIMIT to restrict the number of results.
3. Filter by call\_id when querying transcripts.
4. Avoid querying transcripts without filters (they can return thousands of rows per call); see the sketch below.
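Putting these recommendations together, a well-scoped transcript query filters by `call_id` and restricts the number of returned rows:
```sql theme={null}
SELECT speaker, timestamp, text
FROM gong_datasource.transcripts
WHERE call_id = '12345'
ORDER BY timestamp
LIMIT 100;
```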
# Google Analytics
Source: https://docs.mindsdb.com/integrations/app-integrations/google-analytics
In this section, we present how to connect Google Analytics to MindsDB.
[Google Analytics](https://analytics.google.com/) is a web analytics service offered by Google that tracks and reports website traffic, as well as mobile app traffic and events.
Data from Google Analytics can be utilized within MindsDB to train AI models, make predictions, and automate user engagement and events with AI.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Google Analytics to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Google Analytics.
## Connection
The arguments to establish a connection are as follows:
* `credentials_file`: optional, a path to the JSON file that stores credentials to the Google account.
* `credentials_json`: optional, the content of the JSON file that stores credentials to the Google account.
* `property_id`: required, the property ID of your Google Analytics website. [Here](https://developers.google.com/analytics/devguides/reporting/data/v1/property-id) is some information on how to get the property ID.
> ⚠️ One of `credentials_file` or `credentials_json` must be provided.
Please note that a Google account with the Google Analytics Admin API enabled is required. You can find more information [here](https://developers.google.com/analytics/devguides/config/admin/v1/quickstart-client-libraries).
Also, an active website connected with Google Analytics is required. You can find more information [here](https://support.google.com/analytics/answer/9304153?hl=en).
To make use of this handler and connect the Google Analytics app to MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE my_ga
WITH
ENGINE = 'google_analytics',
PARAMETERS = {
'credentials_file': '/path-to-your-file/credentials.json',
'property_id': ''
};
```
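Alternatively, if you prefer not to store the file on disk, pass its content via the `credentials_json` parameter. This is a sketch, where the placeholder stands for the full content of the credentials file:
```sql theme={null}
CREATE DATABASE my_ga
WITH
    ENGINE = 'google_analytics',
    PARAMETERS = {
      'credentials_json': '<contents-of-credentials.json>',
      'property_id': ''
    };
```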
You need a Google account in order to use this integration. Here is how to get the credentials file:
1. Create a Google Cloud Platform (GCP) Project:
1.1 Go to the GCP Console ([https://console.cloud.google.com/](https://console.cloud.google.com/)).
1.2 If you haven't created a project before, you'll be prompted to do so now.
1.3 Give your new project a name.
1.4 Click `Create` to create the new project.
2. Enable the Google Analytics Admin API:
2.1 In the GCP Console, select your project.
2.2 Navigate to `APIs & Services` > `Library`.
2.3 In the search bar, search for `Google Analytics Admin API`.
2.4 Click on `Google Analytics Admin API`, then click `Enable`.
3. Create credentials for the Google Analytics Admin API:
3.1 Navigate to `APIs & Services` > `Credentials`.
3.2 Click on the `Create Credentials` button and choose `Service account`.
3.3 Enter a unique `Service account ID`.
3.4 Click `Done`.
3.5 Copy the service account you created. Find it under `Service Accounts`.
3.6 Now click on the service account you created, and navigate to `KEYS`.
3.7 Click `ADD KEY` > `Create new key`.
3.8 Choose `JSON`, then click `CREATE`.
3.9 The credentials file will be downloaded directly. Locate the file and use its location as the `credentials_file` parameter.
4. Add Service account to Google Analytics Property:
4.1 In the Google Analytics Admin Console, select the Account or Property to which you want to grant access.
4.2 Navigate to the `Admin` panel.
4.3 Navigate to `Account` > `Account Access Management`.
4.4 Click on the "+" icon to add a new user.
4.5 Enter the service account you copied in step 3.5 as the email address.
4.6 Assign the appropriate permissions to the service account. At a minimum, you'll need to grant it `Edit` permissions.
4.7 Click on the `Add` button to add the service account as a user with the specified permissions.
## Usage
This creates a database that comes with the `conversion_events` table.
Now you can use your Google Analytics data like this:
* searching for conversion events:
```sql theme={null}
SELECT event_name, custom, countingMethod
FROM my_ga.conversion_events;
```
* creating conversion event:
```sql theme={null}
INSERT INTO my_ga.conversion_events (event_name, countingMethod)
VALUES ('mindsdb_event', 2);
```
* updating one conversion event:
```sql theme={null}
UPDATE my_ga.conversion_events
SET countingMethod = 1
WHERE name = '';
```
* deleting one conversion event:
```sql theme={null}
DELETE
FROM my_ga.conversion_events
WHERE name = '';
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/google_analytics_handler).
# Google Calendar
Source: https://docs.mindsdb.com/integrations/app-integrations/google-calendar
In this section, we present how to connect Google Calendar to MindsDB.
[Google Calendar](https://calendar.google.com/calendar/) is an online calendar service and application developed by Google. It allows users to create, manage, and share events and appointments, as well as schedule and organize their personal, work, or team activities.
Data from Google Calendar can be utilized within MindsDB to train AI models, make predictions, and automate time management with AI.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Google Calendar to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Google Calendar.
## Connection
The required arguments to establish a connection are as follows:
* `credentials_file` is a path to the JSON file that stores credentials to the Google account.
Please note that a Google account with Google Calendar enabled is required. You can find more information [here](https://developers.google.com/calendar/api/quickstart/python).
In order to make use of this handler and connect the Google Calendar app to MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE my_calendar
WITH
ENGINE = 'google_calendar',
PARAMETERS = {
'credentials_file': '/path-to-your-file/credentials.json'
};
```
You need a Google account in order to use this integration. Here is how to get the credentials file:
1. Create a Google Cloud Platform (GCP) Project:
1.1 Go to the GCP Console ([https://console.cloud.google.com/](https://console.cloud.google.com/)).
1.2 If you haven't created a project before, you'll be prompted to do so now.
1.3 Give your new project a name.
1.4 Click `Create` to create the new project.
2. Enable the Google Calendar API:
2.1 In the GCP Console, select your project.
2.2 Navigate to `APIs & Services` > `Library`.
2.3 In the search bar, search for `Google Calendar API`.
2.4 Click on `Google Calendar API`, then click `Enable`.
3. Create credentials for the Google Calendar API:
3.1 Navigate to `APIs & Services` > `Credentials`.
3.2 Click on the `Create Credentials` button and choose `OAuth client ID`.
3.3 If you haven't configured the OAuth consent screen before, you'll be prompted to do so now. Make sure to choose `External` for User Type, and add all the necessary scopes. Make sure to save the changes.
Now, create the OAuth client ID. Choose `Desktop app` for the Application Type and give it a name.
3.4 Click `Create`.
4. Download the JSON file:
4.1 After creating your credentials, click the download button (an icon of an arrow pointing down) on the right side of your client ID. This downloads a JSON file; use its location as the `credentials_file` parameter.
## Usage
This creates a database that comes with the `events` table.
Now you can use your Google Calendar data, like this:
* searching for events:
```sql theme={null}
SELECT *
FROM my_calendar.events
WHERE start_time = '2023-02-16'
AND end_time = '2023-04-09'
LIMIT 20;
```
* creating events:
```sql theme={null}
INSERT INTO my_calendar.events(start_time, end_time, summary, description, location, attendees, reminders, timeZone)
VALUES ('2023-02-16 10:00:00', '2023-02-16 11:00:00', 'MindsDB Meeting', 'Discussing the future of MindsDB', 'MindsDB HQ', '', '', 'Europe/Athens');
```
* updating one or more events:
```sql theme={null}
UPDATE my_calendar.events
SET summary = 'MindsDB Meeting',
description = 'Discussing the future of MindsDB',
location = 'MindsDB HQ',
attendees = '',
reminders = ''
WHERE event_id > 1 AND event_id < 10; -- used to update events in a given range
```
* deleting one or more events:
```sql theme={null}
DELETE
FROM my_calendar.events
WHERE id = '1';
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/google_calendar_handler/README.md).
# Hacker News
Source: https://docs.mindsdb.com/integrations/app-integrations/hackernews
In this section, we present how to connect Hacker News to MindsDB.
[Hacker News](https://news.ycombinator.com/) is an online platform and community for discussions related to technology, startups, computer science, entrepreneurship, and a wide range of other topics of interest to the tech and hacker communities. It was created by Y Combinator, a well-known startup accelerator.
Data from Hacker News, including articles and user comments, can be utilized within MindsDB to train AI models and chatbots with the knowledge and discussions shared at Hacker News.
## Connection
This handler is implemented using the official Hacker News API and provides a simple, easy-to-use interface to access it.
There are no connection arguments required.
In order to make use of this handler and connect the Hacker News to MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE my_hackernews
WITH ENGINE = 'hackernews';
```
It creates a database that comes with the `stories` and `comments` tables.
## Usage
Now you can query the articles, like this:
```sql theme={null}
SELECT *
FROM my_hackernews.stories
LIMIT 2;
```
And here is how to fetch comments for a specific article:
```sql theme={null}
SELECT *
FROM my_hackernews.comments
WHERE item_id = 35662571
LIMIT 1;
```
# Instatus
Source: https://docs.mindsdb.com/integrations/app-integrations/instatus
In this section, we present how to connect Instatus to MindsDB.
[Instatus](https://instatus.com/) is a cloud-based status page software that enables users to communicate status information using incidents and maintenances. It serves as a SaaS platform for creating status pages for services.
The Instatus Handler for MindsDB offers an interface to connect with Instatus via APIs and retrieve status pages.
## Connection
Initialize the Instatus handler with the following parameter:
* `api_key`: Instatus API key for authentication. Obtain it from [Instatus Developer Dashboard](https://dashboard.instatus.com/developer).
Start by creating a database with the Instatus engine using the following SQL command:
```sql theme={null}
CREATE DATABASE mindsdb_instatus -- Display name for the database.
WITH
ENGINE = 'instatus', -- Name of the MindsDB handler.
PARAMETERS = {
"api_key": "" -- Instatus API key to use for authentication.
};
```
## Usage
To get a status page, use the `SELECT` statement:
```sql theme={null}
SELECT id, name, status, subdomain
FROM mindsdb_instatus.status_pages
WHERE id = ''
LIMIT 10;
```
To create a new status page, use the `INSERT` statement:
```sql theme={null}
INSERT INTO mindsdb_instatus.status_pages (email, name, subdomain, components, logoUrl, faviconUrl, websiteUrl, language, useLargeHeader, brandColor, okColor, disruptedColor, degradedColor, downColor, noticeColor, unknownColor, googleAnalytics, subscribeBySms, smsService, twilioSid, twilioToken, twilioSender, nexmoKey, nexmoSecret, nexmoSender, htmlInMeta, htmlAboveHeader, htmlBelowHeader, htmlAboveFooter, htmlBelowFooter, htmlBelowSummary, cssGlobal, launchDate, dateFormat, dateFormatShort, timeFormat)
VALUES ('yourname@gmail.com', 'mindsdb', 'mindsdb-instatus', '["Website", "App", "API"]', 'https://instatus.com/sample.png', 'https://instatus.com/favicon-32x32.png', 'https://instatus.com', 'en', true, '#111', '#33B17E', '#FF8C03', '#ECC94B', '#DC123D', '#70808F', '#DFE0E1', 'UA-00000000-1', true, 'twilio', 'YOUR_TWILIO_SID', 'YOUR_TWILIO_TOKEN', 'YOUR_TWILIO_SENDER', null, null, null, null, null, null, null, null, null, null, 'MMMMMM d, yyyy', 'MMM yyyy', 'p');
```
The following fields are required when inserting new status pages:
* `email` (e.g. 'yourname@gmail.com')
* `name` (e.g. 'mindsdb')
* `subdomain` (e.g. 'mindsdb-docs')
* `components` (e.g. '\["Website", "App", "API"]')
The other fields are optional.
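Since only the four fields above are required, a minimal `INSERT` statement can be as short as:
```sql theme={null}
INSERT INTO mindsdb_instatus.status_pages (email, name, subdomain, components)
VALUES ('yourname@gmail.com', 'mindsdb', 'mindsdb-docs', '["Website", "App", "API"]');
```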
To update an existing status page, use the `UPDATE` statement:
```sql theme={null}
UPDATE mindsdb_instatus.status_pages
SET name = 'mindsdb',
status = 'UP',
logoUrl = 'https://instatus.com/sample.png',
faviconUrl = 'https://instatus.com/favicon-32x32.png',
websiteUrl = 'https://instatus.com',
language = 'en',
translations = '{
"name": {
"fr": "nasa"
}
}'
WHERE id = '';
```
# Intercom
Source: https://docs.mindsdb.com/integrations/app-integrations/intercom
[Intercom](https://intercom.com) is a software company that provides customer messaging and engagement tools for businesses. They offer products and services for customer support, marketing, and sales, allowing companies to communicate with their customers through various channels like chat, email, and more.
## Connection
To get started with the Intercom API, you need to initialize the API handler with the required access token for authentication. You can do this as follows:
* `access_token`: Your Intercom access token for authentication.
Check out [this guide](https://developers.intercom.com/docs/build-an-integration/learn-more/authentication/) on how to get the Intercom access token in order to access Intercom data.
To create a database using the Intercom engine, you can use a SQL-like syntax as shown below:
```sql theme={null}
CREATE DATABASE myintercom
WITH
ENGINE = 'intercom',
PARAMETERS = {
"access_token" : "your-intercom-access-token"
};
```
## Usage
You can retrieve data from Intercom using a `SELECT` statement. For example:
```sql theme={null}
SELECT *
FROM myintercom.articles;
```
You can filter data based on specific criteria using a `WHERE` clause. Here's an example:
```sql theme={null}
SELECT *
FROM myintercom.articles
WHERE id = <article_id>;
```
To create a new article in Intercom, you can use the `INSERT` statement. Here's an example:
```sql theme={null}
INSERT INTO myintercom.articles (title, description, body, author_id, state, parent_id, parent_type)
VALUES (
'Thanks for everything',
'Description of the Article',
'Body of the Article',
6840572,
'published',
6801839,
'collection'
);
```
You can update existing records in Intercom using the `UPDATE` statement. For instance:
```sql theme={null}
UPDATE myintercom.articles
SET title = 'Christmas is here!',
body = 'New gifts in store for the jolly season'
WHERE id = <article_id>;
```
# Jira
Source: https://docs.mindsdb.com/integrations/app-integrations/jira
This documentation describes the integration of MindsDB with [Jira](https://www.atlassian.com/software/jira/guides/getting-started/introduction), the #1 agile project management tool used by teams to plan, track, release and support world-class software with confidence.
The integration allows MindsDB to access data from Jira and enhance it with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect Jira to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to Jira from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/jira_handler) as an engine.
```sql theme={null}
CREATE DATABASE jira_datasource
WITH
ENGINE = 'jira',
PARAMETERS = {
"url": "https://example.atlassian.net",
"username": "john.doe@example.com",
"api_token": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6"
};
```
Connection parameters include the following:
* `url`: The base URL for your Jira instance/server.
* `username`: The email address associated with your Jira account.
* `api_token`: The API token generated for your Jira account.
* `cloud`: (Optional) Set to `true` for Jira Cloud or `false` for Jira Server. Defaults to `true`.
Refer to this [guide](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/) for instructions on how to create API tokens for your account.
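If you connect to a self-hosted Jira Server instance, set the optional `cloud` parameter to `false`; here is a sketch with placeholder values:
```sql theme={null}
CREATE DATABASE jira_datasource
WITH
    ENGINE = 'jira',
    PARAMETERS = {
        "url": "https://jira.example.com",
        "username": "john.doe@example.com",
        "api_token": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6",
        "cloud": false
    };
```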
## Usage
Retrieve data from a specified table by providing the integration and table names:
```sql theme={null}
SELECT *
FROM jira_datasource.table_name
LIMIT 10;
```
The above example utilizes `jira_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
# MediaWiki
Source: https://docs.mindsdb.com/integrations/app-integrations/mediawiki
In this section, we present how to connect MediaWiki to MindsDB.
[MediaWiki](https://www.mediawiki.org/wiki/MediaWiki) is a free and open-source wiki software platform that is designed to enable the creation and management of wikis. It was originally developed for and continues to power Wikipedia. MediaWiki is highly customizable and can be used to create a wide range of collaborative websites and knowledge bases.
Data from MediaWiki can be utilized within MindsDB to train AI models and chatbots using the wide range of available information.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect MediaWiki to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to MediaWiki.
## Connection
This handler was implemented using [MediaWikiAPI](https://github.com/lehinevych/MediaWikiAPI), the Python wrapper for the MediaWiki API.
There are no connection arguments required to initialize the handler.
To connect the MediaWiki API to MindsDB, the following `CREATE DATABASE` statement can be used:
```sql theme={null}
CREATE DATABASE mediawiki_datasource
WITH ENGINE = 'mediawiki';
```
## Usage
Now, you can query the MediaWiki API as follows:
```sql theme={null}
SELECT * FROM mediawiki_datasource.pages;
```
You can run more advanced queries to fetch specific pages in a defined order:
```sql theme={null}
SELECT *
FROM mediawiki_datasource.pages
WHERE title = 'Barack'
ORDER BY pageid
LIMIT 5;
```
# Microsoft One Drive
Source: https://docs.mindsdb.com/integrations/app-integrations/microsoft-onedrive
This documentation describes the integration of MindsDB with [Microsoft OneDrive](https://www.microsoft.com/en-us/microsoft-365/onedrive/online-cloud-storage), a cloud storage service that lets you back up, access, edit, share, and sync your files from any device.
## Prerequisites
1. Before proceeding, ensure that MindsDB is installed locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. Register an application in the [Azure portal](https://portal.azure.com/).
* Navigate to the [Azure Portal](https://portal.azure.com/#home) and sign in with your Microsoft account.
* Locate the **Microsoft Entra ID** service and click on it.
* Click on **App registrations** and then click on **New registration**.
* Enter a name for your application and select the `Accounts in this organizational directory only` option for the **Supported account types** field.
* Keep the **Redirect URI** field empty and click on **Register**.
* Click on **API permissions** and then click on **Add a permission**.
* Select **Microsoft Graph** and then click on **Delegated permissions**.
* Search for the `Files.Read` permission and select it.
* Click on **Add permissions**.
* Request an administrator to grant consent for the above permissions. If you are the administrator, click on **Grant admin consent for \[your organization]** and then click on **Yes**.
* Copy the **Application (client) ID** and record it as the `client_id` parameter, and copy the **Directory (tenant) ID** and record it as the `tenant_id` parameter.
* Click on **Certificates & secrets** and then click on **New client secret**.
* Enter a description for your client secret and select an expiration period.
* Click on **Add** and copy the generated client secret and record it as the `client_secret` parameter.
* Click on **Authentication** and then click on **Add a platform**.
* Select **Web** and enter the URL where MindsDB has been deployed, followed by `/verify-auth`, in the **Redirect URIs** field. For example, if you are running MindsDB locally (on `https://localhost:47334`), enter `https://localhost:47334/verify-auth` in the **Redirect URIs** field.
## Connection
Establish a connection to Microsoft OneDrive from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE one_drive_datasource
WITH
engine = 'one_drive',
parameters = {
"client_id": "12345678-90ab-cdef-1234-567890abcdef",
"client_secret": "abcd1234efgh5678ijkl9012mnop3456qrst7890uvwx",
"tenant_id": "abcdef12-3456-7890-abcd-ef1234567890",
};
```
Note that sample parameter values are provided here for reference, and you should replace them with your connection parameters.
Required connection parameters include the following:
* `client_id`: The client ID of the registered application.
* `client_secret`: The client secret of the registered application.
* `tenant_id`: The tenant ID of the registered application.
## Usage
Retrieve data from a specified file in Microsoft OneDrive by providing the integration name and the file name:
```sql theme={null}
SELECT *
FROM one_drive_datasource.`my-file.csv`
LIMIT 10;
```
Wrap the object key in backticks (\`) to avoid any issues parsing the SQL statements provided. This is especially important when the file name contains spaces, special characters or prefixes, such as `my-folder/my-file.csv`.
At the moment, the supported file formats are CSV, TSV, JSON, and Parquet.
The above examples utilize `one_drive_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
The special `files` table can be used to list the files available in Microsoft OneDrive:
```sql theme={null}
SELECT *
FROM one_drive_datasource.files LIMIT 10;
```
The content of files can also be retrieved by explicitly requesting the `content` column. This column is empty by default to avoid unnecessary data transfer:
```sql theme={null}
SELECT path, content
FROM one_drive_datasource.files LIMIT 10;
```
This table returns all objects regardless of the file format; however, only the supported file formats mentioned above can be queried.
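For example, you can narrow the listing down to one of the queryable formats; this is a sketch, assuming the `LIKE` filter is applied to the `path` column:
```sql theme={null}
SELECT path
FROM one_drive_datasource.files
WHERE path LIKE '%.csv'
LIMIT 10;
```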
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with Microsoft OneDrive.
* **Checklist**:
1. Ensure the `client_id`, `client_secret` and `tenant_id` parameters are correctly provided.
2. Ensure the registered application has the required permissions.
3. Ensure the generated client secret is not expired.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing object names containing spaces, special characters or prefixes.
* **Checklist**:
1. Ensure object names with spaces, special characters or prefixes are enclosed in backticks.
2. Examples:
* Incorrect: `SELECT * FROM integration.travel/travel_data.csv`
* Incorrect: `SELECT * FROM integration.'travel/travel_data.csv'`
* Correct: ``SELECT * FROM integration.`travel/travel_data.csv` ``
# Microsoft Teams
Source: https://docs.mindsdb.com/integrations/app-integrations/microsoft-teams
This documentation describes the integration of MindsDB with [Microsoft Teams](https://www.microsoft.com/en-us/microsoft-teams/group-chat-software), the ultimate messaging app for your organization.
The integration allows MindsDB to access data from Microsoft Teams and enhance it with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Microsoft Teams to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to Microsoft Teams from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/ms_teams_handler) as an engine.
```sql theme={null}
CREATE DATABASE teams_datasource
WITH ENGINE = 'teams',
PARAMETERS = {
"client_id": "12345678-90ab-cdef-1234-567890abcdef",
"client_secret": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6",
"tenant_id": "abcdef12-3456-7890-abcd-ef1234567890"
};
```
Required connection parameters include the following:
* `client_id`: The client ID of the registered Microsoft Entra ID application.
* `client_secret`: The client secret of the registered Microsoft Entra ID application.
* `tenant_id`: The tenant ID of the Microsoft Entra ID directory.
Optional connection parameters include the following:
* `permission_mode`: The type of permissions used to access data in Microsoft Teams. Can be either `delegated` (default) or `application`.
The `delegated` permission mode requires user sign-in and allows the app to access data on behalf of the signed-in user. The `application` permission mode does not require user sign-in and allows the app to access data without a user context. You can learn more about permission types in the [Microsoft Graph permissions documentation](https://learn.microsoft.com/en-us/graph/auth/auth-concepts#delegated-and-application-permissions).
Note that application permissions generally require higher privileges and admin consent compared to delegated permissions, as they allow broader access to organizational data without user context.
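For instance, here is a sketch of the connection statement opting into the `application` permission mode (the credential values are placeholders):
```sql theme={null}
CREATE DATABASE teams_datasource
WITH ENGINE = 'teams',
PARAMETERS = {
"client_id": "12345678-90ab-cdef-1234-567890abcdef",
"client_secret": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6",
"tenant_id": "abcdef12-3456-7890-abcd-ef1234567890",
"permission_mode": "application"
};
```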
Microsoft Entra ID was previously known as Azure Active Directory (Azure AD).
### How to set up the Microsoft Entra ID app registration
Follow the instructions below to set up the Microsoft Teams app that will be used to connect with MindsDB.
* Navigate to Microsoft Entra ID in the Azure portal, click on *Add* and then on *App registration*.
* Click on *New registration* and fill out the *Name* and select the `Accounts in any organizational directory (Any Azure AD directory - Multitenant)` option under *Supported account types*.
* If you chose the `application` permission mode, you may skip this step. If you are using `delegated` permissions, select `Web` as the platform and enter the URL where MindsDB has been deployed, followed by `/verify-auth`, under *Redirect URI*. For example, if you are running MindsDB locally (on [https://localhost:47334](https://localhost:47334)), enter [https://localhost:47334/verify-auth](https://localhost:47334/verify-auth) in the Redirect URIs field.
* Click on *Register*. **Save the *Application (client) ID* and *Directory (tenant) ID* for later use.**
* Click on *API Permissions* and then click on *Add a permission*.
* Select *Microsoft Graph* and then click on either *Delegated permissions* or *Application permissions* based on the permission mode you have chosen.
* Search for the following permissions and select them:
* `delegated` permission mode:
* Team.ReadBasic.All
* Channel.ReadBasic.All
* ChannelMessage.Read.All
* Chat.Read
* `application` permission mode:
* Group.Read.All
* ChannelMessage.Read.All
* Chat.Read.All
* Click on **Add permissions**.
* Request an administrator to grant consent for the above permissions. If you are the administrator, click on **Grant admin consent for \[your organization]** and then click on **Yes**.
* Click on *Certificates & secrets* under *Manage*.
* Click on *New client secret* and fill out the *Description* and select an appropriate *Expires* period, and click on *Add*.
* Copy and **save the client secret in a secure location.**
If you already have an existing app registration, you can use it instead of creating a new one and skip the above steps.
* Open the MindsDB editor and create a connection to Microsoft Teams using the client ID, client secret and tenant ID obtained in the previous steps. Use the `CREATE DATABASE` statement as shown above.
## Usage
Retrieve data from a specified table by providing the integration and table names:
```sql theme={null}
SELECT *
FROM teams_datasource.table_name
LIMIT 10;
```
The above example utilizes `teams_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Supported Tables
* `teams`: The table containing information about the teams in Microsoft Teams.
* `channels`: The table containing information about the channels in Microsoft Teams.
* `channel_messages`: The table containing information about messages from channels in Microsoft Teams.
* `chats`: The table containing information about the chats in Microsoft Teams.
* `chat_messages`: The table containing information about messages from chats in Microsoft Teams.
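As a sketch of how these tables relate, the query below assumes that the `channels` table exposes a `team_id` column matching the `id` column of the `teams` table; these column names are assumptions, so inspect the actual columns with a plain `SELECT *` first:
```sql theme={null}
SELECT *
FROM teams_datasource.channels
WHERE team_id = 'your-team-id' -- assumed column name; verify against the actual table
LIMIT 10;
```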
# News API
Source: https://docs.mindsdb.com/integrations/app-integrations/newsapi
In this section, we present how to connect News API to MindsDB.
[News API](https://newsapi.org/) is a simple HTTP REST API for searching and retrieving live articles from all over the web.
Data from News API can be utilized within MindsDB for model training and predictions.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect News API to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to News API.
## Connection
This handler is implemented using the [newsapi-python](https://newsapi.org/docs/client-libraries/python) library.
The required arguments to establish a connection are as follows:
* `api_key` News API key to use for authentication.
Check out [this guide](https://newsapi.org/docs/authentication) on how to create the API key.
It is recommended to use the API key to avoid the `API rate limit exceeded` error.
Here is how to connect News API to MindsDB:
```sql theme={null}
CREATE DATABASE newsAPI
WITH ENGINE = 'newsapi'
PARAMETERS = {
"api_key": "Your api key"
};
```
## Usage
Simple Search for recent articles:
```sql theme={null}
SELECT *
FROM newsAPI.article
WHERE query = 'Python';
```
Advanced search for recent articles per specific sources between dates:
```sql theme={null}
SELECT *
FROM newsAPI.article
WHERE query = 'Python'
AND sources="bbc-news"
AND publishedAt >= "2021-03-23" AND publishedAt <= "2023-04-23"
LIMIT 4;
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/newsapi_handler/README.md).
# PayPal
Source: https://docs.mindsdb.com/integrations/app-integrations/paypal
In this section, we present how to connect PayPal to MindsDB.
[PayPal](https://www.bankrate.com/finance/credit-cards/guide-to-using-paypal/) is an online payment system that makes paying for things online and sending and receiving money safe and secure.
Data from PayPal can be utilized within MindsDB to train models and make predictions about your transactions.
## Connection
This handler is implemented using [PayPal-Python-SDK](https://github.com/paypal/PayPal-Python-SDK), the Python SDK for PayPal RESTful APIs.
The required arguments to establish a connection are as follows:
* `mode`: The mode of the PayPal API. Can be `sandbox` or `live`.
* `client_id`: The client ID of the PayPal API.
* `client_secret`: The client secret of the PayPal API.
To connect to PayPal using MindsDB, the following CREATE DATABASE statement can be used:
```sql theme={null}
CREATE DATABASE paypal_datasource
WITH ENGINE = 'paypal',
PARAMETERS = {
"mode": "your-paypal-mode",
"client_id": "your-paypal-client-id",
"client_secret": "your-paypal-client-secret"
};
```
Check out [this guide](https://developer.paypal.com/api/rest/) on how to create client credentials for PayPal.
## Usage
Now, you can query PayPal as follows:
Payments:
```sql theme={null}
SELECT * FROM paypal_datasource.payments
```
Invoices:
```sql theme={null}
SELECT * FROM paypal_datasource.invoices
```
Subscriptions:
```sql theme={null}
SELECT * FROM paypal_datasource.subscriptions
```
Orders:
```sql theme={null}
SELECT * FROM paypal_datasource.orders
```
Payouts:
```sql theme={null}
SELECT * FROM paypal_datasource.payouts
```
You can also run more advanced queries on your data:
Payments:
```sql theme={null}
SELECT intent, cart
FROM paypal_datasource.payments
WHERE state = 'approved'
ORDER BY id
LIMIT 5
```
Invoices:
```sql theme={null}
SELECT invoice_number, total_amount
FROM paypal_datasource.invoices
WHERE status = 'PAID'
ORDER BY total_amount DESC
LIMIT 5
```
Subscriptions:
```sql theme={null}
SELECT id, state, name
FROM paypal_datasource.subscriptions
WHERE state = 'CREATED'
LIMIT 5
```
Orders:
```sql theme={null}
SELECT id, state, amount
FROM paypal_datasource.orders
WHERE state = 'APPROVED'
ORDER BY amount DESC
LIMIT 5
```
Payouts:
```sql theme={null}
SELECT payout_batch_id, amount_currency, amount_value
FROM paypal_datasource.payouts
ORDER BY amount_value DESC
LIMIT 5
```
## Supported Tables
The following tables are supported by the PayPal handler:
* `payments`: payments made.
* `invoices`: invoices created.
* `subscriptions`: subscriptions created.
* `orders`: orders created.
* `payouts`: payouts made.
# Plaid
Source: https://docs.mindsdb.com/integrations/app-integrations/plaid
In this section, we present how to connect Plaid to MindsDB.
[Plaid](https://plaid.com/) is a financial technology company that offers a platform and a set of APIs that facilitate the integration of financial services and data into applications and websites. Its services primarily focus on enabling developers to connect with and access financial accounts and data from various financial institutions.
Data from Plaid can be utilized within MindsDB to train AI models and make financial forecasts.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Plaid to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Plaid.
## Connection
The required arguments to establish a connection are as follows:
* `client_id`
* `secret`
* `access_token`
* `plaid_env`
You can get the `client_id`, `secret`, and `access_token` values [here](https://dashboard.plaid.com/team/keys) once you sign in to your Plaid account.
And [here](https://plaid.com/docs/api/items/#itempublic_tokenexchange) is how you generate the `access_token` value.
In order to make use of this handler and connect the Plaid app to MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE my_plaid
WITH
ENGINE = 'plaid',
PARAMETERS = {
"client_id": "YOUR_CLIENT_ID",
"secret": "YOUR_SECRET",
"access_token": "YOUR_PUBLIC_KEY",
"plaid_env": "ENV"
};
```
It creates a database that comes with two tables: `transactions` and `balance`.
## Usage
Now you can query your data, like this:
```sql theme={null}
SELECT id, merchant_name, authorized_date, amount, payment_channel
FROM my_plaid.transactions
WHERE start_date = '2022-01-01'
AND end_date = '2023-04-11'
LIMIT 20;
```
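The `balance` table can be queried in the same way (a minimal sketch; the available columns depend on the data the Plaid API returns for your accounts):
```sql theme={null}
SELECT *
FROM my_plaid.balance
LIMIT 20;
```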
And if you want to use functions provided by the Plaid API, you can use the native queries syntax, like this:
```sql theme={null}
SELECT * FROM my_plaid (
get_transactions(
start_date = '2022-01-01',
end_date = '2022-02-01'
)
);
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/plaid_handler/README.md).
# PyPI
Source: https://docs.mindsdb.com/integrations/app-integrations/pypi
In this section, we present how to connect PyPI to MindsDB.
[PyPI](https://pypi.org) is a host for maintaining and storing Python packages. It's a good place for publishing your Python packages in different versions and releases.
Data from PyPI can be utilized within MindsDB to train models and make predictions about your Python packages.
## Connection
This handler is implemented using the standard Python `requests` library. It is used to connect to the RESTful service that [pypistats.org](https://pypistats.org) is serving.
There are no connection arguments required to initialize the handler.
To connect to PyPI using MindsDB, the following CREATE DATABASE statement can be used:
```sql theme={null}
CREATE DATABASE pypi_datasource
WITH ENGINE = 'pypi'
```
## Usage
Now, you can use the following queries to view the statistics for Python packages (MindsDB, for example):
Overall downloads, including mirrors:
```sql theme={null}
SELECT *
FROM pypi_datasource.overall WHERE package="mindsdb" AND mirrors=true;
```
Overall downloads on CPython==2.7:
```sql theme={null}
SELECT *
FROM pypi_datasource.python_minor WHERE package="mindsdb" AND version="2.7";
```
Recent downloads:
```sql theme={null}
SELECT *
FROM pypi_datasource.recent WHERE package="mindsdb";
```
Recent downloads in the last day:
```sql theme={null}
SELECT *
FROM pypi_datasource.recent WHERE package="mindsdb" AND period="day";
```
All downloads on Linux-based distributions:
```sql theme={null}
SELECT date, downloads
FROM pypi_datasource.system WHERE package="mindsdb" AND os="Linux";
```
Each table takes a required `package` argument in the WHERE clause, which is the name of the package you want to query.
## Supported Tables
The following tables are supported by the PyPI handler:
* `overall`: daily download quantities for packages.
* `recent`: recent download quantities for packages.
* `python_major`: daily download quantities for packages, grouped by Python major version.
* `python_minor`: daily download quantities for packages, grouped by Python minor version.
* `system`: daily download quantities for packages, grouped by operating system.
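For example, following the same pattern as the queries above, daily downloads grouped by Python major version can be fetched like this (a minimal sketch; as with all of these tables, the `package` argument is required):
```sql theme={null}
SELECT date, downloads
FROM pypi_datasource.python_major WHERE package="mindsdb";
```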
# Reddit
Source: https://docs.mindsdb.com/integrations/app-integrations/reddit
In this section, we present how to connect Reddit to MindsDB.
[Reddit](https://www.reddit.com/) is a social media platform and online community where registered users can engage in discussions, share content, and participate in various communities called subreddits.
Data from Reddit can be utilized within MindsDB to train AI models and chatbots.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Reddit to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Reddit.
## Connection
This handler is implemented using the [PRAW (Python Reddit API Wrapper)](https://praw.readthedocs.io/en/latest/) library, which is a Python package that provides a simple and easy-to-use interface to access the Reddit API.
The required arguments to establish a connection are as follows:
* `client_id` is a Reddit API client ID.
* `client_secret` is a Reddit API client secret.
* `user_agent` is a user agent string to identify your application.
Here is how to get your Reddit credentials:
1. Go to Reddit App Preferences at [https://www.reddit.com/prefs/apps](https://www.reddit.com/prefs/apps) or [https://old.reddit.com/prefs/apps/](https://old.reddit.com/prefs/apps/)
2. Scroll down to the bottom of the page and click *Create another app...*
3. Fill out the form with the name, description, and redirect URL for your app, then click *Create app*
4. Now you should be able to see the personal user script, secret, and name of your app. Store those as environment variables: `CLIENT_ID`, `CLIENT_SECRET`, and `USER_AGENT`, respectively.
In order to make use of this handler and connect the Reddit app to MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE my_reddit
WITH
ENGINE = 'reddit',
PARAMETERS = {
"client_id": "YOUR_CLIENT_ID",
"client_secret": "YOUR_CLIENT_SECRET",
"user_agent": "YOUR_USER_AGENT"
};
```
It creates a database that comes with two tables: `submission` and `comment`.
## Usage
Now you can fetch data from Reddit, like this:
```sql theme={null}
SELECT *
FROM my_reddit.submission
WHERE subreddit = 'MachineLearning'
AND sort_type = 'top' -- specifies the sorting type for the subreddit (possible values include 'hot', 'new', 'top', 'controversial', 'gilded', 'wiki', 'mod', 'rising')
AND items = 5; -- specifies the number of items to fetch from the subreddit
```
You can also fetch comments for a particular post/submission, like this:
```sql theme={null}
SELECT *
FROM my_reddit.comment
WHERE submission_id = '12gls93'
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/reddit_handler/README.md).
# Salesforce
Source: https://docs.mindsdb.com/integrations/app-integrations/salesforce
This documentation describes the integration of MindsDB with [Salesforce](https://www.salesforce.com/), the world’s most trusted customer relationship management (CRM) platform.
The integration allows MindsDB to access data from Salesforce and enhance it with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect Salesforce to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to Salesforce from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/salesforce_handler) as an engine.
```sql theme={null}
CREATE DATABASE salesforce_datasource
WITH
ENGINE = 'salesforce',
PARAMETERS = {
"username": "demo@example.com",
"password": "demo_password",
"client_id": "3MVG9lKcPoNINVBIPJjdw1J9LLM82HnZz9Yh7ZJnY",
"client_secret": "5A52C1A1E21DF9012IODC9ISNXXAADDA9"
};
```
Required connection parameters include the following:
* `username`: The username for the Salesforce account.
* `password`: The password for the Salesforce account.
* `client_id`: The client ID (consumer key) from a connected app in Salesforce.
* `client_secret`: The client secret (consumer secret) from a connected app in Salesforce.
Optional connection parameters include the following:
* `is_sandbox`: The setting to indicate whether to connect to a Salesforce sandbox environment (`true`) or production environment (`false`). This parameter defaults to `false`.
To create a connected app in Salesforce and obtain the client ID and client secret, follow the steps given below:
1. Log in to your Salesforce account.
2. Go to `Settings` > `Open Advanced Setup` > `Apps` > `App Manager`.
3. Click `New Connected App`, select `Create a Connected App` and click `Continue`.
4. Fill in the required details, i.e., `Connected App Name`, `API Name` and `Contact Phone`.
5. Select the `Enable OAuth Settings` checkbox, set the `Callback URL` to wherever MindsDB is deployed followed by `/verify-auth` (e.g., `http://localhost:47334/verify-auth`), and choose the following OAuth scopes:
* Manage user data via APIs (api)
* Perform requests at any time (refresh\_token, offline\_access)
6. Click `Save` and then `Continue`.
7. Click on `Manage Consumer Details` under `API (Enable OAuth Settings)`, and copy the Consumer Key (client ID) and Consumer Secret (client secret).
8. Click on `Back to Manage Connected Apps` and then `Manage`.
9. Click `Edit Policies`.
10. Under `OAuth Policies`, configure the `Permitted Users` and `IP Relaxation` settings according to your security policies. For example, to enable all users to access the app without enforcing any IP restrictions, select `All users may self-authorize` and `Relax IP restrictions` respectively. Leave the `Refresh Token Policy` set to `Refresh token is valid until revoked`.
11. Click `Save`.
12. Go to `Identity` > `OAuth and OpenID Connect Settings`.
13. Ensure that the `Allow OAuth Username-Password Flows` checkbox is checked.
## Usage
Retrieve data from a specified table by providing the integration and table names:
```sql theme={null}
SELECT *
FROM salesforce_datasource.table_name
LIMIT 10;
```
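For example, assuming the standard `Account` object is exposed as a table in your org (availability depends on your Salesforce setup and the table filtering described below):
```sql theme={null}
SELECT *
FROM salesforce_datasource.account
LIMIT 10;
```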
Run [SOQL](https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql.htm) queries directly on the connected Salesforce account:
```sql theme={null}
SELECT * FROM salesforce_datasource (
--Native Query Goes Here
SELECT Name, Account.Name, Account.Industry
FROM Contact
WHERE Account.Industry = 'Technology'
LIMIT 5
);
```
The above examples utilize `salesforce_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Salesforce Table Filtering
We have implemented filtering logic to exclude tables that are generally not useful for direct business queries. The excluded tables fall into the following categories:
* System and Auditing Tables: We exclude tables that track field history, record sharing rules, and data change events (e.g., objects ending in History, Share, or ChangeEvent). These are important for system administration but not for typical business analysis.
* Configuration and Metadata: We remove tables that define the structure and configuration of Salesforce itself. This includes objects related to user permissions, internal rules, platform settings, and data definitions (e.g., FieldDefinition, PermissionSet, AssignmentRule).
* Feature-Specific Technical Objects: Tables that support specific backend Salesforce features are excluded. This includes objects related to:
* AI and Einstein: (AI...)
* Developer Components: (Apex..., Aura...)
* Data Privacy and Consent: (objects ending in Consent or containing Policy)
* Chatter and Collaboration Feeds: (...Feed, Collaboration...)
* Archived or Legacy Objects: Older objects that have been replaced by modern equivalents, such as ContentWorkspace, are also excluded to simplify the list.
# Sendinblue
Source: https://docs.mindsdb.com/integrations/app-integrations/sendinblue
In this section, we present how to connect Sendinblue to MindsDB.
[Brevo (formerly Sendinblue)](https://www.brevo.com/) is an all-in-one platform to automate your marketing campaigns over Email, SMS, WhatsApp or chat.
Data from Sendinblue can be used to understand the impact of email marketing.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Sendinblue to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Sendinblue.
## Connection
This handler is implemented using the [sib-api-v3-sdk](https://github.com/sendinblue/APIv3-python-library) library, a Python library that wraps Sendinblue APIs.
The required arguments to establish a connection are as follows:
* `api_key`: a required Sendinblue API key to use for authentication
Check out [this guide](https://developers.brevo.com/docs) on how to create the Sendinblue API key.
It is recommended to use the API key to avoid the `API rate limit exceeded` error.
Here is how to connect Sendinblue to MindsDB:
```sql theme={null}
CREATE DATABASE sib_datasource
WITH ENGINE = 'sendinblue',
PARAMETERS = {
"api_key": "xkeysib-..."
};
```
## Usage
Use the established connection to query your database:
```sql theme={null}
SELECT * FROM sib_datasource.email_campaigns
```
Run more advanced queries:
```sql theme={null}
SELECT id, name
FROM sib_datasource.email_campaigns
WHERE status = 'sent'
ORDER BY name
LIMIT 5
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/sendinblue_handler/README.md).
# Shopify
Source: https://docs.mindsdb.com/integrations/app-integrations/shopify
In this section, we present how to connect Shopify to MindsDB.
[Shopify](https://www.shopify.com/) is an e-commerce platform that enables businesses to create and manage online stores. It is one of the leading e-commerce solutions, providing a wide range of tools and services to help entrepreneurs and businesses sell products and services online.
Data from Shopify can be utilized within MindsDB to train AI models and chatbots using Products, Customers and Orders data, and make predictions relevant for businesses.
## Connection
The required arguments to establish a connection are as follows:
* `shop_url`: a required URL to your Shopify store.
* `access_token`: a required access token to use for authentication.
Here is how you can [create a Shopify access token](https://www.youtube.com/watch?v=4f_aiC5oTNc\&t=302s).
Optionally, if you want to access customer reviews, provide the following parameters:
* `yotpo_app_key`: a token needed to access customer reviews via the Yotpo Product Reviews app.
* `yotpo_access_token`: a token needed to access customer reviews via the Yotpo Product Reviews app.
If you want to query customer reviews, use the [Yotpo Product Reviews](https://apps.shopify.com/yotpo-social-reviews) app available in Shopify. Here are the steps to follow:
1. Install the [Yotpo Product Reviews](https://apps.shopify.com/yotpo-social-reviews) app for your Shopify store.
2. Generate `yotpo_app_key` following [this instruction](https://support.yotpo.com/docs/finding-your-yotpo-app-key-and-secret-key) for retrieving your app key. Learn more about [Yotpo authentication here](https://apidocs.yotpo.com/reference/yotpo-authentication).
3. Generate `yotpo_access_token` following [this instruction](https://develop.yotpo.com/reference/generate-a-token).
To connect your Shopify account to MindsDB, you must first create a new handler instance. You can do so with the following query:
```sql theme={null}
CREATE DATABASE shopify_datasource
WITH ENGINE = 'shopify',
PARAMETERS = {
"shop_url": "your-shop-name.myshopify.com",
"access_token": "shppa_..."
};
```
## Usage
Once you have created the database, you can query the following tables:
* Products table
* Customers table
* Orders table
* CustomerReviews table (requires the [Yotpo Product Reviews](https://apps.shopify.com/yotpo-social-reviews) app to be installed in your Shopify account)
* InventoryLevel table
* Location table
* CarrierService table
* ShippingZone table
* SalesChannel table
### Products table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.products;
```
Also, you can run more advanced queries and filter products by status, like this:
```sql theme={null}
SELECT id, title
FROM shopify_datasource.products
WHERE status = 'active'
ORDER BY id
LIMIT 5;
```
To insert new data, run the `INSERT INTO` statement, providing the following values: `title`, `body_html`, `vendor`, `product_type`, `tags`, `status`.
To update existing data, run the `UPDATE` statement.
To delete data, run the `DELETE` statement.
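Here is a minimal sketch of these write operations on the `products` table, using the columns listed above (all values and the `id` filter are illustrative):
```sql theme={null}
INSERT INTO shopify_datasource.products (title, body_html, vendor, product_type, tags, status)
VALUES ('T-Shirt', '<p>Soft cotton tee</p>', 'Acme', 'Apparel', 'cotton,summer', 'draft');

UPDATE shopify_datasource.products
SET status = 'active'
WHERE id = 1234567890;

DELETE FROM shopify_datasource.products
WHERE id = 1234567890;
```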
### Customers table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.customers;
```
To insert new data, run this statement:
```sql theme={null}
INSERT INTO shopify_datasource.customers(first_name, last_name, email, phone)
VALUES ('John', 'Doe', 'john.doe@example.com', '+10001112222');
```
To update existing data, run the `UPDATE` statement.
To delete data, run the `DELETE` statement.
### Orders table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.orders;
```
To insert new data, run the `INSERT INTO` statement.
To update existing data, run the `UPDATE` statement.
To delete data, run the `DELETE` statement.
### CustomerReviews table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.customer_reviews;
```
### InventoryLevel table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.inventory_level;
```
### Location table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.locations;
```
### CarrierService table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.carrier_service;
```
To insert new data, run the `INSERT INTO` statement, providing the following values: `name`, `callback_url`, `service_discovery`.
To update existing data, run the `UPDATE` statement.
To delete data, run the `DELETE` statement.
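A minimal sketch of an insert into this table, using the values listed above (the name and URL are placeholders):
```sql theme={null}
INSERT INTO shopify_datasource.carrier_service (name, callback_url, service_discovery)
VALUES ('My Shipping Rates', 'https://example.com/rates', true);
```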
### ShippingZone table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.shipping_zone;
```
### SalesChannel table
You can query this table as below:
```sql theme={null}
SELECT *
FROM shopify_datasource.sales_channel;
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/shopify_handler/README.md).
# Slack
Source: https://docs.mindsdb.com/integrations/app-integrations/slack
This documentation describes the integration of MindsDB with [Slack](https://slack.com/), a cloud-based collaboration platform.
The integration allows MindsDB to access data from Slack and enhance Slack with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Slack to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Slack.
## Connection
Establish a connection to Slack from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/slack_handler) as an engine.
```sql theme={null}
CREATE DATABASE slack_datasource
WITH ENGINE = 'slack',
PARAMETERS = {
"token": "values", -- required parameter
"app_token": "values" -- optional parameter
};
```
The Slack handler is initialized with the following parameters:
* `token` is a Slack bot token to use for authentication.
* `app_token` is a Slack app token to use for authentication.
Please note that `app_token` is an optional parameter. If you do not provide it, you need to integrate the app into a Slack channel instead.
### Method 1: Chatbot responds in direct messages to a Slack app
One way to connect Slack is to use both bot and app tokens. By following the instructions below, you'll set up the Slack app and be able to message this Slack app directly to chat with the bot.
If you want to use Slack in the [`CREATE CHATBOT`](/agents/chatbot) syntax, use this method of connecting Slack to MindsDB.
Here is how to set up a Slack app and generate both a Slack bot token and a Slack app token:
1. Follow [this link](https://api.slack.com/apps) and sign in with your Slack account.
2. Create a new app `From scratch` or select an existing app.
* Please note that the following instructions support apps created `From scratch`.
* For apps created `From an app manifest`, please follow the [Slack docs here](https://api.slack.com/reference/manifests).
3. Go to *Basic Information* under *Settings*.
* Under *App-Level Tokens*, click on *Generate Token and Scopes*.
* Name the token `socket` and add the `connections:write` scope.
* **Copy and save the `xapp-...` token - you'll need it to publish the chatbot.**
4. Go to *Socket Mode* under *Settings* and toggle the button to *Enable Socket Mode*.
5. Go to *OAuth & Permissions* under *Features*.
* Add the following *Bot Token Scopes*:
* app\_mentions:read
* channels:history
* channels:read
* chat:write
* groups:history
* groups:read (optional)
* im:history
* im:read
* im:write
* mpim:read (optional)
* users.profile:read
* users:read (optional)
* In the *OAuth Tokens for Your Workspace* section, click on *Install to Workspace* and then *Allow*.
* **Copy and save the `xoxb-...` token - you'll need it to publish the chatbot.**
6. Go to *App Home* under *Features* and click on the checkbox to *Allow users to send Slash commands and messages from the messages tab*.
7. Go to *Event Subscriptions* under *Features*.
* Toggle the button to *Enable Events*.
* Under *Subscribe to bot events*, click on *Add Bot User Event* and add `app_mention` and `message.im`.
* Click on *Save Changes*.
8. Now you can use tokens from points 3 and 5 to initialize the Slack handler in MindsDB.
This connection method enables you to chat directly with an app via Slack.
Alternatively, you can connect an app to the Slack channel:
* Go to the channel where you want to use the bot.
* Right-click on the channel and select *View Channel Details*.
* Select *Integrations*.
* Click on *Add an App*.
Here is how to connect Slack to MindsDB:
```sql theme={null}
CREATE DATABASE slack_datasource
WITH
ENGINE = 'slack',
PARAMETERS = {
"token": "xoxb-...",
"app_token": "xapp-..."
};
```
It comes with the `conversations` and `messages` tables.
### Method 2: Chatbot responds on a defined Slack channel
Another way to connect to Slack is to use the bot token only. By following the instructions below, you'll set up the Slack app and integrate it into one of the channels from which you can directly chat with the bot.
Here is how to set up a Slack app and generate a Slack bot token:
1. Follow [this link](https://api.slack.com/apps) and sign in with your Slack account.
2. Create a new app `From scratch` or select an existing app.
* Please note that the following instructions support apps created `From scratch`.
* For apps created `From an app manifest`, please follow the [Slack docs here](https://api.slack.com/reference/manifests).
3. Go to the *OAuth & Permissions* section.
4. Under the *Scopes* section, add the *Bot Token Scopes* necessary for your application. You can add more later as well.
* channels:history
* channels:read
* chat:write
* groups:read
* im:read
* mpim:read
* users:read
5. Install the bot in your workspace.
6. Under the *OAuth Tokens for Your Workspace* section, copy the *Bot User OAuth Token* value.
7. Open your Slack application and add the App/Bot to one of the channels:
* Go to the channel where you want to use the bot.
* Right-click on the channel and select *View Channel Details*.
* Select *Integrations*.
* Click on *Add an App*.
8. Now you can use the token from step 6 to initialize the Slack handler in MindsDB and use the channel name to query and write messages.
Here is how to connect Slack to MindsDB:
```sql theme={null}
CREATE DATABASE slack_datasource
WITH
ENGINE = 'slack',
PARAMETERS = {
"token": "xoxb-..."
};
```
## Usage
The following usage applies when **Connection Method 2** was used to connect Slack.
See the usage for **Connection Method 1** [via the `CREATE CHATBOT` syntax](/sql/tutorials/create-chatbot).
Retrieve data from a specified table by providing the integration and table names:
```sql theme={null}
SELECT *
FROM slack_datasource.table_name
LIMIT 10;
```
## Supported Tables
The Slack integration supports the following tables:
### `conversations` Table
The `conversations` virtual table is used to query conversations (channels, DMs, and groups) in the connected Slack workspace.
```sql theme={null}
-- Retrieve all conversations in the workspace
SELECT *
FROM slack_datasource.conversations;
-- Retrieve a specific conversation using its ID
SELECT *
FROM slack_datasource.conversations
WHERE id = "";
-- Retrieve a specific conversation using its name
SELECT *
FROM slack_datasource.conversations
WHERE name = "";
```
### `messages` Table
The `messages` virtual table is used to query, post, update, and delete messages in specific conversations within the connected Slack workspace.
```sql theme={null}
-- Retrieve all messages from a specific conversation
-- channel_id is a required parameter and can be found in the conversations table
SELECT *
FROM slack_datasource.messages
WHERE channel_id = "";
-- Post a new message
-- channel_id and text are required parameters
INSERT INTO slack_datasource.messages (channel_id, text)
VALUES("", "Hello from SQL!");
-- Update a bot-posted message
-- channel_id, ts, and text are required parameters
UPDATE slack_datasource.messages
SET text = "Updated message content"
WHERE channel_id = "" AND ts = "";
-- Delete a bot-posted message
-- channel_id and ts are required parameters
DELETE FROM slack_datasource.messages
WHERE channel_id = "" AND ts = "";
```
You can also find the channel ID by right-clicking on the conversation in Slack, selecting 'View conversation details' or 'View channel details,' and copying the channel ID from the bottom of the 'About' tab.
### `threads` Table
The `threads` virtual table is used to query and post messages in threads within the connected Slack workspace.
```sql theme={null}
-- Retrieve all messages in a specific thread
-- channel_id and thread_ts are required parameters
-- thread_ts is the timestamp of the parent message and can be found in the messages table
SELECT *
FROM slack_datasource.threads
WHERE channel_id = "" AND thread_ts = "";
-- Post a message to a thread
INSERT INTO slack_datasource.threads (channel_id, thread_ts, text)
VALUES("", "", "Replying to the thread!");
```
### `users` Table
The `users` virtual table is used to query user information in the connected Slack workspace.
```sql theme={null}
-- Retrieve all users in the workspace
SELECT *
FROM slack_datasource.users;
-- Retrieve a specific user by name
SELECT *
FROM slack_datasource.users
WHERE name = "John Doe";
```
## Rate Limit Considerations
The Slack API enforces rate limits on data retrieval. Therefore, when querying the above tables, by default, the first 1000 (999 for `messages`) records are returned.
To retrieve more records, use the `LIMIT` clause in your SQL queries. For example:
```sql theme={null}
SELECT *
FROM slack_datasource.conversations
LIMIT 2000;
```
When using the `LIMIT` clause to query additional records, you may encounter Slack API rate limits.
## Next Steps
Follow [this tutorial](/use-cases/ai_agents/build_ai_agents) to build an AI agent with MindsDB.
# Strapi
Source: https://docs.mindsdb.com/integrations/app-integrations/strapi
[Strapi](https://strapi.io/) is a popular open-source Headless Content Management System (CMS) that empowers developers to work with their preferred tools and frameworks, while providing content editors with a user-friendly interface to manage and distribute content across various platforms.
The Strapi Handler is a MindsDB handler that enables SQL-based querying of Strapi collections. This documentation provides a brief overview of its features, initialization parameters, and example usage.
## Connection
To use the Strapi Handler, initialize it with the following parameters:
* `host`: Strapi server host.
* `port`: Strapi server port (typically 1337).
* `api_token`: Strapi server API token for authentication.
* `plural_api_ids`: List of plural API IDs for the collections.
To get started, create a Strapi engine database with the following SQL command:
```sql theme={null}
CREATE DATABASE myshop --- Display name for the database.
WITH ENGINE = 'strapi', --- Name of the MindsDB handler.
PARAMETERS = {
"host" : "", --- Host (can be an IP address or URL).
"port" : "", --- Common port is 1337.
"api_token": "", --- API token of the Strapi server.
"plural_api_ids" : [""] --- Plural API IDs of the collections.
};
```
## Usage
Retrieve data from a collection:
```sql theme={null}
SELECT *
FROM myshop.<collection_name>;
```
Filter data based on specific criteria:
```sql theme={null}
SELECT *
FROM myshop.<collection_name>
WHERE id = <id>;
```
Insert new data into a collection:
```sql theme={null}
INSERT INTO myshop.<collection_name> (<field_1>, <field_2>, ...)
VALUES (<value_1>, <value_2>, ...);
```
Note: You can only insert data into collections that have the `create`
permission enabled.
Modify existing data in a collection:
```sql theme={null}
UPDATE myshop.<collection_name>
SET <field_1> = <value_1>, <field_2> = <value_2>, ...
WHERE id = <id>;
```
Note: You can only update data in collections that have the `update`
permission enabled.
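To make the placeholders concrete, here is a sketch that assumes a hypothetical `products` collection listed in `plural_api_ids`, with `name` and `price` fields:
```sql theme={null}
-- Query a single entry from the hypothetical products collection
SELECT *
FROM myshop.products
WHERE id = 1;

-- Insert a new entry (requires the create permission on the collection)
INSERT INTO myshop.products (name, price)
VALUES ('Mug', 9.99);
```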
# Stripe
Source: https://docs.mindsdb.com/integrations/app-integrations/stripe
In this section, we present how to connect Stripe to MindsDB.
[Stripe](https://stripe.com/) is a financial technology company that provides a set of software and payment processing solutions for businesses and individuals to accept payments over the internet. Stripe is one of the leading payment gateway and online payment processing platforms.
Data from Stripe can be utilized within MindsDB to train AI models and chatbots based on customers, products, and payment intents, and make relevant predictions and forecasts.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Stripe to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Stripe.
## Connection
This handler was implemented using [stripe-python](https://github.com/stripe/stripe-python), the Python library for the Stripe API.
There is only one parameter required to set up the connection with Stripe:
* `api_key`: a Stripe API key.
You can find your API keys in the Stripe Dashboard. [Read more](https://stripe.com/docs/keys).
To connect to Stripe using MindsDB, the following CREATE DATABASE statement can be used:
```sql theme={null}
CREATE DATABASE stripe_datasource
WITH ENGINE = 'stripe',
PARAMETERS = {
"api_key": "sk_..."
};
```
## Usage
Now, you can query the data in your Stripe account (customers, for example) as follows:
```sql theme={null}
SELECT * FROM stripe_datasource.customers
```
You can run more advanced queries to fetch specific customers in a defined order:
```sql theme={null}
SELECT name, email
FROM stripe_datasource.customers
WHERE currency = 'inr'
ORDER BY name
LIMIT 5
```
### Supported tables
The following tables are supported by the Stripe handler:
* `customers`
* `products`
* `payment_intents`
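The other tables follow the same query pattern as `customers`, for example:
```sql theme={null}
SELECT *
FROM stripe_datasource.products
LIMIT 5;

SELECT *
FROM stripe_datasource.payment_intents
LIMIT 5;
```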
# Symbl
Source: https://docs.mindsdb.com/integrations/app-integrations/symbl
This documentation describes the integration of MindsDB with [Symbl](https://symbl.ai/), a platform with state-of-the-art and task-specific LLMs that enables businesses to analyze multi-party conversations at scale.
This integration allows MindsDB to process conversation data and extract insights from it.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Symbl to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
Please note that in order to successfully install the dependencies for Symbl, it is necessary to install `portaudio` and a few other Linux packages in the Docker container first. To do this, run the following commands:
1. Start an interactive shell in the container:
```bash theme={null}
docker exec -it mindsdb_container sh
```
If you haven't specified a name when spinning up the MindsDB container with `docker run`, you can find it by running `docker ps`.
If you are using Docker Desktop, you can navigate to 'Containers', locate the multi-container application running the extension, click on the `mindsdb_service` container and then click on the 'Exec' tab to start an interactive shell.
2. Install the required packages:
```bash theme={null}
apt-get update && apt-get install -y \
libportaudio2 libportaudiocpp0 portaudio19-dev \
python3-dev \
build-essential \
&& rm -rf /var/lib/apt/lists/*
```
## Connection
Establish a connection to your Symbl from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE mindsdb_symbl
WITH ENGINE = 'symbl',
PARAMETERS = {
"app_id": "app_id",
"app_secret":"app_secret"
};
```
Required connection parameters include the following:
* `app_id`: The Symbl app identifier.
* `app_secret`: The Symbl app secret.
## Usage
First, process the conversation data and get the conversation ID via the `get_conversation_id` table:
```sql theme={null}
SELECT *
FROM mindsdb_symbl.get_conversation_id
WHERE audio_url="https://symbltestdata.s3.us-east-2.amazonaws.com/newPhonecall.mp3";
```
Next, use the conversation ID to get the results of the above from the other supported tables:
```sql theme={null}
SELECT *
FROM mindsdb_symbl.get_messages
WHERE conversation_id="5682305049034752";
```
Other supported tables include:
* `get_topics`
* `get_questions`
* `get_analytics`
* `get_action_items`
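For example, action items can be extracted from the same processed conversation, reusing the conversation ID obtained above:
```sql theme={null}
SELECT *
FROM mindsdb_symbl.get_action_items
WHERE conversation_id="5682305049034752";
```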
The above examples utilize `mindsdb_symbl` as the datasource name, which is defined in the `CREATE DATABASE` command.
# Twitter
Source: https://docs.mindsdb.com/integrations/app-integrations/twitter
In this section, we present how to connect Twitter accounts to MindsDB.
[Twitter](https://twitter.com/) is a widely recognized social media platform and microblogging service that allows users to share short messages called tweets.
The Twitter handler enables you to fetch tweets and create replies utilizing AI models within MindsDB. Furthermore, you can automate the process of fetching tweets, preparing replies, and sending replies to Twitter.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Twitter to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Twitter.
## Connection
To connect a Twitter account to MindsDB, you need a Twitter developer account.
Please note that it requires a paid developer account.
We recommend applying for [Elevated access](https://developer.twitter.com/en/support/twitter-api/developer-account), which allows you to pull 2 million tweets per month and helps you avoid the *parameters or authentication issue* error you might otherwise encounter. You can check [this step-by-step guide](https://medium.com/@skillcate/set-up-twitter-api-to-pull-2m-tweets-month-44d004c6f7ce) describing how to apply for Elevated access.
If you don't already have a Twitter developer account, follow the steps in the video below to apply for one.
[Begin here to apply for a Twitter developer account](https://developer.twitter.com/apply-for-access)
Watch this [step-by-step video](https://www.youtube.com/watch?v=qVe7PeC0sUQ) explaining the process.
When presented with questions under *How will you use the Twitter API or Twitter Data?*, use answers similar to the ones below (tweak to fit your exact use case). The more thorough your answers are, the more likely it is your account will get approved.
**Intended Usage (In Your Words)**
*I have a blog and want to educate users how to use the Twitter API with MindsDB.*
*I will read tweets that mention me and use them with MindsDB machine learning to generate responses. I plan to post tweets 2-3 times a day and keep using Twitter like I normally would.*
**Are you planning to analyze Twitter data?**
*I plan to build machine learning algorithms based on Twitter data. I am interested in doing sentiment analysis and topic analysis.*
*I will potentially extract:*
* *Tweet text*
* *Favorite count and retweet count*
* *Hashtags and mentions*
**Will your app use Tweet, Retweet, Like, Follow, or Direct Message functionality?**
*I will use the Twitter API to post responses to tweets that mention me.*
*I will have word filters to make sure that I never share offensive or potentially controversial subjects.*
**Do you plan to display Tweets or aggregate data about Twitter content outside Twitter?**
*I plan to share aggregate data as examples for users of my upcoming blog. I don't intend to create an automated dashboard that consumes a lot of Twitter API calls.*
*Every API call will be done locally, or automated on a simple web server. Aggregate of data will be for educational purposes only.*
**Will your product, service, or analysis make Twitter content or derived information available to a government entity?**
Answer NO to this one.
If you already have a Twitter developer account, you need to generate API keys following the instructions below or heading to the [Twitter developer website](https://developer.twitter.com/en).
* Create an application with Read/Write permissions activated:
* Open [developer portal](https://developer.twitter.com/en/portal/projects-and-apps).
* Select the `Add app` button to create a new app.
* Select the `Create new` button.
* Select `Production` and give it a name.
* Copy and populate the following in the below `CREATE DATABASE` statement:
* `Bearer Token` as a value of the `bearer_token` parameter.
* `API Key` as a value of the `consumer_key` parameter.
* `API Key Secret` as a value of the `consumer_secret` parameter.
* Setup user authentication settings:
* Click `Setup` under `User authentication settings`:
* On `Permissions`, select `Read and Write`.
* On `Type of app`, select `Web App`, `Automated App or Bot`.
* On `App info`, provide any URL for the callback URL and website URL (you can use the URL of this page).
* Click `Save`.
* Generate access tokens:
* Once you are back in the app settings, click `Keys and Tokens`:
* Generate `Access Token` and `Access Token Secret` and populate it in the below `CREATE DATABASE` statement:
* `Access Token` as a value of the `access_token` parameter.
* `Access Token Secret` as a value of the `access_token_secret` parameter.
Once you have all the tokens and keys, here is how to connect your Twitter account to MindsDB:
```sql theme={null}
CREATE DATABASE my_twitter
WITH
ENGINE = 'twitter',
PARAMETERS = {
"bearer_token": "twitter bearer token",
"consumer_key": "twitter consumer key",
"consumer_secret": "twitter consumer key secret",
"access_token": "twitter access token",
"access_token_secret": "twitter access token secret"
};
```
## Usage
The `my_twitter` database contains a table called `tweets` by default.
Here is how to search tweets containing `mindsdb` keyword:
```sql theme={null}
SELECT id, created_at, author_username, text
FROM my_twitter.tweets
WHERE query = '(mindsdb OR #mindsdb) -is:retweet -is:reply'
AND created_at > '2023-02-16'
LIMIT 20;
```
Please note that only recent tweets from the past seven days are available. The `created_at` condition is skipped if the provided date is more than seven days in the past.
Alternatively, you can use a Twitter native query, as below:
```sql theme={null}
SELECT * FROM my_twitter (
search_recent_tweets(
query = '(mindsdb OR #mindsdb) -is:retweet -is:reply',
start_time = '2023-03-16T00:00:00.000Z',
max_results = 2
)
);
```
To learn more about native queries in MindsDB, visit our docs [here](/sql/native-queries).
Here is how to write tweets:
```sql theme={null}
INSERT INTO my_twitter.tweets (reply_to_tweet_id, text)
VALUES
(1626198053446369280, 'MindsDB is great! now its super simple to build ML powered apps'),
(1626198053446369280, 'Holy!! MindsDB is the best thing they have invented for developers doing ML');
```
For more information about available actions and development plans, visit [this page](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/twitter_handler/README.md).
**What's next?**
Check out the [tutorial on how to create a Twitter chatbot](/sql/tutorials/twitter-chatbot) to see one of the interesting applications of this integration.
# Web Crawler
Source: https://docs.mindsdb.com/integrations/app-integrations/web-crawler
In this section, we present how to use a web crawler within MindsDB.
A web crawler is an automated script designed to systematically browse and index content on the internet. Within MindsDB, you can utilize a web crawler to efficiently collect data from various websites.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To use Web Crawler with MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
This handler does not require any connection parameters.
Here is how to initialize a web crawler:
```sql theme={null}
CREATE DATABASE my_web
WITH ENGINE = 'web';
```
The above query creates a database called `my_web`. This database by default has a table called `crawler` that stores data from a given URL or multiple URLs.
## Usage
### Parameters
#### Crawl Depth
The `crawl_depth` parameter defines how deep the crawler should navigate through linked pages:
* `crawl_depth = 0`: Crawls only the specified page.
* `crawl_depth = 1`: Crawls the specified page and all linked pages on it.
* Higher values continue the pattern.
#### Page Limits
There are multiple ways to limit the number of pages returned:
* The `LIMIT` clause defines the maximum number of pages returned globally.
* The `per_url_limit` parameter limits the number of pages returned for each specific URL, if more than one URL is provided.
### Crawling a Single URL
The following example retrieves data from a single webpage:
```sql theme={null}
SELECT *
FROM my_web.crawler
WHERE url = 'https://docs.mindsdb.com/';
```
Returns **1 row** by default.
To retrieve more pages from the same URL, specify the `LIMIT`:
```sql theme={null}
SELECT *
FROM my_web.crawler
WHERE url = 'https://docs.mindsdb.com/'
LIMIT 30;
```
Returns up to **30 rows**.
### Crawling Multiple URLs
To crawl multiple URLs at once:
```sql theme={null}
SELECT *
FROM my_web.crawler
WHERE url IN ('https://docs.mindsdb.com/', 'https://dev.mysql.com/doc/', 'https://mindsdb.com/');
```
Returns **3 rows** by default (1 row per URL).
To apply a per-URL limit:
```sql theme={null}
SELECT *
FROM my_web.crawler
WHERE url IN ('https://docs.mindsdb.com/', 'https://dev.mysql.com/doc/')
AND per_url_limit = 2;
```
Returns **4 rows** (2 rows per URL).
### Crawling with Depth
To crawl all pages linked within a website:
```sql theme={null}
SELECT *
FROM my_web.crawler
WHERE url = 'https://docs.mindsdb.com/'
AND crawl_depth = 1;
```
Returns **1 + x rows**, where `x` is the number of linked webpages.
For multiple URLs with crawl depth:
```sql theme={null}
SELECT *
FROM my_web.crawler
WHERE url IN ('https://docs.mindsdb.com/', 'https://dev.mysql.com/doc/')
AND crawl_depth = 1;
```
Returns **2 + x + y rows**, where `x` and `y` are the number of linked pages from each URL.
### Get PDF Content
MindsDB accepts [file uploads](/sql/create/file) of `csv`, `xlsx`, `xls`, `sheet`, `json`, and `parquet`. However, you can also configure the web crawler to fetch data from PDF files accessible via URLs.
```sql theme={null}
SELECT *
FROM my_web.crawler
WHERE url = ''
LIMIT 1;
```
### Configuring Web Handler for Specific Domains
The Web Handler can be configured to interact only with specific domains by using the `web_crawling_allowed_sites` setting in the `config.json` file.
This feature allows you to restrict the handler to crawl and process content only from the domains you specify, enhancing security and control over web interactions.
To configure this, simply list the allowed domains under the `web_crawling_allowed_sites` key in `config.json`. For example:
```json theme={null}
"web_crawling_allowed_sites": [
"https://docs.mindsdb.com",
"https://another-allowed-site.com"
]
```
## Troubleshooting
`Web crawler encounters character encoding issues`
* **Symptoms**: Extracted text appears garbled or contains strange characters instead of the expected text.
* **Checklist**:
1. Open a GitHub Issue: If you encounter a bug or a repeatable error with encoding,
report it on the [MindsDB GitHub](https://github.com/mindsdb/mindsdb/issues) repository by opening an issue.
`Web crawler times out while trying to fetch content`
* **Symptoms**: The crawler fails to retrieve data from a website, resulting in timeout errors.
* **Checklist**:
1. Check the network connection to ensure the target site is reachable.
# YouTube
Source: https://docs.mindsdb.com/integrations/app-integrations/youtube
In this section, we present how to connect YouTube to MindsDB.
[YouTube](https://www.youtube.com/) is a popular online video-sharing platform and social media website where users
can upload, view, share, and interact with videos created by individuals and organizations from around the world.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB on your system or obtain access to cloud options.
2. To use YouTube with MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
There are two ways you can connect YouTube to MindsDB:
1. Limited permissions: This option provides MindsDB with read-only access to YouTube, including viewing comments data.
2. Elevated permissions: This option provides MindsDB with full access to YouTube, including viewing comments data and posting replies to comments.
### Option 1: Limited permissions
Establish a connection to YouTube from MindsDB by executing the below SQL command:
```sql theme={null}
CREATE DATABASE mindsdb_youtube
WITH ENGINE = 'youtube',
PARAMETERS = {
"youtube_api_token": ""
};
```
Alternatively, you can connect YouTube to MindsDB via the form.
To do that, click on the `Add` button, choose `New Datasource`, search for `YouTube`, and follow the instructions in the form. After providing the connection name and the YouTube API token, click on the `Test Connection` button. Once the connection is established, click on the `Save and Continue` button.
Required connection parameters include the following:
* `youtube_api_token`: It is a Google API key used for authentication. Check out [this guide](https://blog.hubspot.com/website/how-to-get-youtube-api-key) on how to create the API key to access YouTube data.
### Option 2: Elevated permissions
Establish a connection to YouTube from MindsDB by executing the below SQL command and following the Google authorization link provided as output:
```sql theme={null}
CREATE DATABASE mindsdb_youtube
WITH ENGINE = 'youtube',
PARAMETERS = {
"credentials_file": "path-to-credentials-json-file"
-- alternatively, use the credentials_url parameter
};
```
Alternatively, you can connect YouTube to MindsDB via the form.
To do that, click on the `Add` button, choose `New Datasource`, search for `YouTube`, and follow the instructions in the form. After providing the connection name and the credentials file or URL, click on the `Test Connection` button and complete the authorization process in the pop-up window. Once the connection is established, click on the `Save and Continue` button.
Required connection parameters include one of the following:
* `credentials_file`: It is a path to a file generated from the Google Cloud Console, as described below.
* `credentials_url`: It is a URL to a file generated from the Google Cloud Console, as described below. A connection example using this parameter follows the setup steps.
1. Open the Google Cloud Console.
2. Create a new project.
3. Inside this project, go to APIs & Services:
* Go to Enabled APIs & services:
* Click on ENABLE APIS AND SERVICES from the top bar.
* Search for YouTube Data API v3 and enable it.
* Go to OAuth consent screen:
* Click on GET STARTED.
* Provide app name and support email.
* Choose Audience based on who will be using the app.
* Add the Contact Information (email address) of the developer.
* Agree to the terms and click on CONTINUE.
* Click on Create.
* Click on Audience on the left sidebar and, under Test users, add the email addresses of the users who will be testing the app. When you are ready to publish the app, return here and click on PUBLISH APP; the app then becomes available to either the organization or the public, depending on the audience you have chosen.
* Go to Credentials:
* Click on CREATE CREDENTIALS from the top bar and choose OAuth client ID.
* Choose type as `Web application` and provide a name. Under Authorized redirect URIs, enter the URL where MindsDB is deployed, followed by `/verify-auth`. For example, if you are running MindsDB locally (on `https://localhost:47334`), enter `https://localhost:47334/verify-auth`.
* Click on CREATE.
* Download the JSON file that is required to connect YouTube to MindsDB.
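If the credentials file is hosted remotely, you can pass its location with the `credentials_url` parameter instead. Below is a minimal sketch of such a connection, assuming a hypothetical URL at which the JSON file is reachable:
```sql theme={null}
CREATE DATABASE mindsdb_youtube
WITH ENGINE = 'youtube',
PARAMETERS = {
    "credentials_url": "https://example.com/credentials.json" -- hypothetical URL; replace with your own
};
```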
## Usage
Use the established connection to query the `comments` table.
You can query for one video's comments:
```sql theme={null}
SELECT *
FROM mindsdb_youtube.comments
WHERE video_id = "raWFGQ20OfA";
```
Or for one channel's comments:
```sql theme={null}
SELECT *
FROM mindsdb_youtube.comments
WHERE channel_id = "UC-...";
```
You can include ordering and limiting the output data:
```sql theme={null}
SELECT * FROM mindsdb_youtube.comments
WHERE video_id = "raWFGQ20OfA"
ORDER BY display_name ASC
LIMIT 5;
```
Use the established connection to query the `channels` table.
```sql theme={null}
SELECT * FROM mindsdb_youtube.channels
WHERE channel_id = "UC-...";
```
Here, the `channel_id` column is mandatory in the `WHERE` clause.
Use the established connection to query the `videos` table.
```sql theme={null}
SELECT * FROM mindsdb_youtube.videos
WHERE video_id = "id";
```
Here, the `video_id` column is mandatory in the `WHERE` clause.
With the connection option 2, you can insert replies to comments:
```sql theme={null}
INSERT INTO mindsdb_youtube.comments (comment_id, reply)
VALUES ("comment_id", "reply message");
```
# Airtable
Source: https://docs.mindsdb.com/integrations/data-integrations/airtable
This is the implementation of the Airtable data handler for MindsDB.
[Airtable](https://www.airtable.com/lp/campaign/database) is a platform that makes it easy to build powerful, custom applications. These tools can streamline just about any process, workflow, or project. And best of all, you can build them without ever learning to write a single line of code.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Airtable to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Airtable.
## Implementation
This handler is implemented using `duckdb`, a library that allows SQL queries to be executed on `pandas` DataFrames.
In essence, when querying a particular table, the entire table is first pulled into a `pandas` DataFrame using the [Airtable API](https://airtable.com/api). Once this is done, SQL queries can be run on the DataFrame using `duckdb`.
The required arguments to establish a connection are as follows:
* `base_id` is the Airtable base ID.
* `table_name` is the Airtable table name.
* `api_key` is the API key for the Airtable API.
## Usage
In order to make use of this handler and connect to the Airtable database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE airtable_datasource
WITH
engine = 'airtable',
parameters = {
"base_id": "dqweqweqrwwqq",
"table_name": "iris",
"api_key": "knlsndlknslk"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM airtable_datasource.example_tbl;
```
At the moment, only the `SELECT` statement can be executed through `duckdb`. This, however, does not restrict you from running machine learning algorithms against your Airtable data using the `CREATE MODEL` statement, as sketched below.
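For instance, a model could be trained on the Airtable data along the following lines. This is a minimal sketch, assuming the queried table contains a hypothetical `species` column to use as the prediction target:
```sql theme={null}
CREATE MODEL mindsdb.iris_model
FROM airtable_datasource
    (SELECT * FROM example_tbl)
PREDICT species; -- `species` is a hypothetical target column
```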
# Amazon Aurora
Source: https://docs.mindsdb.com/integrations/data-integrations/amazon-aurora
This is the implementation of the Amazon Aurora handler for MindsDB.
[Amazon Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_AuroraOverview.html) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Amazon Aurora to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Amazon Aurora.
## Implementation
This handler was implemented using the existing MindsDB handlers for MySQL and PostgreSQL.
The required arguments to establish a connection are as follows:
* `host`: the host name or IP address of the Amazon Aurora DB cluster.
* `port`: the TCP/IP port of the Amazon Aurora DB cluster.
* `user`: the username used to authenticate with the Amazon Aurora DB cluster.
* `password`: the password to authenticate the user with the Amazon Aurora DB cluster.
* `database`: the database name to use when connecting with the Amazon Aurora DB cluster.
The optional arguments that can be used are as follows:
* `db_engine`: the database engine of the Amazon Aurora DB cluster. This can take one of two values: 'mysql' or 'postgresql'. This parameter is optional, but if it is not provided, `aws_access_key_id` and `aws_secret_access_key` parameters must be provided.
* `aws_access_key_id`: the access key for the AWS account. This parameter is optional and is only required to be provided if the `db_engine` parameter is not provided.
* `aws_secret_access_key`: the secret key for the AWS account. This parameter is optional and is only required to be provided if the `db_engine` parameter is not provided. A connection sketch using these parameters is shown below.
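For instance, a connection that omits `db_engine` and instead provides the AWS credentials might look like the following sketch, where the key values are placeholders:
```sql theme={null}
CREATE DATABASE aurora_datasource
WITH
    engine = 'aurora',
    parameters = {
        "host": "mysqlcluster.cluster-123456789012.us-east-1.rds.amazonaws.com",
        "port": 3306,
        "user": "admin",
        "password": "password",
        "database": "example_db",
        "aws_access_key_id": "AQAXEQK89OX07YS34OP",
        "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    };
```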
## Usage
In order to make use of this handler and connect to an Amazon Aurora MySQL DB Cluster in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE aurora_mysql_datasource
WITH
engine = 'aurora',
parameters = {
"db_engine": "mysql",
"host": "mysqlcluster.cluster-123456789012.us-east-1.rds.amazonaws.com",
"port": 3306,
"user": "admin",
"password": "password",
"database": "example_db"
};
```
Now, you can use this established connection to query your database as follows:
```sql theme={null}
SELECT *
FROM aurora_mysql_datasource.example_table;
```
Similar commands can be used to establish a connection and query Amazon Aurora PostgreSQL DB Cluster:
```sql theme={null}
CREATE DATABASE aurora_postgres_datasource
WITH
engine = 'aurora',
parameters = {
"db_engine": "postgresql",
"host": "postgresmycluster.cluster-123456789012.us-east-1.rds.amazonaws.com",
"port": 5432,
"user": "postgres",
"password": "password",
"database": "example_db "
};
SELECT * FROM aurora_postgres_datasource.example_table;
```
If you want to switch to a different database, you can include it in your query as follows:
```sql theme={null}
SELECT *
FROM aurora_datasource.new_database.example_table;
```
# Amazon DynamoDB
Source: https://docs.mindsdb.com/integrations/data-integrations/amazon-dynamodb
This documentation describes the integration of MindsDB with [Amazon DynamoDB](https://aws.amazon.com/dynamodb/), a serverless, NoSQL database service that enables you to develop modern applications at any scale.
This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.
## Prerequisites
Before proceeding, ensure that MindsDB is installed locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
## Connection
Establish a connection to your Amazon DynamoDB from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE dynamodb_datasource
WITH
engine = 'dynamodb',
parameters = {
"aws_access_key_id": "PCAQ2LJDOSWLNSQKOCPW",
"aws_secret_access_key": "U/VjewPlNopsDmmwItl34r2neyC6WhZpUiip57i",
"region_name": "us-east-1"
};
```
Required connection parameters include the following:
* `aws_access_key_id`: The AWS access key that identifies the user or IAM role.
* `aws_secret_access_key`: The AWS secret access key that identifies the user or IAM role.
* `region_name`: The AWS region to connect to.
Optional connection parameters include the following:
* `aws_session_token`: The AWS session token that identifies the user or IAM role. This becomes necessary when using temporary security credentials, as shown in the example below.
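For example, when using temporary security credentials, the session token is passed alongside the access keys. The following is a sketch with placeholder values:
```sql theme={null}
CREATE DATABASE dynamodb_datasource
WITH
    engine = 'dynamodb',
    parameters = {
        "aws_access_key_id": "PCAQ2LJDOSWLNSQKOCPW",
        "aws_secret_access_key": "U/VjewPlNopsDmmwItl34r2neyC6WhZpUiip57i",
        "aws_session_token": "your-session-token", -- placeholder
        "region_name": "us-east-1"
    };
```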
## Usage
Retrieve data from a specified table by providing the integration name and the table name:
```sql theme={null}
SELECT *
FROM dynamodb_datasource.table_name
LIMIT 10;
```
Indexes can also be queried by adding a third-level namespace:
```sql theme={null}
SELECT *
FROM dynamodb_datasource.table_name.index_name
LIMIT 10;
```
The queries issued to Amazon DynamoDB are in PartiQL, a SQL-compatible query language for Amazon DynamoDB. For more information, refer to the [PartiQL documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.html).
There are a few limitations to keep in mind when querying data from Amazon DynamoDB (some of which are specific to PartiQL):
* The `LIMIT`, `GROUP BY` and `HAVING` clauses are not supported in PartiQL `SELECT` statements. Furthermore, subqueries and joins are not supported either. Refer to the [PartiQL documentation for SELECT statements](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.select.html) for more information.
* `INSERT` statements are not supported by this integration. However, this can be overcome by issuing a 'native query' via an established connection. An example of this is provided below.
Run PartiQL queries directly on Amazon DynamoDB:
```sql theme={null}
SELECT * FROM dynamodb_datasource (
--Native Query Goes Here
INSERT INTO "Music" value {'Artist' : 'Acme Band1','SongTitle' : 'PartiQL Rocks'}
);
```
The above examples utilize `dynamodb_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with Amazon DynamoDB.
* **Checklist**:
1. Confirm that provided AWS credentials are correct. Try making a direct connection to the Amazon DynamoDB using the AWS CLI.
2. Ensure a stable network between MindsDB and AWS.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing special characters.
* **Checklist**:
1. Ensure table names with special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel-data
* Incorrect: SELECT \* FROM integration.'travel-data'
* Correct: SELECT \* FROM integration.\`travel-data\`
# Amazon Redshift
Source: https://docs.mindsdb.com/integrations/data-integrations/amazon-redshift
This documentation describes the integration of MindsDB with [Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html), a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more, enabling you to use your data to acquire new insights for your business and customers.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Redshift to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to your Redshift database from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE redshift_datasource
WITH
engine = 'redshift',
parameters = {
"host": "examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com",
"port": 5439,
"database": "example_db",
"user": "awsuser",
"password": "my_password"
};
```
Required connection parameters include the following:
* `host`: The host name or IP address of the Redshift cluster.
* `port`: The port to use when connecting with the Redshift cluster.
* `database`: The database name to use when connecting with the Redshift cluster.
* `user`: The username to authenticate the user with the Redshift cluster.
* `password`: The password to authenticate the user with the Redshift cluster.
Optional connection parameters include the following:
* `schema`: The database schema to use. Default is `public`.
* `sslmode`: The SSL mode for the connection. An example using both optional parameters is shown below.
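For example, both optional parameters can be added to the connection. The following is a sketch where the `schema` value is a placeholder and the `sslmode` value assumes a Postgres-style setting such as `require`:
```sql theme={null}
CREATE DATABASE redshift_datasource
WITH
    engine = 'redshift',
    parameters = {
        "host": "examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com",
        "port": 5439,
        "database": "example_db",
        "user": "awsuser",
        "password": "my_password",
        "schema": "analytics",  -- placeholder schema name
        "sslmode": "require"    -- assumed Postgres-style SSL mode
    };
```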
## Usage
Retrieve data from a specified table by providing the integration name, schema, and table name:
```sql theme={null}
SELECT *
FROM redshift_datasource.schema_name.table_name
LIMIT 10;
```
Run Amazon Redshift SQL queries directly on the connected Redshift database:
```sql theme={null}
SELECT * FROM redshift_datasource (
--Native Query Goes Here
WITH VENUECOPY AS (SELECT * FROM VENUE)
SELECT * FROM VENUECOPY ORDER BY 1 LIMIT 10;
);
```
The above examples utilize `redshift_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Amazon Redshift cluster.
* **Checklist**:
1. Make sure the Redshift cluster is active.
2. Confirm that host, port, user, password and database are correct. Try a direct Redshift connection using a client like DBeaver.
3. Ensure that the security settings of the Redshift cluster allow connections from MindsDB.
4. Ensure a stable network between MindsDB and Redshift.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
This [troubleshooting guide](https://docs.aws.amazon.com/redshift/latest/mgmt/troubleshooting-connections.html) provided by AWS might also be helpful.
# Amazon S3
Source: https://docs.mindsdb.com/integrations/data-integrations/amazon-s3
This documentation describes the integration of MindsDB with [Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html), an object storage service that offers industry-leading scalability, data availability, security, and performance.
This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.
## Prerequisites
Before proceeding, ensure that MindsDB is installed locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
## Connection
Establish a connection to your Amazon S3 bucket from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE s3_datasource
WITH
engine = 's3',
parameters = {
"aws_access_key_id": "AQAXEQK89OX07YS34OP",
"aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
"bucket": "my-bucket"
};
```
Note that sample parameter values are provided here for reference, and you should replace them with your connection parameters.
Required connection parameters include the following:
* `aws_access_key_id`: The AWS access key that identifies the user or IAM role.
* `aws_secret_access_key`: The AWS secret access key that identifies the user or IAM role.
Optional connection parameters include the following:
* `aws_session_token`: The AWS session token that identifies the user or IAM role. This becomes necessary when using temporary security credentials.
* `bucket`: The name of the Amazon S3 bucket. If not provided, all available buckets can be queried; however, this can affect performance, especially when listing all of the available objects. A connection sketch without this parameter is shown below.
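For example, to connect without pinning the integration to a single bucket, simply omit the `bucket` parameter. This sketch reuses the placeholder credentials from above:
```sql theme={null}
CREATE DATABASE s3_datasource
WITH
    engine = 's3',
    parameters = {
        "aws_access_key_id": "AQAXEQK89OX07YS34OP",
        "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    };
```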
## Usage
Retrieve data from a specified object (file) in a S3 bucket by providing the integration name and the object key:
```sql theme={null}
SELECT *
FROM s3_datasource.`my-file.csv`
LIMIT 10;
```
If a bucket name is provided in the `CREATE DATABASE` command, querying will be limited to that bucket, and the bucket name can be omitted from the object key as shown in the example above. However, if the bucket name is not provided, the object key must include the bucket name, such as `` s3_datasource.`my-bucket/my-folder/my-file.csv` ``.
Wrap the object key in backticks (\`) to avoid any issues parsing the SQL statements provided. This is especially important when the object key contains spaces, special characters or prefixes, such as `my-folder/my-file.csv`.
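For instance, if no bucket was provided at connection time, a query against a specific object could look like the following sketch, where the bucket and object names are placeholders:
```sql theme={null}
SELECT *
FROM s3_datasource.`my-bucket/my-folder/my-file.csv`
LIMIT 10;
```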
At the moment, the supported file formats are CSV, TSV, JSON, and Parquet.
The above examples utilize `s3_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
The special `files` table can be used to list all objects available in the specified bucket or all buckets if the bucket name is not provided:
```sql theme={null}
SELECT *
FROM s3_datasource.files LIMIT 10;
```
The content of files can also be retrieved by explicitly requesting the `content` column. This column is empty by default to avoid unnecessary data transfer:
```sql theme={null}
SELECT path, content
FROM s3_datasource.files LIMIT 10;
```
This table returns all objects regardless of the file format; however, only the supported file formats mentioned above can be queried.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Amazon S3 bucket.
* **Checklist**:
1. Make sure the Amazon S3 bucket exists.
2. Confirm that provided AWS credentials are correct. Try making a direct connection to the S3 bucket using the AWS CLI.
3. Ensure a stable network between MindsDB and AWS.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing object names containing spaces, special characters or prefixes.
* **Checklist**:
1. Ensure object names with spaces, special characters or prefixes are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel/travel\_data.csv
* Incorrect: SELECT \* FROM integration.'travel/travel\_data.csv'
* Correct: SELECT \* FROM integration.\`travel/travel\_data.csv\`
# Apache Cassandra
Source: https://docs.mindsdb.com/integrations/data-integrations/apache-cassandra
This is the implementation of the Cassandra data handler for MindsDB.
[Cassandra](https://cassandra.apache.org/_/index.html) is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Apache Cassandra to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Apache Cassandra.
## Implementation
As ScyllaDB is API-compatible with Apache Cassandra, the Cassandra data handler extends the ScyllaDB handler and uses the `scylla-driver` Python library.
The required arguments to establish a connection are as follows:
* `host` is the host name or IP address of the Cassandra database.
* `port` is the port to use when connecting.
* `user` is the user to authenticate.
* `password` is the password to authenticate the user.
* `keyspace` is the keyspace to connect to; it is the top-level container for tables.
* `protocol_version` is not required and defaults to 4.
## Usage
In order to make use of this handler and connect to the Cassandra server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE cassandra_datasource
WITH
engine = 'cassandra',
parameters = {
"host": "127.0.0.1",
"port": "9043",
"user": "user",
"password": "pass",
"keyspace": "test_data",
"protocol_version": 4
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM cassandra_datasource.example_table LIMIT 10;
```
# Apache Druid
Source: https://docs.mindsdb.com/integrations/data-integrations/apache-druid
This is the implementation of the Druid data handler for MindsDB.
[Apache Druid](https://druid.apache.org/docs/latest/design) is a real-time analytics database designed for fast slice-and-dice analytics (*OLAP* queries) on large data sets. Most often, Druid powers use cases where real-time ingestion, fast query performance, and high uptime are important.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Apache Druid to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Apache Druid.
## Implementation
This handler was implemented using the `pydruid` library, the Python API for Apache Druid.
The required arguments to establish a connection are as follows:
* `host` is the host name or IP address of the Apache Druid database.
* `port` is the port that Apache Druid is running on.
* `path` is the query path.
* `scheme` is the URI schema. This parameter is optional and defaults to `http`.
* `user` is the username used to authenticate with Apache Druid. This parameter is optional.
* `password` is the password used to authenticate with Apache Druid. This parameter is optional.
## Usage
In order to make use of this handler and connect to Apache Druid in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE druid_datasource
WITH
engine = 'druid',
parameters = {
"host": "localhost",
"port": 8888,
"path": "/druid/v2/sql/",
"scheme": "http"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM druid_datasource.example_tbl;
```
# Apache Hive
Source: https://docs.mindsdb.com/integrations/data-integrations/apache-hive
This documentation describes the integration of MindsDB with [Apache Hive](https://hive.apache.org/), a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.
The integration allows MindsDB to access data from Apache Hive and enhance Apache Hive with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect Apache Hive to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to Apache Hive from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/hive_handler) as an engine.
```sql theme={null}
CREATE DATABASE hive_datasource
WITH
engine = 'hive',
parameters = {
"username": "demo_user",
"password": "demo_password",
"host": "127.0.0.1",
"database": "default"
};
```
Required connection parameters include the following:
* `host`: The hostname, IP address, or URL of the Apache Hive server.
* `database`: The name of the Apache Hive database to connect to.
Optional connection parameters include the following:
* `username`: The username for the Apache Hive database.
* `password`: The password for the Apache Hive database.
* `port`: The port number for connecting to the Apache Hive server. Default is `10000`.
* `auth`: The authentication mechanism to use. Default is `CUSTOM`. Other options are `NONE`, `NOSASL`, `KERBEROS` and `LDAP`. An example overriding the defaults is shown below.
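For example, a connection that overrides the optional defaults might look like the following sketch, where the port and the `LDAP` mechanism are shown for illustration only:
```sql theme={null}
CREATE DATABASE hive_datasource
WITH
    engine = 'hive',
    parameters = {
        "username": "demo_user",
        "password": "demo_password",
        "host": "127.0.0.1",
        "port": 10000,   -- default port, shown explicitly for illustration
        "auth": "LDAP",  -- illustrative; default is CUSTOM
        "database": "default"
    };
```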
## Usage
Retrieve data from a specified table by providing the integration and table names:
```sql theme={null}
SELECT *
FROM hive_datasource.table_name
LIMIT 10;
```
Run HiveQL queries directly on the connected Apache Hive database:
```sql theme={null}
SELECT * FROM hive_datasource (
--Native Query Goes Here
FROM (FROM (FROM src
SELECT TRANSFORM(value)
USING 'mapper'
AS value, count) mapped
SELECT cast(value as double) AS value, cast(count as int) AS count
SORT BY value, count) sorted
SELECT TRANSFORM(value, count)
USING 'reducer'
AS whatever
);
```
The above examples utilize `hive_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Apache Hive database.
* **Checklist**:
1. Ensure that the Apache Hive server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try a direct Apache Hive connection using a client like DBeaver.
3. Test the network connection between the MindsDB host and the Apache Hive server.
# Apache Ignite
Source: https://docs.mindsdb.com/integrations/data-integrations/apache-ignite
This is the implementation of the Apache Ignite data handler for MindsDB.
[Apache Ignite](https://ignite.apache.org/docs/latest/) is a distributed database for high-performance computing with in-memory speed.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Apache Ignite to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Apache Ignite.
## Implementation
This handler is implemented using the `pyignite` library, the Apache Ignite thin (binary protocol) client for Python.
The required arguments to establish a connection are as follows:
* `host` is the host name or IP address of the Apache Ignite cluster's node.
* `port` is the TCP/IP port of the Apache Ignite cluster's node. Must be an integer.
There are several optional arguments that can be used as well,
* `username` is the username used to authenticate with the Apache Ignite cluster. This parameter is optional. Default: None.
* `password` is the password to authenticate the user with the Apache Ignite cluster. This parameter is optional. Default: None.
* `schema` is the schema to use for the connection to the Apache Ignite cluster. This parameter is optional. Default: PUBLIC.
## Usage
In order to make use of this handler and connect to an Apache Ignite database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE ignite_datasource
WITH
ENGINE = 'ignite',
PARAMETERS = {
"host": "127.0.0.1",
"port": 10800,
"username": "admin",
"password": "password",
"schema": "example_schema"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM ignite_datasource.demo_table
LIMIT 10;
```
Currently, a connection can be established only to a single node in the cluster. In the future, we'll configure the client to automatically fail over to another node if the connection to the current node fails or times out by providing the hosts and ports for many nodes as explained [here](https://ignite.apache.org/docs/latest/thin-clients/python-thin-client).
# Apache Impala
Source: https://docs.mindsdb.com/integrations/data-integrations/apache-impala
This is the implementation of the Impala data handler for MindsDB.
[Apache Impala](https://impala.apache.org/) is an MPP (Massively Parallel Processing) SQL query engine for processing huge volumes of data stored in an Apache Hadoop cluster. It is open-source software written in C++ and Java. It provides high performance and low latency compared to other SQL engines for Hadoop. In other words, Impala is the highest-performing SQL engine (giving an RDBMS-like experience) that provides the fastest way to access data stored in the Hadoop Distributed File System.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Apache Impala to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Apache Impala.
## Implementation
This handler is implemented using `impyla`, a Python library that allows you to use Python code to run SQL commands on Impala.
The required arguments to establish a connection are:
* `user` is the username associated with the database.
* `password` is the password to authenticate your access.
* `host` is the server IP address or hostname.
* `port` is the port through which TCP/IP connection is to be made.
* `database` is the database name to be connected.
## Usage
In order to make use of this handler and connect to the Impala database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE impala_datasource
WITH
engine = 'impala',
parameters = {
"user":"root",
"password":"p@55w0rd",
"host":"127.0.0.1",
"port":21050,
"database":"Db_NamE"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM impala_datasource.TEST;
```
# Apache Pinot
Source: https://docs.mindsdb.com/integrations/data-integrations/apache-pinot
This is the implementation of the Pinot data handler for MindsDB.
[Apache Pinot](https://pinot.apache.org/) is a real-time distributed OLAP database designed for low-latency query execution even at extremely high throughput. Apache Pinot can ingest directly from streaming sources like Apache Kafka and make events available for querying immediately.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Apache Pinot to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Apache Pinot.
## Implementation
This handler was implemented using the `pinotdb` library, the Python DB-API and SQLAlchemy dialect for Pinot.
The required arguments to establish a connection are as follows:
* `host` is the host name or IP address of the Apache Pinot cluster.
* `broker_port` is the port that the Broker of the Apache Pinot cluster is running on.
* `controller_port` is the port that the Controller of the Apache Pinot cluster is running on.
* `path` is the query path.
## Usage
In order to make use of this handler and connect to the Pinot cluster in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE pinot_datasource
WITH
engine = 'pinot',
parameters = {
"host":"localhost",
"broker_port": 8000,
"controller_port": 9000,
"path": "/query/sql",
"scheme": "http"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM pinot_datasource.example_tbl;
```
# Apache Solr
Source: https://docs.mindsdb.com/integrations/data-integrations/apache-solr
This is the implementation of the Solr data handler for MindsDB.
[Apache Solr](https://solr.apache.org/) is a highly reliable, scalable, and fault-tolerant search platform, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Apache Solr to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Apache Solr.
## Implementation
This handler is implemented using the `sqlalchemy-solr` library, which provides a Python/SQLAlchemy interface.
The required arguments to establish a connection are as follows:
* `username` is the username used to authenticate with the Solr server. This parameter is optional.
* `password` is the password to authenticate the user with the Solr server. This parameter is optional.
* `host` is the host name or IP address of the Solr server.
* `port` is the port number of the Solr server.
* `server_path` defaults to `solr` if not provided.
* `collection` is the Solr Collection name.
* `use_ssl` defaults to `false` if not provided.
Further reference: [https://pypi.org/project/sqlalchemy-solr/](https://pypi.org/project/sqlalchemy-solr/).
## Usage
In order to make use of this handler and connect to the Solr database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE solr_datasource
WITH
engine = 'solr',
parameters = {
"username": "demo_user",
"password": "demo_password",
"host": "127.0.0.1",
"port": "8981",
"server_path": "solr",
"collection": "gettingstarted",
"use_ssl": "false"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM solr_datasource.gettingstarted
LIMIT 10000;
```
**Requirements**
A running Solr instance with Parallel SQL support is required.
There are certain limitations that need to be taken into account when issuing queries to Solr. Refer to [https://solr.apache.org/guide/solr/latest/query-guide/sql-query.html#parallel-sql-queries](https://solr.apache.org/guide/solr/latest/query-guide/sql-query.html#parallel-sql-queries).
Don't forget to put a `LIMIT` clause at the end of the SQL statement, as in the usage example above.
# CKAN
Source: https://docs.mindsdb.com/integrations/data-integrations/ckan
## CKAN Integration handler
This handler facilitates integration with [CKAN](https://ckan.org/), an open-source data catalog platform for managing and publishing open data. CKAN organizes datasets and stores data in its [DataStore](http://docs.ckan.org/en/2.11/maintaining/datastore.html). To retrieve data from CKAN, the [CKANAPI](https://github.com/ckan/ckanapi) must be used.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect CKAN to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
The CKAN handler is included with MindsDB by default, so no additional installation is required.
## Configuration
To use the CKAN handler, you need to provide the URL of the CKAN instance you want to connect to. You can do this via the `url` parameter in the `CREATE DATABASE` statement. For example:
```sql theme={null}
CREATE DATABASE ckan_datasource
WITH ENGINE = 'ckan',
PARAMETERS = {
"url": "https://your-ckan-instance-url.com",
"api_key": "your-api-key-if-required"
};
```
> ***NOTE:*** Some CKAN instances will require you to provide an API Token. You can create one in the CKAN user panel.
## Usage
The CKAN handler provides three main tables:
* `datasets`: Lists all datasets in the CKAN instance.
* `resources`: Lists metadata for all resources across all packages.
* `datastore`: Allows querying individual datastore resources.
## Example Queries
1. List all datasets:
```sql theme={null}
SELECT * FROM `your-datasource`.datasets;
```
2. List all resources:
```sql theme={null}
SELECT * FROM `your-datasource`.resources;
```
3. Query a specific datastore resource:
```sql theme={null}
SELECT * FROM `your-datasource`.datastore WHERE resource_id = 'your-resource-id';
```
Replace `your-resource-id` with the actual resource ID you want to query.
## Querying Large Resources
The CKAN handler supports automatic pagination when querying datastore resources. This allows you to retrieve large datasets without worrying about API limits.
You can still use the `LIMIT` clause to limit the number of rows returned by the query. For example:
```sql theme={null}
SELECT * FROM ckan_datasource.datastore
WHERE resource_id = 'your-resource-id-here'
LIMIT 1000;
```
## Limitations
* The handler currently supports read operations only. Write operations are not supported.
* Performance may vary depending on the size of the CKAN instance and the complexity of your queries.
* The handler may not work with all CKAN instances, especially those with custom configurations.
* The handler does not support all CKAN API features. Some advanced features may not be available.
* The datastore search will return limited records up to 32000. Please refer to the [CKAN API](https://docs.ckan.org/en/2.11/maintaining/datastore.html#ckanext.datastore.logic.action.datastore_search_sql) documentation for more information.
# ClickHouse
Source: https://docs.mindsdb.com/integrations/data-integrations/clickhouse
This documentation describes the integration of MindsDB with [ClickHouse](https://clickhouse.com/docs/en/intro), a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).
The integration allows MindsDB to access data from ClickHouse and enhance ClickHouse with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect ClickHouse to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to ClickHouse from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/clickhouse_handler) as an engine.
```sql theme={null}
CREATE DATABASE clickhouse_conn
WITH ENGINE = 'clickhouse',
PARAMETERS = {
"host": "127.0.0.1",
"port": "8443",
"user": "root",
"password": "mypass",
"database": "test_data",
"protocol" : "https"
};
```
Required connection parameters include the following:
* `host`: The hostname or IP address of the ClickHouse server.
* `port`: The TCP/IP port of the ClickHouse server.
* `user`: The username used to authenticate with the ClickHouse server.
* `password`: The password to authenticate the user with the ClickHouse server.
* `database`: The database name to use when connecting with the ClickHouse server. Defaults to `default`.
Optional connection parameters include the following:
* `protocol`: The protocol used to query the ClickHouse server. Supported values are `native`, `http` and `https`. Defaults to `native`. A native-protocol connection sketch is shown below.
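For comparison, a connection over the default `native` protocol might look like the following sketch; port `9000` is the conventional native-protocol port mentioned in the troubleshooting notes below:
```sql theme={null}
CREATE DATABASE clickhouse_conn
WITH ENGINE = 'clickhouse',
PARAMETERS = {
    "host": "127.0.0.1",
    "port": "9000",
    "user": "root",
    "password": "mypass",
    "database": "test_data",
    "protocol": "native"
};
```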
## Usage
The following usage examples utilize the connection to ClickHouse made via the `CREATE DATABASE` statement and named `clickhouse_conn`.
Retrieve data from a specified table by providing the integration and table name.
```sql theme={null}
SELECT *
FROM clickhouse_conn.table_name
LIMIT 10;
```
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the ClickHouse database.
* **Checklist**:
1. Ensure that the ClickHouse server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try a direct ClickHouse connection using a client.
3. Test the network connection between the MindsDB host and the ClickHouse server.
`Slow Connection Initialization`
* **Symptoms**: Connecting to the ClickHouse server takes an exceptionally long time, or connections hang without completing.
* **Checklist**:
1. Ensure that you are using the appropriate protocol (http, https, or native) for your ClickHouse setup. Misconfigurations here can lead to significant delays.
2. Ensure that firewalls or security groups (in cloud environments) are properly configured to allow traffic on the necessary ports (such as 8123 for HTTP or 9000 for native).
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces, reserved words or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
# Cloud Spanner
Source: https://docs.mindsdb.com/integrations/data-integrations/cloud-spanner
This is the implementation of the Cloud Spanner data handler for MindsDB.
[Cloud Spanner](https://cloud.google.com/spanner) is a fully managed, mission-critical, relational database service that offers transactional consistency at global scale and automatic, synchronous replication for high availability. It supports two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Cloud Spanner to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Cloud Spanner.
## Implementation
This handler was implemented using the `google-cloud-spanner` Python client library.
The required arguments to establish a connection are as follows:
* `instance_id` is the instance identifier.
* `database_id` is the database identifier.
* `project` is the identifier of the project that owns the resources.
* `credentials` is a stringified GCP service account key JSON.
## Usage
In order to make use of this handler and connect to the Cloud Spanner database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE cloud_spanner_datasource
WITH
engine = 'cloud_spanner',
parameters = {
"instance_id": "my-instance",
"database_id": "example-id",
"project": "my-project",
"credentials": "{...}"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM cloud_spanner_datasource.my_table;
```
Cloud Spanner supports both the PostgreSQL and GoogleSQL dialects. However, not all PostgreSQL features are supported.
# CockroachDB
Source: https://docs.mindsdb.com/integrations/data-integrations/cockroachdb
This is the implementation of the CockroachDB data handler for MindsDB.
[CockroachDB](https://www.cockroachlabs.com/docs/) was architected for complex, high-performance distributed writes and delivers scale-out read capability. CockroachDB delivers simple relational SQL transactions and obscures complexity away from developers. It is wire-compatible with PostgreSQL and provides a familiar and easy interface for developers.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect CockroachDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to CockroachDB.
## Implementation
CockroachDB is wire-compatible with PostgreSQL. Therefore, its implementation extends the PostgreSQL handler.
The required arguments to establish a connection are as follows:
* `host` is the host name or IP address of the CockroachDB.
* `database` is the name of the database to connect to.
* `user` is the user to authenticate with the CockroachDB.
* `port` is the port to use when connecting.
* `password` is the password to authenticate the user.
In order to make use of this handler and connect to the CockroachDB server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE cockroachdb
WITH
engine = 'cockroachdb',
parameters = {
"host": "localhost",
"database": "dbname",
"user": "admin",
"password": "password",
"port": "5432"
};
```
## Usage
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM cockroachdb.public.db;
```
# Couchbase
Source: https://docs.mindsdb.com/integrations/data-integrations/couchbase
This is the implementation of the Couchbase data handler for MindsDB.
[Couchbase](https://www.couchbase.com/) is an open-source, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating, and presenting data.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Couchbase to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Couchbase.
## Implementation
This handler is implemented using the `couchbase` library, the Python driver for Couchbase.
The required arguments to establish a connection are as follows:
* `connection_string`: the connection string for the endpoint of the Couchbase server.
* `bucket`: the bucket name to use when connecting with the Couchbase server.
* `user`: the user to authenticate with the Couchbase server.
* `password`: the password to authenticate the user with the Couchbase server.
* `scope`: a level of data organization within a bucket. If omitted, it defaults to `_default`.
Note: The connection string expects either the `couchbases://` or `couchbase://` protocol.
If you are using Couchbase Capella, you can find the `connection_string` under the Connect tab.
You will also need to whitelist the machine(s) that will be running MindsDB and create database credentials for the user. These steps can also be taken under the Connect tab.
In order to make use of this handler and connect to a Couchbase server in MindsDB, the following syntax can be used. Note, that the example uses the default `travel-sample` bucket which can be enabled from the couchbase UI with pre-defined scope and documents.
```sql theme={null}
CREATE DATABASE couchbase_datasource
WITH
engine='couchbase',
parameters={
"connection_string": "couchbase://localhost",
"bucket": "travel-sample",
"user": "admin",
"password": "password",
"scope": "inventory"
};
```
## Usage
Now, you can use this established connection to query your database as follows:
```sql theme={null}
SELECT * FROM couchbase_datasource.airport;
```
# CrateDB
Source: https://docs.mindsdb.com/integrations/data-integrations/cratedb
This is the implementation of the CrateDB data handler for MindsDB.
[CrateDB](https://crate.io/) is a distributed SQL database management system that integrates a fully searchable document-oriented data store. It is open-source, written in Java, based on a shared-nothing architecture, and designed for high scalability. CrateDB includes components from Lucene, Elasticsearch and Netty.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect CrateDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to CrateDB.
## Implementation
This handler is implemented using `crate`, a Python library that allows you to use Python code to run SQL commands on CrateDB.
The required arguments to establish a connection are as follows:
* `user` is the username associated with the database.
* `password` is the password to authenticate your access.
* `host` is the hostname or IP address of the server.
* `port` is the port through which connection is to be made.
* `schema_name` is the schema name to get tables from. Defaults to `doc`.
## Usage
In order to make use of this handler and connect to the CrateDB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE crate_datasource
WITH
engine = 'crate',
parameters = {
"user": "crate",
"password": "",
"host": "127.0.0.1",
"port": 4200,
"schema_name": "doc"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM crate_datasource.demo;
```
# D0lt
Source: https://docs.mindsdb.com/integrations/data-integrations/d0lt
This is the implementation of the D0lt data handler for MindsDB.
[D0lt](https://docs.dolthub.com/introduction/what-is-dolt) is a single-node and embedded DBMS that incorporates Git-style versioning as a first-class entity. D0lt behaves like Git - it is a content-addressable local database where the main objects are tables instead of files. In D0lt, a user creates a database locally. The database contains tables that can be read and updated using SQL. Similar to Git, writes are staged until the user issues a commit. Upon commit, the writes are appended to permanent storage.
Branch and merge semantics are supported allowing for the tables to evolve at a different pace for multiple users. This allows for loose collaboration on data as well as multiple views on the same core data. Merge conflicts are detected for schema and data conflicts. Data conflicts are cell-based, not line-based. Remote repositories allow for cooperation among repository instances. Clone, push, and pull semantics are all available.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect D0lt to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to D0lt.
## Implementation
This handler is implemented using `mysql-connector`, a Python library that allows you to use Python code to run SQL commands on the D0lt database.
The required arguments to establish a connection are as follows:
* `user` is the username associated with the database.
* `password` is the password to authenticate your access.
* `host` is the hostname or IP address of the server.
* `port` is the port through which a TCP/IP connection is to be made.
* `database` is the database name to be connected.
## Usage
In order to make use of this handler and connect to the D0lt database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE d0lt_datasource
WITH
engine = 'd0lt',
parameters = {
"user": "root",
"password": "",
"host": "127.0.0.1",
"port": 3306,
"database": "information_schema"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM d0lt_datasource.TEST;
```
# Databend
Source: https://docs.mindsdb.com/integrations/data-integrations/databend
This is the implementation of the Databend data handler for MindsDB.
[Databend](https://databend.rs/) is a modern cloud data warehouse that empowers your object storage for real-time analytics.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Databend to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Databend.
## Implementation
This handler is implemented by extending the ClickHouse handler.
The required arguments to establish a connection are as follows:
* `protocol` is the protocol to query Databend. Supported values include `native`, `http`, `https`. It defaults to `native` if not provided.
* `host` is the host name or IP address of the Databend warehouse.
* `port` is the TCP/IP port of the Databend warehouse.
* `user` is the username used to authenticate with the Databend warehouse.
* `password` is the password to authenticate the user with the Databend warehouse.
* `database` is the database name to use when connecting with the Databend warehouse.
## Usage
In order to make use of this handler and connect to the Databend database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE databend_datasource
WITH
engine = 'databend',
parameters = {
"protocol": "https",
"user": "root",
"port": 443,
"password": "password",
"host": "some-url.aws-us-east-2.default.databend.com",
"database": "test_db"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM databend_datasource.example_tbl;
```
# Databricks
Source: https://docs.mindsdb.com/integrations/data-integrations/databricks
This documentation describes the integration of MindsDB with [Databricks](https://www.databricks.com/), the world's first data intelligence platform powered by generative AI.
The integration allows MindsDB to access data stored in a Databricks workspace and enhance it with AI capabilities.
This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Databricks to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
If the Databricks cluster you are attempting to connect to is terminated, executing the queries given below will attempt to start the cluster; therefore, the first query may take a few minutes to execute.
To avoid any delays, ensure that the Databricks cluster is running before executing the queries.
## Connection
Establish a connection to your Databricks workspace from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE databricks_datasource
WITH
engine = 'databricks',
parameters = {
"server_hostname": "adb-1234567890123456.7.azuredatabricks.net",
"http_path": "sql/protocolv1/o/1234567890123456/1234-567890-test123",
"access_token": "dapi1234567890ab1cde2f3ab456c7d89efa",
"schema": "example_db"
};
```
Required connection parameters include the following:
* `server_hostname`: The server hostname for the cluster or SQL warehouse.
* `http_path`: The HTTP path of the cluster or SQL warehouse.
* `access_token`: A Databricks personal access token for the workspace.
Refer to the instructions given at [https://docs.databricks.com/en/integrations/compute-details.html](https://docs.databricks.com/en/integrations/compute-details.html) and [https://docs.databricks.com/en/dev-tools/python-sql-connector.html#authentication](https://docs.databricks.com/en/dev-tools/python-sql-connector.html#authentication) to find the connection parameters mentioned above for your compute resource.
Optional connection parameters include the following:
* `session_configuration`: Additional (key, value) pairs to set as Spark session configuration parameters. This should be provided as a JSON string.
* `http_headers`: Additional (key, value) pairs to set in HTTP headers on every RPC request the client makes. This should be provided as `"http_headers": [['Header-1', 'value1'], ['Header-2', 'value2']]`.
* `catalog`: The catalog to use for the connection. Default is `hive_metastore`.
* `schema`: The schema (database) to use for the connection. Default is `default`.
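As a sketch of how these optional parameters can be combined (the session setting, header name, and catalog below are illustrative values, not requirements):
```sql theme={null}
CREATE DATABASE databricks_full_datasource
WITH
    engine = 'databricks',
    parameters = {
        "server_hostname": "adb-1234567890123456.7.azuredatabricks.net",
        "http_path": "sql/protocolv1/o/1234567890123456/1234-567890-test123",
        "access_token": "dapi1234567890ab1cde2f3ab456c7d89efa",
        "session_configuration": '{"spark.sql.session.timeZone": "UTC"}',
        "http_headers": [['Header-1', 'value1']],
        "catalog": "my_catalog",
        "schema": "example_db"
    };
```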
## Usage
Retrieve data from a specified table by providing the integration name, catalog, schema, and table name:
```sql theme={null}
SELECT *
FROM databricks_datasource.catalog_name.schema_name.table_name
LIMIT 10;
```
The catalog and schema names only need to be provided if the table to be queried is not in the specified (or default) catalog and schema.
Run Databricks SQL queries directly on the connected Databricks workspace:
```sql theme={null}
SELECT * FROM databricks_datasource (
--Native Query Goes Here
SELECT
city,
car_model,
RANK() OVER (PARTITION BY car_model ORDER BY quantity) AS rank
FROM dealer
QUALIFY rank = 1;
);
```
The above examples utilize `databricks_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Databricks workspace.
* **Checklist**:
1. Make sure the Databricks workspace is active.
2. Confirm that server hostname, HTTP path, access token are correctly provided. If the catalog and schema are provided, ensure they are correct as well.
3. Ensure a stable network between MindsDB and Databricks workspace.
`SQL statements running against tables (of reasonable size) are taking longer than expected`
* **Symptoms**: SQL queries taking longer than expected to execute.
* **Checklist**:
1. Ensure the Databricks cluster is running before executing the queries.
2. Check the network connection between MindsDB and Databricks workspace.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing special characters.
* **Checklist**:
1. Ensure table names with special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel-data
* Incorrect: SELECT \* FROM integration.'travel-data'
* Correct: SELECT \* FROM integration.\`travel-data\`
# DataStax
Source: https://docs.mindsdb.com/integrations/data-integrations/datastax
This is the implementation of the DataStax data handler for MindsDB.
[DataStax Astra DB](https://docs.datastax.com/en/astra-db-serverless/index.html) is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers on-premises solutions, DataStax Enterprise (DSE) and Hyper-Converged Database (HCD), as well as Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect DataStax to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Create an [Astra DB database](https://docs.datastax.com/en/astra-db-serverless/databases/create-database.html).
## Implementation
DataStax Astra DB is API-compatible with Apache Cassandra and ScyllaDB. Therefore, its implementation extends the ScyllaDB handler and uses the `scylla-driver` Python library.
The required arguments to establish a connection are as follows:
* `user`: The literal string `token`
* `password`: An [Astra application token](https://docs.datastax.com/en/astra-db-serverless/administration/manage-application-tokens.html)
* `secure_connect_bundle`: The path to your database's [Secure Connect Bundle](https://docs.datastax.com/en/astra-db-serverless/databases/secure-connect-bundle.html) zip file
## Usage
In order to make use of this handler and connect to the Astra DB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE astra_connection
WITH
engine = "astra",
parameters = {
"user": "token",
"password": "application_token",
"secure_connect_bundle": "/home/Downloads/file.zip"
};
```
Or, reference the Secure Connect Bundle from the DataStax S3 bucket as follows:
```sql theme={null}
CREATE DATABASE astra_connection
WITH ENGINE = "astra",
PARAMETERS = {
"user": "token",
"password": "application_token",
"secure_connect_bundle": "https://datastax-cluster-config-prod.s3.us-east-2.amazonaws.com/32312-b9eb-4e09-a641-213eaesa12-1/secure-connect-demo.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AK..."
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM astra_connection.keyspace.example_table
LIMIT 10;
```
# DuckDB
Source: https://docs.mindsdb.com/integrations/data-integrations/duckdb
This is the implementation of the DuckDB data handler for MindsDB.
[DuckDB](https://duckdb.org/) is an open-source analytical database system. It is designed for fast execution of analytical queries. There are no external dependencies and the DBMS runs completely embedded within a host process, similar to SQLite. DuckDB provides a rich SQL dialect with support for complex queries with transactional guarantees (ACID).
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect DuckDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to DuckDB.
## Implementation
This handler is implemented using the `duckdb` Python client library.
The DuckDB handler currently uses the `0.7.1.dev187` pre-release version of the Python client library. In case of issues, make sure your DuckDB database is compatible with this version. See the [`requirements.txt`](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/duckdb_handler/requirements.txt) for details.
The required arguments to establish a connection are as follows:
* `database` is the name of the DuckDB database file. It can be set to `:memory:` to create an in-memory database.
The optional arguments are as follows:
* `read_only` is a flag that specifies whether the connection is in read-only mode. This is required if multiple processes want to access the same database file at the same time.
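For example, a connection that opens an existing database file in read-only mode, so other processes can read it concurrently, might look as follows (a sketch; the file name is a placeholder):
```sql theme={null}
CREATE DATABASE duckdb_readonly_datasource
WITH
    engine = 'duckdb',
    parameters = {
        "database": "db.duckdb",
        "read_only": true
    };
```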
## Usage
In order to make use of this handler and connect to the DuckDB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE duckdb_datasource
WITH
engine = 'duckdb',
parameters = {
"database": "db.duckdb"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM duckdb_datasource.my_table;
```
# EdgelessDB
Source: https://docs.mindsdb.com/integrations/data-integrations/edgelessdb
This is the implementation of the EdgelessDB data handler for MindsDB.
[EdgelessDB](https://edgeless.systems/) is a full SQL database, tailor-made for confidential computing. It seamlessly integrates with your existing tools and workflows to help you unlock the full potential of your data.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect EdgelessDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to EdgelessDB.
## Implementation
This handler is implemented by extending the MySQL connector.
The required arguments to establish a connection are as follows:
* `host`: the host name of the EdgelessDB connection
* `port`: the port to use when connecting
* `user`: the user to authenticate
* `password`: the password to authenticate the user
* `database`: database name
To use the full potential of EdgelessDB, you can also specify the following arguments:
* `ssl`: whether to use SSL or not
* `ssl_ca`: path or url to the CA certificate
* `ssl_cert`: path or url to the client certificate
* `ssl_key`: path or url to the client key
## Usage
In order to use EdgelessDB as a data source in MindsDB, you need to use the following syntax:
```sql theme={null}
CREATE DATABASE edgelessdb_datasource
WITH ENGINE = "EdgelessDB",
PARAMETERS = {
"user": "root",
"password": "test123@!Aabvhj",
"host": "localhost",
"port": 3306,
"database": "test_schema"
};
```
Or you can use the following syntax:
```sql theme={null}
CREATE DATABASE edgelessdb_datasource2
WITH ENGINE = "EdgelessDB",
PARAMETERS = {
"user": "root",
"password": "test123@!Aabvhj",
"host": "localhost",
"port": 3306,
"database": "test_schema",
"ssl_cert": "/home/marios/demo/cert.pem",
"ssl_key": "/home/marios/demo/key.pem"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT * FROM edgelessdb_datasource.table_name;
```
# ElasticSearch
Source: https://docs.mindsdb.com/integrations/data-integrations/elasticsearch
This documentation describes the integration of MindsDB with [ElasticSearch](https://www.elastic.co/), a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
The integration allows MindsDB to access data from ElasticSearch and enhance ElasticSearch with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect ElasticSearch to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to ElasticSearch.
## Connection
Establish a connection to ElasticSearch from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/elasticsearch_handler) as an engine.
```sql theme={null}
CREATE DATABASE elasticsearch_datasource
WITH ENGINE = 'elasticsearch',
PARAMETERS={
'cloud_id': 'xyz', -- optional, if hosts are provided
'hosts': 'https://xyz.xyz.gcp.cloud.es.io:123', -- optional, if cloud_id is provided
'api_key': 'xyz', -- optional, if user and password are provided
'user': 'elastic', -- optional, if api_key is provided
'password': 'xyz' -- optional, if api_key is provided
};
```
The connection parameters include the following:
* `cloud_id`: The Cloud ID provided with the ElasticSearch deployment. Required only when `hosts` is not provided.
* `hosts`: The ElasticSearch endpoint provided with the ElasticSearch deployment. Required only when `cloud_id` is not provided.
* `api_key`: The API key that you generated for the ElasticSearch deployment. Required only when `user` and `password` are not provided.
* `user` and `password`: The user and password used to authenticate. Required only when `api_key` is not provided.
If you want to connect to a local instance of ElasticSearch, use the statement below:
```sql theme={null}
CREATE DATABASE elasticsearch_datasource
WITH ENGINE = 'elasticsearch',
PARAMETERS = {
"hosts": "127.0.0.1:9200",
"user": "user",
"password": "password"
};
```
Connection parameters for a local instance include the following (at minimum, `hosts` should be provided):
* `hosts`: The IP address and port where ElasticSearch is deployed.
* `user`: The user used to authenticate access.
* `password`: The password used to authenticate access.
## Usage
Retrieve data from a specified index by providing the integration name and index name:
```sql theme={null}
SELECT *
FROM elasticsearch_datasource.my_index
LIMIT 10;
```
The above examples utilize `elasticsearch_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
At the moment, the Elasticsearch SQL API has certain limitations that have an impact on the queries that can be issued via MindsDB. The most notable of these limitations are listed below:
1. Only `SELECT` queries are supported at the moment.
2. Array fields are not supported.
3. Nested fields cannot be queried directly. However, they can be accessed using the `.` operator, as shown in the example below.
For a detailed guide on the limitations of the Elasticsearch SQL API, refer to the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-limitations.html).
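For example, a nested `author` object could be queried as follows (a sketch with hypothetical index and field names):
```sql theme={null}
SELECT author.first_name, author.last_name
FROM elasticsearch_datasource.my_index
LIMIT 10;
```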
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Elasticsearch server.
* **Checklist**:
1. Make sure the Elasticsearch server is active.
2. Confirm that server, cloud ID and credentials are correct.
3. Ensure a stable network between MindsDB and Elasticsearch.
`Transport Error` or `Request Error`
* **Symptoms**: Errors related to the issuing of unsupported queries to Elasticsearch.
* **Checklist**:
1. Ensure the query is a `SELECT` query.
2. Avoid querying array fields.
3. Access nested fields using the `.` operator.
4. Refer to the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-limitations.html) for more information if needed.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing index names containing special characters.
* **Checklist**:
1. Ensure table names with special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel-data
* Incorrect: SELECT \* FROM integration.'travel-data'
* Correct: SELECT \* FROM integration.\`travel-data\`
This [troubleshooting guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/troubleshooting.html) provided by Elasticsearch might also be helpful.
# Firebird
Source: https://docs.mindsdb.com/integrations/data-integrations/firebird
This is the implementation of the Firebird data handler for MindsDB.
[Firebird](https://firebirdsql.org/en/about-firebird/) is a relational database offering many ANSI SQL standard features that runs on Linux, Windows, and a variety of Unix platforms. Firebird offers excellent concurrency, high performance, and powerful language support for stored procedures and triggers. It has been used in production systems, under a variety of names, since 1981.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Firebird to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Firebird.
## Implementation
This handler is implemented using the `fdb` library, the Python driver for Firebird.
The required arguments to establish a connection are as follows:
* `host` is the host name or IP address of the Firebird server.
* `database` is the path to the database file to connect to.
* `user` is the username to authenticate the user with the Firebird server.
* `password` is the password to authenticate the user with the Firebird server.
## Usage
In order to make use of this handler and connect to the Firebird server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE firebird_datasource
WITH
engine = 'firebird',
parameters = {
"host": "localhost",
"database": "C:\Users\minura\Documents\mindsdb\example.fdb",
"user": "sysdba",
"password": "password"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM firebird_datasource.example_tbl;
```
# Google BigQuery
Source: https://docs.mindsdb.com/integrations/data-integrations/google-bigquery
This documentation describes the integration of MindsDB with [Google BigQuery](https://cloud.google.com/bigquery?hl=en), a fully managed, AI-ready data analytics platform that helps you maximize value from your data.
The integration allows MindsDB to access data stored in the BigQuery warehouse and enhance it with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect BigQuery to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to your BigQuery warehouse from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE bigquery_datasource
WITH
engine = "bigquery",
parameters = {
"project_id": "bgtest-1111",
"dataset": "mydataset",
"service_account_keys": "/tmp/keys.json"
};
```
Required connection parameters include the following:
* `project_id`: The globally unique identifier for your project in Google Cloud where BigQuery is located.
* `dataset`: The default dataset to connect to.
Optional connection parameters include the following:
* `service_account_keys`: The full path to the service account key file.
* `service_account_json`: The content of the service account key file, provided directly as JSON instead of a path to a file.
One of `service_account_keys` or `service_account_json` has to be provided to
establish a connection to BigQuery.
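Alternatively, a connection using `service_account_json` might look as follows (a sketch; the key fields are truncated placeholders, and the full set of fields comes from your downloaded service account key file):
```sql theme={null}
CREATE DATABASE bigquery_datasource
WITH
    engine = "bigquery",
    parameters = {
        "project_id": "bgtest-1111",
        "dataset": "mydataset",
        "service_account_json": {
            "type": "service_account",
            "project_id": "bgtest-1111",
            "private_key_id": "...",
            "private_key": "...",
            "client_email": "...",
            "client_id": "...",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token"
        }
    };
```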
## Usage
Retrieve data from a specified table in the default dataset by providing the integration name and table name:
```sql theme={null}
SELECT *
FROM bigquery_datasource.table_name
LIMIT 10;
```
Retrieve data from a specified table in a different dataset by providing the integration name, dataset name and table name:
```sql theme={null}
SELECT *
FROM bigquery_datasource.dataset_name.table_name
LIMIT 10;
```
Run SQL in any supported BigQuery dialect directly on the connected BigQuery database:
```sql theme={null}
SELECT * FROM bigquery_datasource (
--Native Query Goes Here
SELECT *
FROM t1
WHERE t1.a IN (SELECT t2.a
FROM t2 FOR SYSTEM_TIME AS OF t1.timestamp_column);
);
```
The above examples utilize `bigquery_datasource` as the datasource name, which
is defined in the `CREATE DATABASE` command.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the BigQuery warehouse.
* **Checklist**:
1. Make sure that the Google Cloud account is active and the Google BigQuery service is enabled.
2. Confirm that the project ID, dataset and service account credentials are correct. Try a direct BigQuery connection using a client like DBeaver.
3. Ensure a stable network between MindsDB and Google BigQuery.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
# Google Cloud SQL
Source: https://docs.mindsdb.com/integrations/data-integrations/google-cloud-sql
This is the implementation of the Google Cloud SQL data handler for MindsDB.
[Cloud SQL](https://cloud.google.com/sql) is a fully-managed database service that makes it easy to set up, maintain, manage, and administer your relational PostgreSQL, MySQL, and SQL Server databases in the cloud.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Google Cloud SQL to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Google Cloud SQL.
## Implementation
This handler was implemented using the existing MindsDB handlers for MySQL, PostgreSQL and SQL Server.
The required arguments to establish a connection are,
* `host`: the host name or IP address of the Google Cloud SQL instance.
* `port`: the TCP/IP port of the Google Cloud SQL instance.
* `user`: the username used to authenticate with the Google Cloud SQL instance.
* `password`: the password to authenticate the user with the Google Cloud SQL instance.
* `database`: the database name to use when connecting with the Google Cloud SQL instance.
* `db_engine`: the database engine of the Google Cloud SQL instance. This can take one of three values: 'mysql', 'postgresql' or 'mssql'.
## Usage
In order to make use of this handler and connect to the Google Cloud SQL instance, you need to create a datasource with the following syntax:
```sql theme={null}
CREATE DATABASE cloud_sql_mysql_datasource
WITH ENGINE = 'cloud_sql',
PARAMETERS = {
"db_engine": "mysql",
"host": "53.170.61.16",
"port": 3306,
"user": "admin",
"password": "password",
"database": "example_db"
};
```
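The same pattern applies to the other supported database engines. For example, a sketch for a PostgreSQL-based instance (host and credentials are placeholders) changes only the `db_engine` value and, typically, the port:
```sql theme={null}
CREATE DATABASE cloud_sql_postgres_datasource
WITH ENGINE = 'cloud_sql',
PARAMETERS = {
    "db_engine": "postgresql",
    "host": "53.170.61.16",
    "port": 5432,
    "user": "admin",
    "password": "password",
    "database": "example_db"
};
```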
To successfully connect to the Google Cloud SQL instance you have to make sure that the IP address of the machine you are using to connect is added to the authorized networks of the Google Cloud SQL instance. You can do this by following the steps below:
1. Go to the [Cloud SQL Instances](https://console.cloud.google.com/sql/instances) page.
2. Click on the instance you want to add authorized networks to.
3. Click on the **Connections** tab.
4. Click on **Networking** tab.
5. Click on **Add network**.
6. Enter the IP address of the machine you want to connect from.
If you are using the MindsDB Cloud version, you can use the following IP addresses: `18.220.205.95`, `3.19.152.46`, `52.14.91.162`.
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT * FROM cloud_sql_mysql_datasource.example_tbl;
```
# Google Sheets
Source: https://docs.mindsdb.com/integrations/data-integrations/google-sheets
This is the implementation of the Google Sheets data handler for MindsDB.
[Google Sheets](https://www.google.com/sheets/about/) is a spreadsheet program included as a part of the free, web-based Google Docs Editors suite offered by Google.
Please note that the integration of MindsDB with Google Sheets works for public sheets only.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Google Sheets to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Google Sheets.
## Implementation
This handler is implemented using `duckdb`, a library that allows SQL queries to be executed on `pandas` DataFrames.
In essence, when querying a particular sheet, the entire sheet is first pulled into a `pandas` DataFrame using the [Google Visualization API](https://developers.google.com/chart/interactive/docs/reference). Once this is done, SQL queries can be run on the DataFrame using `duckdb`.
Since the entire sheet needs to be pulled into memory first (DataFrame), it is recommended to be somewhat careful when querying large datasets so as not to overload your machine.
The required arguments to establish a connection are as follows:
* `spreadsheet_id` is the unique ID of the Google Sheet.
* `sheet_name` is the name of the sheet within the Google Sheet.
## Usage
In order to make use of this handler and connect to a Google Sheet in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE sheets_datasource
WITH
engine = 'sheets',
parameters = {
"spreadsheet_id": "12wgS-1KJ9ymUM-6VYzQ0nJYGitONxay7cMKLnEE2_d0",
"sheet_name": "iris"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM sheets_datasource.iris;
```
The name of the table will be the name of the relevant sheet, provided as an input to the `sheet_name` parameter.
At the moment, only the `SELECT` statement can be executed through `duckdb`. This, however, does not restrict running machine learning algorithms against your data in Google Sheets using the `CREATE MODEL` statement.
# GreptimeDB
Source: https://docs.mindsdb.com/integrations/data-integrations/greptimedb
This is the implementation of the GreptimeDB data handler for MindsDB.
[GreptimeDB](https://greptime.com/) is an open-source, cloud-native time series database featuring analytical capabilities, scalability, and support for open protocols.
## Implementation
This handler is implemented by extending the MySQLHandler.
Connect GreptimeDB to MindsDB by providing the following parameters:
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
* `user` is the database user.
* `password` is the database password.
There are several optional parameters that can be used as well.
* `ssl` is the `ssl` parameter value that indicates whether SSL is enabled (`True`) or disabled (`False`).
* `ssl_ca` is the SSL Certificate Authority.
* `ssl_cert` stores SSL certificates.
* `ssl_key` stores SSL keys.
## Usage
In order to make use of this handler and connect to the GreptimeDB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE greptimedb_datasource
WITH
engine = 'greptimedb',
parameters = {
"host": "127.0.0.1",
"port": 4002,
"database": "public",
"user": "username",
"password": "password"
};
```
You can use this established connection to query your table as follows.
```sql theme={null}
SELECT *
FROM greptimedb_datasource.example_table;
```
# IBM Db2
Source: https://docs.mindsdb.com/integrations/data-integrations/ibm-db2
This documentation describes the integration of MindsDB with [IBM Db2](https://www.ibm.com/db2), the cloud-native database built to power low-latency transactions, real-time analytics and AI applications at scale.
The integration allows MindsDB to access data stored in the IBM Db2 database and enhance it with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect IBM Db2 to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to your IBM Db2 database from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE db2_datasource
WITH
engine = 'db2',
parameters = {
"host": "127.0.0.1",
"user": "db2inst1",
"password": "password",
"database": "example_db"
};
```
Required connection parameters include the following:
* `host`: The hostname, IP address, or URL of the IBM Db2 database.
* `user`: The username for the IBM Db2 database.
* `password`: The password for the IBM Db2 database.
* `database`: The name of the IBM Db2 database to connect to.
Optional connection parameters include the following:
* `port`: The port number for connecting to the IBM Db2 database. Default is `50000`.
* `schema`: The database schema to use within the IBM Db2 database.
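A connection that sets these optional parameters explicitly might look as follows (a sketch; the port and schema values are illustrative):
```sql theme={null}
CREATE DATABASE db2_full_datasource
WITH
    engine = 'db2',
    parameters = {
        "host": "127.0.0.1",
        "port": 50000,
        "user": "db2inst1",
        "password": "password",
        "database": "example_db",
        "schema": "db2inst1"
    };
```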
## Usage
Retrieve data from a specified table by providing the integration name, schema, and table name:
```sql theme={null}
SELECT *
FROM db2_datasource.schema_name.table_name
LIMIT 10;
```
Run IBM Db2 native queries directly on the connected database:
```sql theme={null}
SELECT * FROM db2_datasource (
--Native Query Goes Here
WITH
DINFO (DEPTNO, AVGSALARY, EMPCOUNT) AS
(SELECT OTHERS.WORKDEPT, AVG(OTHERS.SALARY), COUNT(*)
FROM EMPLOYEE OTHERS
GROUP BY OTHERS.WORKDEPT
),
DINFOMAX AS
(SELECT MAX(AVGSALARY) AS AVGMAX FROM DINFO)
SELECT THIS_EMP.EMPNO, THIS_EMP.SALARY,
DINFO.AVGSALARY, DINFO.EMPCOUNT, DINFOMAX.AVGMAX
FROM EMPLOYEE THIS_EMP, DINFO, DINFOMAX
WHERE THIS_EMP.JOB = 'SALESREP'
AND THIS_EMP.WORKDEPT = DINFO.DEPTNO
);
```
The above examples utilize `db2_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the IBM Db2 database.
* **Checklist**:
1. Make sure the IBM Db2 database is active.
2. Confirm that host, user, password and database are correct. Try a direct connection using a client like DBeaver.
3. Ensure a stable network between MindsDB and the IBM Db2 database.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel-data
* Incorrect: SELECT \* FROM integration.'travel-data'
* Correct: SELECT \* FROM integration.\`travel-data\`
This [guide](https://www.ibm.com/docs/en/db2/11.5?topic=connect-common-db2-problems) to common Db2 connection issues provided by IBM might also be helpful.
# IBM Informix
Source: https://docs.mindsdb.com/integrations/data-integrations/ibm-informix
This is the implementation of the IBM Informix data handler for MindsDB.
[IBM Informix](https://www.ibm.com/products/informix) is a product family within IBM's Information Management division that is centered on several relational database management system (RDBMS) offerings. The Informix server supports object–relational models and (through extensions) data types that are not a part of the SQL standard. The most widely used of these are the JSON, BSON, time series, and spatial extensions, which provide both data type support and language extensions that permit high-performance domain-specific queries and efficient storage for data sets based on semi-structured, time series, and spatial data.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect IBM Informix to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to IBM Informix.
## Implementation
This handler is implemented using `IfxPy/IfxPyDbi`, a Python library that allows you to use Python code to run SQL commands on the Informix database.
The required arguments to establish a connection are as follows:
* `user` is the username associated with the database.
* `password` is the password to authenticate your access.
* `host` is the hostname or IP address of the server.
* `port` is the port through which the TCP/IP connection is to be made.
* `database` is the database name to be connected to.
* `schema_name` is the schema name from which tables are retrieved.
* `server` is the name of the server you want to connect to.
* `logging_enabled` defines whether logging is enabled. It defaults to `True` if not provided.
## Usage
In order to make use of this handler and connect to the Informix database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE informix_datasource
WITH
engine='informix',
parameters={
"server": "server",
"host": "127.0.0.1",
"port": 9091,
"user": "informix",
"password": "in4mix",
"database": "stores_demo",
"schema_name": "love",
"loging_enabled": False
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM informix_datasource.items;
```
This integration uses `IfxPy`. As the library is still in the development stage, it can be installed using `pip install IfxPy`. However, this doesn't work with newer versions of Python, so you have to build it from source as follows.
For Linux:
1. This code downloads and extracts the `onedb-ODBC` driver used to make the connection:
```bash theme={null}
cd $HOME
mkdir Informix
cd Informix
mkdir -p home/informix/cli
wget https://hcl-onedb.github.io/odbc/OneDB-Linux64-ODBC-Driver.tar
sudo tar xvf OneDB-Linux64-ODBC-Driver.tar -C ./home/informix/cli
rm OneDB-Linux64-ODBC-Driver.tar
```
2. Add environment variables in the `.bashrc` file:
```bash theme={null}
export INFORMIXDIR=$HOME/Informix/home/informix/cli/onedb-odbc-driver
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${INFORMIXDIR}/lib:${INFORMIXDIR}/lib/esql:${INFORMIXDIR}/lib/cli
```
3. This code clones the `IfxPy` repo, builds a wheel, and installs it:
```bash theme={null}
pip install wheel
mkdir Temp
cd Temp
git clone https://github.com/OpenInformix/IfxPy.git
cd IfxPy/IfxPy
python setup.py bdist_wheel
pip install --find-links=./dist IfxPy
cd ..
cd ..
cd ..
rm -rf Temp
```
For Windows:
1. This code downloads and extracts the `onedb-ODBC` driver used to make the connection:
```bash theme={null}
cd $HOME
mkdir Informix
cd Informix
mkdir home\informix\cli
wget https://hcl-onedb.github.io/odbc/OneDB-Win64-ODBC-Driver.zip
tar xvf OneDB-Win64-ODBC-Driver.zip -C ./home/informix/cli
del OneDB-Win64-ODBC-Driver.zip
```
2. Add an environment variable:
```bash theme={null}
set INFORMIXDIR=%USERPROFILE%\Informix\home\informix\cli\onedb-odbc-driver
```
3. Add `%INFORMIXDIR%\bin` to the PATH environment variable.
4. This code clones the `IfxPy` repo, builds a wheel, and installs it:
```bash theme={null}
pip install wheel
mkdir Temp
cd Temp
git clone https://github.com/OpenInformix/IfxPy.git
cd IfxPy/IfxPy
python setup.py bdist_wheel
pip install --find-links=./dist IfxPy
cd ..
cd ..
cd ..
rmdir Temp
```
# InfluxDB
Source: https://docs.mindsdb.com/integrations/data-integrations/influxdb
This is the implementation of the InfluxDB data handler for MindsDB.
[InfluxDB](https://www.influxdata.com/) is a time series database that can be used to collect data and monitor the system and devices, especially Edge devices.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect InfluxDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to InfluxDB.
## Implementation
The required arguments to establish a connection are as follows:
* `influxdb_url` is the hosted URL of InfluxDB Cloud.
* `influxdb_token` is the authentication token for the hosted InfluxDB Cloud instance.
* `influxdb_db_name` is the database name of the InfluxDB Cloud instance.
* `influxdb_table_name` is the table name of the InfluxDB Cloud instance.
Please follow [this link](https://docs.influxdata.com/influxdb/cloud/security/tokens/create-token/#create-a-token-in-the-influxdb-ui) to generate a token for accessing the InfluxDB API.
## Usage
In order to make use of this handler and connect to the InfluxDB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE influxdb_source
WITH
ENGINE = 'influxdb',
PARAMETERS = {
"influxdb_url": "",
"influxdb_token": "",
"influxdb_table_name": ""
};
```
You can use this established connection to query your table as follows.
```sql theme={null}
SELECT name, time, sensor_id, temperature
FROM influxdb_source.tables
ORDER BY temperature DESC
LIMIT 65;
```
# MariaDB
Source: https://docs.mindsdb.com/integrations/data-integrations/mariadb
This documentation describes the integration of MindsDB with [MariaDB](https://mariadb.org/), one of the most popular open source relational databases.
The integration allows MindsDB to access data from MariaDB and enhance MariaDB with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect MariaDB to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to MariaDB from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/mariadb_handler) as an engine.
```sql theme={null}
CREATE DATABASE mariadb_conn
WITH ENGINE = 'mariadb',
PARAMETERS = {
"host": "host-name",
"port": 3307,
"database": "db-name",
"user": "user-name",
"password": "password"
};
```
Or:
```sql theme={null}
CREATE DATABASE mariadb_conn
WITH
ENGINE = 'mariadb',
PARAMETERS = {
"url": "mariadb://user-name@host-name:3307"
};
```
Required connection parameters include the following:
* `user`: The username for the MariaDB database.
* `password`: The password for the MariaDB database.
* `host`: The hostname, IP address, or URL of the MariaDB server.
* `port`: The port number for connecting to the MariaDB server.
* `database`: The name of the MariaDB database to connect to.
Or:
* `url`: You can specify a connection to MariaDB Server using a URI-like string, as an alternative connection option. You can also use `mysql://` as the protocol prefix.
Optional connection parameters include the following:
* `ssl`: Boolean parameter that indicates whether SSL encryption is enabled for the connection. Set to True to enable SSL and enhance connection security, or set to False to use the default non-encrypted connection.
* `ssl_ca`: Specifies the path to the Certificate Authority (CA) file in PEM format.
* `ssl_cert`: Specifies the path to the SSL certificate file. This certificate should be signed by a trusted CA specified in the `ssl_ca` file or be a self-signed certificate trusted by the server.
* `ssl_key`: Specifies the path to the private key file (in PEM format).
* `use_pure` (`True` by default): Whether to use pure Python or C Extension. If `use_pure=False` and the C Extension is not available, then Connector/Python will automatically fall back to the pure Python implementation.
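As a sketch, an SSL-enabled connection, assuming the certificate files exist at the given (placeholder) paths, could be configured as follows:
```sql theme={null}
CREATE DATABASE mariadb_ssl_conn
WITH ENGINE = 'mariadb',
PARAMETERS = {
    "host": "host-name",
    "port": 3307,
    "database": "db-name",
    "user": "user-name",
    "password": "password",
    "ssl": true,
    "ssl_ca": "/path/to/ca.pem",
    "ssl_cert": "/path/to/client-cert.pem",
    "ssl_key": "/path/to/client-key.pem"
};
```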
## Usage
The following usage examples utilize the connection to MariaDB made via the `CREATE DATABASE` statement and named `mariadb_conn`.
Retrieve data from a specified table by providing the integration and table name.
```sql theme={null}
SELECT *
FROM mariadb_conn.table_name
LIMIT 10;
```
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the MariaDB database.
* **Checklist**:
1. Ensure that the MariaDB server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try a direct MySQL connection.
3. Test the network connection between the MindsDB host and the MariaDB server.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces, reserved words or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
# MatrixOne
Source: https://docs.mindsdb.com/integrations/data-integrations/matrixone
This is the implementation of the MatrixOne data handler for MindsDB.
[MatrixOne](https://github.com/matrixorigin/matrixone) is a future-oriented hyper-converged cloud and edge native DBMS that supports transactional, analytical, and streaming workloads with a simplified and distributed database engine, across multiple data centers, clouds, edges, and other heterogeneous infrastructures.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect MatrixOne to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to MatrixOne.
## Implementation
This handler is implemented using `PyMySQL`, a Python library that allows you to use Python code to run SQL commands on the MatrixOne database.
The required arguments to establish a connection are as follows:
* `user` is the username associated with the database.
* `password` is the password to authenticate your access.
* `host` is the hostname or IP address of the database.
* `port` is the port through which TCP/IP connection is to be made.
* `database` is the database name to be connected.
There are several optional arguments that can be used as well.
* `ssl` indicates whether SSL is enabled (`True`) or disabled (`False`).
* `ssl_ca` is the SSL Certificate Authority.
* `ssl_cert` stores the SSL certificates.
* `ssl_key` stores the SSL keys.
## Usage
In order to make use of this handler and connect to the MatrixOne database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE matrixone_datasource
WITH
engine = 'matrixone',
parameters = {
"user": "dump",
"password": "111",
"host": "127.0.0.1",
"port": 6001,
"database": "mo_catalog"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM matrixone_datasource.demo;
```
# Microsoft Access
Source: https://docs.mindsdb.com/integrations/data-integrations/microsoft-access
This is the implementation of the Microsoft Access data handler for MindsDB.
[Microsoft Access](https://www.microsoft.com/en-us/microsoft-365/access) is a pseudo-relational database engine from Microsoft. It is part of the Microsoft Office suite of applications that also includes Word, Outlook, and Excel, among others. Access is also available for purchase as a stand-alone product. It uses the Jet Database Engine for data storage.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Microsoft Access to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Microsoft Access.
## Implementation
This handler is implemented using `pyodbc`, the Python ODBC bridge.
The only required argument to establish a connection is `db_file` that points to a database file to be queried.
## Usage
In order to make use of this handler and connect to the Access database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE access_datasource
WITH
engine = 'access',
parameters = {
"db_file":"C:\\Users\\minurap\\Documents\\example_db.accdb"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM access_datasource.example_tbl;
```
# Microsoft SQL Server
Source: https://docs.mindsdb.com/integrations/data-integrations/microsoft-sql-server
This documentation describes the integration of MindsDB with Microsoft SQL Server, a relational database management system developed by Microsoft.
The integration allows for advanced SQL functionalities, extending Microsoft SQL Server's capabilities with MindsDB's features.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
2. To connect Microsoft SQL Server to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
### Installation
The MSSQL handler supports two connection methods:
#### Option 1: Standard Connection (pymssql - Recommended)
```bash theme={null}
pip install mindsdb[mssql]
```
This installs `pymssql`, which provides native FreeTDS-based connections. Works on all platforms.
#### Option 2: ODBC Connection (pyodbc)
```bash theme={null}
pip install mindsdb[mssql-odbc]
```
This installs both `pymssql` and `pyodbc` for ODBC driver support.
**Additional requirements for ODBC:**
* **System ODBC libraries**: On Linux, install `unixodbc` and `unixodbc-dev`
```bash theme={null}
sudo apt-get install unixodbc unixodbc-dev
```
* **Microsoft ODBC Driver for SQL Server**:
* **Linux**:
```bash theme={null}
# Add Microsoft repository
curl https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
# Install ODBC Driver 18
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
```
* **macOS**: `brew install msodbcsql18`
* **Windows**: Download from [Microsoft](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server)
To verify installed drivers:
```bash theme={null}
odbcinst -q -d
```
## Connection
Establish a connection to your Microsoft SQL Server database from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE mssql_datasource
WITH ENGINE = 'mssql',
PARAMETERS = {
"host": "127.0.0.1",
"port": 1433,
"user": "sa",
"password": "password",
"database": "master"
};
```
Required connection parameters include the following:
* `user`: The username for the Microsoft SQL Server.
* `password`: The password for the Microsoft SQL Server.
* `host`: The hostname, IP address, or URL of the Microsoft SQL Server.
* `database`: The name of the Microsoft SQL Server database to connect to.
Optional connection parameters include the following:
* `port`: The port number for connecting to the Microsoft SQL Server. Default is 1433.
* `server`: The server name to connect to. Typically only used with named instances or Azure SQL Database.
### ODBC Connection
The handler also supports ODBC connections via `pyodbc` for advanced scenarios like Windows Authentication or specific driver requirements.
#### Setup
1. Install: `pip install mindsdb[mssql-odbc]`
2. Install system ODBC driver (see Installation section above)
Basic ODBC Connection:
```sql theme={null}
CREATE DATABASE mssql_odbc_datasource
WITH ENGINE = 'mssql',
PARAMETERS = {
"host": "127.0.0.1",
"port": 1433,
"user": "sa",
"password": "password",
"database": "master",
"driver": "ODBC Driver 18 for SQL Server" -- Specifying driver enables ODBC
};
```
ODBC-specific Parameters:
* `driver`: The ODBC driver name (e.g., "ODBC Driver 18 for SQL Server"). When specified, enables ODBC mode.
* `use_odbc`: Set to `true` to explicitly use ODBC. Optional if `driver` is specified. If set to `true` without a `driver`, the default driver `ODBC Driver 17 for SQL Server` is used.
* `encrypt`: Connection encryption: `"yes"` or `"no"`. Driver 18 defaults to `"yes"`.
* `trust_server_certificate`: Whether to trust self-signed certificates: `"yes"` or `"no"`.
* `connection_string_args`: Additional connection string arguments.
#### Example: Azure SQL Database with Encryption:
```sql theme={null}
CREATE DATABASE azure_sql_datasource
WITH ENGINE = 'mssql',
PARAMETERS = {
"host": "myserver.database.windows.net",
"port": 1433,
"user": "adminuser",
"password": "SecurePass123!",
"database": "mydb",
"driver": "ODBC Driver 18 for SQL Server",
"encrypt": "yes",
"trust_server_certificate": "no"
};
```
#### Example: Local Development (Self-Signed Certificate):
```sql theme={null}
CREATE DATABASE local_mssql
WITH ENGINE = 'mssql',
PARAMETERS = {
"host": "localhost",
"port": 1433,
"user": "sa",
"password": "YourStrong@Passw0rd",
"database": "testdb",
"driver": "ODBC Driver 18 for SQL Server",
"encrypt": "yes",
"trust_server_certificate": "yes" -- Allow self-signed certs
};
```
## Usage
Retrieve data from a specified table by providing the integration name, schema, and table name:
```sql theme={null}
SELECT *
FROM mssql_datasource.schema_name.table_name
LIMIT 10;
```
Run T-SQL queries directly on the connected Microsoft SQL Server database:
```sql theme={null}
SELECT * FROM mssql_datasource (
--Native Query Goes Here
SELECT
SUM(orderqty) total
FROM Product p JOIN SalesOrderDetail sd ON p.productid = sd.productid
JOIN SalesOrderHeader sh ON sd.salesorderid = sh.salesorderid
JOIN Customer c ON sh.customerid = c.customerid
WHERE (Name = 'Racing Socks, L') AND (companyname = 'Riding Cycles');
);
```
The above examples utilize `mssql_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
### Performance Optimization for Large Datasets
The handler is optimized for efficient data processing, but for very large result sets (millions of rows):
1. **Use SQL Server's filtering**: Apply `WHERE` clauses to filter data on the server side
2. **Use pagination**: Use `TOP`/`OFFSET-FETCH` in SQL Server or `LIMIT` in MindsDB queries
3. **Aggregate when possible**: Use `GROUP BY`, `COUNT()`, `AVG()`, etc. to reduce data volume
4. **Index your tables**: Ensure proper indexes on SQL Server for query performance
**Example - Paginated Query:**
```sql theme={null}
SELECT * FROM mssql_datasource (
SELECT TOP 100000 *
FROM large_table
ORDER BY id
OFFSET 0 ROWS
);
```
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Microsoft SQL Server database.
* **Checklist**:
1. Make sure the Microsoft SQL Server is active.
2. Confirm that host, port, user, and password are correct. Try a direct Microsoft SQL Server connection using a client like SQL Server Management Studio or DBeaver.
3. Ensure a stable network between MindsDB and Microsoft SQL Server.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
`ODBC Driver Connection Error`
* **Symptoms**: Errors like "Driver not found", "Can't open lib 'ODBC Driver 17 for SQL Server'", or "pyodbc is not installed".
* **Checklist**:
1. **Verify pyodbc is installed**: `pip list | grep pyodbc`
2. **Check system ODBC libraries**: `ldconfig -p | grep odbc` (Linux) should show libodbc.so
3. **Verify ODBC drivers**: Run `odbcinst -q -d` to list installed drivers
4. **Match driver name exactly**: Use the exact name from `odbcinst -q -d` (case-sensitive)
5. **For Driver 18 encryption errors**: Add `"encrypt": "yes", "trust_server_certificate": "yes"` for local/dev servers
6. **Test connection manually**:
```python theme={null}
import pyodbc
print(pyodbc.drivers()) # Should list available drivers
```
# MonetDB
Source: https://docs.mindsdb.com/integrations/data-integrations/monetdb
This is the implementation of the MonetDB data handler for MindsDB.
[MonetDB](https://www.monetdb.org/) is an open-source column-oriented relational database management system originally developed at the Centrum Wiskunde & Informatica in the Netherlands. It is designed to provide high performance on complex queries against large databases, such as combining tables with hundreds of columns and millions of rows.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect MonetDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to MonetDB.
## Implementation
This handler is implemented using `pymonetdb`, a Python library that allows you to use Python code to run SQL commands on the MonetDB database.
The required arguments to establish a connection are as follows:
* `user` is the username associated with the database.
* `password` is the password to authenticate your access.
* `host` is the host name or IP address.
* `port` is the port through which TCP/IP connection is to be made.
* `database` is the database name to be connected.
* `schema_name` is the schema name to get tables. It is optional and defaults to the current schema if not provided.
## Usage
In order to make use of this handler and connect to the MonetDB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE monetdb_datasource
WITH
engine = 'monetdb',
parameters = {
"user": "monetdb",
"password": "monetdb",
"host": "127.0.0.1",
"port": 50000,
"schema_name": "sys",
"database": "demo"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM monetdb_datasource.demo;
```
# MySQL
Source: https://docs.mindsdb.com/integrations/data-integrations/mysql
This documentation describes the integration of MindsDB with [MySQL](https://www.mysql.com/), a fast, reliable, and scalable open-source database.
The integration allows MindsDB to access data from MySQL and enhance MySQL with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect MySQL to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to MySQL from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/mysql_handler) as an engine.
```sql theme={null}
CREATE DATABASE mysql_conn
WITH ENGINE = 'mysql',
PARAMETERS = {
"host": "host-name",
"port": 3306,
"database": "db-name",
"user": "user-name",
"password": "password"
};
```
Or:
```sql theme={null}
CREATE DATABASE mysql_datasource
WITH
ENGINE = 'mysql',
PARAMETERS = {
"url": "mysql://user-name@host-name:3306"
};
```
Required connection parameters include the following:
* `user`: The username for the MySQL database.
* `password`: The password for the MySQL database.
* `host`: The hostname, IP address, or URL of the MySQL server.
* `port`: The port number for connecting to the MySQL server.
* `database`: The name of the MySQL database to connect to.
Or:
* `url`: You can specify a connection to MySQL Server using a URI-like string, as an alternative connection option.
Optional connection parameters include the following:
* `ssl`: Boolean parameter that indicates whether SSL encryption is enabled for the connection. Set to True to enable SSL and enhance connection security, or set to False to use the default non-encrypted connection.
* `ssl_ca`: Specifies the path to the Certificate Authority (CA) file in PEM format.
* `ssl_cert`: Specifies the path to the SSL certificate file. This certificate should be signed by a trusted CA specified in the `ssl_ca` file or be a self-signed certificate trusted by the server.
* `ssl_key`: Specifies the path to the private key file (in PEM format).
* `use_pure` (`True` by default): Whether to use pure Python or C Extension. If `use_pure=False` and the C Extension is not available, then Connector/Python will automatically fall back to the pure Python implementation.
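For instance, a connection that enables SSL and forces the pure-Python connector might look like this (a sketch; the paths are placeholders):
```sql theme={null}
CREATE DATABASE mysql_ssl_conn
WITH ENGINE = 'mysql',
PARAMETERS = {
    "host": "host-name",
    "port": 3306,
    "database": "db-name",
    "user": "user-name",
    "password": "password",
    "ssl": true,
    "ssl_ca": "/path/to/ca.pem",
    "use_pure": true
};
```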
## Usage
The following usage examples utilize the connection to MySQL made via the `CREATE DATABASE` statement and named `mysql_conn`.
Retrieve data from a specified table by providing the integration and table name.
```sql theme={null}
SELECT *
FROM mysql_conn.table_name
LIMIT 10;
```
**Next Steps**
Follow [this tutorial](https://docs.mindsdb.com/use-cases/data_enrichment/text-summarization-inside-mysql-with-openai) to see more use case examples.
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the MySQL database.
* **Checklist**:
1. Ensure that the MySQL server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try a direct MySQL connection.
3. Test the network connection between the MindsDB host and the MySQL server.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces, reserved words or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
# OceanBase
Source: https://docs.mindsdb.com/integrations/data-integrations/oceanbase
This is the implementation of the OceanBase data handler for MindsDB.
OceanBase is a distributed relational database. It is the only distributed database in the world that has broken both TPC-C and TPC-H records. OceanBase adopts an independently developed integrated architecture, which encompasses both the scalability of a distributed architecture and the performance advantage of a centralized architecture. It supports hybrid transaction/analytical processing (HTAP) with one engine. Its features include strong data consistency, high availability, high performance, online scalability, high compatibility with SQL and mainstream relational databases, transparency to applications, and a high cost/performance ratio.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect OceanBase to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to OceanBase.
## Implementation
This handler is implemented by extending the MySQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the OceanBase server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE oceanbase_datasource
WITH
ENGINE = 'oceanbase',
PARAMETERS = {
"host": "127.0.0.1",
"user": "oceanbase_user",
"password": "password",
"port": 2881,
"database": "oceanbase_db"
};
```
Now, you can use this established connection to query your database as follows:
```sql theme={null}
SELECT *
FROM oceanbase_datasource.demo_table
LIMIT 10;
```
# OpenGauss
Source: https://docs.mindsdb.com/integrations/data-integrations/opengauss
This is the implementation of the OpenGauss data handler for MindsDB.
[OpenGauss](https://opengauss.org/en/) is an open-source relational database management system released under the Mulan PSL v2, with a kernel built on Huawei's years of experience in the database field. It continuously provides competitive features tailored to enterprise-grade scenarios.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect OpenGauss to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to OpenGauss.
## Implementation
This handler is implemented by extending the PostgreSQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the OpenGauss database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE opengauss_datasource
WITH
ENGINE = 'opengauss',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "opengauss",
"user": "mindsdb",
"password": "password"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM opengauss_datasource.demo_table
LIMIT 10;
```
# Oracle
Source: https://docs.mindsdb.com/integrations/data-integrations/oracle
This documentation describes the integration of MindsDB with [Oracle](https://www.techopedia.com/definition/8711/oracle-database), one of the most trusted and widely used relational database engines for storing, organizing and retrieving data by type while still maintaining relationships between the various types.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Oracle to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to your Oracle database from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE oracle_datasource
WITH
ENGINE = 'oracle',
PARAMETERS = {
"host": "localhost",
"service_name": "FREEPDB1",
"user": "SYSTEM",
"password": "password"
};
```
Required connection parameters include the following:
* `user`: The username for the Oracle database.
* `password`: The password for the Oracle database.
* `dsn`: The data source name (DSN) for the Oracle database (see the example after the parameter lists).
Or, instead of `dsn`:
* `host`: The hostname, IP address, or URL of the Oracle server, together with one of the following:
* `sid`: The system identifier (SID) of the Oracle database.
* `service_name`: The service name of the Oracle database.
Optional connection parameters include the following:
* `port`: The port number for connecting to the Oracle database. Default is 1521.
* `disable_oob`: The boolean parameter to disable out-of-band breaks. Default is `false`.
* `auth_mode`: The authorization mode to use.
* `thick_mode`: Set to `true` to use thick mode for the connection. Thin mode is used by default.
* `oracle_client_lib_dir`: The directory path where Oracle Client libraries are located. Required if `thick_mode` is set to `true`.
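For instance, a `dsn`-based connection might look like the following sketch; the DSN value is a placeholder in Oracle's Easy Connect format (`host:port/service_name`):
```sql theme={null}
CREATE DATABASE oracle_dsn_datasource
WITH
ENGINE = 'oracle',
PARAMETERS = {
    "dsn": "localhost:1521/FREEPDB1",
    "user": "SYSTEM",
    "password": "password"
};
```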
## Usage
Retrieve data from a specified table by providing the integration name, schema, and table name:
```sql theme={null}
SELECT *
FROM oracle_datasource.schema_name.table_name
LIMIT 10;
```
Run PL/SQL queries directly on the connected Oracle database:
```sql theme={null}
SELECT * FROM oracle_datasource (
--Native Query Goes Here
SELECT employee_id, first_name, last_name, email, hire_date
FROM oracle_datasource.hr.employees
WHERE department_id = 10
ORDER BY hire_date DESC;
);
```
The above examples utilize `oracle_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Oracle database.
* **Checklist**:
1. Make sure the Oracle database is active.
2. Confirm that the connection parameters provided (DSN, host, SID, service\_name) and the credentials (user, password) are correct.
3. Ensure a stable network between MindsDB and Oracle.
* **Symptoms**: Connection timeout errors.
* **Checklist**:
1. Verify that the Oracle database is reachable from the MindsDB server.
2. Check for any firewall or network restrictions that might be causing delays.
* **Symptoms**: Can't connect to db: Failed to initialize Oracle client: DPI-1047: Cannot locate a 64-bit Oracle Client library:
* **Checklist**:
1. Ensure that the Oracle Client libraries are installed on the MindsDB server.
2. Verify that the `oracle_client_lib_dir` parameter is set correctly in the connection configuration.
3. Check that the installed Oracle Client libraries match the architecture (64-bit) of the MindsDB server.
This [troubleshooting guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/ntqrf/database-connection-issues.html) provided by Oracle might also be helpful.
# OrioleDB
Source: https://docs.mindsdb.com/integrations/data-integrations/orioledb
This is the implementation of the OrioleDB data handler for MindsDB.
[OrioleDB](https://www.orioledata.com/) is a new storage engine for PostgreSQL, bringing a modern approach to database capacity, capabilities, and performance to the world's most-loved database platform. It consists of an extension, building on the innovative table access method framework and other standard Postgres extension interfaces. By extending and enhancing the current table access methods, OrioleDB opens the door to a future of more powerful storage models that are optimized for cloud and modern hardware architectures.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect OrioleDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to OrioleDB.
## Implementation
This handler is implemented by extending the PostgreSQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `server` is the OrioleDB server.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the OrioleDB server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE orioledb_datasource
WITH ENGINE = 'orioledb',
PARAMETERS = {
"user": "orioledb_user",
"password": "password",
"host": "127.0.0.1",
"port": 55505,
"server": "server_name",
"database": "oriole_db"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM orioledb_datasource.demo_table
LIMIT 10;
```
# PlanetScale
Source: https://docs.mindsdb.com/integrations/data-integrations/planetscale
This is the implementation of the PlanetScale data handler for MindsDB.
[PlanetScale](https://planetscale.com/) is a MySQL-compatible, serverless database platform.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect PlanetScale to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to PlanetScale.
## Implementation
This handler is implemented by extending the MySQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the PlanetScale database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE planetscale_datasource
WITH
ENGINE = 'planet_scale',
PARAMETERS = {
"host": "127.0.0.1",
"port": 3306,
"user": "planetscale_user",
"password": "password",
"database": "planetscale_db"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM planetscale_datasource.my_table;
```
# PostgreSQL
Source: https://docs.mindsdb.com/integrations/data-integrations/postgresql
This documentation describes the integration of MindsDB with [PostgreSQL](https://www.postgresql.org/), a powerful, open-source, object-relational database system.
The integration allows MindsDB to access data stored in the PostgreSQL database and enhance PostgreSQL with AI capabilities.
This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.
### Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect PostgreSQL to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to your PostgreSQL database from MindsDB by executing the following SQL command:
```sql theme={null}
CREATE DATABASE postgresql_conn
WITH ENGINE = 'postgres',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "postgres",
"user": "postgres",
"schema": "data",
"password": "password"
};
```
Required connection parameters include the following:
* `user`: The username for the PostgreSQL database.
* `password`: The password for the PostgreSQL database.
* `host`: The hostname, IP address, or URL of the PostgreSQL server.
* `port`: The port number for connecting to the PostgreSQL server.
* `database`: The name of the PostgreSQL database to connect to.
Optional connection parameters include the following:
* `schema`: The database schema to use. Default is `public`.
* `sslmode`: The SSL mode for the connection.
* `connection_parameters`: Allows passing any PostgreSQL libpq parameters (see the example after this list), such as:
  * SSL settings: `sslrootcert`, `sslcert`, `sslkey`, `sslcrl`, `sslpassword`
  * Network and reliability options: `connect_timeout`, `keepalives`, `keepalives_idle`, `keepalives_interval`, `keepalives_count`
  * Session options: `application_name`, `options`, `client_encoding`
  * Any other libpq-supported parameter
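As a sketch, passing libpq options through `connection_parameters` might look as follows, assuming the parameter accepts a JSON object of libpq settings; the values are illustrative:
```sql theme={null}
CREATE DATABASE postgresql_tuned_conn
WITH ENGINE = 'postgres',
PARAMETERS = {
    "host": "127.0.0.1",
    "port": 5432,
    "database": "postgres",
    "user": "postgres",
    "password": "password",
    "connection_parameters": {
        "connect_timeout": 10,
        "keepalives": 1,
        "application_name": "mindsdb"
    }
};
```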
## Usage
The following usage examples utilize the connection to PostgreSQL made via the `CREATE DATABASE` statement and named `postgresql_conn`.
Retrieve data from a specified table by providing the integration name, schema, and table name:
```sql theme={null}
SELECT *
FROM postgresql_conn.table_name
LIMIT 10;
```
Run PostgreSQL-native queries directly on the connected PostgreSQL database:
```sql theme={null}
SELECT * FROM postgresql_conn (
--Native Query Goes Here
SELECT
model,
COUNT(*) OVER (PARTITION BY model, year) AS units_to_sell,
ROUND((CAST(tax AS decimal) / price), 3) AS tax_div_price
FROM used_car_price
);
```
**Next Steps**
Follow [this tutorial](https://docs.mindsdb.com/use-cases/predictive_analytics/house-sales-forecasting) to see more use case examples.
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the PostgreSQL database.
* **Checklist**:
1. Make sure the PostgreSQL server is active.
2. Confirm that host, port, user, schema, and password are correct. Try a direct PostgreSQL connection.
3. Ensure a stable network between MindsDB and PostgreSQL.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
# QuestDB
Source: https://docs.mindsdb.com/integrations/data-integrations/questdb
This is the implementation of the QuestDB data handler for MindsDB.
[QuestDB](https://questdb.io/) is a columnar time-series database with high performance ingestion and SQL analytics. It is open-source and available on the cloud.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect QuestDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to QuestDB.
## Implementation
This handler is implemented by extending the PostgreSQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
* `public` stores a value of `True` or `False`. Defaults to `True` if left blank.
## Usage
In order to make use of this handler and connect to the QuestDB server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE questdb_datasource
WITH
ENGINE = 'questdb',
PARAMETERS = {
"host": "127.0.0.1",
"port": 8812,
"database": "qdb",
"user": "admin",
"password": "password"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM questdb_datasource.demo_table
LIMIT 10;
```
# SAP HANA
Source: https://docs.mindsdb.com/integrations/data-integrations/sap-hana
This documentation describes the integration of MindsDB with [SAP HANA](https://www.sap.com/products/technology-platform/hana/what-is-sap-hana.html), a multi-model database with a column-oriented in-memory design that stores data in its memory instead of keeping it on a disk.
The integration allows MindsDB to access data from SAP HANA and enhance SAP HANA with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect SAP HANA to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to SAP HANA from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/hana_handler) as an engine.
```sql theme={null}
CREATE DATABASE sap_hana_datasource
WITH
ENGINE = 'hana',
PARAMETERS = {
"address": "123e4567-e89b-12d3-a456-426614174000.hana.trial-us10.hanacloud.ondemand.com",
"port": "443",
"user": "demo_user",
"password": "demo_password",
"encrypt": true
};
```
Required connection parameters include the following:
* `address`: The hostname, IP address, or URL of the SAP HANA database.
* `port`: The port number for connecting to the SAP HANA database.
* `user`: The username for the SAP HANA database.
* `password`: The password for the SAP HANA database.
Optional connection parameters include the following:
* `database`: The name of the database to connect to. This parameter is not used for SAP HANA Cloud.
* `schema`: The database schema to use. Defaults to the user's default schema (see the example after this list).
* `encrypt`: The setting to enable or disable encryption. Defaults to `true`.
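For instance, pinning the connection to a specific schema might look like this sketch; the schema name is illustrative:
```sql theme={null}
CREATE DATABASE sap_hana_datasource
WITH
ENGINE = 'hana',
PARAMETERS = {
    "address": "123e4567-e89b-12d3-a456-426614174000.hana.trial-us10.hanacloud.ondemand.com",
    "port": "443",
    "user": "demo_user",
    "password": "demo_password",
    "encrypt": true,
    "schema": "DEMO_SCHEMA"
};
```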
## Usage
Retrieve data from a specified table by providing the integration, schema and table names:
```sql theme={null}
SELECT *
FROM sap_hana_datasource.schema_name.table_name
LIMIT 10;
```
Run SAP HANA SQL queries directly on the connected SAP HANA database:
```sql theme={null}
SELECT * FROM sap_hana_datasource (
--Native Query Goes Here
SELECT customer, year, SUM(sales)
FROM t1
GROUP BY ROLLUP(customer, year);
SELECT customer, year, SUM(sales)
FROM t1
GROUP BY GROUPING SETS
(
(customer, year),
(customer)
)
UNION ALL
SELECT NULL, NULL, SUM(sales)
FROM t1;
);
```
The above examples utilize `sap_hana_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the SAP HANA database.
* **Checklist**:
1. Make sure the SAP HANA database is active.
2. Confirm that address, port, user and password are correct. Try a direct connection using a client like DBeaver.
3. Ensure a stable network between MindsDB and SAP HANA.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel-data
* Incorrect: SELECT \* FROM integration.'travel-data'
* Correct: SELECT \* FROM integration.\`travel-data\`
# SAP SQL Anywhere
Source: https://docs.mindsdb.com/integrations/data-integrations/sap-sql-anywhere
This is the implementation of the SAP SQL Anywhere data handler for MindsDB.
[SAP SQL Anywhere](https://www.sap.com/products/technology-platform/sql-anywhere.html) is an embedded database for application software that enables secure and reliable data management for servers where no DBA is available and synchronization for tens of thousands of mobile devices, Internet of Things (IoT) systems, and remote environments.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect SAP SQL Anywhere to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to SAP SQL Anywhere.
## Implementation
This handler is implemented using `sqlanydb`, the Python driver for SAP SQL Anywhere.
The required arguments to establish a connection are as follows:
* `host` is the host name or IP address of the SAP SQL Anywhere instance.
* `port` is the port number of the SAP SQL Anywhere instance.
* `user` specifies the user name.
* `password` specifies the password for the user.
* `database` sets the current database.
* `server` sets the current server.
## Usage
You can use the below SQL statements to create a table in SAP SQL Anywhere called `TEST`.
```sql theme={null}
CREATE TABLE TEST
(
ID INTEGER NOT NULL,
NAME NVARCHAR(1),
DESCRIPTION NVARCHAR(1)
);
CREATE UNIQUE INDEX TEST_ID_INDEX
ON TEST (ID);
ALTER TABLE TEST
ADD CONSTRAINT TEST_PK
PRIMARY KEY (ID);
INSERT INTO TEST
VALUES (1, 'h', 'w');
```
In order to make use of this handler and connect to the SAP SQL Anywhere database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE sap_sqlany_trial
WITH
ENGINE = 'sqlany',
PARAMETERS = {
"user": "DBADMIN",
"password": "password",
"host": "localhost",
"port": "55505",
"server": "TestMe",
"database": "MINDSDB"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM sap_sqlany_trial.test;
```
On execution, we get:
| ID | NAME | DESCRIPTION |
| -- | ---- | ----------- |
| 1 | h | w |
# ScyllaDB
Source: https://docs.mindsdb.com/integrations/data-integrations/scylladb
This is the implementation of the ScyllaDB data handler for MindsDB.
[ScyllaDB](https://www.scylladb.com/) is an open-source distributed NoSQL wide-column data store. It was purposefully designed to offer compatibility with Apache Cassandra while outperforming it with higher throughputs and reduced latencies. For a comprehensive understanding of ScyllaDB, visit ScyllaDB's official website.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect ScyllaDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to ScyllaDB.
### Implementation
The ScyllaDB handler for MindsDB was developed using the `scylla-driver` library for Python.
The required arguments to establish a connection are as follows:
* `host`: Host name or IP address of ScyllaDB.
* `port`: Connection port.
* `user`: Authentication username. Optional; required only if authentication is enabled.
* `password`: Authentication password. Optional; required only if authentication is enabled.
* `keyspace`: The specific keyspace (top-level container for tables) to connect to.
* `protocol_version`: Optional. Defaults to 4.
* `secure_connect_bundle`: Optional. Needed only for connections to DataStax Astra.
## Usage
To set up a connection between MindsDB and a Scylla server, utilize the following SQL syntax:
```sql theme={null}
CREATE DATABASE scylladb_datasource
WITH
ENGINE = 'scylladb',
PARAMETERS = {
"user": "user@mindsdb.com",
"password": "pass",
"host": "127.0.0.1",
"port": "9042",
"keyspace": "test_data"
};
```
The protocol version is set to 4 by default. To modify it, include `"protocol_version": 5` within the `PARAMETERS` dictionary in the query above, as shown in the sketch below.
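A minimal sketch, reusing the connection parameters from above:
```sql theme={null}
CREATE DATABASE scylladb_datasource
WITH
ENGINE = 'scylladb',
PARAMETERS = {
    "user": "user@mindsdb.com",
    "password": "pass",
    "host": "127.0.0.1",
    "port": "9042",
    "keyspace": "test_data",
    "protocol_version": 5
};
```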
With the connection established, you can execute queries on your keyspace as demonstrated below:
```sql theme={null}
SELECT * FROM scylladb_datasource.test_data.example_table LIMIT 10;
```
# SingleStore
Source: https://docs.mindsdb.com/integrations/data-integrations/singlestore
This is the implementation of the SingleStore data handler for MindsDB.
[SingleStore](https://www.singlestore.com/) is a proprietary, cloud-native database designed for data-intensive applications. It is a distributed, relational SQL database management system featuring ANSI SQL support and is known for its speed in data ingest, transaction processing, and query processing.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect SingleStore to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to SingleStore.
## Implementation
This handler is implemented by extending the MySQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
There are several optional arguments that can be used as well (see the example after this list).
* `ssl` indicates whether SSL is enabled (`True`) or disabled (`False`).
* `ssl_ca` is the SSL Certificate Authority.
* `ssl_cert` stores SSL certificates.
* `ssl_key` stores SSL keys.
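As a sketch, an SSL-enabled connection might look as follows; the certificate paths are placeholders:
```sql theme={null}
CREATE DATABASE singlestore_datasource
WITH
ENGINE = 'singlestore',
PARAMETERS = {
    "host": "127.0.0.1",
    "port": 3306,
    "database": "singlestore",
    "user": "root",
    "password": "password",
    "ssl": true,
    "ssl_ca": "/path/to/ca.pem",
    "ssl_cert": "/path/to/client-cert.pem",
    "ssl_key": "/path/to/client-key.pem"
};
```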
## Usage
In order to make use of this handler and connect to the SingleStore database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE singlestore_datasource
WITH
ENGINE = 'singlestore',
PARAMETERS = {
"host": "127.0.0.1",
"port": 3306,
"database": "singlestore",
"user": "root",
"password": "password"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM singlestore_datasource.example_table;
```
# Snowflake
Source: https://docs.mindsdb.com/integrations/data-integrations/snowflake
This documentation describes the integration of MindsDB with [Snowflake](https://www.snowflake.com/en/), a cloud data warehouse used to store and analyze data.
The integration allows MindsDB to access data stored in the Snowflake database and enhance it with AI capabilities.
**Important!**
When querying data from Snowflake, MindsDB automatically converts column names to lower-case. To prevent this, users can provide an alias name as shown below.
**This update is introduced with the MindsDB version 25.3.4.1. It is not backward-compatible and has the following implications:**
1. Queries to Snowflake will return column names in lower-case from now on.
2. The models created with Snowflake as a data source must be recreated.
**How it works**
The below query presents how Snowflake columns are output when queried from MindsDB.
```sql theme={null}
SELECT
CC_NAME, -- converted to lower-case
CC_CLASS AS `CC_CLASS`, -- provided alias name in upper-case
CC_EMPLOYEES,
cc_employees
FROM snowflake_data.TPCDS_SF100TCL.CALL_CENTER;
```
Here is the output:
```sql theme={null}
+--------------+----------+--------------+--------------+
| cc_name | CC_CLASS | cc_employees | cc_employees |
+--------------+----------+--------------+--------------+
| NY Metro | large | 597159671 | 597159671 |
| Mid Atlantic | medium | 944879074 | 944879074 |
+--------------+----------+--------------+--------------+
```
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Snowflake to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
The Snowflake handler supports two authentication methods:
### 1. Password Authentication (Legacy)
Establish a connection using username and password:
```sql theme={null}
CREATE DATABASE snowflake_datasource
WITH
ENGINE = 'snowflake',
PARAMETERS = {
"account": "tvuibdy-vm85921",
"user": "your_username",
"password": "your_password",
"database": "test_db",
"auth_type": "password"
};
```
### 2. Key Pair Authentication (Recommended)
Key pair authentication is more secure and is the recommended method by Snowflake:
```sql theme={null}
CREATE DATABASE snowflake_datasource
WITH
ENGINE = 'snowflake',
PARAMETERS = {
"account": "tvuibdy-vm85921",
"user": "your_username",
"private_key_path": "/path/to/your/private_key.pem",
"database": "test_db",
"auth_type": "key_pair"
};
```
If the private key cannot be accessed from disk (for example, when running MindsDB in the cloud), provide the PEM content directly:
```sql theme={null}
CREATE DATABASE snowflake_datasource
WITH
ENGINE = 'snowflake',
PARAMETERS = {
"account": "tvuibdy-vm85921",
"user": "your_username",
"private_key": "-----BEGIN PRIVATE KEY-----\\n...\\n-----END PRIVATE KEY-----",
"database": "test_db",
"auth_type": "key_pair"
};
```
With encrypted private key (passphrase protected):
```sql theme={null}
CREATE DATABASE snowflake_datasource
WITH
ENGINE = 'snowflake',
PARAMETERS = {
"account": "tvuibdy-vm85921",
"user": "your_username",
"private_key_path": "/path/to/your/private_key.pem",
"private_key_passphrase": "your_passphrase",
"database": "test_db",
"auth_type": "key_pair"
};
```
### Connection Parameters
Required parameters:
* `account`: The Snowflake account identifier. This [guide](https://docs.snowflake.com/en/user-guide/admin-account-identifier) will help you find your account identifier.
* `user`: The username for the Snowflake account.
* `database`: The name of the Snowflake database to connect to.
* `auth_type`: The authentication type to use. Options: `"password"` or `"key_pair"`.
Authentication parameters (one method required):
* `password`: The password for the Snowflake account (password authentication).
* `private_key_path`: Path to the private key file for key pair authentication.
* `private_key`: PEM-formatted private key content for key pair authentication.
* `private_key_passphrase`: Optional passphrase for encrypted private key (key pair authentication).
Optional parameters (see the example after this list):
* `warehouse`: The Snowflake warehouse to use for running queries.
* `schema`: The database schema to use within the Snowflake database. Default is `PUBLIC`.
* `role`: The Snowflake role to use.
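For instance, a password-based connection that also sets the optional warehouse, schema, and role might look like this sketch; the warehouse and role names are illustrative:
```sql theme={null}
CREATE DATABASE snowflake_datasource
WITH
ENGINE = 'snowflake',
PARAMETERS = {
    "account": "tvuibdy-vm85921",
    "user": "your_username",
    "password": "your_password",
    "database": "test_db",
    "auth_type": "password",
    "warehouse": "compute_wh",
    "schema": "PUBLIC",
    "role": "analyst"
};
```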
For detailed instructions on setting up key pair authentication, please refer to the [Snowflake Key Pair Authentication documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth.html).
## Usage
Retrieve data from a specified table by providing the integration name, schema, and table name:
```sql theme={null}
SELECT *
FROM snowflake_datasource.schema_name.table_name
LIMIT 10;
```
Run Snowflake SQL queries directly on the connected Snowflake database:
```sql theme={null}
SELECT * FROM snowflake_datasource (
--Native Query Goes Here
SELECT
employee_table.* EXCLUDE department_id,
department_table.* RENAME department_name AS department
FROM employee_table INNER JOIN department_table
ON employee_table.department_id = department_table.department_id
ORDER BY department, last_name, first_name;
);
```
The above examples utilize `snowflake_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting Guide
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Snowflake account.
* **Checklist**:
1. Make sure the Snowflake account is active.
2. Confirm that account, user, password and database are correct. Try a direct Snowflake connection using a client like DBeaver.
3. Ensure a stable network between MindsDB and Snowflake.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
This [troubleshooting guide](https://community.snowflake.com/s/article/Snowflake-Client-Connectivity-Troubleshooting) provided by Snowflake might also be helpful.
# SQLite
Source: https://docs.mindsdb.com/integrations/data-integrations/sqlite
This is the implementation of the SQLite data handler for MindsDB.
[SQLite](https://www.sqlite.org/about.html) is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. The code for SQLite is in the public domain and is thus free to use for any commercial or private purpose. SQLite is the most widely deployed database in the world, with more applications than we can count, including several high-profile projects.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect SQLite to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to SQLite.
## Implementation
This handler is implemented using the standard `sqlite3` library that comes with Python.
The only required argument to establish a connection is `db_file`, which points to the database file to connect to.
Optionally, this may be set to `:memory:` to create an in-memory database (see the example at the end of this section).
## Usage
In order to make use of this handler and connect to the SQLite database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE sqlite_datasource
WITH
engine = 'sqlite',
parameters = {
"db_file": "example.db"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM sqlite_datasource.example_tbl;
```
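As noted above, `db_file` may also be set to `:memory:` to work with an in-memory database; a minimal sketch:
```sql theme={null}
CREATE DATABASE sqlite_memory_datasource
WITH
engine = 'sqlite',
parameters = {
    "db_file": ":memory:"
};
```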
# StarRocks
Source: https://docs.mindsdb.com/integrations/data-integrations/starrocks
This is the implementation of the StarRocks data handler for MindsDB.
[StarRocks](https://www.starrocks.io/) is the next-generation data platform designed to make data-intensive real-time analytics fast and easy. It delivers query speeds 5 to 10 times faster than other popular solutions. StarRocks can perform real-time analytics well while updating historical records. It can also enhance real-time analytics with historical data from data lakes easily. With StarRocks, you can get rid of the de-normalized tables and get the best performance and flexibility.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect StarRocks to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to StarRocks.
## Implementation
This handler is implemented by extending the MySQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the StarRocks server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE starrocks_datasource
WITH
ENGINE = 'starrocks',
PARAMETERS = {
"host": "127.0.0.1",
"user": "starrocks_user",
"password": "password",
"port": 8030,
"database": "starrocks_db"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM starrocks_datasource.demo_table
LIMIT 10;
```
# Supabase
Source: https://docs.mindsdb.com/integrations/data-integrations/supabase
This is the implementation of the Supabase data handler for MindsDB.
[Supabase](https://supabase.com/) is an open-source Firebase alternative.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Supabase to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Supabase.
## Implementation
This handler is implemented by extending the PostgreSQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the Supabase server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE supabase_datasource
WITH ENGINE = 'supabase',
PARAMETERS = {
"host": "127.0.0.1",
"port": 54321,
"database": "test",
"user": "supabase",
"password": "password"
};
```
You can use this established connection to query your database as follows:
```sql theme={null}
SELECT *
FROM supabase_datasource.public.rentals
LIMIT 10;
```
# SurrealDB
Source: https://docs.mindsdb.com/integrations/data-integrations/surrealdb
This is the implementation of the SurrealDB data handler for MindsDB.
[SurrealDB](https://surrealdb.com/) is an innovative NewSQL cloud database, suitable for serverless applications, jamstack applications, single-page applications, and traditional applications.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect SurrealDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to SurrealDB.
## Implementation
This handler was implemented using the Python library `pysurrealdb`.
The required arguments to establish a connection are:
* `host`: the host name of the Surrealdb connection
* `port`: the port to use when connecting
* `user`: the user to authenticate
* `password`: the password to authenticate the user
* `database`: database name to be connected
* `namespace`: namespace name to be connected
## Usage
To establish a connection between the MindsDB public cloud instance and a SurrealDB server running locally, we are going to use `ngrok` tunneling to connect the cloud instance to the local SurrealDB server. You can follow this [guide](https://docs.mindsdb.com/sql/create/database#making-your-local-database-available-to-mindsdb) for that.
Let's make the connection with the MindsDB public cloud:
```sql theme={null}
CREATE DATABASE exampledb
WITH ENGINE = 'surrealdb',
PARAMETERS = {
"host": "6.tcp.ngrok.io",
"port": "17141",
"user": "root",
"password": "root",
"database": "testdb",
"namespace": "testns"
};
```
Please change the `host` and `port` properties in the `PARAMETERS` clause based on the values you got from the tunnel.
We can then query the `dev` table that we created:
```sql theme={null}
SELECT * FROM exampledb.dev;
```
# TDengine
Source: https://docs.mindsdb.com/integrations/data-integrations/tdengine
This is the implementation of the TDengine data handler for MindsDB.
[TDengine](https://tdengine.com/) is an open source, high-performance, cloud native time-series database optimized for Internet of Things (IoT), Connected Cars, and Industrial IoT. It enables efficient, real-time data ingestion, processing, and monitoring of TB and even PB scale data per day, generated by billions of sensors and data collectors. TDengine differentiates itself from other time-series databases with numerous advantages, such as high performance, simplified solution, cloud-native, ease of use, easy data analytics, and open-source.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect TDengine to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to TDengine.
## Implementation
This handler is implemented using `taos`/`taosrest`, a Python library that allows you to run SQL commands on the TDengine server.
The required arguments to establish a connection are as follows:
* `user` is the username associated with the server.
* `password` is the password to authenticate your access.
* `url` is the URL of the TDengine server. For a local server, the URL is `localhost:6041` by default.
* `token` is the unique token provided when using TDengine Cloud.
* `database` is the database name to be connected.
## Usage
In order to make use of this handler and connect to the TDengine database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE tdengine_datasource
WITH
ENGINE = 'tdengine',
PARAMETERS = {
"user": "tdengine_user",
"password": "password",
"url": "localhost:6041",
"token": "token",
"database": "tdengine_db"
};
```
You can specify `token` instead of `user` and `password` when using TDengine Cloud.
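A sketch of a token-based cloud connection; the URL is a placeholder for your TDengine Cloud instance:
```sql theme={null}
CREATE DATABASE tdengine_cloud_datasource
WITH
ENGINE = 'tdengine',
PARAMETERS = {
    "url": "your-instance.cloud.tdengine.com",
    "token": "token",
    "database": "tdengine_db"
};
```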
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM tdengine_datasource.demo_table;
```
# Teradata
Source: https://docs.mindsdb.com/integrations/data-integrations/teradata
This documentation describes the integration of MindsDB with [Teradata](https://www.teradata.com/why-teradata), the complete cloud analytics and data platform for Trusted AI.
The integration allows MindsDB to access data from Teradata and enhance Teradata with AI capabilities.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To connect Teradata to MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to Teradata from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/teradata_handler) as an engine.
```sql theme={null}
CREATE DATABASE teradata_datasource
WITH
ENGINE = 'teradata',
PARAMETERS = {
"host": "192.168.0.41",
"user": "demo_user",
"password": "demo_password",
"database": "example_db"
};
```
Required connection parameters include the following:
* `host`: The hostname, IP address, or URL of the Teradata server.
* `user`: The username for the Teradata database.
* `password`: The password for the Teradata database.
Optional connection parameters include the following:
* `database`: The name of the Teradata database to connect to. Default is the user's default database.
## Usage
Retrieve data from a specified table by providing the integration, database and table names:
```sql theme={null}
SELECT *
FROM teradata_datasource.database_name.table_name
LIMIT 10;
```
Run Teradata SQL queries directly on the connected Teradata database:
```sql theme={null}
SELECT * FROM teradata_datasource (
--Native Query Goes Here
SELECT emp_id, emp_name, job_duration AS tsp
FROM employee
EXPAND ON job_duration AS tsp BY INTERVAL '1' YEAR
FOR PERIOD(DATE '2006-01-01', DATE '2008-01-01');
);
```
The above examples utilize `teradata_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the Teradata database.
* **Checklist**:
1. Make sure the Teradata database is active.
2. Confirm that host, user and password are correct. Try a direct connection using a client like DBeaver.
3. Ensure a stable network between MindsDB and Teradata.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel-data
* Incorrect: SELECT \* FROM integration.'travel-data'
* Correct: SELECT \* FROM integration.\`travel-data\`
`Connection Timeout Error`
* **Symptoms**: Connection to the Teradata database times out or queries take too long to execute.
* **Checklist**:
1. Ensure the Teradata server is running and accessible (if the server has been idle for a long time, it may have shut down automatically).
# TiDB
Source: https://docs.mindsdb.com/integrations/data-integrations/tidb
This is the implementation of the TiDB data handler for MindsDB.
[TiDB](https://www.pingcap.com/tidb/) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing workloads. It is MySQL-compatible and can provide horizontal scalability, strong consistency, and high availability.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect TiDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to TiDB.
## Implementation
This handler is implemented by extending the MySQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the TiDB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE tidb_datasource
WITH
ENGINE = 'tidb',
PARAMETERS = {
"host": "127.0.0.1",
"port": 4000,
"database": "tidb",
"user": "root",
"password": "password"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM tidb_datasource.demo_table;
```
# TimescaleDB
Source: https://docs.mindsdb.com/integrations/data-integrations/timescaledb
This documentation describes the integration of MindsDB with [TimescaleDB](https://docs.timescale.com).
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect TimescaleDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection
Establish a connection to TimescaleDB from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/timescaledb_handler) as an engine.
```sql theme={null}
CREATE DATABASE timescaledb_datasource
WITH
engine = 'timescaledb',
parameters = {
"host": "examplehost.timescaledb.com",
"port": 5432,
"user": "example_user",
"password": "my_password",
"database": "tsdb"
};
```
Required connection parameters include the following:
* `user`: The username for the TimescaleDB database.
* `password`: The password for the TimescaleDB database.
* `host`: The hostname, IP address, or URL of the TimescaleDB server.
* `port`: The port number for connecting to the TimescaleDB server.
* `database`: The name of the TimescaleDB database to connect to.
Optional connection parameters include the following:
* `schema`: The database schema to use. Default is `public`.
## Usage
Before attempting to connect to a TimescaleDB server using MindsDB, ensure that it accepts incoming connections using [this guide](https://docs.timescale.com/latest/getting-started/setup/remote-connections/).
The following usage examples utilize the connection to TimescaleDB made via the `CREATE DATABASE` statement and named `timescaledb_datasource`.
Retrieve data from a specified table by providing the integration and table name:
```sql theme={null}
SELECT *
FROM timescaledb_datasource.sensor;
```
Run PostgreSQL-native queries directly on the connected TimescaleDB database:
```sql theme={null}
SELECT * FROM timescaledb_datasource (
--Native Query Goes Here
SELECT
model,
COUNT(*) OVER (PARTITION BY model, year) AS units_to_sell,
ROUND((CAST(tax AS decimal) / price), 3) AS tax_div_price
FROM used_car_price
);
```
## Troubleshooting
`Database Connection Error`
* **Symptoms**: Failure to connect MindsDB with the TimescaleDB database.
* **Checklist**:
1. Make sure the TimescaleDB server is active.
2. Confirm that host, port, user, schema, and password are correct. Try a direct TimescaleDB connection.
3. Ensure a stable network between MindsDB and TimescaleDB.
`SQL statement cannot be parsed by mindsdb_sql`
* **Symptoms**: SQL queries failing or not recognizing table names containing spaces or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT \* FROM integration.travel data
* Incorrect: SELECT \* FROM integration.'travel data'
* Correct: SELECT \* FROM integration.\`travel data\`
# Trino
Source: https://docs.mindsdb.com/integrations/data-integrations/trino
This is the implementation of the Trino data handler for MindsDB.
[Trino](https://trino.io/) is an open-source distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Trino to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Trino.
## Implementation
This handler is implemented using `pyhive`, a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
There are some optional arguments as follows:
* `auth` is the authentication method. Currently, only `basic` is supported.
* `http_scheme` takes the value of `http` by default. It can be set to `https` as well.
* `catalog` is the catalog.
* `schema` is the schema name.
* `with` defines the default WITH-clause (properties) for ALL tables. This parameter is experimental and might be changed or removed in a future release.
## Usage
In order to make use of this handler and connect to the Trino database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE trino_datasource
WITH
ENGINE = 'trino',
PARAMETERS = {
"host": "127.0.0.1",
"port": 443,
"auth": "basic",
"http_scheme": "https",
"user": "trino",
"password": "password",
"catalog": "default",
"schema": "test",
"with": "with (transactional = true)"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM trino_datasource.demo_table;
```
# Vertica
Source: https://docs.mindsdb.com/integrations/data-integrations/vertica
This is the implementation of the Vertica data handler for MindsDB.
The column-oriented [Vertica Analytics Platform](https://www.vertica.com/overview/) was designed to manage large, fast-growing volumes of data and to provide fast query performance for data warehouses and other query-intensive applications. The product claims to greatly improve query performance over traditional relational database systems and to provide high availability and exabyte scalability on commodity enterprise servers. Vertica runs on multiple cloud computing systems as well as on Hadoop nodes. Vertica's Eon Mode separates compute from storage, using S3 object storage and dynamic allocation of compute nodes.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Vertica to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Vertica.
## Implementation
This handler is implemented using `vertica-python`, a Python library that allows you to use Python code to run SQL commands on the Vertica database.
The required arguments to establish a connection are as follows:
* `user` is the username associated with the database.
* `password` is the password to authenticate your access.
* `host` is the host name or IP address of the server.
* `port` is the port through which TCP/IP connection is to be made.
* `database` is the database name to be connected.
* `schema_name` is the schema name to get tables from.
## Usage
In order to make use of this handler and connect to the Vertica database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE vertica_datasource
WITH
engine = 'vertica',
parameters = {
"user": "dbadmin",
"password": "password",
"host": "127.0.0.1",
"port": 5433,
"schema_name": "public",
"database": "VMart"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM vertica_datasource.TEST;
```
# Vitess
Source: https://docs.mindsdb.com/integrations/data-integrations/vitess
This is the implementation of the Vitess data handler for MindsDB.
[Vitess](https://vitess.io/) is a database solution for deploying, scaling, and managing large clusters of open-source database instances. It currently supports MySQL and Percona Server for MySQL. It's architected to run as effectively in a public or private cloud architecture as it does on dedicated hardware. It combines and extends many important SQL features with the scalability of a NoSQL database.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Vitess to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Vitess.
## Implementation
This handler is implemented by extending the MySQL data handler.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
## Usage
In order to make use of this handler and connect to the Vitess server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE vitess_datasource
WITH
ENGINE = "vitess",
PARAMETERS = {
"user": "root",
"password": "",
"host": "localhost",
"port": 33577,
"database": "commerce"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM vitess_datasource.product
LIMIT 10;
```
# YugabyteDB
Source: https://docs.mindsdb.com/integrations/data-integrations/yugabytedb
This is the implementation of the YugabyteDB data handler for MindsDB.
[YugabyteDB](https://www.yugabyte.com/) is a high-performance, cloud-native distributed SQL database that aims to support all PostgreSQL features. It is best fit for cloud-native OLTP (i.e. real-time, business-critical) applications that need absolute data correctness and require at least one of the following: scalability, high tolerance to failures, or globally-distributed deployments.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect YugabyteDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to YugabyteDB.
## Implementation
This handler is implemented using `psycopg2`, a Python library that allows you to use Python code to run SQL commands on the YugabyteDB database.
The required arguments to establish a connection are as follows:
* `user` is the database user.
* `password` is the database password.
* `host` is the host name, IP address, or URL.
* `port` is the port used to make TCP/IP connection.
* `database` is the database name.
* `schema` is the schema to which your table belongs.
## Usage
In order to make use of this handler and connect to the YugabyteDB database in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE yugabyte_datasource
WITH
engine = 'yugabyte',
parameters = {
"user": "admin",
"password": "1234",
"host": "127.0.0.1",
"port": 5433,
"database": "yugabyte",
"schema": "your_schema_name"
};
```
You can use this established connection to query your table as follows:
```sql theme={null}
SELECT *
FROM yugabyte_datasource.demo;
```
Note: If you are using YugabyteDB Cloud with the MindsDB Cloud website, you need to add the 3 static IPs of MindsDB Cloud listed below to the `allow IP list` to access it publicly.
```
18.220.205.95
3.19.152.46
52.14.91.162
```

# Data Integrations
Source: https://docs.mindsdb.com/integrations/data-overview
MindsDB integrates with numerous data sources, including databases, vector stores, and applications, making data available to AI models by connecting data sources to MindsDB.
**MindsDB supports Model Context Protocol (MCP)**
MindsDB is an MCP server that enables your MCP applications to answer questions over large-scale federated data. [Learn more here](/mcp/overview).
This section contains instructions on how to connect data sources to MindsDB.
Note that MindsDB doesn't store or copy your data. Instead, it fetches data directly from your connected sources each time you make a query, ensuring that any changes to the data are instantly reflected. This means your data remains in its original location, and MindsDB always works with the most up-to-date information.
### Applications
### Databases
### Vector Stores
If you don't find a data source of your interest, you can [request a feature here](https://github.com/mindsdb/mindsdb/issues/new?assignees=&labels=enhancement&projects=&template=feature_request_v2.yaml) or build a handler following [this instruction for data handlers](/contribute/data-handlers) and [this instruction for applications](/contribute/app-handlers).
**Metadata about data handlers and data sources**
**Data handlers** represent a raw implementation of the integration between MindsDB and a data source.
Here is how you can query for all the available data handlers used to connect data sources to MindsDB.
```sql theme={null}
SELECT *
FROM information_schema.handlers
WHERE type = 'data';
```
Or, alternatively:
```sql theme={null}
SHOW HANDLERS
WHERE type = 'data';
```
And here is how you can query for all the connected data sources:
```sql theme={null}
SELECT *
FROM information_schema.databases;
```
Or, alternatively:
```sql theme={null}
SHOW DATABASES;
```
# Upload CSV, XLSX, XLS files to MindsDB
Source: https://docs.mindsdb.com/integrations/files/csv-xlsx-xls
You can upload CSV, XLSX, and XLS files of any size to MindsDB that runs locally via [Docker](/setup/self-hosted/docker) or [pip](/contribute/install).
CSV, XLSX, and XLS files are stored in the form of tables inside MindsDB.
## Upload files
Follow the steps below to upload a file:
1. Click on the `Add` dropdown and choose `Upload file`.
2. Upload a file and provide a name used to access it within MindsDB.
3. Alternatively, upload a file as a link and provide a name used to access it within MindsDB.
## Query files
While a CSV file contains a single sheet, XLSX and XLS files may contain one or more sheets. Here is how to query data within MindsDB.
Query for the list of available sheets in the file uploaded under the name `my_file`.
```sql theme={null}
SELECT *
FROM files.my_file;
```
Query for the content of one of the sheets listed with the command above.
```sql theme={null}
SELECT *
FROM files.my_file.my_sheet;
```
# Upload JSON files to MindsDB
Source: https://docs.mindsdb.com/integrations/files/json
You can upload JSON files of any size to MindsDB that runs locally via [Docker](/setup/self-hosted/docker) or [pip](/contribute/install).
JSON files are converted into a table, if the JSON file structure allows for it. Otherwise, JSON files are stored similarly to text files.
Here is the sample format of a JSON file that can be uploaded to MindsDB:
```
[
{
"id": 1,
"name": "Alice",
"contact": {
"email": "alice@example.com",
"phone": "123-456-7890"
},
"address": {
"street": "123 Maple Street",
"city": "Wonderland",
"zip": "12345"
}
},
{
"id": 2,
"name": "Bob",
"contact": {
"email": "bob@example.com",
"phone": "987-654-3210"
},
"address": {
"street": "456 Oak Avenue",
"city": "Builderland",
"zip": "67890"
}
}
]
```
MindsDB converts it into a table where each row stores the high-level object.
| id | name | contact | address |
| --- | ----- | ---------------------------------------------------- | --------------------------------------------------------------- |
| 1 | Alice | {"email":"alice@example.com","phone":"123-456-7890"} | {"city":"Wonderland","street":"123 Maple Street","zip":"12345"} |
| 2 | Bob | {"email":"bob@example.com","phone":"987-654-3210"} | {"city":"Builderland","street":"456 Oak Avenue","zip":"67890"} |
You can extract the JSON fields from `contact` and `address` columns with the `json_extract` function.
```sql theme={null}
SELECT id,
name,
json_extract(contact, '$.email') AS email,
json_extract(address, '$.city') AS city
FROM files.json_file_name;
```
## Upload files
Follow the steps below to upload a file:
1. Click on the `Add` dropdown and choose `Upload file`.
2. Upload a file and provide a name used to access it within MindsDB.
3. Alternatively, upload a file as a link and provide a name used to access it within MindsDB.
## Query files
Here is how to query data within MindsDB.
Query for the content of the file uploaded under the name `my_file`.
```sql theme={null}
SELECT *
FROM files.my_file;
```
# Upload Parquet files to MindsDB
Source: https://docs.mindsdb.com/integrations/files/parquet
You can upload Parquet files of any size to MindsDB that runs locally via [Docker](/setup/self-hosted/docker) or [pip](/contribute/install).
Parquet files are stored in the form of a table inside MindsDB.
## Upload files
Follow the steps below to upload a file:
1. Click on the `Add` dropdown and choose `Upload file`.
2. Upload a file and provide a name used to access it within MindsDB.
3. Alternatively, upload a file as a link and provide a name used to access it within MindsDB.
## Query files
Here is how to query data within MindsDB.
Query for the content of the file uploaded under the name `my_file`.
```sql theme={null}
SELECT *
FROM files.my_file;
```
# Upload PDF files to MindsDB
Source: https://docs.mindsdb.com/integrations/files/pdf
You can upload PDF files of any size to MindsDB that runs locally via [Docker](/setup/self-hosted/docker) or [pip](/contribute/install).
Note that MindsDB supports only searchable PDFs, as opposed to scanned PDFs. These are stored in the form of a table inside MindsDB.
## Upload files
Follow the steps below to upload a file:
1. Click on the `Add` dropdown and choose `Upload file`.
2. Upload a file and provide a name used to access it within MindsDB.
## Query files
Here is how to query data within MindsDB.
Query for the content of the file uploaded under the name `my_file`.
```sql theme={null}
SELECT *
FROM files.my_file;
```
# Upload TXT files to MindsDB
Source: https://docs.mindsdb.com/integrations/files/txt
You can upload TXT files of any size to MindsDB that runs locally via [Docker](/setup/self-hosted/docker) or [pip](/contribute/install).
TXT files are divided into chunks and stored in multiple table cells. MindsDB uses the [TextLoader from LangChain](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.text.TextLoader.html) to load TXT files.
## Upload files
Follow the steps below to upload a file:
1. Click on the `Add` dropdown and choose `Upload file`.
2. Upload a file and provide a name used to access it within MindsDB.
## Query files
Here is how to query data within MindsDB.
Query for the content of the file uploaded under the name `my_file`.
```sql theme={null}
SELECT *
FROM files.my_file;
```
# Sample Database
Source: https://docs.mindsdb.com/integrations/sample-database
MindsDB provides a read-only PostgreSQL database pre-loaded with various datasets. These datasets are curated to cover a wide range of scenarios and use cases, allowing you to experiment with different features of MindsDB.
Our publicly accessible PostgreSQL database is designed for testing and playground purposes. By using these datasets, you can quickly get started with MindsDB, understand how it works, and see how it can be applied to real-world problems.
## Connection
To connect to our read-only PostgreSQL database and access the example datasets, use the following connection parameters:
```sql theme={null}
CREATE DATABASE postgresql_conn
WITH ENGINE = 'postgres',
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "samples.mindsdb.com",
"port": "5432",
"database": "demo",
"schema": "demo"
};
```
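Once connected, you can query any of the sample datasets listed below as regular tables. For example, to preview the `house_sales` dataset described further down this page:
```sql theme={null}
SELECT *
FROM postgresql_conn.house_sales
LIMIT 3;
```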
Below is the list of all available datasets as tables.
## Data Tables
### Fraud Detection Dataset
This `fraud_detection` table contains data on mobile money transactions where each step represents an hour of simulation.
| step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
| ---- | -------- | ------- | ----------- | ------------- | -------------- | ----------- | -------------- | -------------- | ------- | -------------- |
| 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 | 0 | 0 |
| 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 | 0 | 0 |
| 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 | 1 | 0 |
To learn more about this dataset, please visit [this link](https://www.kaggle.com/datasets/ealaxi/paysim1).
### Customer Churn Dataset
This `customer_churn` table contains data on U.S. bank customer churn.
| RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
| --------- | ---------- | -------- | ----------- | --------- | ------ | --- | ------ | -------- | ------------- | --------- | -------------- | --------------- | ------ |
| 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0 | 1 | 1 | 1 | 101348.88 | 1 |
| 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
To learn more about this dataset, please visit [this link](https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction).
### Customer Support Chat Dataset
This `customer_support_chat` table contains customer support chat data, which can be used for fine-tuning and domain adaptation of Large Language Models (LLMs).
| flags | instruction | category | intent | response |
| ----- | ----------------------------------------------------- | -------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| B | question about cancelling order Order Number | ORDER | cancel\_order | I've understood you have a question regarding canceling order Order Number, and I'm here to provide you with the information you need. Please go ahead and ask your question, and I'll do my best to assist you. |
| BQZ | i have a question about cancelling order Order Number | ORDER | cancel\_order | I've been informed that you have a question about canceling order Order Number. I'm here to assist you! Please go ahead and let me know what specific question you have, and I'll provide you with all the information and guidance you need. Your satisfaction is my top priority. |
To learn more about this dataset, please visit [this link](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset).
### Bank Customer Transactions Dataset
This `bank_customer_transactions` table contains customer transaction data along with demographic and shopping behavior information.
| Customer ID | Name | Surname | Gender | Birthdate | Transaction Amount | Date | Merchant Name | Category |
| ----------- | -------- | --------- | ------ | ---------- | ------------------ | ---------- | ---------------------- | -------- |
| 752858 | Sean | Rodriguez | F | 2002-10-20 | 35.47 | 2023-04-03 | Smith-Russell | Cosmetic |
| 26381 | Michelle | Phelps | | 1985-10-24 | 2552.72 | 2023-07-17 | Peck, Spence and Young | Travel |
| 305449 | Jacob | Williams | M | 1981-10-25 | 115.97 | 2023-09-20 | Steele Inc | Clothing |
To learn more about this dataset, please visit [this link](https://www.kaggle.com/datasets/bkcoban/customer-transactions).
### Telecom Customer Churn Dataset
This `telecom_customer_churn` table contains data on customer activities, preferences, and behaviors.
| age | gender | security\_no | region\_category | membership\_category | joining\_date | joined\_through\_referral | referral\_id | preferred\_offer\_types | medium\_of\_operation | internet\_option | last\_visit\_time | days\_since\_last\_login | avg\_time\_spent | avg\_transaction\_value | avg\_frequency\_login\_days | points\_in\_wallet | used\_special\_discount | offer\_application\_preference | past\_complaint | complaint\_status | feedback | churn\_risk\_score |
| --- | ------ | ------------ | ---------------- | -------------------- | ------------- | ------------------------- | ------------ | ----------------------- | --------------------- | ---------------- | ----------------- | ------------------------ | ---------------- | ----------------------- | --------------------------- | ------------------ | ----------------------- | ------------------------------ | --------------- | ------------------- | ------------------------ | ------------------ |
| 18 | F | XW0DQ7H | Village | Platinum Membership | 17-08-2017 | No | xxxxxxxx | Gift Vouchers/Coupons | ? | Wi-Fi | 16:08:02 | 17 | 300.63 | 53005.25 | 17 | 781.75 | Yes | Yes | No | Not Applicable | Products always in Stock | 0 |
| 32 | F | 5K0N3X1 | City | Premium Membership | 28-08-2017 | ? | CID21329 | Gift Vouchers/Coupons | Desktop | Mobile\_Data | 12:38:13 | 16 | 306.34 | 12838.38 | 10 | | Yes | No | Yes | Solved | Quality Customer Care | 0 |
| 44 | F | 1F2TCL3 | Town | No Membership | 11-11-2016 | Yes | CID12313 | Gift Vouchers/Coupons | Desktop | Wi-Fi | 22:53:21 | 14 | 516.16 | 21027 | 22 | 500.69 | No | Yes | Yes | Solved in Follow-up | Poor Website | 1 |
To learn more about this dataset, please visit [this link](https://huggingface.co/datasets/d0r1h/customer_churn).
### House Sales Dataset
This `house_sales` table contains data on houses sold throughout the years.
| saledate | ma | type | bedrooms | created\_at |
| ---------- | ------ | ----- | -------- | -------------------------- |
| 2007-09-30 | 441854 | house | 2 | 2007-02-02 15:41:51.922127 |
| 2007-12-31 | 441854 | house | 2 | 2007-02-23 22:36:08.540248 |
| 2008-03-31 | 441854 | house | 2 | 2007-02-25 19:23:52.585358 |
To learn more about this dataset, please visit [this link](https://www.kaggle.com/datasets/).
# ChromaDB
Source: https://docs.mindsdb.com/integrations/vector-db-integrations/chromadb
In this section, we present how to connect ChromaDB to MindsDB.
[ChromaDB](https://www.trychroma.com/) is the open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect ChromaDB to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to ChromaDB.
## Connection
This handler is implemented using the `chromadb` Python library.
To connect to a remote ChromaDB instance, use the following statement:
```sql theme={null}
CREATE DATABASE chromadb_datasource
WITH ENGINE = 'chromadb',
PARAMETERS = {
"host": "YOUR_HOST",
"port": YOUR_PORT,
"distance": "l2/cosine/ip" -- optional, default is cosine
};
```
The parameters are as follows:
* `host`: The host name or IP address of the ChromaDB instance.
* `port`: The TCP/IP port of the ChromaDB instance.
* `distance`: Optional. It defines how the distance between vectors is calculated. Available methods include `l2`, `cosine` (default), and `ip`, as [explained here](https://docs.trychroma.com/docs/collections/configure).
To connect to an in-memory ChromaDB instance, use the following statement:
```sql theme={null}
CREATE DATABASE chromadb_datasource
WITH ENGINE = "chromadb",
PARAMETERS = {
"persist_directory": "YOUR_PERSIST_DIRECTORY",
"distance": "l2/cosine/ip" -- optional
};
```
The parameters are as follows:
* `persist_directory`: The directory used for persisting data.
* `distance`: Optional. It defines how the distance between vectors is calculated. Available methods include `l2`, `cosine` (default), and `ip`, as [explained here](https://docs.trychroma.com/docs/collections/configure).
## Usage
Now, you can use the established connection to create a collection (or table in the context of MindsDB) in ChromaDB and insert data into it:
```sql theme={null}
CREATE TABLE chromadb_datasource.test_embeddings (
    SELECT embeddings, '{"source": "fda"}' AS metadata
FROM mysql_datasource.test_embeddings
);
```
`mysql_datasource` is another MindsDB data source that has been created by connecting to a MySQL database. The `test_embeddings` table in the `mysql_datasource` data source contains the embeddings that we want to store in ChromaDB.
You can query your collection (table) as shown below:
```sql theme={null}
SELECT *
FROM chromadb_datasource.test_embeddings;
```
To filter the data in your collection (table) by metadata, you can use the following query:
```sql theme={null}
SELECT *
FROM chromadb_datasource.test_embeddings
WHERE `metadata.source` = "fda";
```
To conduct a similarity search, the following query can be used:
```sql theme={null}
SELECT *
FROM chromadb_datasource.test_embeddings
WHERE search_vector = (
SELECT embeddings
FROM mysql_datasource.test_embeddings
LIMIT 1
);
```
# Couchbase
Source: https://docs.mindsdb.com/integrations/vector-db-integrations/couchbase
This is the implementation of the Couchbase Vector store data handler for MindsDB.
[Couchbase](https://www.couchbase.com/) is an open-source, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating, and presenting data.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Couchbase to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Couchbase.
## Implementation
In order to make use of this handler and connect to a Couchbase server in MindsDB, the following syntax can be used. Note that the example uses the default `travel-sample` bucket, which can be enabled from the Couchbase UI and comes with pre-defined scopes and documents.
```sql theme={null}
CREATE DATABASE couchbase_vectorsource
WITH
engine='couchbasevector',
parameters={
"connection_string": "couchbase://localhost",
"bucket": "travel-sample",
"user": "admin",
"password": "password",
"scope": "inventory"
};
```
This handler is implemented using the `couchbase` library, the Python driver for Couchbase.
The required arguments to establish a connection are as follows:
* `connection_string`: the connection string for the endpoint of the Couchbase server
* `bucket`: the bucket name to use when connecting with the Couchbase server
* `user`: the user to authenticate with the Couchbase server
* `password`: the password to authenticate the user with the Couchbase server
* `scope`: scopes are a level of data organization within a bucket. If omitted, it defaults to `_default`.
Note: The connection string expects either the `couchbases://` or `couchbase://` protocol.
If you are using Couchbase Capella, you can find the connection string under the Connect tab.
You will also need to whitelist the machine(s) that will be running MindsDB and create database credentials for the user. These steps can also be completed under the Connect tab.
## Usage
### Creating tables
Now, you can use the established connection to create a collection (or table in the context of MindsDB) in Couchbase and insert data into it:
```sql theme={null}
CREATE TABLE couchbase_vectorsource.test_embeddings (
SELECT embeddings
FROM mysql_datasource.test_embeddings
);
```
`mysql_datasource` is another MindsDB data source that has been created by connecting to a MySQL database. The `test_embeddings` table in the `mysql_datasource` data source contains the embeddings that we want to store in Couchbase.
### Querying and searching
You can query your collection (table) as shown below:
```sql theme={null}
SELECT *
FROM couchbase_vectorsource.test_embeddings;
```
To filter the data in your collection (table) by metadata, you can use the following query:
```sql theme={null}
SELECT *
FROM couchbase_vectorsource.test_embeddings
WHERE id = "some_id";
```
To perform a vector search, the following query can be used:
```sql theme={null}
SELECT *
FROM couchbase_vectorsource.test_embeddings
WHERE embeddings = (
SELECT embeddings
FROM mysql_datasource.test_embeddings
LIMIT 1
);
```
### Deleting records
You can delete documents using `DELETE` just like in SQL.
```sql theme={null}
DELETE FROM couchbase_vectorsource.test_embeddings
WHERE `metadata.test` = 'test1';
```
### Dropping connection
To drop the connection, use this command:
```sql theme={null}
DROP DATABASE couchbase_vectorsource;
```
# Milvus
Source: https://docs.mindsdb.com/integrations/vector-db-integrations/milvus
This is the implementation of the Milvus handler for MindsDB.
Milvus is an open-source and blazing fast vector database built for scalable similarity search.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Milvus to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
## Connection and Usage
Visit the [Milvus page for details](https://milvus.io/docs/integration_with_mindsdb.md).
# PGVector
Source: https://docs.mindsdb.com/integrations/vector-db-integrations/pgvector
This is the implementation of the PGVector handler for MindsDB.
PGVector is an open-source vector similarity search for Postgres. It supports the following:
* exact and approximate nearest neighbor search,
* L2 distance, inner product, and cosine distance,
* any language with a Postgres client,
* ACID compliance, point-in-time recovery, JOINs, and all of the other great features of Postgres.
## Connection
This handler uses the `pgvector` Python library.
To connect to a PGVector instance, use the following statement:
```sql theme={null}
CREATE DATABASE pvec
WITH
ENGINE = 'pgvector',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "postgres",
"user": "user",
"password": "password",
"distance": "cosine"
};
```
The required arguments to establish a connection are the following:
* `host`: The host name or IP address of the postgres instance.
* `port`: The port to use when connecting.
* `database`: The database to connect to.
* `user`: The user to connect as.
* `password`: The password to use when connecting.
* `distance`: Optional. It defines how the distance between vectors is calculated. Available methods include `cosine` (default), `l1`, `l2`, `ip`, `hamming`, and `jaccard`. [Learn more here](https://github.com/pgvector/pgvector/blob/master/README.md).
## Usage
### Installing the pgvector extension
On the machine where Postgres is installed, run the following commands to install the pgvector extension:
```bash theme={null}
cd /tmp
git clone --branch v0.4.4 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install
```
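After building and installing the extension, it typically needs to be enabled in the database you plan to connect to. This step comes from the pgvector README rather than from the handler docs above:
```sql theme={null}
-- Run once in the target Postgres database (per the pgvector README).
CREATE EXTENSION IF NOT EXISTS vector;
```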
### Installing the pgvector python library
Ensure you install all dependencies from the requirements.txt file in the `pgvector_handler` folder.
### Creating a database connection in MindsDB
You can create a database connection just like you would for a regular Postgres database; the only difference is that you need to specify the engine as `pgvector`:
```sql theme={null}
CREATE DATABASE pvec
WITH
ENGINE = 'pgvector',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "postgres",
"user": "user",
"password": "password"
};
```
You can insert data into a new collection like so. The first statement copies existing embeddings directly; the following statements create an OpenAI embedding model and use it to embed and store raw text:
```sql theme={null}
CREATE TABLE pvec.embed (
    SELECT embeddings FROM mysql_demo_db.test_embeddings
);

CREATE ML_ENGINE openai
FROM openai
USING
    api_key = 'your-openai-api-key';

CREATE MODEL openai_emb
PREDICT embedding
USING
    engine = 'openai',
    model_name = 'text-embedding-ada-002',
    mode = 'embedding',
    question_column = 'review';

CREATE TABLE pvec.itemstest (
    SELECT m.embedding AS embeddings, t.review AS content
    FROM mysql_demo_db.amazon_reviews t
    JOIN openai_emb m
);
```
You can query a collection within your PGVector as follows:
```sql theme={null}
SELECT *
FROM pvec.embed
LIMIT 5;

SELECT *
FROM pvec.itemstest
LIMIT 5;
```
You can run a semantic search like so:
```sql theme={null}
SELECT *
FROM pvec.itemstest
WHERE embeddings = (SELECT * FROM mindsdb.embedding)
LIMIT 5;
```
# Pinecone
Source: https://docs.mindsdb.com/integrations/vector-db-integrations/pinecone
This is the implementation of the Pinecone handler for MindsDB.
Pinecone is a fully managed, developer-friendly, and easily scalable vector database.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Pinecone to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Pinecone.
## Implementation
This handler uses the `pinecone-client` Python library to connect to a Pinecone environment.
The required arguments to establish a connection are:
* `api_key`: the API key that can be found in your Pinecone account
These optional arguments are used with `CREATE TABLE` statements:
* `dimension`: dimensions of the vectors to be stored in the index (default=8)
* `metric`: distance metric to be used for similarity search (default='cosine')
* `spec`: the spec of the index to be created. This is a dictionary that can contain the following keys:
* `cloud`: the cloud provider to use (default='aws')
* `region`: the region to use (default='us-east-1')
Only the creation of serverless indexes is supported at the moment when running `CREATE TABLE` statements.
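For illustration, here is a sketch of a connection that sets these options. It assumes the optional arguments are supplied alongside `api_key` in the `PARAMETERS` clause (their placement here is an assumption based on the list above, and the values shown are the stated defaults):
```sql theme={null}
CREATE DATABASE pinecone_dev
WITH ENGINE = "pinecone",
PARAMETERS = {
    "api_key": "...",
    "dimension": 8,
    "metric": "cosine",
    "spec": {"cloud": "aws", "region": "us-east-1"}
};
```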
## Limitations
* [ ] `DROP TABLE` support
* [ ] Support for [namespaces](https://docs.pinecone.io/docs/namespaces)
* [ ] Display score/distance
* [ ] Support for creating/reading sparse values
* [ ] `content` column is not supported since it does not exist in Pinecone
## Usage
In order to make use of this handler and connect to an environment, use the following syntax:
```sql theme={null}
CREATE DATABASE pinecone_dev
WITH ENGINE = "pinecone",
PARAMETERS = {
"api_key": "..."
};
```
You can query pinecone indexes (`temp` in the following examples) based on `id` or `search_vector`, but not both:
```sql theme={null}
SELECT * FROM pinecone_dev.temp
WHERE id = "abc"
LIMIT 1;
```
```sql theme={null}
SELECT * FROM pinecone_dev.temp
WHERE search_vector = "[1,2,3,4,5,6,7,8]";
```
If you are using subqueries, make sure that the result is only a single row, since the use of multiple search vectors is not allowed:
```sql theme={null}
SELECT * FROM pinecone_dev.temp
WHERE search_vector = (
    SELECT embeddings FROM sqlitetesterdb.test WHERE id = 10
);
```
Optionally, you can filter based on metadata too:
```sql theme={null}
SELECT * FROM pinecone_dev.temp
WHERE id = "abc" AND metadata.hello < 100;
```
You can delete records using `id` or `metadata` like so:
```sql theme={null}
DELETE FROM pinecone_dev.temp
WHERE id = "abc";
```
Note that deletion through metadata is not supported in the starter tier:
```sql theme={null}
DELETE FROM pinecone_dev.temp
WHERE metadata.tbd = true;
```
You can insert data into a new collection like so:
```sql theme={null}
CREATE TABLE pinecone_dev.temp (
SELECT * FROM mysql_demo_db.temp LIMIT 10);
```
To update records, you can use the `INSERT INTO` statement. When there is a conflicting ID in the Pinecone index, the record is updated with the new values. It might take a moment for the change to be reflected.
```sql theme={null}
INSERT INTO pinecone_test.testtable (id,content,metadata,embeddings)
VALUES (
'id1', 'this is a test', '{"test": "test"}', '[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]'
);
```
# Weaviate
Source: https://docs.mindsdb.com/integrations/vector-db-integrations/weaviate
This is the implementation of the Weaviate handler for MindsDB.
Weaviate is an open-source vector database. It allows you to store data objects and vector embeddings from your favorite ML models, and scale seamlessly into billions of data objects.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect Weaviate to MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to Weaviate.
## Implementation
This handler uses the `weaviate-client` Python library to connect to a Weaviate instance.
The required arguments to establish a connection are:
* `weaviate_url`: the URL of the Weaviate database.
* `weaviate_api_key`: the API key to authenticate with Weaviate (in the case of a cloud instance).
* `persistence_directory`: the directory to be used in the case of local storage.
### Creating connection
In order to make use of this handler and connect to a Weaviate server in MindsDB, the following syntax can be used:
```sql theme={null}
CREATE DATABASE weaviate_datasource
WITH ENGINE = "weaviate",
PARAMETERS = {
"weaviate_url" : "https://sample.weaviate.network",
"weaviate_api_key": "api-key"
};
```
```sql theme={null}
CREATE DATABASE weaviate_datasource
WITH ENGINE = "weaviate",
PARAMETERS = {
"weaviate_url" : "https://localhost:8080",
};
```
```sql theme={null}
CREATE DATABASE weaviate_datasource
WITH ENGINE = "weaviate",
PARAMETERS = {
"persistence_directory" : "db_path",
};
```
### Dropping connection
To drop the connection, use this command:
```sql theme={null}
DROP DATABASE weaviate_datasource;
```
### Creating tables
To insert data from a pre-existing table, use the `CREATE TABLE` statement:
```sql theme={null}
CREATE TABLE weaviate_datasource.test
(SELECT * FROM sqlitedb.test);
```
Weaviate currently doesn't support JSON fields. So this creates another table for the `metadata` field, and a reference pointing to its metadata entry is created in the original table.
Weaviate follows GraphQL conventions, where classes (which are table schemas) start with a capital letter and properties start with a lowercase letter. So whenever you create a table, the table's name gets capitalized.
### Dropping collections
To drop a Weaviate table, use this command:
```sql theme={null}
DROP TABLE weaviate_datasource.tablename;
```
### Querying and selecting
To query the database using a search vector, you can use `search_vector` or `embeddings` in the `WHERE` clause:
```sql theme={null}
SELECT * from weaviate_datasource.test
WHERE search_vector = '[3.0, 1.0, 2.0, 4.5]'
LIMIT 10;
```
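Equivalently, since `embeddings` is accepted in place of `search_vector` (as noted above), the same lookup can be sketched as:
```sql theme={null}
SELECT * FROM weaviate_datasource.test
WHERE embeddings = '[3.0, 1.0, 2.0, 4.5]'
LIMIT 10;
```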
Basic query:
```sql theme={null}
SELECT * FROM weaviate_datasource.test;
```
You can use the `WHERE` clause on dynamic fields, just like in normal SQL:
```sql theme={null}
SELECT * FROM weaviate_datasource.createtest
WHERE category = "science";
```
### Deleting records
You can delete entries using `DELETE` just like in SQL.
```sql theme={null}
DELETE FROM weaviate_datasource.test
WHERE id IN (1, 2, 3);
```
Note that `UPDATE` is not supported by MindsDB for vector databases.
# MindsDB, an AI Data Solution
Source: https://docs.mindsdb.com/mindsdb
MindsDB enables humans, AI, agents, and applications to get highly accurate answers across sprawled and large scale data sources.
## Core Philosophy
MindsDB is built around three fundamental capabilities that form the foundation of MindsDB, enabling seamless integration, organization, and utilization of data.
Connect data from [hundreds of data sources](/integrations/data-overview) that integrate with MindsDB, including databases, data warehouses, applications, and vector databases.
Learn more [here](/mindsdb-connect).
Unify and organize data from one or multiple (structured and unstructured) data sources, by creating
[knowledge bases](/mindsdb_sql/knowledge_bases/overview), [views](/mindsdb_sql/sql/create/view) and [jobs](/mindsdb_sql/sql/create/jobs).
Learn more [here](/mindsdb-unify).
Generate accurate, context-aware responses from unified data using [agents](/mindsdb_sql/agents/agent) or [MCP API](/mcp/overview), making insights easily accessible across applications and teams.
Learn more [here](/mindsdb-respond).
## Install MindsDB
MindsDB is an open-source server that can be deployed anywhere, including local machines and clouds, and customized to fit the purpose.
* Use [MindsDB via Docker Desktop](/setup/self-hosted/docker-desktop). This is the fastest and recommended way to get started.
* Use [MindsDB via Docker](/setup/self-hosted/docker). This provides greater flexibility in customizing the MindsDB instance by rebuilding Docker images.
* Use [MindsDB via AWS Marketplace](/setup/cloud/aws-marketplace). This enables running MindsDB in cloud.
* Use [MindsDB via PyPI](/contribute/install). This option enables contributions to MindsDB.
# Connect
Source: https://docs.mindsdb.com/mindsdb-connect
MindsDB enables connecting data from various data sources and operating on data without moving it from its source. Granting MindsDB access to data is the foundation for all other capabilities.
* **Broad integration support**
Seamlessly connect to databases, applications, and more.
* **Real-time data access**
Work with the most up-to-date data without delays from batch processing.
* **No data movement required**
Operate directly on data at the source. No copying, syncing, or ETL needed.
This documentation includes the following content.
These are all the data sources that can be connected to MindsDB.
Use MindsDB's SQL Editor or connect MindsDB to any SQL client.
Use SQL to connect data to MindsDB.
# MindsDB as a Federated Query Engine
Source: https://docs.mindsdb.com/mindsdb-fqe
MindsDB supports federated querying, enabling users to access and analyze data across a wide variety of structured and unstructured data sources using SQL.
## How Query Pushdown Works in MindsDB
MindsDB acts as a federated query engine by translating and pushing down SQL queries to the native engines of connected data sources. Rather than retrieving data and processing queries within MindsDB, it delegates computation to the underlying data sources. This “pushdown” approach ensures:
* High performance: Queries leverage the indexing and processing capabilities of the native engines.
* Low resource usage: MindsDB avoids executing resource-heavy and high-latency operations within the query engine, preventing bottlenecks in CPU, memory, or network.
## Query Translation Limits
Each connected data source has its own SQL dialect, features, and constraints. While MindsDB SQL provides a unified interface, not all SQL expressions or data types can be translated across every database engine. In cases where a native data type or expression is not supported by the underlying engine:
* The query is passed from MindsDB to the data source in its current form, with unsupported data types handled as strings.
* If the data source does not support the syntax, it may return an error.
* Errors originating from the underlying data source are passed through to the user to provide the most accurate context.
## Cross-Database Join Limits
MindsDB allows joining tables across disparate data sources. However, cross-database joins introduce complexity:
* Pushdown can occur partially, not for all joined data sources.
* Join conditions for a particular data source must be executable by its underlying database engine.
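For illustration, here is a minimal sketch of a cross-database join; `postgres_conn` and `mysql_conn` are hypothetical data sources created beforehand with `CREATE DATABASE`, and the table and column names are assumed:
```sql theme={null}
SELECT c.name, o.total
FROM postgres_conn.customers AS c
JOIN mysql_conn.orders AS o
    ON c.customer_id = o.customer_id
LIMIT 10;
```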
## Recap
MindsDB’s federated query engine enables seamless integration with diverse data systems, but effective use requires understanding the limitations of SQL translation and pushdown:
* Pushdown is preferred to optimize performance and avoid resource strain.
* Not all SQL constructs are translatable, especially for vector stores or non-relational systems.
* Errors may occur when a connected data source cannot parse the generated query.
* Workarounds include query decomposition, using simpler expressions, and avoiding unsupported joins or vector logic.
Understanding these nuances helps users debug query errors more effectively and make full use of MindsDB’s federated query capabilities.
# Navigating the MindsDB GUI
Source: https://docs.mindsdb.com/mindsdb-gui
MindsDB offers a user-friendly graphical interface that allows users to execute SQL commands, view their outputs, and easily navigate connected data sources, projects, and their contents.
Let's explore the features and usage of the MindsDB editor.
## Accessing the MindsDB GUI Editor
Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
## Exploring the MindsDB GUI Editor
### Query Editor
This is the primary component where users can input SQL commands and queries. It provides a code editor environment where users can write, edit, and execute SQL statements.
It is located in the top center of the MindsDB GUI.
You can open multiple query editor tabs by clicking the plus button next to the current tab.
### Results Viewer
Once a query is executed, the results viewer displays the output of the query. It presents the results in a tabular format, showing rows and columns of data.
It is located in the bottom center of the MindsDB GUI.
MindsDB supports additional features such as the following:
1. The [Data Insights](/sql/data-insights) feature provides useful data visualization charts.
2. The Export feature lets you export the query output as a CSV or Markdown file.
### Object Explorer
The object explorer provides an overview of the projects, models, views, connected data sources, and tables.
Users can navigate through the available objects by expanding the tree structure items. Upon hovering over a table, you can query its content using the provided `SELECT` statement.
### Model Progress Bar
MindsDB provides a custom SQL statement to create and deploy models as virtual tables. Upon executing the [`CREATE MODEL`](/sql/create/model) statement, you can monitor the training progress at the bottom-left corner below the object explorer.
Once the model is ready, its status updates to complete.
### Add New Data Sources
You can connect a data source to MindsDB by clicking the `Add` button and choosing `New Datasource`. It takes you to a page that lists all available data sources, including databases, data warehouses, applications, and more. Here, you can search for a data source you want to connect to and follow the instructions.
For more information, visit the **Data Sources** section of the docs.
### Upload Files
You can upload a file to MindsDB by clicking the `Add` button and choosing `Upload File`. It takes you to a form where you can upload a file and give it a name.
For more information, visit [our docs here](/sql/create/file).
### Upload Custom Models
MindsDB offers a way to upload your custom model in the form of Python code and incorporate it into the MindsDB ecosystem. You can do that by clicking the `Add` button and choosing `Upload custom model`.
For more information, visit [our docs here](/custom-model/byom).
# Naming Standards for MindsDB Objects
Source: https://docs.mindsdb.com/mindsdb-objects
MindsDB allows you to create and manage a variety of entities within its ecosystem. All MindsDB objects follow the same naming conventions to ensure consistency and compatibility across the platform.
## MindsDB Entities
The following entities can be created in MindsDB:
* Databases → [CREATE DATABASE](https://docs.mindsdb.com/mindsdb_sql/sql/create/database)
* Knowledge Bases (KBs) → [CREATE KNOWLEDGE\_BASE](https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/create)
* Tables → [CREATE TABLE](https://docs.mindsdb.com/mindsdb_sql/sql/create/table)
* Views → [CREATE VIEW](https://docs.mindsdb.com/mindsdb_sql/sql/create/view)
* Projects → [CREATE PROJECT](https://docs.mindsdb.com/mindsdb_sql/sql/create/project)
* Jobs → [CREATE JOB](https://docs.mindsdb.com/mindsdb_sql/sql/create/jobs)
* Triggers → [CREATE TRIGGER](https://docs.mindsdb.com/mindsdb_sql/sql/create/trigger)
* Agents → [CREATE AGENT](https://docs.mindsdb.com/mindsdb_sql/agents/agent_syntax)
## General Naming Rules
When creating these entities, the following conventions apply:
* **Case-insensitive names**
Object names are not sensitive to letter casing. For example:
```sql theme={null}
CREATE VIEW my_view (...); -- creates "my_view"
CREATE VIEW My_View (...); -- also creates "my_view"
CREATE VIEW MY_VIEW (...); -- also creates "my_view"
```
All names are automatically converted to lowercase.
* **Allowed characters**
Lowercase letters (`a–z`)
Numbers (`0–9`)
Underscores (`_`)
Example:
```sql theme={null}
CREATE AGENT my_agent345 (...); -- creates "my_agent345"
```
* **Special characters**
If you need special characters or spaces in object names, enclose them in backticks.
```sql theme={null}
CREATE VIEW `my view` (...); -- creates “my view”
CREATE VIEW `my-view!` (...); -- creates “my-view!”
```
However, names inside backticks must be lowercase. Using uppercase letters will result in an error because all object names must be in lowercase letters.
```sql theme={null}
CREATE VIEW `My View` (...); -- error
```
When working with entities from a data source connected to MindsDB, their original names are preserved and are not subject to MindsDB naming rules.
For example, if you connect a Snowflake data source that contains a table named `ANALYTICS_101` with a column named `Date_Time`, you must reference them exactly as they appear in the source, utilizing backticks, as shown below:
```sql theme={null}
SELECT `Date_Time`
FROM snowflake_data.`ANALYTICS_101`;
```
## Backward Compatibility
Older objects created with uppercase letters are still supported for backward compatibility. To reference them, wrap the name in backticks.
```sql theme={null}
SELECT * FROM `MyView`; -- selects from “MyView”
DROP VIEW `MyView`; -- deletes “MyView”
```
You cannot create new objects with uppercase letters. For example:
```sql theme={null}
CREATE VIEW `MyView` (...); -- error
```
## Examples
Here are some practical examples:
### Databases
Note that when enclosing the object name in backticks, it preserves the case-sensitivity and special characters included in the name. Otherwise, the upper-case letters are automatically converted to lower-case letters.
See the usage examples below.
```sql theme={null}
CREATE DATABASE my_database WITH …; -- creates my_database
SELECT * FROM my_database.table_name; -- selects from my_database
DROP DATABASE my_database; -- drops my_database
CREATE DATABASE MY_DATABASE WITH …; -- creates my_database (note that upper-case letters are converted to lower-case letters)
SELECT * FROM my_database.table_name; -- selects from my_database
SELECT * FROM MY_DATABASE.table_name; -- selects from my_database
DROP DATABASE MY_DATABASE; -- drops my_database
CREATE DATABASE `My-database` WITH …; -- creates My-database (note that the name must be enclosed in backticks because it contains a special character)
SELECT * FROM `My-database`.table_name; -- selects from My-database
DROP DATABASE `My-database`; -- drops My-database
```
```sql theme={null}
-- this works
CREATE DATABASE demodata WITH …;
SELECT * FROM demodata.table_name;
SELECT * FROM `demodata`.table_name;
DROP DATABASE demodata;
-- this works and converts all letters to lower-case
CREATE DATABASE demoData WITH …;
SELECT * FROM demoData ...
DROP DATABASE demoData;
-- this works and keeps upper/lower-case letters because the name is enclosed in backticks
CREATE DATABASE `DemoData` WITH …;
SELECT * FROM `DemoData` ...
DROP DATABASE `DemoData` ...
```
```sql theme={null}
CREATE DATABASE DemoData WITH …; -- creates demodata
CREATE DATABASE `DemoData` WITH …; -- cannot create DemoData because demodata already exists
DROP DATABASE `DemoData`; -- cannot drop DemoData because DemoData does not exist
DROP DATABASE DemoData; -- drops demodata
CREATE DATABASE `DemoData` WITH …; -- creates DemoData
CREATE DATABASE demodata WITH …; -- cannot create demodata because DemoData already exists
DROP DATABASE demodata; -- cannot drop demodata because demodata does not exist
DROP DATABASE `DemoData`; -- drops demodata
```
```sql theme={null}
CREATE DATABASE demodata WITH …; -- creates demodata
SELECT * FROM DEMODATA.table_name; -- selects from demodata, because DEMODATA is converted to demodata
DROP DATABASE demodata; -- drops demodata
CREATE DATABASE `DemoData` WITH …; -- creates DemoData
SELECT * FROM demodata.table_name; -- cannot select from demodata
SELECT * FROM `DemoData`.table_name; -- selects from DemoData
DROP DATABASE demodata; -- cannot drop demodata because demodata does not exist
DROP DATABASE `DemoData`; -- drops DemoData
CREATE DATABASE `Dèmo data 2` WITH …;
SELECT * FROM `Dèmo data 2`.table_name;
DROP DATABASE `Dèmo data 2`;
```
### Views
```sql theme={null}
CREATE VIEW my_view (...); -- creates "my_view"
CREATE VIEW My_View (...); -- also creates "my_view"
CREATE VIEW `my view` (...); -- creates "my view"
CREATE VIEW `My_View` (...); -- error
```
If an older object named `My_View` exists, you can still use it:
```sql theme={null}
SELECT * FROM `My_View`; -- selects from “My_View”
DROP VIEW `My_View`; -- deletes “My_View”
```
### Agents
```sql theme={null}
CREATE AGENT my_agent USING ...; -- creates "my_agent"
CREATE AGENT My_Agent USING ...; -- also creates "my_agent"
CREATE AGENT `my agent 1` USING ...; -- creates "my agent 1"
CREATE AGENT `My agent 1` USING ...; -- error
```
If an older object named `My agent 1` exists, you can still use it:
```sql theme={null}
SELECT * FROM `My agent 1`; -- selects from “My agent 1”
DROP AGENT `My agent 1`; -- deletes “My agent 1”
```
# Respond
Source: https://docs.mindsdb.com/mindsdb-respond
MindsDB enables generating insightful and accurate responses from unified data using natural language. Whether answering questions, powering applications, or enabling automations, responses are context-aware and grounded in real-time data.
* **Natural language data queries**
Ask questions in natural language and receive precise answers.
* **AI-powered insights**
Leverage integrated models to analyze, predict, and explain data in context.
* **Actionable responses**
Drive decisions and automations directly from query results.
This documentation includes the following content.
Deploy agents specialized in answering questions over connected and unified data.
Connect to MindsDB through MCP (Model Context Protocol) for seamless interaction.
# Unify
Source: https://docs.mindsdb.com/mindsdb-unify
MindsDB enables unifying data from structured and unstructured data sources into a single, queryable interface. This unified view allows seamless querying and model-building across all data without consolidation into one system.
* **Federated query engine**
Query across multiple data sources as if they were a single database.
* **Structured and unstructured data support**
Unify relational data, documents, vector data, and more in one place.
* **No data transformation required**
Use data in its native format without the need for preprocessing.
This documentation includes the following content.
Index and organize unstructured data for efficient retrieval.
Simplify data access by creating unified views across different sources.
Organize views, knowledge bases, and models into projects.
Operate on data using functions.
Schedule tasks with jobs.
Set up triggering events on data.
# How Agents Work
Source: https://docs.mindsdb.com/mindsdb_sql/agents/agent
Agents enable conversation with data, including structured and unstructured data connected to MindsDB.
Connect your data to MindsDB by [connecting databases or applications](/integrations/data-overview) or [uploading files](/mindsdb_sql/sql/create/file). Users can opt to use [knowledge bases](/mindsdb_sql/knowledge_bases/overview) to store and retrieve data efficiently.
Create an agent, passing the connected data and defining the underlying model.
```sql theme={null}
CREATE AGENT my_agent
USING
model = {
"provider": "openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123"
},
data = {
"knowledge_bases": ["mindsdb.sales_kb", "mindsdb.orders_kb"],
"tables": ["postgres_conn.customers", "mysql_conn.products"]
},
prompt_template='
mindsdb.sales_kb stores sales analytics data
mindsdb.orders_kb stores order data
postgres_conn.customers stores customers data
mysql_conn.products stores products data
';
```
Query an agent and ask questions over the connected data.
```sql theme={null}
SELECT answer
FROM my_agent
WHERE question = 'What is the average number of orders per customer?';
```
Follow [this doc page to learn more about the usage of agents](/mindsdb_sql/agents/agent_syntax).
# How to Chat with Agents
Source: https://docs.mindsdb.com/mindsdb_sql/agents/agent_gui
Agents enable conversation with data, including structured and unstructured data connected to MindsDB.
MindsDB provides a chat interface that enables users to chat with their data.
Select an agent from the list of existing agents, or create one if none exists yet.
Now the chat interface is connected to this agent via [Agent2Agent Protocol](https://google.github.io/A2A/) and users can chat with the data connected to this agent.
# How to Use Agents
Source: https://docs.mindsdb.com/mindsdb_sql/agents/agent_syntax
Agents enable conversation with data, including structured and unstructured data connected to MindsDB.
## `CREATE AGENT` Syntax
Here is the syntax for creating an agent:
```sql theme={null}
CREATE AGENT my_agent
USING
model = {
"provider": "openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "http://example.com",
"api_version": "2024-02-01"
},
data = {
"knowledge_bases": ["project_name.kb_name", ...],
"tables": ["datasource_conn_name.table_name", ...]
},
prompt_template='describe data',
timeout=10;
```
It creates an agent that uses the defined model and has access to the connected data. You can verify that the agent was created as follows:
```sql theme={null}
SHOW AGENTS
WHERE name = 'my_agent';
```
Note that you can include all tables from a connected data source and all knowledge bases from a project using the `*` syntax.
```sql theme={null}
...
data = {
"knowledge_bases": ["project_name.*", ...],
"tables": ["datasource_conn_name.*", ...]
},
...
```
### `model`
This parameter defines the underlying language model, including:
* `provider`
It is a required parameter. It defines the model provider from the list below.
* `model_name`
It is a required parameter. It defines the model name from the list below.
* `api_key`
It is an optional parameter (applicable to selected providers), which stores the API key to access the model. Users can provide it either in this `api_key` parameter, or using [environment variables](/mindsdb_sql/functions/from_env).
* `base_url`
It is an optional parameter (applicable to selected providers), which stores the base URL for accessing the model. It is the root URL used to send API requests.
* `api_version`
It is an optional parameter (applicable to selected providers), which defines the API version.
The available models and providers include the following.
Available models:
* claude-3-opus-20240229
* claude-3-sonnet-20240229
* claude-3-haiku-20240307
* claude-2.1
* claude-2.0
* claude-instant-1.2
Available models include all models accessible from Bedrock.
Note that in order to use Bedrock as a model provider, you should ensure the following packages are installed: `langchain_aws` and `transformers`.
The following parameters are specific to this provider:
* `aws_region_name` is a required parameter.
* `aws_access_key_id` is a required parameter.
* `aws_secret_access_key` is a required parameter.
* `aws_session_token` is an optional parameter. It may be required depending on the AWS permissions setup.
Available models:
* gemini-2.5-pro-preview-03-25
* gemini-2.0-flash
* gemini-2.0-flash-lite
* gemini-1.5-flash
* gemini-1.5-flash-8b
* gemini-1.5-pro
Available models:
* gemma
* llama2
* mistral
* mixtral
* llava
* neural-chat
* codellama
* dolphin-mixtral
* qwen
* llama2-uncensored
* mistral-openorca
* deepseek-coder
* nous-hermes2
* phi
* orca-mini
* dolphin-mistral
* wizard-vicuna-uncensored
* vicuna
* tinydolphin
* llama2-chinese
* openhermes
* zephyr
* nomic-embed-text
* tinyllama
* openchat
* wizardcoder
* phind-codellama
* starcoder
* yi
* orca2
* falcon
* starcoder2
* wizard-math
* dolphin-phi
* nous-hermes
* starling-lm
* stable-code
* medllama2
* bakllava
* codeup
* wizardlm-uncensored
* solar
* everythinglm
* sqlcoder
* nous-hermes2-mixtral
* stable-beluga
* yarn-mistral
* samantha-mistral
* stablelm2
* meditron
* stablelm-zephyr
* magicoder
* yarn-llama2
* wizard-vicuna
* llama-pro
* deepseek-llm
* codebooga
* mistrallite
* dolphincoder
* nexusraven
* open-orca-platypus2
* all-minilm
* goliath
* notux
* alfred
* megadolphin
* xwinlm
* wizardlm
* duckdb-nsql
* notus
Available models:
* gpt-3.5-turbo
* gpt-3.5-turbo-16k
* gpt-3.5-turbo-instruct
* gpt-4
* gpt-4-32k
* gpt-4-1106-preview
* gpt-4-0125-preview
* gpt-4.1
* gpt-4.1-mini
* gpt-4o
* o4-mini
* o3-mini
* o1-mini
Available models:
* microsoft/phi-3-mini-4k-instruct
* mistralai/mistral-7b-instruct-v0.2
* writer/palmyra-med-70b
* mistralai/mistral-large
* mistralai/codestral-22b-instruct-v0.1
* nvidia/llama3-chatqa-1.5-70b
* upstage/solar-10.7b-instruct
* google/gemma-2-9b-it
* adept/fuyu-8b
* google/gemma-2b
* databricks/dbrx-instruct
* meta/llama-3\_1-8b-instruct
* microsoft/phi-3-medium-128k-instruct
* 01-ai/yi-large
* nvidia/neva-22b
* meta/llama-3\_1-70b-instruct
* google/codegemma-7b
* google/recurrentgemma-2b
* google/gemma-2-27b-it
* deepseek-ai/deepseek-coder-6.7b-instruct
* mediatek/breeze-7b-instruct
* microsoft/kosmos-2
* microsoft/phi-3-mini-128k-instruct
* nvidia/llama3-chatqa-1.5-8b
* writer/palmyra-med-70b-32k
* google/deplot
* meta/llama-3\_1-405b-instruct
* aisingapore/sea-lion-7b-instruct
* liuhaotian/llava-v1.6-mistral-7b
* microsoft/phi-3-small-8k-instruct
* meta/codellama-70b
* liuhaotian/llava-v1.6-34b
* nv-mistralai/mistral-nemo-12b-instruct
* microsoft/phi-3-medium-4k-instruct
* seallms/seallm-7b-v2.5
* mistralai/mixtral-8x7b-instruct-v0.1
* mistralai/mistral-7b-instruct-v0.3
* google/paligemma
* google/gemma-7b
* mistralai/mixtral-8x22b-instruct-v0.1
* google/codegemma-1.1-7b
* nvidia/nemotron-4-340b-instruct
* meta/llama3-70b-instruct
* microsoft/phi-3-small-128k-instruct
* ibm/granite-8b-code-instruct
* meta/llama3-8b-instruct
* snowflake/arctic
* microsoft/phi-3-vision-128k-instruct
* meta/llama2-70b
* ibm/granite-34b-code-instruct
Available models for the Writer provider:
* palmyra-x5
* palmyra-x4
Users can define the model for the agent by choosing one of the following options.
**Option 1.** Use the `model` parameter to define the specification.
```sql theme={null}
CREATE AGENT my_agent
USING
model = {
"provider": "openai",
"model_name" : "got-4o",
"api_key": "sk-abc123",
"base_url": "https://example.com/",
"api_version": "2024-02-01"
},
...
```
**Option 2.** Define the default model in the [MindsDB configuration file](/setup/custom-config).
If you define `default_llm` in the configuration file, you do not need to provide the `model` parameter when creating an agent. If you provide both, the values from the `model` parameter are used.
You can define the default models in the Settings of the MindsDB Editor GUI.
```bash theme={null}
"default_llm": {
"provider": "openai",
"model_name" : "got-4o",
"api_key": "sk-abc123",
"base_url": "https://example.com/",
"api_version": "2024-02-01"
}
```
### `data`
This parameter stores data connected to the agent, including knowledge bases and data sources connected to MindsDB.
The following parameters store the list of connected data.
* `knowledge_bases` stores the list of [knowledge bases](/mindsdb_sql/knowledge_bases/overview) to be used by the agent.
* `tables` stores the list of tables from data sources connected to MindsDB.
### `prompt_template`
This parameter stores instructions for the agent.
It is recommended to provide data description of the data sources listed in the `knowledge_bases` and `tables` parameters to help the agent locate relevant data for answering questions.
### `timeout`
This parameter defines the maximum time, in seconds, that the agent can take to return an answer.
For example, when the `timeout` parameter is set to 10, the agent has 10 seconds to return an answer. If it takes longer, the process is aborted and the agent returns a message indicating that it failed to answer within the defined time interval.
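Putting the parameters together, here is a minimal sketch of a complete `CREATE AGENT` statement; the object names and data descriptions are illustrative.

```sql theme={null}
CREATE AGENT sales_agent
USING
    model = {
        "provider": "openai",
        "model_name": "gpt-4o",
        "api_key": "sk-abc123"
    },
    data = {
        "knowledge_bases": ["mindsdb.sales_kb"],
        "tables": ["mysql_db.car_sales", "mysql_db.car_info"]
    },
    prompt_template = 'mysql_db.car_sales stores car sales records, mysql_db.car_info stores car specifications, and mindsdb.sales_kb contains notes about past sales',
    timeout = 10;
```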
## `SELECT FROM AGENT` Syntax
Query an agent to generate responses to questions.
```sql theme={null}
SELECT answer
FROM my_agent
WHERE question = 'What is the average number of orders per customer?';
```
You can redefine the agent's parameters at the query time as below.
```sql theme={null}
SELECT answer
FROM my_agent
WHERE question = 'What is the average number of orders per customer?'
USING
model = {
"provider": "openai",
"model_name" : "gpt-4.1",
"api_key": "sk-abc123"
},
data = {
"knowledge_bases": ["project_name.kb_name", ...],
"tables": ["datasource_conn_name.table_name", ...]
},
prompt_template='describe data',
timeout=10;
```
The `USING` clause may contain any combination of parameters from the `CREATE AGENT` command, depending on which parameters users want to update for the query.
For example, users may want to check the performance of other models to decide which model works better for their use case.
```sql theme={null}
SELECT answer
FROM my_agent
WHERE question = 'What is the average number of orders per customer?'
USING
model = {
"provider": "google",
"model_name" : "gemini-2.5-flash",
"api_key": "ABc123"
};
```
## `ALTER AGENT` Syntax
Update existing agents with new data, model, or prompt.
```sql theme={null}
ALTER AGENT my_agent
USING
model = {
"provider": "openai",
"model_name" : "gpt-4.1",
"api_key": "sk-abc123",
"base_url": "http://example.com",
"api_version": "2024-02-01"
},
data = {
"knowledge_bases": ["project_name.kb_name", ...],
"tables": ["datasource_conn_name.table_name", ...]
},
prompt_template='describe data';
```
Note that all parameters are optional. Users can update any combination of parameters.
See detailed descriptions of parameters in the [`CREATE AGENT` section](/mindsdb_sql/agents/agent_syntax#create-agent-syntax).
Here is how to connect new data to an agent.
```sql theme={null}
ALTER AGENT my_agent
USING
data = {
"knowledge_bases": ["mindsdb.sales_kb"],
"tables": ["mysql_db.car_sales", "mysql_db.car_info"]
};
```
And here is how to update a model used by the agent.
```sql theme={null}
ALTER AGENT my_agent
USING
model = {
"provider": "openai",
"model_name" : "gpt-4.1",
"api_key": "sk-abc123"
};
```
## `DROP AGENT` Syntax
Here is the syntax for deleting an agent:
```sql theme={null}
DROP AGENT my_agent;
```
# MariaDB SkySQL Setup Guide with MindsDB
Source: https://docs.mindsdb.com/mindsdb_sql/connect/connect-mariadb-skysql
Find more information on MariaDB SkySQL [here](https://cloud.MariaDB.com/).
## 1. Select your service for MindsDB
If you haven't already, identify the service to be enabled with MindsDB and make
sure it is running. Otherwise, skip to step 2.
## 2. Add MindsDB to your service Allowlist
Access to MariaDB SkySQL services is [restricted on a per-service basis](https://mariadb.com/products/skysql/docs/security/firewalls/ip-allowlist-services/). To allow MindsDB to connect to your MariaDB service, click on the cog icon and navigate to Security Access. In the dialog, input the following IPs one by one as prompted:
```
18.220.205.95
3.19.152.46
52.14.91.162
```
## 3. Download your service .pem file
A [certificate authority chain](https://mariadb.com/products/skysql/docs/connect/connection-parameters-portal/#certificate-authority-chain) (.pem file) must be provided for proper TLS certificate validation.
From your selected service, click on the world globe icon (Connect to service). In the Login Credentials section, click Download. The `aws_skysql_chain.pem`
file will download onto your machine.
## 4. Publicly Expose your service .pem File
Select secure storage for the `aws_skysql_chain.pem` file that provides a working public URL or local path. For example, you can store it in an S3 bucket.
## 5. Link MindsDB to your MariaDB SkySQL Service
To print the query template, go to the MindsDB Editor, add a new data source from the Connect tab, and choose MariaDB SkySQL from the list. Fill in the values and run the query to complete the setup.
Here is the code:
```sql Template theme={null}
CREATE DATABASE maria_datasource --- display name for the database
WITH ENGINE = 'MariaDB', --- name of the MindsDB handler
PARAMETERS = {
"host": " ", --- host IP address or URL
"port": , --- port used to make TCP/IP connection
"database": " ", --- database name
"user": " ", --- database user
"password": " ", --- database password
"ssl": True/False, --- optional, the `ssl` parameter value indicates whether SSL is enabled (`True`) or disabled (`False`)
"ssl_ca": { --- optional, SSL Certificate Authority
"path": " " --- either "path" or "url"
},
"ssl_cert": { --- optional, SSL certificates
"url": " " --- either "path" or "url"
},
"ssl_key": { --- optional, SSL keys
"path": " " --- either "path" or "url"
}
};
```
```sql Example for MariaDB SkySQL Service theme={null}
CREATE DATABASE skysql_datasource
WITH ENGINE = 'MariaDB',
PARAMETERS = {
"host": "mindsdbtest.mdb0002956.db1.skysql.net",
"port": 5001,
"database": "mindsdb_data",
"user": "DB00007539",
"password": "password",
--- here, the SSL certificate is required
"ssl-ca": {
"url": "https://mindsdb-web-builds.s3.amazonaws.com/aws_skysql_chain.pem"
}
};
```
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB and DBeaver
Source: https://docs.mindsdb.com/mindsdb_sql/connect/dbeaver
DBeaver is a database tool that allows you to connect to and work with various database engines. You can download it [here](https://dbeaver.io/).
## Data Setup
First, create a new database connection in DBeaver by clicking the icon, as shown below.
Next, choose the MySQL database engine and click the *Next* button.
If you have multiple `MySQL` options, choose the `Driver for MySQL8 and later`.
Now it's time to fill in the connection details.
Use the following parameters:
* `127.0.0.1` or `localhost` for the host name. If you run MindsDB in the cloud, specify the host name accordingly.
* `47335` for the port, which is the port of the MySQL API exposed by MindsDB. Learn more about [available APIs here](/setup/environment-vars#mindsdb-apis).
* `mindsdb` for the database name.
* `mindsdb` for the user name, unless specified differently in the [`config.json` file](/setup/custom-config#auth).
* An empty password, unless specified differently in the [`config.json` file](/setup/custom-config#auth).
Now we are ready to test the connection.
## Testing the Connection
Click on the `Test Connection...` button to check if all the provided data allows you to connect to MindsDB.
On success, you should see the message, as below.
## Let's Run Some Queries
To finally make sure that our MindsDB database connection works, let's run some queries.
```sql theme={null}
SHOW FULL DATABASES;
```
On execution, we get:
```sql theme={null}
+----------------------+---------+--------+
| Database | TYPE | ENGINE |
+----------------------+---------+--------+
| information_schema | system | [NULL] |
| mindsdb | project | [NULL] |
| files | data | files |
+----------------------+---------+--------+
```
Here is how it looks in DBeaver:
How to [whitelist MindsDB Cloud IP address](/faqs/whitelist-ips)?
## What's Next?
Now that you are all set, we recommend you check out our [Tutorials](/sql/tutorials/house-sales-forecasting) section, where you'll find various examples of
regression, classification, and time series predictions with MindsDB, or the [Community Tutorials](/tutorials) list.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB and Deepnote
Source: https://docs.mindsdb.com/mindsdb_sql/connect/deepnote
We have worked with the team at Deepnote to build a native integration with Deepnote notebooks.
Please check:
* [Deepnote Demo Guide](https://deepnote.com/project/Machine-Learning-With-SQL-8GDF7bc7SzKlhBLorqoIcw/%2Fmindsdb_demo.ipynb)
* [Deepnote Integration Docs](https://docs.deepnote.com/integrations/mindsdb)
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB and Grafana
Source: https://docs.mindsdb.com/mindsdb_sql/connect/grafana
[Grafana](https://grafana.com/) is an open-source analytics and interactive visualization web application
that allows users to ingest data from various sources, query this data, and display it on customizable charts for easy analysis.
## How to Connect
To begin, set up Grafana by following one of the methods outlined in the [Grafana Installation Documentation](https://grafana.com/docs/grafana/latest/setup-grafana/installation/#supported-operating-systems).
Once Grafana is successfully set up in your environment, navigate to the Connections section, click on Add new connection, and select the MySQL plugin, as shown below.
Now it's time to fill in the connection details.
There are several connection options. To connect to your local MindsDB instance, use the connection details below:
```
Host: `127.0.0.1:47335`
Username: `mindsdb`
Password: *leave it empty*
Database: *leave it empty*
```
Now we are ready to Save & test the connection.
## Testing the Connection
Click on the `Save & test` button to check if all the provided data
allows you to connect to MindsDB.
On success, you should see the message, as below.
## Examples
### Querying
To verify the functionality of our MindsDB database connection,
you can query data in the Explore view. Use the text edit mode to compose your queries.
```sql theme={null}
SHOW FULL DATABASES;
```
On execution, we get:
### Visual Query Builder
Now you can build a dashboard with a MindsDB database connection.
Example query:
```sql theme={null}
CREATE DATABASE mysql_demo_db
WITH ENGINE = "mysql",
PARAMETERS = {
"user": "user",
"password": "MindsDBUser123!",
"host": "samples.mindsdb.com",
"port": "3306",
"database": "public"
};
SELECT * FROM mysql_demo_db.air_passengers;
```
On execution, we get:
How to [whitelist MindsDB Cloud IP address](/faqs/whitelist-ips)?
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB and Jupyter Notebooks
Source: https://docs.mindsdb.com/mindsdb_sql/connect/jupysql
JupySQL is a full SQL client for Jupyter. It allows you to run SQL and plot large
datasets in Jupyter via the `%sql` and `%%sql` magics, and to plot
data directly from the database via the `%sqlplot` magic.
Jupysql facilitates working with databases and Jupyter. You can download it
[here](https://github.com/ploomber/jupysql) or run a `pip install jupysql`.
Alternatively, you can interact with MindsDB directly from the [MySQL CLI](/connect/mysql-client/) or the [Postgres CLI](/connect/postgres-client/).
## How to Connect
#### Prerequisites
* Make sure you have *jupysql* installed. To install it, run `pip install jupysql`.
* Make sure you have *pymysql* installed. To install it, run `pip install pymysql`.
You can easily verify the installation of jupysql by running this code:
```python theme={null}
%load_ext sql
```
This command loads the package and allows you to run cell magics on top of Jupyter.
And for pymysql, validate by running this command:
```python theme={null}
import pymysql
```
Please follow the instructions below to connect to MindsDB via JupySQL and Jupyter.
You can use the Python code below to connect your Jupyter notebook (or lab) to a local MindsDB instance via JupySQL.
Load the extension:
```python theme={null}
%load_ext sql
```
Connect to your DB:
```python theme={null}
%sql mysql+pymysql://mindsdb:@127.0.0.1:47335/mindsdb
```
Test the connection by listing the existing tables (pure SQL):
```python theme={null}
%sql show tables
```
Please note that we use the following connection details:
* Username is `mindsdb`
* Password is left empty
* Host is `127.0.0.1`
* Port is `47335`
* Database name is `mindsdb`
Note that when running MindsDB in Docker, the port may differ.
Create a database connection and execute the code above. On success, only the last command, which lists the tables, produces output. The expected output is:
```bash theme={null}
* mysql+pymysql://mindsdb:***@127.0.0.1:47335/mindsdb
2 rows affected.
Tables_in_mindsdb
models
```
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB and Metabase
Source: https://docs.mindsdb.com/mindsdb_sql/connect/metabase
Metabase is open-source software that facilitates data analysis. It lets you visualize your data easily and intuitively. Since MindsDB supports the MySQL binary protocol, you can connect it to Metabase and visualize forecasts from the models you create and train.
For more information, visit [Metabase](https://www.metabase.com/).
## Setup
### MindsDB
Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
### Metabase
Now, let's set up Metabase by following one of the approaches presented on
[the Metabase Open Source Edition page](https://www.metabase.com/start/oss/).
Here, we use the
[.jar approach](https://www.metabase.com/docs/latest/installation-and-operation/running-the-metabase-jar-file.html)
for Metabase.
## How to Connect
Follow the steps below to connect your MindsDB to Metabase.
1. Open your Metabase and navigate to the *Admin settings* by clicking the cog
in the bottom left corner.
2. Once there, click on *Databases* in the top navigation bar.
3. Click on *Add database* in the top right corner.
4. Fill in the form using the following data:
```text theme={null}
Database type: `MySQL`
Display name: `MindsDB`
Host: `localhost`
Port: `47335`
Database name: `mindsdb`
Username: `mindsdb`
Password: *leave it empty*
```
5. Click on *Save*.
Now you're connected!
## Example
Now that the connection between MindsDB and Metabase is established, let's do
some examples.
Most of the SQL statements that you usually run in your
[MindsDB SQL Editor](/connect/mindsdb_editor/) can be run in Metabase as well.
Let's start with something easy.
On your Metabase's home page, click on *New > SQL query* in the top right corner
and then, select your MindsDB database.
Let's execute the following command in the editor.
```sql theme={null}
SHOW TABLES;
```
On execution, we get:
Please note that creating a
[database connection](/sql/tutorials/home-rentals/#connecting-the-data) using
the `CREATE DATABASE` statement fails because JDBC treats curly braces (`{}`)
as escape sequences.
```sql theme={null}
CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "samples.mindsdb.com",
"port": "5432",
"database": "demo"
};
```
On execution, we get:
You can overcome this issue using the
[MindsDB SQL Editor](/connect/mindsdb_editor/) to create a database.
Now, getting back to the Metabase, let's run some queries on the database
created with the help of the [MindsDB SQL Editor](/connect/mindsdb_editor/).
```sql theme={null}
SELECT *
FROM example_db.demo_data.home_rentals
LIMIT 10;
```
On execution, we get:
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB SQL Editor
Source: https://docs.mindsdb.com/mindsdb_sql/connect/mindsdb_editor
MindsDB provides a SQL Editor, so you don't need to download additional SQL clients to connect to MindsDB.
## How to Use the MindsDB SQL Editor
There are two ways you can use the Editor, as below.
After setting up the MindsDB using [Docker](/setup/self-hosted/docker), or pip
on
[Linux](/setup/self-hosted/pip/linux)/[Windows](/setup/self-hosted/pip/windows)/[MacOS](/setup/self-hosted/pip/macos),
or pip via [source code](/setup/self-hosted/pip/source), go to your terminal and
execute the following:
```bash theme={null}
python -m mindsdb
```
On execution, we get:
```bash theme={null}
...
2022-05-06 14:07:04,599 - INFO - - GUI available at http://127.0.0.1:47334/
...
```
Immediately after, your browser automatically opens the MindsDB SQL Editor. If
it doesn't, visit the URL
[`http://127.0.0.1:47334/`](http://127.0.0.1:47334/) in your browser of
preference.
Here is a sneak peek of the MindsDB SQL Editor:
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB and MySQL CLI
Source: https://docs.mindsdb.com/mindsdb_sql/connect/mysql-client
MindsDB provides a powerful MySQL API that allows users to connect to it
using the MySQL Command Line Client.
Please note that connecting to MindsDB's MySQL API is the same as connecting to
a MySQL database. Find more information on MySQL CLI
[here](https://dev.mysql.com/doc/refman/8.0/en/mysql.html).
By default, MindsDB starts the `http` and `mysql` APIs. You can define which APIs to start using the `api` flag as below.
```bash theme={null}
python -m mindsdb --api http,mysql,postgres
```
If you want to start MindsDB without the graphical user interface (GUI), use the `--no_studio` flag as below.
```bash theme={null}
python -m mindsdb --no_studio
```
## How to Connect
To connect to MindsDB via its MySQL API, use the `mysql` client program (note that the `-p` flag prompts for the password):
```bash theme={null}
mysql -h [hostname] --port [TCP/IP port number] -u [user] -p
```
Here is the command that allows you to connect to MindsDB.
```bash theme={null}
mysql -h 127.0.0.1 --port 47335 -u mindsdb
```
On execution, we get:
```bash theme={null}
Welcome to the MariaDB monitor. Commands end with ";" or "\g".
Server version: 5.7.1-MindsDB-1.0 (MindsDB)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]>
```
## What's Next?
Now that you are all set, we recommend you check out our [Use Cases](/use-cases/overview) section, where you'll find various examples of regression, classification, time series, and NLP predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on [MindsDB database structure](/sql/table-structure/). Also, don't miss out on the remaining pages from the **MindsDB SQL** section, as they explain a common SQL syntax with examples.
Have fun!
# MindsDB and SQLAlchemy
Source: https://docs.mindsdb.com/mindsdb_sql/connect/sql-alchemy
SQLAlchemy is a Python SQL toolkit that provides object-relational mapping features for the Python programming language.
SQLAlchemy facilitates working with databases from Python. You can download it [here](https://www.sqlalchemy.org/) or run `pip install sqlalchemy`.
Alternatively, you can interact with MindsDB directly from the [MySQL CLI](/connect/mysql-client/) or the [Postgres CLI](/connect/postgres-client/).
## How to Connect
Please follow the instructions below to connect your MindsDB to SQL Alchemy.
You can use the Python code below to connect your MindsDB database to SQL Alchemy.
Make sure you have the *pymysql* module installed before executing the Python code. To install it, run the `pip install pymysql` command.
```python theme={null}
from sqlalchemy import create_engine

# MindsDB connection details: default user, empty password, local MySQL API port
user = 'mindsdb'
password = ''
host = '127.0.0.1'
port = 47335
database = ''

def get_connection():
    # Build an engine that talks to MindsDB over the MySQL protocol
    return create_engine(
        url="mysql+pymysql://{0}:{1}@{2}:{3}/{4}".format(user, password, host, port, database)
    )

if __name__ == '__main__':
    try:
        engine = get_connection()
        engine.connect()
        print(f"Connection to the {host} for user {user} created successfully.")
    except Exception as ex:
        print("Connection could not be made due to the following error: \n", ex)
```
Please note that we use the following connection details:
* Username is `mindsdb`
* Password is left empty
* Host is `127.0.0.1`
* Port is `47335`
* Database name is left empty
To create a database connection, execute the code above. On success, the following output is expected:
```bash theme={null}
Connection to the 127.0.0.1 for user mindsdb created successfully.
```
The SQLAlchemy `create_engine` function is lazy. This implies that any human
error in the connection details goes undetected until an action becomes
necessary, such as calling the `execute` method to run SQL commands.
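For example, here is a short sketch (assuming SQLAlchemy 1.4+) showing the lazy behavior: creating the engine never fails, and any mistake in the connection details only surfaces when a connection is actually requested.

```python theme={null}
from sqlalchemy import create_engine, text

# Creating the engine performs no I/O, so a wrong port or password raises no error here.
engine = create_engine("mysql+pymysql://mindsdb:@127.0.0.1:47335/mindsdb")

# The actual connection attempt happens only now.
with engine.connect() as conn:
    result = conn.execute(text("SHOW FULL DATABASES;"))
    for row in result:
        print(row)
```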
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
Have fun!
# MindsDB and Tableau
Source: https://docs.mindsdb.com/mindsdb_sql/connect/tableau
Tableau lets you visualize your data easily and intuitively. Now that MindsDB
supports the MySQL binary protocol, you can connect it to Tableau and see the
forecasts.
## How to Connect
Follow the steps below to connect your MindsDB to Tableau.
First, create a new workbook in Tableau and open the *Connectors* tab in the
*Connect to Data* window.
Next, choose *MySQL* and provide the details of your MindsDB connection, such as
the IP, port, and database name. Optionally, you can provide a username and
password. Then, click *Sign In*.
Here are the connection parameters:
```text theme={null}
Host: `localhost`
Port: `47335`
Database name: `mindsdb`
Username: `mindsdb`
Password: *leave it empty*
```
You can [set up authentication with user and password in the config file](/setup/custom-config#auth).
Now you're connected!
## Overview of MindsDB in Tableau
The content of your MindsDB is visible in the right-side pane.
All the predictors are listed under the *Table* section. You can also switch
between the integrations, such as *mindsdb* or *files*, in the *Database*
section using the drop-down.
Now, let's run some examples!
## Examples
### Example 1
Previewing one of the tables from the *mysql* integration:
### Example 2
There is one technical limitation. Namely, we cannot join tables from different
databases/integrations in Tableau. To overcome this challenge, you can use
either views or custom SQL queries.
* Previewing a view that joins a data table with a predictor table:
* Using a custom SQL query by clicking the *New Custom SQL* button in the
right-side pane:
## What's Next?
Now that you are all set, we recommend you check out our **Tutorials** and
**Community Tutorials** sections, where you'll find various examples of
regression, classification, and time series predictions with MindsDB.
To learn more about MindsDB itself, follow the guide on
[MindsDB database structure](/sql/table-structure/). Also, don't miss out on the
remaining pages from the **SQL API** section, as they explain a common SQL
syntax with examples.
**From Our Community**
Check out the articles and video guides created by our community:
* Article on [Predicting & Visualizing Hourly Electricity Demand in the US with MindsDB and Tableau](https://teslimodus.medium.com/predicting-visualizing-hourly-electricity-demand-in-the-us-with-mindsdb-and-tableau-126d1c74d860)
by [Teslim Odumuyiwa](https://teslimodus.medium.com/)
* Article on [Predicting & Visualizing Petroleum Production with MindsDB and Tableau](https://dev.to/tesprogram/predicting-visualizing-petroleum-production-with-mindsdb-and-tableau-373f)
by [Teslim Odumuyiwa](https://github.com/Tes-program)
* Article on [Predicting & Visualizing Gas Prices with MindsDB and Tableau](https://dev.to/tesprogram/predicting-visualizing-gas-prices-with-mindsdb-and-tableau-d1p)
by [Teslim Odumuyiwa](https://github.com/Tes-program)
* Article on [How To Visualize MindsDB Predictions with Tableau](https://dev.to/ephraimx/how-to-visualize-mindsdb-predictions-with-tableau-2bpd)
by [Ephraimx](https://dev.to/ephraimx)
* Video guide on [Connecting MindsDB to Tableau](https://www.youtube.com/watch?v=eUiBVrm85v4)
by [Alissa Troiano](https://github.com/alissatroiano)
* Video guide on [Visualizing prediction result in Tableau](https://youtu.be/4aio-8kNbOo) by
[Teslim Odumuyiwa](https://github.com/Tes-program)
Have fun!
# Bring Your Own Function
Source: https://docs.mindsdb.com/mindsdb_sql/functions/custom_functions
Custom functions provide advanced means of manipulating data. Users can upload custom functions written in Python to MindsDB and apply them to data.
## How It Works
You can upload your custom functions via the MindsDB editor by clicking `Add` and `Upload custom functions`, like this:
Here is the form that needs to be filled out in order to bring your custom functions to MindsDB:
Let's briefly go over the files that need to be uploaded:
* The Python file stores an implementation of your custom functions. Here is the sample format:
```py theme={null}
def function_name_1(a: type, b: type) -> type:
    return x

def function_name_2(a: type, b: type, c: type) -> type:
    return x
```
Note that if the input and output types are not set, then `str` is used by default.
```py theme={null}
def add_integers(a: int, b: int) -> int:
    return a + b
```
* The optional requirements file, or `requirements.txt`, stores all dependencies along with their versions. Here is the sample format:
```sql theme={null}
dependency_package_1 == version
dependency_package_2 >= version
dependency_package_3 >= version, < version
...
```
```sql theme={null}
pandas
scikit-learn
```
Once you upload the above files, provide a name for the storage collection.
Let's look at an example.
## Example
We upload the custom functions, as below:
Here, we upload the `functions.py` file that stores the implementation of the functions and the `requirements.txt` file that stores all the dependencies. We named the storage collection `custom_functions`.
Now we can use the functions as below:
```sql theme={null}
SELECT functions.add_integers(sqft, 1) AS added_one, sqft
FROM example_db.home_rentals
LIMIT 1;
```
Here is the output:
```sql theme={null}
+-----------+------+
| added_one | sqft |
+-----------+------+
| 918 | 917 |
+-----------+------+
```
# The FROM_ENV() Function
Source: https://docs.mindsdb.com/mindsdb_sql/functions/from_env
MindsDB provides the `FROM_ENV()` function that lets users pull values from the environment variables into MindsDB.
## Usage
Here is how to use the `FROM_ENV()` function.
```sql theme={null}
FROM_ENV("MDB_MY_ENV_VAR")
```
Note that, due to security concerns, **only environment variables with names starting with `MDB_` can be extracted with the `FROM_ENV()` function**.
Learn more about [MindsDB variables here](/mindsdb_sql/functions/variables).
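For example, assuming an environment variable named `MDB_OPENAI_API_KEY` has been set (the name is illustrative), you can pull its value into a variable and reuse it when creating objects:

```sql theme={null}
SET @openai_key = FROM_ENV("MDB_OPENAI_API_KEY");

CREATE KNOWLEDGE_BASE my_kb
USING
    embedding_model = {
        "provider": "openai",
        "model_name": "text-embedding-3-large",
        "api_key": @openai_key
    };
```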
# The LLM() Function
Source: https://docs.mindsdb.com/mindsdb_sql/functions/llm_function
MindsDB provides the `LLM()` function that lets users incorporate the LLM-generated output directly into the data queries.
## Prerequisites
The `LLM()` function requires a large language model, which can be defined in the following ways:
* By setting the `default_llm` parameter in the [MindsDB configuration file](/setup/custom-config#default-llm).
* By saving the default model in the MindsDB Editor under Settings.
* By defining the environment variables as below, choosing one of the available model providers.
Here are the environment variables for the OpenAI provider:
```
LLM_FUNCTION_MODEL_NAME
LLM_FUNCTION_TEMPERATURE
LLM_FUNCTION_MAX_RETRIES
LLM_FUNCTION_MAX_TOKENS
LLM_FUNCTION_BASE_URL
OPENAI_API_KEY
LLM_FUNCTION_API_ORGANIZATION
LLM_FUNCTION_REQUEST_TIMEOUT
```
Note that the values stored in the environment variables are specific for each provider.
Here are the environment variables for the Anthropic provider:
```
LLM_FUNCTION_MODEL_NAME
LLM_FUNCTION_TEMPERATURE
LLM_FUNCTION_MAX_TOKENS
LLM_FUNCTION_TOP_P
LLM_FUNCTION_TOP_K
LLM_FUNCTION_DEFAULT_REQUEST_TIMEOUT
LLM_FUNCTION_API_KEY
LLM_FUNCTION_BASE_URL
```
Note that the values stored in the environment variables are specific for each provider.
Here are the environment variables for the LiteLLM provider:
```
LLM_FUNCTION_MODEL_NAME
LLM_FUNCTION_TEMPERATURE
LLM_FUNCTION_API_BASE
LLM_FUNCTION_MAX_RETRIES
LLM_FUNCTION_MAX_TOKENS
LLM_FUNCTION_TOP_P
LLM_FUNCTION_TOP_K
```
Note that the values stored in the environment variables are specific for each provider.
Here are the environment variables for the Ollama provider:
```
LLM_FUNCTION_BASE_URL
LLM_FUNCTION_MODEL_NAME
LLM_FUNCTION_TEMPERATURE
LLM_FUNCTION_TOP_P
LLM_FUNCTION_TOP_K
LLM_FUNCTION_REQUEST_TIMEOUT
LLM_FUNCTION_FORMAT
LLM_FUNCTION_HEADERS
LLM_FUNCTION_NUM_PREDICT
LLM_FUNCTION_NUM_CTX
LLM_FUNCTION_NUM_GPU
LLM_FUNCTION_REPEAT_PENALTY
LLM_FUNCTION_STOP
LLM_FUNCTION_TEMPLATE
```
Note that the values stored in the environment variables are specific for each provider.
Here are the environment variables for the Nvidia NIMs provider:
```
LLM_FUNCTION_BASE_URL
LLM_FUNCTION_MODEL_NAME
LLM_FUNCTION_TEMPERATURE
LLM_FUNCTION_TOP_P
LLM_FUNCTION_REQUEST_TIMEOUT
LLM_FUNCTION_FORMAT
LLM_FUNCTION_HEADERS
LLM_FUNCTION_NUM_PREDICT
LLM_FUNCTION_NUM_CTX
LLM_FUNCTION_NUM_GPU
LLM_FUNCTION_REPEAT_PENALTY
LLM_FUNCTION_STOP
LLM_FUNCTION_TEMPLATE
LLM_FUNCTION_NVIDIA_API_KEY
```
Note that the values stored in the environment variables are specific for each provider.
**OpenAI-compatible model providers** can be used like OpenAI models.
There are a number of OpenAI-compatible model providers, including OpenRouter and vLLM. To use models via these providers, users need to define the base URL and the API key of the provider.
Here is an example of using OpenRouter.
```
LLM_FUNCTION_MODEL_NAME = "mistralai/devstral-small-2505"
LLM_FUNCTION_BASE_URL = "https://openrouter.ai/api/v1"
OPENAI_API_KEY = "openrouter-api-key"
```
## Usage
You can use the `LLM()` function to simply ask a question and get an answer.
```sql theme={null}
SELECT LLM('How many planets are there in the solar system?');
```
Here is the output:
```sql theme={null}
+------------------------------------------+
| llm |
+------------------------------------------+
| There are 8 planets in the solar system. |
+------------------------------------------+
```
Moreover, you can use the `LLM()` function with your data to swiftly complete tasks such as text generation or summarization.
```sql theme={null}
SELECT
comment,
LLM('Describe the comment''s category in one word: ' || comment) AS category
FROM example_db.user_comments;
```
Here is the output:
```sql theme={null}
+--------------------------+----------+
| comment | category |
+--------------------------+----------+
| I hate tacos | Dislike |
| I want to dance | Desire |
| Baking is not a big deal | Opinion |
+--------------------------+----------+
```
# Standard Functions
Source: https://docs.mindsdb.com/mindsdb_sql/functions/standard-functions
MindsDB supports standard SQL functions via DuckDB and MySQL engines.
## DuckDB Functions
MindsDB executes functions on the underlying DuckDB engine. Therefore, [all DuckDB functions](https://duckdb.org/docs/stable/sql/functions/overview) are supported within MindsDB out of the box.
* [Aggregate Functions](https://duckdb.org/docs/stable/sql/functions/aggregates)
* [Array Functions](https://duckdb.org/docs/stable/sql/functions/array)
* [Bitstring Functions](https://duckdb.org/docs/stable/sql/functions/bitstring)
* [Blob Functions](https://duckdb.org/docs/stable/sql/functions/blob)
* [Date Format Functions](https://duckdb.org/docs/stable/sql/functions/dateformat)
* [Date Functions](https://duckdb.org/docs/stable/sql/functions/date)
* [Date Part Functions](https://duckdb.org/docs/stable/sql/functions/datepart)
* [Enum Functions](https://duckdb.org/docs/stable/sql/functions/enum)
* [Interval Functions](https://duckdb.org/docs/stable/sql/functions/interval)
* [Lambda Functions](https://duckdb.org/docs/stable/sql/functions/lambda)
* [List Functions](https://duckdb.org/docs/stable/sql/functions/list)
* [Map Functions](https://duckdb.org/docs/stable/sql/functions/map)
* [Nested Functions](https://duckdb.org/docs/stable/sql/functions/nested)
* [Numeric Functions](https://duckdb.org/docs/stable/sql/functions/numeric)
* [Pattern Matching](https://duckdb.org/docs/stable/sql/functions/pattern_matching)
* [Regular Expressions](https://duckdb.org/docs/stable/sql/functions/regular_expressions)
* [Struct Functions](https://duckdb.org/docs/stable/sql/functions/struct)
* [Text Functions](https://duckdb.org/docs/stable/sql/functions/text)
* [Time Functions](https://duckdb.org/docs/stable/sql/functions/time)
* [Timestamp Functions](https://duckdb.org/docs/stable/sql/functions/timestamp)
* [Timestamp with Time Zone Functions](https://duckdb.org/docs/stable/sql/functions/timestamptz)
* [Union Functions](https://duckdb.org/docs/stable/sql/functions/union)
* [Utility Functions](https://duckdb.org/docs/stable/sql/functions/utility)
* [Window Functions](https://duckdb.org/docs/stable/sql/functions/window_functions)
## MySQL Functions
MindsDB executes MySQL-style functions on the underlying DuckDB engine. The following functions have been adapted to MySQL-style functions.
String functions:
* [`CHAR`](https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_char)
* [`FORMAT`](https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_format)
* [`INSTR`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_instr)
* [`LENGTH`](https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_length)
* [`LOCATE`](https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_locate)
* [`SUBSTRING_INDEX`](https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_substring-index)
* [`UNHEX`](https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_unhex)
Date and time functions:
* [`ADDDATE`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_adddate)
* [`ADDTIME`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_addtime)
* [`CONVERT_TZ`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_convert-tz)
* [`CURDATE`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_curdate)
* [`CURTIME`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_curtime)
* [`DATE_ADD`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_date-add)
* [`DATE_FORMAT`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_date-format)
* [`DATE_SUB`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_date-sub)
* [`DATEDIFF`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_datediff)
* [`DAYNAME`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_dayname)
* [`DAYOFMONTH`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_dayofmonth)
* [`DAYOFWEEK`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_dayofweek)
* [`DAYOFYEAR`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_dayofyear)
* [`EXTRACT`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_extract)
* [`FROM_DAYS`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_from-days)
* [`FROM_UNIXTIME`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_from-unixtime)
* [`GET_FORMAT`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_get-format)
* [`TIMESTAMPDIFF`](https://dev.mysql.com/doc/refman/8.4/en/date-and-time-functions.html#function_timestampdiff)
Other functions:
* [`REGEXP_SUBSTR`](https://dev.mysql.com/doc/refman/8.4/en/regexp.html#function_regexp-substr)
* [`SHA2`](https://dev.mysql.com/doc/refman/8.4/en/encryption-functions.html#function_sha2)
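As a quick sketch of the MySQL-style semantics listed above:

```sql theme={null}
SELECT
    DATE_FORMAT(CURDATE(), '%d/%m/%Y') AS formatted_date,  -- current date as dd/mm/yyyy
    SUBSTRING_INDEX('www.mindsdb.com', '.', 2) AS prefix,  -- returns 'www.mindsdb'
    DATEDIFF('2024-12-31', '2024-01-01') AS days_between;  -- returns 365
```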
# The TO_MARKDOWN() Function
Source: https://docs.mindsdb.com/mindsdb_sql/functions/to_markdown_function
MindsDB provides the `TO_MARKDOWN()` function that lets users extract the content of their documents in markdown by simply specifying the document path or URL. This function is especially useful for passing the extracted content of documents through LLMs or for storing them in a [Knowledge Base](/mindsdb_sql/agents/knowledge-bases).
## Configuration
The `TO_MARKDOWN()` function supports different file formats and methods of passing documents into it, as well as an LLM required for processing documents.
### Supported File Formats
The `TO_MARKDOWN()` function supports PDF, XML, and Nessus file formats. The documents can be provided from URLs, file storage, or Amazon S3 storage.
### Supported LLMs
The `TO_MARKDOWN()` function requires an LLM to process the document content into the Markdown format.
The supported LLM providers include:
* OpenAI
* Azure OpenAI
* Google
The model you select must support multi-modal inputs, that is, both images and text. For example, OpenAI’s gpt-4o is a supported multi-modal model.
Users can provide an LLM using one of the following methods:
1. Set the default model in the Settings of MindsDB Editor.
2. Set the default model in the [MindsDB configuration file](/setup/custom-config#default-llm).
3. Use environment variables defined below to set an LLM specifically for the `TO_MARKDOWN()` function.
The `TO_MARKDOWN_FUNCTION_PROVIDER` environment variable defines the selected provider, which is one of `openai`, `azure_openai`, or `google`.
Here are the environment variables for the OpenAI provider:
```
TO_MARKDOWN_FUNCTION_API_KEY (required)
TO_MARKDOWN_FUNCTION_MODEL_NAME
TO_MARKDOWN_FUNCTION_TEMPERATURE
TO_MARKDOWN_FUNCTION_MAX_RETRIES
TO_MARKDOWN_FUNCTION_MAX_TOKENS
TO_MARKDOWN_FUNCTION_BASE_URL
TO_MARKDOWN_FUNCTION_API_ORGANIZATION
TO_MARKDOWN_FUNCTION_REQUEST_TIMEOUT
```
Here are the environment variables for the Azure OpenAI provider:
```
TO_MARKDOWN_FUNCTION_API_KEY (required)
TO_MARKDOWN_FUNCTION_BASE_URL (required)
TO_MARKDOWN_FUNCTION_API_VERSION (required)
TO_MARKDOWN_FUNCTION_MODEL_NAME
TO_MARKDOWN_FUNCTION_TEMPERATURE
TO_MARKDOWN_FUNCTION_MAX_RETRIES
TO_MARKDOWN_FUNCTION_MAX_TOKENS
TO_MARKDOWN_FUNCTION_API_ORGANIZATION
TO_MARKDOWN_FUNCTION_REQUEST_TIMEOUT
```
Here are the environment variables for the Google provider:
```
TO_MARKDOWN_FUNCTION_API_KEY
TO_MARKDOWN_FUNCTION_MODEL_NAME
TO_MARKDOWN_FUNCTION_TEMPERATURE
TO_MARKDOWN_FUNCTION_MAX_TOKENS
TO_MARKDOWN_FUNCTION_REQUEST_TIMEOUT
```
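For instance, here is a hedged sketch of configuring the OpenAI provider through environment variables before starting MindsDB; the API key value is a placeholder.

```bash theme={null}
export TO_MARKDOWN_FUNCTION_PROVIDER="openai"
export TO_MARKDOWN_FUNCTION_API_KEY="sk-..."
export TO_MARKDOWN_FUNCTION_MODEL_NAME="gpt-4o"
```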
## Usage
You can use the `TO_MARKDOWN()` function to extract the content of your documents in markdown format. The arguments for this function are:
* `file_path_or_url`: The path or URL of the document you want to extract content from.
The following example shows how to use the `TO_MARKDOWN()` function with a PDF document from [Amazon S3 storage connected to MindsDB](/integrations/data-integrations/amazon-s3).
```sql theme={null}
SELECT TO_MARKDOWN(public_url) FROM s3_datasource.files;
```
Here are the steps for passing files from Amazon S3 into `TO_MARKDOWN()`.
1. Connect Amazon S3 to MindsDB following [this instruction](/integrations/data-integrations/amazon-s3).
2. The `public_url` of the file is generated in the `s3_datasource.files` table upon connecting the Amazon S3 data source to MindsDB.
3. Upon running the above query, the `public_url` of the file is selected from the `s3_datasource.files` table.
The following example shows how to use the `TO_MARKDOWN()` function with a PDF document from a URL.
```sql theme={null}
SELECT TO_MARKDOWN('https://www.princexml.com/howcome/2016/samples/invoice/index.pdf');
```
Here is the output:
````sql theme={null}
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| to_markdown |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ```markdown |
| # Invoice |
| |
| YesLogic Pty. Ltd. |
| 7 / 39 Bouverie St |
| Carlton VIC 3053 |
| Australia |
| |
| www.yeslogic.com |
| ABN 32 101 193 560 |
| |
| Customer Name |
| Street |
| Postcode City |
| Country |
| |
| Invoice date: | Nov 26, 2016 |
| --- | --- |
| Invoice number: | 161126 |
| Payment due: | 30 days after invoice date |
| |
| | Description | From | Until | Amount | |
| |---------------------------|-------------|-------------|------------| |
| | Prince Upgrades & Support | Nov 26, 2016 | Nov 26, 2017 | USD $950.00 | |
| | Total | | | USD $950.00 | |
| |
| Please transfer amount to: |
| |
| Bank account name: | Yes Logic Pty Ltd |
| --- | --- |
| Name of Bank: | Commonwealth Bank of Australia (CBA) |
| Bank State Branch (BSB): | 063010 |
| Bank account number: | 13201652 |
| Bank SWIFT code: | CTBAAU2S |
| Bank address: | 231 Swanston St, Melbourne, VIC 3000, Australia |
| |
| The BSB number identifies a branch of a financial institution in Australia. When transferring money to Australia, the BSB number is used together with the bank account number and the SWIFT code. Australian banks do not use IBAN numbers. |
| |
| www.yeslogic.com |
| ``` |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
````
The content of each PDF page is intelligently extracted by first assessing how visually complex the page is. Based on this assessment, the system decides whether traditional text parsing is sufficient or if the page should be processed using an LLM.
### Usage with Knowledge Bases
You can also use the `TO_MARKDOWN()` function to extract content from documents and store it in a [Knowledge Base](/mindsdb_sql/agents/knowledge-bases). This is particularly useful for creating a Knowledge Base from a collection of documents.
```sql theme={null}
INSERT INTO my_kb (
SELECT
HASH('https://www.princexml.com/howcome/2016/samples/invoice/index.pdf') as id,
TO_MARKDOWN('https://www.princexml.com/howcome/2016/samples/invoice/index.pdf') as content
)
```
# Variables
Source: https://docs.mindsdb.com/mindsdb_sql/functions/variables
MindsDB supports the usage of variables. Users can save values of API keys or other frequently used values and pass them as variables when creating knowledge bases, agents, or other MindsDB objects.
## Usage
Here is how to create variables in MindsDB.
* Create variables using `SET` and save values either using the [`from_env()` function](/mindsdb_sql/functions/from_env) or directly.
```sql theme={null}
SET @my_env_var = from_env("MDB_MY_ENV_VAR");
SET @my_value = "123456";
```
* Use variables to pass parameters when creating objects in MindsDB.
Here is an example for [knowledge bases](/mindsdb_sql/knowledge_bases/overview).
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name" : "text-embedding-3-large",
"api_key": @my_env_var
},
...;
```
# How to Alter Existing Knowledge Bases
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/alter
The `ALTER KNOWLEDGE_BASE` command enables users to modify the configuration of an existing knowledge base without recreating it.
This document lists the parameters that can be altered and explains the process and its effect on the existing knowledge base.
## `ALTER KNOWLEDGE_BASE` Syntax
Here is the syntax used to alter the existing knowledge base.
```sql theme={null}
ALTER KNOWLEDGE_BASE my_kb
USING
    parameter_name = parameter_value,
    ...;
```
The following parameters can be altered:
* `embedding_model`
Users can alter only the API key of the provider used for the embedding model. The provider and the model itself cannot be altered, as that would be incompatible with the already embedded content stored in the knowledge base.
```sql theme={null}
ALTER KNOWLEDGE_BASE my_kb
USING
embedding_model = { 'api_key': 'new-api-key' };
```
Upon altering the API key of the embedding model’s provider, ensure that the new API key has access to the same embedding model so that the knowledge base can continue to function without issues.
* `reranking_model`
Users can turn off reranking by setting `reranking_model = false`, or change the provider, API key, and model used for reranking.
```sql theme={null}
ALTER KNOWLEDGE_BASE my_kb
USING
reranking_model = { 'provider': 'new_provider', 'model_name': 'new_model', 'api_key': 'new-api-key' };
ALTER KNOWLEDGE_BASE my_kb
USING
reranking_model = false;
```
Upon updating the reranking model, the knowledge base will use the newly defined reranking model when reranking results, provided that reranking is turned on.
* `content_columns`
Users can change the content columns.
```sql theme={null}
ALTER KNOWLEDGE_BASE my_kb
USING
content_columns=['content_col1', 'content_col2', ...];
```
Upon changing the content columns, all the previously inserted content stays unchanged. From then on, the knowledge base embeds content from the columns defined in the most recent call to `ALTER KNOWLEDGE_BASE`.
* `metadata_columns`
Users can change the metadata columns, overriding the existing metadata columns.
```sql theme={null}
ALTER KNOWLEDGE_BASE my_kb
USING
metadata_columns=['metadata_col1', 'metadata_col2', ...];
```
Upon changing the metadata columns:
* All metadata fields are stored in the knowledge base. No data is removed.
* Users can filter only by metadata fields defined in the most recent call to `ALTER KNOWLEDGE_BASE`.
* To be able to filter by all metadata fields, include them in the list as below.
```sql theme={null}
ALTER KNOWLEDGE_BASE my_kb
USING
metadata_columns=['existing_metadata_fields', ..., 'new_metadata_fields', ...];
```
* `id_column`
Users can change the ID column.
```sql theme={null}
ALTER KNOWLEDGE_BASE my_kb
USING
id_column='my_id';
```
Upon changing the ID column, users must keep in mind that inserting data with an already existing ID value will update the existing row and not create a new one.
* `storage`
Users cannot update the underlying vector database of the existing knowledge base.
* `preprocessing`
Users can modify the [`preprocessing` parameters as defined here](/mindsdb_sql/knowledge_bases/insert_data#chunking-data).
# How to Create Knowledge Bases
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/create
A knowledge base is an advanced system that organizes information based on semantic meaning rather than simple keyword matching. It integrates embedding models, reranking models, and vector stores to enable context-aware data retrieval.
## `CREATE KNOWLEDGE_BASE` Syntax
Here is the syntax for creating a knowledge base:
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-..."
},
reranking_model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-..."
},
storage = my_vector_store.storage_table,
metadata_columns = ['date', 'creator', ...],
content_columns = ['review', 'content', ...],
id_column = 'id';
```
Upon execution, it registers `my_kb` and associates the specified models and storage. `my_kb` is a unique identifier of the knowledge base within MindsDB.
Here is how to list all knowledge bases:
```sql theme={null}
SHOW KNOWLEDGE_BASES;
```
Users can use variables and the [`from_env()` function](/mindsdb_sql/functions/from_env) to pass parameters when creating knowledge bases.
As MindsDB stores objects, such as models or knowledge bases, inside [projects](/mindsdb_sql/sql/create/project), you can create a knowledge base inside a custom project.
```sql theme={null}
CREATE PROJECT my_project;
CREATE KNOWLEDGE_BASE my_project.my_kb
USING
...
```
### Supported LLMs
Below is the list of all language models supported for the `embedding_model` and `reranking_model` parameters.
#### `provider = 'openai'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from OpenAI in Settings of the MindsDB GUI.
Furthermore, users can select `Custom OpenAI API` from the dropdown and use models from any OpenAI-compatible API.
When choosing `openai` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the OpenAI model to be used.
* `api_key` stores the OpenAI API key.
Learn more about the [OpenAI integration with MindsDB here](/integrations/ai-engines/openai).
#### `provider = 'openai_azure'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from Azure OpenAI in Settings of the MindsDB GUI.
When choosing `openai_azure` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the OpenAI model to be used.
* `api_key` stores the OpenAI API key.
* `base_url` stores the base URL of the Azure instance.
* `api_version` stores the version of the Azure instance.
Users need to log in to their Azure OpenAI instance to retrieve all relevant parameter values. Next, click on `Explore Azure AI Foundry portal` and go to `Models + endpoints`. Select the model and copy the parameter values.
#### `provider = 'google'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from Google in Settings of the MindsDB GUI.
When choosing `google` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the Google model to be used.
* `api_key` stores the Google API key.
Learn more about the [Google Gemini integration with MindsDB here](/integrations/ai-engines/google_gemini).
#### `provider = 'bedrock'`
This provider is supported for both `embedding_model` and `reranking_model`.
When choosing `bedrock` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the model available via Amazon Bedrock.
* `aws_access_key_id` stores a unique identifier associated with your AWS account, used to identify the user or application making requests to AWS.
* `aws_region_name` stores the name of the AWS region you want to send your requests to (e.g., `"us-west-2"`).
* `aws_secret_access_key` stores the secret key associated with your AWS access key ID. It is used to sign your requests securely.
* `aws_session_token` is an optional parameter that stores a temporary token used for short-term security credentials when using AWS Identity and Access Management (IAM) roles or temporary credentials.
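A minimal sketch, assuming the Amazon Titan embedding model ID below is available in your Bedrock region (the model ID and credential values are illustrative):

```sql theme={null}
CREATE KNOWLEDGE_BASE bedrock_kb
USING
    embedding_model = {
        "provider": "bedrock",
        "model_name": "amazon.titan-embed-text-v2:0",
        "aws_access_key_id": "AKIA...",
        "aws_secret_access_key": "...",
        "aws_region_name": "us-west-2"
    };
```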
#### `provider = 'snowflake'`
This provider is supported for both `embedding_model` and `reranking_model`.
When choosing `snowflake` as the model provider, users should choose one of the available models from [Snowflake Cortex AI](https://www.snowflake.com/en/product/features/cortex/) and define the following model parameters.
* `model_name` stores the name of the model available via Snowflake Cortex AI.
* `api_key` stores the Snowflake Cortex AI API key.
* `account_id` stores the Snowflake account ID.
Follow the below steps to generate the API key.
1. Generate a key pair according to [this instruction](https://docs.snowflake.com/en/user-guide/key-pair-auth) as below.
* Execute these commands in the console:
```bash theme={null}
# generate private key
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
# generate public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```
* Save the public key, that is, the content of `rsa_key.pub`, into your database user:
```sql theme={null}
ALTER USER my_user SET RSA_PUBLIC_KEY = '<content of rsa_key.pub>';
```
2. Verify the key pair with the database user.
* Install `snowsql` following [this instruction](https://docs.snowflake.com/en/user-guide/snowsql-install-config).
* Execute this command in the console:
```bash theme={null}
snowsql -a <account_identifier> -u my_user --private-key-path rsa_key.p8
```
3. Generate JWT token.
* Download the Python script from [Snowflake's Developer Guide for Authentication](https://docs.snowflake.com/en/developer-guide/sql-api/authenticating). Here is a [direct download link](https://docs.snowflake.com/en/_downloads/aeb84cdfe91dcfbd889465403b875515/sql-api-generate-jwt.py).
* Ensure you have the PyJWT module installed, which is required to run the script.
* Run the script using this command:
```bash theme={null}
python sql-api-generate-jwt.py --account <account_identifier> --user my_user --private_key_file_path rsa_key.p8
```
This command returns the JWT token, which is used in the `api_key` parameter for the `snowflake` provider.
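With the JWT token generated, here is a hedged sketch of the resulting model definition; the model name is one example from Snowflake Cortex AI, and the token and account ID are placeholders:
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
    embedding_model = {
        "provider": "snowflake",
        "model_name": "snowflake-arctic-embed-m",
        "api_key": "<JWT token>",
        "account_id": "<account_id>"
    };
```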
#### `provider = 'ollama'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from Ollama in Settings of the MindsDB GUI.
When choosing `ollama` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the model to be used.
* `base_url` stores the base URL of the Ollama instance.
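For illustration, here is a hedged sketch pointing at a locally running Ollama instance; the model name and URL are examples and depend on your setup:
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
    embedding_model = {
        "provider": "ollama",
        "model_name": "nomic-embed-text",
        "base_url": "http://localhost:11434"
    };
```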
### `embedding_model`
The embedding model is a required component of the knowledge base. It stores specifications of the embedding model to be used.
Users can define the embedding model choosing one of the following options.
**Option 1.** Use the `embedding_model` parameter to define the specification.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
embedding_model = {
"provider": "azure_openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01"
},
...
```
**Option 2.** Define the default embedding model in the [MindsDB configuration file](/setup/custom-config).
You can define the default models in the Settings of the MindsDB Editor GUI.
Note that if you define [`default_embedding_model` in the configuration file](/setup/custom-config#default_embedding_model), you do not need to provide the `embedding_model` parameter when creating a knowledge base. If both are provided, the values from the `embedding_model` parameter take precedence.
When using `default_embedding_model` from the configuration file, the knowledge base saves this model internally. Therefore, changing `default_embedding_model` in the configuration file after a knowledge base is created does not affect already created knowledge bases.
```bash theme={null}
"default_embedding_model": {
"provider": "azure_openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01"
}
```
The embedding model specification includes:
* `provider`
It is a required parameter. It defines the model provider.
* `model_name`
It is a required parameter. It defines the embedding model name as specified by the provider.
* `api_key`
The API key is required to access the embedding model assigned to a knowledge base. Users can provide it either in this `api_key` parameter, or in the `OPENAI_API_KEY` environment variable for `"provider": "openai"` and `AZURE_OPENAI_API_KEY` environment variable for `"provider": "azure_openai"`.
* `base_url`
It is an optional parameter, which defaults to `https://api.openai.com/v1/`. It is a required parameter when using the `azure_openai` provider. It is the root URL used to send API requests.
* `api_version`
It is an optional parameter. It is a required parameter when using the `azure_openai` provider. It defines the API version.
### `reranking_model`
The reranking model is an optional component of the knowledge base. It stores specifications of the reranking model to be used.
Users can disable reranking features of knowledge bases by setting this parameter to `false`.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
reranking_model = false,
...
```
Users can enable reranking features of knowledge bases by defining the reranking model choosing one of the following options.
**Option 1.** Use the `reranking_model` parameter to define the specification.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
reranking_model = {
"provider": "azure_openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01",
"method": "multi-class"
},
...
```
**Option 2.** Define the default reranking model in the [MindsDB configuration file](/setup/custom-config).
You can define the default models in the Settings of the MindsDB Editor GUI.
Note that if you define [`default_reranking_model` in the configuration file](/setup/custom-config#default-reranking-model), you do not need to provide the `reranking_model` parameter when creating a knowledge base. If both are provided, the values from the `reranking_model` parameter take precedence.
When using `default_reranking_model` from the configuration file, the knowledge base saves this model internally. Therefore, changing `default_reranking_model` in the configuration file after a knowledge base is created does not affect already created knowledge bases.
```bash theme={null}
"default_reranking_model": {
"provider": "azure_openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01",
"method": "multi-class"
}
```
The reranking model specification includes:
* `provider`
It is a required parameter. It defines the model provider as listed in [supported LLMs](/mindsdb_sql/knowledge_bases/create#supported-llms).
* `model_name`
It is a required parameter. It defines the reranking model name as specified by the provider.
* `api_key`
The API key is required to access the reranking model assigned to a knowledge base. Users can provide it either in this `api_key` parameter, or in the `OPENAI_API_KEY` environment variable for `"provider": "openai"` and the `AZURE_OPENAI_API_KEY` environment variable for `"provider": "azure_openai"`.
* `base_url`
It is an optional parameter, which defaults to `https://api.openai.com/v1/`. It is a required parameter when using the `azure_openai` provider. It is the root URL used to send API requests.
* `api_version`
It is an optional parameter. It is a required parameter when using the `azure_openai` provider. It defines the API version.
* `method`
It is an optional parameter. It defines the method used to calculate the relevance of the output rows. The available options include `multi-class` and `binary`. It defaults to `multi-class`.
**Reranking Method**
The `multi-class` reranking method classifies each document chunk (that meets any specified metadata filtering conditions) into one of four relevance classes:
1. Not relevant with class weight of 0.25.
2. Slightly relevant with class weight of 0.5.
3. Moderately relevant with class weight of 0.75.
4. Highly relevant with class weight of 1.
The overall `relevance_score` of a document is calculated as the sum of each chunk’s class weight multiplied by its class probability (from model logprob output).
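For illustration, with hypothetical class probabilities of 0.05 (not relevant), 0.10 (slightly relevant), 0.25 (moderately relevant), and 0.60 (highly relevant) for a document, its score would be 0.25 * 0.05 + 0.5 * 0.10 + 0.75 * 0.25 + 1 * 0.60 = 0.85.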
The `binary` reranking method simplifies classification by determining whether a document is relevant or not, without intermediate relevance levels. With this method, the overall `relevance_score` of a document is calculated based on the model log probability.
### `storage`
The vector store is a required component of the knowledge base. It stores data in the form of embeddings.
It is optional for users to provide the `storage` parameter. If not provided, the default ChromaDB is created when creating a knowledge base.
The available options include either [PGVector](/integrations/vector-db-integrations/pgvector) or [ChromaDB](/integrations/vector-db-integrations/chromadb).
It is recommended to use PGVector version 0.8.0 or higher for better performance.
If the `storage` parameter is not provided, the system creates a default ChromaDB vector database called `<kb_name>_chromadb` with a default table called `default_collection` that stores the embedded data. This default ChromaDB vector database is kept in MindsDB's storage.
In order to use your own vector database as storage, connect it to MindsDB beforehand.
Here is an example for [PGVector](/integrations/vector-db-integrations/pgvector).
```sql theme={null}
CREATE DATABASE my_pgvector
WITH ENGINE = 'pgvector',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "postgres",
"user": "user",
"password": "password",
"distance": "cosine"
};
CREATE KNOWLEDGE_BASE my_kb
USING
...
storage = my_pgvector.storage_table,
...
```
Note that you do not need to have the `storage_table` created as it is created when creating a knowledge base.
### `metadata_columns`
The data inserted into the knowledge base can be classified as metadata, which enables users to filter the search results using defined data fields.
Note that source data column(s) included in `metadata_columns` cannot be used in `content_columns`, and vice versa.
This parameter is an array of strings that lists column names from the source data to be used as metadata. If not provided, then all inserted columns (except for columns defined as `id_column` and `content_columns`) are considered metadata columns.
Here is an example of usage. A user wants to store the following data in a knowledge base.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
The `product` column can be used as metadata to enable metadata filtering.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
metadata_columns = ['product'],
...
```
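With `product` stored as metadata, search results can later be filtered on it, as in this query:
```sql theme={null}
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse'
AND content = 'color';
```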
### `content_columns`
The data inserted into the knowledge base can be classified as content, which is embedded by the embedding model and stored in the underlying vector store.
Note that source data column(s) included in `content_columns` cannot be used in `metadata_columns`, and vice versa.
This parameter is an array of strings that lists column names from the source data to be used as content and processed into embeddings. If not provided, the `content` column is expected by default when inserting data into the knowledge base.
Here is an example of usage. A user wants to store the following data in a knowledge base.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
The `notes` column can be used as content.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
content_columns = ['notes'],
...
```
### `id_column`
The ID column uniquely identifies each source data row in the knowledge base.
It is an optional parameter. If provided, this parameter is a string that contains the source data ID column name. If not provided, it is generated from the hash of the content columns.
Here is an example of usage. A user wants to store the following data in a knowledge base.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
The `order_id` column can be used as ID.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
id_column = 'order_id',
...
```
Note that if the source data row is chunked into multiple chunks by the knowledge base (that is, to optimize the storage), then these rows in the knowledge base have the same ID value that identifies chunks from one source data row.
**Available options for the ID column values**
* User-Defined ID Column:
When users define the `id_column` parameter, the values from the provided source data column are used to identify source data rows within the knowledge base.
* User-Generated ID Column:
When users do not have a column that uniquely identifies each row in their source data, they can generate the ID column values when inserting data into the knowledge base using functions like `HASH()` or `ROW_NUMBER()`.
```sql theme={null}
INSERT INTO my_kb (
SELECT ROW_NUMBER() OVER (ORDER BY order_id) AS id, *
FROM sample_data.orders
);
```
* Default ID Column:
If the `id_column` parameter is not defined, its default values are built from the hash of the content columns.
### Example
Here is a sample knowledge base that will be used for examples in the following content.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123"
},
reranking_model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-abc123"
},
metadata_columns = ['product'],
content_columns = ['notes'],
id_column = 'order_id';
```
## `DESCRIBE KNOWLEDGE_BASE` Syntax
Users can get details about the knowledge base using the `DESCRIBE KNOWLEDGE_BASE` command.
```sql theme={null}
DESCRIBE KNOWLEDGE_BASE my_kb;
```
Here is the sample output:
```sql theme={null}
+---------+---------+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+--------------------+----------------+-------+----------+
| NAME | PROJECT | MODEL | STORAGE | PARAMS | INSERT_STARTED_AT | INSERT_FINISHED_AT | PROCESSED_ROWS | ERROR | QUERY_ID |
+---------+---------+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+--------------------+----------------+-------+----------+
| my_kb | mindsdb | [NULL] | my_kb_chromadb.default_collection | {"embedding_model": {"provider": "openai", "model_name": "text-embedding-ada-002", "api_key": "sk-xxx"}, "reranking_model": {"provider": "openai", "model_name": "gpt-4o", "api_key": "sk-xxx"}, "default_vector_storage": "my_kb_chromadb"} | [NULL] | [NULL] | [NULL] | [NULL]| [NULL] |
+---------+---------+--------+----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+--------------------+----------------+-------+----------+
```
## `DROP KNOWLEDGE_BASE` Syntax
Here is the syntax for deleting a knowledge base:
```sql theme={null}
DROP KNOWLEDGE_BASE my_kb;
```
Upon execution, it removes the knowledge base with its content.
# How to Evaluate Knowledge Bases
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/evaluate
Evaluating knowledge bases verifies how accurate and relevant the data returned by the knowledge base is.
## `EVALUATE KNOWLEDGE_BASE` Syntax
With the `EVALUATE KNOWLEDGE_BASE` command, users can evaluate the relevancy and accuracy of the documents and data returned by the knowledge base.
Below is the complete syntax that includes both required and optional parameters.
```sql theme={null}
EVALUATE KNOWLEDGE_BASE my_kb
USING
test_table = my_datasource.my_test_table,
version = 'doc_id',
generate_data = {
'from_sql': 'SELECT id, content FROM my_datasource.my_table',
'count': 100
},
evaluate = false,
llm = {
'provider': 'openai',
'api_key':'sk-xxx',
'model_name':'gpt-4'
},
save_to = my_datasource.my_result_table;
```
### `test_table`
This is a required parameter that stores the name of the table from one of the data sources connected to MindsDB. For example, `test_table = my_datasource.my_test_table` defines a table named `my_test_table` from a data source named `my_datasource`.
This test table stores test data, commonly in the form of questions and answers. Its content depends on the `version` parameter defined below.
Users can provide their own test data or have the test data generated by the `EVALUATE KNOWLEDGE_BASE` command, which is performed when setting the `generate_data` parameter defined below.
### `version`
This is an optional parameter that defines the version of the evaluator. If not defined, its default value is `doc_id`.
* `version = 'doc_id'`
The evaluator checks whether the document ID returned by the knowledge base matches the expected document ID as defined in the test table.
* `version = 'llm_relevancy'`
The evaluator uses a language model to rank and evaluate responses from the knowledge base.
### `generate_data`
This is an optional parameter used to configure the test data generation, which is saved into the table defined in the `test_table` parameter. If not defined, its default value is `false`, meaning that no test data is generated.
Available values are as follows:
* A dictionary containing the following values:
* `from_sql` defines the SQL query that fetches the test data. For example, `'from_sql': 'SELECT id, content FROM my_datasource.my_table'`. If not defined, it fetches test data from the knowledge base on which the `EVALUATE` command is executed: `SELECT chunk_content, id FROM my_kb`.
* `count` defines the size of the test dataset. For example, `'count': 100`. Its default value is 20.
When providing the `from_sql` parameter, the query must return specific column names, as follows:
* With `version = 'doc_id'`, the `from_sql` parameter should contain a query that returns the `id` and `content` columns, like this: `'from_sql': 'SELECT id_column_name AS id, content_column_names AS content FROM my_datasource.my_table'`
* With `version = 'llm_relevancy'`, the `from_sql` parameter should contain a query that returns the `content` column, like this: `'from_sql': 'SELECT content_column_names AS content FROM my_datasource.my_table'`
* A value of `true`, such as `generate_data = true`, which implies that default values for `from_sql` and `count` will be used.
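For example, a minimal run that generates 20 test rows (the default `count`) from the knowledge base itself and then evaluates it might look as follows; the table names follow the syntax example above:
```sql theme={null}
EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    generate_data = true;
```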
### `evaluate`
This is an optional parameter that defines whether to evaluate the knowledge base. If not defined, its default value is `true`.
Users can set it to `false`, as in `evaluate = false`, in order to generate test data into the test table without running the evaluator.
### `llm`
This is an optional parameter that defines a language model to be used for evaluations, if `version` is set to `llm_relevancy`.
If not defined, its default value is the [`reranking_model` defined with the knowledge base](/mindsdb_sql/knowledge_bases/create#reranking-model).
Users can define it with the `EVALUATE KNOWLEDGE_BASE` command in the same manner.
```sql theme={null}
EVALUATE KNOWLEDGE_BASE my_kb
USING
...
llm = {
"provider": "azure_openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01",
"method": "multi-class"
},
...
```
### `save_to`
This is an optional parameter that stores the name of the table from one of the data sources connected to MindsDB. For example, `save_to = my_datasource.my_result_table` defines a table named `my_result_table` from the data source named `my_datasource`. If not defined, the results are not saved into a table.
This table is used to save the evaluation results.
By default, evaluation results are returned after executing the `EVALUATE KNOWLEDGE_BASE` statement.
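When `save_to` is set, the stored results can be reviewed later with a regular query, for example:
```sql theme={null}
SELECT *
FROM my_datasource.my_result_table;
```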
### Evaluation Results
When using `version = 'doc_id'`, the following columns are included in the evaluation results:
* `total` stores the total number of questions.
* `total_found` stores the number of questions to which the knowledge base provided correct answers.
* `retrieved_in_top_10` stores the number of questions for which the correct answer was retrieved within the top 10 results.
* `cumulative_recall` stores data that can be used to create a chart.
* `avg_query_time` stores the execution time of a search query of the knowledge base.
* `name` stores the knowledge base name.
* `created_at` stores the timestamp when the evaluation was created.
When using `version = 'llm_relevancy'`, the following columns are included in the evaluation results:
* `avg_relevancy` stores the average relevancy.
* `avg_relevance_score_by_k` stores the average relevancy at k.
* `avg_first_relevant_position` stores the average first relevant position.
* `mean_mrr` stores the Mean Reciprocal Rank (MRR).
* `hit_at_k` stores the Hit\@k value.
* `bin_precision_at_k` stores the Binary Precision\@k.
* `avg_entropy` stores the average relevance score entropy.
* `avg_ndcg` stores the average nDCG.
* `avg_query_time` stores the execution time of a search query of the knowledge base.
* `name` stores the knowledge base name.
* `created_at` stores the timestamp when the evaluation was created.
# How to Use Knowledge Bases
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/examples
This section contains examples of usage of knowledge bases.
### Sales Data
Here is the data that will be inserted into the knowledge base.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
You can access this sample data as below:
```sql theme={null}
CREATE DATABASE sample_data
WITH ENGINE = 'postgres',
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "samples.mindsdb.com",
"port": "5432",
"database": "demo",
"schema": "demo_data"
};
SELECT * FROM sample_data.orders;
```
Here is how to create a knowledge base specifically for the data.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123"
},
reranking_model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-abc123"
},
metadata_columns = ['product'],
content_columns = ['notes'],
id_column = 'order_id';
```
Here is how to insert the data.
```sql theme={null}
INSERT INTO my_kb
SELECT order_id, product, notes
FROM sample_data.orders;
```
Here is how to query the knowledge base.
```sql theme={null}
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse'
AND content = 'color'
AND relevance > 0.5;
```
### Financial Data
You can access the sample data as below:
```sql theme={null}
CREATE DATABASE sample_data
WITH ENGINE = 'postgres',
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "samples.mindsdb.com",
"port": "5432",
"database": "demo",
"schema": "demo_data"
};
SELECT * FROM sample_data.financial_headlines;
```
Here is how to create a knowledge base specifically for the data.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-xxx"
},
reranking_model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-xxx"
},
metadata_columns = ['sentiment_labelled'],
content_columns = ['headline'];
```
Here is how to insert the data.
```sql theme={null}
INSERT INTO my_kb
SELECT *
FROM sample_data.financial_headlines
USING
batch_size = 500,
threads = 10;
```
Here is how to query the knowledge base.
* Query without defined `LIMIT`
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'investors';
```
This query returns 10 rows, as the default `LIMIT` is set to 10.
* Query with defined `LIMIT`
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'investors'
LIMIT 20;
```
This query returns 20 rows, as the user-defined `LIMIT` is set to 20.
* Query with defined `LIMIT` and `relevance`
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'investors'
AND relevance >= 0.8
LIMIT 20;
```
This query may return 20 or fewer rows, depending on how many rows have relevance scores that satisfy the user-defined condition.
# How to Hybrid Search Knowledge Bases
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/hybrid_search
Knowledge bases support two primary search methods: [semantic search](/mindsdb_sql/knowledge_bases/query#semantic-search) and [metadata/keyword search](/mindsdb_sql/knowledge_bases/query#metadata-filtering). Each method has its strengths and ideal use cases.
Semantic similarity search uses vector embeddings to retrieve content that is semantically related to a given query. This is especially powerful when users are searching for concepts, ideas, or questions expressed in natural language.
However, semantic search may fall short when users are looking for specific keywords, such as acronyms, internal terminology, or custom identifiers. These types of terms are often not well-represented in the embedding model's training data. As a result, embedding-based semantic search might entirely miss results that do contain the exact keyword.
To address this gap, knowledge bases offer hybrid search, which combines the best of both worlds: semantic similarity and exact keyword matching. Hybrid search ensures that results relevant by meaning and results matching specific terms are both considered and ranked appropriately.
## Enabling Hybrid Search
To use hybrid search, you first need to [create a knowledge base](/mindsdb_sql/knowledge_bases/create) and [insert data into it](/mindsdb_sql/knowledge_bases/insert_data).
Hybrid search can be enabled at the time of querying the knowledge base by specifying the appropriate configuration options, as shown below.
```sql theme={null}
SELECT * FROM my_kb
WHERE
content = 'ACME-213'
AND hybrid_search_alpha = 0.8;
```
The `hybrid_search_alpha` parameter enables hybrid search functionality and allows you to control the balance between semantic and keyword relevance. Its values vary between 0 (more importance on keyword relevance) and 1 (more importance on semantic relevance), with a default value of 0.5.
Alternatively, you can use the `hybrid_search` parameter and set it to `true` in order to enable hybrid search with default `hybrid_search_alpha = 0.5`.
Note that hybrid search works only on knowledge bases that use PGVector as a [storage](/mindsdb_sql/knowledge_bases/create#storage). Make sure to [install the PGVector handler to connect it to MindsDB](/integrations/vector-db-integrations/pgvector#usage).
Knowledge bases provide optional [reranking features](/mindsdb_sql/knowledge_bases/create#reranking-model) that users can decide to use in specific use cases. When the reranker is available, it is used to rerank results from both the full-text index search and the embedding-based semantic search. It estimates the relevance of each document and orders them from most to least relevant.
However, users can disable the reranker using `reranking = false`, which might be desirable for performance reasons or specific use cases. When reranking is disabled, the system still needs to combine the two search result sets. In this case, the final ranking of each document is computed as a weighted average of the embedding similarity score and the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) keyword relevance score from the full-text search.
**Relevance-Based Document Selection for Reranking**
When retrieving documents from the full-text index, there is a practical limit on how many documents can be passed to the reranker, since reranking is typically computationally expensive. To ensure that only the most promising candidates are selected for reranking, we apply relevance heuristics during the keyword search stage.
One widely used heuristic is BM25, a ranking function that scores documents based on their keyword relevance to the user query. BM25 considers both the frequency of a keyword within a document and how common that keyword is across the entire corpus.
By scoring documents using BM25, the system can prioritize more relevant matches and limit reranker input to a smaller, high-quality subset of documents. This helps achieve a balance between performance and retrieval accuracy in hybrid search.
This is the so-called alpha reranking.
## Implementation of Hybrid Search
Hybrid search in knowledge bases combines semantic similarity and keyword-based search methods into a unified search mechanism.
The diagram below illustrates the hybrid search process.
When a user submits a query, it is simultaneously routed through two parallel search mechanisms: an embedding-based semantic search (left) and a full-text keyword search (right).
Below is a breakdown of how hybrid search works under the hood:
* **Semantic Search** (path on the left)
It takes place in parallel with the keyword search. Semantic search starts by embedding the search query and searching against the content of the knowledge base. This results in a set of relevant documents found.
* **Keyword Search** (path on the right)
It takes place in parallel with the semantic search. The system performs a keyword-based search, using one or more keywords provided in the search query, over the content of the knowledge base. To ensure performance, especially at scale, when dealing with millions of documents, we rely on a full-text indexing system.
This index is typically built as an inverted index, mapping keywords to the documents in which they appear. It allows for efficient lookups and rapid retrieval of all entries that contain the given terms.
**Storage of Full-Text Index**
Just as embeddings are stored to support semantic similarity search, a full-text index must also be stored to enable efficient keyword-based retrieval. This index serves as the foundation for fast and scalable full-text search and is tightly integrated with the knowledge base.
Each knowledge base maintains its own dedicated full-text index, built and updated as documents are ingested or modified. Maintaining this index alongside the stored embeddings ensures that both semantic and keyword search capabilities are always available and performant, forming the backbone of hybrid search.
This step ensures that exact matches, like specific acronyms, ticket numbers, or product identifiers, can be found quickly, even if the semantic model wouldn’t have surfaced them.
* **Combining Results**
At this step, results from both searches are merged. Semantic search returned documents similar in meaning to the user’s query using embeddings, while keyword search returned documents containing the keywords extracted from the user’s query. This complete result set is passed to the reranker.
* **Reranking**
The results are reranked, considering relevance scores from both search types, and ordered accordingly.
There are two mechanisms for reranking the results:
* Using the reranking model of the knowledge base
If the knowledge base was created with the reranking model provided, the hybrid search uses it to rerank the result set.
```sql theme={null}
SELECT * FROM my_kb
WHERE
content = 'ACME-213'
AND hybrid_search = true; -- here, hybrid_search_alpha = 0.5
```
In this query, the hybrid search uses the reranking features enabled with the knowledge base.
* Using the alpha reranking that can be further customized for hybrid search
Users can opt for using the alpha reranking that can be customized specifically for hybrid search. By setting the `hybrid_search_alpha` parameter to any value between 0 and 1, users can give importance to results from the keyword search (if the value is closer to 0) or the semantic search (if the value is closer to 1).
```sql theme={null}
SELECT * FROM my_kb
WHERE
content = 'ACME-213'
AND hybrid_search_alpha = 0.4
AND reranking = false;
```
This query uses hybrid search with emphasis on results from the keyword search.
Overall, the reranker ensures that highly relevant keyword matches appear alongside semantically similar results, offering users a balanced and accurate response.
# How to Insert Data into Knowledge Bases
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/insert_data
Knowledge Bases (KBs) organize data across data sources, including databases, files, documents, and webpages, enabling efficient search capabilities.
Here is what happens to data when it is inserted into the knowledge base.
Upon insertion into the knowledge base, data is split into chunks, transformed into embedding representations to enhance search capabilities, and stored in a vector database.
## `INSERT INTO` Syntax
Here is the syntax for inserting data into a knowledge base:
```sql theme={null}
INSERT INTO my_kb
SELECT order_id, product, notes
FROM sample_data.orders;
```
Upon execution, it inserts data into a knowledge base, using the embedding model to embed it into vectors before inserting into an underlying vector database.
The status of the `INSERT INTO` is logged in the `information_schema.queries` table with the timestamp when it was run, and can be queried as follows:
```sql theme={null}
SELECT *
FROM information_schema.queries;
```
To speed up data insertion, you can use these performance optimization flags:
**Skip duplicate checking (kb\_no\_upsert)**
```sql theme={null}
INSERT INTO my_kb
SELECT *
FROM table_name
USING kb_no_upsert = true;
```
This skips all duplicate checking and directly inserts data. Use only when the knowledge base is empty (initial data load).
**Skip existing items (kb\_skip\_existing)**
```sql theme={null}
INSERT INTO my_kb
SELECT *
FROM table_name
USING kb_skip_existing = true;
```
This checks for existing items and skips them entirely, including avoiding embedding calculation for existing content. More efficient than upsert when you only want to insert new items.
**Handling duplicate data while inserting into the knowledge base**
Knowledge bases uniquely identify data rows using an ID column, which prevents inserting duplicate data, as described below.
* **Case 1: Inserting data into the knowledge base without the `id_column` defined.**
When users do not define the `id_column` during the creation of a knowledge base, MindsDB generates the ID for each row using a hash of the content columns, as [explained here](/mindsdb_sql/knowledge_bases/create#id-column).
**Example:**
If two rows have exactly the same content in the content columns, their hash (and thus their generated ID) will be the same.
Note that duplicate rows are skipped and not inserted.
Since both rows in the table below have the same content, only one row will be inserted.
| name | age |
| ----- | --- |
| Alice | 25 |
| Alice | 25 |
* **Case 2: Inserting data into the knowledge base with the `id_column` defined.**
When users define the `id_column` during the creation of a knowledge base, then the knowledge base uses that column's values as the row ID.
**Example:**
If the `id_column` has duplicate values, the knowledge base skips the duplicate row(s) during the insert.
The second row in the table below has the same `id` as the first row, so only one of these rows is inserted.
| id | name | age |
| -- | ----- | --- |
| 1 | Alice | 25 |
| 1 | Bob | 30 |
**Best practice**
Ensure the `id_column` uniquely identifies each row to avoid unintentional data loss due to duplicate ID skipping.
**Performance optimization for duplicate handling**
For better performance when handling duplicates, you can use:
* `kb_skip_existing = true`: Checks for existing IDs and skips them completely (no embedding calculation, more efficient)
* `kb_no_upsert = true`: Skips duplicate checking entirely (fastest, use only for initial load into empty KB)
### Update Existing Data
In order to update existing data in the knowledge base, insert data with the column ID that you want to update and the updated content.
Here is an example of usage. A knowledge base stores the following data.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
A user updated `Laptop Stand` to `Aluminum Laptop Stand`.
```sql theme={null}
+----------+-----------------------+------------------------+
| order_id | product | notes |
+----------+-----------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Aluminum Laptop Stand | Prefer aluminum finish |
+----------+-----------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
Here is how to propagate this change into the knowledge base.
```sql theme={null}
INSERT INTO my_kb
SELECT order_id, product, notes
FROM sample_data.orders
WHERE order_id = 'Q7P';
```
The knowledge base matches the ID value to the existing one and updates the data if required.
### Insert Data using Partitions
In order to optimize the performance of data insertion into the knowledge base, users can set up partitions and threads to insert batches of data in parallel. This also enables tracking the progress of data insertion process including cancelling and resuming it if required.
Here is an example.
```sql theme={null}
INSERT INTO my_kb
SELECT order_id, product, notes
FROM sample_data.orders
USING
batch_size = 200,
track_column = order_id,
threads = 10,
error = 'skip';
```
The parameters include the following:
* `batch_size` defines the number of rows fetched per iteration to optimize data extraction from the source. It defaults to 1000.
* `threads` defines threads for running partitions. Note that if the [ML task queue](/setup/custom-config#overview-of-config-parameters) is enabled, threads are used automatically. The available values for `threads` are:
* a number of threads to be used, for example, `threads = 10`,
* a boolean value that defines whether to enable threads, setting `threads = true`, or disable threads, setting `threads = false`.
* `track_column` defines the column used for sorting data before partitioning.
* `error` defines the error processing options. The available values include `raise`, used to raise errors as they come, or `skip`, used to suppress errors. It defaults to `raise` if not provided.
After executing the `INSERT INTO` statement with the above parameters, users can view the data insertion progress by querying the `information_schema.queries` table.
```sql theme={null}
SELECT * FROM information_schema.queries;
```
Users can cancel the data insertion process using the process ID from the `information_schema.queries` table.
```sql theme={null}
SELECT query_cancel(1);
```
Note that canceling the query will not remove the already inserted data.
Users can resume the data insertion process using the process ID from the `information_schema.queries` table.
```sql theme={null}
SELECT query_resume(1);
```
### Chunking Data
Upon inserting data into the knowledge base, the data is chunked in order to optimize the storage and search of data.
Each chunk is identified by a chunk ID of the following format: `<id>:<chunk_number>of<total_chunks>:<start_char>to<end_char>`, for example, `A1B_notes:1of1:0to20`.
#### Text
Users can opt for defining the chunking parameters when creating a knowledge base.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
preprocessing = {
"text_chunking_config" : {
"chunk_size": 2000,
"chunk_overlap": 200
}
},
...;
```
The `chunk_size` parameter defines the size of a chunk as a number of characters, and the `chunk_overlap` parameter defines the number of characters that should overlap between subsequent chunks.
#### JSON
Users can opt for defining the chunking parameters specifically for JSON data.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
preprocessing = {
"type": "json_chunking",
"json_chunking_config" : {
...
}
},
...;
```
When the `type` of chunking is set to `json_chunking`, users can configure it by setting the following parameter values in the `json_chunking_config` parameter:
* `flatten_nested`\
It is of the `bool` data type with the default value of `True`.\
It defines whether to flatten nested JSON structures.
* `include_metadata`\
It is of the `bool` data type with the default value of `True`.\
It defines whether to include original metadata in chunks.
* `chunk_by_object`\
It is of the `bool` data type with the default value of `True`.\
It defines whether to chunk by top-level objects (`True`) or create a single document (`False`).
* `exclude_fields`\
It is of the `List[str]` data type with the default value of an empty list.\
It defines the list of fields to exclude from chunking.
* `include_fields`\
It is of the `List[str]` data type with the default value of an empty list.\
It defines the list of fields to include in chunking (if empty, all fields except excluded ones are included).
* `metadata_fields`\
It is of the `List[str]` data type with the default value of an empty list.\
It defines the list of fields to extract into metadata for filtering (can include nested fields using dot notation). If empty, all primitive fields will be extracted (top-level fields if available, otherwise all primitive fields in the flattened structure).
* `extract_all_primitives`\
It is of the `bool` data type with the default value of `False`.\
It defines whether to extract all primitive values (strings, numbers, booleans) into metadata.
* `nested_delimiter`\
It is of the `str` data type with the default value of `"."`.\
It defines the delimiter for flattened nested field names.
* `content_column`\
It is of the `str` data type with the default value of `"content"`.\
It defines the name of the content column for chunk ID generation.
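As an illustrative sketch, here is a configuration combining several of these options; the field names (`customer.id`, `order_date`, `internal_notes`) are hypothetical and depend on your JSON structure:
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
    ...
    preprocessing = {
        "type": "json_chunking",
        "json_chunking_config" : {
            "chunk_by_object": true,
            "exclude_fields": ["internal_notes"],
            "metadata_fields": ["customer.id", "order_date"]
        }
    },
    ...;
```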
### Underlying Vector Store
Each knowledge base has its underlying vector store that stores data inserted into the knowledge base in the form of embeddings.
Users can query the underlying vector store as follows.
* KB with the default ChromaDB vector store:
```sql theme={null}
SELECT id, content, metadata, embeddings
FROM <kb_name>_chromadb.default_collection;
```
* KB with user-defined vector store (either [PGVector](/integrations/vector-db-integrations/pgvector) or [ChromaDB](/integrations/vector-db-integrations/chromadb)):
```sql theme={null}
SELECT id, content, metadata, embeddings
FROM <vector_store_name>.<table_name>;
```
### Example
Here, data is inserted into the sample knowledge base created in the previous **Example** section.
```sql theme={null}
INSERT INTO my_kb
SELECT order_id, product, notes
FROM sample_data.orders;
```
When inserting into a knowledge base where the `content_columns` parameter was not specified, the column storing content must be aliased `AS content` as below.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
...
id_column = 'order_id',
...
```
```sql theme={null}
INSERT INTO my_kb
SELECT order_id, notes AS content
FROM sample_data.orders;
```
## `DELETE FROM` Syntax
Here is the syntax for deleting from a knowledge base:
```sql theme={null}
DELETE FROM my_kb
WHERE id = 'A1B';
```
## `CREATE INDEX ON KNOWLEDGE_BASE` Syntax
Users can create an index on the knowledge base to speed up the search operations.
```sql theme={null}
CREATE INDEX ON KNOWLEDGE_BASE my_kb;
```
Note that this feature works only when PGVector is used as the [storage of the knowledge base](/mindsdb_sql/knowledge_bases/create#storage), as ChromaDB provides the index features by default.
Upon executing this statement, an index is created on the knowledge base's underlying vector store. This is essentially a database index created on the vector database.
Note that having an index on the knowledge base improves the performance of querying it, while it may slow down insert operations. Therefore, it is recommended to insert bulk data into the knowledge base before creating an index.
# How Knowledge Bases Work
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/overview
A knowledge base is an advanced AI-table that organizes information based on semantic meaning rather than simple keyword matching. It integrates embedding models, reranking models, and vector stores to enable context-aware data retrieval.
By performing semantic reasoning across multiple data points, a knowledge base delivers deeper insights and more accurate responses, making it a powerful tool for intelligent data access.
Before diving into the syntax, here is a quick walkthrough showing how knowledge bases work in MindsDB.
We start by creating a knowledge base and inserting data. Next we can run semantic search queries with metadata filtering.
Use the `CREATE KNOWLEDGE_BASE` command to create a knowledge base, specifying all its components.
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123"
},
reranking_model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-abc123"
},
metadata_columns = ['product'],
content_columns = ['notes'],
id_column = 'order_id';
```
In this example, we use a simple dataset containing customer notes for product orders which will be inserted into the knowledge base.
```sql theme={null}
+----------+-----------------------+------------------------+
| order_id | product | notes |
+----------+-----------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Aluminum Laptop Stand | Prefer aluminum finish |
+----------+-----------------------+------------------------+
```
Use the `INSERT INTO` command to ingest data into the knowledge base.
```sql theme={null}
INSERT INTO my_kb
SELECT order_id, product, notes
FROM sample_data.orders;
```
Query the knowledge base using semantic search.
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color preference'
```
This query returns:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0} | Bluetooth Speaker | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
Query the knowledge base using semantic search and define the `relevance` parameter to receive only the best matching data for your use case.
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
AND relevance >= 0.2502;
```
This query returns:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
Add metadata filtering to focus your search.
```sql theme={null}
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse'
AND content = 'color'
AND relevance >= 0.2502;
```
This query returns:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.504396172197583 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
```
The following sections explain the syntax and other features of knowledge bases.
# How to Query Knowledge Bases
Source: https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/query
Knowledge Bases support two primary querying approaches: semantic search and metadata filtering, each of which offers different filtering capabilities, including filtering by the relevance score to ensure only data most relevant to the query is returned.
* **Semantic Search**
Semantic search enables users to query Knowledge Bases using natural language. When searching semantically, you reference the content column in your SQL statement. MindsDB will interpret the input as a semantic query and use vector-based similarity to find relevant results.
```sql theme={null}
SELECT * FROM my_kb
WHERE content = 'what document types store reviews?';
```
Only specific operators are allowed when filtering semantically using the content column.
* Standard vector search: `content = 'xxx'`, `content LIKE 'xxx'`
* Exclusions from search: `id != xxx`, `id <> xxx`, `content NOT LIKE 'zzz'`
* Nested queries: `id NOT IN (SELECT DISTINCT id FROM my_kb WHERE content = 'xxx')`
* Multiple queries: `content IN ('xxx', 'yyy')`, which is equivalent to `content = 'xxx' OR content = 'yyy'`, and `content NOT IN ('zzz', 'aaa')`
* Logical operators: `content = 'xxx' OR content = 'yyy'`, which is a union of results for both conditions, and `content = 'xxx' AND content = 'yyy'`, which is an intersection of results for both conditions
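For example, here is a hedged sketch combining several of these operators; the search phrases are placeholders:
```sql theme={null}
SELECT * FROM my_kb
WHERE content IN ('shipping delay', 'late delivery')
AND content NOT LIKE 'refund';
```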
* **Metadata Filtering**
It allows users to query Knowledge Bases based on the available metadata fields. These fields can be used in the `WHERE` clause of a SQL statement.
```sql theme={null}
SELECT * FROM my_kb
WHERE document_type = 'cover letter'
AND document_author = 'bot';
```
You can apply a variety of filtering conditions to metadata columns, such as equality checks, range filters, or pattern matches.
* Equality checks: `=`, `<>`, `!=`
* Range filters: `>`, `<`, `>=`, `<=`, `BETWEEN ... AND ...`
* Pattern matching: `LIKE`, `NOT LIKE`, `IN`, `NOT IN`
* Logical operators: `AND`, `OR`, `NOT`
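For instance, assuming a hypothetical numeric metadata column `document_year`, these conditions can be combined as below:
```sql theme={null}
SELECT * FROM my_kb
WHERE document_type = 'cover letter'
AND document_year BETWEEN 2020 AND 2024;
```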
* **Relevance Filtering**
Every semantic search result is assigned a relevance score, which indicates how closely a given entry matches your query. You can filter results by this score to ensure only the most relevant entries are returned.
Here is how to fine-tune the filtering of data.
* Start by querying the knowledge base without a WHERE clause on the relevance column. This will show you a range of relevance scores returned by your query.
* Determine a cutoff relevance value that fits your use case. For example, `relevance > 0.75`.
* Re-run your query with the condition on `relevance` to restrict results to those above your chosen threshold. The results set contains only data with relevance greater than 0.75.
```sql theme={null}
SELECT * FROM my_kb
WHERE content = 'what document types store reviews?'
AND relevance > 0.75;
```
See more [examples here](/mindsdb_sql/knowledge_bases/query#examples).
## `SELECT FROM KB` Syntax
Knowledge bases provide an abstraction that enables users to see the stored data.
Note that the sample knowledge base created and populated in the previous **Example** sections is searched here.
```sql theme={null}
SELECT *
FROM my_kb;
```
Here is the sample output:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0} | Bluetooth Speaker | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
The following columns are stored in the knowledge base.
* `id`
It stores values from the column defined in the `id_column` parameter when creating the knowledge base. These are the source data IDs.
* `chunk_id`
Knowledge bases chunk the inserted data in order to fit the defined chunk size. If chunking is performed, the chunk ID follows the format `<doc_id>:<chunk_number>of<total_chunks>:<start_char>to<end_char>`. For example, `A1B_notes:1of1:0to20` denotes chunk 1 of 1, spanning characters 0 to 20 of the `A1B_notes` document.
* `chunk_content`
It stores values from the column(s) defined in the `content_columns` parameter when creating the knowledge base.
* `metadata`
It stores the general metadata and the metadata defined in the `metadata_columns` parameter when creating the knowledge base.
* `distance`
It stores the calculated distance between the chunk's content and the search phrase.
* `relevance`
It stores the calculated relevance of the chunk as compared to the search phrase. Its values are between 0 and 1.
Note that the calculation method of `relevance` differs as follows:
* When the reranking model is provided, the default `relevance` is equal to or greater than 0, unless defined otherwise in the `WHERE` clause.
* When the reranking model is not provided and `relevance` is not defined in the query, no relevance filtering is applied and the output includes all rows matched based on the similarity and metadata search.
* When the reranking model is not provided but `relevance` is defined in the query, the relevance is calculated from the `distance` column as `1/(1 + distance)`, and this value is compared against the defined threshold to filter the output. For example, a chunk at `distance` 0.5 gets a relevance of `1/(1 + 0.5) ≈ 0.67`.
### Semantic Search
Users can query a knowledge base using semantic search by providing the search phrase (called `content`) to be searched for.
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
```
Alternatively, users can filter by the `chunk_content` column of the knowledge base.
```sql theme={null}
SELECT *
FROM my_kb
WHERE chunk_content LIKE '%color%'
```
Here is the output:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0} | Bluetooth Speaker | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
When querying a knowledge base, the default values include the following:
* `relevance`
If not provided, its default value is equal to or greater than 0, ensuring there is no filtering of rows based on their relevance.
* `LIMIT`
If not provided, its default value is 10, returning a maximum of 10 rows.
Note that when specifying both `relevance` and `LIMIT` as follows:
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
AND relevance >= 0.5
LIMIT 20;
```
The query extracts 20 rows (as defined in the `LIMIT` clause) that match the defined `content`. This set of rows is then filtered to keep only those matching the defined `relevance`.
Users can set a minimum `relevance` in order to get only the most relevant results.
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
AND relevance >= 0.5;
```
Here is the output:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5103766499957533 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
```
By providing the `relevance` filter, the output is limited to data whose relevance score meets the provided threshold. The available values of `relevance` are between 0 and 1, and its default value covers all available relevance values, ensuring no filtering based on the relevance score.
Users can limit the number of rows returned.
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
LIMIT 2;
```
Here is the output:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
### Metadata Filtering
Besides semantic search features, knowledge bases enable users to filter the result set by the defined metadata.
```sql theme={null}
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse';
```
Here is the output:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
| id | chunk_id | chunk_content | metadata | product | relevance | distance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | [NULL] | [NULL] |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
```
Note that when searching by metadata alone, the `relevance` column values are not calculated.
Users can combine both, filtering by metadata and searching by content.
```sql theme={null}
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse'
AND content = 'color'
AND relevance >= 0.5;
```
Here is the output:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.504396172197583 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
```
## `JOIN` Syntax
Knowledge bases can be used in the standard SQL JOIN statements.
```sql theme={null}
SELECT t.order_id, t.product, t.notes, kb.chunk_content, kb.relevance
FROM local_postgres.orders AS t
JOIN my_kb AS kb
ON t.order_id = kb.id
WHERE t.order_id = 'A1B'
AND kb.content = 'color'
AND kb.product = 'Wireless Mouse';
```
Here is the output:
```sql theme={null}
+----------+------------------+------------------------+------------------------+--------------------+
| order_id | product | notes | chunk_content | relevance |
+----------+------------------+------------------------+------------------------+--------------------+
| A1B | Wireless Mouse | Request color: black | Request color: black | 0.5106591666649376 |
+----------+------------------+------------------------+------------------------+--------------------+
```
## Examples
We have a knowledge base that stores data about movies.
```sql theme={null}
+----------+-----------------------------------+-------------------------------------------------------------------------+
| id | content | metadata |
+----------+-----------------------------------+-------------------------------------------------------------------------+
| movie_id | "A bank security expert plots..." | {"genre":"Crime","rating":6.3,"expanded_genres":"Comedy, Crime, Drama"} |
+----------+-----------------------------------+-------------------------------------------------------------------------+
```
It uses the `movie_id` column to uniquely identify each entry. The `content` column stores the description of the movie, and the metadata includes the `genre`, `rating`, and `expanded_genres` columns.
Let's see the query examples.
* Selecting high-rated action movies with heist themes and no romance.
```sql theme={null}
SELECT * FROM movies_kb
WHERE content LIKE 'heist bank robbery space alien planet'
AND genre != 'Romance'
AND expanded_genres NOT LIKE '%Romance%'
AND rating > 7.0;
```
This query includes a semantic search filtering condition - `content LIKE 'heist bank robbery space alien planet'` - and multiple metadata filtering conditions - `genre != 'Romance' AND expanded_genres NOT LIKE '%Romance%' AND rating > 7.0`.
* Selecting action-comedies with car chase scenes.
```sql theme={null}
SELECT * FROM movies_kb
WHERE content LIKE 'car chase driving speed race'
AND expanded_genres LIKE '%Action%'
AND expanded_genres LIKE '%Comedy%'
AND rating > 6.5;
```
This query includes a semantic search filtering condition - `content LIKE 'car chase driving speed race'` - and multiple metadata filtering conditions - `expanded_genres LIKE '%Action%' AND expanded_genres LIKE '%Comedy%' AND rating > 6.5`.
* Selecting historical dramas without war themes.
```sql theme={null}
SELECT * FROM movies_kb
WHERE content LIKE 'historical period past century era'
AND content NOT LIKE 'war battle soldier military'
AND content NOT LIKE 'fight combat weapon'
AND expanded_genres LIKE '%Drama%'
AND rating > 3.5;
```
This query includes multiple semantic search filtering conditions - `content LIKE 'historical period past century era' AND content NOT LIKE 'war battle soldier military' AND content NOT LIKE 'fight combat weapon'` - and multiple metadata filtering conditions - `expanded_genres LIKE '%Drama%' AND rating > 3.5`.
* Selecting multi-genre movies with different ratings.
```sql theme={null}
SELECT * FROM movies_kb
WHERE (content LIKE 'detective mystery investigation' AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%'))
OR (content LIKE 'romance love relationship' AND (genre = 'Romance' OR expanded_genres LIKE '%Romance%'))
AND rating > 7.0;
```
This query includes nested semantic search filtering conditions - `(content LIKE 'detective mystery investigation' AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%'))` - and a metadata filtering condition - `rating > 7.0`.
* Selecting adventure movies excluding some genres.
```sql theme={null}
SELECT * FROM movies_kb
WHERE content LIKE 'adventure journey quest treasure'
AND genre NOT IN ('Horror', 'Romance', 'Family')
AND rating > 6.5;
```
This query includes a semantic search filtering condition - `content LIKE 'adventure journey quest treasure'` - and multiple metadata filtering conditions - `genre NOT IN ('Horror', 'Romance', 'Family') AND rating > 6.5`.
* Selecting comedy movies in specific rating range.
```sql theme={null}
SELECT * FROM movies_kb
WHERE content LIKE 'comedy funny humor laugh'
AND rating BETWEEN 7.0 AND 9.0
AND expanded_genres LIKE '%Comedy%';
```
This query includes a semantic search filtering condition - `content LIKE 'comedy funny humor laugh'` - and multiple metadata filtering conditions - `rating BETWEEN 7.0 AND 9.0 AND expanded_genres LIKE '%Comedy%'`.
* Selecting different thriller subgenres.
```sql theme={null}
SELECT * FROM movies_kb
WHERE content LIKE 'detective investigation mystery' AND rating > 7.0
UNION
SELECT * FROM movies_kb
WHERE content LIKE 'heist robbery theft steal' AND rating > 7.0
UNION
SELECT * FROM movies_kb
WHERE content LIKE 'spy secret agent undercover' AND rating > 7.0;
```
This query combines the results of three queries using the `UNION` operator.
# SQL API
Source: https://docs.mindsdb.com/mindsdb_sql/overview
MindsDB enhances standard SQL by providing AI building blocks.
This section introduces custom SQL syntax provided by MindsDB to bring data and AI together.
Follow these steps to get started:
1. Use [CREATE DATABASE](/mindsdb_sql/sql/create/database) to connect your data source to MindsDB. Explore all available [data sources here](/integrations/data-overview).
2. Use [CREATE ML\_ENGINE](/mindsdb_sql/sql/create/ml-engine) to configure an engine of your choice. Explore all available [AI engines here](/integrations/ai-overview).
3. Use [CREATE MODEL](/mindsdb_sql/sql/create/model) to create, train, and deploy AI/ML models within MindsDB.
4. Query for a [single prediction](/mindsdb_sql/sql/get-single-prediction) or [batch predictions](/mindsdb_sql/sql/get-batch-predictions) by joining data with models.
5. Use [JOB](/mindsdb_sql/sql/create/jobs), [TRIGGER](/mindsdb_sql/sql/create/trigger), or [AGENT](/mindsdb_sql/agents/agent) to automate workflows.
# Alter a View
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/alter-view
## Description
The `ALTER VIEW` statement updates the query assigned to a view created with the [`CREATE VIEW` command](/mindsdb_sql/sql/create/view).
## Syntax
Here is the syntax:
```sql theme={null}
ALTER VIEW view_name [AS] (
SELECT * FROM integration_name.table_name
);
--or
ALTER VIEW name
FROM integration_name (
SELECT * FROM table_name
);
```
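For illustration, here is a minimal sketch of the first form, assuming a hypothetical view named `rentals_view` built over the `example_db.demo_data.home_rentals` table used elsewhere in these docs:
```sql theme={null}
ALTER VIEW rentals_view AS (
    SELECT location, rental_price
    FROM example_db.demo_data.home_rentals
);
```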
# Delete From a Table
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/delete
## Description
The `DELETE` statement removes rows that fulfill the `WHERE` clause criteria.
## Syntax
Here is the syntax:
```sql theme={null}
DELETE FROM integration_name.table_name
WHERE column_name = column_value_to_be_removed;
```
This statement removes all rows from the `table_name` table (that belongs to the `integration_name` integration) wherever the `column_name` column value is equal to `column_value_to_be_removed`.
And here is another way to filter the rows using a subquery:
```sql theme={null}
DELETE FROM integration_name.table_name
WHERE column_name IN
(
SELECT column_value_to_be_removed
FROM some_integration.some_table
WHERE some_column = some_value
);
```
This statement removes all rows from the `table_name` table (that belongs to the `integration_name` integration) wherever the `column_name` column value is equal to one of the values returned by the subquery.
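For illustration, here is a sketch assuming a hypothetical `orders` table in a connected data source named `my_postgres`:
```sql theme={null}
DELETE FROM my_postgres.orders
WHERE status = 'cancelled';
```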
# Insert Into a Table
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/insert
## Description
The `INSERT INTO` statement inserts data into a table. The data comes from a subselect query. It is commonly used to input prediction results into a database table.
## Syntax
Here is the syntax:
```sql theme={null}
INSERT INTO integration_name.table_name
(SELECT ...);
```
Please note that the destination table (`integration_name.table_name`) must
exist and contain all the columns where the data is to be inserted.
Here are the steps followed by the syntax:
* It executes a subselect query to get the output dataset.
* It uses the `INSERT INTO` statement to insert the output of the
`(SELECT ...)` query into the `integration_name.table_name` table.
On execution, we get:
```sql theme={null}
Query OK, 0 row(s) updated - x.xxxs
```
### Example
We want to save the prediction results into the `int1.tbl1` table.
Here is the schema structure used throughout this example:
```bash theme={null}
int1
└── tbl1
mindsdb
└── predictor_name
int2
└── tbl2
```
Where:
| Name | Description |
| ---------------- | ------------------------------------------------------------------------------------- |
| `int1` | Integration where the table that stores prediction results resides. |
| `tbl1` | Table that stores prediction results. |
| `predictor_name` | Name of the model. |
| `int2` | Integration where the data source table used in the inner `SELECT` statement resides. |
| `tbl2` | Data source table used in the inner `SELECT` statement. |
Let's execute the query.
```sql theme={null}
INSERT INTO int1.tbl1 (
SELECT *
FROM int2.tbl2 AS ta
JOIN mindsdb.predictor_name AS tb
WHERE ta.date > '2015-12-31'
);
```
On execution, we get:
```sql theme={null}
Query OK, 0 row(s) updated - x.xxxs
```
# Join Tables On
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/join-on
## Description
The `JOIN` statement combines rows of two or more tables based `ON` one or more specified columns. It functions as a standard `JOIN` in SQL while offering the added capability of **combining data from multiple data sources**, allowing users to join data from one or more data sources seamlessly.
## Syntax
Here is the syntax:
```sql theme={null}
SELECT t1.column_name, t2.column_name, t3.column_name
FROM datasource1.table1 [AS] t1
JOIN datasource2.table2 [AS] t2
ON t1.column_name = t2.column_name
JOIN datasource3.table3 [AS] t3
ON t1.column_name = t3.column_name;
```
This query joins data from three different datasources - `datasource1`, `datasource2`, and `datasource3` - allowing users to execute federated queries across multiple data sources.
**Nested `JOINs`**
MindsDB provides you with two categories of `JOINs`. One is [the `JOIN` statement which combines the data table with the model table](/mindsdb_sql/sql/api/join) in order to fetch bulk predictions. Another is the regular `JOIN` used throughout SQL, which requires the `ON` clause.
You can nest these types of `JOINs` as follows:
```sql theme={null}
SELECT * FROM (
  SELECT *
  FROM project_name.model_table AS m
  JOIN datasource_name.data_table AS d
) AS t1
JOIN (
  SELECT *
  FROM project_name.model_table AS m
  JOIN datasource_name.data_table AS d
) AS t2
ON t1.column_name = t2.column_name;
```
## Example 1
Let's use the following data to see how the different types of `JOINs` work.
The `pets` table that stores pets:
```sql theme={null}
+------+-------+
|pet_id|name |
+------+-------+
|1 |Moon |
|2 |Ripley |
|3 |Bonkers|
|4 |Star |
|5 |Luna |
|6 |Lake |
+------+-------+
```
And the `owners` table that stores pets' owners:
```sql theme={null}
+--------+-------+------+
|owner_id|name |pet_id|
+--------+-------+------+
|1 |Amy |4 |
|2 |Bob |1 |
|3 |Harry |5 |
|4 |Julia |2 |
|5 |Larry |3 |
|6 |Henry |0 |
+--------+-------+------+
```
### `JOIN` or `INNER JOIN`
The `JOIN` or `INNER JOIN` command joins the rows of the `owners` and `pets` tables wherever there is a match. For example, a pet named Lake does not have an owner, so it'll be left out.
```sql theme={null}
SELECT *
FROM files.owners o
[INNER] JOIN files.pets p
ON o.pet_id = p.pet_id;
```
On execution, we get:
```sql theme={null}
+--------+-------+------+------+-------+
|owner_id|name |pet_id|pet_id|name |
+--------+-------+------+------+-------+
|1 |Amy |4 |4 |Star |
|2 |Bob |1 |1 |Moon |
|3 |Harry |5 |5 |Luna |
|4 |Julia |2 |2 |Ripley |
|5 |Larry |3 |3 |Bonkers|
+--------+-------+------+------+-------+
```
As in standard SQL, you can use the `WHERE` clause to filter the output data.
```sql theme={null}
SELECT *
FROM files.owners o
[INNER] JOIN files.pets p
ON o.pet_id = p.pet_id
WHERE o.name = 'Amy'
OR o.name = 'Bob';
```
On execution, we get:
```sql theme={null}
+--------+-------+------+------+-------+
|owner_id|name |pet_id|pet_id|name |
+--------+-------+------+------+-------+
|1 |Amy |4 |4 |Star |
|2 |Bob |1 |1 |Moon |
+--------+-------+------+------+-------+
```
### `LEFT JOIN`
The `LEFT JOIN` command joins the rows of two tables such that all rows from the left table, even the ones with no match, show up. Here, the left table is the `owners` table.
```sql theme={null}
SELECT *
FROM files.owners o
LEFT JOIN files.pets p
ON o.pet_id = p.pet_id;
```
On execution, we get:
```sql theme={null}
+--------+-------+------+------+-------+
|owner_id|name |pet_id|pet_id|name |
+--------+-------+------+------+-------+
|1 |Amy |4 |4 |Star |
|2 |Bob |1 |1 |Moon |
|3 |Harry |5 |5 |Luna |
|4 |Julia |2 |2 |Ripley |
|5 |Larry |3 |3 |Bonkers|
|6 |Henry |0 |[NULL]|[NULL] |
+--------+-------+------+------+-------+
```
### `RIGHT JOIN`
The `RIGHT JOIN` command joins the rows of two tables such that all rows from the right table, even the ones with no match, show up. Here, the right table is the `pets` table.
```sql theme={null}
SELECT *
FROM files.owners o
RIGHT JOIN files.pets p
ON o.pet_id = p.pet_id;
```
On execution, we get:
```sql theme={null}
+--------+-------+------+------+-------+
|owner_id|name |pet_id|pet_id|name |
+--------+-------+------+------+-------+
|2 |Bob |1 |1 |Moon |
|4 |Julia |2 |2 |Ripley |
|5 |Larry |3 |3 |Bonkers|
|1 |Amy |4 |4 |Star |
|3 |Harry |5 |5 |Luna |
|[NULL] |[NULL] |[NULL]|6 |Lake |
+--------+-------+------+------+-------+
```
### `FULL JOIN` or `FULL OUTER JOIN`
The `FULL [OUTER] JOIN` command joins the rows of two tables such that all rows from both tables, even the ones with no match, show up.
```sql theme={null}
SELECT *
FROM files.owners o
FULL [OUTER] JOIN files.pets p
ON o.pet_id = p.pet_id;
```
On execution, we get:
```sql theme={null}
+--------+------+------+------+-------+
|owner_id|name  |pet_id|pet_id|name   |
+--------+------+------+------+-------+
|1       |Amy   |4     |4     |Star   |
|2       |Bob   |1     |1     |Moon   |
|3       |Harry |5     |5     |Luna   |
|4       |Julia |2     |2     |Ripley |
|5       |Larry |3     |3     |Bonkers|
|6       |Henry |0     |[NULL]|[NULL] |
|[NULL]  |[NULL]|[NULL]|6     |Lake   |
+--------+------+------+------+-------+
```
## Example 2
More than two tables can be joined subsequently. For this example, assume the `pets` table also contains an `animal_id` column that references the `animals` table below:
```sql theme={null}
+---------+-------+
|animal_id|name |
+---------+-------+
|1 |Dog |
|2 |Cat |
|3 |Hamster|
|4 |Fish |
+---------+-------+
```
Now we can join all three tables.
```sql theme={null}
SELECT *
FROM files.owners o
RIGHT JOIN files.pets p ON o.pet_id = p.pet_id
JOIN files.animals a ON p.animal_id = a.animal_id;
```
On execution, we get:
```sql theme={null}
+--------+-------+------+------+-------+---------+---------+-------+
|owner_id|name |pet_id|pet_id|name |animal_id|animal_id|name |
+--------+-------+------+------+-------+---------+---------+-------+
|2 |Bob |1 |1 |Moon |1 |1 |Dog |
|4 |Julia |2 |2 |Ripley |1 |1 |Dog |
|5 |Larry |3 |3 |Bonkers|3 |3 |Hamster|
|1 |Amy |4 |4 |Star |2 |2 |Cat |
|3 |Harry |5 |5 |Luna |2 |2 |Cat |
|[NULL] |[NULL] |[NULL]|6 |Lake |4 |4 |Fish |
+--------+-------+------+------+-------+---------+---------+-------+
```
# Query a Table
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/select
## Description
The `SELECT` statement fetches data from a table and predictions from a model.
Here we go over examples of selecting data from tables of connected data sources. To learn how to select predictions from a model, visit [this page](/sql/api/select-predictions).
## Syntax
## Simple SELECT FROM an integration
In this example, the query contains only tables from one integration. The query will be executed on that integration's database (with the integration name stripped from the table names).
```sql theme={null}
SELECT location, max(sqft)
FROM example_db.demo_data.home_rentals
GROUP BY location
LIMIT 5;
```
## Raw SELECT FROM an integration
It is also possible to send [native queries](/sql/native-queries) to an integration using the syntax native to that integration. This is useful when a query cannot be parsed as SQL.
```sql theme={null}
SELECT ... FROM integration_name ( native query goes here );
```
Here is an example of selecting from a Mongo integration using Mongo-QL syntax:
```sql theme={null}
SELECT * FROM mongo (
db.house_sales2.find().limit(1)
);
```
## Complex queries
1. Subselect on data from an integration.
It can be useful when the integration engine doesn't support some functions, for example, grouping, as shown below. In this case, all data from the raw select is passed to MindsDB, and then the subselect performs operations on it inside MindsDB.
```sql theme={null}
SELECT type, max(bedrooms), last(MA)
FROM mongo (
db.house_sales2.find().limit(300)
) GROUP BY 1
```
2. Unions
It is possible to use the `UNION` and `UNION ALL` operators. In this case, every subselect from the union is fetched and merged into one result set on the MindsDB side.
```sql theme={null}
SELECT data.time as date, data.target
FROM datasource.table_name as data
UNION ALL
SELECT model.time as date, model.target as target
FROM mindsdb.model as model
JOIN datasource.table_name as t
WHERE t.time > LATEST AND t.group = 'value';
```
# Query a File
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/select-files
## Description
The `SELECT * FROM files.file_name` statement is used to select data from a file.
First, you upload a file to the MindsDB Editor by following
[this guide](/sql/create/file/). And then, you can
[`CREATE MODEL`](/sql/create/model) from the uploaded file.
## Syntax
Here is the syntax:
```sql theme={null}
SELECT *
FROM files.file_name;
```
On execution, we get:
```sql theme={null}
+--------+--------+--------+--------+
| column | column | column | column |
+--------+--------+--------+--------+
| value | value | value | value |
+--------+--------+--------+--------+
```
Where:
| Name | Description |
| ----------- | --------------------------------------------------------------------------------------------- |
| `file_name` | Name of the file uploaded to the MindsDB Editor by following [this guide](/sql/create/file/). |
| `column` | Name of the column from the file. |
## Example
Once you have uploaded your file by following [this guide](/sql/create/file/), you can query it like a table.
```sql theme={null}
SELECT *
FROM files.home_rentals
LIMIT 10;
```
On execution, we get:
```sql theme={null}
+-----------------+---------------------+-------+----------+----------------+---------------+--------------+--------------+
| number_of_rooms | number_of_bathrooms | sqft | location | days_on_market | initial_price | neighborhood | rental_price |
+-----------------+---------------------+-------+----------+----------------+---------------+--------------+--------------+
| 0 | 1 | 484,8 | great | 10 | 2271 | south_side | 2271 |
| 1 | 1 | 674 | good | 1 | 2167 | downtown | 2167 |
| 1 | 1 | 554 | poor | 19 | 1883 | westbrae | 1883 |
| 0 | 1 | 529 | great | 3 | 2431 | south_side | 2431 |
| 3 | 2 | 1219 | great | 3 | 5510 | south_side | 5510 |
| 1 | 1 | 398 | great | 11 | 2272 | south_side | 2272 |
| 3 | 2 | 1190 | poor | 58 | 4463 | westbrae | 4123.812 |
| 1 | 1 | 730 | good | 0 | 2224 | downtown | 2224 |
| 0 | 1 | 298 | great | 9 | 2104 | south_side | 2104 |
| 2 | 1 | 878 | great | 8 | 3861 | south_side | 3861 |
+-----------------+---------------------+-------+----------+----------------+---------------+--------------+--------------+
```
Now let's create a predictor using the uploaded file. You can learn more about
the [`CREATE MODEL` command here](/sql/create/model).
```sql theme={null}
CREATE MODEL mindsdb.home_rentals_model
FROM files
(SELECT * from home_rentals)
PREDICT rental_price;
```
On execution, we get:
```sql theme={null}
Query OK, 0 rows affected (x.xxx sec)
```
# Query a View
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/select-view
## Description
The `SELECT` statement fetches data from a view that resides inside a project.
## Syntax
Here is the syntax:
```sql theme={null}
SELECT *
FROM project_name.view_name;
```
# Update a Table
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/update
## Description
MindsDB provides two ways of using the `UPDATE` statement:
1. The regular `UPDATE` statement updates specific column values in an existing table.
2. The `UPDATE FROM SELECT` statement updates data in an existing table from a subselect query. It can be used as an alternative to `CREATE TABLE` or `INSERT INTO` to store predictions.
## Syntax
Here is an example of the regular `UPDATE` statement:
```sql theme={null}
UPDATE integration_name.table_name
SET column_name = new_value
WHERE column_name = old_value
```
Please replace the placeholders as follows:
* `integration_name` is the name of the connected data source.
* `table_name` is the table name within that data source.
* `column_name` is the column name within that table.
And here is an example of the `UPDATE FROM SELECT` statement that updates a table with predictions made within MindsDB:
```sql theme={null}
UPDATE
integration_to_be_updated.table_to_be_updated
SET
column_to_be_updated = prediction_data.predicted_value_column
FROM
(
SELECT p.predicted_value_column, p.column1, p.column2
FROM integration_name.table_name as t
JOIN model_name as p
) AS prediction_data
WHERE
column1 = prediction_data.column1
AND column2 = prediction_data.column2
```
Below is an alternative for the `UPDATE FROM SELECT` statement that updates a table with predictions. This syntax is easier to write.
```sql theme={null}
UPDATE
integration_to_be_updated.table_to_be_updated
ON
column1, column2
FROM
(
SELECT p.predicted_value_column as column_to_be_updated, p.column1, p.column2
FROM integration_name.table_name as t
JOIN model_name as p
)
```
Here are the steps followed by the syntax:
* It executes the query from the `FROM` clause to get the output data. In our example, we query for predictions, but it could be a simple select from another table. Please note that it is aliased as `prediction_data`.
* It updates all rows from the `table_to_be_updated` table (that belongs to the `integration_to_be_updated` integration) that match the `WHERE` clause criteria. The rows are updated with values as defined in the `SET` clause.
It is recommended to use the primary key column(s) in the `WHERE` clause (here, `column1` and `column2`), as the primary key column(s) uniquely identify each row. Otherwise, the `UPDATE` statement may lead to unexpected results by altering rows that you didn't want to affect.
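For illustration, here is a sketch of the first form, assuming a hypothetical `my_postgres.rentals` table and a `rentals_model` model that predicts `rental_price`:
```sql theme={null}
UPDATE
    my_postgres.rentals
SET
    rental_price = prediction_data.rental_price
FROM
    (
        SELECT p.rental_price, p.rental_id
        FROM my_postgres.rentals AS t
        JOIN mindsdb.rentals_model AS p
    ) AS prediction_data
WHERE
    rental_id = prediction_data.rental_id;
```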
# Use a Data Source
Source: https://docs.mindsdb.com/mindsdb_sql/sql/api/use
## Description
The `USE integration_name` statement provides an option to use the connected
data sources and `SELECT` from the database tables. Even if you are
connecting to MindsDB as a MySQL database, you will still be able to `SELECT` from your database.
## Syntax
To connect to your database `USE` the created datasource:
```sql theme={null}
USE integration_name;
```
Then, simply `SELECT` from the tables:
```sql theme={null}
SELECT * FROM table_name;
```
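For example, assuming a connected data source named `mysql_datasource` that contains an `orders` table:
```sql theme={null}
USE mysql_datasource;

SELECT * FROM orders LIMIT 10;
```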
# Connect a Data Source
Source: https://docs.mindsdb.com/mindsdb_sql/sql/create/database
## Description
MindsDB lets you connect to your favorite databases, data warehouses, data lakes, etc., via the `CREATE DATABASE` command.
The MindsDB SQL API supports creating connections to integrations by passing the
connection parameters specific to each integration. You can find more in the
[Supported Integrations](#supported-integrations) chapter.
MindsDB doesn't store or copy your data. Instead, it fetches data directly from your connected sources each time you make a query, ensuring that any changes to the data are instantly reflected. This means your data remains in its original location, and MindsDB always works with the most up-to-date information.
## Syntax
Let's review the syntax for the `CREATE DATABASE` command.
```sql theme={null}
CREATE DATABASE [IF NOT EXISTS] datasource_name
[WITH] [ENGINE [=] engine_name] [,]
[PARAMETERS [=] {
"key": "value",
...
}];
```
On execution, we get:
```sql theme={null}
Query OK, 0 rows affected (x.xxx sec)
```
Where:
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------- |
| `datasource_name` | Identifier for the data source to be created. |
| `engine_name` | Engine to be selected depending on the database connection. |
| `PARAMETERS` | `{"key": "value"}` object with the connection parameters specific for each engine. |
**SQL Commands Resulting in the Same Output** Please note that the
keywords/statements enclosed within square brackets are optional. Also, by
default, the engine is `mindsdb` if not provided otherwise. As a result, the
following SQL commands produce the same output.
```sql theme={null}
CREATE DATABASE db;
CREATE DATABASE db ENGINE 'mindsdb';
CREATE DATABASE db ENGINE = 'mindsdb';
CREATE DATABASE db WITH ENGINE 'mindsdb';
CREATE DATABASE db USING ENGINE = 'mindsdb';
```
### What's available on your installation
Here is how you can query for all the available data handlers used to create database connections.
```sql theme={null}
SELECT *
FROM information_schema.handlers
WHERE type = 'data';
```
Or, alternatively:
```sql theme={null}
SHOW HANDLERS
WHERE type = 'data';
```
And here is how you can query for all the connected databases:
```sql theme={null}
SELECT *
FROM information_schema.databases;
```
Or, alternatively:
```sql theme={null}
SHOW DATABASES;
SHOW FULL DATABASES;
```
## Example
### Connecting a Data Source
Here is an example of how to connect to a MySQL database.
```sql theme={null}
CREATE DATABASE mysql_datasource
WITH ENGINE = 'mysql',
PARAMETERS = {
"user": "root",
"port": 3307,
"password": "password",
"host": "127.0.0.1",
"database": "my_database"
};
```
On execution, we get:
```sql theme={null}
Query OK, 0 rows affected (8.878 sec)
```
### Listing Linked Databases
You can list all the linked databases using the command below.
```sql theme={null}
SHOW DATABASES;
```
On execution, we get:
```sql theme={null}
+--------------------+
| Database |
+--------------------+
| information_schema |
| mindsdb |
| files |
| mysql_datasource |
+--------------------+
```
## Making your Local Database Available to MindsDB
When connecting your local database to MindsDB Cloud, you should expose the
local database server to be publicly accessible. It is easy to accomplish using
[Ngrok Tunnel](https://ngrok.com). The free tier offers all you need to get
started.
The installation instructions are easy to follow. Head over to the
[downloads page](https://ngrok.com/download) and choose your operating system.
Follow the instructions for installation.
Then [create a free account at Ngrok](https://dashboard.ngrok.com/signup) to get
an auth token that you can use to configure your Ngrok instance.
Once installed and configured, run the following command to obtain the host and
port for your localhost at `port-number`.
```bash theme={null}
ngrok tcp port-number
```
Here is an example. Assuming that you run a PostgreSQL database at
`localhost:5432`, use the following command:
```bash theme={null}
ngrok tcp 5432
```
On execution, we get:
```bash theme={null}
Session Status online
Account myaccount (Plan: Free)
Version 2.3.40
Region United States (us)
Web Interface http://127.0.0.1:4040
Forwarding tcp://4.tcp.ngrok.io:15093 -> localhost:5432
```
Now you can access your local database at `4.tcp.ngrok.io:15093` instead of
`localhost:5432`.
So to connect your local database to the MindsDB GUI, use the `Forwarding`
information. The host is `4.tcp.ngrok.io`, and the port is `15093`.
Proceed to create a database connection in the MindsDB GUI by executing the
`CREATE DATABASE` statement with the host and port number obtained from
Ngrok.
```sql theme={null}
CREATE DATABASE psql_datasource
WITH ENGINE = 'postgres',
PARAMETERS = {
"user": "postgres",
"port": 15093,
"password": "password",
"host": "4.tcp.ngrok.io",
"database": "postgres"
};
```
Please note that the Ngrok tunnel loses connection when stopped or canceled. To
reconnect your local database to MindsDB, you should create an Ngrok tunnel
again. In the free tier, Ngrok changes the host and port values each time you
launch the program, so you need to reconnect your database in the MindsDB Cloud
by passing the new host and port values obtained from Ngrok.
Before resetting the database connection, drop the previously connected data
source using the `DROP DATABASE` statement.
```sql theme={null}
DROP DATABASE psql_datasource;
```
After dropping the data source and reconnecting your local database, you can use
the predictors that you trained using the previously connected data source.
However, if you have to `RETRAIN` your predictors, please ensure the database
connection has the same name you used when creating the predictor to avoid
failing to retrain.
## Supported Integrations
The list of databases supported by MindsDB keeps growing. Check out all our [database integrations here](/data-integrations/all-data-integrations).
# Upload a File
Source: https://docs.mindsdb.com/mindsdb_sql/sql/create/file
Follow the steps below to upload a file to MindsDB.
Note that the trailing whitespaces on column names are erased upon uploading a file to MindsDB.
1. Access the MindsDB Editor.
2. Open the `Add` menu and choose `Upload file`.
3. Select a file, provide its name, and click on `Save & Continue`.
4. Now you can query the file.
```sql theme={null}
SELECT * FROM files.file_name;
```
Here is how to list all files:
```sql theme={null}
SHOW TABLES FROM files;
```
This command is the same as the command for listing tables because files uploaded to MindsDB become tables within the MindsDB ecosystem and are stored in the `files` database.
### Configuring URL File Upload for Specific Domains
The File Uploader can be configured to interact only with specific domains by using the [`url_file_upload` key in `config.json` file](/setup/custom-config#url-file-upload).
This feature allows you to restrict the handler to upload and process files only from the domains you specify, enhancing security and control over web interactions.
To configure this, simply list the allowed domains under the [`url_file_upload` key in `config.json` file](/setup/custom-config#url-file-upload).
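As a minimal sketch, such a configuration might look like the snippet below; the exact key names under `url_file_upload` are described in the linked configuration docs, and `allowed_origins` here is an assumption for illustration:
```bash theme={null}
"url_file_upload": {
    "enabled": true,
    "allowed_origins": ["https://example.com", "https://data.mycompany.com"]
}
```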
## What's Next?
Now, you are ready to create a predictor from a file. Make sure to check out
[this guide](/sql/create/model/)
on how to do that.
# JOBS
Source: https://docs.mindsdb.com/mindsdb_sql/sql/create/jobs
MindsDB enables you to automate any pipeline using JOBS, which grant you the power to schedule any query at any frequency. Additionally, it introduces the keyword [LAST](#last), offering the capability for a JOB to act solely on new data, essentially treating any data source as a stream.
## Description
The `CREATE JOB` statement lets you schedule the execution of queries by providing relevant parameters, such as start date, end date, or repetition frequency.
## Syntax
### `CREATE JOB`
Here is the syntax:
```sql theme={null}
CREATE JOB [IF NOT EXISTS] [project_name.]job_name [AS] (
   <statement>[; <statement>][; ...]
)
[START <date>]
[END <date>]
[EVERY [number] <period>]
[IF (<statement>[; <statement>][; ...])];
```
Where:
| Expression | Description |
| ------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `[project_name.]job_name` | Name of the job preceded by an optional project name where the job is to be created. If you do not provide the `project_name` value, then the job is created in the default `mindsdb` project. |
| `<statement>[; <statement>][; ...]` | One or more statements separated by `;` to be executed by the job. |
| `[START <date>]` | Optional. The date when the job starts its periodical or one-time execution. If not set, it is the current system date. |
| `[END <date>]` | Optional. The date when the job ends its periodical or one-time execution. If it is not set (and the repetition rules are set), then the job repeats forever. |
| `[EVERY [number] <period>]` | Optional. The repetition rules for the job. If not set, the job runs once, not considering the end date value. If the `number` value is not set, it defaults to 1. |
| `[IF (<statement>[; <statement>][; ...])]` | Optional. The job executes only if the last statement returns one or more rows. |
**Available `<date>` formats**
Here are the supported `<date>` formats:
* `'%Y-%m-%d %H:%M:%S'`
* `'%Y-%m-%d'`
Please note that the default time zone is UTC.
**Available `<period>` values**
And the supported `<period>` values:
* `minute` / `minutes` / `min`
* `hour` / `hours`
* `day` / `days`
* `week` / `weeks`
* `month` / `months`
Further, you can query all jobs and their execution history like this:
```sql theme={null}
SHOW JOBS;
SELECT * FROM [project_name.]jobs WHERE name = 'job_name';
SELECT * FROM log.jobs_history WHERE project = 'mindsdb' AND name = 'job_name';
```
### `LAST`
MindsDB provides a custom `LAST` keyword that enables you to fetch data inserted after the last time you queried for it. It is a convenient way to select only the newly added data rows when running a job.
Imagine you have the `fruit_data` table that contains the following:
```sql theme={null}
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | apple |
| 2 | orange |
+-------+-----------+
```
When you run the `SELECT` query with the `LAST` keyword for the first time, it'll give an empty output.
```sql theme={null}
SELECT id, name
FROM fruit_data
WHERE id > LAST;
```
This query returns:
```sql theme={null}
+-------+-----------+
| id | name |
+-------+-----------+
| null | null |
+-------+-----------+
```
If you want to specify a concrete value for `LAST` in the first execution of such a query, use the `COALESCE(LAST, <value>)` function.
```sql theme={null}
SELECT id, name
FROM fruit_data
WHERE id > COALESCE(LAST, 1);
```
It will result in executing the following query in the first run:
```sql theme={null}
SELECT id, name
FROM fruit_data
WHERE id > 1;
```
And the below query at each subsequent run:
```sql theme={null}
SELECT id, name
FROM fruit_data
WHERE id > LAST;
```
Now imagine you inserted a new record into the `fruit_data` table:
```sql theme={null}
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | apple |
| 2 | orange |
| 3 | pear |
+-------+-----------+
```
When you run the `SELECT` query with the `LAST` keyword again, you'll get only the newly added record as output.
```sql theme={null}
SELECT id, name
FROM fruit_data
WHERE id > LAST;
```
This query returns:
```sql theme={null}
+-------+-----------+
| id | name |
+-------+-----------+
| 3 | pear |
+-------+-----------+
```
From this point on, whenever you add new records into the `fruit_data` table, they'll be returned by the next run of the `SELECT` query with the `LAST` keyword. And if you do not add any new records between the query runs, the output will be null.
If you want to clear context for the `LAST` keyword in the editor, then run `set context = 0` or `set context = null`.
### Conditional Jobs
Here is how you can create a conditional job that will execute periodically only if there is new data available:
```sql theme={null}
CREATE JOB conditional_job (
FINETUNE MODEL model_name
FROM (
SELECT *
FROM datasource.table_name
WHERE incremental_column > LAST
)
)
EVERY 1 min
IF (
SELECT *
FROM datasource.table_name
WHERE incremental_column > LAST
);
```
The above job will be triggered every minute, but it will execute its task (here, fine-tuning the model) only if there is new data available.
## Examples
### Example 1: Retrain a Model
In this example, we create a job in the current project to retrain the `home_rentals_model` model and insert predictions into the `rentals` table.
```sql theme={null}
CREATE JOB retrain_model_and_save_predictions (
RETRAIN mindsdb.home_rentals_model
USING
join_learn_process = true;
INSERT INTO my_integration.rentals (
SELECT m.rental_price, m.rental_price_explain
FROM mindsdb.home_rentals_model AS m
JOIN example_db.demo_data.home_rentals AS d
)
)
END '2023-04-01 00:00:00'
EVERY 2 days;
```
Please note that the `join_learn_process` parameter in the `USING` clause of the [`RETRAIN`](/sql/api/retrain) statement ensures that the retraining process completes before inserting predictions into a table. In general, this parameter is used to prevent several retrain processes from running simultaneously.
The `retrain_model_and_save_predictions` job starts its execution on the current system date and ends on the 1st of April 2023. The job is executed every 2 days.
### Example 2: Save Predictions
In this example, the job creates a table named `result_{{START_DATETIME}}` and inserts predictions into it.
```sql theme={null}
CREATE JOB save_predictions (
CREATE TABLE my_integration.`result_{{START_DATETIME}}` (
SELECT m.rental_price, m.rental_price_explain
FROM mindsdb.home_rentals_model AS m
JOIN example_db.demo_data.home_rentals AS d
)
)
EVERY hour;
```
Please note that the uniqueness of the created table name is ensured here by using the `{{START_DATETIME}}` variable that is replaced at runtime by the date and time of the current run.
You can use the following variables for this purpose:
* `PREVIOUS_START_DATETIME` is replaced by date and time of the previous run of this job.
* `START_DATETIME` is replaced by date and time of the current job run.
* `START_DATE` is replaced by date of the current job run.
The `save_predictions` job starts its execution on the current system date and repeats every hour until it is manually disabled.
### Example 3: Drop a Model
In this example, we create a job to drop the `home_rentals_model` model scheduled on the 1st of April 2023.
```sql theme={null}
CREATE JOB drop_model (
DROP MODEL mindsdb.home_rentals_model
)
START '2023-04-01';
```
This job runs once on the 1st of April 2023.
### Example 4: Twitter Chatbot
You can easily create a chatbot to respond to tweets using jobs. But before you get to it, you should connect your Twitter account to MindsDB following the instructions [here](/integrations/app-integrations/twitter).
Follow the [tutorial on how to create a Twitter chatbot](/sql/tutorials/twitter-chatbot) to learn the details.
Let's create a job that runs every hour, checks for new tweets, and responds using the OpenAI model.
```sql theme={null}
CREATE JOB mindsdb.gpt4_twitter_job AS (
-- insert into tweets the output of joining model and new tweets
INSERT INTO my_twitter_v2.tweets (in_reply_to_tweet_id, text)
SELECT
t.id AS in_reply_to_tweet_id,
r.response AS text
FROM my_twitter.tweets t
JOIN mindsdb.snoopstein_model r
WHERE
t.query = '(@snoopstein OR @snoop_stein OR #snoopstein OR #snoop_stein) -is:retweet -from:snoop_stein'
AND t.created_at > LAST
LIMIT 10
)
EVERY hour;
```
The [`SELECT`](/sql/api/select) statement joins the data table with the model table to get responses for newly posted tweets, thanks to the `LAST` keyword. Then, the [`INSERT INTO`](/sql/api/insert) statement writes these responses to the `tweets` table of the `my_twitter_v2` integration.
To learn more about OpenAI integration with MindsDB, visit our docs [here](/nlp/nlp-mindsdb-openai).
## Additional Configuration
Here is the template of the `config.json` file that you can pass as a parameter when starting your local MindsDB instance:
```bash theme={null}
"jobs": {
"disable": true,
"check_interval": 30
}
```
The `disable` parameter defines whether the scheduler is disabled (`true`) or active (`false`). By default, in the MindsDB Editor, the scheduler is active.
The `check_interval` parameter defines the interval in seconds between consecutive checks of the scheduler table. By default, in the MindsDB Editor, it is 30 seconds.
You can modify the default configuration in your local MindsDB installation by creating a `config.json` file and starting MindsDB with this file as a parameter. You can find detailed instructions [here](/setup/custom-config#starting-mindsdb-with-extended-configuration).
# Create a Project
Source: https://docs.mindsdb.com/mindsdb_sql/sql/create/project
## Description
MindsDB introduces projects that are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects [here](/sql/project).
## Syntax
Here is the syntax for creating a project:
```sql theme={null}
CREATE PROJECT [IF NOT EXISTS] project_name;
```
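For example, after creating a hypothetical `sales_forecasts` project, you can address artifacts created inside it by prefixing their names with the project name:
```sql theme={null}
CREATE PROJECT sales_forecasts;

CREATE VIEW sales_forecasts.recent_rentals AS (
    SELECT * FROM example_db.demo_data.home_rentals
);
```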
# Create a Table
Source: https://docs.mindsdb.com/mindsdb_sql/sql/create/table
## Description
The `CREATE TABLE` statement creates a table and optionally fills it with data from a provided query. It may be used to materialize prediction results as tables.
## Syntax
You can use the `CREATE TABLE` statement to create an empty table:
```sql theme={null}
CREATE TABLE integration_name.table_name (
column_name data_type,
...
);
```
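For illustration, here is a sketch that creates an empty table in a hypothetical `my_postgres` data source:
```sql theme={null}
CREATE TABLE my_postgres.predictions_log (
    rental_id INT,
    predicted_price FLOAT
);
```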
You can use the `CREATE TABLE` statement to create a table and fill it with data:
```sql theme={null}
CREATE TABLE integration_name.table_name
(SELECT ...);
```
Or the `CREATE OR REPLACE TABLE` statement:
```sql theme={null}
CREATE OR REPLACE TABLE integration_name.table_name
(SELECT ...);
```
Here is how to list tables from a connected data source:
```sql theme={null}
SHOW TABLES FROM data_source_name;
```
Note that the `integration_name` connection must be created with the [`CREATE DATABASE`](/mindsdb_sql/sql/create/database) statement, and the user must have write access.
Here are the steps followed by the syntax:
* It executes a subselect query to get the output data.
* In the case of the `CREATE OR REPLACE TABLE` statement, the
`integration_name.table_name` table is dropped before recreating it.
* It (re)creates the `integration_name.table_name` table inside the
`integration_name` integration.
* It uses the [`INSERT INTO`](/sql/api/insert/) statement to insert the
output of the `(SELECT ...)` query into the
`integration_name.table_name`.
## Example
We want to save the prediction results into the `int1.tbl1` table.
Here is the schema structure used throughout this example:
```bash theme={null}
int1
└── tbl1
mindsdb
└── predictor_name
int2
└── tbl2
```
Where:
| Name | Description |
| ---------------- | ------------------------------------------------------------------------------------- |
| `int1` | Integration where the table that stores prediction results resides. |
| `tbl1` | Table that stores prediction results. |
| `predictor_name` | Name of the model. |
| `int2` | Integration where the data source table used in the inner `SELECT` statement resides. |
| `tbl2` | Data source table used in the inner `SELECT` statement. |
Let's execute the query.
```sql theme={null}
CREATE OR REPLACE TABLE int1.tbl1 (
SELECT *
FROM int2.tbl2 AS ta
JOIN mindsdb.predictor_name AS tb
WHERE ta.date > '2015-12-31'
);
```
# Create a Trigger
Source: https://docs.mindsdb.com/mindsdb_sql/sql/create/trigger
## Description
Triggers enable users to define event-based actions. For example, if a table is updated, then run a query to update predictions.
Currently, you can create triggers on the following data sources:
* [MongoDB](/integrations/data-integrations/mongodb) (available for MongoDB Atlas Database),
* [Slack](/integrations/app-integrations/slack),
* [Solace](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/solace_handler),
* [PostgreSQL](/integrations/data-integrations/postgresql) (requires write access).
## Syntax
Here is the syntax for creating a trigger:
```sql theme={null}
CREATE TRIGGER trigger_name
ON integration_name.table_name
[COLUMNS column_name1, column_name2, ...]
(
sql_code
)
```
By creating a trigger on a data source, every time this data source is updated or new data is inserted, the `sql_code` provided in the statement will be executed.
You can create a trigger either on a table...
```sql theme={null}
CREATE TRIGGER trigger_name
ON integration_name.table_name
(
sql_code
)
```
...or on one or more columns of a table.
```sql theme={null}
CREATE TRIGGER trigger_name
ON integration_name.table_name
COLUMNS column_name1, column_name2
(
sql_code
)
```
Here is how to list all triggers:
```sql theme={null}
SHOW TRIGGERS;
```
## Example
Firstly, connect Slack to MindsDB following [this instruction](/integrations/app-integrations/slack#set-up-a-slack-app-and-generate-tokens) and connect the Slack app to a channel.
```sql theme={null}
CREATE DATABASE mindsdb_slack
WITH
ENGINE = 'slack',
PARAMETERS = {
"token": "xoxb-...",
"app_token": "xapp-..."
};
```
Create a model that will be used to answer chat questions every time new messages arrive. Here we use the [OpenAI engine](/integrations/ai-engines/openai), but you can use any [other LLM](/integrations/ai-overview#large-language-models).
```sql theme={null}
CREATE MODEL chatbot_model
PREDICT answer
USING
engine = 'openai_engine',
prompt_template = 'answer the question: {{text}}';
```
Here is how to generate answers to Slack messages using the model:
```sql theme={null}
SELECT s.text AS question, m.answer
FROM chatbot_model m
JOIN mindsdb_slack.messages s
WHERE s.channel_id = 'slack-bot-channel-id'
AND s.user != 'U07J30KPAUF'
AND s.created_at > LAST;
```
Let's analyze this query:
* We select the question from the Slack connection and the answer generated by the model.
* We join the model with the `messages` table.
* In the `WHERE` clause:
* We provide the channel name where the app/bot is integrated.
* We exclude the messages sent by the app/bot. You can find the user ID of the app/bot by querying the `mindsdb_slack.users` table.
* We use the `LAST` keyword to ensure that the model generates answers only to the newly sent messages.
Finally, create a trigger that will insert an answer generated by the model every time new messages are sent to the channel.
```sql theme={null}
CREATE TRIGGER slack_trigger
ON mindsdb_slack.messages
(
INSERT INTO mindsdb_slack.messages (channel_id, text)
SELECT 'slack-bot-channel-id' AS channel_id, answer AS text
FROM chatbot_model m
JOIN TABLE_DELTA s
WHERE s.user != 'slack-bot-id' -- prevent the bot from replying to its own messages
AND s.channel_id = 'slack-bot-channel-id'
);
```
Let's analyze this statement:
* We create a trigger named `slack_trigger`.
* The trigger is created on the `mindsdb_slack.messages` table. Therefore, every time data is added or updated, the trigger executes its code.
* We provide the code to be executed by the trigger every time the triggering event takes place.
* We insert an answer generated by the model into the `messages` table.
* The `TABLE_DELTA` stands for the table on which the trigger has been created.
* We exclude the messages sent by the app/bot. You can find the user ID of the app/bot by querying the `mindsdb_slack.users` table.
# Create a View
Source: https://docs.mindsdb.com/mindsdb_sql/sql/create/view
## Description
The `CREATE VIEW` statement creates a view, which is a great way to do data preparation in MindsDB. A view is a saved `SELECT` statement that is executed every time the view is queried.
## Syntax
Here is the syntax:
```sql theme={null}
CREATE VIEW [IF NOT EXISTS] project_name.view_name AS (
SELECT columns
FROM integration_name.table_name AS a
JOIN integration_name.table_name AS p ON a.id = p.id
JOIN ...
);
```
Here is how to list all views:
```sql theme={null}
SHOW VIEWS;
```
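For example, here is a minimal sketch of a view over the demo rentals table used in the quickstart; it assumes the `mysql_demo_db` connection exists:
```sql theme={null}
CREATE VIEW mindsdb.affordable_rentals AS (
    SELECT location, neighborhood, rental_price
    FROM mysql_demo_db.home_rentals
    WHERE rental_price < 2000
);
```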
# Remove a Data Source
Source: https://docs.mindsdb.com/mindsdb_sql/sql/drop/database
## Description
The `DROP DATABASE` statement deletes the database.
## Syntax
Here is the syntax:
```sql theme={null}
DROP DATABASE [IF EXISTS] database_name;
```
On execution, we get:
```sql theme={null}
Query successfully completed
```
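For example, to remove a connected data source named `example_db` (a hypothetical connection name), use `IF EXISTS` to avoid an error if it was already removed:
```sql theme={null}
DROP DATABASE IF EXISTS example_db;
```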
# Remove a File
Source: https://docs.mindsdb.com/mindsdb_sql/sql/drop/file
## Description
The `DROP TABLE` statement is also used to delete a file.
## Syntax
Here is the syntax:
```sql theme={null}
DROP TABLE files.file_name;
```
On execution, we get:
```sql theme={null}
Query successfully completed
```
Please note that the uploaded files are tables as well. So to remove an uploaded file, use this `DROP TABLE` statement.
# Remove a Job
Source: https://docs.mindsdb.com/mindsdb_sql/sql/drop/jobs
## Description
The `DROP JOB` statement deletes the job.
## Syntax
Here is the syntax for deleting a job:
```sql theme={null}
DROP JOB [IF EXISTS] [project_name.]job_name;
```
The `project_name` value is optional. The `job_name` value indicates the job to be deleted.
Let's look at some examples:
```sql theme={null}
DROP JOB my_project.retrain_and_save_job;
```
Here we drop the `retrain_and_save_job` that resides in the `my_project` project.
And another example:
```sql theme={null}
DROP JOB create_table_job;
```
Here we drop the `create_table_job` job that resides in the current project.
To learn more about projects in MindsDB, visit our docs [here](/sql/project).
# Remove a Project
Source: https://docs.mindsdb.com/mindsdb_sql/sql/drop/project
## Description
The `DROP PROJECT` statement deletes the project.
## Syntax
Here is the syntax:
```sql theme={null}
DROP PROJECT [IF EXISTS] project_name;
```
On execution, we get:
```sql theme={null}
Query successfully completed
```
# Remove a Table
Source: https://docs.mindsdb.com/mindsdb_sql/sql/drop/table
## Description
The `DROP TABLE` statement deletes a table or a file.
Please note that this feature is not yet implemented for tables from connected data sources.
## Syntax
Here is the syntax:
```sql theme={null}
DROP TABLE table_name;
```
And for files:
```sql theme={null}
DROP TABLE files.file_name;
```
On execution, we get:
```sql theme={null}
Query successfully completed
```
Please note that the uploaded files are tables as well. So to remove an uploaded file, use this `DROP TABLE` statement.
# Remove a Trigger
Source: https://docs.mindsdb.com/mindsdb_sql/sql/drop/trigger
## Description
Triggers enable users to define event-based actions. For example, if a table is updated, then run a query to update predictions.
Currently, you can create triggers on the following data sources: [MongoDB](https://docs.mindsdb.com/integrations/data-integrations/mongodb), [Slack](https://docs.mindsdb.com/integrations/app-integrations/slack), [Solace](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/solace_handler).
## Syntax
Here is the syntax for removing a trigger:
```sql theme={null}
DROP TRIGGER trigger_name;
```
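For example, to remove the `slack_trigger` created in the trigger example:
```sql theme={null}
DROP TRIGGER slack_trigger;
```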
# Remove a View
Source: https://docs.mindsdb.com/mindsdb_sql/sql/drop/view
## Description
The `DROP VIEW` statement deletes the view.
## Syntax
Here is the syntax:
```sql theme={null}
DROP VIEW [IF EXISTS] view_name;
```
On execution, we get:
```sql theme={null}
Query successfully completed
```
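For example, assuming a view named `cars` exists (like the one created in the native queries examples):
```sql theme={null}
DROP VIEW IF EXISTS cars;
```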
# List Data Handlers
Source: https://docs.mindsdb.com/mindsdb_sql/sql/list-data-handlers
## Description
The `SHOW HANDLERS` command lists all available handlers. The `WHERE` clause filters handlers by type (data or ML).
## Syntax
Here is the syntax:
```sql theme={null}
SHOW HANDLERS
WHERE type = 'data';
```
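Similarly, to list ML handlers:
```sql theme={null}
SHOW HANDLERS
WHERE type = 'ml';
```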
# List Projects
Source: https://docs.mindsdb.com/mindsdb_sql/sql/list-projects
## Description
The `SHOW DATABASES` command lists all available data sources and projects. The `WHERE` clause narrows the results down to projects.
## Syntax
Here is the syntax:
```sql theme={null}
SHOW DATABASES
WHERE type = 'project';
```
Alternatively, you can use the `FULL` keyword to get more information:
```sql theme={null}
SHOW FULL DATABASES
WHERE type = 'project';
```
# Native Queries
Source: https://docs.mindsdb.com/mindsdb_sql/sql/native-queries
MindsDB's SQL dialect follows MySQL syntax. However, you can run queries native to your database engine within MindsDB.
## Connect your Database to MindsDB
To run queries native to your database, you must first connect your database to MindsDB using the `CREATE DATABASE` statement.
```sql theme={null}
CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "samples.mindsdb.com",
"port": "5432",
"database": "demo"
};
```
Here we connect the `example_db` database, which is a PostgreSQL database.
## Run Queries Native to your Database
Once we have our PostgreSQL database connected, we can run PostgreSQL-native queries.
### Querying
To run PostgreSQL-native code, we must nest it within the `SELECT` statement like this:
```sql theme={null}
SELECT * FROM example_db (
SELECT
model,
year,
price,
transmission,
mileage,
fueltype,
mpg, -- miles per gallon
ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml, -- kilometers per liter
(date_part('year', CURRENT_DATE)-year) AS years_old, -- age of a car
COUNT(*) OVER (PARTITION BY model, year) AS units_to_sell, -- how many units of a certain model are sold in a year
ROUND((CAST(tax AS decimal) / price), 3) AS tax_div_price -- value of tax divided by price of a car
FROM demo_data.used_car_price
);
```
On execution, we get:
```sql theme={null}
+-----+----+-----+------------+-------+--------+----+----+---------+-------------+-------------+
|model|year|price|transmission|mileage|fueltype|mpg |kml |years_old|units_to_sell|tax_div_price|
+-----+----+-----+------------+-------+--------+----+----+---------+-------------+-------------+
| A1 |2010|9990 |Automatic |38000 |Petrol |53.3|22.7|12 |1 |0.013 |
| A1 |2011|6995 |Manual |65000 |Petrol |53.3|22.7|11 |5 |0.018 |
| A1 |2011|6295 |Manual |107000 |Petrol |53.3|22.7|11 |5 |0.02 |
| A1 |2011|4250 |Manual |116000 |Diesel |70.6|30 |11 |5 |0.005 |
| A1 |2011|6475 |Manual |45000 |Diesel |70.6|30 |11 |5 |0 |
+-----+----+-----+------------+-------+--------+----+----+---------+-------------+-------------+
```
The first line (`SELECT * FROM example_db`) tells MindsDB that we select from the connected PostgreSQL database. After that, we nest PostgreSQL-native code within parentheses.
### Creating Views
We can create a view based on a native query.
```sql theme={null}
CREATE VIEW cars FROM example_db (
SELECT
model,
year,
price,
transmission,
mileage,
fueltype,
mpg, -- miles per gallon
ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml, -- kilometers per liter
(date_part('year', CURRENT_DATE)-year) AS years_old, -- age of a car
COUNT(*) OVER (PARTITION BY model, year) AS units_to_sell, -- how many units of a certain model are sold in a year
ROUND((CAST(tax AS decimal) / price), 3) AS tax_div_price -- value of tax divided by price of a car
FROM demo_data.used_car_price
);
```
On execution, we get:
```sql theme={null}
Query OK, 0 rows affected (x.xxx sec)
```
# Query Jobs
Source: https://docs.mindsdb.com/mindsdb_sql/sql/query-jobs
## Querying Jobs
Here is how we can view all jobs in a project:
```sql theme={null}
SHOW JOBS WHERE project = 'project_name';
SELECT * FROM project_name.jobs;
```
On execution, we get:
```sql theme={null}
+------------------------------------+---------+----------------------------+----------------------------+----------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| NAME | PROJECT | RUN_START | RUN_END | NEXT_RUN_AT | SCHEDULE_STR | QUERY |
+------------------------------------+---------+----------------------------+----------------------------+----------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| drop_model | mindsdb | 2023-04-01 00:00:00.000000 | [NULL] | 2023-04-01 00:00:00.000000 | [NULL] | DROP MODEL mindsdb.home_rentals_model |
| retrain_model_and_save_predictions | mindsdb | 2023-02-15 19:19:43.210122 | 2023-04-01 00:00:00.000000 | 2023-02-15 19:19:43.210122 | every 2 days | RETRAIN mindsdb.home_rentals_model USING join_learn_process = true; INSERT INTO my_integration.rentals (SELECT m.rental_price, m.rental_price_explain FROM mindsdb.home_rentals_model AS m JOIN example_db.demo_data.home_rentals AS d) |
| save_predictions | mindsdb | 2023-02-15 19:19:48.545580 | [NULL] | 2023-02-15 19:19:48.545580 | every hour | CREATE TABLE my_integration.`result_{{START_DATETIME}}` (SELECT m.rental_price, m.rental_price_explain FROM mindsdb.home_rentals_model AS m JOIN example_db.demo_data.home_rentals AS d) |
+------------------------------------+---------+----------------------------+----------------------------+----------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
Or from all projects at once:
```sql theme={null}
SHOW JOBS;
SELECT *
FROM information_schema.jobs;
```
## Querying Jobs History
You can query the history of jobs similarly to querying jobs. Here you can find information about errors if a job didn't execute successfully.
Here is how to view the history of all jobs in a given project:
```sql theme={null}
SELECT *
FROM log.jobs_history
WHERE project = 'mindsdb';
```
On execution, we get:
```sql theme={null}
+------------------------------------+---------+----------------------------+----------------------------+----------------------------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| NAME | PROJECT | RUN_START | RUN_END | NEXT_RUN_AT | ERROR | QUERY |
+------------------------------------+---------+----------------------------+----------------------------+----------------------------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| retrain_model_and_save_predictions | mindsdb | 2023-02-15 19:19:43.210122 | 2023-04-01 00:00:00.000000 | 2023-02-15 19:19:43.210122 | [NULL] | RETRAIN mindsdb.home_rentals_model USING join_learn_process = true; INSERT INTO my_integration.rentals (SELECT m.rental_price, m.rental_price_explain FROM mindsdb.home_rentals_model AS m JOIN example_db.demo_data.home_rentals AS d) |
| save_predictions | mindsdb | 2023-02-15 19:19:48.545580 | [NULL] | 2023-02-15 19:19:48.545580 | [NULL] | CREATE TABLE my_integration.`result_{{START_DATETIME}}` (SELECT m.rental_price, m.rental_price_explain FROM mindsdb.home_rentals_model AS m JOIN example_db.demo_data.home_rentals AS d) |
+------------------------------------+---------+----------------------------+----------------------------+----------------------------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
Please note that the `drop_model` job is not in the `jobs_history` table because it has not run yet.
# Query Triggers
Source: https://docs.mindsdb.com/mindsdb_sql/sql/query-triggers
## Description
Triggers enable users to define event-based actions. For example, if a table is updated, then run a query to update predictions.
Currently, you can create triggers on the following data sources: [MongoDB](https://docs.mindsdb.com/integrations/data-integrations/mongodb), [Slack](https://docs.mindsdb.com/integrations/app-integrations/slack), [Solace](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/solace_handler).
## Syntax
Here is the syntax for querying all triggers:
```sql theme={null}
SHOW TRIGGERS;
```
# List Data Sources
Source: https://docs.mindsdb.com/mindsdb_sql/sql/show-databases
## Description
The `SHOW DATABASES` statement lists all connected data sources that MindsDB can access.
## Syntax
Here is how to list all connected data sources:
```sql theme={null}
SHOW DATABASES;
```
# Use a Project
Source: https://docs.mindsdb.com/mindsdb_sql/sql/use/project
## Description
The `USE` statement will change the context of MindsDB to the specified project. This allows you to run subsequent queries within the context of that project.
## Syntax
Here is the syntax:
```sql theme={null}
USE project_name;
```
On execution, we get:
```sql theme={null}
Query successfully completed
```
# The CASE WHEN Statement
Source: https://docs.mindsdb.com/mindsdb_sql/sql_support/case-when
MindsDB supports standard SQL syntax, including the `CASE WHEN` statement.
The `CASE WHEN` statement is used for conditional logic within queries. It evaluates conditions and returns specific values based on whether each condition is true or false, allowing for conditional output within `SELECT`, `WHERE`, and other clauses.
```sql theme={null}
SELECT
CASE
WHEN a=1 THEN a+b
WHEN 1+2=b*2 THEN 0
WHEN (a+b>2 OR a<b) THEN b
ELSE c
END
FROM table_name;
```
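As a concrete illustration, here is a sketch that buckets rental prices from the demo table used in the quickstart; it assumes the `mysql_demo_db` connection exists, and the price thresholds are arbitrary:
```sql theme={null}
SELECT
    rental_price,
    CASE
        WHEN rental_price < 2000 THEN 'low'
        WHEN rental_price < 4000 THEN 'mid'
        ELSE 'high'
    END AS price_band
FROM mysql_demo_db.home_rentals;
```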
# Common Table Expressions
Source: https://docs.mindsdb.com/mindsdb_sql/sql_support/cte
MindsDB supports standard SQL syntax, including Common Table Expressions (CTEs).
CTEs are used to create temporary, named result sets that simplify complex queries, enhance readability, and allow for modular query design by breaking down large queries into manageable parts.
```sql theme={null}
WITH table_name1 AS (
SELECT columns
FROM table1 t1
JOIN table2 t2
ON t1.col = t2.col
),
table_name2 AS (
SELECT columns
FROM table1 t1
JOIN table2 t2
ON t1.col = t2.col
)
SELECT columns
FROM table_name1 t1
JOIN table_name2 t2
ON t1.col = t2.col;
```
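For instance, here is a minimal sketch over the demo rentals table from the quickstart, assuming the `mysql_demo_db` connection exists:
```sql theme={null}
WITH cheap_rentals AS (
    SELECT neighborhood, rental_price
    FROM mysql_demo_db.home_rentals
    WHERE rental_price < 2000
)
SELECT neighborhood, AVG(rental_price) AS avg_price
FROM cheap_rentals
GROUP BY neighborhood;
```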
# MindsDB's MCP Server with Anthropic's MCP Connector
Source: https://docs.mindsdb.com/model-context-protocol/anthropic
This tutorial walks you through the usage of MindsDB's MCP Server with [Anthropic's MCP Connector](https://docs.anthropic.com/en/docs/agents-and-tools/mcp-connector).
## Setup
Follow the steps below to connect MindsDB's MCP Server to Anthropic.
1. Start MindsDB's MCP Server following [this guide](/mcp/usage).
2. Expose the local instance of MindsDB via [ngrok](https://ngrok.com/) or similar tools.
3. Get an Anthropic API key and install the `anthropic` package.
## Chat with Data
Here is how to connect MindsDB's MCP Server to Anthropic.
```python theme={null}
import anthropic
client = anthropic.Anthropic(
api_key = "anthropic-api-key"
)
response = client.beta.messages.create(
model = "claude-sonnet-4-20250514",
max_tokens = 1000,
messages = [
{"role": "user", "content": "What tools do you have available?"}
],
mcp_servers = [
{
"type": "url",
"url": "https://5a52-88-203-84-191.ngrok-free.app/mcp/sse",
"name": "mindsdb-mcp",
"authorization_token": ""
}
],
betas = ["mcp-client-2025-04-04"]
)
print(response)
```
Here is the output:
```bash theme={null}
BetaMessage(id='msg_01SrYiUsK7Jb4a5BA2nszKsc', container=None, content=[BetaTextBlock(citations=None, text="I have access to two tools for working with MindsDB:\n\n1. **mindsdb-mcp_query** - Execute SQL queries against MindsDB\n - Parameters:\n - `query` (required): The SQL query to execute\n - `context` (optional): Additional context parameters for the query\n - Returns: Query results or error information\n\n2. **mindsdb-mcp_list_databases** - List all databases and their tables in MindsDB\n - Parameters: None required\n - Returns: A list of all databases and their associated tables\n\nThese tools allow me to help you explore your MindsDB instance, run SQL queries, and work with your data and ML models. Would you like me to start by showing you what databases are available, or do you have a specific query you'd like to run?", type='text')], model='claude-sonnet-4-20250514', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=BetaUsage(cache_creation=None, cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=572, output_tokens=183, server_tool_use=None, service_tier='standard'))
```
Follow the [MCP Connector docs from Anthropic](https://docs.anthropic.com/en/docs/agents-and-tools/mcp-connector) to learn more.
# MindsDB's MCP Server with Cursor's MCP Client
Source: https://docs.mindsdb.com/model-context-protocol/cursor_usage
This tutorial walks you through the usage of MindsDB's MCP Server with [Cursor](https://www.cursor.com/) as an MCP Client.
See a [video tutorial here](https://www.youtube.com/watch?v=f5VFd5LIuPg).
## Setup
Follow the steps below to connect MindsDB's MCP Server to Cursor.
1. Start MindsDB's MCP Server following [this guide](/mcp/usage).
2. Open Cursor, go to the Cursor Settings, open the MCP tab, and click on *Add new global MCP server*. Alternatively, go to the Cursor settings -> Features -> MCP Servers.
3. Add the below content to the `mcp.json` file.
```json theme={null}
{
"mcpServers": {
"mindsdb": {
"url": "http://127.0.0.1:47334/mcp/sse"
}
}
}
```
4. Ensure that MindsDB is listed as an MCP server.
## Chat with Data
1. Open the Cursor chat window and select the Agent mode from the dropdown.
2. Ask questions over your data. *Note that you need to approve each call of the MCP server’s tools by clicking on Run tool.*
3. The agent provides an answer with helpful suggestions of follow-up information that can be extracted from the available data.
# MindsDB's MCP Server with OpenAI's Remote MCP
Source: https://docs.mindsdb.com/model-context-protocol/openai
This tutorial walks you through the usage of MindsDB's MCP Server with [OpenAI's Remote MCP](https://platform.openai.com/docs/guides/tools-remote-mcp).
## Setup
Follow the steps below to connect MindsDB's MCP Server to OpenAI.
1. Start MindsDB's MCP Server following [this guide](/mcp/usage).
2. Expose the local instance of MindsDB via [ngrok](https://ngrok.com/) or similar tools.
3. Get an OpenAI API key and install the `openai` package.
## Chat with Data
Here is how to connect MindsDB's MCP Server to OpenAI.
```python theme={null}
import openai
client = openai.OpenAI(
api_key = 'openai-api-key'
)
response = client.responses.create(
model = "o3",
tools = [
{
"type": "mcp",
"server_label": "mdb",
"server_url": "https://5a52-88-203-84-191.ngrok-free.app/mcp/sse",
"headers": { "Authorization": "Bearer " },
"require_approval": "never",
}
],
input = "What tools do you have available?"
)
print(response)
```
Here is the output:
```bash theme={null}
Response(id='resp_68305d877eac81918e05a35beb23c40f054f254057b1b9a9', created_at=1748000135.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='o3-2025-04-16', object='response', output=[McpListTools(id='mcpl_68305d87913c8191ade2e249dc9a7cce054f254057b1b9a9', server_label='mdb', tools=[McpListToolsTool(input_schema={'properties': {'query': {'title': 'Query', 'type': 'string'}, 'context': {'anyOf': [{'type': 'object'}, {'type': 'null'}], 'default': None, 'title': 'Context'}}, 'required': ['query'], 'title': 'queryArguments', 'type': 'object'}, name='query', annotations=None, description='\n Execute a SQL query against MindsDB\n\n Args:\n query: The SQL query to execute\n context: Optional context parameters for the query\n\n Returns:\n Dict containing the query results or error information\n '), McpListToolsTool(input_schema={'properties': {}, 'title': 'list_databasesArguments', 'type': 'object'}, name='list_databases', annotations=None, description='\n List all databases in MindsDB along with their tables\n\n Returns:\n Dict containing the list of databases and their tables\n ')], type='mcp_list_tools', error=None), ResponseReasoningItem(id='rs_68305d8c00c08191964ba4e0b011f98a054f254057b1b9a9', summary=[], type='reasoning', encrypted_content=None, status=None), ResponseOutputMessage(id='msg_68305d8ee2cc8191966e94f464677dab054f254057b1b9a9', content=[ResponseOutputText(annotations=[], text='I currently have access to two kinds of tools:\n\n1. Image Input \n • I can receive an image along with your message and analyze the visible content (objects, text, layout, etc.) to help answer questions or perform tasks related to the image.\n\n2. MindsDB SQL Tools \n • mcp_mdb.list_databases – Lists the databases and tables that are registered in the MindsDB environment. \n • mcp_mdb.query – Lets me run SQL queries against those databases and return the results to you.\n\nLet me know if you’d like me to use either of these tools!', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[Mcp(server_label='mdb', server_url='https://5a52-88-203-84-191.ngrok-free.app/', type='mcp', allowed_tools=None, headers={'Authorization': ''}, require_approval='always')], top_p=1.0, background=False, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort='medium', generate_summary=None, summary=None), service_tier='default', status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=136, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=192, output_tokens_details=OutputTokensDetails(reasoning_tokens=64), total_tokens=328), user=None, store=True)
```
Follow the [Remote MCP docs from OpenAI](https://platform.openai.com/docs/guides/tools-remote-mcp) to learn more.
# Model Context Protocol (MCP)
Source: https://docs.mindsdb.com/model-context-protocol/overview
The **Model Context Protocol (MCP)** facilitates real-time communication between MCP clients, such as LLMs, AI agents, and AI applications, and MCP servers like MindsDB.
**MindsDB is an MCP server** that enables intelligent applications to query and reason over federated data from databases, data warehouses, and applications.
## Key Features
* **Unified Data Gateway**
MindsDB abstracts the complexity of dealing with disparate data sources. It enables AI apps and agents to run powerful, federated queries across structured and unstructured data systems.
* **Seamless User Experience**
MindsDB enhances MCP implementations with security, monitoring, and governance. It includes built-in integrations to ensure compatibility with traditional and non-MCP applications.
* **Advanced AI Workflows**
MindsDB supports composite AI operations like multi-source joins and orchestration of different models or services within a single query, which go beyond the native capabilities of most LLMs using MCP alone.
## Protocol Overview
MCP establishes a bidirectional communication channel between clients and servers, enabling LLMs, agents, or apps to execute queries over federated data infrastructures.
Federated data refers to data distributed across multiple systems, formats, or platforms, whether on-premises or in the cloud.
With MindsDB as your MCP server, you can treat this distributed data as a **single virtual database**.
## How It Works
Here's a simplified overview of the MCP data flow:
1. The client connects to the MCP server.
2. A query is issued from the client to the MCP server.
3. MindsDB routes the query to the appropriate federated data sources.
4. The data sources return results to MindsDB.
5. MindsDB returns unified results back to the client.
This enables AI-native applications to deliver rich, real-time insights over complex enterprise data with minimal integration effort.
# MindsDB's MCP Server Usage and Tools
Source: https://docs.mindsdb.com/model-context-protocol/usage
**MindsDB** is an MCP server that enables your MCP applications to answer questions over large-scale federated data spanning databases, data warehouses, and SaaS applications.
## Start MindsDB as an MCP Server
Follow the steps below to use MindsDB as an MCP server.
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. [Connect your data source](/mindsdb_sql/sql/create/database) and/or [upload files](/mindsdb_sql/sql/create/file) to MindsDB in order to ask questions over your data.
You can use our sample dataset that stores the sales manager data.
```sql theme={null}
CREATE DATABASE sales_manager_data
WITH ENGINE = "postgres",
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "samples.mindsdb.com",
"port": "5432",
"database": "sales_manager_data"
};
```
3. Start MindsDB MCP server, either with or without authentication.
* Start MindsDB MCP server without authentication to connect it to [Cursor](/mcp/cursor_usage).
```bash theme={null}
docker run --name mindsdb_container -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
* Start MindsDB MCP server with authentication to connect it to [OpenAI](/mcp/openai) or [Anthropic](/mcp/anthropic).
```bash theme={null}
docker run --name mindsdb_container -p 47334:47334 -p 47335:47335 -e MINDSDB_USERNAME=admin -e MINDSDB_PASSWORD=password123 mindsdb/mindsdb
```
Then get an auth token from MindsDB:
```bash theme={null}
curl -X POST -d '{"username":"admin","password":"password123"}' -H "Content-Type: application/json" http://localhost:47334/api/login
```
This will return a token that you can use in your MCP client.
4. To confirm the MindsDB MCP server is running, send a request to `http://127.0.0.1:47334/mcp/status`. A successful response means your MCP environment is ready.
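For example, from the command line:
```bash theme={null}
curl http://127.0.0.1:47334/mcp/status
```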
## MCP Tools
MindsDB MCP API exposes a set of tools that enable users to interact with their data and extract valuable insights.
**1. List Databases**
The `list_databases` tool lists all data sources connected to MindsDB.
**2. Query**
The `query` tool executes queries on the federated data to extract data relevant to answering a given question.
# SDKs & APIs
Source: https://docs.mindsdb.com/overview_sdks_apis
The [Connect](/mindsdb-connect), [Unify](/mindsdb-unify), and [Respond](/mindsdb-respond) sections present the usage of MindsDB via its SQL interface.
Alongside the SQL interface, MindsDB provides access via REST APIs, the Python SDK, and the JavaScript SDK.
* Interact with MindsDB via REST API endpoints.
* Integrate MindsDB into Python code.
* Integrate MindsDB into JavaScript code.
# Tutorial to Get Started with MindsDB
Source: https://docs.mindsdb.com/quickstart-tutorial
Before we start, install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
Get started with MindsDB in a few simple steps:
1. Connect one or more data sources. Explore all available [data sources here](/integrations/data-overview).
2. Unify your data with [knowledge bases](/mindsdb_sql/knowledge_bases/overview).
3. Respond to questions over your data with [AI agents](/mindsdb_sql/agents/agent).
## Step 1. Connect
MindsDB enables connecting data from various data sources and operating on data without moving it from its source. Learn more [here](/mindsdb-connect).
* **Connecting Structured Data**
Use the [`CREATE DATABASE`](/mindsdb_sql/sql/create/database) statement to connect a data source to MindsDB.
```sql theme={null}
CREATE DATABASE mysql_demo_db
WITH ENGINE = 'mysql',
PARAMETERS = {
"user": "user",
"password": "MindsDBUser123!",
"host": "samples.mindsdb.com",
"port": "3306",
"database": "public"
};
```
This is the input data used in the following steps:
```sql theme={null}
SELECT *
FROM mysql_demo_db.home_rentals
LIMIT 3;
```
The sample contains information about properties for rent.
* **Connecting Unstructured Data**
Extract data from webpages using the [web crawler](/integrations/app-integrations/web-crawler) or [upload files](/integrations/files/csv-xlsx-xls) to MindsDB.
In this example, we fetch data from the MindsDB documentation webpage using the web crawler.
```sql theme={null}
CREATE DATABASE my_web
WITH ENGINE = 'web';
SELECT url, text_content
FROM my_web.crawler
WHERE url = 'https://docs.mindsdb.com/';
```
Now we store this data in a view, which resides in the default `mindsdb` project.
```sql theme={null}
CREATE VIEW mindsdb_docs (
SELECT url, text_content
FROM my_web.crawler
WHERE url = 'https://docs.mindsdb.com/'
);
SELECT *
FROM mindsdb.mindsdb_docs;
```
## Step 2. Unify
MindsDB enables unifying data from structured and unstructured data sources into a single, queryable interface. This unified view allows seamless querying and model-building across all data without consolidation into one system. Learn more [here](/mindsdb-unify).
Create a knowledge base to store all your data in a single location. Learn more about [knowledge bases here](/mindsdb_sql/knowledge_bases/overview).
```sql theme={null}
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name" : "text-embedding-3-large",
"api_key": "your-openai-api-key"
},
reranking_model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "your-openai-api-key"
},
content_columns = ['content'];
```
[Insert data](/mindsdb_sql/knowledge_bases/insert_data) from Step 1 into the knowledge base.
```sql theme={null}
INSERT INTO my_kb
SELECT
'number_of_rooms: ' || number_of_rooms || ', ' ||
'number_of_bathrooms: ' || number_of_bathrooms || ', ' ||
'sqft: ' || sqft || ', ' ||
'location: ' || location || ', ' ||
'days_on_market: ' || days_on_market || ', ' ||
'neighborhood: ' || neighborhood || ', ' ||
'rental_price: ' || rental_price
AS content
FROM mysql_demo_db.home_rentals;
INSERT INTO my_kb
SELECT text_content AS content
FROM mindsdb.mindsdb_docs;
```
[Query the knowledge base](/mindsdb_sql/knowledge_bases/query) to search your data.
```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'what is MindsDB';
SELECT *
FROM my_kb
WHERE content = 'rental price lower than 2000';
```
In order to keep the knowledge base up-to-date with your data, use [jobs](/mindsdb_sql/sql/create/jobs) to automate data inserts every time your data is modified.
```sql theme={null}
CREATE JOB update_kb (
INSERT INTO my_kb
SELECT
'number_of_rooms: ' || number_of_rooms || ', ' ||
'number_of_bathrooms: ' || number_of_bathrooms || ', ' ||
'sqft: ' || sqft || ', ' ||
'location: ' || location || ', ' ||
'days_on_market: ' || days_on_market || ', ' ||
'neighborhood: ' || neighborhood || ', ' ||
'rental_price: ' || rental_price
AS content
FROM mysql_demo_db.home_rentals
WHERE created_at > LAST
)
EVERY 1 day;
```
## Step 3. Respond
MindsDB enables generating insightful and accurate responses from unified data using natural language. Learn more [here](/mindsdb-respond).
Create an [agent](https://docs.mindsdb.com/mindsdb_sql/agents/agent) that can answer questions over your unified data from Step 2.
```sql theme={null}
CREATE AGENT my_agent
USING
model = {
"provider": "openai",
"model_name" : "gpt-4o",
"api_key": "your-openai-api-key"
},
data = {
"knowledge_bases": ["mindsdb.my_kb"],
"tables": ["mysql_demo_db.home_rentals"]
},
prompt_template = 'mindsdb.my_kb stores data about mindsdb and home rentals,
mysql_demo_db.home_rentals stores data about home rentals';
```
Now you can ask questions over your data.
```sql theme={null}
SELECT *
FROM my_agent
WHERE question = 'what is MindsDB?';
```
Visit the [Respond tab in the MindsDB Editor](/mindsdb_sql/agents/agent_gui) to chat with an agent.
# MindsDB Releases
Source: https://docs.mindsdb.com/releases
MindsDB releases new features, functionalities, and fixes on a regular cadence. This document outlines the release process, versioning, and naming conventions.
## Release Types and Versioning
MindsDB uses [semantic versioning](https://semver.org/) to name all releases. This format is applied consistently across our GitHub tags, Python packages, and Docker images.
Each release name follows the structure:
```
v<MAJOR>.<MINOR>.<PATCH>(<TYPE><NUMBER>)
```
Where:
* `MAJOR` indicates the major version, which introduces significant changes or backward-incompatible updates.
* `MINOR` indicates the minor version, which introduces new features that remain backward-compatible.
* `PATCH` indicates the patch version, which introduces small fixes or improvements.
* `TYPE` is an optional component, which informs about the nature of the (pre-)release.
* `NUMBER` is an optional component, used when `TYPE` is provided, that indicates the pre-release iteration.
The following are the release types and their naming conventions.
| **Release Type** | **Sample Version** | **Description** |
| ------------------------------ | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
| **GA (General Availability)** | `v25.9.3` | The stable public release. `25` is the major version, `9` is the minor version, and `3` is the patch number. |
| **Pre-GA (Release Candidate)** | `v25.9.3rc1` | A release candidate that is nearly ready for GA. `rc` stands for release candidate, and `1` indicates the version number of the pre-release. |
| **Alpha** | `v25.9.3alpha1` | An early testing version with limited features or stability. `alpha` denotes an initial stage for internal or early feedback. |
| **Beta** | `v25.9.3beta1` | A version close to final release. `beta` indicates a feature-complete build shared for broader testing and feedback. |
## Release Process
The `main` branch of the [MindsDB repository](https://github.com/mindsdb/mindsdb) contains the latest stable version of MindsDB and represents the GA (General Availability) release.
MindsDB follows the [Gitflow branching model](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) to manage development and releases as follows.
All code changes are first committed to the `develop` branch.
When a release is approaching, a short-lived `release` branch is created from the `develop` branch.
* This branch is used for final testing and validation.
* Pre-GA artifacts are built at this stage, including both the Python package and the Docker image, and shared for broader testing and feedback.
After successful testing and validation:
* The `release` branch is merged into the `main` branch, making it an official GA release.
* The final GA versions of the Python package and Docker image are released, while the pre-GA versions are removed.
If you are interested in contributing to MindsDB, follow [this link](/contribute/contribute).
# Create Agent
Source: https://docs.mindsdb.com/rest/agents/create
**POST `/api/projects/{project_name}/agents`**
This API endpoint creates an agent using the `POST` method.
Learn more about agents and the available parameters following [this doc page](/mindsdb_sql/agents/agent).
### Path Parameters
* `project_name`: Defines the project where the agents are located. Note that the default project name is `mindsdb`.
### Body
* `name`: Name of the agent.
* `model`: Stores parameters of the model, including `provider`, `model_name`, and `api_key`. Note that agents use the default model defined in the configuration if no model is provided when creating the agent.
* `data`: Stores data connected to the agent, including `tables` and `knowledge_bases`.
* `prompt_template`: Stores instructions for the agent. This should contain a description of the connected data.
### Response
* `id`: Unique identifier for the agent.
* `name`: The name assigned to the agent.
* `project_id`: The ID of the project where the agent resides.
* `created_at`: Timestamp indicating when the agent was created.
* `updated_at`: Timestamp indicating when the agent was last updated.
* `data`: Stores data connected to the agent, including `tables` and `knowledge_bases`.
In order to provide all tables from a database or all knowledge bases from a project, use the `*` wildcard like this:
```shell theme={null}
"data": {
"knowledge_bases": ["my_project.*"],
"tables": ["my_data_source.*"]
}
```
* `model`: Stores parameters of the model, including `provider`, `model_name`, and `api_key`.
* `prompt_template`: Stores instructions for the agent. This should contain a description of the connected data.
```shell Shell theme={null}
curl --request POST \
--url http://127.0.0.1:47334/api/projects/mindsdb/agents \
--header 'Content-Type: application/json' \
--data '{
"agent": {
"name": "my_agent",
"model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-xxx"
},
"data": {
"knowledge_bases": ["my_project.my_kb"],
"tables": ["my_data_source.my_table"]
},
"prompt_template": "my_project.my_kb stores documentation of MindsDB, my_data_source.my_table stores documentation of MindsDB"
}
}'
```
```json Response theme={null}
{
"id": 197,
"name": "my_agent",
"project_id": 1,
"created_at": "2025-07-09 12:58:24.868202",
"updated_at": "2025-07-09 12:58:24.868199",
"data": {
"knowledge_bases": [
"my_project.my_kb"
],
"tables": [
"my_data_source.my_table"
]
},
"model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-xxx"
},
"prompt_template": "my_project.my_kb stores documentation of MindsDB, my_data_source.my_table stores documentation of MindsDB"
}
```
# Delete Agent
Source: https://docs.mindsdb.com/rest/agents/delete
**DELETE `/api/projects/{project_name}/agents/{agent_name}`**
This API endpoint deletes an agent using the `DELETE` method.
Learn more about agents and the available parameters following [this doc page](/mindsdb_sql/agents/agent).
### Path Parameters
* `project_name`: Defines the project where the agent is located. Note that the default project name is `mindsdb`.
* `agent_name`: Defines the agent name.
### Body
None.
### Response
None.
```shell Shell theme={null}
curl --request DELETE \
--url http://127.0.0.1:47334/api/projects/mindsdb/agents/my_agent
```
```json Response theme={null}
200 OK
```
# Get Agent
Source: https://docs.mindsdb.com/rest/agents/get
**GET `/api/projects/{project_name}/agents/{agent_name}`**
This API endpoint lists details about an agent using the `GET` method.
Learn more about agents and the available parameters following [this doc page](/mindsdb_sql/agents/agent).
### Path Parameters
* `project_name`: Defines the project where the agents are located. Note that the default project name is `mindsdb`.
* `agent_name`: Defines the agent name to get its details.
### Body
None.
### Response
* `id`: Unique identifier for the agent.
* `name`: The name assigned to the agent.
* `project_id`: The ID of the project where the agent resides.
* `created_at`: Timestamp indicating when the agent was created.
* `updated_at`: Timestamp indicating when the agent was last updated.
* `data`: Stores data connected to the agent, including `tables` and `knowledge_bases`.
* `model`: Stores parameters of the model, including `provider`, `model_name`, and `api_key`.
* `prompt_template`: Stores instructions for the agent. This should contain a description of the connected data.
```shell Shell theme={null}
curl --request GET \
--url http://127.0.0.1:47334/api/projects/mindsdb/agents/my_agent
```
```json Response theme={null}
{
"id": 197,
"name": "my_agent",
"project_id": 1,
"created_at": "2025-07-09 12:58:24.868202",
"updated_at": "2025-07-09 12:58:24.868199",
"data": {
"knowledge_bases": ["my_project.my_kb"],
"tables": ["my_data_source.my_table"]
},
"model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-xxx"
},
"prompt_template": "my_project.my_kb stores documentation of MindsDB, my_data_source.my_table stores documentation of MindsDB"
}
```
# List Agents
Source: https://docs.mindsdb.com/rest/agents/list
**GET `/api/projects/{project_name}/agents`**
This API endpoint lists all available agents using the `GET` method.
Learn more about agents and the available parameters following [this doc page](/mindsdb_sql/agents/agent).
### Path Parameters
* `project_name`: Defines the project where the agents are located. Note that the default project name is `mindsdb`.
### Body
None.
### Response
Each agent in the returned list contains the following fields:
* `id`: Unique identifier for the agent.
* `name`: The name assigned to the agent.
* `project_id`: The ID of the project where the agent resides.
* `created_at`: Timestamp indicating when the agent was created.
* `updated_at`: Timestamp indicating when the agent was last updated.
* `data`: Stores data connected to the agent, including `tables` and `knowledge_bases`.
* `model`: Stores parameters of the model, including `provider`, `model_name`, and `api_key`.
* `prompt_template`: Stores instructions for the agent. This should contain a description of the connected data.
```shell Shell theme={null}
curl --request GET \
--url http://127.0.0.1:47334/api/projects/mindsdb/agents
```
```json Response theme={null}
[
{
"id": 197,
"name": "my_agent",
"project_id": 1,
"created_at": "2025-07-09 12:58:24.868202",
"updated_at": "2025-07-09 12:58:24.868199",
"data": {
"knowledge_bases": ["my_project.my_kb"],
"tables": ["my_data_source.my_table"]
},
"model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-xxx"
},
"prompt_template": "my_project.my_kb stores documentation of MindsDB, my_data_source.my_table stores documentation of MindsDB"
}
]
```
# Query Agents
Source: https://docs.mindsdb.com/rest/agents/query
**POST `/api/projects/{project_name}/agents/{agent_name}/completions[/stream]`**
This API endpoint queries an agent using the `POST` method. The `completions` endpoint returns an answer, while the `completions/stream` endpoint streams intermediate thoughts and then returns an answer.
Learn more about agents and the available parameters following [this doc page](/mindsdb_sql/agents/agent).
### Path Parameters
* `project_name`: Defines the project where the agents are located. Note that the default project name is `mindsdb`.
* `agent_name`: Defines the agent name.
### Body
* `messages`: Stores the question to the agent.
### Response
Returns data chunks containing thoughts and an answer.
```shell Shell theme={null}
curl --request POST \
--url http://127.0.0.1:47334/api/projects/mindsdb/agents/my_agent/completions/stream \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"question": "What is MindsDB?",
"answer": ""
}
]
}'
```
````json Response theme={null}
data: {"type": "start", "prompt": "What is MindsDB?", "trace_id": ""}
data: {"actions": [{"tool": "kb_list_tool", "tool_input": "", "log": "```\nThought: Do I need to use a tool? Yes\nAction: kb_list_tool\nAction Input: "}], "messages": [{"content": "```\nThought: Do I need to use a tool? Yes\nAction: kb_list_tool\nAction Input: "}], "trace_id": ""}
data: {"steps": [{"action": {"tool": "kb_list_tool", "tool_input": "", "log": "```\nThought: Do I need to use a tool? Yes\nAction: kb_list_tool\nAction Input: "}, "observation": "[\"kb_mindsdb_docs\"]"}], "messages": [{"content": "[\"kb_mindsdb_docs\"]"}], "trace_id": ""}
data: {"actions": [{"tool": "kb_query_tool", "tool_input": "SELECT * FROM `kb_mindsdb_docs` WHERE content = 'What is MindsDB?' LIMIT 1;", "log": "I have identified a knowledge base named `kb_mindsdb_docs` that contains documentation about MindsDB. I will now query this knowledge base to provide you with information about MindsDB.\n\n```\nAction: kb_query_tool\nAction Input: SELECT * FROM `kb_mindsdb_docs` WHERE content = 'What is MindsDB?' LIMIT 1;"}], "messages": [{"content": "I have identified a knowledge base named `kb_mindsdb_docs` that contains documentation about MindsDB. I will now query this knowledge base to provide you with information about MindsDB.\n\n```\nAction: kb_query_tool\nAction Input: SELECT * FROM `kb_mindsdb_docs` WHERE content = 'What is MindsDB?' LIMIT 1;"}], "trace_id": ""}
data: {"steps": [{"action": {"tool": "kb_query_tool", "tool_input": "SELECT * FROM `kb_mindsdb_docs` WHERE content = 'What is MindsDB?' LIMIT 1;", "log": "I have identified a knowledge base named `kb_mindsdb_docs` that contains documentation about MindsDB. I will now query this knowledge base to provide you with information about MindsDB.\n\n```\nAction: kb_query_tool\nAction Input: SELECT * FROM `kb_mindsdb_docs` WHERE content = 'What is MindsDB?' LIMIT 1;"}, "observation": "Output columns: 'id', 'chunk_id', 'chunk_content', 'metadata', 'distance', 'relevance'\nResult in CSV format (dialect is 'excel'):\nid,chunk_id,chunk_content,metadata,distance,relevance\r\nc2b24e025ed01388,c2b24e025ed01388:text_content:1766of1836:1633168to1634163,\"with MindsDB By integrating databases and OpenAI using MindsDB, developers can easily extract insights from text data with just a few SQL commands. These powerful natural language processing (NLP) models are capable of answering questions with or without context and completing general prompts. Furthermore, these models are powered by large pre-trained language models from OpenAI, so there is no need for manual development work. Ultimately, this provides developers with an easy way to incorporate powerful NLP capabilities into their applications while saving time and resources compared to traditional ML development pipelines and methods. All in all, MindsDB makes it possible for developers to harness the power of OpenAI efficiently! MindsDB is now the fastest-growing open-source applied machine-learning platform in the world. Its community continues to contribute to more than 70 data-source and ML-framework integrations. Stay tuned for the upcoming features - including more control\",\"{'_chunk_index': 1765, '_content_column': 'text_content', '_end_char': 1634163, '_original_doc_id': 'c2b24e025ed01388', '_original_row_index': '0', '_source': 'TextChunkingPreprocessor', '_start_char': 1633168, '_updated_at': '2025-07-01 12:36:41', 'url': 'https://docs.mindsdb.com/llms-full.txt'}\",0.24353297838910382,0.9321520551316381\r\n"}], "messages": [{"content": "Output columns: 'id', 'chunk_id', 'chunk_content', 'metadata', 'distance', 'relevance'\nResult in CSV format (dialect is 'excel'):\nid,chunk_id,chunk_content,metadata,distance,relevance\r\nc2b24e025ed01388,c2b24e025ed01388:text_content:1766of1836:1633168to1634163,\"with MindsDB By integrating databases and OpenAI using MindsDB, developers can easily extract insights from text data with just a few SQL commands. These powerful natural language processing (NLP) models are capable of answering questions with or without context and completing general prompts. Furthermore, these models are powered by large pre-trained language models from OpenAI, so there is no need for manual development work. Ultimately, this provides developers with an easy way to incorporate powerful NLP capabilities into their applications while saving time and resources compared to traditional ML development pipelines and methods. All in all, MindsDB makes it possible for developers to harness the power of OpenAI efficiently! MindsDB is now the fastest-growing open-source applied machine-learning platform in the world. Its community continues to contribute to more than 70 data-source and ML-framework integrations. 
Stay tuned for the upcoming features - including more control\",\"{'_chunk_index': 1765, '_content_column': 'text_content', '_end_char': 1634163, '_original_doc_id': 'c2b24e025ed01388', '_original_row_index': '0', '_source': 'TextChunkingPreprocessor', '_start_char': 1633168, '_updated_at': '2025-07-01 12:36:41', 'url': 'https://docs.mindsdb.com/llms-full.txt'}\",0.24353297838910382,0.9321520551316381\r\n"}], "trace_id": ""}
data: {"output": "MindsDB is an open-source platform that integrates databases and OpenAI to enable developers to extract insights from text data using SQL commands. It leverages powerful natural language processing (NLP) models, powered by large pre-trained language models from OpenAI, to answer questions and complete prompts without the need for manual development work. This makes it easier for developers to incorporate NLP capabilities into their applications, saving time and resources compared to traditional machine learning development methods. MindsDB is recognized as the fastest-growing open-source applied machine-learning platform, with a community contributing to over 70 data-source and ML-framework integrations.", "messages": [{"content": "MindsDB is an open-source platform that integrates databases and OpenAI to enable developers to extract insights from text data using SQL commands. It leverages powerful natural language processing (NLP) models, powered by large pre-trained language models from OpenAI, to answer questions and complete prompts without the need for manual development work. This makes it easier for developers to incorporate NLP capabilities into their applications, saving time and resources compared to traditional machine learning development methods. MindsDB is recognized as the fastest-growing open-source applied machine-learning platform, with a community contributing to over 70 data-source and ML-framework integrations."}], "trace_id": ""}
data: {"type": "end"}
````
# Update Agent
Source: https://docs.mindsdb.com/rest/agents/update
**PUT `/api/projects/{project_name}/agents/{agent_name}`**
This API endpoint updates an agent using the `PUT` method.
Learn more about agents and the available parameters following [this doc page](/mindsdb_sql/agents/agent).
### Path Parameters
* `project_name`: Defines the project where the agents are located. Note that the default project name is `mindsdb`.
* `agent_name`: Defines the agent name.
### Body
* `name`: Name of the agent.
* `model`: Stores parameters of the model, including `provider`, `model_name`, and `api_key`.
* `data`: Stores data connected to the agent, including `tables` and `knowledge_bases`.
* `prompt_template`: Stores instructions for the agent. This should contain a description of the connected data.
### Response
* `id`: Unique identifier for the agent.
* `name`: The name assigned to the agent.
* `project_id`: The ID of the project where the agent resides.
* `created_at`: Timestamp indicating when the agent was created.
* `updated_at`: Timestamp indicating when the agent was last updated.
* `data`: Stores data connected to the agent, including `tables` and `knowledge_bases`.
* `model`: Stores parameters of the model, including `provider`, `model_name`, and `api_key`.
* `prompt_template`: Stores instructions for the agent. This should contain a description of the connected data.
```shell Shell theme={null}
curl --request PUT \
--url http://127.0.0.1:47334/api/projects/mindsdb/agents/my_agent \
--header 'Content-Type: application/json' \
--data '{
"agent": {
"model": {
"provider": "openai",
"model_name": "gpt-4.1",
"api_key": "sk-xxx"
}
}
}'
```
```json Response theme={null}
{
"id": 197,
"name": "my_agent",
"project_id": 1,
"created_at": "2025-07-09 12:58:24.868202",
"updated_at": "2025-07-09 12:58:24.868199",
"data": {
"knowledge_bases": ["my_project.my_kb"],
"tables": ["my_data_source.my_table"]
},
"model": {
"provider": "openai",
"model_name": "gpt-4.1",
"api_key": "sk-xxx"
},
"prompt_template": "my_project.my_kb stores documentation of MindsDB, my_data_source.my_table stores documentation of MindsDB"
}
```
# Authentication
Source: https://docs.mindsdb.com/rest/authentication
MindsDB provides an optional authentication mechanism for its HTTP API. This includes setting up a username and a password for the MindsDB instance. Learn [more here](/setup/custom-config#auth).
If this authentication method is defined in the MindsDB configuration file, you must authenticate before using the REST API endpoints of this MindsDB instance.
**Here is how to authenticate an HTTP session for calling MindsDB REST APIs.**
1. Call the `login` endpoint with the username and password parameters.
```
curl --request POST --url 'http://127.0.0.1:47334/api/login' \
--header 'Content-Type: application/json' \
--data-raw '{"username":"your-username","password":"your-password"}' -v
```
This command returns an HTTP status code 200 if the request is successful, along with a token in the response body.
2. Call any other endpoint providing the token.
```
curl --request GET \
--url http://127.0.0.1:47334/api/projects/mindsdb/... \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer pat_your_mindsdb_token_here' \
--data '{
...
}'
```
For example, query an agent under the authenticated session:
```
curl --request POST \
--url http://127.0.0.1:47334/api/projects/mindsdb/agents/my_agent/completions/stream \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer pat_your_mindsdb_token_here' \
--data '{
"messages": [
{
"question": "What is MindsDB?",
"answer": ""
}
]
}'
```
# Connect a Data Source
Source: https://docs.mindsdb.com/rest/databases/create-databases
POST /api/databases
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
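Below is a minimal sketch of a request to this endpoint, mirroring the PostgreSQL demo connection used in the SQL examples; the exact request body schema is an assumption and may vary across MindsDB versions:
```shell Shell theme={null}
curl --request POST \
  --url http://127.0.0.1:47334/api/databases \
  --header 'Content-Type: application/json' \
  --data '{
    "database": {
      "name": "example_db",
      "engine": "postgres",
      "parameters": {
        "user": "demo_user",
        "password": "demo_password",
        "host": "samples.mindsdb.com",
        "port": "5432",
        "database": "demo"
      }
    }
  }'
```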
# Remove a Data Source
Source: https://docs.mindsdb.com/rest/databases/delete-databases
DELETE /api/databases/{databaseName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Get a Data Source
Source: https://docs.mindsdb.com/rest/databases/list-database
GET /api/databases/{databaseName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# List Data Sources
Source: https://docs.mindsdb.com/rest/databases/list-databases
GET /api/databases
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Update a Data Source
Source: https://docs.mindsdb.com/rest/databases/update-databases
PUT /api/databases/{databaseName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Remove a File
Source: https://docs.mindsdb.com/rest/files/delete
DELETE /api/files/{fileName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# List Files
Source: https://docs.mindsdb.com/rest/files/list
GET /api/files
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Upload a File
Source: https://docs.mindsdb.com/rest/files/upload
PUT /api/files/{fileName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
Note that trailing whitespace in column names is removed when uploading a file to MindsDB.
# Create a Job
Source: https://docs.mindsdb.com/rest/jobs/create
POST /api/projects/{projectName}/jobs
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Remove a Job
Source: https://docs.mindsdb.com/rest/jobs/delete
DELETE /api/projects/{projectName}/jobs/{jobName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Get a Job
Source: https://docs.mindsdb.com/rest/jobs/get
GET /api/projects/{projectName}/jobs/{jobName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# List Jobs
Source: https://docs.mindsdb.com/rest/jobs/list
GET /api/projects/{projectName}/jobs
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Alter Knowledge Base
Source: https://docs.mindsdb.com/rest/knowledge_bases/alter
**PUT `/api/projects/{project_name}/knowledge_bases/{kb_name}`**
This API endpoint alters an existing knowledge base using the `PUT` method.
Learn more about knowledge bases following [this doc page](/mindsdb_sql/knowledge_bases/overview).
### Path Parameters
* `project_name`: Defines the project where the knowledge bases are located. Note that the default project name is `mindsdb`.
* `kb_name`: Defines the knowledge base to be altered.
### Body
* `embedding_model`: Defines the embedding model used to embed data in vector representation.
* `reranking_model`: Defines the reranking model used to rerank the search results by relevance.
* `content_columns`: Defines the columns that store content to be embedded.
* `metadata_columns`: Defines the columns that are considered metadata.
* `id_column`: Defines the column that uniquely identifies each row from the data inserted into the knowledge base.
* `preprocessing`: Defines the data preprocessing parameters.
### Response
* `id`: Unique identifier for the knowledge base.
* `name`: The name assigned to the knowledge base.
* `project_id`: The ID of the project where the knowledge base resides.
* `vector_database`: The vector store used for storing vector embeddings.
* `vector_database_table`: The name of the collection or table within the vector database.
* `updated_at`: Timestamp indicating when the knowledge base was last updated.
* `created_at`: Timestamp indicating when the knowledge base was created.
* `query_id`: Optional field for linking specific queries to this knowledge base.
* `embedding_model`: The embedding model used to convert content into vector representations.
* `reranking_model`: Optional model used to rerank search results based on relevance.
* `metadata_columns`: Optional list of columns used for metadata-based filtering or enrichment.
* `content_columns`: Optional list of columns treated as the main content for embedding and retrieval.
* `id_column`: The name of the column that uniquely identifies each content row.
* `params`: A nested object that contains additional configuration parameters.
  * `created_embedding_model`: The name of the embedding model associated with this knowledge base at creation time.
```shell Shell theme={null}
curl -X PUT http://127.0.0.1:47334/api/projects/mindsdb/knowledge_bases/my_kb \
-H "Content-Type: application/json" \
-d '{
"knowledge_base": {
"embedding_model": {
"api_key": "sk-xxx"
},
"reranking_model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-xxx"
},
"content_columns": ["notes"],
"metadata_columns": ["product"],
"id_column": "order_id"
}
}'
```
```json Response theme={null}
{
"id": 2,
"name": "my_kb",
"project_id": 1,
"vector_database": "my_kb_chromadb",
"vector_database_table": "default_collection",
"updated_at": "2025-06-26 10:24:06.311655",
"created_at": "2025-06-26 10:24:06.311654",
"query_id": null,
"embedding_model": {
"provider": "openai",
"model_name": "text-embedding-3-small",
"api_key": "******"
},
"reranking_model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "******"
},
"metadata_columns": [
"product"
],
"content_columns": [
"notes"
],
"id_column": "order_id",
"params": {
"created_embedding_model": "kb_embedding_my_kbxxx"
}
}
```
# Create Knowledge Base
Source: https://docs.mindsdb.com/rest/knowledge_bases/create
**POST `/api/projects/{project_name}/knowledge_bases`**
This API endpoint creates a knowledge base using the `POST` method.
Learn more about knowledge bases following [this doc page](/mindsdb_sql/knowledge_bases/overview).
### Path Parameters
* `project_name`: Defines the project where the knowledge bases are located. Note that the default project name is `mindsdb`.
### Body
* `name`: Name of the knowledge base.
* `storage`: Underlying vector database that stores the embeddings.
* `embedding_model`: Defines the embedding model used to embed data in vector representation.
* `reranking_model`: Defines the reranking model used to rerank the search results by relevance.
* `content_columns`: Defines the columns that store content to be embedded.
* `metadata_columns`: Defines the columns that are considered metadata.
* `id_column`: Defines the column that uniquely identifies each row from the data inserted into the knowledge base.
* `preprocessing`: Defines the data preprocessing parameters.
### Response
* `id`: Unique identifier for the knowledge base.
* `name`: The name assigned to the knowledge base.
* `project_id`: The ID of the project where the knowledge base resides.
* `vector_database`: The vector store used for storing vector embeddings.
* `vector_database_table`: The name of the collection or table within the vector database.
* `updated_at`: Timestamp indicating when the knowledge base was last updated.
* `created_at`: Timestamp indicating when the knowledge base was created.
* `query_id`: Optional field for linking specific queries to this knowledge base.
* `embedding_model`: The embedding model used to convert content into vector representations.
* `reranking_model`: Optional model used to rerank search results based on relevance.
* `metadata_columns`: Optional list of columns used for metadata-based filtering or enrichment.
* `content_columns`: Optional list of columns treated as the main content for embedding and retrieval.
* `id_column`: The name of the column that uniquely identifies each content row.
* `params`: A nested object that contains additional configuration parameters.
  * `created_embedding_model`: The name of the embedding model associated with this knowledge base at creation time.
```shell Shell theme={null}
curl -X POST http://127.0.0.1:47334/api/projects/mindsdb/knowledge_bases \
-H "Content-Type: application/json" \
-d '{
"knowledge_base": {
"name": "my_kb",
"storage": {
"database": "my_kb_chromadb",
"table": "default_collection"
},
"embedding_model": {
"provider": "openai",
"model_name": "text-embedding-3-small",
"api_key": "sk-xxx"
},
"reranking_model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-xxx"
},
"content_columns": ["notes"],
"metadata_columns": ["product"],
"id_column": "order_id"
}
}'
```
```json Response theme={null}
{
"id": 2,
"name": "my_kb",
"project_id": 1,
"vector_database": "my_kb_chromadb",
"vector_database_table": "default_collection",
"updated_at": "2025-06-26 10:24:06.311655",
"created_at": "2025-06-26 10:24:06.311654",
"query_id": null,
"embedding_model": {
"provider": "openai",
"model_name": "text-embedding-3-small",
"api_key": "******"
},
"reranking_model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "******"
},
"metadata_columns": [
"product"
],
"content_columns": [
"notes"
],
"id_column": "order_id",
"params": {
"created_embedding_model": "kb_embedding_my_kbxxx"
}
}
```
# Delete Knowledge Base
Source: https://docs.mindsdb.com/rest/knowledge_bases/delete
**DELETE `/api/projects/{project_name}/knowledge_bases/{knowledge_base_name}`**
This API endpoint deletes a knowledge base using the `DELETE` method.
Learn more about knowledge bases following [this doc page](/mindsdb_sql/knowledge_bases/overview).
### Path Parameters
* `project_name`: Defines the project where the knowledge bases are located. Note that the default project name is `mindsdb`.
* `knowledge_base_name`: Defines the knowledge base name.
### Body
None.
### Response
None.
```shell Shell theme={null}
curl -X DELETE http://127.0.0.1:47334/api/projects/mindsdb/knowledge_bases/my_kb
```
```json Response theme={null}
200 OK
```
# Get Knowledge Base
Source: https://docs.mindsdb.com/rest/knowledge_bases/get
**GET `/api/projects/{project_name}/knowledge_bases/{knowledge_base_name}`**
This API endpoint lists details about a knowledge base using the `GET` method.
Learn more about knowledge bases following [this doc page](/mindsdb_sql/knowledge_bases/overview).
### Path Parameters
* `project_name`: Defines the project where the knowledge bases are located. Note that the default project name is `mindsdb`.
* `knowledge_base_name`: Defines the knowledge base name to get its details.
### Body
None.
### Response
* `id`: Unique identifier for the knowledge base.
* `name`: The name assigned to the knowledge base.
* `project_id`: The ID of the project where the knowledge base resides.
* `project_name`: The name of the project where the knowledge base resides.
* `vector_database`: The vector store used for storing vector embeddings.
* `vector_database_table`: The name of the collection or table within the vector database.
* `updated_at`: Timestamp indicating when the knowledge base was last updated.
* `created_at`: Timestamp indicating when the knowledge base was created.
* `query_id`: Optional field for linking specific queries to this knowledge base.
* `embedding_model`: The embedding model used to convert content into vector representations.
* `reranking_model`: Optional model used to rerank search results based on relevance.
* `metadata_columns`: Optional list of columns used for metadata-based filtering or enrichment.
* `content_columns`: Optional list of columns treated as the main content for embedding and retrieval.
* `id_column`: The name of the column that uniquely identifies each content row.
* `params`: A nested object that contains additional configuration parameters.
  * `created_embedding_model`: The name of the embedding model associated with this knowledge base at creation time.
```shell Shell theme={null}
curl -X GET http://127.0.0.1:47334/api/projects/mindsdb/knowledge_bases/my_kb
```
```json Response theme={null}
{
"id": 2,
"name": "my_kb",
"project_id": 1,
"vector_database": "my_kb_chromadb",
"vector_database_table": "default_collection",
"updated_at": "2025-06-26 10:24:06.311655",
"created_at": "2025-06-26 10:24:06.311654",
"query_id": null,
"embedding_model": {
"provider": "openai",
"model_name": "text-embedding-3-small",
"api_key": "******"
},
"reranking_model": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "******"
},
"metadata_columns": [
"product"
],
"content_columns": [
"notes"
],
"id_column": "order_id",
"params": {
"created_embedding_model": "kb_embedding_my_kb"
}
}
```
# Insert Into Knowledge Base
Source: https://docs.mindsdb.com/rest/knowledge_bases/insert
**PUT `/api/projects/{project_name}/knowledge_bases/{knowledge_base_name}`**
This API endpoint inserts data into a knowledge base using the `PUT` method.
Learn more about knowledge bases following [this doc page](/mindsdb_sql/knowledge_bases/overview).
### Path Parameters
* `project_name`: Defines the project where the knowledge bases are located. Note that the default project name is `mindsdb`.
* `knowledge_base_name`: Defines the knowledge base name.
### Body
* `query`: Defines the SQL query used to fetch data to be inserted into the knowledge base.
* `rows`: Defines raw data to be inserted into the knowledge base.
* `files`: Defines the list of files to be inserted into the knowledge base.
* `urls`: Defines the list of URLs to be crawled and their content inserted into the knowledge base. For example, `"urls": ["https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/overview"]`.
* `limit`: Defines the limit of pages to be crawled. For example, `"limit": 10`.
* `crawl_depth`: Defines the crawl depth limit for URLs. For example, `"crawl_depth": 2`.
* `filters`: Defines the list of domains to be filtered. For example, `"filters": { "allowed_domains": ["example.com"] }`.
Learn more about the [web crawler here](/integrations/app-integrations/web-crawler).
### Response
None.
```shell Shell theme={null}
curl -X PUT http://127.0.0.1:47334/api/projects/mindsdb/knowledge_bases/my_kb \
-H "Content-Type: application/json" \
-d '{
"knowledge_base": {
"rows": [
{
"order_id": "123",
"product": "Widget A",
"notes": "Great product, would buy again"
},
{
"order_id": "124",
"product": "Widget B",
"notes": "Poor quality"
}
],
"query": "SELECT * FROM sample_data.orders"
}
}'
```
```json Response theme={null}
200 OK
```
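The crawling parameters described above can be combined in a single request body. Here is a minimal Python sketch using the `requests` library, mirroring the endpoint shown above; the URL, limit, depth, and domain values are illustrative.
```python Python theme={null}
import requests

# A minimal sketch: insert crawled web content into a knowledge base.
resp = requests.put(
    'http://127.0.0.1:47334/api/projects/mindsdb/knowledge_bases/my_kb',
    json={
        'knowledge_base': {
            'urls': ['https://docs.mindsdb.com/mindsdb_sql/knowledge_bases/overview'],
            'limit': 10,           # crawl at most 10 pages
            'crawl_depth': 2,      # follow links up to 2 levels deep
            'filters': {'allowed_domains': ['docs.mindsdb.com']},
        }
    },
)
print(resp.status_code)  # expect 200 on success
```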
# List Knowledge Bases
Source: https://docs.mindsdb.com/rest/knowledge_bases/list
**GET `/api/projects/{project_name}/knowledge_bases`**
This API endpoint lists all available knowledge bases using the `GET` method.
Learn more about knowledge bases following [this doc page](/mindsdb_sql/knowledge_bases/overview).
### Path Parameters
* `project_name`: Defines the project where the knowledge bases are located. Note that the default project name is `mindsdb`.
### Body
None.
### Response
* `id`: Unique identifier for the knowledge base.
* `name`: The name assigned to the knowledge base.
* `project_id`: The ID of the project where the knowledge base resides.
* `project_name`: The name of the project where the knowledge base resides.
* `vector_database`: The vector store used for storing vector embeddings.
* `vector_database_table`: The name of the collection or table within the vector database.
* `updated_at`: Timestamp indicating when the knowledge base was last updated.
* `created_at`: Timestamp indicating when the knowledge base was created.
* `query_id`: Optional field for linking specific queries to this knowledge base.
* `embedding_model`: The embedding model used to convert content into vector representations.
* `reranking_model`: Optional model used to rerank search results based on relevance.
* `metadata_columns`: Optional list of columns used for metadata-based filtering or enrichment.
* `content_columns`: Optional list of columns treated as the main content for embedding and retrieval.
* `id_column`: The name of the column that uniquely identifies each content row.
* `params`: A nested object that contains additional configuration parameters.
  * `created_embedding_model`: The name of the embedding model associated with this knowledge base at creation time.
  * `default_vector_storage`: The default storage used for storing vector data.
```shell Shell theme={null}
curl -X GET http://127.0.0.1:47334/api/projects/mindsdb/knowledge_bases
```
```json Response theme={null}
[
{
"id": 1,
"name": "my_kb",
"project_id": 1,
"vector_database": "my_kb_chromadb",
"vector_database_table": "default_collection",
"updated_at": "2025-06-25 13:04:01.864625",
"created_at": "2025-06-25 13:04:01.864624",
"query_id": null,
"embedding_model": null,
"reranking_model": null,
"metadata_columns": null,
"content_columns": null,
"id_column": null,
"params": {
"created_embedding_model": "kb_embedding_my_kb",
"default_vector_storage": "my_kb_chromadb"
},
"project_name": "mindsdb"
}
]
```
# Query Knowledge Base
Source: https://docs.mindsdb.com/rest/knowledge_bases/query
**POST `/api/sql/query`**
This API endpoint queries a knowledge base using the `POST` method. Learn more about [querying knowledge bases using semantic search and metadata filtering here](/mindsdb_sql/knowledge_bases/query).
Learn more about knowledge bases following [this doc page](/mindsdb_sql/knowledge_bases/overview).
### Path Parameters
None.
### Body
* `query`: A query that is sent to the MindsDB instance.
### Response
Contains data stored in the knowledge base, returned in the standard SQL query result format (`type`, `column_names`, `data`, `context`), as shown below.
```shell Shell theme={null}
curl -X POST http://127.0.0.1:47334/api/sql/query \
--header 'Content-Type: application/json' \
--data '{
"query": "SELECT * FROM my_kb;"
}'
```
```json Response theme={null}
{
"type": "table",
"column_names": [
"id",
"chunk_id",
"chunk_content",
"metadata",
"relevance",
"distance"
],
"data": [
[
"A1B",
"A1B:notes:1of1:0to20",
"Request color: black",
{
"chunk_index": 0,
"content_column": "notes",
"end_char": 20,
"original_doc_id": "A1B",
"original_row_index": "0",
"product": "Wireless Mouse",
"source": "TextChunkingPreprocessor",
"start_char": 0
},
null,
null
],
[
"3XZ",
"3XZ:notes:1of1:0to19",
"Gift wrap requested",
{
"chunk_index": 0,
"content_column": "notes",
"end_char": 19,
"original_doc_id": "3XZ",
"original_row_index": "1",
"product": "Bluetooth Speaker",
"source": "TextChunkingPreprocessor",
"start_char": 0
},
null,
null
],
[
"Q7P",
"Q7P:notes:1of1:0to22",
"Prefer aluminum finish",
{
"chunk_index": 0,
"content_column": "notes",
"end_char": 22,
"original_doc_id": "Q7P",
"original_row_index": "2",
"product": "Aluminum Laptop Stand",
"source": "TextChunkingPreprocessor",
"start_char": 0
},
null,
null
]
],
"context": {
"show_secrets": false,
"db": "mindsdb"
}
}
```
# REST API
Source: https://docs.mindsdb.com/rest/overview
MindsDB provides REST API endpoints, enabling you to incorporate AI building blocks into applications.
This section introduces REST API endpoints provided by MindsDB to bring data and AI together.
Follow these steps to get started:
1. Learn more about [usage here](/rest/usage).
2. Connect your data source to MindsDB via [this endpoint](/rest/databases/create-databases). Explore all available [data sources here](/integrations/data-overview).
3. Create, train, and deploy AI/ML models within MindsDB via [this endpoint](/rest/models/train-model). Explore all available [AI engines here](/integrations/ai-overview).
4. Query for predictions via [this endpoint](/rest/models/query-model).
# Get a Project
Source: https://docs.mindsdb.com/rest/projects/get-project
GET /api/projects/{projectName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# List Projects
Source: https://docs.mindsdb.com/rest/projects/get-projects
GET /api/projects
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Query
Source: https://docs.mindsdb.com/rest/sql
POST /api/sql/query
## Description
This API provides a REST endpoint for executing SQL queries. Note:
* This endpoint is an HTTP POST method.
* This endpoint accepts data via an `application/json` request body.
* The only required key is `query`, whose value is the SQL statement to be executed.
### Body
* `query`: String that contains the SQL query that needs to be executed.
### Response
* `column_names`: A list with the column names returned.
* `context`: The database where the query is executed.
* `data`: The actual data returned by the query in case of the table response type.
* `type`: The type of the response: `table`, `error`, or `ok`.
```shell Shell theme={null}
curl --request POST \
--url https://cloud.mindsdb.com/api/sql/query \
--header 'Content-Type: application/json' \
--data '
{
"query": "SELECT * FROM example_db.demo_data.home_rentals LIMIT 10;"
}'
```
```python Python theme={null}
import requests
url = 'https://cloud.mindsdb.com/api/sql/query'
resp = requests.post(url, json={'query':
'SELECT * FROM example_db.demo_data.home_rentals LIMIT 10;'})
```
```json Response theme={null}
{
"column_names": [
"sqft",
"rental_price"
],
"context": {
"db": "mindsdb"
},
"data": [
[
917,
3901
],
[
194,
2042
]
],
"type": "table"
}
```
# Create a Table
Source: https://docs.mindsdb.com/rest/tables/create-table
POST /api/databases/{databaseName}/tables
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Remove a Table
Source: https://docs.mindsdb.com/rest/tables/delete-table
DELETE /api/databases/{databaseName}/tables/{tableName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Get a Table
Source: https://docs.mindsdb.com/rest/tables/list-table
GET /api/databases/{databaseName}/tables/{tableName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# List Tables
Source: https://docs.mindsdb.com/rest/tables/list-tables
GET /api/databases/{databaseName}/tables
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Usage
Source: https://docs.mindsdb.com/rest/usage
Here is how to connect to MindsDB and use its REST API.
## Local MindsDB
This example shows how to execute SQL statements, either raw or parametrized, on MindsDB via REST APIs.
```
import requests
# connect
url = 'http://127.0.0.1:47334/api/sql/query'
# query
resp = requests.post(url, json={
"query": "select * from my_datasource.my_table where name = :name and age = :age",
"params": {"name": "acme", "age": 1},
})
# response
print(resp.text) # alternative: print(resp.json())
```
Note that you can either send raw SQL and omit the `params` parameter, or send parametrized SQL in the `query` parameter and provide the `params` parameter that defines the values, as in the example above.
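For comparison, here is the raw-SQL form of the same call, with the values inlined and the `params` key omitted:
```
import requests

# connect
url = 'http://127.0.0.1:47334/api/sql/query'

# raw SQL: values are inlined, so the "params" key is omitted
resp = requests.post(url, json={
    "query": "select * from my_datasource.my_table where name = 'acme' and age = 1",
})

# response
print(resp.text)
```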
# Create a View
Source: https://docs.mindsdb.com/rest/views/create-view
POST /api/projects/{projectName}/views
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Remove a View
Source: https://docs.mindsdb.com/rest/views/delete-views
DELETE /api/projects/{projectName}/views/{viewName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Get a View
Source: https://docs.mindsdb.com/rest/views/list-view
GET /api/projects/{projectName}/views/{viewName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# List Views
Source: https://docs.mindsdb.com/rest/views/list-views
GET /api/projects/{projectName}/views
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# Update a View
Source: https://docs.mindsdb.com/rest/views/update-view
PUT /api/projects/{projectName}/views/{viewName}
The REST API endpoints can be used with MindsDB running locally at [http://127.0.0.1:47334/api](http://127.0.0.1:47334/api).
# How to Use Agents
Source: https://docs.mindsdb.com/sdks/javascript/agents
Currently, there is no dedicated JavaScript syntax for using Agents. To use Agents from the JavaScript SDK, refer to the [Agents documentation in SQL](/mindsdb_sql/agents/agent) and execute SQL queries as below.
```
const query = `
CREATE AGENT my_agent
USING
model = {
"provider": "openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123"
},
data = {
"knowledge_bases": ["mindsdb.sales_kb", "mindsdb.orders_kb"],
"tables": ["postgres_conn.customers", "mysql_conn.products"]
},
prompt_template='
mindsdb.sales_kb stores sales analytics data
mindsdb.orders_kb stores order data
postgres_conn.customers stores customers data
mysql_conn.products stores products data
';
`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Connect
Source: https://docs.mindsdb.com/sdks/javascript/connect
Before performing any operations, you must connect to MindsDB. By default, all operations will go through [MindsDB Cloud REST APIs](/rest/sql), but you can use a self-hosted version of MindsDB as well.
Here is how to connect to your local MindsDB server:
```
import MindsDB from 'mindsdb-js-sdk';
// const MindsDB = require("mindsdb-js-sdk").default; // alternative for CommonJS syntax
try {
// No authentication needed for self-hosting
await MindsDB.connect({ // alternative for ES6 module syntax: await MindsDB.default.connect({
host: 'http://127.0.0.1:47334'
});
console.log('connected');
} catch(error) {
// Failed to connect to local instance
console.log(error);
}
```
Here is how to connect using your own Axios instance (see [details on the default instance](https://github.com/mindsdb/mindsdb-js-sdk/blob/main/src/util/http.ts)):
```
import MindsDB from 'mindsdb-js-sdk';
// const MindsDB = require("mindsdb-js-sdk").default; // alternative for CommonJS syntax
import axios from 'axios';
// Use 'host' option in MindsDB.connect to specify base URL override
const customAxios = axios.create({
timeout: 1000,
});
try {
await MindsDB.connect({
    user: 'mindsdbuser@gmail.com',
    password: 'mypassword',
httpClient: customAxios
});
console.log('connected');
} catch(error) {
// Failed to authenticate
console.log(error);
}
```
Please note that all methods that use `await` must be wrapped in an `async` function, like this:
```
(async() => {
try {
// No authentication needed for self-hosting
await MindsDB.connect({
host: 'http://127.0.0.1:47334'
});
console.log('connected');
} catch(error) {
// Failed to connect to local instance
console.log(error);
}
})();
```
# Connect a Data Source
Source: https://docs.mindsdb.com/sdks/javascript/create_database
## Description
The `MindsDB.Databases.createDatabase` function connects a new data source to MindsDB.
## Syntax
Here is how to connect our sample MySQL database:
```
const connectionParams = {
'user': 'user',
'port': 3306,
'password': 'MindsDBUser123!',
'host': 'samples.mindsdb.com',
'database': 'public'
}
try {
const mysqlDatabase = await MindsDB.Databases.createDatabase(
'mysql_datasource',
'mysql',
connectionParams);
console.log('connected a database');
} catch (error) {
// Couldn't connect to database
console.log(error);
}
```
First, we define the connection parameters and then use the `createDatabase` function to connect a database.
# Create a Table
Source: https://docs.mindsdb.com/sdks/javascript/create_table
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `CREATE TABLE integration_name.table_name (SELECT * FROM data);`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Create a View
Source: https://docs.mindsdb.com/sdks/javascript/create_view
## Description
The `createView()` function creates a view in MindsDB.
## Syntax
Here is the syntax:
```
const viewSelect = `SELECT t.sqft, t.location, m.rental_price
FROM mysql_demo_db.home_rentals as t
JOIN mindsdb.home_rentals_model as m`;
const predictionsView = await MindsDB.Views.createView(
'view_name',
'project_name',
viewSelect);
```
# Delete From a Table
Source: https://docs.mindsdb.com/sdks/javascript/delete_from
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `DELETE FROM datasource_name.table_name WHERE …`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Remove a Table
Source: https://docs.mindsdb.com/sdks/javascript/delete_table
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `DROP TABLE integration_name.table_name;`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Remove a Data Source
Source: https://docs.mindsdb.com/sdks/javascript/drop_database
## Description
The `delete` function removes a data source from MindsDB. Please note that in order to delete a connected data source, we need to fetch it first with the `getDatabase` function.
## Syntax
Here is how to get an existing database and remove it:
```
try {
const db = await MindsDB.Databases.getDatabase('mysql_datasource');
console.log('got a database')
// Deleting a database
if (db) {
try {
await db.delete();
console.log('deleted a database');
} catch (error) {
// Couldn't delete a database
console.log(error);
}
}
} catch (error) {
// Couldn't connect to database
console.log(error);
}
```
# Remove a View
Source: https://docs.mindsdb.com/sdks/javascript/drop_view
## Description
The `deleteView()` function deletes an existing view from MindsDB.
## Syntax
Here is the syntax:
```
await MindsDB.Views.deleteView(
'view_name',
'project_name');
```
# Get a Data Source
Source: https://docs.mindsdb.com/sdks/javascript/get_database
You can save a data source into a variable using the code below.
```
const db = await MindsDB.Databases.getDatabase('mysql_datasource');
```
# Insert Into a Table
Source: https://docs.mindsdb.com/sdks/javascript/insert_into_table
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `INSERT INTO integration_name.table_name (SELECT ...)`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Installation
Source: https://docs.mindsdb.com/sdks/javascript/installation
The MindsDB JavaScript SDK allows you to unlock the power of machine learning right inside your web applications. Read along to see how to install the MindsDB JavaScript SDK.
## How to Install
To install the MindsDB JavaScript SDK, run the below command:
```bash theme={null}
npm install --save mindsdb-js-sdk
```
# Join Tables On
Source: https://docs.mindsdb.com/sdks/javascript/join_on
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `SELECT * FROM table_name t JOIN another_table a ON t…=a…`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# List Data Handlers
Source: https://docs.mindsdb.com/sdks/javascript/list_data_handlers
Here is how you can fetch all available data handlers directly from JavaScript code:
```
const query = 'SHOW HANDLERS WHERE type = \'data\'';
result = await MindsDB.SQL.runQuery(query);
console.log(result);
```
# List Data Sources
Source: https://docs.mindsdb.com/sdks/javascript/list_databases
You can list all data sources using the code below.
```
const query = 'SHOW FULL DATABASES WHERE type = \'data\'';
result = await MindsDB.SQL.runQuery(query); // alternative for ES6 module syntax: MindsDB.default.SQL.runQuery(query)
console.log(result);
```
# List Projects
Source: https://docs.mindsdb.com/sdks/javascript/list_projects
## Description
The `getAllProjects()` function lists all available projects.
## Syntax
Here is how to list all available projects:
```
const allProjects = await MindsDB.Projects.getAllProjects();
console.log('all projects:')
allProjects.forEach(p => {
console.log(p.name);
});
```
# List Views
Source: https://docs.mindsdb.com/sdks/javascript/list_views
## Description
The `getAllViews()` function lists all available views.
## Syntax
Here is how to list all available views:
```
const allViews = await MindsDB.Views.getAllViews();
console.log('all views:')
allViews.forEach(v => {
console.log(v.name);
});
```
# Native Queries
Source: https://docs.mindsdb.com/sdks/javascript/native_queries
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB. The native query syntax, where the native statement is enclosed in parentheses after the data source name, ensures that the query is executed directly on the connected data source.
## Syntax
Here is the syntax:
```
const query = `SELECT * FROM datasource_name ()`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Overview
Source: https://docs.mindsdb.com/sdks/javascript/overview
MindsDB provides a JavaScript SDK, enabling its integration into JavaScript environments.
Follow these steps to get started:
1. [Install the package](/sdks/javascript/installation).
2. Connect a data source in [JavaScript](/sdks/javascript/create_database). Explore all available [data sources here](/integrations/data-overview).
3. Configure an AI engine in [JavaScript](/sdks/javascript/create_ml_engine). Explore all available [AI engines here](/integrations/ai-overview).
4. Create and deploy an AI/ML model in [JavaScript](/sdks/javascript/create_model).
5. Query for predictions in [JavaScript](/sdks/javascript/batchQuery).
6. Automate tasks by scheduling jobs in [JavaScript](/sdks/javascript/create_job).
# Query a File
Source: https://docs.mindsdb.com/sdks/javascript/query_files
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `SELECT * FROM files.file_name`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Query a Table
Source: https://docs.mindsdb.com/sdks/javascript/query_table
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `SELECT * FROM table_name`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Query a View
Source: https://docs.mindsdb.com/sdks/javascript/query_view
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `SELECT * FROM project_name.view_name`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# Update a Table
Source: https://docs.mindsdb.com/sdks/javascript/update_table
## Description
The `runQuery()` function executes a query given as its argument directly in MindsDB.
## Syntax
Here is the syntax:
```
const query = `UPDATE integration_name.table_name
SET column_name = new_value
WHERE column_name = old_value`;
const queryResult = await MindsDB.SQL.runQuery(query);
```
# How to Use Agents
Source: https://docs.mindsdb.com/sdks/python/agents
Agents enable conversation with data, including structured and unstructured data connected to MindsDB.
## Create Agents
Here is the syntax for creating an agent:
```python theme={null}
agent = server.agents.create(
'my_agent',
model={
'model_name': 'gpt-4o',
'provider': 'openai',
'api_key': 'sk-abc123',
'base_url': 'http://example.com',
'api_version': '2024-02-01'
},
data={
        'knowledge_bases': ['project_name.kb_name', ...],
'tables': ['datasource_conn_name.table_name', ...]
},
prompt_template='describe data'
)
```
It creates an agent that uses the defined model and has access to the connected data. Here is how to list all available agents.
```python theme={null}
agents = server.agents.list()
print(agents)
```
The following sections explain all the agent parameters.
### `model`
This parameter defines the underlying language model, including:
* `provider`
It is a required parameter. It defines the model provider from the list below.
* `model_name`
It is a required parameter. It defines the model name from the list below.
* `api_key`
It is an optional parameter (applicable to selected providers), which stores the API key to access the model. Users can provide it either in this `api_key` parameter, or using [environment variables](/mindsdb_sql/functions/from_env).
* `base_url`
It is an optional parameter (applicable to selected providers), which stores the base URL for accessing the model. It is the root URL used to send API requests.
* `api_version`
It is an optional parameter (applicable to selected providers), which defines the API version.
The available models and providers include the following.
Available models from Anthropic:
* claude-3-opus-20240229
* claude-3-sonnet-20240229
* claude-3-haiku-20240307
* claude-2.1
* claude-2.0
* claude-instant-1.2
Available models from Google:
* gemini-2.5-pro-preview-03-25
* gemini-2.0-flash
* gemini-2.0-flash-lite
* gemini-1.5-flash
* gemini-1.5-flash-8b
* gemini-1.5-pro
Available models from Ollama:
* gemma
* llama2
* mistral
* mixtral
* llava
* neural-chat
* codellama
* dolphin-mixtral
* qwen
* llama2-uncensored
* mistral-openorca
* deepseek-coder
* nous-hermes2
* phi
* orca-mini
* dolphin-mistral
* wizard-vicuna-uncensored
* vicuna
* tinydolphin
* llama2-chinese
* openhermes
* zephyr
* nomic-embed-text
* tinyllama
* openchat
* wizardcoder
* phind-codellama
* starcoder
* yi
* orca2
* falcon
* starcoder2
* wizard-math
* dolphin-phi
* nous-hermes
* starling-lm
* stable-code
* medllama2
* bakllava
* codeup
* wizardlm-uncensored
* solar
* everythinglm
* sqlcoder
* nous-hermes2-mixtral
* stable-beluga
* yarn-mistral
* samantha-mistral
* stablelm2
* meditron
* stablelm-zephyr
* magicoder
* yarn-llama2
* wizard-vicuna
* llama-pro
* deepseek-llm
* codebooga
* mistrallite
* dolphincoder
* nexusraven
* open-orca-platypus2
* all-minilm
* goliath
* notux
* alfred
* megadolphin
* xwinlm
* wizardlm
* duckdb-nsql
* notus
Available models from OpenAI:
* gpt-3.5-turbo
* gpt-3.5-turbo-16k
* gpt-3.5-turbo-instruct
* gpt-4
* gpt-4-32k
* gpt-4-1106-preview
* gpt-4-0125-preview
* gpt-4.1
* gpt-4.1-mini
* gpt-4o
* o4-mini
* o3-mini
* o1-mini
Available models from NVIDIA:
* microsoft/phi-3-mini-4k-instruct
* mistralai/mistral-7b-instruct-v0.2
* writer/palmyra-med-70b
* mistralai/mistral-large
* mistralai/codestral-22b-instruct-v0.1
* nvidia/llama3-chatqa-1.5-70b
* upstage/solar-10.7b-instruct
* google/gemma-2-9b-it
* adept/fuyu-8b
* google/gemma-2b
* databricks/dbrx-instruct
* meta/llama-3_1-8b-instruct
* microsoft/phi-3-medium-128k-instruct
* 01-ai/yi-large
* nvidia/neva-22b
* meta/llama-3_1-70b-instruct
* google/codegemma-7b
* google/recurrentgemma-2b
* google/gemma-2-27b-it
* deepseek-ai/deepseek-coder-6.7b-instruct
* mediatek/breeze-7b-instruct
* microsoft/kosmos-2
* microsoft/phi-3-mini-128k-instruct
* nvidia/llama3-chatqa-1.5-8b
* writer/palmyra-med-70b-32k
* google/deplot
* meta/llama-3_1-405b-instruct
* aisingapore/sea-lion-7b-instruct
* liuhaotian/llava-v1.6-mistral-7b
* microsoft/phi-3-small-8k-instruct
* meta/codellama-70b
* liuhaotian/llava-v1.6-34b
* nv-mistralai/mistral-nemo-12b-instruct
* microsoft/phi-3-medium-4k-instruct
* seallms/seallm-7b-v2.5
* mistralai/mixtral-8x7b-instruct-v0.1
* mistralai/mistral-7b-instruct-v0.3
* google/paligemma
* google/gemma-7b
* mistralai/mixtral-8x22b-instruct-v0.1
* google/codegemma-1.1-7b
* nvidia/nemotron-4-340b-instruct
* meta/llama3-70b-instruct
* microsoft/phi-3-small-128k-instruct
* ibm/granite-8b-code-instruct
* meta/llama3-8b-instruct
* snowflake/arctic
* microsoft/phi-3-vision-128k-instruct
* meta/llama2-70b
* ibm/granite-34b-code-instruct
Available models from Writer:
* palmyra-x5
* palmyra-x4
Users can define the model for the agent by choosing one of the following options.
**Option 1.** Use the `model` parameter to define the specification.
```python theme={null}
...
model={
'model_name': 'gpt-4o',
'provider': 'openai',
'api_key': 'sk-abc123',
'base_url': 'http://example.com',
'api_version': '2024-02-01'
},
...
```
**Option 2.** Define the default model in the [MindsDB configuration file](/setup/custom-config).
If you define `default_llm` in the configuration file, you do not need to provide the `model` parameter when creating an agent. If you provide both, the values from the `model` parameter are used.
You can define the default models in the Settings of the MindsDB Editor GUI.
```bash theme={null}
"default_llm": {
"provider": "openai",
"model_name" : "got-4o",
"api_key": "sk-abc123",
"base_url": "https://example.com/",
"api_version": "2024-02-01"
}
```
### `data`
This parameter stores data connected to the agent, including knowledge bases and data sources connected to MindsDB.
The following parameters store the list of connected data.
* `knowledge_bases` stores the list of [knowledge bases](/mindsdb_sql/knowledge_bases/overview) to be used by the agent.
* `tables` stores the list of tables from data sources connected to MindsDB.
Note that you can include all tables from a connected data source and all knowledge bases from a project using the `*` syntax.
```python theme={null}
...
data={
    'knowledge_bases': ['project_name.*', ...],
'tables': ['datasource_conn_name.*', ...]
},
...
```
### `prompt_template`
This parameter stores instructions for the agent.
It is recommended to provide a description of the data sources listed in the `knowledge_bases` and `tables` parameters to help the agent locate relevant data for answering questions.
### `timeout`
This parameter defines the maximum time, in seconds, that the agent can take to return an answer.
For example, when the `timeout` parameter is set to 10, the agent has 10 seconds to return an answer. If it takes longer than 10 seconds, it aborts the process and returns a message indicating that it failed to produce an answer within the defined time interval.
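As an illustrative sketch, `timeout` can be passed alongside the other agent parameters when creating an agent; treat the exact keyword placement as an assumption and verify it against the SDK signature on your version.
```python theme={null}
# A minimal sketch: create an agent that must answer within 10 seconds.
# Assumes `timeout` is accepted by `create()` like the other agent parameters.
agent = server.agents.create(
    'my_agent',
    model={
        'provider': 'openai',
        'model_name': 'gpt-4o',
        'api_key': 'sk-abc123'
    },
    data={
        'tables': ['datasource_conn_name.table_name']
    },
    prompt_template='describe data',
    timeout=10
)
```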
## Get Agents
You can get an existing agent with the `get()` method.
```python theme={null}
agent = server.agents.get('sales_agent')
```
## Query Agents
Query an agent to generate responses to questions.
```python theme={null}
completion = agent.completion([{'question': 'What is the average number of orders per customer?', 'answer': None}])
print(completion.content)
```
Here is how to query agents with streaming enabled, allowing users to view the agent's thoughts while it works on answering questions.
```python theme={null}
completion = agent.completion_stream([{'question': 'What is the average number of orders per customer?', 'answer': None}])
for chunk in completion:
print(chunk)
```
## Update Agents
Update existing agents with new data, model, or prompt.
```python theme={null}
agent.data['tables'].append('mysql_demo_db.car_sales')
updated_agent = server.agents.update('my_agent', agent)
print(updated_agent)
```
## Delete Agents
Here is the syntax for deleting an agent:
```python theme={null}
server.agents.drop('my_agent')
```
# Connect
Source: https://docs.mindsdb.com/sdks/python/connect
This documentation describes how you can connect to your MindsDB server from Python code.
Here is how to connect to your local MindsDB server:
```
import mindsdb_sdk
# connects to the default port (47334) on localhost
server = mindsdb_sdk.connect()
# connects to the specified host and port
server = mindsdb_sdk.connect('http://127.0.0.1:47334')
```
# Connect a Data Source
Source: https://docs.mindsdb.com/sdks/python/create_database
## Description
The `get_database()` and `create_database()` functions enable you to use the existing data source or connect a new one.
## Syntax
You can use the `get_database()` method to get an existing database:
```python theme={null}
mysql_demo_db = server.get_database('mysql_demo_db')
```
Or, the `create_database()` method to connect a new data source to MindsDB:
```python theme={null}
mysql_demo_db = server.create_database(
engine = "mysql",
name = "mysql_demo_db",
connection_args = {
"user": "user",
"password": "MindsDBUser123!",
"host": "samples.mindsdb.com",
"port": "3306",
"database": "public"
}
)
```
# Create a Job
Source: https://docs.mindsdb.com/sdks/python/create_job
## Description
The `get_job()` and `create_job()` functions let you save either an existing job or a newly created job into a variable.
## Syntax
Use the `get_job()` method to get an existing job:
```python theme={null}
my_job = project.get_job('my_job')
```
Or, the `create_job()` method to create a job:
```python theme={null}
my_job = project.create_job(
'job_name',
'select * from models',
repeat_str = '1 hour'
)
```
Alternatively, you can create a job using this syntax:
```python theme={null}
with project.jobs.create(name='job_name', repeat_min=1) as job:
job.add_query(model.retrain())
job.add_query(model.predict(database.tables.tbl1))
job.add_query(kb.insert(database.tables.tbl1))
job.add_query('show models')
```
Where:
* `name='job_name'` is the job name,
* `repeat_min=1` indicates periodicity of the job in minutes,
* `job.add_query(model.retrain())` adds a task to a job to retrain a model,
* `job.add_query(model.predict(database.tables.tbl1))` adds a task to a job to make predictions,
* `job.add_query(kb.insert(database.tables.tbl1))` adds a task to a job to insert data into a knowledge base,
* `job.add_query('show models')` adds a task to a job to run the statement provided as string value.
Note that the `add_query()` method adds tasks to a job and takes either String or Query as an argument.
Note that this method enables a job to manipulate Knowledge Bases, Models, Tables, Views, and Queries, but not Databases, Handlers, Jobs, ML Engines, or Projects.
# Create a Project
Source: https://docs.mindsdb.com/sdks/python/create_project
## Description
The `get_project()` and `create_project()` functions fetch an existing project or create a new one.
## Syntax
Use the `get_project()` method to get the default `mindsdb` project:
```python theme={null}
project = server.get_project()
```
Use the `get_project()` method to get another project:
```python theme={null}
project = server.get_project('project_name')
```
Use the `create_project()` method to create a new project:
```python theme={null}
project = server.create_project('project_name')
```
# Create a Table
Source: https://docs.mindsdb.com/sdks/python/create_table
## Description
The `get_table()` and `create_table()` functions let you save either an existing table or a newly created table into a variable.
## Syntax
Use the `get_table()` method to fetch a table from the `mysql_demo_db` database:
```python theme={null}
my_table = mysql_demo_db.get_table('my_table')
```
Or, the `create_table()` method to create a new table:
```python theme={null}
# option 1
my_table = mysql_demo_db.create_table('my_table', 'SELECT * FROM some_table WHERE key=value')
# option 2
my_table = mysql_demo_db.create_table('my_table', base_table)
# option 3
my_table = mysql_demo_db.create_table('my_table', base_table.filter(key='value'))
```
# Create a View
Source: https://docs.mindsdb.com/sdks/python/create_view
## Description
The `get_view()` and `create_view()` functions let you save either an existing view or a newly created view into a variable.
## Syntax
Use the `get_view()` method to get an existing view:
```python theme={null}
my_view = project.get_view('my_view')
```
Or, the `create_view()` method to create a view:
```python theme={null}
my_view = project.create_view(
'view_name',
mysql_demo_db.query('SELECT * FROM my_table LIMIT 100')
)
```
# Remove a File
Source: https://docs.mindsdb.com/sdks/python/delete_file
## Description
In MindsDB, files are treated as tables. They are stored in the default `files` database. To delete a file, save this `files` database into a variable and then run the `tables.drop()` function on it.
## Syntax
Here is the syntax:
```python theme={null}
files = server.get_database('files')
files.tables.drop('file_name')
```
# Delete From a Table
Source: https://docs.mindsdb.com/sdks/python/delete_from
## Description
The `delete()` function is executed on a table from a data source connected to MindsDB. It deletes rows from a table.
## Syntax
Here is the syntax:
```python theme={null}
data_source.tables.table_name.delete(key=values, ...)
```
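For example, assuming a connected data source saved into `my_data_source` and a hypothetical `orders` table, the call below deletes all rows where `status` equals `cancelled`:
```python theme={null}
# Hypothetical example: delete rows matching the given column value.
my_data_source = server.get_database('my_data_source')
my_data_source.tables.orders.delete(status='cancelled')
```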
# Remove a Table
Source: https://docs.mindsdb.com/sdks/python/delete_table
## Description
The `tables.drop()` method enables you to delete a table from a connected data source.
## Syntax
Here is the syntax:
```python theme={null}
data_source.tables.drop('table_name')
```
# Remove a Data Source
Source: https://docs.mindsdb.com/sdks/python/drop_database
## Description
The `drop_database()` function enables you to remove a defined data source connection from MindsDB.
## Syntax
Use the `drop_database()` method to remove a database:
```python theme={null}
server.drop_database('mysql_demo_db')
```
# Remove a Job
Source: https://docs.mindsdb.com/sdks/python/drop_job
## Description
The `drop_job()` function deletes a job from MindsDB.
## Syntax
Use the `drop_job()` method to remove a job:
```python theme={null}
project.drop_job('job_name')
```
# Remove a Project
Source: https://docs.mindsdb.com/sdks/python/drop_project
## Description
The `drop_project()` function removes a project from MindsDB.
## Syntax
Use the `drop_project()` method to remove a project:
```python theme={null}
server.drop_project('project_name')
```
# Remove a View
Source: https://docs.mindsdb.com/sdks/python/drop_view
## Description
The `drop_view()` function removes a view from MindsDB.
## Syntax
Use the `drop_view()` method to remove a view:
```python theme={null}
project.drop_view('view_name')
```
# Get Job History
Source: https://docs.mindsdb.com/sdks/python/get_history
## Description
The `get_history()` function lets you access the job history, which contains a record for each job execution, including execution errors.
## Syntax
Use the `get_history()` method to get the history of job executions:
```python theme={null}
my_job.get_history()
```
# Insert Into a Table
Source: https://docs.mindsdb.com/sdks/python/insert_into_table
## Description
The `insert()` function is executed on a table from a data source connected to MindsDB. It inserts data into a table.
## Syntax
Here is the syntax:
```python theme={null}
my_table.insert(table_to_be_inserted)
```
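For example, assuming tables fetched from the `mysql_demo_db` data source, you can insert the filtered rows of one table into another; the table and column names here are hypothetical:
```python theme={null}
# Hypothetical example: insert the filtered rows of one table into another.
staging_table = mysql_demo_db.get_table('staging_table')
my_table = mysql_demo_db.get_table('my_table')
my_table.insert(staging_table.filter(status='active'))
```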
# Installation
Source: https://docs.mindsdb.com/sdks/python/installation
The MindsDB Python SDK enables you to connect to the MindsDB server from Python via the HTTP API. Read along to see how to install and test the MindsDB Python SDK.
## Simple Installation
To install the MindsDB Python SDK, run the below command:
```bash theme={null}
pip install mindsdb_sdk
```
## Advanced Installation
Instead of using the `pip install mindsdb_sdk` command, you can install the SDK by cloning the [Python SDK repository](https://github.com/mindsdb/mindsdb_python_sdk). Then create a virtual environment, install all dependencies from the `requirements.txt` file, and run tests as instructed below.
To test all the components, go to the project directory (`mindsdb_sdk`) and run the below command:
```bash theme={null}
env PYTHONPATH=./ pytest
```
To generate the API documentation, run the below commands:
```bash theme={null}
pip install sphinx
cd docs
make html
```
The documentation is generated in the `docs/build/html` folder.
# Join Tables On
Source: https://docs.mindsdb.com/sdks/python/join_on
## Description
The `query()` function is executed on a data source connected to MindsDB and saved into a variable. It performs a join operation between tables.
## Syntax
Here is the syntax:
```python theme={null}
my_data_source.query('SELECT * FROM my_table t JOIN another_table a ON t…=a… LIMIT 100')
```
# How to Create Knowledge Bases
Source: https://docs.mindsdb.com/sdks/python/knowledge_bases/create
A knowledge base is an advanced system that organizes information based on semantic meaning rather than simple keyword matching. It integrates embedding models, reranking models, and vector stores to enable context-aware data retrieval.
Learn more about features of [knowledge bases available via SQL API](/mindsdb_sql/knowledge_bases/overview).
## `create()` Function
Here is the syntax for creating a knowledge base:
```python theme={null}
my_kb = server.knowledge_bases.create(
'my_kb',
embedding_model={
'provider': 'openai',
'model_name': 'text-embedding-3-small',
'api_key': 'sk-...'},
reranking_model={
'provider': 'openai',
'model_name': 'gpt-4',
'api_key': 'sk-...'},
storage=server.databases.my_db.tables.my_table,
metadata_columns=['date', 'creator', ...],
content_columns=['review', 'content', ...],
id_column='id'
)
```
Upon execution, it registers `my_kb` and associates the specified models and storage. `my_kb` is a unique identifier of the knowledge base within MindsDB.
### Supported LLMs
Below is the list of all language models supported for the `embedding_model` and `reranking_model` parameters.
#### `provider = 'openai'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from OpenAI in Settings of the MindsDB GUI.
Furthermore, users can select `Custom OpenAI API` from the dropdown and use models from any OpenAI-compatible API.
When choosing `openai` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the OpenAI model to be used.
* `api_key` stores the OpenAI API key.
Learn more about the [OpenAI integration with MindsDB here](/integrations/ai-engines/openai).
#### `provider = 'openai_azure'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from Azure OpenAI in Settings of the MindsDB GUI.
When choosing `openai_azure` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the OpenAI model to be used.
* `api_key` stores the OpenAI API key.
* `base_url` stores the base URL of the Azure instance.
* `api_version` stores the version of the Azure instance.
Users need to log in to their Azure OpenAI instance to retrieve all relevant parameter values. Next, click on `Explore Azure AI Foundry portal` and go to `Models + endpoints`. Select the model and copy the parameter values.
#### `provider = 'google'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from Google in Settings of the MindsDB GUI.
When choosing `google` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the Google model to be used.
* `api_key` stores the Google API key.
Learn more about the [Google Gemini integration with MindsDB here](/integrations/ai-engines/google_gemini).
#### `provider = 'bedrock'`
This provider is supported for both `embedding_model` and `reranking_model`.
When choosing `bedrock` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the model available via Amazon Bedrock.
* `aws_access_key_id` stores a unique identifier associated with your AWS account, used to identify the user or application making requests to AWS.
* `aws_region_name` stores the name of the AWS region you want to send your requests to (e.g., `"us-west-2"`).
* `aws_secret_access_key` stores the secret key associated with your AWS access key ID. It is used to sign your requests securely.
* `aws_session_token` is an optional parameter that stores a temporary token used for short-term security credentials when using AWS Identity and Access Management (IAM) roles or temporary credentials.
#### `provider = 'snowflake'`
This provider is supported for both `embedding_model` and `reranking_model`.
When choosing `snowflake` as the model provider, users should choose one of the available models from [Snowflake Cortex AI](https://www.snowflake.com/en/product/features/cortex/) and define the following model parameters.
* `model_name` stores the name of the model available via Snowflake Cortex AI.
* `api_key` stores the Snowflake Cortex AI API key.
* `account_id` stores the Snowflake account ID.
Follow the below steps to generate the API key.
1. Generate a key pair according to [this instruction](https://docs.snowflake.com/en/user-guide/key-pair-auth) as below.
* Execute these commands in the console:
```bash theme={null}
# generate private key
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
# generate public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```
* Save the public key, that is, the content of `rsa_key.pub`, into your database user:
```sql theme={null}
ALTER USER my_user SET RSA_PUBLIC_KEY = ""
```
2. Verify the key pair with the database user.
* Install `snowsql` following [this instruction](https://docs.snowflake.com/en/user-guide/snowsql-install-config).
* Execute this command in the console:
```bash theme={null}
snowsql -a -u my_user --private-key-path rsa_key.p8
```
3. Generate JWT token.
* Download the Python script from [Snowflake's Developer Guide for Authentication](https://docs.snowflake.com/en/developer-guide/sql-api/authenticating). Here is a [direct download link](https://docs.snowflake.com/en/_downloads/aeb84cdfe91dcfbd889465403b875515/sql-api-generate-jwt.py).
* Ensure the PyJWT module, which is required for running the script, is installed.
* Run the script using this command:
```bash theme={null}
sql-api-generate-jwt.py --account --user my_user --private_key_file_path rsa_key.p8
```
This command returns the JWT token, which is used in the `api_key` parameter for the `snowflake` provider.
#### `provider = 'ollama'`
This provider is supported for both `embedding_model` and `reranking_model`.
Users can define the default embedding and reranking models from Ollama in Settings of the MindsDB GUI.
When choosing `ollama` as the model provider, users should define the following model parameters.
* `model_name` stores the name of the model to be used.
* `base_url` stores the base URL of the Ollama instance.
### `embedding_model`
The embedding model is a required component of the knowledge base. It stores specifications of the embedding model to be used.
Users can define the embedding model choosing one of the following options.
**Option 1.** Use the `embedding_model` parameter to define the specification.
```python theme={null}
...
embedding_model = {
"provider": "azure_openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01"
},
...
```
**Option 2.** Define the default embedding model in the [MindsDB configuration file](/setup/custom-config).
You can define the default models in the Settings of the MindsDB Editor GUI.
Note that if you define `default_embedding_model` in the configuration file, you do not need to provide the `embedding_model` parameter when creating a knowledge base. If you provide both, the values from the `embedding_model` parameter are used.
```bash theme={null}
"default_embedding_model": {
"provider": "azure_openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01"
}
```
The embedding model specification includes:
* `provider`
It is a required parameter. It defines the model provider.
* `model_name`
It is a required parameter. It defines the embedding model name as specified by the provider.
* `api_key`
The API key is required to access the embedding model assigned to a knowledge base. Users can provide it either in this `api_key` parameter, or in the `OPENAI_API_KEY` environment variable for `"provider": "openai"` and `AZURE_OPENAI_API_KEY` environment variable for `"provider": "azure_openai"`.
* `base_url`
It is an optional parameter, which defaults to `https://api.openai.com/v1/`. It is a required parameter when using the `azure_openai` provider. It is the root URL used to send API requests.
* `api_version`
It is an optional parameter. It is a required parameter when using the `azure_openai` provider. It defines the API version.
### `reranking_model`
The reranking model is an optional component of the knowledge base. It stores specifications of the reranking model to be used.
Users can disable reranking features of knowledge bases by setting this parameter to `false`.
```python theme={null}
...
reranking_model = False,
...
```
Users can enable reranking features of knowledge bases by defining the reranking model choosing one of the following options.
**Option 1.** Use the `reranking_model` parameter to define the specification.
```python theme={null}
...
reranking_model = {
"provider": "azure_openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01",
"method": "multi-class"
},
...
```
**Option 2.** Define the default reranking model in the [MindsDB configuration file](/setup/custom-config).
You can define the default models in the Settings of the MindsDB Editor GUI.
Note that if you define [`default_reranking_model` in the configuration file](/setup/custom-config#default-reranking-model), you do not need to provide the `reranking_model` parameter when creating a knowledge base. If you provide both, the values from the `reranking_model` parameter are used.
```bash theme={null}
"default_reranking_model": {
"provider": "azure_openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01",
"method": "multi-class"
}
```
The reranking model specification includes:
* `provider`
It is a required parameter. It defines the model provider as listed in [supported LLMs](/mindsdb_sql/knowledge_bases/create#supported-llms).
* `model_name`
It is a required parameter. It defines the reranking model name as specified by the provider.
* `api_key`
The API key is required to access the reranking model assigned to a knowledge base. Users can provide it either in this `api_key` parameter, or in the `OPENAI_API_KEY` environment variable for `"provider": "openai"` and `AZURE_OPENAI_API_KEY` environment variable for `"provider": "azure_openai"`.
* `base_url`
It is an optional parameter, which defaults to `https://api.openai.com/v1/`. It is a required parameter when using the `azure_openai` provider. It is the root URL used to send API requests.
* `api_version`
It is an optional parameter. It is a required parameter when using the `azure_openai` provider. It defines the API version.
* `method`
It is an optional parameter. It defines the method used to calculate the relevance of the output rows. The available options include `multi-class` and `binary`. It defaults to `multi-class`.
**Reranking Method**
The `multi-class` reranking method classifies each document chunk (that meets any specified metadata filtering conditions) into one of four relevance classes:
1. Not relevant with class weight of 0.25.
2. Slightly relevant with class weight of 0.5.
3. Moderately relevant with class weight of 0.75.
4. Highly relevant with class weight of 1.
The overall `relevance_score` of a document is calculated as the sum of each chunk’s class weight multiplied by its class probability (from model logprob output).
The `binary` reranking method simplifies classification by determining whether a document is relevant or not, without intermediate relevance levels. With this method, the overall `relevance_score` of a document is calculated based on the model log probability.
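To illustrate the multi-class calculation, here is a minimal sketch using hypothetical class probabilities for a single chunk.

```python theme={null}
# Hypothetical class probabilities for one chunk (as derived from model logprob output).
class_weights = {"not": 0.25, "slightly": 0.5, "moderately": 0.75, "highly": 1.0}
class_probs = {"not": 0.05, "slightly": 0.10, "moderately": 0.25, "highly": 0.60}

# Overall relevance: sum of each class weight multiplied by its class probability.
relevance_score = sum(class_weights[c] * class_probs[c] for c in class_weights)
print(round(relevance_score, 4))  # 0.85
```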
### `storage`
The vector store is a required component of the knowledge base. It stores data in the form of embeddings.
It is optional for users to provide the `storage` parameter. If not provided, the default ChromaDB is created when creating a knowledge base.
The available options include either [PGVector](/integrations/vector-db-integrations/pgvector) or [ChromaDB](/integrations/vector-db-integrations/chromadb).
It is recommended to use PGVector version 0.8.0 or higher for better performance.
If the `storage` parameter is not provided, the system creates a default ChromaDB vector database called `<kb_name>_chromadb` with a default table called `default_collection` that stores the embedded data. This default ChromaDB vector database is stored in MindsDB's storage.
To use your own vector database for storage, connect it to MindsDB beforehand.
Here is an example for [PGVector](/integrations/vector-db-integrations/pgvector).
```python theme={null}
my_kb = server.knowledge_bases.create(
...
storage=server.databases.my_pgvector.tables.my_table,
...
)
```
Note that the storage table (`my_table` in the example above) does not need to exist beforehand; it is created together with the knowledge base.
### `metadata_columns`
The data inserted into the knowledge base can be classified as metadata, which enables users to filter the search results using defined data fields.
Note that source data column(s) included in `metadata_columns` cannot be used in `content_columns`, and vice versa.
This parameter is an array of strings that lists column names from the source data to be used as metadata. If not provided, then all inserted columns (except for columns defined as `id_column` and `content_columns`) are considered metadata columns.
Here is an example of usage. A user wants to store the following data in a knowledge base.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
The `product` column can be used as metadata to enable metadata filtering.
```python theme={null}
my_kb = server.knowledge_bases.create(
...
metadata_columns=['product'],
...
)
```
### `content_columns`
The data inserted into the knowledge base can be classified as content, which is embedded by the embedding model and stored in the underlying vector store.
Note that source data column(s) included in `content_columns` cannot be used in `metadata_columns`, and vice versa.
This parameter is an array of strings that lists column names from the source data to be used as content and processed into embeddings. If not provided, the `content` column is expected by default when inserting data into the knowledge base.
Here is an example of usage. A user wants to store the following data in a knowledge base.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
The `notes` column can be used as content.
```python theme={null}
my_kb = server.knowledge_bases.create(
...
content_columns=['notes'],
...
)
```
### `id_column`
The ID column uniquely identifies each source data row in the knowledge base.
It is an optional parameter. If provided, this parameter is a string that contains the source data ID column name. If not provided, it is generated from the hash of the content columns.
Here is an example of usage. A user wants to store the following data in a knowledge base.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
The `order_id` column can be used as ID.
```python theme={null}
my_kb = server.knowledge_bases.create(
...
id_column='order_id'
)
```
Note that if a source data row is split into multiple chunks by the knowledge base (for example, to optimize storage), these chunks share the same ID value, which identifies chunks originating from the same source data row.
**Available options for the ID column values**
* User-Defined ID Column:
When users define the `id_column` parameter, the values from the provided source data column are used to identify source data rows within the knowledge base.
* User-Generated ID Column:
When users do not have a column that uniquely identifies each row in their source data, they can generate the ID column values when inserting data into the knowledge base using functions like `HASH()` or `ROW_NUMBER()`.
```sql theme={null}
INSERT INTO my_kb (
SELECT ROW_NUMBER() OVER (ORDER BY order_id) AS id, *
FROM sample_data.orders
);
```
* Default ID Column:
If the `id_column` parameter is not defined, its default values are built from the hash of the content columns.
## `list()` and `get()` Functions
Users can get details about the knowledge base using the `get()` function.
```python theme={null}
my_kb = project.knowledge_bases.get('my_kb')
```
And list all available knowledge bases using the `list()` function.
```python theme={null}
kb_list = project.knowledge_bases.list()
```
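For example, to print the name of each knowledge base in the project (assuming the returned objects expose a `name` attribute):

```python theme={null}
for kb in project.knowledge_bases.list():
    print(kb.name)
```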
## `drop()` Function
Here is the syntax for deleting a knowledge base:
```python theme={null}
project.knowledge_bases.drop('my_kb')
```
Upon execution, it removes the knowledge base with its content.
See more examples of [knowledge bases via SQL here](/mindsdb_sql/knowledge_bases/overview).
# How to Insert Data into Knowledge Bases
Source: https://docs.mindsdb.com/sdks/python/knowledge_bases/insert_data
Knowledge Bases (KBs) organize data across data sources, including databases, files, documents, and webpages, enabling efficient search capabilities.
Here is what happens to data when it is inserted into the knowledge base: it is split into chunks, transformed into the embedding representation to enhance search capabilities, and stored in a vector database.
Learn more about features of [knowledge bases available via SQL API](/mindsdb_sql/knowledge_bases/overview).
## `insert()` Function
Here is the syntax for inserting data into a knowledge base:
* Inserting raw data:
```python theme={null}
my_kb.insert([
{'type': 'apartment', 'price': 100000},
{'type': 'villa', 'price': 500000}
])
```
* Inserting data from data sources connected to MindsDB:
```python theme={null}
my_kb.insert_query(
server.databases.my_database.tables.my_table.filter(type='my_type')
)
```
* Inserting data from files uploaded to MindsDB:
```python theme={null}
my_kb.insert_files(['my_pdf_file', 'my_txt_file'])
```
* Inserting data from webpages:
```python theme={null}
kb.insert_webpages(
['https://example.com'],
crawl_depth=2,
filters=[r'.*\/blog\/.*'],
limit=10
)
```
Where:
* `urls`: Base URLs to crawl.
* `crawl_depth`: Depth for recursive crawling. Default is 1.
* `filters`: Regex patterns to include.
* `limit`: Max number of pages.
Upon execution, it inserts data into a knowledge base, using the embedding model to embed it into vectors before inserting into an underlying vector database.
The status of the insert operations is logged in the `information_schema.queries` table with the timestamp when it was run.
**Handling duplicate data while inserting into the knowledge base**
Knowledge bases uniquely identify data rows using an ID column, which prevents inserting duplicate data, as follows.
* **Case 1: Inserting data into the knowledge base without the `id_column` defined.**
When users do not define the `id_column` during the creation of a knowledge base, MindsDB generates the ID for each row using a hash of the content columns, as [explained here](/mindsdb_sql/knowledge_bases/create#id-column).
**Example:**
If two rows have exactly the same content in the content columns, their hash (and thus their generated ID) will be the same.
Note that duplicate rows are skipped and not inserted.
Since both rows in the table below have the same content, only one row is inserted.
| name | age |
| ----- | --- |
| Alice | 25 |
| Alice | 25 |
* **Case 2: Inserting data into the knowledge base with the `id_column` defined.**
When users define the `id_column` during the creation of a knowledge base, then the knowledge base uses that column's values as the row ID.
**Example:**
If the `id_column` has duplicate values, the knowledge base skips the duplicate row(s) during the insert.
The second row in the table below has the same `id` as the first row, so only one of these rows is inserted.
| id | name | age |
| -- | ----- | --- |
| 1 | Alice | 25 |
| 1 | Bob | 30 |
**Best practice**
Ensure the `id_column` uniquely identifies each row to avoid unintentional data loss due to duplicate ID skipping.
### Update Existing Data
To update existing data in the knowledge base, insert rows that carry the ID of the data you want to update together with the updated content.
Here is an example of usage. A knowledge base stores the following data.
```sql theme={null}
+----------+-------------------+------------------------+
| order_id | product | notes |
+----------+-------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Laptop Stand | Prefer aluminum finish |
+----------+-------------------+------------------------+
```
A user updated `Laptop Stand` to `Aluminum Laptop Stand`.
```sql theme={null}
+----------+-----------------------+------------------------+
| order_id | product | notes |
+----------+-----------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Aluminum Laptop Stand | Prefer aluminum finish |
+----------+-----------------------+------------------------+
```
Go to the *Complete Example* section below to find out how to access this sample data.
Here is how to propagate this change into the knowledge base.
```python theme={null}
my_kb.insert_query(
server.databases.sample_data.tables.orders.filter(order_id='Q7P')
)
```
The knowledge base matches the ID value to the existing one and updates the data if required.
### Insert Data using Partitions
To optimize the performance of data insertion into the knowledge base, users can set up partitions and threads to insert batches of data in parallel. This also enables tracking the progress of the data insertion process, including cancelling and resuming it if required.
Here is an example.
```python theme={null}
project.query(
'''
INSERT INTO my_kb
SELECT order_id, product, notes
FROM sample_data.orders
USING
batch_size = 200,
track_column = order_id,
threads = 10,
error = 'skip';
'''
)
```
The parameters include the following:
* `batch_size` defines the number of rows fetched per iteration to optimize data extraction from the source. It defaults to 1000.
* `threads` defines threads for running partitions. Note that if the [ML task queue](/setup/custom-config#overview-of-config-parameters) is enabled, threads are used automatically. The available values for `threads` are:
* a number of threads to be used, for example, `threads = 10`,
* a boolean value that defines whether to enable threads, setting `threads = true`, or disable threads, setting `threads = false`.
* `track_column` defines the column used for sorting data before partitioning.
* `error` defines the error processing options. The available values include `raise`, used to raise errors as they come, and `skip`, used to skip rows that cause errors. It defaults to `raise` if not provided.
After executing the `INSERT INTO` statement with the above parameters, users can view the data insertion progress by querying the `information_schema.queries` table.
```python theme={null}
project.query(
'''
SELECT * FROM information_schema.queries;
'''
)
```
Users can cancel the data insertion process using the process ID from the `information_schema.queries` table.
```python theme={null}
project.query(
'''
SELECT query_cancel(1);
'''
)
```
If you want to cancel the data insertion process, look up the process ID value from the `information_schema.queries` table and pass it as an argument to the `query_cancel()` function. Note that canceling the query will not remove the already inserted data.
Users can resume the data insertion process using the process ID from the `information_schema.queries` table.
```python theme={null}
project.query(
'''
SELECT query_resume(1);
'''
)
```
If you want to resume the data insertion process (which may have been interrupted by an error or cancelled by a user), look up the process ID value from the `information_schema.queries` table and pass it as an argument to the `query_resume()` function. Note that resuming the query will not remove the already inserted data and will start appending the remaining data.
### Chunking Data
Upon inserting data into the knowledge base, chunking is performed in order to optimize data storage and search.
Each chunk is identified by a chunk ID of the following format: `<id>:<chunk_number>of<total_chunks>:<start_char>to<end_char>`.
#### Text
Users can opt for defining the chunking parameters when creating a knowledge base.
```python theme={null}
my_kb = project.knowledge_bases.create(
...
params={
"preprocessing": {
"text_chunking_config" : {
"chunk_size": 2000,
"chunk_overlap": 200
}
}
}
)
```
The `chunk_size` parameter defines the size of the chunk as a number of characters, and the `chunk_overlap` parameter defines the number of characters that overlap between subsequent chunks.
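To make these two parameters concrete, here is a rough sliding-window sketch; it is a simplification, not the actual preprocessor.

```python theme={null}
def chunk(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list:
    # Each chunk starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print([len(c) for c in chunk("x" * 5000)])  # [2000, 2000, 1400]
```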
#### JSON
Users can opt for defining the chunking parameters specifically for JSON data.
```python theme={null}
my_kb = project.knowledge_bases.create(
...
params={
"preprocessing": {
"type": "json_chunking",
"json_chunking_config" : {
...
}
}
}
)
```
When the `type` of chunking is set to `json_chunking`, users can configure it by setting the following parameter values in the `json_chunking_config` parameter:
* `flatten_nested`\
It is of the `bool` data type with the default value of `True`.\
It defines whether to flatten nested JSON structures.
* `include_metadata`\
It is of the `bool` data type with the default value of `True`.\
It defines whether to include original metadata in chunks.
* `chunk_by_object`\
It is of the `bool` data type with the default value of `True`.\
It defines whether to chunk by top-level objects (`True`) or create a single document (`False`).
* `exclude_fields`\
It is of the `List[str]` data type with the default value of an empty list.\
It defines the list of fields to exclude from chunking.
* `include_fields`\
It is of the `List[str]` data type with the default value of an empty list.\
It defines the list of fields to include in chunking (if empty, all fields except excluded ones are included).
* `metadata_fields`\
It is of the `List[str]` data type with the default value of an empty list.\
It defines the list of fields to extract into metadata for filtering (can include nested fields using dot notation). If empty, all primitive fields will be extracted (top-level fields if available, otherwise all primitive fields in the flattened structure).
* `extract_all_primitives`\
It is of the `bool` data type with the default value of `False`.\
It defines whether to extract all primitive values (strings, numbers, booleans) into metadata.
* `nested_delimiter`\
It is of the `str` data type with the default value of `"."`.\
It defines the delimiter for flattened nested field names.
* `content_column`\
It is of the `str` data type with the default value of `"content"`.\
It defines the name of the content column for chunk ID generation.
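Putting several of these options together, here is a hedged example; the field names are hypothetical.

```python theme={null}
my_kb = project.knowledge_bases.create(
    ...
    params={
        "preprocessing": {
            "type": "json_chunking",
            "json_chunking_config": {
                "chunk_by_object": True,
                "exclude_fields": ["internal_id"],       # hypothetical field to drop
                "metadata_fields": ["customer.country"]  # nested field via dot notation
            }
        }
    }
)
```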
### Underlying Vector Store
Each knowledge base has its underlying vector store that stores data inserted into the knowledge base in the form of embeddings.
Users can query the underlying vector store as follows.
* KB with the default ChromaDB vector store:
```python theme={null}
project.query(
'''
SELECT id, content, metadata, embeddings
FROM <kb_name>_chromadb.default_collection;
'''
)
```
* KB with user-defined vector store (either [PGVector](/integrations/vector-db-integrations/pgvector) or [ChromaDB](/integrations/vector-db-integrations/chromadb)):
```python theme={null}
project.query(
'''
SELECT id, content, metadata, embeddings
FROM <vector_database_name>.<table_name>;
'''
)
```
# How Knowledge Bases Work
Source: https://docs.mindsdb.com/sdks/python/knowledge_bases/overview
A knowledge base is an advanced system that organizes information based on semantic meaning rather than simple keyword matching. It integrates embedding models, reranking models, and vector stores to enable context-aware data retrieval.
By performing semantic reasoning across multiple data points, a knowledge base delivers deeper insights and more accurate responses, making it a powerful tool for intelligent data access.
Learn more about features of [knowledge bases available via SQL API](/mindsdb_sql/knowledge_bases/overview).
Before diving into the syntax, here is a quick walkthrough showing how knowledge bases work in MindsDB.
We start by creating a knowledge base and inserting data. Next we can run semantic search queries with metadata filtering.
Use the `create()` function to create a knowledge base, specifying all its components.
```python theme={null}
import mindsdb_sdk

server = mindsdb_sdk.connect()
project = server.get_project()
my_kb = project.knowledge_bases.create(
'my_kb',
embedding_model={'provider': 'openai', 'model_name': 'text-embedding-3-small', 'api_key': 'sk-...'},
reranking_model={'provider': 'openai', 'model_name': 'gpt-4o', 'api_key': 'sk-...'},
storage=server.databases.my_vector_db.tables.my_table,
metadata_columns=['product'],
content_columns=['notes'],
id_column='order_id'
)
```
In this example, we use a simple dataset containing customer notes for product orders which will be inserted into the knowledge base.
```sql theme={null}
+----------+-----------------------+------------------------+
| order_id | product | notes |
+----------+-----------------------+------------------------+
| A1B | Wireless Mouse | Request color: black |
| 3XZ | Bluetooth Speaker | Gift wrap requested |
| Q7P | Aluminum Laptop Stand | Prefer aluminum finish |
+----------+-----------------------+------------------------+
```
Use the `insert_query()` function to ingest data into the knowledge base from a query.
```python theme={null}
my_kb.insert_query(
server.databases.sample_data.tables.orders
)
```
Query the knowledge base using semantic search.
```python theme={null}
results = my_kb.find('color')
print(results.fetch())
```
This query returns:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0} | Bluetooth Speaker | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
Query the knowledge base using semantic search and define the `relevance` parameter to receive only the best matching data for your use case.
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb
WHERE content = 'color'
AND relevance >= 0.2502;
'''
)
print(results.fetch())
```
This query returns:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
Add metadata filtering to focus your search.
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse'
AND content = 'color'
AND relevance >= 0.2502;
'''
)
print(results.fetch())
```
This query returns:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.504396172197583 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
```
# How to Query Knowledge Bases
Source: https://docs.mindsdb.com/sdks/python/knowledge_bases/query
Knowledge Bases support two primary querying approaches: semantic search and metadata filtering, each of which offers different filtering capabilities, including filtering by the relevance score to ensure only data most relevant to the query is returned.
* **Semantic Search**
Semantic search enables users to query Knowledge Bases using natural language. When searching semantically, you reference the content column in your SQL statement. MindsDB will interpret the input as a semantic query and use vector-based similarity to find relevant results.
* **Metadata Filtering**
It allows users to query Knowledge Bases based on the available metadata fields. These fields can be used in the `WHERE` clause of a SQL statement.
* **Relevance Filtering**
Every semantic search result is assigned a relevance score, which indicates how closely a given entry matches your query. You can filter results by this score to ensure only the most relevant entries are returned.
* **Hybrid Search**
Hybrid search combines the flexibility of semantic search and exact keyword matching. [Learn more here](/mindsdb_sql/knowledge_bases/hybrid_search).
Learn more about features of [knowledge bases available via SQL API](/mindsdb_sql/knowledge_bases/overview).
## `find()` Function
Knowledge bases provide an abstraction that enables users to see the stored data.
Note that the examples below query the sample knowledge base created and populated in the previous **Example** sections.
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb;
'''
)
print(results.fetch())
```
Here is the sample output:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0} | Bluetooth Speaker | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
### Data Stored in Knowledge Base
The following columns are stored in the knowledge base.
`id`
It stores values from the column defined in the `id_column` parameter when creating the knowledge base. These are the source data IDs.
`chunk_id`
Knowledge bases chunk the inserted data in order to fit the defined chunk size. If the chunking is performed, the following chunk ID format is used: `<id>:<chunk_number>of<total_chunks>:<start_char>to<end_char>`.
`chunk_content`
It stores values from the column(s) defined in the `content_columns` parameter when creating the knowledge base.
`metadata`
It stores the general metadata and the metadata defined in the `metadata_columns` parameter when creating the knowledge base.
`distance`
It stores the calculated distance between the chunk's content and the search phrase.
`relevance`
It stores the calculated relevance of the chunk as compared to the search phrase. Its values are between 0 and 1.
Note that the calculation method of `relevance` differs as follows:
* When the reranking model is provided, the default `relevance` filter is equal to or greater than 0, unless defined otherwise in the `WHERE` clause.
* When the reranking model is not provided and `relevance` is not defined in the query, no relevance filtering is applied and the output includes all rows matched by the similarity and metadata search.
* When the reranking model is not provided but `relevance` is defined in the query, the relevance is calculated from the `distance` column as `1 / (1 + distance)` and compared with the provided `relevance` value to filter the output (see the sketch below).
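Here is a quick sketch of that distance-based fallback, using the `distance` value from the first row of the sample output above.

```python theme={null}
distance = 0.5743341242061104
relevance = 1 / (1 + distance)
print(relevance)  # ~0.635
```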
### Semantic Search
Users can query a knowledge base using semantic search by providing the search phrase (called `content`) to be searched for.
```python theme={null}
results = my_kb.find('color')
print(results.fetch())
```
Here is the output:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0} | Bluetooth Speaker | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
When querying a knowledge base, the default values include the following:
* `relevance`
If not provided, the default filter is `relevance >= 0`, ensuring no rows are filtered out based on relevance.
* `LIMIT`
If not provided, its default value is 10, returning a maximum of 10 rows.
Note that when specifying both `relevance` and `LIMIT` as follows:
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb
WHERE content = 'color'
AND relevance >= 0.5
LIMIT 20;
'''
)
print(results.fetch())
```
The query extracts 20 rows (as defined in the `LIMIT` clause) that match the defined `content`. Next, this set of rows is filtered to keep only those matching the defined `relevance`.
Users can limit the `relevance` in order to get only the most relevant results.
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb
WHERE content = 'color'
AND relevance >= 0.5;
'''
)
print(results.fetch())
```
Here is the output:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5103766499957533 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
```
By providing the `relevance` filter, the output is limited to data with a relevance score at or above the provided value. The available values of `relevance` are between 0 and 1, and its default value covers all available relevance values, ensuring no filtering based on the relevance score.
Users can limit the number of rows returned.
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb
WHERE content = 'color'
LIMIT 2;
'''
)
print(results.fetch())
```
Here is the output:
```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```
### Metadata Filtering
Besides semantic search features, knowledge bases enable users to filter the result set by the defined metadata.
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse';
'''
)
print(results.fetch())
```
Here is the output:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
| id | chunk_id | chunk_content | metadata | product | relevance | distance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | [NULL] | [NULL] |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
```
Note that when searching by metadata alone, the `relevance` column values are not calculated.
Users can combine both: filtering by metadata and searching by content.
```python theme={null}
results = project.query(
'''
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse'
AND content = 'color'
AND relevance >= 0.5;
'''
)
print(results.fetch())
```
Here is the output:
```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| id | chunk_id | chunk_content | metadata | product | distance | relevance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.504396172197583 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
```
# List Data Handlers
Source: https://docs.mindsdb.com/sdks/python/list_data_handlers
Here is how you can fetch all available data handlers directly from Python code:
```python theme={null}
mindsdb = server.get_project('mindsdb')
data_handlers = mindsdb.query('SHOW HANDLERS WHERE type = \'data\'')
print(data_handlers.fetch())
```
# List Data Sources
Source: https://docs.mindsdb.com/sdks/python/list_databases
## Description
The `list_databases()` function lists all data sources connected to MindsDB.
## Syntax
Use the `list_databases()` method to list all databases:
```python theme={null}
server.list_databases()
```
# List Jobs
Source: https://docs.mindsdb.com/sdks/python/list_jobs
## Description
The `list_jobs()` function is executed on a project and lists all jobs available in this project.
## Syntax
Use the `list_jobs()` method to list all jobs in a project:
```python theme={null}
project.list_jobs()
```
# List Projects
Source: https://docs.mindsdb.com/sdks/python/list_projects
## Description
The `list_projects()` function lists all available projects.
## Syntax
Use the `list_projects()` method to list all available projects:
```python theme={null}
server.list_projects()
```
# List Views
Source: https://docs.mindsdb.com/sdks/python/list_views
## Description
The `list_views()` function is executed on a project and lists all views available in this project.
## Syntax
Use the `list_views()` method to list all views in a project:
```python theme={null}
project.list_views()
```
# Native Queries
Source: https://docs.mindsdb.com/sdks/python/native_queries
## Description
The `query()` function is executed on a data source connected to MindsDB and saved into a variable. The native query is executed directly on the data source.
## Syntax
Here is the syntax:
```python theme={null}
my_data_source.query('SELECT * FROM datasource_name ();')
```
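For example, assuming a connected data source saved into a variable (the connection name is a placeholder):

```python theme={null}
my_data_source = server.get_database('my_data_source')  # placeholder connection name
result = my_data_source.query('SELECT NOW()')           # raw statement passed through to the source
print(result.fetch())
```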
# Overview
Source: https://docs.mindsdb.com/sdks/python/overview
MindsDB provides a Python SDK, enabling its integration into Python environments.
Follow these steps to get started:
1. For Python, [install the package](/sdks/python/installation).
2. Connect a data source in [Python](/sdks/python/create_database). Explore all available [data sources here](/integrations/data-overview).
3. Configure an AI engine in [Python](/sdks/python/create_ml_engine). Explore all available [AI engines here](/integrations/ai-overview).
4. Create and deploy an AI/ML model in [Python](/sdks/python/create_model).
5. Query for predictions in [Python](/sdks/python/get-batch-predictions).
6. Automate tasks by scheduling jobs in [Python](/sdks/python/create_job).
# Query a File
Source: https://docs.mindsdb.com/sdks/python/query_files
## Description
In MindsDB, files are treated as tables. These are stored in the default `files` database. To query a file, you must save this `files` database into a variable and then run the `query()` function on it.
## Syntax
Here is the syntax:
```python theme={null}
server.get_database('files').query('SELECT * FROM file_name')
```
# Query Projects
Source: https://docs.mindsdb.com/sdks/python/query_projects
## Description
The `query()` method enables you to run queries on models, tables, and views stored in a project.
## Syntax
Use the `query()` method to submit a query to a project:
```python theme={null}
query = project.query('SELECT * FROM my_table;')
query.fetch()
```
# Query a Table
Source: https://docs.mindsdb.com/sdks/python/query_table
## Description
The `query()` function is executed on a data source connected to MindsDB and saved into a variable. It queries a table from this data source.
## Syntax
Here is the syntax:
```python theme={null}
my_data_source.query('SELECT * FROM my_table LIMIT 100')
```
You can query for newly added data using the functionality introduced by the [`LAST` keyword](/mindsdb_sql/sql/create/jobs#last) as follows:
```python theme={null}
query = server.databases.my_data_source.tables.table_name.filter(column_name='value').track('timestamp_column')
# first call returns no records
df = query.fetch()
# second call returns rows with timestamp_column greater than the timestamp of a previous fetch
df = query.fetch()
```
# Query a View
Source: https://docs.mindsdb.com/sdks/python/query_view
## Description
The `query()` function is executed on a view that resides in one of the projects.
## Syntax
Here is the syntax:
```python theme={null}
project_name.query('SELECT * FROM my_project.my_view LIMIT 100')
```
You can query for newly added data using the functionality introduced by the [`LAST` keyword](/mindsdb_sql/sql/create/jobs#last) as follows:
```python theme={null}
query = server.databases.my_data_source.views.view_name.filter(column_name='value').track('timestamp_column')
# first call returns no records
df = query.fetch()
# second call returns rows with timestamp_column greater than the timestamp of a previous fetch
df = query.fetch()
```
# Refresh a Job
Source: https://docs.mindsdb.com/sdks/python/refresh_job
## Description
The `refresh()` function synchronizes the job with MindsDB.
## Syntax
Use the `refresh()` method to retrieve job data from the MindsDB server:
```python theme={null}
my_job.refresh()
```
# Update a Table
Source: https://docs.mindsdb.com/sdks/python/update_table
## Description
The `update()` function is executed on a table from a data source connected to MindsDB. It updates a table on specified columns.
## Syntax
Here is the syntax:
```python theme={null}
my_table.update(table_used_to_update, on=['column1', 'column2', ...])
```
Check out the [SQL syntax](/sql/api/update) to better understand how the `update()` function works.
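As a usage sketch, assuming `new_rows` is a pandas DataFrame of updated records accepted by `update()` and `id` is the match column (both hypothetical):

```python theme={null}
import pandas as pd

# Hypothetical rows carrying updated values, matched to existing rows on 'id'.
new_rows = pd.DataFrame({'id': [1, 2], 'status': ['shipped', 'pending']})

my_table = server.databases.my_data_source.tables.my_table
my_table.update(new_rows, on=['id'])
```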
# Upload a File
Source: https://docs.mindsdb.com/sdks/python/upload_file
## Description
In MindsDB, files are treated as tables. These are stored in the default `files` database. To upload a file, you must save this `files` database into a variable and then run the `create_table()` function on it.
Note that the trailing whitespaces on column names are erased upon uploading a file to MindsDB.
## Syntax
Here is the syntax:
```python theme={null}
files_db = server.get_database('files')
files_db.create_table('file_name', data_frame)
```
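For instance, a minimal end-to-end upload using a pandas DataFrame (the file name is arbitrary):

```python theme={null}
import pandas as pd

data_frame = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})

files_db = server.get_database('files')
files_db.create_table('my_file', data_frame)
```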
# Extend the Default MindsDB Configuration
Source: https://docs.mindsdb.com/setup/custom-config
To follow this guide, install MindsDB locally via [Docker](/setup/self-hosted/docker-desktop) or [PyPI](/setup/self-hosted/pip/source).
## Starting MindsDB with Extended Configuration
Start MindsDB locally with your custom configuration by providing a path to the `config.json` file that stores custom config parameters listed in this section.
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_CONFIG_PATH=/Users/username/path/config.json -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Python theme={null}
python -m mindsdb --api=http,mysql --config=/path-to-the-extended-config-file/config.json
```
### Available Config Parameters
Below are all of the custom configuration parameters that should be set according to your requirements and saved into the `config.json` file.
#### `permanent_storage`
```bash theme={null}
"permanent_storage": {
"location": "absent",
"bucket": "s3_bucket_name" # optional, used only if "location": "s3"
},
```
The `permanent_storage` parameter defines where MindsDB stores copies of user files, such as uploaded files, models, and tab content. MindsDB checks the `permanent_storage` location to access the latest version of a file and updates it as needed.
The `location` specifies the storage type.
* `absent` (default): Disables permanent storage. Recommended when MindsDB is running locally.
* `local`: Stores files in a local directory defined with `config['paths']['storage']`.
* `s3`: Stores files in an Amazon S3 bucket. This option requires the `bucket` parameter that specifies the name of the S3 bucket where files will be stored.
If this parameter is not set, the path is determined by the `MINDSDB_STORAGE_DIR` environment variable. MindsDB defaults to creating a `mindsdb` folder in the operating system user's home directory.
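For example, to persist files in an S3 bucket (the bucket name below is a placeholder):

```bash theme={null}
"permanent_storage": {
    "location": "s3",
    "bucket": "my-mindsdb-bucket"
},
```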
#### `paths`
```bash theme={null}
"paths": {
"root": "/home/mindsdb/var", # optional (alternatively, it can be defined in the MINDSDB_STORAGE_DIR environment variable)
"content": "/home/mindsdb/var/content", # optional
"storage": "/home/mindsdb/var/storage", # optional
"static": "/home/mindsdb/var/static", # optional
"tmp": "/home/mindsdb/var/tmp", # optional
"cache": "/home/mindsdb/var/cache", # optional
"locks": "/home/mindsdb/var/locks", # optional
},
```
The `paths` parameter allows users to redefine the file paths for various groups of MindsDB files. If only the `root` path is defined, all other folders will be created within that directory. If this parameter is absent, the value is determined by the `MINDSDB_STORAGE_DIR` environment variable.
The `root` parameter defines the base directory for storing all MindsDB files, including models, uploaded files, tab content, and the internal SQLite database (if running locally).
The `content` parameter specifies the directory where user-related files are stored, such as uploaded files, created models, and tab content. The internal SQLite database (if running locally) is stored in the `root` directory instead.
If the `['permanent_storage']['location']` is set to `'local'`, then the `storage` parameter is used to store copies of user files.
The `static` parameter is used to store files for the graphical user interface (GUI) when MindsDB is run locally.
The `tmp` parameter designates a directory for temporary files. Note that the operating system’s default temporary directory may also be used for some temporary files.
If the `['cache']['type']` is set to `'local'`, then the `cache` parameter defines the location for storing cached files for the most recent predictions. For example, if a model is queried with identical input, the result will be stored in the cache and returned directly on subsequent queries, instead of recalculating the prediction.
The `locks` parameter is used to store lock files to prevent race conditions when the `content` folder is shared among multiple applications. This directory helps ensure that file access is managed properly using `fcntl` locks. Note that this is not applicable for Windows OS.
#### `auth`
```bash theme={null}
"auth":{
"http_auth_type": "session" | "token"| "session_or_token",
"http_auth_enabled": true,
"username": "username",
"password": "password"
},
```
The `auth` parameter controls the authentication settings for APIs in MindsDB.
Users can define the authentication type by setting the `http_auth_type` parameter to one of the following values:
* `session_or_token` is the default value. When a user logs in to MindsDB, the session cookie is set and the token is returned in the response. To use the MindsDB API, users can utilize either one or both of these methods.
* `session` sets the session cookie when a user logs in. The session lifetime can be set with the `http_permanent_session_lifetime` parameter.
* `token` returns the token in the response, which is valid indefinitely.
The authentication type can also be set via the `MINDSDB_HTTP_AUTH_TYPE` environment variable with the same values as defined above.
If the `http_auth_enabled` parameter is set to `true`, then the `username` and `password` parameters are required. Otherwise these are optional.
#### `gui`
```bash theme={null}
"gui": {
"autoupdate": true,
"open_on_start": true
},
```
The `gui` parameter controls the behavior of the MindsDB graphical user interface (GUI) updates.
The `autoupdate` parameter defines whether MindsDB automatically checks for and updates the GUI to the latest version when the application starts. If set to `true`, MindsDB will attempt to fetch the latest available version of the GUI. If set to `false`, MindsDB will not try to update the GUI on startup.
The `open_on_start` parameter defines whether MindsDB automatically opens the GUI on start. If set to `true`, MindsDB will open the GUI automatically. If set to `false`, MindsDB will not open the GUI on startup.
#### `api`
```bash theme={null}
"api": {
"http": {
"host": "127.0.0.1",
"port": "47334",
"restart_on_failure": true,
"max_restart_count": 1,
"max_restart_interval_seconds": 60,
"a2wsgi": {
"workers": 15,
"send_queue_size": 10
}
},
"mysql": {
"host": "127.0.0.1",
"port": "47335",
"database": "mindsdb",
"ssl": true,
"restart_on_failure": true,
"max_restart_count": 1,
"max_restart_interval_seconds": 60
},
},
```
The `api` parameter contains the configuration settings for running MindsDB APIs.
Currently, the supported APIs are:
* `http`: Configures the HTTP API. It requires the `host` and `port` parameters. Alternatively, configure HTTP authentication for your MindsDB instance by setting the environment variables `MINDSDB_USERNAME` and `MINDSDB_PASSWORD` before starting MindsDB, which is the recommended approach for production systems.
* `mysql`: Configures the MySQL API. It requires the `host` and `port` parameters and additionally the `database` and `ssl` parameters.
Connection parameters for the HTTP API include:
* `host`: Specifies the IP address or hostname where the API should run. For example, `"127.0.0.1"` indicates the API will run locally.
* `port`: Defines the port number on which the API will listen for incoming requests. The default ports are `47334` for HTTP, and `47335` for MySQL.
* `restart_on_failure`: If set to `true` (and `max_restart_count` is not reached), a restart of MindsDB is attempted after the MindsDB process is killed (with code 9 on Linux and macOS, or for any reason on Windows).
* `max_restart_count`: This defines how many times the restart attempts can be made. Note that 0 stands for no limit.
* `max_restart_interval_seconds`: This defines the time limit during which there can be no more than `max_restart_count` restart attempts. Note that 0 stands for no time limit, which means there would be a maximum of `max_restart_count` restart attempts allowed.
Here is a usage example of the restart features:
Assume the following values:
* `max_restart_count` = 2
* `max_restart_interval_seconds` = 30 seconds
Assume the following scenario:
* MindsDB fails 1000 seconds into its run - the restart attempt succeeds, as there were no restarts in the past 30 seconds.
* MindsDB fails at 1010 seconds - the restart attempt succeeds, as there was only 1 restart (at 1000s) in the past 30 seconds.
* MindsDB fails at 1020 seconds - the restart attempt fails, as there were already `max_restart_count` = 2 restarts (at 1000s and 1010s) in the past 30 seconds.
* MindsDB fails at 1031 seconds - the restart attempt succeeds, as there was only 1 restart (at 1010s) in the past 30 seconds.
* `a2wsgi` is a WSGI wrapper with the following parameters: `workers` defines the number of requests that can be processed in parallel, and `send_queue_size` defines the buffer size.
Connection parameters for the MySQL API include:
* `host`: Specifies the IP address or hostname where the API should run. For example, `"127.0.0.1"` indicates the API will run locally.
* `port`: Defines the port number on which the API will listen for incoming requests. The default ports are `47334` for HTTP, and `47335` for MySQL.
* `database`: Specifies the name of the database that MindsDB uses. Users must connect to this database to interact with MindsDB through the respective API.
* `ssl`: Indicates whether SSL support is enabled for the MySQL API.
* `restart_on_failure`: If set to `true` (and `max_restart_count` has not been reached), MindsDB attempts a restart after its process was killed - with code 9 on Linux and MacOS, or for any reason on Windows.
* `max_restart_count`: This defines how many restart attempts can be made. Note that 0 stands for no limit.
* `max_restart_interval_seconds`: This defines the time window during which no more than `max_restart_count` restart attempts can be made. Note that 0 stands for no time limit, meaning at most `max_restart_count` restart attempts are allowed in total.
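With these settings in place, any standard MySQL client can connect to MindsDB. Below is a minimal sketch, assuming the default `mindsdb` user and database:
```bash theme={null}
# Connect to the MindsDB MySQL API with the stock mysql client;
# user and database names assume the defaults.
mysql --host=127.0.0.1 --port=47335 --user=mindsdb mindsdb
```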
#### `cache`
```bash theme={null}
"cache": {
"type": "local",
"connection": "redis://localhost:6379" # optional, used only if "type": "redis"
},
```
The `cache` parameter controls how MindsDB stores the results of recent predictions to avoid recalculating them if the same query is run again. Note that recent predictions are cached for ML models, like Lightwood, but not in the case of large language models (LLMs), like OpenAI.
The `type` parameter specifies the type of caching mechanism to use for storing prediction results.
* `none`: Disables caching. No prediction results are stored.
* `local` (default): Stores prediction results in the `cache` folder (as defined in the `paths` configuration). This is useful for repeated queries where the result doesn't change.
* `redis`: Stores prediction results in a Redis instance. This option requires the `connection` parameter, which specifies the Redis connection string.
The `connection` parameter is required only if the `type` parameter is set to `redis`. It stores the Redis connection string.
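For example, a Redis-backed cache would look as follows, assuming a local Redis instance on its default port:
```bash theme={null}
"cache": {
    "type": "redis",
    "connection": "redis://localhost:6379"
},
```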
#### `logging`
```bash theme={null}
"logging": {
"handlers": {
"console": {
"enabled": true,
"formatter": "default", # optional, available values include default and json
"level": "INFO" # optional (alternatively, it can be defined in the MINDSDB_CONSOLE_LOG_LEVEL environment variable)
},
"file": {
"enabled": false,
"level": "INFO", # optional (alternatively, it can be defined in the MINDSDB_FILE_LOG_LEVEL environment variable)
"filename": "app.log",
"maxBytes": 524288, # 0.5 Mb
"backupCount": 3
}
}
},
```
The above parameters are implemented based on [Python's Logging Dictionary Schema](https://docs.python.org/3/library/logging.config.html#logging-config-dictschema).
The `logging` parameter defines the details of output logging, including the logging levels.
The `handlers` parameter defines the handlers used for logging into streams and files.
* `console`: This parameter defines the setup for saving logs into a stream.
* If the `enabled` parameter is set to `true`, then the logging output is saved into a stream.
* Users can define the `formatter` parameter that configures the format of the logs, where the available values include `default` and `json`.
* Users can also define the logging level in the `level` parameter or in the `MINDSDB_CONSOLE_LOG_LEVEL` environment variable - one of `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
* `file`: This parameter defines the setup for saving logs into a file.
* If the `enabled` parameter is set to `true`, then the logging output is saved into a file.
* Users can define the logging level in the `level` parameter or in the `MINDSDB_FILE_LOG_LEVEL` environment variable - one of `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
* Additionally, the `filename` parameter stores the name of the file that contains logs.
* The `maxBytes` and `backupCount` parameters determine the rollover process of the file: when the file reaches the size of `maxBytes`, it is closed and a new file is opened, and the number of backup files is defined by the `backupCount` parameter.
#### `ml_task_queue`
```bash theme={null}
"ml_task_queue": {
"type": "local",
"host": "localhost", # optional, used only if "type": "redis"
"port": 6379, # optional, used only if "type": "redis"
"db": 0, # optional, used only if "type": "redis"
"username": "username", # optional, used only if "type": "redis"
"password": "password" # optional, used only if "type": "redis"
},
```
The `ml_task_queue` parameter manages the queueing system for machine learning tasks in MindsDB. ML tasks include operations such as creating, training, predicting, fine-tuning, and retraining models. These tasks can be resource-intensive, and running multiple ML tasks simultaneously may lead to Out of Memory (OOM) errors or performance degradation. To address this, MindsDB uses a task queue to control task execution and optimize resource utilization.
The `type` parameter defines the type of task queue to use.
* `local`: Tasks are processed immediately as they appear, without a queue. This is suitable for environments where resource constraints are not a concern.
* `redis`: Tasks are added to a Redis-based queue, and a consumer process (run with the `--ml_task_queue_consumer` flag) ensures that tasks are executed only when sufficient resources are available.
* Using a Redis queue requires additional configuration such as the `host`, `port`, `db`, `username`, and `password` parameters.
* To use the Redis queue, start MindsDB with the following command to initiate a queue consumer process: `python3 -m mindsdb --ml_task_queue_consumer`. This process monitors the queue and fetches tasks for execution only when sufficient resources are available, as shown in the example after this list.
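For example, with `"type": "redis"` configured, the server and the queue consumer run as two separate processes:
```bash theme={null}
# Start the MindsDB server, which enqueues ML tasks onto the Redis queue
python3 -m mindsdb --config=/path-to-the-extended-config-file/config.json

# In a separate shell, start the queue consumer that executes tasks
# only when sufficient resources are available
python3 -m mindsdb --ml_task_queue_consumer
```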
#### `url_file_upload`
```bash theme={null}
"url_file_upload": {
"enabled": true,
"allowed_origins": ["https://example.com"],
"disallowed_origins": ["http://example.com"]
}
```
The `url_file_upload` parameter restricts file uploads to trusted sources by specifying a list of allowed domains. This ensures that users can only upload files from the defined sources, such as S3 or Google Drive.
The `enabled` flag turns this feature on (`true`) or off (`false`).
The `allowed_origins` parameter lists allowed domains. If left empty, then any domain is allowed.
The `disallowed_origins` parameter lists domains that are not allowed. If left empty, then there are no restricted domains.
#### `web_crawling_allowed_sites`
```bash theme={null}
"web_crawling_allowed_sites": [],
```
The `web_crawling_allowed_sites` parameter restricts web crawling operations to a specified list of allowed IPs or web addresses. This ensures that the application only accesses pre-approved and safe URLs (`"web_crawling_allowed_sites": ["https://example.com", "https://api.mysite.com"]`).
If left empty (`[]`), the application allows access to all URLs by default (marked with a wildcard in the open-source version).
#### `default_llm`
```bash theme={null}
"default_llm": {
"provider": "azure_openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01",
"method": "multi-class"
}
```
The `default_llm` parameter specifies the default LLM that will be used with the [`LLM()` function](/mindsdb_sql/functions/llm_function), the [`TO_MARKDOWN()` function](/mindsdb_sql/functions/to_markdown_function), and as a default model for [agents](/mindsdb_sql/agents/agent).
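As a quick check that the configured default is picked up, you can call the `LLM()` function without specifying any model parameters. The sketch below sends the query through the HTTP API; the `/api/sql/query` endpoint is an assumption based on a default local setup:
```bash theme={null}
# Run an LLM() query over the HTTP API; no model parameters are passed
# because the default_llm configuration is used.
curl -X POST http://127.0.0.1:47334/api/sql/query \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{"query": "SELECT LLM('Summarize MindsDB in one sentence.') AS answer;"}
EOF
```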
#### `default_embedding_model`
```bash theme={null}
"default_embedding_model": {
"provider": "azure_openai",
"model_name" : "text-embedding-3-large",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01"
}
```
The `default_embedding_model` parameter specifies the default embedding model used with knowledge bases. Learn more about the parameters following the [documentation of the `embedding_model` of knowledge bases](/mindsdb_sql/knowledge_bases/create#embedding-model).
#### `default_reranking_model`
```bash theme={null}
"default_reranking_model": {
"provider": "azure_openai",
"model_name" : "gpt-4o",
"api_key": "sk-abc123",
"base_url": "https://ai-6689.openai.azure.com/",
"api_version": "2024-02-01",
"method": "multi-class"
}
```
The `default_reranking_model` parameter specifies the default reranking model used with knowledge bases. Learn more about the parameters following the [documentation of the `reranking_model` of knowledge bases](/mindsdb_sql/knowledge_bases/create#reranking-model).
#### `data_catalog`
```bash theme={null}
{
"data_catalog": {
"enabled": true
}
}
```
This parameter enables the [data catalog](/data_catalog/overview).
### Example
First, create a `config.json` file.
```bash theme={null}
{
"permanent_storage": {
"location": "absent"
},
"paths": {
"root": "/path/to/root/location"
},
"auth":{
"http_auth_enabled": true,
"username": "username",
"password": "password"
},
"gui": {
"autoupdate": true
},
"api": {
"http": {
"host": "127.0.0.1",
"port": "47334",
"restart_on_failure": true,
"max_restart_count": 1,
"max_restart_interval_seconds": 60
},
"mysql": {
"host": "127.0.0.1",
"port": "47335",
"database": "mindsdb",
"ssl": true,
"restart_on_failure": true,
"max_restart_count": 1,
"max_restart_interval_seconds": 60
}
},
"cache": {
"type": "local"
},
"logging": {
"handlers": {
"console": {
"enabled": true,
"formatter": "default",
"level": "INFO"
},
"file": {
"enabled": false,
"level": "INFO",
"filename": "app.log",
"maxBytes": 524288,
"backupCount": 3
}
}
},
"ml_task_queue": {
"type": "local"
},
"url_file_upload": {
"enabled": true,
"allowed_origins": ["https://example.com"],
"disallowed_origins": ["http://example.com"]
},
"web_crawling_allowed_sites": []
}
```
Next, start MindsDB providing this `config.json` file.
```bash theme={null}
python -m mindsdb --config=/path-to-the-extended-config-file/config.json
```
## Modifying Config Values
Users can modify config values by directly editing the `config.json` file they created.
# Environment Variables
Source: https://docs.mindsdb.com/setup/environment-vars
Most of MindsDB's functionality can be modified by extending the default configuration, and some configuration options can also be set as environment variables on the server where MindsDB is deployed.
[Here is the list](/setup/full-list-environment-vars) of all the available environment variables.
## MindsDB Authentication
MindsDB does not require authentication by default. If you want to enable authentication, you can set the `MINDSDB_USERNAME` and `MINDSDB_PASSWORD` environment variables.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_USERNAME='mindsdb_user' -e MINDSDB_PASSWORD='mindsdb_password' -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_USERNAME='mindsdb_user'
export MINDSDB_PASSWORD='mindsdb_password'
```
## MindsDB Authentication Type
Users can define the authentication type by setting the `MINDSDB_HTTP_AUTH_TYPE` environment variable to one of the following values:
* `session_or_token` is the default value. When a user logs in to MindsDB, the session cookie is set and the token is returned in the response. To use the MindsDB API, users can utilize either one or both of these methods.
* `session` sets the session cookie when a user logs in. The session lifetime can be set with the `http_permanent_session_lifetime` parameter.
* `token` returns the token in the response, which is valid indefinitely.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_HTTP_AUTH_TYPE='session' -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_HTTP_AUTH_TYPE='session_or_token'
```
## MindsDB Configuration File
In order to start MindsDB with a [custom configuration file](/setup/custom-config), the `MINDSDB_CONFIG_PATH` environment variable should store the file path.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_CONFIG_PATH=/Users/username/path/config.json -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_CONFIG_PATH=/Users/username/path/config.json
```
## MindsDB Storage
By default, MindsDB stores its files in an appropriate platform-specific directory, e.g. a "user data dir":
* On Linux `~/.local/share/mindsdb/var`
* On MacOS `~/Library/Application Support/mindsdb/var`
* On Windows `C:\Documents and Settings\<username>\Application Data\Local Settings\mindsdb\var`
In the `MINDSDB_STORAGE_DIR` location, MindsDB stores users' data, models, uploaded data files, the static assets for the frontend application, and the `sqlite.db` file.
You can change the default storage location using the `MINDSDB_STORAGE_DIR` variable.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_STORAGE_DIR='~/home/mindsdb/var' -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_STORAGE_DIR='~/home/mindsdb/var'
```
## MindsDB Configuration Storage
MindsDB uses an SQLite database by default to store the required configuration, such as models, projects, and file metadata. The full list of the stored schemas can be found [here](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/interfaces/storage/db.py#L69). You can change the default storage option and use a different database by providing a connection string via the `MINDSDB_DB_CON` variable.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_DB_CON='postgresql://user:secret@localhost' -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_DB_CON='postgresql://user:secret@localhost'
```
#### `MINDSDB_STORAGE_BACKUP_DISABLED`
- **Type:** Boolean (`1`, `true`, `True`)
- **Description:** Disables permanent storage backup
- **Default:** `false`
- **Example:** `MINDSDB_STORAGE_BACKUP_DISABLED=1`
## MindsDB APIs
The `MINDSDB_APIS` environment variable lets users define which APIs to start. Learn more about the [available APIs here](/setup/mindsdb-apis).
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_APIS='http,mysql'
```
## MindsDB Logs
The `MINDSDB_LOG_LEVEL` environment variable defines the level of logging generated by MindsDB. You can choose one of the values [defined here](https://docs.python.org/3/library/logging.html#logging-levels). The `INFO` level is used by default.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_LOG_LEVEL='DEBUG' -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_LOG_LEVEL='DEBUG'
```
#### `MINDSDB_CONSOLE_LOG_LEVEL`
* **Type:** String (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
* **Description:** Sets console log level
* **Default:** `INFO`
* **Example:** `MINDSDB_CONSOLE_LOG_LEVEL=DEBUG`
#### `MINDSDB_FILE_LOG_LEVEL`
* **Type:** String (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
* **Description:** Sets file log level and enables file logging
* **Default:** `INFO` (disabled by default)
* **Example:** `MINDSDB_FILE_LOG_LEVEL=DEBUG`
## MindsDB Default Project
By default, MindsDB creates a project named `mindsdb` where all the models and other objects are stored. You can change the default project name by setting the `MINDSDB_DEFAULT_PROJECT` environment variable.
If this environment variable is set or modified after MindsDB has started, the default project will be **renamed** accordingly upon restart. To start using the new default project, a `USE` statement will also need to be executed.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_DEFAULT_PROJECT='my_project' -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_DEFAULT_PROJECT='my_project'
```
#### `MINDSDB_DEFAULT_LLM_API_KEY`
* **Type:** String
* **Description:** API key for default LLM (Large Language Model)
* **Default:** None
* **Example:** `MINDSDB_DEFAULT_LLM_API_KEY=sk-...`
#### `MINDSDB_DEFAULT_EMBEDDING_MODEL_API_KEY`
* **Type:** String
* **Description:** API key for default embedding model
* **Default:** None
* **Example:** `MINDSDB_DEFAULT_EMBEDDING_MODEL_API_KEY=sk-...`
#### `MINDSDB_DEFAULT_RERANKING_MODEL_API_KEY`
* **Type:** String
* **Description:** API key for default reranking model
* **Default:** None
* **Example:** `MINDSDB_DEFAULT_RERANKING_MODEL_API_KEY=sk-...`
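These keys follow the same shell-export pattern as the other variables on this page:
```bash theme={null}
export MINDSDB_DEFAULT_LLM_API_KEY='sk-...'
export MINDSDB_DEFAULT_EMBEDDING_MODEL_API_KEY='sk-...'
export MINDSDB_DEFAULT_RERANKING_MODEL_API_KEY='sk-...'
```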
## MindsDB's PID File
When running MindsDB via [Docker](/setup/self-hosted/docker) or [Docker Extension](/setup/self-hosted/docker-desktop), the PID file is not used by default. Users can opt to enable the PID file by defining the `USE_PIDFILE` environment variable.
If used, the PID file is stored in the temp directory (`$TMPDIR` on MacOS and Linux, `%TEMP%` on Windows) under the `mindsdb` folder.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e USE_PIDFILE=1 -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export USE_PIDFILE=1
```
## MindsDB GUI Updates
In order to disable automatic GUI updates, the `MINDSDB_GUI_AUTOUPDATE` environment variable should be set to `false` (or `0`).
By default, the automatic GUI updates are enabled and the `MINDSDB_GUI_AUTOUPDATE` environment variable is set to `true` (or `1`).
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_GUI_AUTOUPDATE=false -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_GUI_AUTOUPDATE=false
```
## MindsDB GUI Startup and Updates
In order to not open the MindsDB GUI automatically when starting the instance (and to disable automatic GUI updates), the `MINDSDB_NO_STUDIO` environment variable should be set to `true` (or `1`).
By default, the MindsDB GUI starts automatically when starting the instance (and the automatic GUI updates are enabled), that is, the `MINDSDB_NO_STUDIO` environment variable is set to `false` (or `0`).
Note that `MINDSDB_NO_STUDIO` is not recommended for MindsDB instances running in Docker. Instead, use the `MINDSDB_GUI_AUTOUPDATE` environment variable to disable automatic GUI updates.
### Example
```bash Docker theme={null}
docker run --name mindsdb_container -e MINDSDB_NO_STUDIO=true -e MINDSDB_APIS=http,mysql -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
```
```bash Shell theme={null}
export MINDSDB_NO_STUDIO=true
```
### ML Task Queue
#### `MINDSDB_ML_QUEUE_TYPE`
* **Type:** String (`local`, `redis`)
* **Description:** Type of ML task queue to use
* **Default:** `local`
* **Example:** `MINDSDB_ML_QUEUE_TYPE=redis`
#### `MINDSDB_ML_QUEUE_HOST`
* **Type:** String (hostname)
* **Description:** Redis host for ML task queue (only when `MINDSDB_ML_QUEUE_TYPE=redis`)
* **Default:** `localhost`
* **Example:** `MINDSDB_ML_QUEUE_HOST=redis.example.com`
#### `MINDSDB_ML_QUEUE_PORT`
* **Type:** Integer
* **Description:** Redis port for ML task queue
* **Default:** `6379`
* **Example:** `MINDSDB_ML_QUEUE_PORT=6380`
#### `MINDSDB_ML_QUEUE_DB`
* **Type:** Integer
* **Description:** Redis database number for ML task queue
* **Default:** `0`
* **Example:** `MINDSDB_ML_QUEUE_DB=1`
#### `MINDSDB_ML_QUEUE_USERNAME`
* **Type:** String
* **Description:** Redis username for ML task queue
* **Default:** None
* **Example:** `MINDSDB_ML_QUEUE_USERNAME=redis_user`
#### `MINDSDB_ML_QUEUE_PASSWORD`
* **Type:** String
* **Description:** Redis password for ML task queue
* **Default:** None
* **Example:** `MINDSDB_ML_QUEUE_PASSWORD=redis_pass`
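Taken together, a Redis-backed ML task queue can be configured entirely through the environment, for example:
```bash theme={null}
export MINDSDB_ML_QUEUE_TYPE=redis
export MINDSDB_ML_QUEUE_HOST=redis.example.com
export MINDSDB_ML_QUEUE_PORT=6379
export MINDSDB_ML_QUEUE_DB=0
export MINDSDB_ML_QUEUE_USERNAME=redis_user
export MINDSDB_ML_QUEUE_PASSWORD=redis_pass
```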
## Reranker Configuration
#### `MINDSDB_RERANKER_N`
* **Type:** Integer
* **Description:** Number of results to rerank
* **Default:** None
* **Example:** `MINDSDB_RERANKER_N=10`
#### `MINDSDB_RERANKER_LOGPROBS`
* **Type:** Boolean (`true`, `false`, `1`, `0`, `yes`, `no`)
* **Description:** Enable log probabilities in reranker
* **Default:** None
* **Example:** `MINDSDB_RERANKER_LOGPROBS=true`
#### `MINDSDB_RERANKER_TOP_LOGPROBS`
* **Type:** Integer
* **Description:** Number of top log probabilities to return
* **Default:** None
* **Example:** `MINDSDB_RERANKER_TOP_LOGPROBS=5`
#### `MINDSDB_RERANKER_MAX_TOKENS`
* **Type:** Integer
* **Description:** Maximum tokens for reranker
* **Default:** None
* **Example:** `MINDSDB_RERANKER_MAX_TOKENS=512`
#### `MINDSDB_RERANKER_VALID_CLASS_TOKENS`
* **Type:** String (comma-separated list)
* **Description:** Valid class tokens for reranker
* **Default:** None
* **Example:** `MINDSDB_RERANKER_VALID_CLASS_TOKENS=token1,token2,token3`
## Features
#### `MINDSDB_DATA_CATALOG_ENABLED`
* **Type:** Boolean (`1`, `true`)
* **Description:** Enables the data catalog feature
* **Default:** `false`
* **Example:** `MINDSDB_DATA_CATALOG_ENABLED=1`
## Runtime
#### `MINDSDB_DOCKER_ENV`
* **Type:** Any value (presence check)
* **Description:** Indicates MindsDB is running in Docker environment (changes default API host to `0.0.0.0`)
* **Default:** Not set
* **Example:** `MINDSDB_DOCKER_ENV=1`
#### `MINDSDB_RUNTIME`
* **Type:** String (`1`)
* **Description:** Indicates MindsDB runtime environment
* **Default:** Not set
* **Example:** `MINDSDB_RUNTIME=1`
***
# MindsDB APIs
Source: https://docs.mindsdb.com/setup/mindsdb-apis
MindsDB provides multiple APIs with optional authentication mechanisms.
## APIs
When you start MindsDB, the following APIs become available:
* **HTTP API**, along with **A2A API** and **MCP API**, runs on port `47334`.
* Access the MindsDB Editor at `mindsdb-instance-url:47334`
* Access the MCP API at `mindsdb-instance-url:47334/mcp/`
* Access the A2A API at `mindsdb-instance-url:47334/a2a/`
* **MySQL API** runs on port `47335`.
* Connect to MindsDB from database clients as if it were a standard MySQL database.
## Authentication
The authentication mechanism covers the HTTP API, A2A API, and MCP API.
You can configure authentication by setting [environment variables](/setup/environment-vars#mindsdb-authentication) or by defining credentials in the [configuration file](/setup/custom-config#auth).
For details on generating and using MindsDB authentication tokens, refer to the [authentication guide](/rest/authentication).
# Docker for MindsDB
Source: https://docs.mindsdb.com/setup/self-hosted/docker
MindsDB provides Docker images that facilitate running MindsDB in Docker containers.
As MindsDB integrates with numerous [data sources](/integrations/data-overview) and [AI frameworks](/integrations/ai-overview), each integration requires a set of dependencies. Hence, MindsDB provides multiple Docker images for different tasks, as outlined below.
* `mindsdb/mindsdb:latest` (or `mindsdb/mindsdb`)
It is the lightweight Docker image of MindsDB that comes with the `mysql`, `postgresql`, `snowflake`, `bigquery`, `mssql`, and `salesforce` handlers pre-installed.
* `mindsdb/mindsdb:lightwood`
It is the Docker image of MindsDB that comes with the Lightwood integration preloaded.
* `mindsdb/mindsdb:huggingface`
It is the Docker image of MindsDB that comes with the Hugging Face integration preloaded.
## Prerequisites
Before proceeding, ensure you have installed Docker, following the [official Docker documentation](https://docs.docker.com/install).
## Setup
This setup of MindsDB uses one of the available Docker images, as listed above.
When running MindsDB in one container and the integration you want to connect to it (such as Ollama or PGVector) in another container, then use `http://host.docker.internal` instead of `localhost` when connecting this integration to MindsDB.
Follow the steps to set up MindsDB in a Docker container.
### Install MindsDB
Run this command to create a Docker container with MindsDB:
```bash theme={null}
docker run --name mindsdb_container \
-e MINDSDB_APIS=http,mysql \
-p 47334:47334 -p 47335:47335 \
mindsdb/mindsdb
```
Where:
* `docker run` is a native Docker command used to spin up a container.
* `--name mindsdb_container` defines a name for the container.
* `-e MINDSDB_APIS=http,mysql` defines the APIs to be exposed by the MindsDB instance. All available APIs include `http`, `mysql`, and `postgres`.
* `-p 47334:47334 -p 47335:47335` defines the ports where the APIs are exposed (HTTP and MySQL respectively).
* `mindsdb/mindsdb` is a Docker image provided by MindsDB. You can choose a different one from the list above.
Once the container is created, you can use the following commands:
* `docker stop mindsdb_container` to stop the container. *Note that this may not always be necessary because when turning off the host machine, the container will also be shut down.*
* `docker start mindsdb_container` to restart a stopped container with all its previous changes (such as any dependencies that were installed) intact. *Note that `docker start` restarts a stopped container, while `docker run` creates a new container.*
If you don't want to follow the logs and would rather get the prompt back, add the `-d` flag, which stands for *detach*.
```bash theme={null}
docker run --name mindsdb_container -e MINDSDB_APIS=http -d -p 47334:47334 mindsdb/mindsdb
```
If you want to persist your models and configurations in the host machine, run these commands:
```bash theme={null}
mkdir mdb_data
docker run --name mindsdb_container -e MINDSDB_APIS=http -p 47334:47334 -v $(pwd)/mdb_data:/root/mdb_storage mindsdb/mindsdb
```
Where `-v $(pwd)/mdb_data:/root/mdb_storage` maps the newly created folder `mdb_data` on the host machine to the `/root/mdb_storage` inside the container.
Now you can access the MindsDB editor by going to `127.0.0.1:47334` in your browser.
If you experience any issues related to MKL or your training process does not complete, please add the `MKL_SERVICE_FORCE_INTEL` environment variable, as below.
```bash theme={null}
docker run --name mindsdb_container -e MKL_SERVICE_FORCE_INTEL=1 -e MINDSDB_APIS=http -p 47334:47334 mindsdb/mindsdb
```
If you want to enable authentication for MindsDB, you can do so by passing the `MINDSDB_USERNAME` and `MINDSDB_PASSWORD` environment variables when running the container.
```bash theme={null}
docker run --name mindsdb_container -e MINDSDB_USERNAME='admin' -e MINDSDB_PASSWORD='password' -e MINDSDB_APIS=http -p 47334:47334 mindsdb/mindsdb
```
### Install dependencies
MindsDB integrates with numerous data sources and AI frameworks. To use any of the integrations, you should ensure that the required dependencies are installed in the Docker container.
**Method 1**
Install dependencies directly from MindsDB editor. Go to *Settings* and *Manage Integrations*, select integrations you want to use and click on *Install*.
**Method 2**
Start the MindsDB Docker container:
```bash theme={null}
docker start mindsdb_container
```
If you haven't specified a container name when spinning up a container with `docker run`, you can find it by running `docker ps`.
If you haven't yet created a container, use this command:
```bash theme={null}
docker run --name mindsdb_container -e MINDSDB_APIS=http -d -p 47334:47334 mindsdb/mindsdb
```
Start an interactive shell in the container:
```bash theme={null}
docker exec -it mindsdb_container sh
```
Install the dependencies:
```bash theme={null}
pip install .[handler_name]
```
For example, run this command to install dependencies for the [OpenAI handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/openai_handler):
```bash theme={null}
pip install .[openai]
```
Exit the interactive shell:
```bash theme={null}
exit
```
Restart the container:
```bash theme={null}
docker restart mindsdb_container
```
## Configuration
This section covers the configuration of MindsDB's Docker image, including the storage location, log level, debugging information, installed integrations, and API endpoints. These parameters can be customized by modifying the JSON file that stores the default configuration.
### Default configuration
The default configuration for MindsDB's Docker image is stored as JSON, as below.
```json theme={null}
{
"config_version":"1.4",
"paths": {
"root": "/root/mdb_storage"
},
"debug": false,
"integrations": {},
"api": {
"http": {
"host": "0.0.0.0",
"port": "47334"
},
"mysql": {
"host": "0.0.0.0",
"password": "",
"port": "47335",
"user": "mindsdb",
"database": "mindsdb",
"ssl": true
}
}
}
```
### Custom configuration
To override the default configuration, you can mount a config file created on your host machine over `/root/mindsdb_config.json`, as below.
```bash theme={null}
docker run --name mindsdb_container -e MINDSDB_APIS=http -d -p 47334:47334 -v $(pwd)/mdb_config.json:/root/mindsdb_config.json mindsdb/mindsdb
```
**What's next?**
Now that you installed and started MindsDB locally in your Docker container, go ahead and find out how to create and train a model using the [`CREATE MODEL`](/sql/create/model) statement.
Check out the [Use Cases](/use-cases/overview) section to follow tutorials that cover Large Language Models, Chatbots, Time Series, Classification, and Regression models, Semantic Search, and more.
# Docker Desktop Extension for MindsDB
Source: https://docs.mindsdb.com/setup/self-hosted/docker-desktop
MindsDB provides an extension for Docker Desktop that facilitates running MindsDB on Docker Desktop.
Visit the [GitHub repository for MindsDB Docker Desktop Extension](https://github.com/mindsdb/mindsdb-docker-extension) to learn more.
## Prerequisites
Before proceeding, ensure you have installed Docker Desktop, following the [official Docker Desktop documentation](https://www.docker.com/products/docker-desktop/).
## Setup
This setup of MindsDB uses the `mindsdb/mindsdb:latest` Docker image, which is a lightweight Docker image of MindsDB that comes with the `mysql`, `postgresql`, `snowflake`, `bigquery`, `mssql`, and `salesforce` handlers pre-installed.
Follow the steps to set up MindsDB in Docker Desktop.
### Install the MindsDB Docker Desktop Extension
If you are a Windows user, ensure that you have enabled Developer Mode under settings before installing the extension.
It is not necessary to keep Developer Mode enabled to use the extension. Once the extension is installed, you can disable Developer Mode if you wish.
Go to the Extensions page in Docker Desktop and search for MindsDB.
Install the MindsDB extension.
The first time the extension is installed, it will run the latest version of MindsDB. Moving forward, it's advisable to regularly update the MindsDB image used by the extension to ensure access to the latest features and improvements.
As mentioned previously, the extension uses the `mindsdb/mindsdb:latest` Docker image. To update the image, follow these steps:
1. Navigate to the 'Images' tab in Docker Desktop.
2. Search or locate the mindsdb/mindsdb:latest image.
3. Click on the three dots on the right side of the image and click 'Pull'. If the image is already up to date, you will see a message stating so and you can skip the next step.
4. Wait for the image to be pulled and restart Docker Desktop.
Access MindsDB inside Docker Desktop.
### Install dependencies
In the MindsDB editor, go to *Settings* and *Manage Integrations*.
Select integrations you want to use and click on *Install*.
### View logs
In order to view the logs generated by MindsDB when running the extension, follow these steps:
1. Navigate to the 'Containers' tab in Docker Desktop.
2. Search or locate the multi-container application running the MindsDB extension. This can be done by searching for 'mindsdb'.
If you do not see the application listed here, navigate to the 'Extensions' tab in Settings and ensure that the 'Show Docker Extensions system containers' option is enabled.
3. Click on the container named 'mindsdb\_service'. This will direct you to the container running MindsDB.
4. View the logs in the 'Logs' tab.
# MindsDB System Defaults
Source: https://docs.mindsdb.com/setup/system-defaults
System defaults in MindsDB provide a convenient way to set application-wide configurations for commonly used AI models. By defining these defaults once, users can streamline workflows and avoid repeatedly specifying model parameters when creating or using various MindsDB objects and functions.
## Usage of System Defaults
When system defaults are set, MindsDB can automatically use the configured models across the platform for various components such as:
* [Agents](/mindsdb_sql/agents/agent) that can answer questions over the connected data and are powered by a default large language model (LLM).
* [Knowledge Bases](/mindsdb_sql/knowledge_bases/overview) that can store and search both structured and unstructured data, and use a default embedding model for embedding the content and a default reranking model for reranking the search results. Additionally, knowledge bases use a default model for evaluating performance with the [EVALUATE KNOWLEDGE\_BASE command](/mindsdb_sql/knowledge_bases/evaluate).
* Custom functions such as [LLM()](/mindsdb_sql/functions/llm_function) and [TO\_MARKDOWN()](/mindsdb_sql/functions/to_markdown_function) that rely on the default LLM for text generation and formatting.
Once configured, users can create and use agents, knowledge bases, and custom functions without having to specify model parameters each time. This ensures consistent behavior across the system and simplifies deployment.
## Available System Defaults
MindsDB supports the following system defaults:
| System Default | Used By | Description |
| ----------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| Default LLM | Agents, EVALUATE, KNOWLEDGE\_BASE, LLM(), TO\_MARKDOWN() | Used as an underlying LLM for reasoning, conversation, and text generation and formatting. |
| Default Embedding Model | Knowledge Bases | Converts inserted content and user questions into embeddings for semantic search. |
| Default Reranking Model | Knowledge Bases | Reranks search results to improve retrieval accuracy. |
## Supported Model Providers
Different components in MindsDB support different sets of model providers.
**Knowledge Bases**
Supported providers for **embedding models**:
* Azure OpenAI
* Bedrock
* Google
* OpenAI (and OpenAI-compatible model providers)
* Snowflake Cortex AI
Supported providers for **reranking models**:
* Azure OpenAI
* Bedrock
* Google
* OpenAI (and OpenAI-compatible model providers)
* Snowflake Cortex AI
Supported providers for **models used to evaluate knowledge bases**:
* Azure OpenAI
* Bedrock
* Google
* OpenAI (and OpenAI-compatible model providers)
* Snowflake Cortex AI
**Agents**
Supported providers for **default models**:
* Bedrock
* Google
* Ollama
* OpenAI (and OpenAI-compatible model providers)
**LLM()**
Supported providers for **default models**:
* Ollama
* OpenAI (and OpenAI-compatible model providers)
**TO\_MARKDOWN()**
Supported providers for **default models**:
* Azure OpenAI
* Google
* OpenAI (and OpenAI-compatible model providers)
## How to Configure System Defaults
You can configure system defaults using either the MindsDB UI or the configuration file, depending on your setup preferences.
The configuration variables include a provider name, a model name, and – if available – base URL, API key, and API version.
**Option 1: Configure via MindsDB UI**
1. Open the MindsDB UI.
2. Navigate to Settings → Models.
3. Define the models for each of the system defaults as follows:
a. Under Provider, select the model provider from the dropdown.
b. Under Model, define the model name that is available with the selected model provider.
c. Under Base URL, define the base URL of the model provider, if available.
d. Under API key, provide the API key, if available.
e. Under API version, define the API version, if available.
4. Click the Test & Save button to validate and save the configuration.
After saving, the defaults take immediate effect across your MindsDB instance.
**Option 2: Configure via MindsDB Configuration File**
You can also define system defaults in the [MindsDB configuration file](/setup/custom-config). This method is recommended for advanced or automated deployments.
When MindsDB is started with the custom configuration file, it will automatically load and apply these default models.
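For reference, below is a minimal sketch of such a configuration file, reusing the `default_llm` block from the [custom configuration documentation](/setup/custom-config); the provider, model name, and API key are placeholders:
```bash theme={null}
{
  "default_llm": {
    "provider": "openai",
    "model_name": "gpt-4o",
    "api_key": "sk-abc123"
  }
}
```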
**Option 3: Environment Variables**
For functions like [LLM()](/mindsdb_sql/functions/llm_function) and [TO\_MARKDOWN()](/mindsdb_sql/functions/to_markdown_function), system defaults can also be defined using environment variables. This allows for easy configuration in containerized or cloud deployments.
Refer to the individual function documentation for details on environment variables.
**Option 4: Define Models at Object Creation**
You can specify models when creating [agents](/mindsdb_sql/agents/agent_syntax) and [knowledge bases](/mindsdb_sql/knowledge_bases/create). These models override the system defaults for that specific object.
This allows you to tailor model behavior per agent or per knowledge base while keeping system-wide defaults in place.
Note that after changing the default model, the existing objects are not updated with the new default model. All objects being created going forward will use the updated default models.
## Summary
System defaults in MindsDB simplify AI development by standardizing the models used across various components. Whether configured through the UI, the configuration file, or environment variables, defaults help maintain consistency and reduce setup time.
# MindsDB Projects
Source: https://docs.mindsdb.com/sql/project
MindsDB enables you to group all objects within [projects](/sql/project).
Projects store [all MindsDB schema objects](/sql/table-structure#the-information-schema-database) except for handlers, connected data sources, and configured AI/ML engines. That is, projects can store models, views, jobs, triggers, agents, skills, knowledge bases, and chatbots.
MindsDB provides the default `mindsdb` project where all objects created without defining a project are stored.
## Working with MindsDB Projects
### Create a Project
Use the below command to create a project.
```sql theme={null}
CREATE PROJECT project_name;
```
Use lower-case letters for a project name.
### List All Projects
Use the below command to list all projects.
```sql theme={null}
SHOW [FULL] DATABASES
WHERE type = 'project';
```
### Create an Object within a Project
Use the below command template to create an object within a project.
```sql theme={null}
CREATE