This section describes how to connect ChromaDB to MindsDB.
ChromaDB is an open-source embedding database. It makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.
Prerequisites
Before proceeding, ensure the following prerequisites are met:
- Install MindsDB locally via Docker or Docker Desktop.
- To connect ChromaDB to MindsDB, install the required dependencies following these instructions.
- Install or ensure access to ChromaDB.
Connection
This handler is implemented using the chromadb Python library.
To connect to a remote ChromaDB instance, use the following statement:
CREATE DATABASE chromadb_datasource
WITH ENGINE = 'chromadb',
PARAMETERS = {
    "host": "YOUR_HOST",
    "port": YOUR_PORT,
    "distance": "l2/cosine/ip" -- optional, default is cosine
};
The parameters are:
host: Required. The host name or IP address of the ChromaDB instance.
port: Required. The TCP/IP port of the ChromaDB instance.
distance: Optional; defaults to cosine. It defines how the distance between vectors is calculated. Available methods are l2, cosine, and ip, as explained here.
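To make the three distance options concrete, here is a minimal Python sketch of how each one scores a pair of vectors. It assumes the definitions given in the ChromaDB documentation (l2 is the squared Euclidean distance, ip is one minus the inner product, cosine is one minus the cosine similarity). The function names are illustrative only and are not part of MindsDB or ChromaDB.

```python
import math


def l2_distance(a, b):
    """Squared Euclidean distance (assumed form of Chroma's "l2" space)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a, b):
    """Inner-product distance: one minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance: one minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two orthogonal unit vectors.
a = [1.0, 0.0]
b = [0.0, 1.0]
print(l2_distance(a, b))      # 2.0
print(ip_distance(a, b))      # 1.0
print(cosine_distance(a, b))  # 1.0
```

In all three cases a smaller value means the vectors are closer. Cosine ignores vector length, whereas l2 and ip are sensitive to it, so the choice matters mostly when the embeddings are not normalized.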
To connect to an in-memory ChromaDB instance, use the following statement:
CREATE DATABASE chromadb_datasource
WITH ENGINE = 'chromadb',
PARAMETERS = {
    "persist_directory": "YOUR_PERSIST_DIRECTORY",
    "distance": "l2/cosine/ip" -- optional
};
The parameters are:
persist_directory: Required. The directory to use for persisting data.
distance: Optional. It defines how the distance between vectors is calculated. Available methods are l2, cosine, and ip, as explained here.
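If the connection statement is generated from application code, the two parameter sets can be checked before the statement is sent to MindsDB. The following is a minimal sketch under stated assumptions: `build_create_database` is a hypothetical helper, not a MindsDB API, and the host, port, and data source name in the example call are placeholder values.

```python
import json

ALLOWED_DISTANCES = {"l2", "cosine", "ip"}


def build_create_database(name, host=None, port=None,
                          persist_directory=None, distance=None):
    """Render a CREATE DATABASE statement for the chromadb engine.

    Hypothetical helper for illustration. `name` is inserted as-is,
    so it must come from a trusted source.
    """
    if host is not None and port is not None:
        # Remote instance: host and port are both needed.
        params = {"host": host, "port": port}
    elif persist_directory is not None:
        # Local instance that persists data to a directory.
        params = {"persist_directory": persist_directory}
    else:
        raise ValueError("provide host and port, or persist_directory")

    if distance is not None:
        if distance not in ALLOWED_DISTANCES:
            raise ValueError(
                f"distance must be one of {sorted(ALLOWED_DISTANCES)}"
            )
        params["distance"] = distance

    return (
        f"CREATE DATABASE {name}\n"
        f"WITH ENGINE = 'chromadb',\n"
        f"PARAMETERS = {json.dumps(params, indent=4)};"
    )


# Placeholder connection values, chosen only for the example.
print(build_create_database("chromadb_datasource",
                            host="localhost", port=8000,
                            distance="cosine"))
```

The helper only builds the statement text; running it against MindsDB is left to whichever client the application already uses.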
Usage
Now, you can use the established connection to create a collection (or table in the context of MindsDB) in ChromaDB and insert data into it:
CREATE TABLE chromadb_datasource.test_embeddings (
    SELECT embeddings, '{"source": "fda"}' AS metadata
    FROM mysql_datasource.test_embeddings
);
mysql_datasource is another MindsDB data source that has been created by connecting to a MySQL database. The test_embeddings table in the mysql_datasource data source contains the embeddings that we want to store in ChromaDB.
You can query your collection (table) as shown below:
SELECT *
FROM chromadb_datasource.test_embeddings;
To filter the data in your collection (table) by metadata, you can use the following query:
SELECT *
FROM chromadb_datasource.test_embeddings
WHERE `metadata.source` = "fda";
To conduct a similarity search, the following query can be used:
SELECT *
FROM chromadb_datasource.test_embeddings
WHERE search_vector = (
    SELECT embeddings
    FROM mysql_datasource.test_embeddings
    LIMIT 1
);
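Conceptually, a similarity search ranks the stored embeddings by their distance to the search vector, optionally after a metadata filter like the one shown earlier. The sketch below illustrates that idea with a brute-force scan in plain Python. It is not how ChromaDB or MindsDB implement the search (ChromaDB uses an index rather than a full scan), and the row layout, the `similarity_search` function, and the `distance` key are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: one minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_search(rows, search_vector, limit=None, source=None):
    """Return rows ordered from nearest to farthest from search_vector.

    If `source` is given, only rows whose metadata "source" matches
    are considered, mirroring a metadata filter.
    """
    candidates = [
        r for r in rows
        if source is None or r["metadata"].get("source") == source
    ]
    scored = [
        {**r, "distance": cosine_distance(r["embeddings"], search_vector)}
        for r in candidates
    ]
    scored.sort(key=lambda r: r["distance"])
    return scored[:limit]  # limit=None keeps every row


# Toy collection with two-dimensional embeddings.
rows = [
    {"id": "a", "embeddings": [1.0, 0.0], "metadata": {"source": "fda"}},
    {"id": "b", "embeddings": [0.0, 1.0], "metadata": {"source": "fda"}},
    {"id": "c", "embeddings": [1.0, 1.0], "metadata": {"source": "other"}},
]

nearest = similarity_search(rows, search_vector=[1.0, 0.1], limit=2)
print([r["id"] for r in nearest])  # ['a', 'c']
```

Passing `source="fda"` to the same call restricts the ranking to rows "a" and "b", which corresponds to combining the metadata filter with the search vector condition.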