As of now, the Data Catalog is available for the following integrations:
Enabling the Data Catalog
To enable the Data Catalog feature in MindsDB, update your config.json
file by setting the data_catalog
flag to true
:
{
"data_catalog": {
"enabled": true
}
}
Follow this doc page to learn how to start MindsDB with custom configuration.
Note that the data catalog is generated for a data source only after this data source is connected to an agent.
Here is an example:
CREATE DATABASE snowflake_data
WITH
ENGINE = 'snowflake',
PARAMETERS = {
"account": "abc123-xyz987",
"user": "username",
"password": "password",
"database": "database_name",
"schema": "schema_name",
"warehouse": "warehouse_name"
};
CREATE AGENT my_agent
USING
include_tables= ['snowflake_data.table_name', ...];
Now you can query the data catalog generated for the snowflake_data
integration.
How It Works
When you create an agent in MindsDB that connects to one of the supported integrations, the Data Catalog automatically:
- Inspects the data source.
- Extracts metadata for all accessible tables and columns.
- Stores this information in a dedicated catalog schema (
DATA_CATALOG
).
- Makes this metadata available to agents and users via both SQL queries and internal reasoning.
Current Limitations
This feature is still evolving and has some known limitations:
- One-Time Snapshot: Metadata is generated only once—at the time the agent is created. If the data schema changes (e.g., new columns, renamed tables), the Data Catalog will not automatically update. A refresh mechanism is planned in a future release.
- No Manual Feedback: If any metadata appears to be incorrect (e.g., wrong row counts or data types), there is currently no way for users to flag or correct it. A feedback system will be introduced soon.
Responses are generated using AI and may contain mistakes.