This documentation describes the integration of MindsDB with Databricks, the world’s first data intelligence platform powered by generative AI. The integration allows MindsDB to access data stored in a Databricks workspace and enhance it with AI capabilities.
Prerequisites
Before proceeding, ensure the following prerequisites are met:
- Install MindsDB locally via Docker or Docker Desktop.
- To connect Databricks to MindsDB, install the required dependencies by following the dependency installation instructions.
If the Databricks cluster you are attempting to connect to is terminated, executing the queries given below will attempt to start the cluster, so the first query may take a few minutes to execute. To avoid any delays, ensure that the Databricks cluster is running before executing the queries.
Connection
Establish a connection to your Databricks workspace from MindsDB by executing the CREATE DATABASE SQL command with the connection parameters listed below.
Required connection parameters:
- server_hostname: The server hostname for the cluster or SQL warehouse.
- http_path: The HTTP path of the cluster or SQL warehouse.
- access_token: A Databricks personal access token for the workspace.
Optional connection parameters:
- session_configuration: Additional (key, value) pairs to set as Spark session configuration parameters. This should be provided as a JSON string.
- http_headers: Additional (key, value) pairs to set in HTTP headers on every RPC request the client makes. This should be provided as "http_headers": [['Header-1', 'value1'], ['Header-2', 'value2']].
- catalog: The catalog to use for the connection. Default is hive_metastore.
- schema: The schema (database) to use for the connection. Default is default.
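The parameters above might be combined into a CREATE DATABASE statement along the following lines. This is a minimal sketch, not a definitive configuration: the engine name databricks is assumed to be the handler name for this integration, every parameter value is a placeholder to replace with your own workspace details, and catalog and schema are set to their documented defaults only to show where optional parameters go.

```sql
-- Sketch only: all values below are placeholders, not real credentials.
-- The engine name 'databricks' is an assumption about the handler name.
-- "catalog" and "schema" are optional; they are shown here with the
-- documented defaults (hive_metastore and default).
CREATE DATABASE databricks_datasource
WITH
    ENGINE = 'databricks',
    PARAMETERS = {
        "server_hostname": "<your-server-hostname>",
        "http_path": "<your-http-path>",
        "access_token": "<your-personal-access-token>",
        "catalog": "hive_metastore",
        "schema": "default"
    };
```

The name given after CREATE DATABASE (here databricks_datasource) is the name used to reference the connection in later queries.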
Usage
Retrieve data from a specified table by providing the integration name, catalog, schema, and table name.
The catalog and schema names only need to be provided if the table to be queried is not in the specified (or default) catalog and schema.
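As a sketch of the query shape described above, the following statements select rows from a table through the connection. The names catalog_name, schema_name, and table_name are hypothetical placeholders for objects in your own workspace, and LIMIT 10 is added only to keep the result small.

```sql
-- Fully qualified form: integration name, catalog, schema, and table.
-- catalog_name, schema_name, and table_name are hypothetical placeholders.
SELECT *
FROM databricks_datasource.catalog_name.schema_name.table_name
LIMIT 10;

-- Shorter form, for a table that lives in the catalog and schema
-- specified in the connection (or in the defaults).
SELECT *
FROM databricks_datasource.table_name
LIMIT 10;
```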
The above examples utilize databricks_datasource as the datasource name, which is defined in the CREATE DATABASE command.