This is the implementation of the BigQuery data handler for MindsDB.

BigQuery is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform-as-a-Service that supports querying using ANSI SQL.

Prerequisites

Before proceeding, ensure the following prerequisites are met:

  1. Install MindsDB locally via Docker or use MindsDB Cloud.
  2. To connect Google BigQuery to MindsDB, install the required dependencies following this instruction.
  3. Install or ensure access to Google BigQuery.

Implementation

This handler is implemented using the google-cloud-bigquery Python library.

The required arguments to establish a connection are as follows:

  • project_id is a globally unique identifier for your project.
  • dataset defines the default dataset. Tables from this dataset are shown in the object tree.
  • service_account_keys is a full path to the service account key file. This parameter is required but can be replaced by service_account_json.
  • service_account_json stores the content of a JSON file defined by the service_account_keys parameter. This parameter is required when service_account_keys is not provided.

For more info about creating and managing the service account key visit this page.

Usage

In order to make use of this handler and connect to the BigQuery database in MindsDB, the following syntax can be used:

You can use either the service_account_keys parameter or the service_account_json parameter.

If you use MindsDB Cloud, it is recommended to provide the service_account_json parameter, as in the example below.

If you use local installation of MindsDB, you can provide either the service_account_keys parameter or the service_account_json parameter, as in the example below.

CREATE DATABASE bqdataset
WITH
   engine = "bigquery",
   parameters = {
      "project_id": "bgtest-1111",
      "dataset": "mydataset",
      "service_account_keys": "/tmp/keys.json"
   };

The mydataset dataset is the default one. The below queries are equivalent:

SELECT * FROM bq.rentals;
-- results in the same output as
SELECT * FROM bq.mydataset.rentals;

You can query from other datasets, like this:

SELECT * FROM bq.otherdataset.fish;

You can use this established connection to query your dataset as follows:

SELECT *
FROM bgdataset.dataset.table
LIMIT 10;