Apache Impala
This is the implementation of the Impala data handler for MindsDB.
Apache Impala is an MPP (Massively Parallel Processing) SQL query engine for processing large volumes of data stored in an Apache Hadoop cluster. It is open source software written in C++ and Java. It is designed to provide high performance and low latency compared to other SQL engines for Hadoop, giving an RDBMS-like experience when accessing data stored in the Hadoop Distributed File System (HDFS).
Prerequisites
Before proceeding, ensure the following prerequisites are met:
- Install MindsDB locally via Docker or Docker Desktop.
- To connect Apache Impala to MindsDB, install the required dependencies by following the dependency installation instructions.
- Install or ensure access to Apache Impala.
Implementation
This handler is implemented using impyla, a Python library that allows you to use Python code to run SQL commands on Impala.
The required arguments to establish a connection are:
- user is the username associated with the database.
- password is the password to authenticate your access.
- host is the server IP address or hostname.
- port is the port through which the TCP/IP connection is to be made.
- database is the name of the database to connect to.
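To illustrate how these five parameters could be passed to impyla, here is a minimal Python sketch. It is not the handler's actual code: the helper names `build_connect_kwargs` and `run_query` are hypothetical, and only `impala.dbapi.connect` and the standard DB-API cursor calls come from impyla itself. Depending on how the cluster is secured, impyla may also need an `auth_mechanism` argument, which is omitted here.

```python
REQUIRED_KEYS = ("user", "password", "host", "port", "database")


def build_connect_kwargs(params):
    """Validate handler-style parameters and map them to impyla keyword arguments.

    Hypothetical helper for illustration; not part of MindsDB or impyla.
    """
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError(f"missing connection parameters: {', '.join(missing)}")
    return {
        "host": params["host"],
        "port": int(params["port"]),  # accept "21050" as well as 21050
        "user": params["user"],
        "password": params["password"],
        "database": params["database"],
    }


def run_query(params, sql):
    """Open a connection with impyla, run one statement, and return all rows."""
    # Imported lazily so build_connect_kwargs can be used without impyla installed.
    from impala.dbapi import connect

    conn = connect(**build_connect_kwargs(params))
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()
```

With a reachable Impala server, `run_query(params, "SHOW TABLES")` would return the rows as a list of tuples. The validation step is kept separate so that a missing parameter is reported before any network connection is attempted.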
Usage
To connect MindsDB to an Impala database with this handler, use the following syntax:
CREATE DATABASE impala_datasource
WITH
    engine = 'impala',
    parameters = {
        "user": "root",
        "password": "p@55w0rd",
        "host": "127.0.0.1",
        "port": 21050,
        "database": "Db_NamE"
    };
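If the connection parameters already live in a Python dictionary, the statement above can be generated rather than typed by hand. The sketch below is one possible way to do that; `render_create_database` is a hypothetical helper, not a MindsDB API, and it assumes the `parameters` clause accepts standard JSON as in the example above.

```python
import json


def render_create_database(name, params, engine="impala"):
    """Render a CREATE DATABASE statement for the given parameters.

    Hypothetical helper for illustration; not part of MindsDB.
    """
    if not name.isidentifier():
        raise ValueError(f"invalid datasource name: {name!r}")
    # json.dumps takes care of quoting and escaping the string values.
    body = json.dumps(params, indent=2)
    return (
        f"CREATE DATABASE {name}\n"
        "WITH\n"
        f"  engine = '{engine}',\n"
        f"  parameters = {body};"
    )


statement = render_create_database(
    "impala_datasource",
    {
        "user": "root",
        "password": "p@55w0rd",
        "host": "127.0.0.1",
        "port": 21050,
        "database": "Db_NamE",
    },
)
print(statement)
```

Serializing the parameters with `json.dumps` keeps the port as a number and escapes any quotes or backslashes that a password might contain.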
You can use this established connection to query your table as follows:
SELECT *
FROM impala_datasource.TEST;