Amazon S3
This documentation describes the integration of MindsDB with Amazon S3, an object storage service that offers industry-leading scalability, data availability, security, and performance.
This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.
Prerequisites
Before proceeding, ensure that MindsDB is installed locally via Docker or Docker Desktop.
Connection
Establish a connection to your Amazon S3 bucket from MindsDB by executing the following SQL command:
Note that sample parameter values are provided here for reference, and you should replace them with your connection parameters.
Required connection parameters include the following:
aws_access_key_id
: The AWS access key that identifies the user or IAM role.aws_secret_access_key
: The AWS secret access key that identifies the user or IAM role.
Optional connection parameters include the following:
aws_session_token
: The AWS session token that identifies the user or IAM role. This becomes necessary when using temporary security credentials.bucket
: The name of the Amazon S3 bucket. If not provided, all available buckets can be queried, however, this can affect performance, especially when listing all of the available objects.
Usage
Retrieve data from a specified object (file) in a S3 bucket by providing the integration name and the object key:
If a bucket name is provided in the CREATE DATABASE
command, querying will be limited to that bucket and the bucket name can be ommitted from the object key as shown in the example above. However, if the bucket name is not provided, the object key must include the bucket name, such as s3_datasource.
my-bucket/my-folder/my-file.csv`.
Wrap the object key in backticks (`) to avoid any issues parsing the SQL statements provided. This is especially important when the object key contains spaces, special characters or prefixes, such as my-folder/my-file.csv
.
At the moment, the supported file formats are CSV, TSV, JSON, and Parquet.
The above examples utilize s3_datasource
as the datasource name, which is defined in the CREATE DATABASE
command.
The special files
table can be used to list all objects available in the specified bucket or all buckets if the bucket name is not provided:
The content of files can also be retrieved by explicitly requesting the content
column. This column is empty by default to avoid unnecessary data transfer:
This table will return all objects regardless of the file format, however, only the supported file formats mentioned above can be queried.
Troubleshooting Guide
Database Connection Error
- Symptoms: Failure to connect MindsDB with the Amazon S3 bucket.
- Checklist:
- Make sure the Amazon S3 bucket exists.
- Confirm that provided AWS credentials are correct. Try making a direct connection to the S3 bucket using the AWS CLI.
- Ensure a stable network between MindsDB and AWS.
SQL statement cannot be parsed by mindsdb_sql
- Symptoms: SQL queries failing or not recognizing object names containing spaces, special characters or prefixes.
- Checklist:
- Ensure object names with spaces, special characters or prefixes are enclosed in backticks.
- Examples:
- Incorrect: SELECT * FROM integration.travel/travel_data.csv
- Incorrect: SELECT * FROM integration.‘travel/travel_data.csv’
- Correct: SELECT * FROM integration.`travel/travel_data.csv`
Was this page helpful?