SQL API
- Overview
- MindsDB SQL Syntax
- Connect
- Data Sources
- AI/ML Engines
- Projects
- Models
- Predictions
- Tables, Views, and Files
- Jobs
- Triggers
- AI Agents
- Knowledge Bases
- Functions
- Standard SQL Support
The TO_MARKDOWN() Function
MindsDB provides the TO_MARKDOWN()
function that lets users extract the content of their documents in markdown by simply specifying the document path or URL. This function is especially useful for passing the extracted content of documents through LLMs or for storing them in a Knowledge Base.
At the moment, the TO_MARKDOWN()
function only supports PDF documents.
Prerequisites
The TO_MARKDOWN()
function in MindsDB requires the use of an LLM. To enable this, select one of the supported model providers and set the relevant environment variables.
The model provider can be set using the TO_MARKDOWN_FUNCTION_PROVIDER
environment variable. The default provider is OpenAI and the default model is gpt-4o. The following providers are supported:
- OpenAI (
openai
) - Azure OpenAI (
azure_openai
)
The model you select must support multi-modal inputs (e.g., both images and text). For example, OpenAI’s gpt-4o is a supported multi-modal model.
You can also set up your default models in the MindsDB configuration file. With this, you will not need to set the environment variables specified below.
Here are the environment variables for the OpenAI provider:
TO_MARKDOWN_FUNCTION_API_KEY (required)
TO_MARKDOWN_FUNCTION_MODEL_NAME
TO_MARKDOWN_FUNCTION_TEMPERATURE
TO_MARKDOWN_FUNCTION_MAX_RETRIES
TO_MARKDOWN_FUNCTION_MAX_TOKENS
TO_MARKDOWN_FUNCTION_BASE_URL
TO_MARKDOWN_FUNCTION_API_ORGANIZATION
TO_MARKDOWN_FUNCTION_REQUEST_TIMEOUT
Here are the environment variables for the Azure OpenAI provider:
TO_MARKDOWN_FUNCTION_API_KEY (required)
TO_MARKDOWN_FUNCTION_BASE_URL (required)
TO_MARKDOWN_FUNCTION_API_VERSION (required)
TO_MARKDOWN_FUNCTION_MODEL_NAME
TO_MARKDOWN_FUNCTION_TEMPERATURE
TO_MARKDOWN_FUNCTION_MAX_RETRIES
TO_MARKDOWN_FUNCTION_MAX_TOKENS
TO_MARKDOWN_FUNCTION_API_ORGANIZATION
TO_MARKDOWN_FUNCTION_REQUEST_TIMEOUT
Usage
You can use the TO_MARKDOWN()
function to extract the content of your documents in markdown format. The arguments for this function are:
file_path_or_url
: The path or URL of the document you want to extract content from.
The following example shows how to use the TO_MARKDOWN()
function with a PDF document without using a LLM:
SELECT TO_MARKDOWN('https://www.princexml.com/howcome/2016/samples/invoice/index.pdf');
Here is the output:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| to_markdown |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ```markdown |
| # Invoice |
| |
| YesLogic Pty. Ltd. |
| 7 / 39 Bouverie St |
| Carlton VIC 3053 |
| Australia |
| |
| www.yeslogic.com |
| ABN 32 101 193 560 |
| |
| Customer Name |
| Street |
| Postcode City |
| Country |
| |
| Invoice date: | Nov 26, 2016 |
| --- | --- |
| Invoice number: | 161126 |
| Payment due: | 30 days after invoice date |
| |
| | Description | From | Until | Amount | |
| |---------------------------|-------------|-------------|------------| |
| | Prince Upgrades & Support | Nov 26, 2016 | Nov 26, 2017 | USD $950.00 | |
| | Total | | | USD $950.00 | |
| |
| Please transfer amount to: |
| |
| Bank account name: | Yes Logic Pty Ltd |
| --- | --- |
| Name of Bank: | Commonwealth Bank of Australia (CBA) |
| Bank State Branch (BSB): | 063010 |
| Bank State Branch (BSB): | 063010 |
| Bank State Branch (BSB): | 063019 |
| Bank account number: | 13201652 |
| Bank SWIFT code: | CTBAAU2S |
| Bank address: | 231 Swanston St, Melbourne, VIC 3000, Australia |
| |
| The BSB number identifies a branch of a financial institution in Australia. When transferring money to Australia, the BSB number is used together with the bank account number and the SWIFT code. Australian banks do not use IBAN numbers. |
| |
| www.yeslogic.com |
| ``` |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The content of each PDF page is intelligently extracted by first assessing how visually complex the page is. Based on this assessment, the system decides whether traditional text parsing is sufficient or if the page should be processed using an LLM.
Usage with Knowledge Bases
You can also use the TO_MARKDOWN()
function to extract content from documents and store it in a Knowledge Base. This is particularly useful for creating a Knowledge Base from a collection of documents.
INSERT INTO my_kb (
SELECT
HASH('https://www.princexml.com/howcome/2016/samples/invoice/index.pdf') as id,
TO_MARKDOWN('https://www.princexml.com/howcome/2016/samples/invoice/index.pdf') as content
)
Was this page helpful?