The TO_MARKDOWN() Function
MindsDB provides the TO_MARKDOWN() function, which extracts the content of a document as markdown given only the document's path or URL. This function is especially useful for passing extracted document content to LLMs or for storing it in a Knowledge Base.
Prerequisites
To enable the use of an LLM with the TO_MARKDOWN() function in MindsDB, choose one of the available model providers and define the environment variables for that provider.
This function can be executed with or without an LLM. For most documents, the content can be extracted without one. However, using an LLM is recommended in the following cases:
- When PDF documents contain images, the LLM can generate descriptions for those images.
- When the document itself is an image, the LLM can generate a description of the image content.
Usage
You can use the TO_MARKDOWN() function to extract the content of your documents in markdown format. The arguments for this function are:
- file_path_or_url: The path or URL of the document you want to extract content from.
- use_llm: A boolean value that indicates whether to use an LLM to generate the markdown content. If set to True, the environment variables for the LLM provider must be set. If set to False, the function works without an LLM.
The following example shows how to use the TO_MARKDOWN() function with a PDF document without using an LLM:
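A minimal sketch of such a query, assuming the function is called inside a SELECT statement with the document location as its only argument. The URL is a placeholder chosen for illustration and the column alias is arbitrary.

```sql
-- Sketch only: the URL below is a hypothetical placeholder, not a real document.
-- With a single argument, no LLM is involved in the extraction.
SELECT TO_MARKDOWN('https://example.com/docs/sample.pdf') AS markdown;
```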
The output is the content of the document in markdown format.
The following example shows how to use the TO_MARKDOWN() function with a PDF document using an LLM:
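A minimal sketch of such a query, assuming use_llm is passed as a keyword-style argument after the document location. That argument syntax is an assumption and may differ between MindsDB versions. The URL is a placeholder, and the environment variables for the chosen LLM provider must already be set.

```sql
-- Sketch only: the URL is a hypothetical placeholder, and the keyword-style
-- use_llm argument is an assumed syntax to verify against your MindsDB version.
-- Requires the LLM provider's environment variables to be defined beforehand.
SELECT TO_MARKDOWN('https://example.com/docs/sample.pdf', use_llm=True) AS markdown;
```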
The output contains the markdown content of the document, including LLM-generated descriptions of any images within it.