The TO_MARKDOWN() function lets users extract the content of their documents in Markdown by specifying the document path or URL. It is especially useful for passing extracted document content to LLMs or for storing it in a Knowledge Base.
The TO_MARKDOWN() function supports several file formats and several ways of passing documents to it, and it requires an LLM to process the documents.
The TO_MARKDOWN() function supports PDF, XML, and Nessus file formats. Documents can be provided from URLs, file storage, or Amazon S3 storage.
The TO_MARKDOWN() function requires an LLM to process the document content into the Markdown format.
The supported LLM providers include OpenAI, Azure OpenAI, and Google.

The provider is configured through environment variables for the TO_MARKDOWN() function. The TO_MARKDOWN_FUNCTION_PROVIDER environment variable defines the selected provider, which is one of openai, azure_openai, or google.
Use the TO_MARKDOWN() function to extract the content of your documents in Markdown format. The arguments for this function are:
file_path_or_url: The path or URL of the document you want to extract content from.

From Amazon S3
You can use the TO_MARKDOWN() function with a PDF document from Amazon S3 storage connected to MindsDB.

The public_url of the file is generated in the s3_datasource.files table upon connecting the Amazon S3 data source to MindsDB, so it can be selected from that table and passed to the function.

From URL
You can use the TO_MARKDOWN() function with a PDF document from a URL.

You can also use the TO_MARKDOWN() function to extract content from documents and store it in a Knowledge Base. This is particularly useful for creating a Knowledge Base from a collection of documents.
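
The three usage patterns described above (a file from Amazon S3, a file from a URL, and storing the result in a Knowledge Base) might look roughly like the sketch below. It is illustrative only: it assumes an S3 data source connected under the name s3_datasource, uses a placeholder URL, and refers to a hypothetical, already-created Knowledge Base called my_kb whose id and content columns are assumed rather than taken from this page.

```sql
-- From Amazon S3: public_url is read from the s3_datasource.files table.
-- Assumes the S3 data source is connected to MindsDB as s3_datasource.
SELECT TO_MARKDOWN(public_url)
FROM s3_datasource.files
LIMIT 1;

-- From URL: the address below is a placeholder, not a real document.
SELECT TO_MARKDOWN('https://example.com/sample.pdf');

-- Into a Knowledge Base: my_kb is hypothetical and must already exist;
-- the id and content column names are assumptions.
INSERT INTO my_kb (
    SELECT
        'sample-pdf' AS id,
        TO_MARKDOWN('https://example.com/sample.pdf') AS content
);
```

Each statement requires the LLM provider to be configured first, since the function relies on it to produce the Markdown output.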