Anyscale Endpoints
This documentation describes the integration of MindsDB with Anyscale Endpoints, a fast and scalable API to integrate OSS LLMs into apps. The integration allows for the deployment of Anyscale Endpoints models within MindsDB, providing the models with access to data from various data sources.
Prerequisites
Before proceeding, ensure the following prerequisites are met:
- Install MindsDB locally via Docker or Docker Desktop.
- To use Anyscale Endpoints within MindsDB, install the required dependencies following these instructions.
- Obtain the Anyscale Endpoints API key required to deploy and use Anyscale Endpoints models within MindsDB. Follow the instructions for obtaining the API key.
Setup
Create an AI engine from the Anyscale Endpoints handler.
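A minimal sketch of doing so, assuming the handler is registered as anyscale_endpoints and accepts an anyscale_endpoints_api_key parameter (check the handler documentation for the exact names):

```sql
-- Register an ML engine backed by the Anyscale Endpoints handler.
-- The handler name and the API key parameter name are assumptions.
CREATE ML_ENGINE anyscale_endpoints_engine
FROM anyscale_endpoints
USING
    anyscale_endpoints_api_key = 'your-anyscale-api-key';
```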
Create a model using `anyscale_endpoints_engine` as an engine.
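A minimal sketch of such a `CREATE MODEL` statement; the model name, target column, and prompt template below are illustrative:

```sql
-- Create a model served by Anyscale Endpoints.
CREATE MODEL anyscale_model
PREDICT answer
USING
    engine = 'anyscale_endpoints_engine',
    model_name = 'meta-llama/Llama-2-7b-chat-hf',
    prompt_template = 'Answer the following question: {{question}}';
```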
It is possible to override certain parameters set for a model at prediction time instead of recreating the model. For example, to change the temperature parameter for a specific prediction, a query like the one sketched below can be used:
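A sketch of such a query, assuming the illustrative anyscale_model defined above and MindsDB's USING clause for passing parameters at prediction time:

```sql
-- Override the temperature for this prediction only;
-- the stored model definition is left unchanged.
SELECT question, answer
FROM anyscale_model
WHERE question = 'Where is Stockholm located?'
USING
    temperature = 0.9;
```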
The parameters that can be overridden in this way are noted in the detailed explanation below.
The following is a more detailed explanation of the parameters used in the `CREATE MODEL` statement:
engine
This is the engine name as created with the `CREATE ML_ENGINE` statement.
api_base
This parameter is optional.
It replaces Anyscale's default base URL with the defined value.
mode
This parameter is optional.
The available modes include `default`, `conversational`, and `conversational-full`.
- The `default` mode is used by default. The model will generate a separate response for each input provided. No context is maintained between the inputs.
- The `conversational` mode will maintain context between the inputs and generate a single response. This response will be placed in the last row of the result set.
- The `conversational-full` mode will maintain context between the inputs and generate a response for each input.
model_name
This parameter is optional.
By default, the `meta-llama/Llama-2-7b-chat-hf` model is used.
question_column
This parameter is optional.
It contains the column name that stores user input.
context_column
This parameter is optional.
It contains the column name that stores context for the user input.
prompt_template
This parameter is optional if you use `question_column`.
It stores the message or instructions as a base template with placeholders to be filled in by the user input at prediction time. Please note that this parameter can be overridden at prediction time.
prompt
This parameter is optional.
It defines the initial (system) prompt for the model.
max_tokens
This parameter is optional.
It defines the maximum token cost of the prediction. Please note that this parameter can be overridden at prediction time.
temperature
This parameter is optional.
It defines how risky the answers are. The value of `0` marks a well-defined answer, and the value of `0.9` marks a more creative answer.
Please note that this parameter can be overridden at prediction time.
json_struct
This parameter is optional.
It is used to extract JSON data from a text column provided in the `prompt_template` parameter. See examples here.
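As a rough sketch of how this parameter might be used (the model name, JSON keys, and input column are illustrative, and the exact syntax should be verified against the handler documentation):

```sql
-- Extract structured JSON fields from a free-text column.
-- All names below (model, keys, column) are hypothetical.
CREATE MODEL anyscale_json_model
PREDICT json
USING
    engine = 'anyscale_endpoints_engine',
    json_struct = {
        'location': 'location of the rental property',
        'rental_price': 'monthly rental price'
    },
    prompt_template = 'Extract data from the following text: {{review}}';
```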
The implementation of this integration is based on the engine for the OpenAI API, as Anyscale conforms to it. There are a few notable differences, though:
- All models supported by Anyscale Endpoints are open source. A full list of models available for inference can be found here, under the Supported models section.
- Not every model is supported for fine-tuning. You can find the list here, under the Fine Tuning - Supported models section. Please check both lists regularly, as they are subject to change. If you try to fine-tune a model that is not supported, you will get a warning and subsequently an error from the Anyscale endpoint.
- This integration only offers chat-based text completion models, either for normal text or specialized for code.
- When describing a model, this integration returns the respective HuggingFace model card.
- Fine-tuning requires that your dataset complies with the chat format. That is, each row should contain a role and a content. The content is the text of the message in the chat, and the role is who authored it (system, user, or assistant, where the last one is the model). For more information, please check the fine-tuning guide in the Anyscale Endpoints docs.
The base URL for this API is `https://api.endpoints.anyscale.com/v1`.
Usage
The following usage examples utilize `anyscale_endpoints_engine` to create a model with the `CREATE MODEL` statement.
The output generated for a single input will be the same regardless of the mode used. The difference between the modes is in how the model handles multiple inputs.
`files.unrelated_questions` is a simple CSV file, uploaded to MindsDB, that contains a `question` column with simple (unrelated) questions, while `files.related_questions` is a similar file containing related questions. `files.unrelated_questions_with_context` and `files.related_questions_with_context` are similar files that contain an additional `context` column.
These files are used in the examples given below to provide multiple inputs to the models created. It is possible to use any other supported data source in the same manner.
Default mode
In the `default` mode, the model will generate a separate response for each input provided. No context is maintained between the inputs.
Prompt completion
To generate a response for a single input, the following query can be used:
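A sketch of such a query, assuming the illustrative anyscale_model created in the Setup section with a {{question}} placeholder in its prompt template:

```sql
-- Single prediction: the input is passed in the WHERE clause.
SELECT question, answer
FROM anyscale_model
WHERE question = 'Where is Stockholm located?';
```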
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
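A sketch of the batch variant, joining the uploaded file to the same illustrative model:

```sql
-- Batch prediction: every row of the file becomes one model input.
SELECT input.question, output.answer
FROM files.unrelated_questions AS input
JOIN anyscale_model AS output;
```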
The response will look like the following:
Question answering
To generate a response for a single input, the following query can be used:
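A sketch, assuming a hypothetical model created with the question_column parameter instead of a prompt template; batch prediction then works the same way as in the previous example, via a JOIN:

```sql
-- Hypothetical model that reads user input from the `question` column.
CREATE MODEL anyscale_qa_model
PREDICT answer
USING
    engine = 'anyscale_endpoints_engine',
    question_column = 'question';

-- Single prediction against that model.
SELECT question, answer
FROM anyscale_qa_model
WHERE question = 'Where is Stockholm located?';
```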
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
The response will look like the following:
Question answering with context
To generate a response for a single input, the following query can be used:
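A sketch, assuming a hypothetical model created with both question_column and context_column:

```sql
-- Hypothetical model that reads the question and its context from two columns.
CREATE MODEL anyscale_qa_context_model
PREDICT answer
USING
    engine = 'anyscale_endpoints_engine',
    question_column = 'question',
    context_column = 'context';

-- Single prediction: both input columns are provided in the WHERE clause.
SELECT question, context, answer
FROM anyscale_qa_context_model
WHERE question = 'Where is Anna planning a trip to next month?'
AND context = 'Anna is planning a trip to Kyoto next month.';
```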
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
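One possible form of the batch query, joining the file that carries the additional context column to the hypothetical model above:

```sql
-- Batch prediction with context: both input columns come from the file.
SELECT input.question, input.context, output.answer
FROM files.unrelated_questions_with_context AS input
JOIN anyscale_qa_context_model AS output;
```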
The response will look like the following:
Conversational mode
In the `conversational` mode, the model will maintain context between the inputs and generate a single response. This response will be placed in the last row of the result set.
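A sketch of creating a model in this mode, using only the parameters documented above (the exact set of parameters the handler expects for conversational mode may differ):

```sql
-- Hypothetical conversational model: context is carried across input rows.
CREATE MODEL anyscale_conversational_model
PREDICT answer
USING
    engine = 'anyscale_endpoints_engine',
    mode = 'conversational',
    prompt_template = 'Answer the following question: {{question}}';
```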
Prompt completion
To generate a response for a single input, the following query can be used:
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
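A sketch of the batch query that would produce a result like the table below, using the hypothetical conversational model and the related-questions file:

```sql
-- In conversational mode, only the last row of the result carries the answer.
SELECT input.question, output.answer
FROM files.related_questions AS input
JOIN anyscale_conversational_model AS output;
```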
The response will look like the following:
| question | answer |
| --- | --- |
| Where is Stockholm located? | |
| What are some fun activities to do there? | Stockholm is the capital city of Sweden and is located in the southeastern part of the country. Some fun activities to do in Stockholm include visiting the famous Vasa Museum, exploring the beautiful archipelago, taking a stroll through the charming Gamla Stan neighborhood, and trying out some of the local food and drinks. |
Question answering
To generate a response for a single input, the following query can be used:
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
The response will look like the following:
| question | answer |
| --- | --- |
| Where is Stockholm located? | |
| What are some fun activities to do there? | Stockholm is the capital city of Sweden and is located in the southeastern part of the country. Some fun activities to do in Stockholm include visiting the famous Vasa Museum, exploring the beautiful archipelago, taking a stroll through the charming Gamla Stan neighborhood, and trying out some of the local food and drinks. |
Question answering with context
To generate a response for a single input, the following query can be used:
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
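A sketch of the batch query behind the result below, assuming a hypothetical conversational model created with question_column and context_column:

```sql
-- Conversational mode with context: the single response lands in the last row.
SELECT input.question, input.context, output.answer
FROM files.related_questions_with_context AS input
JOIN anyscale_conversational_context_model AS output;
```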
The response will look like the following:
| question | context | answer |
| --- | --- | --- |
| Where is Anna planning a trip to next month? | Anna is planning a trip to Kyoto next month. | |
| What does Anna plan on doing there? | Anna plans on going sightseeing. | Anna plans on going sightseeing during her trip to Kyoto next month. |
Conversational-full mode
In the `conversational-full` mode, the model will maintain context between the inputs and generate a response for each input.
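A sketch of creating a model in this mode; as with conversational mode, the parameter set shown here is an assumption based on the parameters documented above:

```sql
-- Hypothetical model: context is kept across rows, and every row gets its own answer.
CREATE MODEL anyscale_conversational_full_model
PREDICT answer
USING
    engine = 'anyscale_endpoints_engine',
    mode = 'conversational-full',
    prompt_template = 'Answer the following question: {{question}}';
```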
Prompt completion
To generate a response for a single input, the following query can be used:
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
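A sketch of the batch query matching the result below, using the hypothetical conversational-full model:

```sql
-- Every input row receives its own answer, with context carried forward.
SELECT input.question, output.answer
FROM files.related_questions AS input
JOIN anyscale_conversational_full_model AS output;
```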
The response will look like the following:
| question | answer |
| --- | --- |
| Where is Stockholm located? | Stockholm is the capital city of Sweden, located in the southeastern part of the country. It is situated on an island in the Stockholm archipelago, which is made up of more than 30,000 islands. The city is known for its beautiful architecture, museums, and cultural attractions, as well as its vibrant food and nightlife scene. |
| What are some fun activities to do there? | Stockholm is the capital city of Sweden and is located in the southeastern part of the country, on the east coast of the Stockholm archipelago. Some fun activities to do in Stockholm include visiting the famous Vasa Museum, exploring the charming old town of Gamla Stan, taking a stroll through the beautiful parks and gardens, and trying out some of the local food and drinks. There are also many opportunities for shopping, cultural experiences, and outdoor activities such as hiking and biking |
Question answering
To generate a response for a single input, the following query can be used:
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
The response will look like the following:
| question | answer |
| --- | --- |
| Where is Stockholm located? | Stockholm is the capital city of Sweden, located in the southeastern part of the country. It is situated on an island in the Stockholm archipelago, which is made up of more than 30,000 islands. The city is known for its beautiful architecture, museums, and cultural attractions, as well as its vibrant food and nightlife scene. |
| What are some fun activities to do there? | Stockholm is the capital city of Sweden and is located in the southeastern part of the country, on the east coast of the Stockholm archipelago. Some fun activities to do in Stockholm include visiting the famous Vasa Museum, exploring the charming old town of Gamla Stan, taking a stroll through the beautiful parks and gardens, and trying out some of the local food and drinks. There are also many opportunities for shopping, cultural experiences, and outdoor activities such as hiking and biking |
Question answering with context
To generate a response for a single input, the following query can be used:
The response will look like the following:
To generate responses for multiple inputs, the following query can be used:
The response will look like the following:
| question | context | answer |
| --- | --- | --- |
| Where is Anna planning a trip to next month? | Anna is planning a trip to Kyoto next month. | Anna is planning a trip to Kyoto next month. |
| What does Anna plan on doing there? | Anna plans on going sightseeing. | Anna plans on going sightseeing during her trip to Kyoto next month. |
Next Steps
Follow this tutorial to see more use case examples.
Troubleshooting Guide
Authentication Error
- Symptoms: Failure to authenticate to Anyscale Endpoints.
- Checklist:
- Make sure that your Anyscale account is active.
- Confirm that your API key is correct.
- Ensure that your API key has not been revoked.
- Ensure that you have not exceeded the API usage or rate limit.
SQL statement cannot be parsed by mindsdb_sql
- Symptoms: SQL queries failing or not recognizing table and model names containing spaces or special characters.
- Checklist:
- Ensure table names with spaces or special characters are enclosed in backticks, as shown in the sketch below.
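A minimal sketch of the difference, with an illustrative table name containing a space:

```sql
-- Incorrect: the unquoted table name contains a space.
SELECT * FROM files.travel data;

-- Correct: the table name is enclosed in backticks.
SELECT * FROM files.`travel data`;
```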