Follow this blog post for a comprehensive tutorial on how to fine-tune a Mistral 7B model.

All Anyscale models belong to the group of Large Language Models (LLMs).

These are some of the supported models:

  • Mistral7B
  • Llama-2-7b
  • Llama-2-13b
  • Llama-2-70b
  • Code Llama

Let’s create a model to answer questions about MindsDB’s custom SQL syntax.

First, create an AnyScale engine, passing your Anyscale API key:

CREATE ML_ENGINE anyscale_engine
FROM anyscale_endpoints
USING
    anyscale_endpoints_api_key = 'your-anyscale-api-key';

Then, create a model using this engine:

CREATE MODEL mymistral7b
PREDICT completion
USING
    engine = 'anyscale_engine',
    model_name = 'mistralai/Mistral-7B-Instruct-v0.1',
    prompt_template = 'Return a valid SQL string for the following question about MindsDB in-database machine learning: {{prompt}}';

You can check model status with this command:

DESCRIBE mymistral7b;

Once the status is complete, we can query for predictions:

SELECT prompt, completion
FROM mymistral7b as m
WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'
USING max_tokens=400;

On execution, we get:

+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| prompt                                                                                            | completion                                                                                           |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | The SQL syntax is: SELECT * FROM input_data INNER JOIN predictions ON input_data.id = predictions.id |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+

If you followed one of the MindsDB tutorials before, you’ll see that the syntax provided by the model is not exactly as expected.

Now, we’ll fine-tune our model using a table that stores details about MindsDB’s custom SQL syntax.

Let’s connect to a DB that hosts a table we’ll use to fine-tune our model:

CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "samples.mindsdb.com",
    "port": "5432",
    "database": "demo",
    "schema": "demo_data"
    };

Now we can take a look at the fine-tuning data:

SELECT message_id, role, content
FROM example_db.chat_llm_mindsdb_docs
LIMIT 5;

And here are the first few rows:

message_idrolecontent
0systemYou are a helpful assistant. Your task is to answer a user’s question regarding the SQL syntax supported by MindsDB, a machine learning product for training models and seamlessly deploying them where your data lives.
1userIn the context of MindsDB: 1. Testing CREATE DATABASE
2assistantCREATE DATABASE example_db WITH ENGINE = "postgres", PARAMETERS = { "user": "demo_user", "password": "demo_password", "host": "samples.mindsdb.com", ... };
Output:
status
------
Query successfully completed
3systemYou are a helpful assistant. Your task is to answer a user’s question regarding the SQL syntax supported by MindsDB, a machine learning product for…
4userIn the context of MindsDB: 2. Testing Preview the Available Data Using SELECT

Notice it is formatted as a series of chats that conform to the standard OpenAI chat format. Every message has a “role” and some “content”. By chaining together a series of messages, we can create a conversation.

Now, you can fine-tune a Mistral model with this data like so:

FINETUNE mymistral7b
FROM example_db
    (SELECT * FROM chat_llm_mindsdb_docs);

The FINETUNE command creates a new version of the mistralai/Mistral-7B-Instruct-v0.1 model. You can query all available versions as below:

SELECT *
FROM models_versions
WHERE name = 'mymistral7b';

While the model is being generated and trained, it is not active. The model becomes active only after it completes generating and training.

Once the new version status is complete and active, we can query the model again, expecting a more accurate output.

SELECT prompt, completion
FROM mymistral7b as m
WHERE prompt = 'What is the SQL syntax to join input data with predictions from a MindsDB machine learning model?'
USING max_tokens=400;

On execution, we get:

+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| prompt                                                                                            | completion                                                                                           |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| What is the SQL syntax to join input data with predictions from a MindsDB machine learning model? | SELECT * FROM mindsdb.models.my_model JOIN mindsdb.input_data_name;                                  |
+---------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+

If you have dynamic data that gets updated regularly, you can set up an automated fine-tuning as below.

Note that the data source must contain an incremental column, such as timestamp or integer, so MindsDB can pick up only the recently added data.

Create a view to store recently added data with the help of the LAST keyword:

CREATE VIEW recent_data (
    SELECT *
    FROM example_db.chat_llm_mindsdb_docs
    WHERE timestamp > LAST
);

Create a job to fine-tune the model periodically.

CREATE JOB automated_finetuning (

    FINETUNE mymistral7b
    FROM mindsdb
        (SELECT * FROM recent_data)
)
EVERY 1 day;

Now your model will be fine-tuned with newly added data every day.