Evaluating a knowledge base verifies the accuracy and relevancy of the data returned by the knowledge base.

EVALUATE KNOWLEDGE_BASE Syntax

With the EVALUATE KNOWLEDGE_BASE command, users can evaluate the relevancy and accuracy of the documents and data returned by the knowledge base.

Below is the complete syntax that includes both required and optional parameters.

EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    version = 'doc_id',
    generate_data = {
        'from_sql': 'SELECT id, content FROM my_datasource.my_table',
        'count': 100
    }, 
    evaluate = false,
    llm = {
        'provider': 'openai',
        'api_key':'sk-xxx',
        'model_name':'gpt-4'
    },
    save_to = my_datasource.my_result_table; 

test_table

This is a required parameter that specifies a table from one of the data sources connected to MindsDB. For example, test_table = my_datasource.my_test_table defines a table named my_test_table in a data source named my_datasource.

This test table stores test data, commonly in the form of questions and answers. Its content depends on the version parameter defined below.

Users can provide their own test data or have it generated by the EVALUATE KNOWLEDGE_BASE command by setting the generate_data parameter, defined below.
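
For illustration, the test table can be queried like any other table once it is populated. The column layout depends on the version parameter described below, so this is only a sketch:

SELECT * FROM my_datasource.my_test_table LIMIT 5;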

version

This is an optional parameter that defines the version of the evaluator. If not defined, its default value is doc_id.

  • version = 'doc_id': The evaluator checks whether the document ID returned by the knowledge base matches the expected document ID as defined in the test table.

  • version = 'llm_relevancy': The evaluator uses a language model to rank and evaluate responses from the knowledge base.
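
For example, here is a minimal sketch of an LLM-based evaluation, assuming my_datasource.my_test_table already holds test data in the expected format:

EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    version = 'llm_relevancy';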

generate_data

This is an optional parameter that configures test data generation; the generated data is saved into the table defined in the test_table parameter. If not defined, its default value is false, meaning that no test data is generated.

Available values are as follows:

  • A dictionary containing the following values:

    • from_sql defines the SQL query that fetches the test data. For example, 'from_sql': 'SELECT id, content FROM my_datasource.my_table'. If not defined, it fetches test data from the knowledge base on which the EVALUATE command is executed: SELECT chunk_content, id FROM my_kb.
    • count defines the size of the test dataset. For example, 'count': 100. Its default value is 20.

    When provided, the from_sql parameter requires specific column names, depending on the version parameter:

    • With version = 'doc_id', the from_sql parameter should contain a query that returns the id and content columns, like this: 'from_sql': 'SELECT id_column_name AS id, content_column_name AS content FROM my_datasource.my_table'

    • With version = 'llm_relevancy', the from_sql parameter should contain a query that returns the content column, like this: 'from_sql': 'SELECT content_column_name AS content FROM my_datasource.my_table'

  • A value of true, as in generate_data = true, which means that the default values for from_sql and count are used (see the example below).
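
For example, this shorthand form generates a default-sized test set of 20 rows from the knowledge base itself and then runs the evaluation, since the evaluate parameter defaults to true:

EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    generate_data = true;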

evaluate

This is an optional parameter that defines whether to evaluate the knowledge base. If not defined, its default value is true.

Users can set it to false, as in evaluate = false, to generate test data into the test table without running the evaluator.
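
For example, this sketch generates 50 test rows from a source table and skips the evaluation step, leaving the generated data in the test table for later runs:

EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    generate_data = {
        'from_sql': 'SELECT id, content FROM my_datasource.my_table',
        'count': 50
    },
    evaluate = false;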

llm

This is an optional parameter that defines the language model used for evaluation when version is set to llm_relevancy.

If not defined, its default value is the reranking_model defined with the knowledge base.

Users can define it within the EVALUATE KNOWLEDGE_BASE command as follows:

EVALUATE KNOWLEDGE_BASE my_kb
USING
    ...
    llm = {
        "provider": "azure_openai",
        "model_name" : "gpt-4o",
        "api_key": "sk-abc123",
        "base_url": "https://ai-6689.openai.azure.com/",
        "api_version": "2024-02-01",
        "method": "multi-class"
    },
    ...

save_to

This is an optional parameter that specifies a table, from one of the data sources connected to MindsDB, into which the evaluation results are saved. For example, save_to = my_datasource.my_result_table defines a table named my_result_table in the data source named my_datasource. If not defined, the results are not saved into a table.

By default, evaluation results are returned after executing the EVALUATE KNOWLEDGE_BASE statement.
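
For example, this sketch runs the evaluation, saves the results, and then reads them back; all table names reuse the placeholders from the syntax above:

EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    save_to = my_datasource.my_result_table;

SELECT * FROM my_datasource.my_result_table;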

Evaluation Results

When using version = 'doc_id', the following columns are included in the evaluation results:

  • total stores the total number of questions.
  • total_found stores the number of questions to which the knowledge base provided correct answers.
  • retrieved_in_top_10 stores the number of questions for which the correct answer was retrieved within the top 10 results.
  • cumulative_recall stores cumulative recall values that can be used to create a chart.
  • avg_query_time stores the average execution time of a knowledge base search query.
  • name stores the knowledge base name.
  • created_at stores the timestamp when the evaluation was created.

When using version = 'llm_relevancy', the following columns are included in the evaluation results:

  • avg_relevancy stores the average relevancy.
  • avg_relevance_score_by_k stores the average relevancy at k.
  • avg_first_relevant_position stores the average first relevant position.
  • mean_mrr stores the Mean Reciprocal Rank (MRR).
  • hit_at_k stores the Hit@k value.
  • bin_precision_at_k stores the Binary Precision@k.
  • avg_entropy stores the average relevance score entropy.
  • avg_ndcg stores the average nDCG.
  • avg_query_time stores the average execution time of a knowledge base search query.
  • name stores the knowledge base name.
  • created_at stores the timestamp when the evaluation was created.