How to Evaluate Knowledge Bases
Evaluating a knowledge base verifies how accurate and relevant the data returned by the knowledge base is.
`EVALUATE KNOWLEDGE_BASE` Syntax
With the `EVALUATE KNOWLEDGE_BASE` command, users can evaluate the relevancy and accuracy of the documents and data returned by the knowledge base.
Below is the complete syntax that includes both required and optional parameters.
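The statement below is a sketch assembling all parameters documented in the sections that follow; it assumes the `USING`-clause form, `my_kb` and all data source and table names are placeholders, and the `llm` dictionary keys are assumptions discussed under the `llm` parameter below.

```sql
EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,    -- required: table holding the test data
    version = 'doc_id',                          -- optional: 'doc_id' (default) or 'llm_relevancy'
    generate_data = {                            -- optional: defaults to false (no data generated)
        'from_sql': 'SELECT id, content FROM my_datasource.my_table',
        'count': 100
    },
    evaluate = true,                             -- optional: defaults to true
    llm = {'provider': 'openai', 'model_name': 'gpt-4o'},  -- optional: used with 'llm_relevancy'
    save_to = my_datasource.my_result_table;     -- optional: table to save the results
```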
`test_table`
This is a required parameter that specifies the name of a table from one of the data sources connected to MindsDB. For example, `test_table = my_datasource.my_test_table` defines a table named `my_test_table` from a data source named `my_datasource`.
This test table stores test data, commonly in the form of questions and answers. Its content depends on the `version` parameter defined below.
Users can provide their own test data or have it generated by the `EVALUATE KNOWLEDGE_BASE` command, which happens when the `generate_data` parameter, defined below, is set.
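Once provided or generated, the test data can be inspected like any other table in a connected data source; for example (names are placeholders):

```sql
-- Preview the test data that the evaluator will use
SELECT * FROM my_datasource.my_test_table
LIMIT 10;
```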
`version`
This is an optional parameter that defines the version of the evaluator. If not defined, its default value is `doc_id`.
- `version = 'doc_id'`: The evaluator checks whether the document ID returned by the knowledge base matches the expected document ID as defined in the test table.
- `version = 'llm_relevancy'`: The evaluator uses a language model to rank and evaluate responses from the knowledge base, as shown in the example below.
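For example, to run the LLM-based evaluator instead of the default document-ID check, a minimal sketch (assuming the `USING`-clause syntax shown above; names are placeholders) looks like this:

```sql
EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    version = 'llm_relevancy';
```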
`generate_data`
This is an optional parameter that configures test data generation; the generated data is saved into the table defined in the `test_table` parameter. If not defined, its default value is `false`, meaning that no test data is generated.
Available values are as follows:
- A dictionary containing the following values (see the example after this list):
  - `from_sql` defines the SQL query that fetches the test data. For example, `'from_sql': 'SELECT id, content FROM my_datasource.my_table'`. If not defined, test data is fetched from the knowledge base on which the `EVALUATE` command is executed: `SELECT chunk_content, id FROM my_kb`.
  - `count` defines the size of the test dataset. For example, `'count': 100`. Its default value is 20.
When providing the `from_sql` parameter, the query must return specific column names, as follows:
- With `version = 'doc_id'`, the `from_sql` parameter should contain a query that returns the `id` and `content` columns, like this: `'from_sql': 'SELECT id_column_name AS id, content_column_name AS content FROM my_datasource.my_table'`.
- With `version = 'llm_relevancy'`, the `from_sql` parameter should contain a query that returns the `content` column, like this: `'from_sql': 'SELECT content_column_name AS content FROM my_datasource.my_table'`.
- A value of `true`, such as `generate_data = true`, which implies that default values for `from_sql` and `count` will be used.
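A sketch of generating a custom test dataset via the dictionary form (all table and column names are placeholders):

```sql
EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    version = 'doc_id',
    -- with version = 'doc_id', from_sql must return id and content columns
    generate_data = {
        'from_sql': 'SELECT id_column_name AS id, content_column_name AS content FROM my_datasource.my_table',
        'count': 100
    };
```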
`evaluate`
This is an optional parameter that defines whether to evaluate the knowledge base. If not defined, its default value is `true`.
Users can set it to `false`, as in `evaluate = false`, in order to generate test data into the test table without running the evaluator.
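A minimal sketch of such a generation-only run (names are placeholders):

```sql
EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    generate_data = true,   -- use the default from_sql query and count
    evaluate = false;       -- write test data to the test table only; skip evaluation
```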
`llm`
This is an optional parameter that defines the language model used for evaluations when `version` is set to `llm_relevancy`.
If not defined, it defaults to the `reranking_model` defined with the knowledge base.
Users can define it in the `EVALUATE KNOWLEDGE_BASE` command in the same manner as the `reranking_model` is defined when creating a knowledge base.
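A minimal sketch, assuming the `llm` dictionary takes `provider` and `model_name` keys mirroring the `reranking_model` definition (check the knowledge base configuration docs for the exact keys supported):

```sql
EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    version = 'llm_relevancy',
    llm = {
        'provider': 'openai',     -- assumed key, mirroring reranking_model
        'model_name': 'gpt-4o'    -- assumed key; any supported model
    };
```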
`save_to`
This is an optional parameter that specifies the name of a table from one of the data sources connected to MindsDB, into which the evaluation results are saved. For example, `save_to = my_datasource.my_result_table` defines a table named `my_result_table` from the data source named `my_datasource`. If not defined, the results are not saved into a table.
By default, evaluation results are returned after executing the `EVALUATE KNOWLEDGE_BASE` statement.
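For example, to both return and persist the results (names are placeholders):

```sql
EVALUATE KNOWLEDGE_BASE my_kb
USING
    test_table = my_datasource.my_test_table,
    version = 'doc_id',
    save_to = my_datasource.my_result_table;

-- The saved results can then be queried like any other table
SELECT * FROM my_datasource.my_result_table;
```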
Evaluation Results
When using `version = 'doc_id'`, the following columns are included in the evaluation results:
- `total` stores the total number of questions.
- `total_found` stores the number of questions to which the knowledge base provided correct answers.
- `retrieved_in_top_10` stores the number of questions for which the correct answer was retrieved within the top 10 results.
- `cumulative_recall` stores cumulative recall data that can be used to create a chart.
- `avg_query_time` stores the average execution time of a knowledge base search query.
- `name` stores the knowledge base name.
- `created_at` stores the timestamp when the evaluation was created.
When using `version = 'llm_relevancy'`, the following columns are included in the evaluation results:
- `avg_relevancy` stores the average relevancy.
- `avg_relevance_score_by_k` stores the average relevancy at k.
- `avg_first_relevant_position` stores the average first relevant position.
- `mean_mrr` stores the Mean Reciprocal Rank (MRR).
- `hit_at_k` stores the Hit@k value.
- `bin_precision_at_k` stores the Binary Precision@k.
- `avg_entropy` stores the average relevance score entropy.
- `avg_ndcg` stores the average nDCG.
- `avg_query_time` stores the average execution time of a knowledge base search query.
- `name` stores the knowledge base name.
- `created_at` stores the timestamp when the evaluation was created.