EVALUATE KNOWLEDGE_BASE
Syntax

With the EVALUATE KNOWLEDGE_BASE command, users can evaluate the relevancy and accuracy of the documents and data returned by the knowledge base.

Below is the complete syntax that includes both required and optional parameters.
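A minimal sketch of the statement shape, assuming the parameters described below are passed in a USING clause; my_kb, my_datasource, and the table names are placeholders, and the llm object keys are illustrative assumptions rather than confirmed names:

    EVALUATE KNOWLEDGE_BASE my_kb
    USING
        test_table = my_datasource.my_test_table,              -- table holding the test data
        version = 'doc_id',                                     -- 'doc_id' or 'llm_relevancy'
        generate_data = {
            'from_sql': 'SELECT id, content FROM my_datasource.my_table',
            'count': 100
        },                                                      -- optional: generate test data
        evaluate = true,                                        -- optional: run the evaluator
        llm = {'provider': 'openai', 'model_name': 'gpt-4o'},   -- optional: keys assumed, used with 'llm_relevancy'
        save_to = my_datasource.my_result_table;                -- optional: persist the results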
test_table

test_table = my_datasource.my_test_table defines a table named my_test_table from a data source named my_datasource.

This test table stores the test data, commonly in the form of questions and answers. Its content depends on the version parameter defined below.

Users can provide their own test data or have it generated by the EVALUATE KNOWLEDGE_BASE command, which happens when the generate_data parameter (defined below) is set.
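For example, a run that uses an existing test table and leaves all other parameters at their defaults could look like this (a sketch; all object names are placeholders):

    EVALUATE KNOWLEDGE_BASE my_kb
    USING
        test_table = my_datasource.my_test_table;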
version

This parameter defines the evaluator version. If not defined, its default value is doc_id.

version = 'doc_id'
The evaluator checks whether the document ID returned by the knowledge base matches the expected document ID as defined in the test table.

version = 'llm_relevancy'
The evaluator uses a language model to rank and evaluate responses from the knowledge base.
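For instance, switching to the LLM-based evaluator might look like this (a sketch with placeholder names):

    EVALUATE KNOWLEDGE_BASE my_kb
    USING
        test_table = my_datasource.my_test_table,
        version = 'llm_relevancy';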
generate_data

This parameter generates test data into the table defined in the test_table parameter. If not defined, its default value is false, meaning that no test data is generated.

Available values are as follows:

from_sql defines the SQL query that fetches the test data. For example, 'from_sql': 'SELECT id, content FROM my_datasource.my_table'. If not defined, test data is fetched from the knowledge base on which the EVALUATE KNOWLEDGE_BASE command is executed, using SELECT chunk_content, id FROM my_kb.

count defines the size of the test dataset. For example, 'count': 100. Its default value is 20.

Note that the from_sql parameter requires specific column names, as follows:

With version = 'doc_id', the from_sql parameter should contain a query that returns the id and content columns, like this: 'from_sql': 'SELECT id_column_name AS id, content_column_names AS content FROM my_datasource.my_table'.

With version = 'llm_relevancy', the from_sql parameter should contain a query that returns the content column, like this: 'from_sql': 'SELECT content_column_names AS content FROM my_datasource.my_table'.

Alternatively, this parameter can be set to true, as in generate_data = true, which implies that default values for from_sql and count will be used.
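Putting this together, generating 100 test rows for the doc_id evaluator from a custom query might look like the sketch below; the column aliases follow the requirement above, and the braces syntax for the generate_data object is assumed from the 'from_sql' and 'count' examples:

    EVALUATE KNOWLEDGE_BASE my_kb
    USING
        test_table = my_datasource.my_test_table,
        version = 'doc_id',
        generate_data = {
            'from_sql': 'SELECT id_column_name AS id, content_column_names AS content FROM my_datasource.my_table',
            'count': 100
        };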
evaluate

This parameter defines whether the evaluation is run. If not defined, its default value is true.

Users can opt to set it to false, as in evaluate = false, in order to generate test data into the test table without running the evaluator.
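For example, to only populate the test table without evaluating (a sketch with placeholder names):

    EVALUATE KNOWLEDGE_BASE my_kb
    USING
        test_table = my_datasource.my_test_table,
        generate_data = true,
        evaluate = false;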
llm

This parameter defines the language model used when version is set to llm_relevancy.

If not defined, its default value is the reranking_model defined with the knowledge base. Users can define it with the EVALUATE KNOWLEDGE_BASE command in the same manner.
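A sketch of overriding the model for the LLM-based evaluation; the llm object keys ('provider', 'model_name') are assumptions here and should follow whatever format is used when defining the reranking_model for the knowledge base:

    EVALUATE KNOWLEDGE_BASE my_kb
    USING
        test_table = my_datasource.my_test_table,
        version = 'llm_relevancy',
        llm = {'provider': 'openai', 'model_name': 'gpt-4o'};   -- keys assumed for illustration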
save_to

save_to = my_datasource.my_result_table defines a table named my_result_table from the data source named my_datasource. If not defined, the results are not saved into a table.

This table is used to save the evaluation results. By default, evaluation results are returned after executing the EVALUATE KNOWLEDGE_BASE statement.
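For example, persisting the results and reading them back later (a sketch; table names are placeholders):

    EVALUATE KNOWLEDGE_BASE my_kb
    USING
        test_table = my_datasource.my_test_table,
        save_to = my_datasource.my_result_table;

    SELECT * FROM my_datasource.my_result_table;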
With version = 'doc_id', the following columns are included in the evaluation results:

total stores the total number of questions.
total_found stores the number of questions to which the knowledge base provided correct answers.
retrieved_in_top_10 stores the number of questions for which the knowledge base returned the correct answer within the top 10 results.
cumulative_recall stores cumulative recall values that can be used to create a chart.
avg_query_time stores the average execution time of a knowledge base search query.
name stores the knowledge base name.
created_at stores the timestamp when the evaluation was created.

With version = 'llm_relevancy', the following columns are included in the evaluation results:
avg_relevancy stores the average relevancy.
avg_relevance_score_by_k stores the average relevance score at k.
avg_first_relevant_position stores the average first relevant position.
mean_mrr stores the Mean Reciprocal Rank (MRR).
hit_at_k stores the Hit@k value.
bin_precision_at_k stores the Binary Precision@k.
avg_entropy stores the average relevance score entropy.
avg_ndcg stores the average nDCG.
avg_query_time stores the average execution time of a knowledge base search query.
name stores the knowledge base name.
created_at stores the timestamp when the evaluation was created.
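Assuming the results were saved with save_to, individual metrics can be queried from the result table, for example:

    SELECT name, avg_relevancy, mean_mrr, avg_ndcg, avg_query_time
    FROM my_datasource.my_result_table;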