Supported Filtering Operators
content = 'xxx', content LIKE 'xxx'
id != xxx, id <> xxx, content NOT LIKE 'zzz'
id NOT IN (SELECT DISTINCT id FROM my_kb WHERE content = 'xxx')
content IN ('xxx', 'yyy'), which is equivalent to content = 'xxx' OR content = 'yyy'
content NOT IN ('zzz', 'aaa')
content = 'xxx' OR content = 'yyy', which is a union of results for both conditions
content = 'xxx' AND content = 'yyy', which is an intersection of results for both conditions
WHERE clause of a SQL statement.
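The union and intersection behavior of OR and AND on content conditions can be pictured with plain Python sets. This is only an illustrative sketch: the row ids and the helper names (combine_or, combine_and, exclude) are hypothetical and are not part of any knowledge base API.

```python
from typing import Set

# Hypothetical ids of rows matched by two separate content searches.
hits_xxx: Set[int] = {1, 2, 3}   # rows matching content = 'xxx'
hits_yyy: Set[int] = {3, 4}      # rows matching content = 'yyy'
all_ids: Set[int] = {1, 2, 3, 4, 5}


def combine_or(a: Set[int], b: Set[int]) -> Set[int]:
    """content = 'xxx' OR content = 'yyy': union of both result sets."""
    return a | b


def combine_and(a: Set[int], b: Set[int]) -> Set[int]:
    """content = 'xxx' AND content = 'yyy': intersection of both result sets."""
    return a & b


def exclude(universe: Set[int], hits: Set[int]) -> Set[int]:
    """id NOT IN (SELECT DISTINCT id ... WHERE content = 'xxx'): set difference."""
    return universe - hits


either = combine_or(hits_xxx, hits_yyy)
both = combine_and(hits_xxx, hits_yyy)
without_xxx = exclude(all_ids, hits_xxx)
```

Under this picture, content IN ('xxx', 'yyy') yields the same rows as combine_or, since it is equivalent to the OR form.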
Supported Filtering Operators
=, <>, !=
>, <, >=, <=, BETWEEN ... AND ...
LIKE, NOT LIKE, IN, NOT IN
AND, OR, NOT
Finetune Filtering using Relevance Score
relevance > 0.75.
relevance to restrict results to those above your chosen threshold. The result set contains only data with relevance greater than 0.75.
SELECT FROM KB Syntax
id
It stores values from the column defined in the id_column parameter when creating the knowledge base. These are the source data IDs.
chunk_id
Knowledge bases chunk the inserted data in order to fit the defined chunk size. If the chunking is performed, the following chunk ID format is used: <id>:<chunk_number>of<total_chunks>:<start_char_number>to<end_char_number>.
chunk_content
It stores values from the column(s) defined in the content_columns parameter when creating the knowledge base.
metadata
It stores the general metadata and the metadata defined in the metadata_columns parameter when creating the knowledge base.
distance
It stores the calculated distance between the chunk’s content and the search phrase.
relevance
It stores the calculated relevance of the chunk as compared to the search phrase. Its values are between 0 and 1.
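The chunk ID format described above can be split back into its parts on the client side. The following is a minimal Python sketch, assuming the ID follows the documented pattern exactly; the ChunkId type, the parse_chunk_id helper, and the sample value are hypothetical and only illustrate the format.

```python
import re
from typing import NamedTuple


class ChunkId(NamedTuple):
    source_id: str      # value of the id column of the source row
    chunk_number: int   # position of this chunk
    total_chunks: int   # number of chunks the source row was split into
    start_char: int     # start character offset of the chunk
    end_char: int       # end character offset of the chunk


# <id>:<chunk_number>of<total_chunks>:<start_char_number>to<end_char_number>
# The id part is matched greedily so that ids containing ':' still parse.
_CHUNK_ID = re.compile(
    r"^(?P<id>.+):(?P<num>\d+)of(?P<total>\d+):(?P<start>\d+)to(?P<end>\d+)$"
)


def parse_chunk_id(chunk_id: str) -> ChunkId:
    """Split a chunk_id string into its documented components."""
    match = _CHUNK_ID.match(chunk_id)
    if match is None:
        raise ValueError(f"not a chunked id: {chunk_id!r}")
    return ChunkId(
        source_id=match["id"],
        chunk_number=int(match["num"]),
        total_chunks=int(match["total"]),
        start_char=int(match["start"]),
        end_char=int(match["end"]),
    )


# Hypothetical sample value, not taken from a real knowledge base.
parsed = parse_chunk_id("42:2of3:100to199")
```

Grouping rows by source_id and ordering them by chunk_number is one way to reassemble the chunks of a single source row.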
relevance differs as follows:
relevance is equal or greater than 0, unless defined otherwise in the WHERE clause.
relevance is not defined in the query, then no relevance filtering is applied and the output includes all rows matched based on the similarity and metadata search.
relevance is defined in the query, then the relevance is calculated based on the distance column (1/(1 + distance)) and the relevance value is compared with this relevance value to filter the output.
content) to be searched for.
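The distance-based relevance calculation mentioned above, 1/(1 + distance), can be sketched in Python as follows. Only the formula comes from the text. The function names, the sample rows, and the use of a greater-than-or-equal comparison against the threshold are assumptions made for illustration.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def relevance_from_distance(distance: float) -> float:
    """Map a non-negative distance to a relevance in (0, 1] via 1 / (1 + distance)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


def filter_by_relevance(rows: List[Row], threshold: float) -> List[Row]:
    """Keep rows whose computed relevance meets the threshold.

    Assumption: the comparison is >=; the query itself may use a different
    operator in its WHERE clause.
    """
    kept: List[Row] = []
    for row in rows:
        relevance = relevance_from_distance(float(row["distance"]))
        if relevance >= threshold:
            kept.append({**row, "relevance": relevance})
    return kept


# Hypothetical rows with made-up distances.
rows: List[Row] = [
    {"chunk_id": "a", "distance": 0.0},
    {"chunk_id": "b", "distance": 1.0},
    {"chunk_id": "c", "distance": 3.0},
]
kept = filter_by_relevance(rows, 0.5)
```

A distance of 0 maps to a relevance of 1, and larger distances approach 0, which matches the stated 0 to 1 range of the relevance column.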
relevance
LIMIT
relevance and LIMIT as follows:
LIMIT clause) that match the defined content. Next, this set of rows is filtered to match the defined relevance.
relevance in order to get only the most relevant results.
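The two-step order described above (rows are first capped by LIMIT, and the capped set is then filtered by relevance) can be sketched in Python. This is a sketch under stated assumptions: the helper name limit_then_filter is hypothetical, the rows are made up, and ordering the candidates by descending relevance before applying the limit is an assumption, not something the text states.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def limit_then_filter(rows: List[Row], limit: int, min_relevance: float) -> List[Row]:
    """Apply the limit first, then drop rows below the relevance threshold."""
    # Step 1: keep at most `limit` rows (assumed: best matches first).
    ranked = sorted(rows, key=lambda r: float(r["relevance"]), reverse=True)
    capped = ranked[:limit]
    # Step 2: filter the capped set by relevance.
    return [r for r in capped if float(r["relevance"]) >= min_relevance]


# Hypothetical rows with made-up relevance scores.
rows: List[Row] = [
    {"chunk_id": "a", "relevance": 0.9},
    {"chunk_id": "b", "relevance": 0.6},
    {"chunk_id": "c", "relevance": 0.8},
    {"chunk_id": "d", "relevance": 0.4},
]
result = limit_then_filter(rows, limit=3, min_relevance=0.7)
```

Because the relevance filter runs on the already capped set, the output can contain fewer rows than the limit, as in this example where a limit of 3 yields 2 rows.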
relevance filter, the output is limited to only data with relevance score of the provided value. The available values of relevance are between 0 and 1, and its default value covers all available relevance values, ensuring no filtering based on the relevance score.
Users can limit the number of rows returned.
relevance column values are not calculated.
Users can do both: filter by metadata and search by content.
JOIN Syntax
movie_id column to uniquely identify each entry. The content column stores the description of the movie, and the metadata includes genre, rating, and expanded_genre columns.
Let’s see the query examples.
content LIKE 'heist bank robbery space alien planet' - and multiple metadata filtering conditions - genre != 'Romance' AND expanded_genres NOT LIKE '%Romance%' AND rating > 7.0.
content LIKE 'car chase driving speed race' - and multiple metadata filtering conditions - expanded_genres LIKE '%Action%' AND expanded_genres LIKE '%Comedy%' AND rating > 6.5.
content LIKE 'historical period past century era' AND content NOT LIKE 'war battle soldier military' AND content NOT LIKE 'fight combat weapon' - and multiple metadata filtering conditions - expanded_genres LIKE '%Drama%' AND rating > 3.5.
(content LIKE 'detective mystery investigation' AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%')) - and a metadata filtering condition - rating > 7.0.
content LIKE 'adventure journey quest treasure' - and multiple metadata filtering conditions - genre NOT IN ('Horror', 'Romance', 'Family') AND rating > 6.5.
content LIKE 'comedy funny humor laugh' - and multiple metadata filtering conditions - rating BETWEEN 7.0 AND 9.0 AND expanded_genres LIKE '%Comedy%'.
UNION operator.