> ## Documentation Index
> Fetch the complete documentation index at: https://docs.mindsdb.com/llms.txt
> Use this file to discover all available pages before exploring further.

# How to Query Knowledge Bases

Knowledge Bases support two primary querying approaches: semantic search and metadata filtering, each of which offers different filtering capabilities, including filtering by the relevance score to ensure only data most relevant to the query is returned.

* **Semantic Search**

  Semantic search enables users to query Knowledge Bases using natural language. When searching semantically, you reference the content column in your SQL statement. MindsDB will interpret the input as a semantic query and use vector-based similarity to find relevant results.

  ```sql theme={null}
  SELECT * FROM my_kb
  WHERE content = 'what document types store reviews?';
  ```

  <Accordion title="Supported Filtering Operators">
    Only specific operators are allowed when filtering semantically using the content column.

    * Standard vector search: `content = ‘xxx’`, `content LIKE ‘xxx’`
    * Exclusions from search: `id != xxx`, `id <> xxx`, `content NOT LIKE ‘zzz’`
    * Nested queries: `id NOT IN (SELECT DISTINCT id FROM my_kb WHERE content = ‘xxx’)`
    * Multiple queries: `content IN (‘xxx’, ‘yyy’)` which is equivalent to `content = ‘xxx’ OR content = ‘yyy’`, `content NOT IN (‘zzz’, ‘aaa’)`
    * Logical operators: `content = ‘xxx’ OR content = ‘yyy’` which is a union of results for both conditions, `content = ‘xxx’ AND content = ‘yyy’` which is an intersection of results for both conditions
  </Accordion>

* **Metadata Filtering**

  It allows users to query Knowledge Bases based on the available metadata fields. These fields can be used in the `WHERE` clause of a SQL statement.

  ```sql theme={null}
  SELECT * FROM my_kb
  WHERE document_type = ‘cover letter’
  AND document_author = 'bot';
  ```

  <Accordion title="Supported Filtering Operators">
    You can apply a variety of filtering conditions to metadata columns, such as equality checks, range filters, or pattern matches.

    * Equality checks: `=`, `<>`, `!=`
    * Range filters: `>`, `<`, `>=`, `<=`, `BETWEEN ... AND ...`
    * Pattern matching: `LIKE`, `NOT LIKE`, `IN`, `NOT IN`
    * Logical operators: `AND`, `OR`, `NOT`
  </Accordion>

* **Relevance Filtering**

  Every semantic search result is assigned a relevance score, which indicates how closely a given entry matches your query. You can filter results by this score to ensure only the most relevant entries are returned.

  <Accordion title="Finetune Filtering using Relevance Score">
    Here is how to fine-tune the filtering of data.

    * Start by querying the knowledge base without a WHERE clause on the relevance column. This will show you a range of relevance scores returned by your query.

    * Determine a cutoff relevance value that fits your use case. For example, `relevance > 0.75`.

    * Re-run your query with the condition on `relevance` to restrict results to those above your chosen threshold. The results set contains only data with relevance greater than 0.75.

      ```sql theme={null}
      SELECT * FROM my_kb
      WHERE content = 'what document types store reviews?’
      AND relevance > 0.75;
      ```
  </Accordion>

<Tip>
  See more [examples here](/mindsdb_sql/knowledge_bases/query#examples).
</Tip>

## `SELECT FROM KB` Syntax

Knowledge bases provide an abstraction that enables users to see the stored data.

Note that here a sample knowledge base created and inserted into in the previous **Example** sections is searched.

```sql theme={null}
SELECT *
FROM my_kb;
```

Here is the sample output:

```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id  | chunk_id             | chunk_content           | metadata                                                                                                                                                                                            | product               | distance           | relevance          |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black    | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0}        | Wireless Mouse        | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish  | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested     | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0}     | Bluetooth Speaker     | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```

The following columns are stored in the knowledge base.

* `id`
  It stores values from the column defined in the `id_column` parameter when creating the knowledge base. These are the source data IDs.

* `chunk_id`
  Knowledge bases chunk the inserted data in order to fit the defined chunk size. If the chunking is performed, the following chunk ID format is used: `<id>:<chunk_number>of<total_chunks>:<start_char_number>to<end_char_number>`.

* `chunk_content`
  It stores values from the column(s) defined in the `content_columns` parameter when creating the knowledge base.

* `metadata`
  It stores the general metadata and the metadata defined in the `metadata_columns` parameter when creating the knowledge base.

* `distance`
  It stores the calculated distance between the chunk's content and the search phrase.

* `relevance`
  It stores the calculated relevance of the chunk as compared to the search phrase. Its values are between 0 and 1.

<Note>
  Note that the calculation method of `relevance` differs as follows:

  * When the ranking model is provided, the default `relevance` is equal or greater than 0, unless defined otherwise in the `WHERE` clause.
  * When the reranking model is not provided and the `relevance` is not defined in the query, then no relevance filtering is applied and the output includes all rows matched based on the similarity and metadata search.
  * When the reranking model is not provided but the `relevance` is defined in the query, then the relevance is calculated based on the `distance` column (`1/(1+ distance)`) and the `relevance` value is compared with this relevance value to filter the output.
</Note>

### Semantic Search

Users can query a knowledge base using semantic search by providing the search phrase (called `content`) to be searched for.

```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
```

Alternatively, users can filter by the `chunk_content` column of the knowledge base.

```sql theme={null}
SELECT *
FROM my_kb
WHERE chunk_content LIKE '%color%'
```

Here is the output:

```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id  | chunk_id             | chunk_content           | metadata                                                                                                                                                                                            | product               | distance           | relevance          |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black    | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0}        | Wireless Mouse        | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish  | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
| 3XZ | 3XZ_notes:1of1:0to19 | Gift wrap requested     | {"chunk_index":0,"content_column":"notes","end_char":19,"original_doc_id":"3XZ_notes","original_row_id":"3XZ","product":"Bluetooth Speaker","source":"TextChunkingPreprocessor","start_char":0}     | Bluetooth Speaker     | 0.8010851611432231 | 0.2500003885558766 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```

<Note>
  When querying a knowledge base, the default values include the following:

  * `relevance` <br />
    If not provided, its default value is equal to or greater than 0, ensuring there is no filtering of rows based on their relevance.

  * `LIMIT` <br />
    If not provided, its default value is 10, returning a maximum of 10 rows.
</Note>

<Tip>
  Note that when specifying both `relevance` and `LIMIT` as follows:

  ```sql theme={null}
  SELECT *
  FROM my_kb
  WHERE content = 'color'
  AND relevance >= 0.5
  LIMIT 20;
  ```

  The query extracts 20 rows (as defined in the `LIMIT` clause) that match the defined `content`. Next, these set of rows is filtered out to match the defined `relevance`.
</Tip>

Users can limit the `relevance` in order to get only the most relevant results.

```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
AND relevance >= 0.5;
```

Here is the output:

```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
| id  | chunk_id             | chunk_content          | metadata                                                                                                                                                                                     | product        | distance           | relevance          |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black   | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.5103766499957533 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+--------------------+
```

By providing the `relevance` filter, the output is limited to only data with relevance score of the provided value. The available values of `relevance` are between 0 and 1, and its default value covers all available relevance values ensuring no filtering based on the relevance score.

Users can limit the number of rows returned.

```sql theme={null}
SELECT *
FROM my_kb
WHERE content = 'color'
LIMIT 2;
```

Here is the output:

```sql theme={null}
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| id  | chunk_id             | chunk_content           | metadata                                                                                                                                                                                            | product               | distance           | relevance          |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black    | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0}        | Wireless Mouse        | 0.5743341242061104 | 0.5093188026135379 |
| Q7P | Q7P_notes:1of1:0to22 | Prefer aluminum finish  | {"chunk_index":0,"content_column":"notes","end_char":22,"original_doc_id":"Q7P_notes","original_row_id":"Q7P","product":"Aluminum Laptop Stand","source":"TextChunkingPreprocessor","start_char":0} | Aluminum Laptop Stand | 0.7744703514692067 | 0.2502580835880018 |
+-----+----------------------+-------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+--------------------+--------------------+
```

### Metadata Filtering

Besides semantic search features, knowledge bases enable users to filter the result set by the defined metadata.

```sql theme={null}
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse';
```

Here is the output:

```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
| id  | chunk_id             | chunk_content          | metadata                                                                                                                                                                                     | product        | relevance | distance |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
| A1B | A1B_notes:1of1:0to20 | Request color: black   | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | [NULL]    | [NULL]   |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+-----------+----------+
```

Note that when searching by metadata alone, the `relevance` column values are not calculated.

Users can do both, filter by metadata and search by content.

```sql theme={null}
SELECT *
FROM my_kb
WHERE product = 'Wireless Mouse'
AND content = 'color'
AND relevance >= 0.5;
```

Here is the output:

```sql theme={null}
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| id  | chunk_id             | chunk_content          | metadata                                                                                                                                                                                     | product        | distance           | relevance         |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
| A1B | A1B_notes:1of1:0to20 | Request color: black   | {"chunk_index":0,"content_column":"notes","end_char":20,"original_doc_id":"A1B_notes","original_row_id":"A1B","product":"Wireless Mouse","source":"TextChunkingPreprocessor","start_char":0} | Wireless Mouse | 0.5743341242061104 | 0.504396172197583 |
+-----+----------------------+------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+--------------------+-------------------+
```

## `JOIN` Syntax

Knowledge bases can be used in the standard SQL JOIN statements.

```sql theme={null}
SELECT t.order_id, t.product, t.notes, kb.chunk_content, kb.relevance
FROM local_postgres.orders AS t
JOIN my_kb AS kb
ON t.order_id = kb.id
WHERE t.order_id = 'A1B'
AND kb.content = 'color'
AND kb.product = 'Wireless Mouse';
```

Here is the output:

```sql theme={null}
+----------+------------------+------------------------+------------------------+--------------------+
| order_id | product          | notes                  | chunk_content          | relevance          |
+----------+------------------+------------------------+------------------------+--------------------+
| A1B      | Wireless Mouse   | Request color: black   | Request color: black   | 0.5106591666649376 |
+----------+------------------+------------------------+------------------------+--------------------+
```

## Examples

We have a knowledge base that stores data about movies.

```sql theme={null}
+----------+-----------------------------------+-------------------------------------------------------------------------+
| id       | content                           | metadata                                                                |
+----------+-----------------------------------+-------------------------------------------------------------------------+
| movie_id | "A bank security expert plots..." | {"genre":"Crime","rating":6.3,"expanded_genres":"Comedy, Crime, Drama"} |
+----------+-----------------------------------+-------------------------------------------------------------------------+
```

It uses the `movie_id` column to uniquely identify each entry. The `content` column stores the description of the movie, and the metadata includes `genre`, `rating`, and `expanded_genre` columns.

Let's see the query examples.

* Selecting high-rated action movies with heist themes and no romance.

  ```sql theme={null}
  SELECT * FROM movies_kb 
  WHERE content LIKE 'heist bank robbery space alien planet'
  AND genre != 'Romance' 
  AND expanded_genres NOT LIKE '%Romance%'
  AND rating > 7.0;
  ```

  This query includes a semantic search filtering condition - `content LIKE 'heist bank robbery space alien planet'` - and multiple metadata filtering conditions - `genre != 'Romance' AND expanded_genres NOT LIKE '%Romance%' AND rating > 7.0`.

* Selecting action-comedies with car chase scenes.

  ```sql theme={null}
  SELECT * FROM movies_kb 
  WHERE content LIKE 'car chase driving speed race'
  AND expanded_genres LIKE '%Action%'
  AND expanded_genres LIKE '%Comedy%'
  AND rating > 6.5;
  ```

  This query includes a semantic search filtering condition - `content LIKE 'car chase driving speed race'` - and multiple metadata filtering conditions - `expanded_genres LIKE '%Action%' AND expanded_genres LIKE '%Comedy%' AND rating > 6.5`.

* Selecting historical dramas without war themes.

  ```sql theme={null}
  SELECT * FROM movies_kb 
  WHERE content LIKE 'historical period past century era'
  AND content NOT LIKE 'war battle soldier military'
  AND content NOT LIKE 'fight combat weapon'
  AND expanded_genres LIKE '%Drama%'
  AND rating > 3.5;
  ```

  This query includes multiple semantic search filtering conditions - `content LIKE 'historical period past century era' AND content NOT LIKE 'war battle soldier military' AND content NOT LIKE 'fight combat weapon'` - and multiple metadata filtering conditions - `expanded_genres LIKE '%Drama%' AND rating > 3.5`.

* Selecting multi-genre movies with different ratings.

  ```sql theme={null}
  SELECT * FROM movies_kb 
  WHERE (content LIKE 'detective mystery investigation' AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%'))
  OR (content LIKE 'romance love relationship' AND (genre = 'Romance' OR expanded_genres LIKE '%Romance%'))
  AND rating > 7.0;
  ```

  This query includes nested semantic search filtering conditions - `(content LIKE 'detective mystery investigation' AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%'))` - and a metadata filtering condition - `rating > 7.0`.

* Selecting adventure movies excluding some genres.

  ```sql theme={null}
  SELECT * FROM movies_kb 
  WHERE content LIKE 'adventure journey quest treasure'
  AND genre NOT IN ('Horror', 'Romance', 'Family')
  AND rating > 6.5;
  ```

  This query includes multiple semantic search filtering conditions - `content LIKE 'adventure journey quest treasure'` - and multiple metadata filtering conditions - `genre NOT IN ('Horror', 'Romance', 'Family') AND rating > 6.5`.

* Selecting comedy movies in specific rating range.

  ```sql theme={null}
  SELECT * FROM movies_kb 
  WHERE content LIKE 'comedy funny humor laugh'
  AND rating BETWEEN 7.0 AND 9.0
  AND expanded_genres LIKE '%Comedy%';
  ```

  This query includes multiple semantic search filtering conditions - `content LIKE 'comedy funny humor laugh'` - and multiple metadata filtering conditions - `rating BETWEEN 7.0 AND 9.0 AND expanded_genres LIKE '%Comedy%'`.

* Selecting different thriller subgenres.

  ```sql theme={null}
  SELECT * FROM movies_kb 
  WHERE content LIKE 'detective investigation mystery' AND rating > 7.0
  UNION
  SELECT * FROM movies_kb 
  WHERE content LIKE 'heist robbery theft steal' AND rating > 7.0
  UNION
  SELECT * FROM movies_kb 
  WHERE content LIKE 'spy secret agent undercover' AND rating > 7.0;
  ```

  This query combines the results of three queries using the `UNION` operator.
