INSERT INTO Syntax

Every `INSERT INTO` statement is logged in the `information_schema.queries` table with the timestamp when it was run, and can be queried as follows:
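(A minimal sketch; the exact set of columns returned by `information_schema.queries` may vary.)

```sql
-- List logged queries, including INSERT INTO statements and their timestamps.
SELECT *
FROM information_schema.queries;
```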
To speed up data insertion by skipping the upsert logic, users can run the `INSERT INTO` command with the `kb_no_upsert` flag. It is recommended to use this flag only when the knowledge base is empty, that is, only for the initial data insertion.
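A sketch of how the flag might be passed, assuming it goes in the `USING` clause like the other insert parameters; the knowledge base name and source table are placeholders:

```sql
-- Initial bulk load into an empty knowledge base, skipping upsert checks.
-- Passing kb_no_upsert via USING is an assumption.
INSERT INTO my_kb
SELECT id, review AS content
FROM my_datasource.reviews
USING kb_no_upsert = true;
```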
No `id_column` defined
When users do not define the id_column
during the creation of a knowledge base, MindsDB generates the ID for each row using a hash of the content columns, as explained here.
Example:
If two rows have exactly the same content in the content columns, their hash (and thus their generated ID) will be the same.
Note that duplicate rows are skipped and not inserted.
Since both rows in the below table have the same content, only one row will be inserted.
| name  | age |
|-------|-----|
| Alice | 25  |
| Alice | 25  |
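For instance, inserting both rows from a source table results in a single stored entry, because both rows hash to the same generated ID. The names `my_kb` and `my_datasource.people` are placeholders, and the sketch assumes `name` and `age` are configured as the knowledge base's content columns:

```sql
-- Both source rows have identical content, so they produce the same hashed ID
-- and only one entry is stored in the knowledge base.
INSERT INTO my_kb
SELECT name, age
FROM my_datasource.people;
```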
`id_column` defined
When users define the `id_column` during the creation of a knowledge base, the knowledge base uses that column's values as the row ID.
Example:
If the id_column
has duplicate values, the knowledge base skips the duplicate row(s) during the insert.
The second row in the below table has the same id
as the first row, so only one of these rows is inserted.
| id | name  | age |
|----|-------|-----|
| 1  | Alice | 25  |
| 1  | Bob   | 30  |
Ensure that the `id_column` uniquely identifies each row to avoid unintentional data loss due to duplicate ID skipping. Because rows are upserted by ID, inserting a row with an existing ID updates its stored content; for example, changing a product name from `Laptop Stand` to `Aluminum Laptop Stand`.
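A sketch of such an update; the knowledge base name, column names, and ID value are placeholders:

```sql
-- Re-inserting a row with an existing ID updates its stored content
-- instead of creating a duplicate entry.
INSERT INTO my_kb (id, product_name)
VALUES (123, 'Aluminum Laptop Stand');
```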
- `batch_size` defines the number of rows fetched per iteration to optimize data extraction from the source. It defaults to 1000.
- `threads` defines the threads used for running partitions. Note that if the ML task queue is enabled, threads are used automatically. The available values for `threads` are: a specific number of threads, such as `threads = 10`; `threads = true` to enable threads; or `threads = false` to disable threads.
- `track_column` defines the column used for sorting data before partitioning.
- `error` defines the error processing options. The available values include `raise`, used to raise errors as they come, and `skip`, used to suppress errors. It defaults to `raise` if not provided.
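A sketch of an insert that sets these parameters, assuming they are passed in the `USING` clause; the knowledge base and source table names are placeholders:

```sql
-- Insert data in batches of 500 rows, partitioned over 10 threads,
-- sorted by the id column before partitioning, skipping rows that error out.
INSERT INTO my_kb
SELECT id, review AS content
FROM my_datasource.reviews
USING
    batch_size = 500,
    threads = 10,
    track_column = id,
    error = 'skip';
```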
When running the `INSERT INTO` statement with the above parameters, users can view the data insertion progress by querying the `information_schema.queries` table.
Each chunk is identified by a chunk ID in the format `<id>:<chunk_number>of<total_chunks>:<start_char_number>to<end_char_number>`.

The `chunk_size` parameter defines the size of a chunk as the number of characters, and the `chunk_overlap` parameter defines the number of characters that should overlap between subsequent chunks.
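For illustration, a source row with ID `A1` split into three chunks of 2,000 characters would yield chunk IDs along the lines of `A1:1of3:0to1999`. A sketch of how stored chunks might be inspected; the column names `chunk_id` and `chunk_content` are assumptions and may differ:

```sql
-- Inspect stored chunks and their generated chunk IDs.
SELECT chunk_id, chunk_content
FROM my_kb
LIMIT 5;
```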
When the `type` of chunking is set to `json_chunking`, users can configure it by setting the following parameter values in the `json_chunking_config` parameter:
- `flatten_nested` is of the `bool` data type with the default value of `True`.
- `include_metadata` is of the `bool` data type with the default value of `True`.
- `chunk_by_object` is of the `bool` data type with the default value of `True`. It defines whether to chunk the data by object (`True`) or create a single document (`False`).
- `exclude_fields` is of the `List[str]` data type with the default value of an empty list.
- `include_fields` is of the `List[str]` data type with the default value of an empty list.
- `metadata_fields` is of the `List[str]` data type with the default value of an empty list.
- `extract_all_primitives` is of the `bool` data type with the default value of `False`.
- `nested_delimiter` is of the `str` data type with the default value of `"."`.
- `content_column` is of the `str` data type with the default value of `"content"`.
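A sketch of how these options might be supplied. Placing `type` and `json_chunking_config` inside a `preprocessing` parameter of `CREATE KNOWLEDGE_BASE` is an assumption, as are the knowledge base name and field names; only the keys inside `json_chunking_config` come from the list above:

```sql
-- Sketch only: the 'preprocessing' parameter placement is an assumption.
CREATE KNOWLEDGE_BASE my_json_kb
USING
    preprocessing = {
        "type": "json_chunking",
        "json_chunking_config": {
            "chunk_by_object": true,
            "flatten_nested": true,
            "nested_delimiter": ".",
            "exclude_fields": ["internal_notes"],
            "extract_all_primitives": false
        }
    };
```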
If the `content_columns` parameter was not specified, the column storing content must be aliased `AS content`, as shown below.
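A sketch of such an insert; the knowledge base and source table names are placeholders:

```sql
-- The column holding the text is aliased as 'content' because
-- content_columns was not specified when the knowledge base was created.
INSERT INTO my_kb
SELECT id, review_text AS content
FROM my_datasource.reviews;
```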
DELETE FROM Syntax
CREATE INDEX ON KNOWLEDGE_BASE Syntax