Text Summarization with MindsDB and OpenAI using MQL
Introduction
In this blog post, we show how to create OpenAI models within MindsDB. As an example, we ask a model to summarize a text. The input data comes from our sample MongoDB database.
Prerequisites
To follow along, install MindsDB locally via Docker or Docker Desktop.
How to Connect MindsDB to a Database
We use a collection from our MongoDB public demo database, so let’s start by connecting MindsDB to it.
You can use MongoDB Compass or the MongoDB Shell to connect our sample database to MindsDB like this:
test> use mindsdb
mindsdb> db.databases.insertOne({
    'name': 'mongo_demo_db',
    'engine': 'mongodb',
    'connection_args': {
        "host": "mongodb+srv://user:MindsDBUser123!@demo-data-mdb.trzfwvb.mongodb.net/",
        "database": "public"
    }
})
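If you connect several databases, it can help to build this document programmatically. The following is a minimal sketch in plain JavaScript, not part of MindsDB: `makeConnectionDoc` is a hypothetical helper, the field names are taken from the example above, and the validation rules are assumptions added for illustration.

```javascript
// Hypothetical helper: builds the document passed to db.databases.insertOne().
// The field names (name, engine, connection_args) come from the example above;
// the validation rules are assumptions added for illustration.
function makeConnectionDoc({ name, host, database, engine = 'mongodb' }) {
  if (!name) throw new Error('name is required');
  if (!/^mongodb(\+srv)?:\/\//.test(host || '')) {
    throw new Error('host must start with mongodb:// or mongodb+srv://');
  }
  if (!database) throw new Error('database is required');
  return { name, engine, connection_args: { host, database } };
}

const doc = makeConnectionDoc({
  name: 'mongo_demo_db',
  host: 'mongodb+srv://user:MindsDBUser123!@demo-data-mdb.trzfwvb.mongodb.net/',
  database: 'public',
});
console.log(JSON.stringify(doc, null, 2));
```

The resulting object has the same shape as the document inserted above, so it could be passed to db.databases.insertOne(doc) in the shell.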
Tutorial
In this tutorial, we create a predictive model to summarize an article.
Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example:
mindsdb> use mongo_demo_db
mongo_demo_db> db.articles.find({}).limit(3)
Here is the output:
{
    _id: '63d01398bbca62e9c7774ab8',
    article: "Video footage has emerged of a law enforcement officer…",
    highlights: 'The 53-second video features…'
}
{
    _id: '63d01398bbca62e9c7774ab9',
    article: "A new restaurant is offering a five-course…",
    highlights: "The Curious Canine Kitchen is…"
}
{
    _id: '63d01398bbca62e9c7774aba',
    article: 'Mother-of-two Anna Tilley survived after spending four days…',
    highlights: 'Experts have warned hospitals not using standard treatment…'
}
Let’s create a model collection to summarize all articles from the input dataset. Before deploying an OpenAI model within MindsDB, you first need to create an OpenAI engine.
Here is how to create this engine:
mongo_demo_db> use mindsdb
mindsdb> db.ml_engines.insertOne({
    "name": "openai_engine",
    "handler": "openai",
    "params": {
        "openai_api_key": "your-openai-api-key"
    }
})
With the engine in place, create the model:
mindsdb> db.models.insertOne({
    name: 'text_summarization',
    predict: 'highlights',
    training_options: {
        engine: 'openai_engine',
        prompt_template: 'provide an informative summary of the text text:{{article}} using full sentences'
    }
})
In practice, the insertOne method triggers MindsDB to generate an AI collection called text_summarization that uses the OpenAI integration to predict a field named highlights. The model is created inside the default mindsdb project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.
The training_options key specifies the parameters that this handler requires.
- The engine parameter selects the ML engine to use. Here it is openai_engine, which relies on the openai handler.
- The prompt_template parameter conveys the structure of a message that is to be completed with additional text generated by the model.
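To make the role of the {{article}} placeholder concrete, here is a minimal sketch of how such a template could be filled in for one input document. It assumes simple textual substitution of {{field}} with the field's value. MindsDB performs this step internally, and `renderPrompt` is a hypothetical helper written for illustration, not MindsDB's implementation.

```javascript
// Illustrative only: MindsDB performs the substitution internally.
// renderPrompt is a hypothetical helper, not part of MindsDB.
function renderPrompt(template, row) {
  // Replace each {{field}} with the matching value from the input document.
  return template.replace(/\{\{(\w+)\}\}/g, (match, field) => {
    if (!(field in row)) {
      throw new Error(`Missing field for placeholder: ${field}`);
    }
    return String(row[field]);
  });
}

const template =
  'provide an informative summary of the text text:{{article}} using full sentences';
const prompt = renderPrompt(template, {
  article: 'A new restaurant is offering a five-course…',
});
console.log(prompt);
```

Under this assumption, every field referenced in the template must be present in the input document, which is why the model needs an article field at query time.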
Follow these instructions to set up the OpenAI integration in MindsDB.
Once the insertOne method has started execution, we can check the status of the creation process with the following query:
mindsdb> db.models.find({
    'name': 'text_summarization'
})
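Rather than re-running the status query by hand, you could poll until the model is ready. The sketch below is a hypothetical helper, not a MindsDB API. It assumes the returned model document carries a status field with the terminal values 'complete' and 'error', and the 'generating' value in the stand-in data is likewise an assumption. Check the actual output of the query above before relying on these names.

```javascript
// Hypothetical helper: polls until the model reports a terminal status.
// ASSUMPTION: the model document has a `status` field whose terminal values
// are 'complete' and 'error'; verify against the real db.models.find() output.
function waitForModel(fetchModel, { pause = () => {}, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const model = fetchModel();
    const status = model ? model.status : undefined;
    if (status === 'complete') return model;
    if (status === 'error') {
      throw new Error(`Model creation failed: ${model.error || 'unknown error'}`);
    }
    pause(); // wait before the next check
  }
  throw new Error(`Model not ready after ${maxAttempts} checks`);
}

// Stand-in for the real query so the sketch runs without a MindsDB connection.
const fakeStatuses = ['generating', 'generating', 'complete'];
let call = 0;
const ready = waitForModel(() => ({
  name: 'text_summarization',
  status: fakeStatuses[Math.min(call++, fakeStatuses.length - 1)],
}));
console.log(ready.status, 'after', call, 'checks');
```

In the MongoDB Shell, fetchModel could wrap the query shown above, for example () => db.models.find({ name: 'text_summarization' }).toArray()[0], and pause could call the shell's built-in sleep(5000). Whether these shell helpers behave the same way against MindsDB is an assumption to verify.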
Depending on your internet connection, it may take a while for the status to show as complete. Once creation is complete, the model behaves like any other AI collection. You can query it by specifying the input data directly in the query:
mindsdb> db.text_summarization.find({
    article: "Apple's Watch hits stores this Friday when customers and employees alike will be able to pre-order the timepiece. And boss Tim Cook is rewarding his staff by offering them a 50 per cent discount on the device."
})
Here is the output data:
{
    highlights: "Apple's Watch hits stores this Friday, and employees will be able to pre-order the",
    article: "Apple's Watch hits stores this Friday when customers and employees alike will be able to pre-order the timepiece. And boss Tim Cook is rewarding his staff by offering them a 50 per cent discount on the device."
}
Alternatively, you can join the model with a collection to make batch predictions:
mindsdb> db.text_summarization.find(
    {
        'collection': 'mongo_demo_db.articles'
    },
    {
        'text_summarization.highlights': 'highlights',
        'articles.article': 'article'
    }
).limit(3)
Here is the output data:
{
    highlights: 'A video has emerged of a law enforcement officer grabbing a cell phone from a woman who was',
    article: "Video footage has emerged of a law enforcement officer..."
}
{
    highlights: 'A new restaurant in London is offering a five-course drink-paired menu for dogs',
    article: "A new restaurant is offering a five-course..."
}
{
    highlights: "Sepsis is a potentially life-threatening condition that occurs when the body's response to an",
    article: 'Mother-of-two Anna Tilley survived after spending four days...'
}
The articles collection is used to make batch predictions. Upon joining the text_summarization model with the articles collection, the model uses all values from the article field.
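The batch query above takes two arguments: a filter that names the source collection, and a projection that maps the model's output field and the source input field to result fields. As a rough sketch of that shape, the hypothetical helper below assembles both objects. `buildBatchQuery` and its parameter names are illustrative assumptions, not part of MindsDB.

```javascript
// Hypothetical helper: assembles the two arguments of the batch find() call.
// The key layout mirrors the query shown earlier; nothing here calls MindsDB.
function buildBatchQuery(model, source, { predicted, input }) {
  // 'mongo_demo_db.articles' -> 'articles'
  const collection = source.split('.').pop();
  const filter = { collection: source };
  const projection = {
    [`${model}.${predicted}`]: predicted,
    [`${collection}.${input}`]: input,
  };
  return [filter, projection];
}

const [filter, projection] = buildBatchQuery(
  'text_summarization',
  'mongo_demo_db.articles',
  { predicted: 'highlights', input: 'article' }
);
console.log(filter, projection);
```

In the shell, the two objects would then be passed as db.text_summarization.find(filter, projection).limit(3), matching the query shown earlier.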