NLP
Extract JSON from Text Data
In this example, we use the OpenAI model to extract data in a predefined JSON format from the input text data.
Default Model
When you create an OpenAI model in MindsDB, it uses the gpt-3.5-turbo model by default. You can use the gpt-4 model instead by passing it to the model_name parameter.
Example in SQL
Let’s create an OpenAI model.
CREATE MODEL mindsdb.nlp_model
PREDICT json
USING
    engine = 'openai',
    json_struct = {
        'rental_price': 'rental price',
        'location': 'location',
        'nob': 'number of bathrooms'
    },
    input_text = 'sentence';
We pass three parameters in the USING clause:
- The engine parameter ensures we use the OpenAI engine.
- The json_struct parameter stores a predefined JSON structure used for the output.
- The input_text parameter contains the name of the column that stores input text.
Now we can query the model, passing the input text stored in the sentence column.
SELECT json
FROM mindsdb.nlp_model
WHERE sentence = 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.';
On execution, we get:
+----------------------------------------------------------+
| json |
+----------------------------------------------------------+
| {"location":"Manhattan","nob":"1","rental_price":"3000"} |
+----------------------------------------------------------+
Example in MQL
Please check out our docs on how to connect Mongo Compass and Mongo Shell to MindsDB.
To create this model in MQL, run the following command from Mongo Compass or Mongo Shell:
db.models.insertOne({
    name: 'nlp_model',
    predict: 'json',
    training_options: {
        engine: 'openai',
        input_text: 'sentence',
        json_struct: {
            'rental_price': 'rental price',
            'location': 'location',
            'nob': 'number of bathrooms'
        }
    }
})
We pass the same three parameters here:
- The engine parameter ensures we use the OpenAI engine.
- The json_struct parameter stores a predefined JSON structure used for the output.
- The input_text parameter contains the name of the column that stores input text.
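To use gpt-4 instead of the default model, the model name parameter described under Default Model could be added alongside these three parameters. The following is a minimal sketch, not a confirmed recipe: it assumes the parameter is spelled model_name and that training_options forwards it in the same way as engine and input_text. The model name nlp_model_gpt4 is hypothetical.

```javascript
// Sketch only: the same model definition with an explicit model_name.
// Assumption: training_options accepts model_name like the other parameters.
const gpt4ModelDefinition = {
    name: 'nlp_model_gpt4', // hypothetical name used for this illustration
    predict: 'json',
    training_options: {
        engine: 'openai',
        model_name: 'gpt-4',
        input_text: 'sentence',
        json_struct: {
            'rental_price': 'rental price',
            'location': 'location',
            'nob': 'number of bathrooms'
        }
    }
};

// In Mongo Compass or Mongo Shell connected to MindsDB, this would be passed as:
// db.models.insertOne(gpt4ModelDefinition);
console.log(JSON.stringify(gpt4ModelDefinition, null, 2));
```

Building the definition as a plain object first makes it easy to inspect before it is sent to MindsDB.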
Now we can query the model, passing the input text stored in the sentence column.
db.nlp_model.find({
    'sentence': 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.'
})
On execution, we get:
{
    json: {
        rental_price: '3000',
        location: 'Manhattan',
        nob: '1'
    },
    sentence: 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.'
}
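Note that the extracted values come back as strings ('3000' and '1' in the output above). If downstream code needs numbers, a small client-side helper can convert them. The following is a minimal sketch in plain JavaScript; it is not part of MindsDB, and the function name coerceNumericFields and the choice of numeric keys are assumptions made for this illustration.

```javascript
// Hypothetical helper (not a MindsDB API): returns a copy of the extracted
// JSON with the listed fields converted from strings to numbers.
function coerceNumericFields(extracted, numericKeys) {
    const result = { ...extracted }; // shallow copy; the input is not modified
    for (const key of numericKeys) {
        const raw = result[key];
        // Skip missing or blank values instead of turning them into 0.
        if (raw === undefined || raw === null || String(raw).trim() === '') continue;
        const value = Number(raw);
        // Replace the value only when it parses to a finite number.
        if (Number.isFinite(value)) result[key] = value;
    }
    return result;
}

// Document shaped like the query result shown above.
const row = {
    json: { rental_price: '3000', location: 'Manhattan', nob: '1' }
};

const cleaned = coerceNumericFields(row.json, ['rental_price', 'nob']);
console.log(cleaned);
```

Here rental_price and nob become the numbers 3000 and 1, while location stays a string. Values that do not parse as numbers are left unchanged, so an unexpected model answer such as 'unknown' is not silently lost.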