Scikit-LLM: Power Up Your Text Analysis in Python Using LLM Models within scikit-learn Framework | by Esmaeil Alizadeh | Jun, 2023


One of the features of Scikit-LLM is the ability to perform zero-shot text classification. Scikit-LLM provides two classes for this purpose:

  • ZeroShotGPTClassifier: used for single-label classification (e.g., sentiment analysis),
  • MultiLabelZeroShotGPTClassifier: used for multi-label classification, where each text can receive several labels.
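Both classifiers rely on the same idea. Instead of being trained on examples, the LLM receives the text together with the list of candidate labels and is asked to pick one label (single-label) or several (multi-label). The sketch below shows what such a prompt could look like. `build_zero_shot_prompt` and its wording are hypothetical and do not reproduce Scikit-LLM's actual prompt.

```python
import json


def build_zero_shot_prompt(text, labels, max_labels=1):
    """Build a prompt asking an LLM to pick label(s) for `text`.

    Hypothetical template for illustration only; Scikit-LLM uses its own wording.
    """
    if max_labels == 1:
        # Single-label task: the model must return exactly one category
        instruction = "Classify the text into exactly one of the categories below."
    else:
        # Multi-label task: the model may return up to `max_labels` categories
        instruction = f"Assign between 1 and {max_labels} of the categories below to the text."
    return (
        f"{instruction}\n"
        f"Categories: {json.dumps(labels)}\n"
        f'Text: "{text}"\n'
        "Answer with the category name(s) only."
    )


# Single-label (sentiment) prompt
print(build_zero_shot_prompt("The movie was okay.", ["positive", "negative", "neutral"]))

# Multi-label prompt allowing up to 3 labels
print(build_zero_shot_prompt("Great food, slow service.", ["Food", "Service", "Price"], max_labels=3))
```

The candidate labels are the only task-specific information in the prompt, which is why a zero-shot classifier needs nothing but the label set to get started.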

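Let’s do a sentiment analysis of a few movie reviews. Each review comes with a sentiment label, stored in the variable movie_review_labels. Because this is zero-shot classification, fitting the classifier on these reviews and labels does not train the underlying model: fit() only registers the candidate labels (Scikit-LLM also accepts None in place of the reviews). The fitted classifier then asks the LLM to assign one of those labels to each new review.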
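Since no model weights change in zero-shot mode, the useful information a fit step can extract from the labels is the set of distinct labels and, optionally, how often each occurs (for example, as a fallback if the LLM replies with something that is not a valid label). The toy class below illustrates that idea using the same nine labels as movie_review_labels. `ToyZeroShotLabelStore` and `fallback_label` are hypothetical names; this is a sketch under our own assumptions, not Scikit-LLM's implementation.

```python
import random
from collections import Counter


class ToyZeroShotLabelStore:
    """Hypothetical stand-in for the fit step of a zero-shot classifier.

    Only the labels (y) are used; the texts (X) are ignored because
    nothing is trained in zero-shot mode.
    """

    def fit(self, X, y):
        counts = Counter(y)
        total = sum(counts.values())
        # Candidate labels, sorted for a deterministic order
        self.classes_ = sorted(counts)
        # Relative frequency of each label, usable as a fallback prior
        self.priors_ = [counts[c] / total for c in self.classes_]
        return self

    def fallback_label(self, rng=random):
        # Draw one label at random, weighted by its observed frequency
        return rng.choices(self.classes_, weights=self.priors_, k=1)[0]


labels = ["positive", "positive", "neutral", "negative", "negative",
          "neutral", "positive", "negative", "neutral"]
store = ToyZeroShotLabelStore().fit(X=None, y=labels)
print(store.classes_)                          # ['negative', 'neutral', 'positive']
print([round(p, 3) for p in store.priors_])    # [0.333, 0.333, 0.333]
print(store.fallback_label())                  # one of the three labels
```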

The sample dataset for the movie reviews is given below:

movie_reviews = [
    "This movie was absolutely wonderful. The storyline was compelling and the characters were very realistic.",
    "I really loved the film! The plot had a few unexpected twists which kept me engaged till the end.",
    "The movie was alright. Not great, but not bad either. A decent one-time watch.",
    "I didn't enjoy the film that much. The plot was quite predictable and the characters lacked depth.",
    "This movie was not to my taste. It felt too slow and the storyline wasn't engaging enough.",
    "The film was okay. It was neither impressive nor disappointing. It was just fine.",
    "I was blown away by the movie! The cinematography was excellent and the performances were top-notch.",
    "I didn't like the movie at all. The story was uninteresting and the acting was mediocre at best.",
    "The movie was decent. It had its moments but was not consistently engaging."
]

movie_review_labels = [
    "positive",
    "positive",
    "neutral",
    "negative",
    "negative",
    "neutral",
    "positive",
    "negative",
    "neutral"
]

new_movie_reviews = [
    # A positive review
    "The movie was fantastic! I was captivated by the storyline from beginning to end.",

    # A negative review
    "I found the film to be quite boring. The plot moved too slowly and the acting was subpar.",

    # A neutral review
    "The movie was okay. Not the best I've seen, but certainly not the worst."
]

Let’s fit the classifier and then check what it predicts for each new review.

from skllm import ZeroShotGPTClassifier

# Initialize the classifier with the OpenAI model
clf = ZeroShotGPTClassifier(openai_model="gpt-3.5-turbo")

# Fit the classifier: this registers the candidate labels; the LLM itself is not retrained
clf.fit(X=movie_reviews, y=movie_review_labels)

# Ask the LLM to predict the sentiment of each new review
predicted_movie_review_labels = clf.predict(X=new_movie_reviews)

for review, sentiment in zip(new_movie_reviews, predicted_movie_review_labels):
    print(f"Review: {review}\nPredicted Sentiment: {sentiment}\n\n")

Review: The movie was fantastic! I was captivated by the storyline from beginning to end.
Predicted Sentiment: positive

Review: I found the film to be quite boring. The plot moved too slowly and the acting was subpar.
Predicted Sentiment: negative

Review: The movie was okay. Not the best I've seen, but certainly not the worst.
Predicted Sentiment: neutral

As can be seen above, the model predicted the sentiment of each new movie review correctly: all three predictions match the intended labels.
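Checking three predictions by eye works for a demo; with a larger labeled test set it is easier to compute accuracy. The sketch below does this in plain Python, using the three predictions printed above and the intended labels from the comments in new_movie_reviews. The `accuracy` helper is our own; `sklearn.metrics.accuracy_score(expected, predicted)` returns the same value.

```python
# Predicted labels, copied from the output printed above
predicted = ["positive", "negative", "neutral"]
# Intended labels, taken from the comments in new_movie_reviews
expected = ["positive", "negative", "neutral"]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


print(accuracy(expected, predicted))  # 1.0
```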

In the previous section, each review received exactly one of three labels (“positive”, “negative”, “neutral”). Here, we use the MultiLabelZeroShotGPTClassifier estimator to assign one or more labels to each review in a list of restaurant reviews.

restaurant_reviews = [
    "The food was delicious and the service was excellent. A wonderful dining experience!",
    "The restaurant was in a great location, but the food was just average.",
    "The service was very slow and the food was cold when it arrived. Not a good experience.",
    "The restaurant has a beautiful ambiance, and the food was superb.",
    "The food was great, but I found it to be a bit overpriced.",
    "The restaurant was conveniently located, but the service was poor.",
    "The food was not as expected, but the restaurant ambiance was really nice.",
    "Great food and quick service. The location was also very convenient.",
    "The prices were a bit high, but the food quality and the service were excellent.",
    "The restaurant offered a wide variety of dishes. The service was also very quick."
]

restaurant_review_labels = [
    ["Food", "Service"],
    ["Location", "Food"],
    ["Service", "Food"],
    ["Atmosphere", "Food"],
    ["Food", "Price"],
    ["Location", "Service"],
    ["Food", "Atmosphere"],
    ["Food", "Service", "Location"],
    ["Price", "Food", "Service"],
    ["Food Variety", "Service"]
]

new_restaurant_reviews = [
    "The food was excellent and the restaurant was located in the heart of the city.",
    "The service was slow and the food was not worth the price.",
    "The restaurant had a wonderful ambiance, but the variety of dishes was limited."
]
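In the multi-label case, the labels the classifier can choose from are, conceptually, the union of every label in restaurant_review_labels. The snippet below computes that set in plain Python, together with how often each label occurs and the largest number of labels on a single training review. It illustrates what the label lists contain; it is not Scikit-LLM's internal code.

```python
from collections import Counter

# Same label lists as restaurant_review_labels
restaurant_review_labels = [
    ["Food", "Service"],
    ["Location", "Food"],
    ["Service", "Food"],
    ["Atmosphere", "Food"],
    ["Food", "Price"],
    ["Location", "Service"],
    ["Food", "Atmosphere"],
    ["Food", "Service", "Location"],
    ["Price", "Food", "Service"],
    ["Food Variety", "Service"],
]

# Candidate labels: the union of all labels, sorted alphabetically
candidate_labels = sorted({label for labels in restaurant_review_labels for label in labels})
# How often each label appears across the training reviews
label_counts = Counter(label for labels in restaurant_review_labels for label in labels)
# Largest number of labels attached to any single review
max_labels_seen = max(len(labels) for labels in restaurant_review_labels)

print(candidate_labels)             # ['Atmosphere', 'Food', 'Food Variety', 'Location', 'Price', 'Service']
print(label_counts.most_common(2))  # [('Food', 8), ('Service', 6)]
print(max_labels_seen)              # 3
```

No training review carries more than three labels, which is consistent with the max_labels=3 setting used in the next step.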

Let’s fit the classifier and then predict the labels for the new reviews.

from skllm import MultiLabelZeroShotGPTClassifier

# Initialize the classifier (default OpenAI model), allowing at most 3 labels per review
clf = MultiLabelZeroShotGPTClassifier(max_labels=3)

# Fit the classifier: this registers the candidate labels; the LLM itself is not retrained
clf.fit(X=restaurant_reviews, y=restaurant_review_labels)

# Ask the LLM to predict the labels of each new review
predicted_restaurant_review_labels = clf.predict(X=new_restaurant_reviews)

for review, labels in zip(new_restaurant_reviews, predicted_restaurant_review_labels):
    print(f"Review: {review}\nPredicted Labels: {labels}\n\n")

Review: The food was excellent and the restaurant was located in the heart of the city.
Predicted Labels: ['Food', 'Location']

Review: The service was slow and the food was not worth the price.
Predicted Labels: ['Service', 'Price']

Review: The restaurant had a wonderful ambiance, but the variety of dishes was limited.
Predicted Labels: ['Atmosphere', 'Food Variety']

The predicted labels are spot-on: each one matches an aspect that the review actually mentions.
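To score multi-label predictions, it helps to clean them and turn them into 0/1 indicator rows with one column per candidate label. The sketch below uses two hypothetical helpers. `clean_labels` keeps only known labels, drops duplicates, and caps the count at max_labels (mirroring the max_labels=3 setting used above, though this is not Scikit-LLM's actual post-processing). `binarize` builds the indicator rows, as `sklearn.preprocessing.MultiLabelBinarizer` would for the same sorted label order. The raw list passed to `clean_labels`, including "Parking", is invented input for illustration.

```python
def clean_labels(raw_labels, candidates, max_labels=3):
    """Hypothetical post-processing of multi-label LLM output:
    keep known labels only, drop duplicates (keeping order), cap at max_labels."""
    cleaned = []
    for label in raw_labels:
        label = label.strip()
        if label in candidates and label not in cleaned:
            cleaned.append(label)
    return cleaned[:max_labels]


def binarize(label_lists, candidates):
    """Turn lists of labels into 0/1 indicator rows, one column per candidate."""
    return [[int(c in labels) for c in candidates] for labels in label_lists]


candidates = ["Atmosphere", "Food", "Food Variety", "Location", "Price", "Service"]
# Predicted labels, copied from the output printed above
predicted = [["Food", "Location"], ["Service", "Price"], ["Atmosphere", "Food Variety"]]

print(clean_labels(["Food", " Food ", "Parking", "Price", "Service", "Location"], candidates))
# ['Food', 'Price', 'Service']
for row in binarize(predicted, candidates):
    print(row)
# [0, 1, 0, 1, 0, 0]
# [0, 0, 0, 0, 1, 1]
# [1, 0, 1, 0, 0, 0]
```

Indicator rows in this shape can be compared against ground-truth rows with standard multi-label metrics once a labeled test set is available.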


