Text Classification with Transformer Encoders
By Ruben Winastwan, Aug 2023


A step-by-step explanation of using Transformer encoders to classify text


The Transformer is, without a doubt, one of the most important breakthroughs in the field of deep learning. Its encoder-decoder architecture has proven powerful across a wide range of domains.

Initially, the Transformer was used solely for natural language processing tasks, such as machine translation, text generation, text classification, and question answering. More recently, however, it has also been applied to computer vision tasks, such as image classification, object detection, and semantic segmentation.

Given its popularity and the existence of numerous sophisticated Transformer-based models such as BERT, the Vision Transformer (ViT), the Swin Transformer, and the GPT family, it is crucial for us to understand the inner workings of the Transformer architecture.

In this article, we will dissect the encoder part of the Transformer, which is well suited to classification tasks. Specifically, we will use a stack of Transformer encoders to classify text. Before diving into the details, the sketch below previews the overall shape of such a model.
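As a preview, here is a minimal sketch of an encoder-based text classifier, written with PyTorch's built-in nn.TransformerEncoder. The layer sizes, mean pooling, and two-class output are illustrative assumptions, not the article's exact model, which is built up step by step later.

import torch
import torch.nn as nn

class EncoderClassifier(nn.Module):
    # Hypothetical sketch: embed tokens, contextualize them with a stack
    # of Transformer encoder layers, then classify a pooled representation.
    def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq_len, d_model)
        x = self.encoder(x)             # contextualized token vectors
        x = x.mean(dim=1)               # mean-pool over the sequence
        return self.classifier(x)       # (batch, num_classes) logits

A complete model would also add positional encodings and a padding mask so that padded positions do not influence attention.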

Without further ado, let's first take a look at the dataset that we're going to use. It is a spam email dataset that you can download on Kaggle. The dataset is licensed under CC0: Public Domain, which means that you can use and distribute it freely.

import math
import torch
import torch.nn as nn
import torchtext
import pandas as pd
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader
from tqdm import tqdm
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Use a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the dataset and hold out 20% of it for testing
df = pd.read_csv('spam_ham.csv')
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)
print(df_train.head())

# Output
'''
     Category                                            Message
1978     spam  Reply to win £100 weekly! Where will the 2006 ...
3989      ham  Hello. Sort of out in town already. That . So ...
3935      ham   How come guoyang go n tell her? Then u told her?
4078…
'''
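Before the messages can be fed to a model, they must be tokenized and mapped to integer ids, which is what the torchtext imports above are for. Here is a minimal sketch of how that could look, assuming the Message column shown above and hypothetical <unk>/<pad> special tokens:

tokenizer = get_tokenizer("basic_english")  # simple lowercasing word-level tokenizer

def yield_tokens(texts):
    # Yield the token list of each message, one message at a time
    for text in texts:
        yield tokenizer(text)

# Build the vocabulary from the training messages only, reserving special
# tokens for unknown words and padding (an illustrative choice)
vocab = build_vocab_from_iterator(
    yield_tokens(df_train["Message"]),
    specials=["<unk>", "<pad>"],
)
vocab.set_default_index(vocab["<unk>"])  # out-of-vocabulary tokens map to <unk>

print(vocab(tokenizer("Reply to win £100 weekly!")))  # a list of token ids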


