With the integration of AI into our everyday lives, a biased model can have drastic consequences for users.
In 2021, Princeton University’s Center for Information Technology Policy released a report finding that machine learning algorithms can pick up human-like biases from their training data. One striking example of this effect is Amazon’s AI hiring tool. The tool was trained on resumes submitted to Amazon over the previous ten years and used them to rank candidates. Because of the large gender imbalance in tech positions over that decade, the algorithm learned to recognize language associated with women, such as references to women’s sports teams, and downgraded resumes containing it. This example shows that fair and accurate models are not enough: the datasets must be fair too, so that bias is not learned during training. With the fast development of generative models such as ChatGPT and the integration of AI into our everyday lives, a biased model can have drastic consequences, eroding user trust and broader acceptance. Addressing these biases is therefore necessary from a business perspective, and data scientists (broadly defined) have to be aware of them in order to mitigate them and ensure their models align with their principles.
The first task that comes to mind where generative models are widely used is translation. Users input a text in language A and expect a translation in language B. Languages do not all mark gender in the same way: for example, “The senator” in English can be either feminine or masculine, whereas French requires “La sénatrice” or “Le sénateur”. Even when the gender is specified in the sentence (example below), it is not uncommon for a generative model to reinforce stereotypical gender roles in the translation.
Like translation, caption generation requires the model to generate new text from some input, i.e., a translation from an image to text. A recent study analyzed the performance of a generative transformer model on a caption generation task (figure below) on the Common Objects in Context dataset.
The generative model added various racial and cultural descriptors to the captions, even though they did not apply to all of the images. These descriptors were learned only by the newer generative models, which indicates an increase in bias for these models. It is worth noting that transformer models also exhibit gender bias on this dataset: they exacerbate the imbalance between men and women by, for example, identifying a person as a woman based on the background of a house or room.
Why does bias occur?
The design phase of a generative model leaves plenty of room for bias to develop. Biases can arise from the data itself, from labels and annotations, from internal representations, or even from the model (see https://huggingface.co/blog/ethics-soc-4 for an extensive list focused on Text-to-Image models).
The data needed to train a generative model comes from a multitude of sources, usually online. To ensure the integrity of the training data, AI companies often build their datasets from well-known news websites and similar sources. Models trained on such datasets will perpetuate biased associations because of the narrow demographics represented (usually white, middle-aged, upper-middle-class).
Label bias (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994857/) is perhaps the most explicit type, as it results from biases being introduced, usually inadvertently, into the labeled data. Generative models are trained to reproduce or approximate their training dataset, so a bias in the labels has a drastic impact on the model's output representations. Thankfully, collecting multiple versions of each label and cross-checking them helps mitigate these biases.
The last two types of bias, internal representation bias and model bias, each come from a specific step in the modeling. The first is introduced at the pre-processing stage, whether manual or algorithmic. This stage is prone to incorporating biases and losing cultural nuances, especially if the original dataset lacks diversity. Model bias arises when the objective function relies on discriminatory features and the model amplifies existing biases in order to improve its accuracy.
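As a rough illustration of cross-checking labels, the sketch below takes a majority vote over several annotators and flags items where agreement is low so they can be reviewed by hand. The function name `cross_check_labels`, the `min_agreement` threshold, and the sample annotations are all hypothetical and chosen only for this example.

```python
from collections import Counter

def cross_check_labels(annotations, min_agreement=0.6):
    """Return consensus labels and the ids of items needing manual review.

    annotations: dict mapping an item id to the labels given by different annotators.
    min_agreement: assumed threshold on the share of annotators backing the top label.
    """
    consensus, flagged = {}, []
    for item_id, labels in annotations.items():
        # Most frequent label and how many annotators chose it
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            consensus[item_id] = label
        else:
            flagged.append(item_id)
    return consensus, flagged

# Made-up annotations from three annotators per image
annotations = {
    "img_1": ["doctor", "doctor", "doctor"],
    "img_2": ["nurse", "doctor", "doctor"],
    "img_3": ["nurse", "doctor", "engineer"],
}
consensus, flagged = cross_check_labels(annotations)
print(consensus)  # labels accepted by majority vote
print(flagged)    # items with too little agreement
```

A majority vote only helps when annotators disagree; if all annotators share the same bias, the consensus label will carry it too.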
As highlighted throughout this article, bias in generative models appears in various forms and under various conditions. Methods to detect it have to be as diverse as the biases they are trying to detect.
One of the main measures of bias in language models is the Word Embedding Association Test. This score measures the similarity, within an embedding space (the internal representation), between sets of words. A high score indicates a strong association. More specifically, it computes the difference in similarity between a target set of words and two input sets, for example [home, family] as the target and [he, man]/[she, woman] as the inputs. A score of 0 would indicate a perfectly balanced model. This metric was used to demonstrate that RoBERTa is one of the most biased generative models (https://arxiv.org/pdf/2106.13219.pdf).
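The association score described above can be sketched as a difference of mean cosine similarities. This is a simplified version that follows the description in this article; the full test additionally contrasts two target sets and normalizes the result into an effect size, which is omitted here. The function name `association_gap` and the 2-D vectors are hand-made assumptions, not embeddings from a real model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association_gap(emb, targets, set_a, set_b):
    """Average over target words of (mean similarity to set_a - mean similarity to set_b)."""
    gaps = []
    for t in targets:
        sim_a = np.mean([cosine(emb[t], emb[a]) for a in set_a])
        sim_b = np.mean([cosine(emb[t], emb[b]) for b in set_b])
        gaps.append(sim_a - sim_b)
    return float(np.mean(gaps))

# Toy embeddings, illustrative only
emb = {
    "home": np.array([0.2, 0.9]),
    "family": np.array([0.3, 0.8]),
    "he": np.array([1.0, 0.1]),
    "man": np.array([0.9, 0.2]),
    "she": np.array([0.1, 1.0]),
    "woman": np.array([0.2, 0.9]),
}

score = association_gap(emb, ["home", "family"], ["he", "man"], ["she", "woman"])
# Negative: the targets sit closer to [she, woman] in this toy space; 0 would mean balanced
print(score)
```

With real embeddings, the vectors would be read from the model's embedding layer, and larger word sets would be needed for the score to be meaningful.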
An innovative way of measuring bias in generative models, and gender bias in particular, is counterfactual evaluation: swap the gender of the words in the input and observe how the accuracy of the predictions changes. If the modified accuracy differs from the original accuracy, the model is biased, since an unbiased generative model should be equally accurate regardless of the gender of the inputs. The main caveat is that this measure only captures gender bias and therefore has to be complemented by other measures to fully evaluate the sources of bias. In the same spirit, one can use the Bilingual Evaluation Understudy (BLEU, a classical translation metric) to compare the output produced from the gender-swapped input with the output produced from the original input.
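A minimal sketch of the gender-swap evaluation might look like the following. The word list, the helper names (`swap_gender`, `counterfactual_gap`), and the two toy "models" are assumptions made for illustration; a real evaluation would call the actual model and use a far more complete swap dictionary. Possessive pronouns are left out on purpose because "her" maps to both "him" and "his".

```python
import re

# Small, hypothetical swap table; real evaluations need a much larger one
SWAP = {"he": "she", "she": "he", "man": "woman", "woman": "man"}

def swap_gender(text):
    """Replace each gendered word with its counterpart, preserving capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = SWAP.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"\b\w+\b", repl, text)

def accuracy(predict, examples):
    """Share of (text, label) pairs that the predictor gets right."""
    return sum(predict(text) == label for text, label in examples) / len(examples)

def counterfactual_gap(predict, examples):
    """Accuracy on the original inputs minus accuracy on the gender-swapped inputs."""
    swapped = [(swap_gender(text), label) for text, label in examples]
    return accuracy(predict, examples) - accuracy(predict, swapped)

# Stand-in predictors: one keys on the pronoun, one on the content
def biased_predict(text):
    return "engineer" if "he" in text.lower().split() else "nurse"

def content_predict(text):
    return "nurse" if "patients" in text.lower().split() else "engineer"

examples = [
    ("he designs bridges", "engineer"),
    ("he writes code", "engineer"),
    ("she cares for patients", "nurse"),
]

print(counterfactual_gap(biased_predict, examples))   # nonzero gap: predictions depend on gender
print(counterfactual_gap(content_predict, examples))  # zero gap: predictions ignore gender
```

Note that the test set here mirrors the stereotype; on a perfectly gender-balanced test set the two accuracies can coincide even for a biased model, so comparing predictions example by example is a useful complement.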
Current generative models are based on transformer models, which use a mechanism called attention to predict the output based on the input. Studies have investigated the relationship between gender and roles using the attention scores taken directly from the model (https://arxiv.org/abs/2110.15733). This makes it possible to compare different parts of the model with one another and detect which module contributes most to bias. Although this measure has shown that generative models introduce a gender bias on the Wikipedia dataset, one caveat is that attention values do not represent direct effects or similarity between concepts, and they require in-depth analysis before any conclusion can be drawn.
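To make the idea of reading attention scores concrete, here is a small sketch of scaled dot-product attention weights for a single head, followed by a comparison of how much a role token attends to "he" versus "she". The hidden states and projection matrices are random placeholders, so the resulting number carries no meaning; in practice they would come from a trained model, and the comparison would be repeated per head and per layer. This is not the procedure of the cited study, only an illustration of the quantity involved.

```python
import numpy as np

def attention_weights(queries, keys):
    """Row-wise softmax of scaled dot products: one attention distribution per query token."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

tokens = ["the", "doctor", "said", "he", "she"]

# Random placeholders for hidden states and one head's query/key projections
rng = np.random.default_rng(0)
hidden = rng.normal(size=(len(tokens), 8))
w_q = rng.normal(size=(8, 8))
w_k = rng.normal(size=(8, 8))

weights = attention_weights(hidden @ w_q, hidden @ w_k)

# How much more the role token attends to "he" than to "she" in this head
row = weights[tokens.index("doctor")]
he_vs_she = row[tokens.index("he")] - row[tokens.index("she")]
print(he_vs_she)
```

Computing this difference for every head and layer gives a map of where gendered attention concentrates, subject to the caveat above that attention weights are not direct causal effects.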
How to overcome biases in generative models?
Researchers have developed various techniques to make generative systems less biased. Most of these techniques add steps to the modeling, such as setting a control variable that fixes the gender based on previous information, or adding another model to provide contextual information. However, these steps do not necessarily address the use of inherently biased datasets. In addition, most generative models are trained on English data, which drastically limits their cultural and societal diversity.
Fully overcoming biases in generative models would require establishing a formal framework and benchmarks to test and evaluate models across multiple languages. This would make it possible to detect bias that appears in nuanced ways across diverse AI models.