AI Training Outsourced to AI and Not Humans | by Mastafa Foufa | Jul, 2023

Photo by davide ragusa on Unsplash

The Risk of Introducing Further Errors into Models

“We reran an abstract summarization task from the literature on Amazon Mechanical Turk and, through a combination of keystroke detection and synthetic text classification, estimate that 33–46% of crowd workers used LLMs when completing the task. Although generalization to other, less LLM friendly tasks is unclear, our results call for platforms, researchers, and crowd workers to find new ways to ensure that human data remain human, perhaps using the methodology proposed here as a stepping stone.” From Veselovsky, V., Ribeiro, M.H. and West, R.

Recently, a study from the Swiss Federal Institute of Technology in Lausanne (EPFL) estimated that between 33% and 46% of gig workers paid to complete a text summarization task, the kind of work used to train AI models, may be outsourcing their work to AI.

MIT Technology Review covered this research paper, explaining that people who are paid to train AI are themselves outsourcing their work to AI. The article notes that AI can now be used to create datasets and labels, tasks traditionally done by humans. It also discusses the implications of this trend, such as the risk that an AI trained on the output of another AI picks up that system's errors and biases.

Source: Veselovsky, V., Ribeiro, M.H. and West, R. A model that discriminates between MTurk responses written manually by a human and responses generated by an AI. The authors apply this classifier (real vs. AI-generated) to real MTurk responses, where workers may or may not have relied on LLMs, to estimate the prevalence of LLM usage.
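To make the idea of estimating prevalence from a classifier concrete, here is a minimal sketch. It is not the authors' code or necessarily their exact method: it uses a standard correction (often called the Rogan-Gladen estimator) that adjusts the share of flagged responses for the classifier's own error rates. The flags, sensitivity, and specificity values are hypothetical and chosen only for illustration.

```python
def observed_rate(flags):
    """Fraction of responses the classifier flagged as AI-generated."""
    return sum(flags) / len(flags)


def adjusted_prevalence(flagged_rate, sensitivity, specificity):
    """Correct the raw flagged rate for classifier errors.

    sensitivity: share of truly AI-generated texts the classifier catches.
    specificity: share of truly human texts the classifier leaves unflagged.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier must perform better than chance")
    estimate = (flagged_rate + specificity - 1.0) / denom
    # Clamp to [0, 1], since sampling noise can push the estimate outside.
    return min(1.0, max(0.0, estimate))


# Hypothetical classifier output: 1 = flagged as AI-generated, 0 = human.
flags = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
rate = observed_rate(flags)  # 4 of 10 responses flagged
prevalence = adjusted_prevalence(rate, sensitivity=0.9, specificity=0.9)
print(f"flagged: {rate:.2f}, adjusted prevalence: {prevalence:.3f}")
```

With these made-up numbers, 40% of responses are flagged but the adjusted estimate is slightly lower, because some of the flags are expected to be false positives. Uncertainty in the classifier's error rates is one reason such a study would report a range rather than a single figure.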

How do we train AI systems?

AI systems can be seen as machine learning models. In a supervised setting, such systems need gold-standard labels to build high-quality training data. Labeling can be done internally, especially at big tech companies like Microsoft or Google. For complex tasks involving large datasets, it can also be outsourced to vendors, who are typically expected to be subject-matter experts.

However, labelers can also be online gig workers with no particular expertise in the subject in question. Indeed, platforms like Mechanical Turk let you hire gig workers to complete tasks that are typically hard to automate.

Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace that makes it easier for individuals and businesses to outsource their processes and jobs to a distributed workforce who can perform…

