What are ChatGPT's ethical issues?
At WUR we are committed to integrity, sustainability and inclusion, not only in research and teaching but also in business operations, including the use of software. We therefore think it is important that you are aware of some of the ethical issues surrounding ChatGPT.
Any bias present in the data ChatGPT was trained on (e.g. racism, sexism) is also present in its model, and is thus mimicked in its output. This can reinforce stereotypes. Because human annotators flagged harmful and problematic output during training, much of this content is filtered out. However, this filtering does not always work, especially for more subtle and systematic biases.
The previous page makes it clear that ChatGPT may very well produce false information. This can exacerbate the spread of misinformation. Additionally, because ChatGPT can generate persuasive content, producing publicity and advertising has become much faster and cheaper, even when the information itself is entirely false.
Since English was overrepresented in the training data compared to minority languages, this imbalance is reflected in the model as well. This leads to a further reduction of linguistic diversity and the marginalisation of underrepresented languages. The same applies to cultural minorities and cultural diversity more generally.
ChatGPT relies on hidden human labour in two ways. First, its training data (e.g. social media posts and open-source code) were scraped from the web, often without the original authors' consent or acknowledgement. Second, the workers who annotate the data are often underpaid. OpenAI outsourced the work needed to create ChatGPT's 'harmful content filter' to Kenya, where employees were paid less than 2 dollars per hour to do work so damaging that many described being 'mentally scarred' (Perrigo, 2023).
The fact that ChatGPT used training data without authors' consent or proper attribution (see 'Hidden human labour') might result in copyright infringement. At the moment, multiple lawsuits against ChatGPT's creator OpenAI and related GenAI companies are ongoing. One example is the lawsuit filed by authors Sarah Silverman, Christopher Golden, and Richard Kadrey, who claim that OpenAI used their copyrighted books as training data without their consent.
Models as large as ChatGPT use vast amounts of energy. Training GPT-3 (the model on which the current free version of ChatGPT is based) cost 1287 MWh, or 552.1 metric tonnes of CO2-equivalent emissions (Patterson, n.d.). That is comparable to roughly 600 London–New York flights, and this figure excludes the additional training steps and the energy consumed when the model is actually used. Both the training and usage energy of GPT-4 are expected to be even higher.
Since models like ChatGPT require vast amounts of resources, this industry is dominated by big tech companies.
Do you want to know more about how ChatGPT works and what its implications are? Then check out the source below. If you have more questions, feel free to email us at [email protected]!
“An Evening with ChatGPT” by the Computational Linguistics Group (GroNLP) of the University of Groningen https://www.youtube.com/watch?v=PgpmbXHMEsI
This page is based largely on lectures by Dr. Joshua K. Schäuble and the University of Groningen Computational Linguistics Group (GroNLP).