AI output depends entirely on its input: the prompt it is fed, the dataset used for training, and the engineers who create and develop it. This can result in explicit and implicit bias, both unintentional and intentional.
To “train” the system, generative AI ingests enormous amounts of training data from across the internet. Using the internet as training data means generative AI can replicate the biases, stereotypes, and hate speech found on the web. In addition, as of January 2024, 52% of information available on the internet is in English, which means an English-language bias is built into the system through its training data. About 70% of people working in AI are male (World Economic Forum, 2023 Global Gender Gap Report) and the majority are white (Georgetown University, The US AI Workforce: Analyzing Current Supply and Growth, January 2024). As a result, generative AI systems have produced numerous cases of algorithmic bias, in which algorithms make decisions that systematically disadvantage certain groups.
While this does not mean that content generated by AI has no value, users should be aware of the possibility of bias influencing AI output.
There are ongoing privacy concerns and uncertainties about how AI systems harvest personal data from users. Some of this personal information, like phone numbers, is voluntarily given by the user. However, users may not realize that the system is also harvesting information like the user’s IP address and their activity while using the service. This is an important consideration when using AI in an educational context, as some students may not feel comfortable having their personal information tracked and saved.
Additionally, OpenAI may share aggregated personal information with third parties in order to analyze usage of ChatGPT. While this information is only shared in aggregate after being de-identified (i.e., stripped of data that could identify users), users should be aware that they no longer control their personal information once it is provided to a system like ChatGPT.
UT's license for Microsoft Copilot addresses some of these privacy concerns: user data is neither stored nor used to train the model.
AI is typically associated with virtuality and the cloud, yet these systems rely on vast physical infrastructures that span the globe and require tremendous amounts of natural resources, including energy, water, and rare earth minerals. A 2019 study found that training large language models "can emit more than 626,000 pounds of carbon dioxide equivalent—nearly five times the lifetime emissions of the average American car (and that includes manufacture of the car itself)" (MIT Technology Review).
AI still needs human intervention to function properly, but this necessary labor is often hidden. For example, ChatGPT uses prompts entered by users to train its models, including its paid subscription model, which is why many consider this unpaid labor.
Taylor & Francis recently signed a $10 million deal to provide Microsoft with access to data from approximately 3,000 scholarly journals. Authors in those journals were not consulted or compensated for the use of their articles. Some argue that using scholarly research to train generative AI will result in better AI tools, but authors have expressed concern about how their information will be used, including whether the use by AI tools will negatively impact their citation numbers
In a more extreme case, investigative journalists discovered that OpenAI paid workers in Kenya, Uganda, and India only $1 to $2 per hour to review data for disturbing, graphic, and violent content. In improving its product, the company exposed these underpaid workers to psychologically scarring material. One worker referred to the work as “torture.”
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.