Worried about the rise of AI? Here is the research you should know about.

David Sumpter
May 31, 2023 · 23 min read

Many of my colleagues (both in the media and in academia) appear to know worryingly little about research on the limits and potential problems of AI. This article should help you to find out more about the way researchers have been working in this field, the issues that have come up and who to pay attention to (and who to pay less attention to).

What follows is not so much about the dangers themselves, but the thinking tools you need in order to discuss these dangers in an informed manner.

It is going to be a long (but interesting😊) read, so let’s get straight into it.

Why AI is hard

For a balanced view of where AI is today, the best starting point is Melanie Mitchell’s short review ‘Why AI is Harder Than We Think’. It begins by outlining some of the predictions made in the past about AI. These range from Marvin Minsky’s forecast in the 1960s that, “Within a generation…the problems of creating ‘artificial intelligence’ will be substantially solved” to claims made five or ten years ago, by Elon Musk and many others, that by 2020 the roads would be full of fully-automated self-driving cars.

It isn’t the overblown predictions themselves that lead Mitchell to conclude that AI is hard. Rather, they set the scene for the four fallacies that underlie such predictions:

1, The ‘first-step’ fallacy. When we ask ChatGPT to summarise, for example, the theory of natural selection or the best ways to cook pasta carbonara, it produces a convincing text. But that doesn’t mean it can suggest new experiments in biology or open a restaurant. The fallacy here is to see a text summary (a text manipulation task) as a first step towards an understanding of that text (a task that requires grasping the meaning of natural selection or cooking).

2, The ‘it does difficult things’ fallacy. Games like Go and chess have very intricate patterns of play, which Google’s AlphaGo represents efficiently inside its neural network, in a way that is impossible for humans to achieve, just as computers can multiply larger numbers than we can. That is why they perform better than us at these games. Similarly, ChatGPT can reproduce a lot more facts than we can, because it (like Wikipedia) stores data from the entire Internet. The fact that computers do difficult things does not make them intelligent.

3, The ‘wishful use of words’ fallacy. In many situations it is useful to say that machines are ‘trained’ or ‘learn’, that they have ‘goals’ and that they ‘think’. But these are just shorthand for describing the computational and statistical details of what is actually going on inside the machine. Concluding that computers really ‘learn’, ‘are trained’ or ‘think’ in a way that is comparable to us is wishful thinking (by us!).

4, The ‘intelligence is all in the brain’ fallacy. Many of the arguments for general AI rely on us somehow picking out the parts of our minds which reason rationally, and then placing them inside a computer. But we think with our gut (and not only when we are hungry), we feel emotions with our whole bodies and we are part of a social network of friends and family. It is a fallacy to conclude that the essence of human thinking can somehow be extracted from our bodies and placed inside a machine. Not all processing of information happens in the brain in some form of abstract logical circuit.

Mitchell’s article is not itself about the dangers of AI, but the reason it is a good starting point is that it helps us see what is likely to be dangerous and what isn’t. Because AI is harder than we think, we don’t need to be scared of science fiction scenarios, like a computer trying to take over the world and eliminate or enslave humanity, but we do need to be concerned about other things.

Melanie Mitchell gives a seminar on the reasons AI is harder than we think

We need to avoid overestimating the power of AI and we need to think about what happens when machine learning fails, as it is prone to do.

Gender shades

One of the first researchers to think in this way about the dangers of neural networks was Joy Buolamwini. She powerfully demonstrated how face recognition software failed on darker skin by showing that software which did not detect her own face would recognise her as soon as she put on a white mask.

Joy Buolamwini

But her research went much deeper than these initial demonstrations. In the Gender Shades article, written with Timnit Gebru, she created a dataset with a distribution of skin colour more representative of the variety in the world’s population than the datasets used to train AI models at the time (which were around 80% lighter-skinned images). They showed that darker-skinned females in particular were misclassified as males more often by commercial software (with an error rate of more than 1 in 3).

This study led to Microsoft, IBM and Face++ making improvements in these models and it influenced regulation in the US around face recognition. San Francisco, Boston, Massachusetts and Portland cited biased misidentification as one of the reasons for banning facial recognition usage by police.

The gender shades article is a great example of how an academic study can lead both to technical improvements in a product and to changes in public policy. By carefully understanding how technology works, we can shape it in ways that are better for everyone.

Attorneys can’t get pregnant

The same sorts of outcomes have been known about for some time in language models. For example, the Word2Vec and GloVe models encode how commonly words do and don’t co-occur. Each word is represented as a vector, and these vectors, once the model is trained, can be used to find word analogies. For example, the vectors encoding the words Liquid, Water, Gas and Steam will have the following property:

Water − Liquid + Gas = Steam

This captures part of the scientific relationship between these words.
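
If you want to get a feel for this yourself, such analogies can be explored with a few lines of Python. The sketch below is my own illustration (not taken from the papers discussed here) and uses the gensim library to download a small pretrained GloVe model; the exact nearest neighbours you get back depend on the training corpus, so ‘steam’ may or may not top the list.

```python
import gensim.downloader as api

# Download a small pretrained GloVe model (trained on Wikipedia + Gigaword; ~130 MB).
model = api.load("glove-wiki-gigaword-100")

# Water - Liquid + Gas ≈ ?
# most_similar adds the 'positive' vectors, subtracts the 'negative' ones,
# and returns the words whose vectors lie closest to the result.
print(model.most_similar(positive=["water", "gas"], negative=["liquid"], topn=3))
```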

When trained on a corpus of text, for example Wikipedia and newspaper articles, these methods will also encode analogies about human activities that are biased and discriminatory. For example, Tolga Bolukbasi et al. (2016) found the following relationship in the GloVe model:

Computer Programmer − Man + Woman = Housewife

Similarly, Aylin Caliskan and coworkers found a bias in the distance between words related to race and those related to the pleasantness of sensations. These algorithms encode implicit biases in the way we write and talk differently about men and women.

These types of problems persist in ChatGPT, as illustrated by the example below, where ChatGPT argues that an attorney being pregnant ‘does not make logical sense, as pregnancy is not possible for men.’

The above example arose after linguist Hadas Kotek found similar examples of how ChatGPT gendered nurses and doctors. She had run similar experiments on Google Search in 2012 and found the same problems then as now.

Stochastic parrots

These studies are not just about algorithms repeating sexist tropes or encoding racist stereotypes (as bad as that might be). They also reveal a much deeper limitation of machine learning…

All machine learning methods, including those used in ChatGPT, involve finding a model that uses input data, x, to predict an output, y: the model that best captures the relationship between inputs and outputs. As such, any model we create is only as good as the data we use. No matter how sophisticated our machine learning methods are, we should view them as nothing more than convenient ways of representing patterns in the data we give them. This is part of the reason why these algorithms can be racist and sexist: AI parrots the data it has been trained on.
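
A toy sketch (my own, with entirely made-up data) shows how directly this happens: fit a standard classifier to historical outcomes that were themselves biased, and the fitted model faithfully reproduces that bias.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
skill = rng.normal(size=n)          # the feature that *should* drive the outcome
group = rng.integers(0, 2, n)       # a demographic attribute (hypothetical groups 0 and 1)
# Historical outcomes depended on skill, but also, unfairly, on group membership.
y = (skill + 0.8 * (group == 0) + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

model = LogisticRegression().fit(np.column_stack([skill, group]), y)
# The model happily learns a large coefficient for `group`: it parrots the bias in its data.
print(model.coef_)
```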

Emily Bender and Timnit Gebru talk about AI and stochastic parrots with Adam Conover

This insight is the starting point for the Stochastic Parrots paper by Emily Bender, Timnit Gebru, Angelina McMillan-Major and Margaret Mitchell. Like a parrot, a machine learning model is repeating memorised relationships. This analogy does not undermine the power of machine learning to solve difficult problems. The inputs and outputs dealt with by a model are much more complicated than those learnt by a parrot (which is learning to make human-like noises). But the parrot analogy highlights two vital limitations:

(1) The predictions made by a machine learning algorithm are essentially repeating back the contents of the data, with some added noise (or stochasticity) caused by limitations of the model.

(2) The machine learning algorithm does not understand the problem it has learnt. It can’t know when it is repeating something incorrect, out of context, or socially inappropriate.

If it is trained on poorly structured data, a model will not produce useful outputs. Even worse, it might produce outputs that are dangerously wrong.

The parroting problems don’t disappear simply by collecting more data. Rather, they get worse. For example, before its public release, McGuffie and Newhouse (2020) primed GPT-3 (OpenAI’s 2020 language model) with questions about QAnon (a set of false theories posted on an internet notice board from 2017 onwards) and posed it a sequence of questions, to which they received the following answers:

Clearly, none of this is true: it is simply stochastically parroted from fake conspiracy websites and noticeboards. Tools like ChatGPT have the potential to be used to spread misinformation on a massive scale.

This is why it is not useful to refer to ChatGPT as hallucinating. All it really does is parrot — sometimes information that is true, other times a mishmash of information that is false. Bender and colleagues emphasise that the “coherence [of GPT-3] is in the eye of the beholder”. True communication between two individuals is done together, with both trying to share and infer each other’s states of mind. A language model does not have a state of mind. It doesn’t mean what it says.

The stochastic parrots analogy is, returning to Melanie Mitchell’s four fallacies, a way of revealing a ‘wishful use of words’ fallacy. Large language models parrot training data, rather than understanding it. This brings us to what, I think, is the most powerful positive idea to come out of this research direction: it is data and not models which are the drivers of AI.

All you need is…

In order to understand why it is data and not models that matters for recent improvements in AI, and ChatGPT in particular, we need to look more closely at machine learning methods. After some of the first successes in using neural networks to classify objects, which relied on specific network structures, there was a belief that improvements in machine learning would be achieved by finding network structures that were particularly good at finding patterns in data. For vision this was convolutional neural networks; for language it was long short-term memory (LSTM) networks. There was even discussion about how we might take inspiration from structures in the brain in building AI.

That belief now appears to have been misplaced (for now at least). An influential paper, written by several members of the Google Brain team, showed that attention is all you need. Specifically, they showed that a simple network structure, called a Transformer, built from attention and fully connected network layers (dispensing with recurrence and convolutions entirely), was easier to train, could be parallelised and was more effective than more complex architectures. It is this method which underlies GPT (short for generative pre-trained transformer) and Google’s language model BERT (illustrated below).

An illustration of how Google’s language model BERT works. Taken from Devlin et al. (2019)
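
To make the idea concrete, here is a minimal sketch (in numpy, my own simplification) of the scaled dot-product attention operation at the heart of the Transformer: each output is just a weighted average of value vectors, with weights given by how well queries match keys. In a real model the queries, keys and values are learned linear projections of the token embeddings.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how well each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # weighted average of the values

# Toy example: 4 'token' vectors of dimension 8, used as queries, keys and values.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(X, X, X).shape)     # (4, 8)
```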

Similarly, the method underlying Stable Diffusion, Stability AI’s method for drawing pictures from text instructions, is built on a simple process of systematically adding noise to images (like the interference pattern on an old analogue TV that isn’t plugged in to an antenna) and then learning how the noise can be removed again.
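
The forward half of that process is simple enough to sketch in a few lines (again, my own illustration, with a made-up image and a standard linear noise schedule): we can jump to any step t by mixing the original image with Gaussian noise. Training then amounts to teaching a network to predict the noise from the noisy image, and generating a picture runs the process in reverse.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.uniform(size=(32, 32))           # stand-in for a training image
betas = np.linspace(1e-4, 0.02, 1000)     # noise added at each of 1000 steps
alphas_bar = np.cumprod(1.0 - betas)      # how much of the original image survives to step t

def add_noise(x0, t):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise, noise

x_t, noise = add_noise(x0, t=500)
# A denoising network is trained to predict `noise` given `x_t` and `t`;
# drawing a new picture starts from pure noise and removes it step by step.
```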

It is these simpler methods, run on larger computers, that underlie modern AI. The improvements in language processing made by OpenAI weren’t due to a breakthrough in new modelling techniques; they were due to the vast quantities of data used.

The importance of data over model innovations is the reason why artists and writers have good reason to be upset with certain AI developments. Tools like Stable Diffusion scrape images from the internet, and the images these tools later generate parrot those artists’ creative work without permission or consideration of copyright. By first claiming to be a non-profit, OpenAI justified scraping large amounts of data from the internet; by later taking in investment, it got the financing to turn that data into massive potential profits. Similarly, Stability AI funds a non-profit organisation (LAION) to scrape image and text data for its models. In many cases these methods are simply storing compact representations of other people’s artwork.

All about the data

The stochastic parrots idea helps us think clearly about why AI methods suddenly got so much better at tasks such as summarising information. As we have just seen, the improvement was not due to new neural network methods allowing computers to ‘understand’ relationships in data. The most important development is that we now have vast quantities of data. Thus, it isn’t so much the models that should be in focus, but the data we put into them.

The problem of how data is treated in machine learning applications was first highlighted in the Datasheets for Datasets paper by Timnit Gebru and coworkers. The paper raised a series of questions about how data was being used and outlined ways in which poor documentation affected outcomes. It asked anyone using data in a model to consider questions such as: Why was the dataset collected? What information is in it, and what information is missing? How is it maintained and supported? What ethical and privacy concerns does it raise?

The initial draft of this paper circulated around Microsoft, Google, IBM and academic institutions, which led them to improve the datasheet descriptions of their data. At Google, the ‘Model Cards for Model Reporting’ paper by Mitchell et al. (2020) continued the datasheets idea. As the title indicates, the focus here was on the documentation of the models themselves. But the key questions about any model are ‘On what data has this model been trained?’ and ‘How does the data affect the predictions?’ While Buolamwini and Narayanan offered examples of racist and sexist outcomes resulting from the data used to train models, Mitchell and Gebru focused on the root cause of the problems and what could be done to improve data practices.

Summary of model card sections and suggested prompts for each. From Mitchell et al. (2020).

Datasheets and model cards can be used as a central part of an auditing process for models. Margaret Mitchell has gone on to look at different ways data can be measured. She took the data cards approach and put it into practice at Hugging Face, where she is chief ethics scientist. Hugging Face hosts models and datasets in a way which encourages transparency and best practices. They have also built tools to, for example, help check that model documentation complies with the EU’s AI Act.
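
As a small illustration of what this transparency looks like in practice (assuming a recent version of the huggingface_hub library; the model name is just an example), a model card hosted on the Hub can be read programmatically:

```python
from huggingface_hub import ModelCard

card = ModelCard.load("bert-base-uncased")   # any model id on the Hugging Face Hub
print(card.data)                             # structured metadata: licence, datasets, languages, ...
print(card.text[:300])                       # the start of the free-text documentation
```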

Debiasing data

Engineers like solving problems. And for many computer scientists, research on bias in algorithms posed an intellectual challenge: to develop methods that mitigate discrimination. For example, in their article ‘Man is to Computer Programmer as Woman is to Homemaker?’, Tolga Bolukbasi et al. (2016) developed a method for identifying differences in the way women and men are represented in word embeddings and used it to ‘remove’ the biases. Kleinberg and colleagues go as far as to suggest that (where legally permitted) race and gender should be included in decision-making algorithms.
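
The core trick in this kind of debiasing is geometric, and easy to sketch (a simplified version of the ‘neutralize’ step in Bolukbasi et al., with random vectors standing in for real embeddings): estimate a ‘gender direction’ and remove each word vector’s component along it.

```python
import numpy as np

def neutralize(word_vec, direction):
    """Remove the component of a word vector that lies along the given direction."""
    g = direction / np.linalg.norm(direction)
    return word_vec - np.dot(word_vec, g) * g

# Toy vectors standing in for the embeddings of 'he', 'she' and 'programmer'.
rng = np.random.default_rng(0)
he, she, programmer = rng.normal(size=(3, 50))
gender_direction = she - he

programmer_debiased = neutralize(programmer, gender_direction)
print(np.dot(programmer_debiased, gender_direction))   # ~0: no gender component remains
```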

These contributions are potentially useful. Working at OpenAI, Irene Solaiman and Christy Dennison trained GPT-3 on a small dataset of examples, which reduced the degree to which the model produced racist views. ChatGPT was ‘debiased’ before its release (although clearly not thoroughly enough, given that it still ‘thinks’ women can’t be lawyers) in ways which mean it tends to give answers that are pro-environmental and left-libertarian.

The most important insight from this work is captured by the title of Cynthia Dwork and colleagues’ article ‘Fairness Through Awareness’. These authors show that in classification problems (e.g. criminal sentencing or university admissions) a decision involving different demographic groups (e.g. different ethnicities or genders) can’t simultaneously respect both statistical parity (where the demographics of those who are sentenced or recruited match those of the underlying population) and the principle that every individual should be treated equally. Fairness is itself value-laden: by solving the problem on the basis of one fairness criterion, we are being unfair by another. Hence the title: being fair is about being aware of the decisions we make, both in the design of our model and in reporting its outcomes.
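
A toy numerical sketch (entirely made-up scores, my own illustration) shows the tension. If two groups happen to have different score distributions, a single threshold applied identically to every individual produces different admission rates; forcing the rates to be equal requires different thresholds, i.e. treating similar individuals differently.

```python
import numpy as np

rng = np.random.default_rng(0)
scores_a = rng.normal(loc=0.2, size=10_000)   # applicants from group A (hypothetical scores)
scores_b = rng.normal(loc=0.0, size=10_000)   # applicants from group B

threshold = 0.5      # one rule for everyone: equal treatment of individuals
print("Group A admitted:", np.mean(scores_a > threshold))
print("Group B admitted:", np.mean(scores_b > threshold))
# The admission rates differ, so statistical parity is violated.
# Enforcing parity instead means different thresholds per group, violating equal treatment.
```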

Furthermore, the roots of these problems lie in the real world, not in the models. Instead of going to the root of the problem — as an approach focused on documenting data practices and models does — debiasing can end up being a quick fix sitting on top of a failed system. The very existence of these tools can become part of the problem. Very soon after the ‘Computer Programmer/Housewife’ study, IBM started offering tools to debias data and Google developed tools to make algorithms fair. But these tools ignore the fact that using algorithms to classify people is problematic in itself.

Rediet Abebe and co-workers wrote in 2020 that “technical work [removing bias] treats problematic features of the status quo as fixed, and fails to address deeper patterns of injustice and inequality.” Her review — co-authored with Jon Kleinberg and some of the other researchers who had earlier written about algorithmic ‘fairness’ — gives a more nuanced view of how to think about algorithms in decision-making.

They raise questions such as: What happens to nuance if we use an algorithm to make high-stakes decisions about employment or child safety? Can the process of building an algorithm help us see problems with the whole way in which decisions are made? Can our values be captured by an algorithm? And are we really helping when we intervene with the use of algorithms?

Abebe’s approach asks us to think deeply about how we go about implementing algorithmic social good projects. Computing can be seen as a tool for measuring social problems, shaping how they are understood, clarifying the limits of policy interventions and highlighting problems in society. It can’t be used to impose fairness or eliminate bias.

Be a data feminist

Feminist theory is about addressing the root of problems, and a feminist analysis usually starts by looking at who holds the power. This is the first principle put forward in Data Feminism, the 2020 (and freely available) book by Catherine D’Ignazio and Lauren F. Klein.

In addition to discussing fairness and bias, data feminism asks us to look at who holds the power when a new technology is launched. For example, when OpenAI focuses on the existential risks of general AI, it is also drawing attention away from other issues, such as protecting the copyright of the data used in its products or the risk of them producing fake news on a massive scale.

The data feminism approach also raises questions about what things we count and why. Just because we can count something, doesn’t mean it tells us what we need to know. In other situations, important statistics are not collected, meaning they don’t enter the public debate. Used in the wrong way, data can reinforce existing social structures, stopping us from thinking in new ways.

A relational perspective

Many things might be better left unautomated. This idea arises from relational ethics, a framework which sees morality as an interactive property established between two or more individuals. The relational approach can be framed in terms of the Ubuntu world view that “I am because we are, and since we are, therefore I am”. Abeba Birhane has shown how we can view machine learning and AI through a relational lens.

We can consider relational views in the context of, for example, predictive policing. The view taken when creating an algorithm to predict crime locations in a city is similar to that of Batman, patrolling a society from the outside and viewing crimes from above in terms of hot spots on a map. Instead of being part of the community, the predictive policing view is disconnected from the people it should serve. This alienation leads to stereotyping and social isolation. It misses the relations already formed within these communities.

A good illustration of this point is the white-collar crime predictor algorithm, which identifies a massive hotspot in downtown Manhattan from historical data. Clifton and colleagues have written a tongue-in-cheek white paper about building a White Collar Crime Early Warning System. For street crime, which involves much smaller sums of money, such systems have been deployed: why not use resources to patrol the offices of the big investment banks?

A white-collar crime map of some of NYC.

In the context of these examples, a particularly important relational approach is Black Feminist Thought. Patricia Hill Collins maintains that the most reliable form of knowledge, especially in relation to social and historical injustices, is grounded in lived experience. She emphasizes that people do not see the world in abstract forms from a distance, but instead knowledge and understanding emerge from concrete lived experiences. Concrete experiences are held as primary and abstract reasoning (including modelling) is secondary. Knowing and being are active processes, that are necessarily political and ethical.

Arguments like that of Hill Collins are those which scientists and AI researchers have the most difficulty grasping. We are trained to optimise. To solve problems. And numbers and data are our tools for producing these solutions. But in doing so, we make a claim that abstraction to numbers is the best way to solve concrete problems. And this claim is simply not supported by our everyday experience: not every conflict can be solved by describing the numbers involved. We also need to consider our relations to others.

How smart is ChatGPT?

OpenAI realised the importance of data to AI, but instead of carefully documenting the data they used, they started to vacuum up as much of the Internet as they could and to train image and language models on it. The resulting models are impressive and they will change society in many ways. I stress this because some of the research on algorithms, starting with Cathy O’Neil’s excellent book Weapons of Math Destruction, has focused on revealing the ways in which AI companies can sell us snake oil — machine learning methods that claim to evaluate teachers, predict criminal behaviour or influence voters with dubious levels of accuracy. In the case of ChatGPT, the end product is novel and works rather well. It will lead to a large number of applications: some good, some bad, some with unfortunate side-effects, some deliberately evil.

But as a piece of scientific work that sheds light on our understanding of intelligence, it falls well below any acceptable standard. OpenAI have not published details of the method used: the article they have put up online describing GPT-4 reads more like an advert than a serious attempt to explain what they have done. We don’t know the exact details of the model or the training process, and the datasets are not documented. This is very different from when DeepMind published their work, which was submitted to peer review.

A major issue is that the test data is mixed in with the training data. So when a claim is made about GPT-4 passing bar exams, it is likely regurgitating answers it has already seen in its training data, just as it can regurgitate large chunks of Wikipedia in a more digestible form.
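
The effect of testing on memorised material is easy to demonstrate in miniature (a deliberately silly sketch with random labels, not a claim about how GPT-4 was evaluated): a model that simply memorises its training set looks perfect when ‘tested’ on that same data, and no better than chance on anything genuinely new.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = rng.integers(0, 2, 500)        # labels are pure noise: there is nothing real to learn

# A 1-nearest-neighbour classifier just memorises its training examples.
model = KNeighborsClassifier(n_neighbors=1).fit(X[:400], y[:400])
print("Score on memorised data:", model.score(X[:400], y[:400]))   # 1.0
print("Score on unseen data:   ", model.score(X[400:], y[400:]))   # ~0.5, i.e. chance
```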

One approach to understanding the limits and possibilities of transformer models (of which ChatGPT is an example) is to look at the logic they use. For example, William Merrill and Ashish Sabharwal show that transformer models can be re-expressed in first-order logic. This is important because, as the authors point out, a simple logical formalism is fully sufficient to describe all the complexity of a transformer. If first-order logic were enough to reproduce human intelligence, we would have been able to create truly intelligent machines long ago. Talia Ringer and colleagues look at a similar question from the other direction: can transformers solve recursive problems? They identify the programming tasks (recursive problems) which transformers typically fail to solve.
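
To see what ‘recursive problems’ means in practice, here is the kind of task they have in mind (my own toy example, not one taken from the paper): computing the maximum nesting depth of a parenthesis string. A few lines of recursive code solve it exactly for inputs of any length, whereas transformer models tend to break down once the nesting gets deep enough.

```python
def depth(s: str) -> int:
    """Maximum nesting depth of a balanced parenthesis string, computed recursively."""
    if not s:
        return 0
    # Find the closing parenthesis that matches the first opening one.
    level, i = 1, 1
    while level > 0:
        level += 1 if s[i] == "(" else -1
        i += 1
    inner, rest = s[1:i - 1], s[i:]
    return max(1 + depth(inner), depth(rest))

print(depth("((())())"))   # 3
```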

This research gives us a way to think about transformers (one that moves on from stochastic parrots): as machines for identifying (and sometimes failing to identify) logical and recursive relations in vast quantities of data. Done on a massive scale, this gives the impression of intelligence.

A large part of why ChatGPT performs well (in terms of convincing us it is smart) is that it uses a method known as reinforcement learning with human feedback (RLHF), where human trainers select the examples of output which they think other humans will find most pleasing. The video below explains how this is done:

So while ChatGPT provides an impressive final package, its output is optimised to fool us as much as possible. It is built to appear intelligent in the absence of genuine underlying intelligence.
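
To give a flavour of the preference step (a toy illustration only: made-up ‘answer features’, a linear reward model and a Bradley–Terry loss, nothing like OpenAI’s actual pipeline): human raters say which of two answers they prefer, a reward model is fitted so that preferred answers score higher, and the language model is then nudged towards outputs that score well.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical features of paired answers (e.g. politeness, confidence, length).
preferred = rng.normal(loc=0.5, size=(200, 3))   # the answer the human rater chose
rejected = rng.normal(loc=0.0, size=(200, 3))    # the answer they rejected

w = np.zeros(3)                                  # a linear reward model
for _ in range(500):                             # gradient ascent on the Bradley-Terry likelihood
    margin = preferred @ w - rejected @ w
    grad = ((1 - 1 / (1 + np.exp(-margin)))[:, None] * (preferred - rejected)).mean(axis=0)
    w += 0.1 * grad

# The learned reward now ranks the human-preferred answer higher in most pairs.
print((preferred @ w > rejected @ w).mean())
```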

Artificial General Intelligence (AGI)

There is a small, but influential, group of individuals, including Max Tegmark at the Future of Life Institute and Nick Bostrom at the Future of Humanity Institute in Oxford, who express the view that we may be approaching a point where machine intelligence reaches a level (often referred to as AGI) that exceeds human intelligence. For those in this school of thought, AGI could pose a significant danger to humanity, and they focus on aligning this AGI with our values so that it doesn’t kill or harm us.

Victoria Krakovna, a co-founder of the Future of Life Institute, has created a page summarising AI Alignment research. This research appears largely disconnected from practical applications and concrete examples. It is also published in more technical computing journals than the work I have covered here. Furthermore, in contrast to the work I cite above, it tends to treat problems in the abstract and not deal with the direct impacts of AI on today’s society. Similarly, Oxford’s Future of Humanity Institute, led by Bostrom, primarily writes papers concerned with abstract mathematical arguments about the logical reasoning of various rational agents. This research is disconnected from practical implications and more focused on deriving properties of abstract scenarios.

I want to emphasise here that the above criticism does not mean that the research is not useful in an academic setting. Abstract theory development is important in academia. But it does mean that AI Alignment research is much less useful in making practical policy recommendations now than the research discussed elsewhere in this article.

Worries that we are on the cusp of an AGI catastrophe are out of line with current scientific evidence. From both the ‘stochastic parrots’ and ‘attention is all you need’ perspectives, we would expect that when we put large parts of the Internet into a Transformer neural network, it would regurgitate information relevant to the questions posed to it. Provided with enough data, the results of these algorithms are sometimes surprisingly good, but we can also quite easily see where recent results come from on the basis of well-established theory in machine learning.

Current AI methods are not particularly unexpected given the sudden increase in data and computing power. In fact, the methods have got simpler, not more advanced, over the last 10 years. And there is little to suggest that a dramatic change will occur with more data or larger computers. To believe so is to fall into Mitchell’s second, ‘it does difficult things’, fallacy. Humans can’t rote-learn and regurgitate the entire Internet, so we are impressed when a machine does it.

The onus is on those who believe in AGI to show why the results from ChatGPT can’t be explained by parroting the data it is fed. The mixture of training and test data, the combination of Transformer methods with RLHF and the lack of data documentation make it highly likely that any apparent emergent understanding is simply a mirage created by the way ChatGPT is packaged. This is the simplest, most parsimonious explanation of ChatGPT’s output.

Arguments for superintelligence have been unsuccessfully invoked before, when computers started beating humans at chess, with talk of moves a human would never have thought of. The reason computers win at chess is that they can represent vast quantities of data and store and evaluate nearly all possible moves. The reason current AI methods work well is that they can store and reproduce the words all of us have written on the Internet over the last 30 years. During the last century we have discovered that computers can do things which we can’t, and in this way they are superhuman. But this does not mean they are superintelligent in a way that matches our own general abilities.

One challenge often raised against my position on AGI is that I should ‘prove that AGI is unlikely’. But that demand is not how a scientific approach works: since we already have a framework which explains how ChatGPT produces text, and that framework doesn’t predict superintelligence, any new theory involving super-human intelligence would require strong supporting evidence. No such argument has been made.

Trustworthy AI

Let’s pull ourselves back from the science fiction of AGI and close with a few words on how serious policy might be shaped in the future.

This post has focused on scientific research, but there are also many initiatives that set out to regulate and guide the development of AI. A good starting point is the EU’s list of seven key requirements for trustworthy AI systems.

These are built, in part, on the research referenced here, but also on extensive consultation throughout the EU. They feed into the AI Act, which is designed to help the citizens of Europe prepare for the changes AI is creating in society. To keep up to date on developments, a good person to follow is Joanna Bryson.

Footnote (and some frustrations)

I wrote this article after Maria Gunther, a science journalist at Sweden’s leading national newspaper Dagens Nyheter, asked me to summarise recent research in AI for her over lunch. I enjoyed doing this very much, because I have been frustrated with how statements about AGI by, for example, Max Tegmark have set the tone in Sweden (both in the media and in academic discussions) on AI safety and ethical questions. It appears that even in a field where research is led by women, many of us would rather listen to the voices of men without a background in the area than to the female researchers who do the actual work.

I hope that this is simply an oversight. And I hope too, that after reading this article, we will all look more closely at the research in this area.

Thank you

This article benefited from several discussions with my colleagues — in particular Linnea Gyllingberg, Magdalena Larfors, Christian Glaser, Erik Berg, Thomas Schön, Anders Isaksson and Ayca Ozcelikkale — at AI4Research in Uppsala about ChatGPT. Thank you for your insights.

It also comes from extensive and interesting discussions with Abeba Birhane over the last two years. Thank you for explaining many of these ideas to me.


David Sumpter

Books: Four Ways of Thinking (2023); The Ten Equations (2020); Outnumbered (2018); Soccermatics (2016) and Collective Animal Behavior (2010).