On our No-Go Theorem for AI Consciousness
Can mathematics help us decide whether AI systems are conscious?
How do we find out whether AI systems are conscious? In a recent paper, Tim Ludwig and I attempted a new approach to this question, based on a methodology that has proven valuable in research on foundations of physics: no-go theorems.
Making use of general facts about how contemporary computer chips are built, we prove mathematically that, under a condition called ‘dynamical relevance’, contemporary AI systems cannot be conscious. That’s Theorem 4 of our paper:
Theorem 4.
If consciousness is dynamically relevant, then AI systems are not conscious.
No-go theorems serve a variety of purposes. One is to shift the focus from the theorem’s conclusion to the theorem’s assumptions. It is a pleasure to see this happening so shortly after publication, in a blog post by our very first critic, Justin Sampson.
This post is a reply to Justin's criticism, which to a large degree follows intuitions we ourselves had at some point in the development of our research, but which, for reasons I will explain below, don’t work out. Despite my objections, I’m convinced that Justin’s perspective is crucial for the further exploration of our result. I hope that the following reply might help others, too, in thinking about our theorem.
For all those who are reading this reply before Justin’s post, let’s start with the kind summary that Justin provides of our work at the beginning of his post:
“The crux of the argument is that electronic circuits are designed and verified (…) to exactly implement a computational model. There might be slight deviations, but they would be rare and random, leaving no room for the dynamical relevance of consciousness to creep in. The mathematical formalism introduced for the proof is quite nice, laying out the relationships between different levels of description of the world, such as theories of consciousness, neuroscience, and physics. Dynamical relevance is defined in terms of the dynamical evolution of a system, that is, the trajectory of states that a system goes through.”
That’s an excellent summary of our work. The crucial point is that contemporary computer chips (CPUs, GPUs, etc.) are built in such a way that they adhere, strictly, to a formal system that is specified in full detail before a chip is even produced. A violation of this requirement can easily cost hundreds of millions of dollars, as in the case of the Pentium FDIV bug.
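For a concrete feel for what went wrong there, here is the widely reported test case for the FDIV bug as a quick sketch (the specific figures below are the ones commonly cited for the bug, not anything from our paper):

```python
# Widely cited test case for the Pentium FDIV bug (figures are the
# commonly reported ones; on a correct chip the residual is ~0).
x, y = 4195835, 3145727
print(x / y)            # correct value: 1.33382044913624...
print(x - (x / y) * y)  # ~0 on correct hardware; flawed Pentiums
                        # reportedly returned ~256 here
```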
With this summary in mind, let’s go through what I think are the three most important criticisms in Justin’s post.
Causal Closure vs. Dynamical Relevance
Our theorem rests on a new concept which we introduce in the paper, called dynamical relevance. Dynamical relevance is a property of a theory of consciousness, defined in formal terms here and here. Translated into non-formal terms, the definition is roughly as follows:
Consciousness is dynamically relevant according to a theory of consciousness iff the theory of consciousness changes the dynamics of some reference theory in the natural sciences when systems are conscious.
This is the case, for example, if a theory of consciousness proposes a new brain function which isn't contained in established neuroscientific theories of the brain. Here, “dynamics” refers to the temporal evolution of states of a theory.
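To make this slightly more concrete, here is a rough schematic of the definition in shorthand notation of my own (a sketch only; the paper’s actual formal definition is more careful):

```latex
% Schematic only; notation is mine, not the paper's exact formalism.
% T_c: a theory of consciousness; T: a reference theory in the natural
% sciences; dyn_T(S): the dynamical evolution of system S according to T.
\[
  \mathrm{DR}(T_c, T) \;:\Longleftrightarrow\;
  \exists S:\; \mathrm{conscious}_{T_c}(S)
  \;\wedge\; \mathrm{dyn}_{T_c}(S) \neq \mathrm{dyn}_{T}(S)
\]
```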
Justin, understandably, worries that dynamical relevance might be incompatible with causal closure of the physical (and because of this replaces our definition with another definition, that we’ll discuss below).
But this is not the case. Causal closure of the physical is a metaphysical condition, often put as the requirement that every physical effect has a sufficient physical cause. Here, “physical” designates the metaphysical category of physical things and properties. If a theory of consciousness postulates a new brain function that is part of this metaphysical category, it satisfies causal closure while still being dynamically relevant. Hence, there is no conflict here.
Differences in the understanding of general concepts can easily happen given the hugely interdisciplinary nature of our field, and in some cases that can be a good thing. For example, I’m tempted to think that what Justin has in mind in the blog post when discussing causal closure is precisely the dynamical irrelevance of a theory of consciousness for all theories of physics. (The field of physics, that is, not the metaphysical category of “physical things and properties” mentioned above.) While that’s not causal closure as canonically understood, it’s an excellent proposal, and it aligns perfectly with my intuition that dynamical (ir)relevance is a substantive concept that can help us make progress in thinking about consciousness independently of our theorem.1
What I find promising about the notion of dynamical relevance is that it is orthogonal to many of the concepts that we use to describe the *relation* between consciousness and the physical. We say more about this in this section of the paper, where we lay out the logical landscape for causal closure of the physical, among other concepts.
With respect to Justin’s criticism, the important case is that we can have a theory of consciousness which does not violate the causal closure of the physical, but still describes consciousness as dynamically relevant with respect to existing theories. That’s because consciousness can be physical and also dynamically relevant. The simplest example is a theory of consciousness that postulates that consciousness is a brain function that neuroscience doesn’t know.
As a consequence of Justin’s particular understanding of causal closure, he argues that if causal closure is assumed, our no-go theorem can't take dynamical relevance as an assumption and “should be expressed, instead, in terms of a conscious system vs. an unconscious system, not a theory of consciousness vs. a reference theory”. While, for the above-mentioned reasons, I do not think that this statement is entirely accurate (dynamical relevance is fully compatible with causal closure), I do agree that it’s very instructive to consider this alternative case. So let’s do that!2
Conscious vs. Unconscious Systems
It is very tempting to think about the whole issue at stake here as an issue only of conscious vs. unconscious systems. Perhaps the idea that “consciousness makes a difference”, which drove the development of the concept of dynamical relevance, should be fleshed out in terms of the dynamical evolutions of conscious and unconscious systems? Justin makes this proposal, and I think it is a very good one; we had also considered it in earlier phases of the research. It would make things a lot easier!
Unfortunately, though, framing the issue in those terms doesn’t work. That's because it is *always* the case that conscious systems evolve differently than unconscious systems, independently of what consciousness is taken to be.3 Here is why:
Consider, as an example, a supervenience theory of consciousness which postulates that consciousness supervenes on a brain function. Let’s simplify and assume that the theory says that if the function is active, the system is conscious, and if the function isn't active, the system isn't conscious. But for any such theory, there is a difference between the dynamical evolutions of the conscious and the unconscious cases, simply because the activity of the brain function makes a difference to the dynamical evolution. That is the case even though, in virtue of supervenience, consciousness doesn't make a difference to what happens in the brain in a more common sense of the term.
The same holds true on the opposite end of the spectrum. If, for example, consciousness is causally efficacious, there is also a difference between the dynamical evolutions of the conscious and unconscious cases.
Therefore, the difference between the dynamical evolutions of conscious and unconscious systems cannot discern the above cases. In fact, there is always such a difference, provided that there is some regularity between conscious experience and physical states. So if one were to think in these terms, one would be unable to address the difference between the above examples. That’s, in a nutshell, why we had to drop this way of thinking in the early stages of the research on our paper.4
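If it helps, here is a toy illustration of this point (entirely a sketch of my own, with made-up dynamics and parameters, not anything from the paper): under a supervenience-style theory, the “conscious” and “unconscious” trajectories differ simply because the activity of the underlying brain function differs.

```python
# Toy illustration (my own sketch, with made-up dynamics): the system
# counts as conscious exactly when a certain brain function is active.
# The "conscious" and "unconscious" trajectories then differ trivially,
# because the function's activity itself drives the dynamics -- even
# though consciousness does no causal work over and above the function.

def step(state, function_active):
    """Hypothetical update rule: the brain function, when active,
    contributes an extra term to the next state."""
    return 0.9 * state + (0.5 if function_active else 0.0)

def trajectory(initial_state, function_active, n_steps=5):
    states = [initial_state]
    for _ in range(n_steps):
        states.append(step(states[-1], function_active))
    return states

conscious_run = trajectory(1.0, function_active=True)
unconscious_run = trajectory(1.0, function_active=False)
print(conscious_run != unconscious_run)  # True: the evolutions differ
```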
Dynamical relevance, in contrast, captures the difference between the above theories. A supervenience theory postulates consciousness as dynamically irrelevant to a theory of the supervenience base. And a causally efficacious theory postulates consciousness as dynamically relevant to a theory that describes the effects.
During our research, we thought long and hard about how to turn “consciousness makes a difference” into a good definition. As is often the case in formal research, this is unfortunately not easily seen in the resulting manuscript. One other thing we considered, for example, is the effect of transformations of variables in systems of equations, which can change which variables one would naively take to make no difference to other variables.
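As a toy example of this phenomenon (my own, not from the paper): in the system below, y makes no difference to x, but after the linear substitution u = x + y, w = x − y, every equation involves both new variables.

```latex
% Original system: y makes no difference to x.
\[
  \dot{x} = -x, \qquad \dot{y} = x - y
\]
% Substituting u = x + y and w = x - y (so x = (u+w)/2, y = (u-w)/2):
\[
  \dot{u} = \tfrac{1}{2}(w - u), \qquad \dot{w} = -\tfrac{1}{2}(u + 3w)
\]
% Now each equation involves both variables, although nothing about
% the underlying dynamics has changed.
```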
The specific version of dynamical relevance we propose captures, in formal terms, what we think is a minimal condition for consciousness to make a difference. It doesn't have metaphysical ballast, and whether it applies or not is something that can easily be assessed when constructing theories of consciousness.
I think I speak for both Tim and myself when I say that we wholeheartedly welcome further work on this concept. It’s surely not the end of the story. But replacing it with a “conscious vs. unconscious system” distinction doesn't work, unfortunately.5
The Simulation Question
A third idea that Justin proposes is to think about our theorem in terms of the following question: “if a given brain evolving in a certain way is conscious, would a sufficiently-detailed simulation of that brain be conscious as well?”
That’s a great question, and even though it's not the question we ask in the paper, it's very instructive to think about. Let’s call it the “simulation question”. We all agree that if consciousness isn’t dynamically relevant, the question is open (and depends on other factors, e.g. which theory of consciousness is correct). But what if consciousness is dynamically relevant?
We address the easy part of the answer already in the paper: if the true theory of consciousness were known, we could take it into account when building a simulation of the brain. In a nutshell, that’s because we could change the formal system or the verification of the chips so as to make room for the dynamical changes that the theory of consciousness proposes for conscious systems. That’s not covered by our result, because the result only applies to contemporary processing units (PUs).
So let’s ask the difficult part of the simulation question: What if the simulation runs on the CPUs/GPUs we have today?
The answer to this question lies in the definition of dynamical relevance, and in carefully distinguishing states, structure, and dynamics. Let me try to explain this as follows.
Let’s assume we would like to build a sufficiently-detailed simulation of a brain. How would we go about creating it?
Presumably, we would make use of our best neuroscientific theory of the brain, built upon detailed measurements of the neural connections in the brain, together with information about the update rules (or differential equations) that determine how neurons update their state in response to input, and information about neural brain states. Given such a theory, we would then implement the structure and update rules in the simulation, choose one of the brain states, and click “run” to start the simulation.
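As a minimal sketch of what this recipe might look like in code (all connectivity, update rules, and parameters below are hypothetical stand-ins, not an actual neuroscientific model):

```python
# Minimal sketch of the simulation recipe described above (assumes
# numpy; structure, update rule, and state are hypothetical stand-ins).
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 100

# "Structure": the measured connectivity between neurons.
weights = rng.normal(scale=0.1, size=(n_neurons, n_neurons))

# "Update rule": how each neuron responds to its inputs.
def update(state):
    return np.tanh(weights @ state)

# "Brain state": a measured initial condition -- then click "run".
state = rng.normal(size=n_neurons)
trajectory = [state]
for _ in range(1000):
    state = update(state)
    trajectory.append(state)
```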
So what happens if consciousness is dynamically relevant? If consciousness is dynamically relevant with respect to the neuroscientific theory, the dynamical evolution of the brain will diverge from the one in the theory as soon as consciousness arises in the brain. So when comparing the prescription of the neuroscientific theory with the actual dynamics of the brain, we would see, if we had good enough resolution and measured a big enough portion of the brain, that what happens in the brain differs from what the theory specifies. That’s what it means to say that consciousness is dynamically relevant with respect to a neuroscientific theory of the brain. Perhaps we would see that there must be some functional module related to consciousness which the neuroscientific theory doesn’t describe correctly or doesn’t take into account, or perhaps we would see that we need to introduce another variable into the model.
But what about the simulation? For the simulation, no such divergence would happen. Rather, because of verification and error correction, the processing units (PUs) would make sure to produce exactly those dynamics that we have specified in the theory. There is no room for divergence in the microprocessor. The processor is built so that it spits out, exactly, what we have put in. It follows the path laid out by the state, update rules and structure in the theory. Hence, if we build our simulation correctly, the PUs produce the dynamics of the neuroscientific theory.
But because of the logic of the definition of dynamical relevance, this can only be the case if the system is not conscious. That’s simply what dynamical relevance means. If consciousness is dynamically relevant with respect to a theory, then if a system is conscious, its dynamical trajectories differ from those of the neuroscientific theory. So, by contraposition (the logical operation of negating and swapping the two parts of a conditional, which gives an equivalent statement), if consciousness is dynamically relevant with respect to a theory, then if a system’s dynamical trajectories do not differ from those of the theory, the system isn't conscious.
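Schematically, in shorthand notation of my own (writing DR for “consciousness is dynamically relevant with respect to the theory” and traj_T(S) for the trajectory the theory prescribes for system S):

```latex
% If consciousness is dynamically relevant, conscious systems diverge:
\[
  \mathrm{DR} \;\Rightarrow\;
  \bigl(\, \mathrm{conscious}(S) \Rightarrow \mathrm{traj}(S) \neq \mathrm{traj}_T(S) \,\bigr)
\]
% Contrapositive of the inner conditional (logically equivalent):
\[
  \mathrm{DR} \;\Rightarrow\;
  \bigl(\, \mathrm{traj}(S) = \mathrm{traj}_T(S) \Rightarrow \neg\,\mathrm{conscious}(S) \,\bigr)
\]
```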
The crux of this example is that there is no magic trick by which we can tell the GPU to produce the exact same dynamics as the brain. By the very nature of computation, we can only give it the same structure, states, and update rules, and then tell it to compute precisely what we put into the simulation.
The only option we’d have in this hypothetical scenario, once we realize that the dynamics diverge from those of the brain, is to try to change the time evolution of the PUs so as to reproduce dynamics that agree with the brain also in the cases where the brain is conscious. But that’s precisely searching for a theory of consciousness, and if we found such a theory, we could take it into account in the simulation, which is the first case we considered above.6
So, in summary, we see that the answer to the simulation question fits perfectly with our theorem: if consciousness is dynamically relevant with respect to a neuroscientific theory, then a simulation of that theory isn’t conscious. That’s a direct consequence of what dynamical relevance means. Because we might be able to *imagine* a faithful computer simulation of the brain even if consciousness were dynamically relevant, this consequence may be somewhat unintuitive. But, in my view, that’s a good sign. It’s a sign of progress that goes beyond the limitations of intuition.
Conclusion
Much more could be said, but in the interest of brevity, I will leave it at this. The above is, to some degree, meant as a defence of our result against the objections that Justin’s original post puts forward. But I would rather think of this as an ongoing discussion of our result, whose purpose isn’t to show that an objection is outright wrong, but rather to help us all explore the logical landscape around this result together.
I see the paper that Tim and I have written as a first step, not a final answer. A first step to showcase that we can make rigorous progress with the artificial consciousness question while evidence to distinguish theories of consciousness is still nascent and while new theories of consciousness are still being developed.
I find myself pondering a lot too. But in my view, in cases where the math works out, that's a good sign: a sign of having achieved some insight that isn't otherwise available to intuition. (Tim would probably disagree, and I admire his capacity for intuition, especially when it comes to formal questions.)
I think I speak for both Tim and myself when I say that we cordially welcome any further work on the paper and its implications, including any criticism people might have. A huge thank you to Justin on that occasion!
Epilogue: The Function of Blog Posts in Scientific Debates
Let me add, at the end of this post, a quick note on the function of blog posts in scientific debates. I think there are certain dynamics of blog posts that are not so conducive to a progressive debate on a scientific topic, and these might be interesting to examine independently of the issue at hand.
Blog posts are an important part of scientific progress. They are excellent tools of science communication with a broader audience and the general public, of dissemination of research to colleagues, and can be a great educational resource. One could probably also make a point that they serve an important purpose in allowing researchers to put their own opinion out, over and above research papers (I’m not fully convinced of this last point as of yet, but alas).
But I think it’s important to note that there are also things that blog posts are not good for. One of them, in my view, is criticism of individual papers.
The problem with blog posts that criticize individual papers is that they address an entirely different audience than papers do.
Papers that criticize other papers address those who are interested in a particular debate, who know its background, understand the material well, and can see that there are pros and cons on various sides. Blog posts, on the other hand, are easy to read and comparably short. They are read by members of a field who are not engaged in a particular discussion, by researchers working in adjacent fields, and perhaps even by the general public.
As a consequence, criticism put forward in a blog post emanates well beyond the basin of attraction of a scientific debate. It influences the opinions of others outside of the debate, or even outside of the field, on the content of the paper, but also, implicitly, on the quality of the authors’ work.
This is why, I think, blog posts are not the right medium for a progressive scientific debate. Because blog posts are expected to be simple to read and simple to follow, they can, in my view, easily become the “simple but wrong” option of scientific progress.7 They sit outside the checks and balances of peer review: no independent expert has had a chance to check the logic and soundness of the criticism, and neither have any of the recipients of the criticism. Yet due to their short length and accessible form, blog posts can be extremely instrumental in forming opinions. That’s not a combination conducive to scientific progress in a debate, and it can put the authors somewhat on the spot.
In light of these implications, I think that where criticism of individual papers is concerned, it is better to write reply papers than blog posts. Many journals have a “commentary” section for articles that are about as long as a blog post and don’t require much more work. Such commentaries are usually sent for peer review, so they get checked by anonymous reviewers before publication; they count towards one’s publication record; and they target the right audience. They also make it much easier for the authors of a criticised paper to write a proper reply.
That said, we all have to deal with constraints of time and other factors that limit just how much effort we can put into a thought we’d like to share with the public. And when push comes to shove, it’s probably better to put criticism out in any suitable form.
1. In a revised version of his post, Justin uses the term “dynamical involvement” to designate the dynamical irrelevance of a theory of consciousness for theories of physics. I think it’s great to have a term for this! I was thinking “dynamical closure” might be an option too, though perhaps that sounds too much like “causal closure”.
2. A quick note for those who have read Justin’s original post before the revision: There is an issue with the logic of the above-mentioned criticism which I think I should point out briefly. Just as one cannot criticize an argument by replacing a premise by something else, one cannot criticize a theorem by replacing a premise by something else. So even if the above criticism of dynamical relevance were valid, the remaining criticism of Justin’s original post wouldn’t apply, as it is based on an assumption that differs from the one we actually make. But let’s ignore that and consider the other interesting points that Justin raises.
3. Provided that there is some regularity between consciousness and physical states/dynamics. It’s tempting to flesh this out formally, but I’ll try to resist that temptation for the sake of readability. 😉
4. Another way to put this is that if an AI system is conscious, then the dynamical evolution of the conscious cases differs from the dynamical evolution of the unconscious cases. Arguing that nothing stands against this possibility doesn't establish that AI consciousness is possible, or that our no-go theorem doesn't apply. It addresses an additional necessary condition for AI consciousness, one that is independent of our result.
5. One might be tempted (and we were 😉) to compare the difference between conscious and unconscious conditions not for systems as a whole, but for systems in specific, fixed states. That is, one could consider the difference in the dynamical evolution of a single state between cases where the system in the state is conscious, and cases where the system in the state is unconscious. That would resolve the above-mentioned worries. Unfortunately, though, that doesn’t work either. That’s because for a single theory of consciousness, a system in a given state is either conscious or unconscious. Therefore, for any such comparison to work, one has to make use of a second theory, and a reference theory is the most reasonable choice.
6. There is a bit of subtlety here which, I fear, goes beyond the scope of a blog post. That subtlety has to do with the fact that dynamical relevance passes downward (Lemma 5 in our paper). In a nutshell: if a theory or model T’ is dynamically reducible to another theory or model T, then if consciousness is dynamically relevant with respect to T’, it is also dynamically relevant with respect to T. Since a programme running on a CPU is always dynamically reducible to the machine code of the CPU (running a programme means running the machine code that the compilation of the programme provides), this applies to the case at hand: if consciousness is dynamically relevant with respect to the simulation, it is dynamically relevant with respect to the machine code—viz. the formal system that the CPU is running. Hence the case considered last indeed reduces to the case considered first.
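Schematically, in my own shorthand (not the paper’s notation), writing T’ ⪯ T for “T’ is dynamically reducible to T” and DR(C, T) for “consciousness is dynamically relevant with respect to T”, the lemma has the form:

```latex
\[
  \bigl(T' \preceq_{\mathrm{dyn}} T\bigr) \;\wedge\; \mathrm{DR}(C, T')
  \;\Longrightarrow\; \mathrm{DR}(C, T)
\]
```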
7. Let me point out that I do *not* think this about Justin’s post. This section contains general and independent observations, and I am very grateful for Justin’s very kind handling of our paper in his post.
It seems that your definition of dynamical relevance is congruent to interactionist dualism. It states that consciousness must be an additional, orthogonal thing that can be added to the base theory and that makes a difference when you add it. Is that fair?
The place where you lose me is here:
"Causal closure of the physical is a metaphysical condition, often put as the requirement that every physical effect has a sufficient physical cause. Here, “physical” designates the metaphysical category of physical things and properties. If a theory of consciousness postulates a new brain function that is part of this metaphysical category, it satisfies causal closure while still being dynamically relevant. Hence, there is no conflict here."
It is only dynamically relevant up until the moment you incorporate that new property into the theory! But this is a weird property to insist a theory of consciousness should have. It's clearer in this later quote:
"consciousness can be physical and also dynamically relevant. The simplest example is a theory of consciousness that postulates that consciousness is a brain function that neuroscience doesn’t know."
Here, the unknownness of the brain function seems to be an essential feature in rendering it compatible with your criterion. If it were known, then we could update our theory of neuroscience to include it, and then consciousness would not be dynamically relevant relative to that revised theory, and so... we should expect that the new theory is wrong too, because consciousness should change something when added to it?
This is something I hear implicitly in a lot of the philosophical arguments around consciousness that seem intuitive to many people but seem off to me. A requirement that only a black box can be conscious, because to have an understanding of how it works is to have an alternate explanation to consciousness. So a computer program cannot be conscious because we know the AI does what it does because its sensors update its state and then the CPU follows the instructions to turn that into its actions (implicitly, ergo it did this for a different reason than because it is conscious), but this is just what having an explanation looks like. (Or, if you've got other ideas of what having an explanation could look like, I'd love to hear some examples of the sort of thing where you fill in the part we don't know with some speculative bullshit that would fit the bill.) What if this sort of computation *is* the sort of thing that gives rise to consciousness? Then the question of dynamical relevance is ill-formed, because you can't ask how the theory changes with vs. without consciousness.
More generally, in any theory of consciousness where consciousness is explained in terms of other, more fundamental properties -- basically any theory of consciousness that isn't dualist -- you can't really ask the question of dynamical relevance as you formulate it, because you can't take consciousness out of the theory and leave it otherwise intact, so there is no comparison to make to see if consciousness does something. If consciousness is explained by other things in the theory, then (theory - consciousness) is not well defined, so dynamical relevance is not well defined in a broad class of theories of consciousness that I don't think we should rule out!