Navigating Carbon Markets from the Climate Rising Podcast

As climate change continues to be the most significant threat our planet has ever faced, we need to keep telling stories about this crisis, sometimes focused on personal experiences, other times focused on the efforts intended to mitigate the issue. In either case, storytelling is one of the most powerful tools available, given its ability to shine a light on what is happening and what we are presently doing about it. The Climate Rising Podcast is a brilliant example of telling stories about climate change solutions.

On the Climate Rising Podcast, business and policy leaders join Harvard Business School faculty to discuss what businesses are doing, can do, and should do to confront climate change. Each episode dives into a particular topic with a subject matter expert in the field. While not presented in a true storytelling fashion, if your personal story involves a technical or scientific topic, there is much to learn about how complex issues can be dissected and presented in a way the general public will understand.

Alexia Kelly, Managing Director of the Carbon Policy and Markets Initiative (CPMI) at the High Tide Foundation, joins host Mike Toffel for the fifth episode in our series on voluntary carbon markets. Alexia has worked for nearly two decades at the intersection of carbon markets, policy, and finance, with roles spanning government, private industry, and nonprofits.

In this episode, Alexia discusses how voluntary carbon markets are evolving, the critical role of policy in shaping carbon finance, and how standards and governance can improve market integrity. She also explores how advances in digital technology, data transparency, and AI-driven monitoring are transforming carbon credit verification and market confidence.

Additionally, she shares her perspective on the integration of voluntary and compliance markets, including recent developments in Article 6 of the Paris Agreement. Alexia also offers career advice for those looking to enter the field and shares resources for staying informed on carbon markets and climate finance.

Okay, so there’s a lot of technical information here, along with acronyms, policies, and agencies. But note how Mike steers the conversation — at times diving deeper, while in other cases changing direction or seeking clarification.

You can use this technique by asking a trusted friend to interview you. They do need to have a basic understanding of your topic, of course, but having another person ask questions allows you to think about your story differently. Give this episode a listen and peruse the transcript to uncover new ways that complex stories can be told. The world needs to hear you, but also understand your message.

Transcript

Note: The following was AI generated, and may not perfectly match the interview.

Mike Toffel:

Alexia, thank you so much for joining us here on Climate Rising.

Alexia Kelly:

Thanks for having me, Mike.

Mike Toffel:

So, you have a really interesting biography. You’ve worked in the nonprofit sector. You’ve worked in the private sector. You’ve worked in the government sector. Tell us a little bit about your background and how you ended up where you are today.

Alexia Kelly:

Yeah, sure. Great to be here and have this conversation today. I've had the opportunity to work on carbon markets really since some of the earliest days. So, I started my career in 2006 working on what was actually the first regulation of greenhouse gases in the United States, at a nonprofit in Oregon called the Climate Trust, which was established under the 1997 carbon dioxide regulation. And so that ran basically parallel to the development of the Kyoto Protocol under the UNFCCC. So, over the course of nearly two decades working in and around carbon markets and carbon pricing, I've worked in the nonprofit sector, including with the World Resources Institute on Waxman-Markey when we were trying to get federal legislation passed in the 2009, 2010 era. I was recruited to join the State Department team in 2010 at the beginning of the Obama administration and had the opportunity to spend a number of years there working both on international development work through our Enhancing Capacity for Low Emissions Development Strategies program, as well as serving as our lead negotiator for emissions trading under the UNFCCC. That was pre Article 6, but we'll talk a little more about Article 6 later, which are the emissions trading provisions of the Paris Agreement.

And then after that, I wanted to go into the private sector. So, I ended up working in a family office trying to understand why we had this big finance gap between all of the investment opportunities we were seeing in the climate and clean energy space, particularly in emerging markets, and why that money wasn't moving. So I did a bunch of work on understanding capital markets and really tried to understand what motivates the private sector to take action, which led me to joining Netflix in 2019 to help establish its inaugural sustainability program. They had no sustainability program prior to that, not even a greenhouse gas footprint. And so, we really built the program all the way from the ground up. I served as Director of Net Zero and Nature there for about three years before joining the High Tide Foundation just a couple of years ago to establish what is now the Carbon Policy and Markets Initiative.

Mike Toffel:

Great. So, it's such an interesting and varied history. Let's talk about the High Tide Foundation. Why was it formed? It's a family office. It's a funder. And it has a few different sorts of lines of interest. So, if you could tell us about each of them, then we'll dive more into the voluntary carbon markets piece of it.

Alexia Kelly:

Yeah, absolutely. So, the High Tide Foundation is a private family office philanthropy focused exclusively on addressing the climate crisis. There are two primary areas of focus for the foundation. We do a tremendous amount of work on methane through our funding of Carbon Mapper, the Methane Hub, and other major initiatives to help really address and tackle methane pollution, which of course is one of the fastest emergency brakes we have to pull as we're looking at addressing the climate crisis.

I lead our work focused on carbon markets and carbon pricing. And so that's through the Carbon Policy and Markets Initiative, which we established just two years ago, to really help bridge what we observed was an increasing divide between where civil society and the science want companies to be and where companies operating in the real economy actually are.

And so, I joke that now I’m just a professional board member and we spend lots and lots of time really trying to build bridges and establish high integrity rules of the road that accelerate action and ambition while also recognizing that there are very real constraints that folks operating in the global economy are facing as we seek to advance decarbonization solutions.

Mike Toffel:

Great. Now, we’ve spoken with Mark Kenber at VCMI and Amy Merrill at ICVCM, some of the organizations that you directly work with. Can you tell us a little bit about the bridges between the High Tide Foundation and your role and those organizations as well as others that you’re also helping to coordinate and sync up?

Alexia Kelly:

Absolutely. Yeah. So, I serve as a board alternate. The High Tide Foundation was a major funder of the ICVCM, the Integrity Council for Voluntary Carbon Markets, and of its predecessor, which was called the Taskforce on Scaling Voluntary Carbon Markets. I think you'll cover that when you talk to Amy, but really it was about coming in and having real capital markets actors looking for the first time at this artisanal baby carbon market that's been out in operation for a number of years and really thinking, okay, what's it going to take to grow that market?

So, the High Tide Foundation has been a primary funder of that. We also invest significant in-kind resources there. I serve on the standard oversight committee of the ICVCM, as well as serving as the chair of our continuous improvement work programs, which I can talk a little bit more about later.

I also sit on the expert advisory group of the Voluntary Carbon Markets Integrity Initiative and on the US Technical Advisory Group of ISO, the International Organization for Standardization. So really spending a lot of time helping to connect dots and accelerate connective tissue among and between all of the different standard-setting bodies that are out there that are now really deciding what constitutes credible, voluntary corporate climate action and how we do the accounting for those actions, so that we know what's working and what's not working and really have a basis upon which to compare level of effort and engagement across this set of issues.

Mike Toffel:

So, as I understand it, all of these organizations basically are trying to lift the tide of standards so that the critiques that have befallen much of the voluntary carbon markets are addressed by, again, raising the bar, setting standards, saying you have to meet at least some level of integrity in order to even be in the voluntary carbon market. Because to the extent that there are these laggards, it pulls down the whole industry. Is that the right way to think about it?

Alexia Kelly:

Absolutely. I mean, you need that strong floor. And actually, when I was at the Climate Trust in 2008, we established something called the Offset Quality Initiative because there were a lot of concerns over whether or not we were really going to be able to set that bar for offsets, because this endeavor of pricing and measuring and quantifying the impact of actions that we're taking to mitigate climate change is so challenging.

And so, we’ve really needed to agree on what a rule set is for how you do that accounting and how you do that measurement, because it didn’t exist. Like we’ve been literally making all of this up as we go for the last two and a half decades. And we’ve learned a lot. But I think it’s important to remember that this is really the first time we’ve ever tried to do this. And certainly, when we were writing the rules, the first time, you know, 15, 20 years ago,

We lacked a lot of the technologies, science, and data that we have available to us today, which means that you sort of have to set rules with what you have, right? As a policymaker, that’s what you do. You take the best available information, and you set the policy in the best, most rigorous way you think possible.

And so, we've seen that kind of continue to evolve in the carbon markets over the last 20 years, and we've also gotten a lot better at doing this. And I think that doesn't get enough credit and attention. We actually have a fair degree of uniformity and agreement now on what the core rules should be for what constitutes a high-quality carbon credit.

And we have a much better sense of what needs to happen between now and the next five years to really get this market to scale because it’s still not operating on the scale, right? It represented about $750 million in value last year. Like it’s a baby market still relative to almost any other capital market that’s out there.

Mike Toffel:

So, can you say that you’re seeing some progress in these areas and getting some consensus? Can you describe an example or two of what that looks like? So, what would have been allowed in the past that’s now not going to meet the minimum bars that these organizations are lobbying for?

Alexia Kelly:

Yeah, it's a really good question because what we've seen over the last couple of years, particularly through the work of the Integrity Council for Voluntary Carbon Markets, is an emerging consensus on what global threshold quality standards look like. So the core carbon principles that we've laid out, thinking about things like additionality, things like permanence, those key tenets of what constitutes a credible way to measure the impact of the things we're doing, are pretty well established now. I think we've explored that from just about every avenue and there's broad consensus on how you do that. The assessment framework that the ICVCM published a couple of years ago, which is the rule book by which we measure how these core carbon principles are implemented by the programs, also represents a really important step forward in terms of the consensus on what right looks like.

But one of the things that struck me the most as we've been going through, so what we do at the ICVCM is we go through methodology by methodology, and we look at all of the methodologies that the programs are using for a given type of project. So maybe a landfill gas destruction project, for example, or renewable energy credits or REDD+ or improved forest management.

So, there’s different methodologies that have emerged to measure the impact of all of those things. And it often comes down to one or two key decisions or like emissions factors in each of those methodologies that determine how much credit you get for that particular project and determine whether we’re doing a good job of assessing the net impact of that particular project.

That set of work has been particularly challenging. So, we've assessed about 37% of the market at the ICVCM right now, and we've only approved 3% of the core methodologies. And one of the reasons that's the case is because of renewable energy. And renewable energy is kind of a fascinating example of how this market has evolved, because when the first renewable energy projects were getting going, 25 or 30 years ago, they were substantially more expensive than their fossil fuel incumbent alternatives, orders of magnitude more expensive. And over the last 20 years, we've seen just a precipitous decline in the cost of renewables, which is a wonderful thing. That's exactly what all the policies we've put in place were meant to make happen. But it calls into question whether a lot of those projects really need the carbon finance now to help close that financial viability gap. And yeah, sorry.

Mike Toffel:

Right. Let me just interject a minute, to make sure people are following this and make sure I'm following this. So, one of the criteria to have a carbon credit issued is it has to pass these many tests, but one of them is this additionality test, which essentially asks: does the project's success, in terms of being profitable, hinge on the climate finance coming from the carbon credits? Because if it would be profitable on its own, well then you don't really need the carbon credits. Only if the carbon credit pushes it over that threshold can a carbon credit be issued. That's the game: saying I'm going to invest in one carbon credit in order to create emissions reductions or removals that wouldn't otherwise have happened. So, this is all about the counterfactual, just to catch folks up.

Alexia Kelly:

Exactly. Yeah. Thanks so much for that explanation. And that's exactly right. And so we have about eight different ways in which we test for what's called additionality, ranging from a pure financial additionality test, which asks whether the carbon credits are helping you meet your internal rate of return threshold, for example, all the way through to what we call positive lists of technologies, where we say, okay, we know that this technology is still more expensive and not common practice. And so therefore, up until a certain rate of penetration, all of that's going to be considered additional and you don't even have to think about it. If you meet this technology test, then you're good to go.

Testing for additionality is extremely difficult, right? Because what we’re trying to do is develop a universally applicable sort of set of tests that can accommodate the almost infinite range of individual project circumstances, while also setting a pretty good threshold bar that’s enabling us to separate out the stuff that really didn’t need the carbon finance from the stuff that did. And, you know, as in most other fields, it’s not going to be perfect every time.

That's just not the way the world works. And that's OK. And I think that's one of the things that the carbon market has really suffered from, this expectation that it's perfect all the time. And it's not going to be. And we don't need it to be. What we need it to be is effective. And we do need it to be actually delivering the environmental impact that we need in order to help us solve this problem. Because for me, if I'm going to spend $1 on climate change mitigation, I want to make sure that that dollar is doing something that wasn't going to happen anyways, that has a lasting atmospheric impact.

And that's really helping us move the needle on the climate fight, because otherwise I'm wasting my money. And so that is what the carbon markets enable us to do. I always joke that, you know, the carbon markets aren't the worst, they just went first. They were the first time we ever tried to do this huge, complicated thing. And we are getting much, much better at it than we used to be. And I'm feeling particularly optimistic that, you know, moving forward, we're really going to be able to do that granular assessment much more robustly and rigorously, in no small part, as we talked about yesterday, just around the advent of new technologies and data and sources of information that we simply didn't have available the first time around.
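A quick aside to make the "pure financial additionality" idea Alexia describes a bit more concrete: below is a minimal sketch in Python. The cash flows, carbon revenue, and hurdle rate are invented for illustration, and real crediting programs apply far more scrutiny than a single internal-rate-of-return comparison; the point is simply the counterfactual logic, that the project misses its return threshold without credit revenue but clears it once the credits are included.

# Illustrative sketch only: a simplified financial additionality check.
# All figures below are hypothetical and not drawn from any real project.

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return found by bisection on net present value."""
    def npv(rate):
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))
    for _ in range(200):
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid      # NPV still positive: the IRR is higher than mid
        else:
            hi = mid      # NPV negative: the IRR is lower than mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

def financially_additional(base_cash_flows, carbon_revenue, hurdle_rate):
    """Additional if the project misses the hurdle rate without carbon revenue
    but clears it once credit sales are included."""
    with_carbon = [cf + cr for cf, cr in zip(base_cash_flows, carbon_revenue)]
    return irr(base_cash_flows) < hurdle_rate <= irr(with_carbon)

# Hypothetical project: $1M upfront, $120k/yr for 10 years, plus $40k/yr of credit sales.
base = [-1_000_000] + [120_000] * 10
carbon = [0] + [40_000] * 10
print(financially_additional(base, carbon, hurdle_rate=0.08))   # True in this made-up case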

Mike Toffel:

Great. And I want to dive into that in just a minute. Before I do that, one of the organizations that you mentioned you're working with is ISO, the International Organization for Standardization, which is Swiss-based with, in a sense, national offices or national outreach in each of the countries, including the US, under other names like ANSI and so on. And ISO has lots and lots of standards. I've done research in this area for some time. They have ISO 14001, which is an environmental management standard, the ISO 9001 quality management standard, and a more recent occupational health and safety standard, ISO 45001. And so far they don't have a standard, as I understand it, on voluntary carbon markets and what constitutes meeting an ISO bar. It sounds like they're working on that. It's taken, I don't know, you mentioned 20 years for us to get there and we're not there yet. Tell us a little bit about what ISO's angle is on this and why it has taken so long.

Alexia Kelly:

Yeah, there are two things happening at ISO that are related to both the supply and the quality of the credits and then the demand. And I'll touch briefly on both of those. The supply side is actually interesting because ISO has had a standard on how you do project-based greenhouse gas accounting for a very long time. We finished it in, I think, 2006 maybe, ISO 14064-2, for those of you who want to go look it up. And what that does is basically provide very high-level guidance around how you do this quantification. So, you need to think about additionality. You need to think about permanence. But it's really at a principles level.

So, ISO tends to deliver standards that are at a relatively high level. And then you layer underneath it in more detail at the other standard level. So, as we’ve noted, what’s happened in this space in particular is that because it’s not regulated and because we haven’t been able to come up with a uniform global system to address this problem, we’ve seen this kind of thousand-flowers-blooming approach in the voluntary carbon market where a bunch of different nonprofits and regulatory regimes around the globe have sort of popped up and said, okay, this is important. We need to figure out how to do this. And somebody should really be bringing some transparency and oversight to this space. We’re going to do that on a nonprofit basis.

And so, you hear about the big four, Verra, ACR, and other carbon crediting standards that are active. Those are nonprofit organizations. They are often built on top of the ISO 14064-2 rules. So, what's happened is that's become the backbone. And then you have other organizations that have come in and kind of put the muscles and the flesh on the whole system so that you have a system for actually getting all the way through to market issuance and crediting. So, ISO is in the process of reviewing some of those core standards around what the principles of defining quality are.

On the other hand, and I think of interest to folks, there's been a lot of debate around what defines a credible net zero claim for companies that are taking voluntary action and what should be allowed to count. How do you set your target? How do you account for progress towards your target? What does that all mean? That's the other half of the work that we do at CPMI quite actively. And that's really part of this debate.

If we want a functional market, you need high quality supply. And you need high quality demand. We have to have both of those things. A market with no demand is not a market. And there are real ideological divisions in civil society and in the nonprofit space in particular around whether markets can ever be an appropriate way to solve environmental challenges. I think, over the course of my career, we’ve seen a number of instances where they’ve been harnessed very effectively to do that.

But there have also been high profile failures, right? And if you read the New Yorker article about some of the stuff that's happened in the voluntary carbon market, the lack of oversight and regulation has been a real problem. And there are opportunities for carbon cowboys to come in and take advantage of the system. And so that's really eroded trust and confidence, I think, particularly among young people, who are sort of saying, look, you guys have had 20 years to implement these market systems.

They aren’t working. We’re still in a climate crisis. And it’s a real honest and important and legitimate conversation to be having about where do we deploy market-based solutions effectively and where do we just need regulation? I think if you ask most of us who’ve worked in this space a long time, we’d prefer to have regulation hands down every time. But we’ve also really struggled to get regulation in place. And so now we’re in this world where voluntary action plays a really important role.

So ISO has actually established a process now to write rules around what a net zero standard might look like and what type of information you would need to disclose if you wanted to make a claim and have it, what we call, assured, that is, have a third party come in and look at what you're saying to make sure it's legitimate and meets the rules of an existing standard that's out there in the world. So that conversation is happening right now under this big international umbrella, in ISO's incredibly complicated and kind of Byzantine architecture, which frankly is worse than the UN. It's kind of amazing. But those conversations are moving forward apace, and the hope is that they will have a kind of initial agreement on what a credible net zero standard looks like that's internationally applicable in the next 18 months or so.

Mike Toffel:

Got it. I want to circle back to something you said earlier, where you said these markets have been evolving, and they don't need to be perfect every time. I wonder if, in fact, and you mentioned the New Yorker article, there have been a number of articles about instances when things haven't worked, when the measurement has been shown to be overestimating, for example, how much carbon is actually being sequestered, or when reversals occur. You prevent a stand of trees that had been intended to be felled from, in fact, being felled, and you've addressed leakage, so no one else's forest is falling instead, but then there's a hurricane that knocks them down anyway, or a forest fire.

So those emissions aren't sort of secured for the long run, the permanence concept. So, my sense is that they don't have to be perfect, but there have to be consequences when those imperfections arise, because, just like when you buy a product or a service, you have an expectation that it's going to work, and if it doesn't work, you have some recourse through a warranty. And I do know that there are some mechanisms in place called buffer pools that you can tap into in a reversal case, or there's even insurance that can arise. My sense is that those mechanisms are not widely understood, and they're often not in the...

And maybe they don’t always exist. I don’t actually know. What’s your sense about that? When these failures do occur, do you feel like the systems are in place actually to nonetheless sort of compensate for the losses? Or are there still cases where those losses occur and then you’re just at a loss?

Alexia Kelly:

Yeah, this is one of my favorite questions and one of the most important issues facing us. And I'll take baselines and permanence separately because they're two distinct but related issues. The permanence one is interesting because, as you know, we actually have quite extensive systems in place to manage reversals and address the risk of reversal and storage. And they are not well understood. I think they don't get talked about enough and people really don't understand how they work.

A fundamental principle of almost all crediting systems is this notion of conservativeness. So almost always, you're going to take the most conservative number across the board. And so, what we find is that projects are often already pretty significantly discounted even before they get to issuance and the credits are out there. So, there's this whole additional layer of protection that the system has built in that the public really never even sees, because it all happens before the credits are issued. So say I have a forestry project: I'm going to be applying my methodology, and I'm going to be bringing in my experts to collect the data. They do it all through statistically validated sampling systems, right? So, they'll go out and they'll do random plots and they'll send people out into the middle of these woods and they have to go out and do direct measurements.

This is being helped and accelerated a bunch by remote sensing and AI, but we still do on the ground verification. So, it will often take a week to cover an entire project, and they send scientists out to go do direct measurement of the trees and then use really complicated sort of stratification systems in order to make sure that we’re getting a statistically valid representation of the carbon stored in that particular project. That then gets run through this whole process and you have a series of discounts that are applied before you even get to the credit issuance. So, there’s conservativeness built into the system there. Then after issuance, you have long-term monitoring requirements where farmer foresters will sign up and say, look, I’m signing up legally to make sure that this carbon stays sequestered in this particular project area for X number of years. And it varies. You have to figure it out, and the challenge here is that what we’re trying to do is design a system that delivers real atmospheric benefit.

And there’s been a raging debate in the academic and scientific community for a very long time about, we like to say as long as it matters, how long does carbon need to be stored in order for it to have a real meaningful atmospheric impact? And the answer actually turns out to be not very long, right? So at this point, particularly given where we are with CO2 loading in the atmosphere, anything we can do to store and defer the release of carbon has an atmospheric benefit. How much atmospheric benefit, right, is the thing that’s in question? And there’s the additional complexity of if I’m using that credit to compensate for a fossil-based emission somewhere else, then you need to think about the equivalent impact.

So, there are two parts to permanence that you have to think about as we're thinking about this system design. So, it gets very complicated. The way the standards have dealt with this is that they, in addition to the discounting and conservative calculations, also have what are called buffer pools. So, post issuance, you are actually required to set aside a portion of your total credits in case there is a reversal, and it varies across standards, anywhere from 5% for the jurisdictional-level programs up to as much as 30% or 40%. If a reversal occurs, then there's an extensive process that they go through to actually directly measure the amount of carbon that was released from the project. And then they will go back and cancel an equivalent number of credits in the buffer reserve to make sure that the atmosphere is made whole.

And you're not allowed to issue more credits until you've paid the buffer reserve back. So, it acts basically like an insurance mechanism. There's lots of debate about how big those buffer reserves should be, what types of credits should be in them, what the rules around administering them are, and whether we should be looking at a two-to-one cancellation. There are a lot of pieces to unpack as we design that system, but I believe that having nature in these markets is not optional. We must use the power of nature, which is our original carbon removal machine. Trees and forests and wetlands and soils play just an essential role in regulating the climate and in helping us fight climate change cost effectively and efficiently. And so, we're going to need to take a system-level approach to addressing this challenge. And I believe we can do that.

I actually think that we have both science and the system capacity to set up really robust safety nets so that if reversals do occur, we can still make the atmosphere whole because that of course is the most important objective of any of these systems.
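To illustrate the buffer pool mechanics Alexia walks through above, here is a minimal, hypothetical sketch in Python. The 20% buffer rate and the credit volumes are made-up numbers (actual set-aside rates vary by standard, roughly the 5% to 30-40% range she mentions), and real registries layer many more rules on top. The sketch just shows the core accounting: a share of issued credits is held back, reversals are cancelled against that reserve, and new issuance is blocked until any deficit is repaid.

# Illustrative sketch only: a simplified buffer-pool ledger, not any registry's actual rules.

class BufferPoolLedger:
    def __init__(self, buffer_rate: float):
        self.buffer_rate = buffer_rate      # hypothetical set-aside share, e.g. 0.05 to 0.40
        self.issued = 0.0                   # credits delivered to the project owner (tCO2e)
        self.buffer = 0.0                   # credits held back as insurance (tCO2e)
        self.buffer_deficit = 0.0           # uncovered reversals that block new issuance

    def issue(self, verified_tons: float) -> float:
        """Issue credits for verified sequestration, setting aside the buffer share."""
        if self.buffer_deficit > 0:
            raise RuntimeError("Buffer must be made whole before issuing new credits")
        set_aside = verified_tons * self.buffer_rate
        self.buffer += set_aside
        self.issued += verified_tons - set_aside
        return verified_tons - set_aside

    def reversal(self, tons_released: float) -> None:
        """Cancel buffer credits to cover a measured reversal (fire, harvest, etc.)."""
        covered = min(tons_released, self.buffer)
        self.buffer -= covered
        self.buffer_deficit += tons_released - covered

# Example with invented numbers: a forestry project and a 20% buffer rate.
ledger = BufferPoolLedger(buffer_rate=0.20)
sellable = ledger.issue(100_000)    # 80,000 credits to the project, 20,000 to the buffer
ledger.reversal(12_000)             # a fire releases 12,000 tCO2e; the buffer absorbs it
print(sellable, ledger.buffer)      # 80000.0 8000.0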

Mike Toffel:

Is there transparency around the levels of these buffer pools, like an annual report that would declare these as assets and the usage of them?

Alexia Kelly:

So, the treatment of assets is a whole separate ball of wax, but generally, no. And that was actually one of the things that the ICVCM really came in and did: establish minimum transparency requirements across the board. I actually co-chair our continuous improvement work program on permanence. And one of the things that we're looking at is doing buffer reserve stress testing. So, we're working with the standards to go in and say, OK, guys, let's look at how you do this. Because I have to say, you know, the standards get a lot of criticism for this, but the fact of the matter is nobody else cared or was paying attention when we were setting this stuff up the first time. And so, they've had to be judge, jury and executioner, you know, in terms of managing and building this infrastructure to make sure that we can deliver long-term atmospheric benefit and assurance.

And so, there are a lot of really exciting conversations happening right now about what a global permanence approach would look like, and how we could think about these types of systems in a more comprehensive way, so that you don't have these small nonprofits trying to stand up these systems and manage this big problem by themselves, but actually think about it more holistically from a market-wide perspective. So, I'm expecting that there's going to be a lot of really exciting and interesting progress on this set of topics in the next couple of years. But we still have a way to go. The system is not where I think any of us would like it to be, if you asked the people who work in the market today.

And this, I think, is just a fundamental question that we have to ask ourselves: is it better to have something that is working well, maybe not perfectly, but actually doing things in the real world? Or, which we can easily do, should we wait until we have the perfect design and we've got it all figured out, and we talk to each other about it for another 20 years, and then maybe we set something up once we're sure that we have absolutely every T crossed and every I dotted? I'm very much, especially after working in the private sector for so long, in the camp of: it is much better to be doing things, even if they aren't delivering 100% of what we need them to be doing, because we're in a climate emergency. And I want to throw water on the fire, not debate the color of the bucket we're throwing the water from.

Mike Toffel:

Yeah, and even if every drop of that water doesn’t reach the fire, the majority of it is doing so, and there’s these mechanisms to reroute those drops, it sounds like.

Alexia Kelly:

And I do just have to say also, the thing about the markets and the reason why I’m still like here and doing this work is because A, I don’t think we get to where we need to be in terms of transforming capitalism without putting a price on carbon. And that’s what carbon markets do. And B, I have never, ever, ever seen a better instrument, not even in all my years working with USAID and managing hundreds of millions of international assistance dollars.

I’ve never seen a better system for getting money from where it is today to where it needs to be in the global South and to frontline communities and ecosystems in particular. It is one of the best, most transparent, most rigorous systems we have for getting that money deployed into the things that really, really matter today for helping us address climate change and in addressing, you know, historic environmental inequity and providing support to frontline communities.

And I'll just give one super-fast anecdote on that. When I was at Netflix, we were doing due diligence on a project that we were buying for our portfolio. It was this amazing dryland forest protection project in Kenya. And there was one paragraph, like one line, in their kind of spec sheet that caught my eye, saying it's the headwaters of a major river. And I asked about it in the interview, and they said, yeah, that actually is also the headwaters of the drinking water source for a city of a million people downriver.

So, with these projects, that wasn't quantified. It was literally just a line in the report. But the fact that this forest was providing drinking water for a million people in an area that is severely affected by drought is just an example of what we call core benefits, not co-benefits, of these projects, and of the climate resilience and adaptation benefits that they deliver along with biodiversity and economic justice and financial transfers. So, you know, we're big believers in the power of these markets because we've seen them work and we've seen firsthand the impact and benefits that they deliver to people.

Mike Toffel:

Not to mention the whole point of carbon pricing is to try and reduce carbon in the most efficient way possible, system-wide. That's why economists are so in favor of this as a regulatory tool, and in the absence of regulations, the voluntary market borrows that same logic. That's the goal, which I think often gets overlooked.

Alexia Kelly:

It very much does. And I think in this debate, too, it's getting a little lost. In the net zero conversation, there's a big focus on how we drive supply chain decarbonization, right? We want companies that have a net zero target to reduce their emissions, which is totally fair. That's true. We do want them to reduce their emissions. We also want their money to fund all of the other mitigation that is often, actually almost always, substantially lower cost than what it's going to cost them to reduce their emissions internally, particularly in the United States.

And so, balancing that in the net zero rules is really important, and one of the big debates we're having right now is how we do that so that we can keep companies focused on the hard work of decarbonizing their actual operations, right? Getting electric vehicles in, deploying energy efficiency, buying renewable energy, doing all of that really important work, while also mobilizing the money we need for nature and all of the other pieces, carbon removal and building out that whole ecosystem and set of technologies. It's all going to take money, and we need to be doing it all simultaneously and as quickly as possible.

Mike Toffel:

So let me pivot a bit. Now, you mentioned these markets have been around in one form or another for quite some time. And more recently, we've seen the injection of all sorts of interesting data revolution pieces here, whether it's the satellites collecting CO2 and methane data, including, I think, MethaneSAT, which you guys were involved with. EDF has just recently launched a satellite, which we did an episode on, to monitor methane emissions.

There's all this new data; there's machine learning and AI technologies out there. Digitization and climate change seem to be two of the next-gen, disrupt-everything forces. How do you view this revolution affecting the voluntary carbon market?

Alexia Kelly:

Yeah, great question. I'm so excited about what remote sensing, AI, machine learning, and the internet of things are going to enable us to do to build really excellent monitoring, reporting, and verification systems. We call them MRV systems. And that's the whole system you build to make sure that the thing you've done is actually doing what it was supposed to do. So High Tide has invested heavily in remote sensing capability and in building out that ecosystem of both technologies and support nonprofits that are doing this work. We were an early investor and were instrumental in getting Carbon Mapper up and out, which is a complement to EDF's MethaneSAT. Carbon Mapper does very precise measurements of super-emitting sources of methane around the globe.

And these emission sources were going undetected because we didn't have granular enough data and detection technology. So, we're very excited about what that transparency enables. And having philanthropy come in and fund this work means that this data is available to the public. There's been a lot of work that's happened in the remote sensing and satellite space, and because it's very expensive to do, right, these satellites cost tens of millions of dollars each, a lot of it has been built behind proprietary paywalls.

And so having philanthropic dollars coming in means that we get to build these things and then make them publicly available, which then, of course, unleashes a whole new world of accountability and transparency. And we're seeing that really play out in the carbon markets in big ways as well. So, what happened a few years ago, as some of these new satellite technologies were coming online, is that folks were able to look at and see, at much lower cost, what was actually going on in a lot of these project areas. And in some instances, they found that we were underestimating the amount of carbon that was being stored. Those stories don't tend to make headlines. What does make the headlines are the instances, and these also existed, where we overestimated the amount of deforestation that was happening.

And I think it's a wonderful thing that we have all of this new data and information, but it also means that it's caused sort of a reset in the market in some ways, because when we wrote these methodologies the first time, we didn't have any of that information available. And so, it's caused, I think, a revolution in the way in which we are thinking about and making measurements for carbon projects, in overall incredibly exciting ways. I feel so much better knowing that we have this data and information, because it enables us to say with much more precision and accuracy, this is working and this isn't working. And that's essential to building out this entire system.

And, you know, I'll also mention the internet of things. So, often, particularly in developing countries, you'll have a bunch of cook stoves, right? You'll have thousands of cook stoves distributed over a bunch of countries. We know there are real environmental and atmospheric benefits to those projects. But precisely measuring exactly the amount of wood that's going into the cook stove on a daily basis, in places that often don't even have electricity, is extremely challenging, right? So, we rely on statistics, which we should continue to do. We rely on sampling and survey technologies. But we can also now put sensors on that are just automatically pinging data all the time, so that we can measure much more precisely and consistently the atmospheric impact of these things and what's really happening on the ground.

So it's hard to overstate the importance of these technologies for enabling us to build out that really comprehensive MRV system, which is also going to enable us to build better permanence systems overall, inform how much should be going into buffer reserves, and do a much better job of setting baselines, which, as you mentioned earlier, is that counterfactual projection into the future of what would have happened in the absence of the project. And we can get much better historical data at much lower cost now. So, we're really excited to see this set of technologies come to fruition, and it's going to have huge impacts on the market moving forward.
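As a rough illustration of the difference between the sampling-based measurement and the sensor-based, internet-of-things measurement Alexia describes, here is a small Python sketch. All of the numbers are invented: it simulates daily fuelwood use across a hypothetical fleet of cookstoves, then compares a survey-style estimate built from a random sample (with a rough confidence interval) against the direct tally that continuous sensor reporting would allow.

# Illustrative sketch only: survey-style sampling vs. a full sensor tally, with invented data.
import random
import statistics

random.seed(42)

# Pretend "ground truth": daily fuelwood use (kg) for 1,000 deployed cookstoves.
true_use = [max(0.0, random.gauss(mu=3.0, sigma=1.2)) for _ in range(1_000)]

# Survey approach: measure a random sample and extrapolate, with a rough 95% interval.
sample = random.sample(true_use, k=50)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / (len(sample) ** 0.5)
survey_total = mean * len(true_use)
ci = 1.96 * sem * len(true_use)

# Sensor approach: every stove reports, so the fleet total is a direct sum.
sensor_total = sum(true_use)

print(f"Survey estimate: {survey_total:,.0f} kg/day +/- {ci:,.0f}")
print(f"Sensor tally:    {sensor_total:,.0f} kg/day")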

Mike Toffel:

Got it. Let’s just dive in for just a second on the cook stove example, because I want to provide a little bit of background or seek a little background. So typically, cook stoves projects involve someone financing the deployment of cook stoves to enable folks, in some cases, to, for example, use less wood or use wood more efficiently in cooking. And so, the idea is it reduces deforestation. It still allows people to cook their meals.

And the reduction in deforestation is the carbon credit claim, and the fact that these cook stoves wouldn't be available without carbon financing, that's the additionality claim. And the issue, as I understand it, is the measurement question, which, for example, is: well, if we give out 1,000 cook stoves, are all 1,000 of those being used every day? Are some of them maybe breaking, and therefore only 900 are being used in year two? And then maybe some people use two of them.

And so, there’s actually, they’re getting twice as much cooked. So that’s good for the family. But they’re still using the same amount of forest, for example. So, there’s all these different question marks in the deployment. Is that the type of thing? And you’re saying with technology, we can perhaps identify the extent to which these cook stoves are being deployed to take away some of the assumptions that one would otherwise have to do or the manual census every year for X years to sort of check in on this. Is that what we’re talking about?

Alexia Kelly:

That’s right. Yeah. I think the cook stove is actually an interesting example because as you noted, a lot of these impact numbers are tied to the relative deforestation rates around the project area where people are collecting the fuel wood from.

And there's been a lot of debate about what the right number is and how you measure it. It's obviously enormously variable depending on cultural, geographic, you know, a wide range of conditions that are in place all over the place. And so, there's a new model that's being developed using remote sensing data in order to do a better job of calculating the net impact of cook stove projects on surrounding areas. And so, it's just a good example of how new technologies are enabling us to do better and more accurate measurement. But it also is causing corrections in the market, because the data we had before wasn't as good as the data we have now. And so, most of the standards are now in the process of rewriting the rules to reflect and accommodate this new data that we have. Does that mean that everything that came before is garbage? No, it doesn't. But it does mean that we are now even better at more accurately measuring what's happening on the ground.

Mike Toffel:

Great. OK, so I want to pivot again. You had mentioned earlier in your intro, of course, that you'd worked for the State Department, and that you'd sort of helped negotiate the Paris Agreement. And so, I want to take advantage of that expertise that you bring to talk a little bit about the compliance markets, which are working in parallel to these voluntary carbon markets, and just to sort of explore the use of carbon credits in that market, which I think dates all the way back to the Kyoto Protocol in the late 90s and the CDM, or Clean Development Mechanism, which set up some rules so that countries that didn't themselves have to reduce could nonetheless reduce and sell the benefits of that to the countries that had to reduce, which sounds a lot like a carbon credit, but maybe at the more national level, with the endorsement and backing of the UN, which again is different from the voluntary carbon market.

So, give us a brief history of carbon credits in the compliance market. And then all the way to today, where most recently in Baku, the most recent COP, as I understand it, there was finally some real precision around, you mentioned Article 6 of the Paris Agreement, which sets up some rules again for going above and beyond regulatory requirements using mechanisms that seem a lot like the carbon credits we’ve been talking about.

Alexia Kelly:

Yeah, absolutely. And it's important to note that the unit of measure in carbon markets, regardless of whether they're voluntary or compliance systems, is the same. It's one ton of carbon dioxide reduced, removed, or avoided, typically measured in metric tons. And so, the difference comes in the methodologies, typically, and the issuing bodies for the credits. And so, one of the disadvantages of the fact that we haven't had a global system in place so far is that there is no kind of one decider on what quality looks like in the market. So, we have this patchwork of both regulatory and voluntary standard-setting bodies that have emerged over the course of the last 20 years to say, this is how you measure a ton of carbon.

These are the rules. Under the UNFCCC, the United Nations Framework Convention on Climate Change, which is of course where we negotiate international agreements related to climate change, the UN has been working for almost 30 years on trying to broker an international global deal to address the climate crisis. I joined the UN process in 2010, at the beginning of the Obama administration, and we had just come out of Copenhagen, where the US came and said, we're back.

But the system we built under Kyoto that we helped architect, we didn’t really like so much anymore because it differentiated between developed and developing countries. And in order to get a deal through Congress, the US felt that it needed to, we needed to be on equal footing with China and India and some of our economic competitors or we were never going to get a deal done.

Mike Toffel:

Right, because those were categorized at the time as developing countries which didn’t have obligations unlike, for example, Europe and the US.

Alexia Kelly:

Correct. So, under Kyoto, global North countries, the US, the EU, Australia, Canada had emission reduction obligations that were quantified, and the global South did not. Which look, there are good reasons for that. The global North is responsible for most of historic emissions, and it’s only been until the last decade or so that the emissions profiles have flipped pretty dramatically and China and India, South Africa have all emerged as large, globally important, and impactful emitters.

So, we came out of Copenhagen, and after Copenhagen we basically said, we need a new deal. We're going to have to renegotiate everything, because we need developing countries. We know your emissions are increasing, and we need developing countries at the table and on equal footing with the US. And so, it took us six years, but that's what the Paris Agreement is.

It is a normative system that enables us to bring all countries to the table. In order to get all countries to the table though, what it basically meant is that the countries demanded that they were able to write their own rules by and large for what they wanted to do domestically. And so, we have these things called nationally determined contributions, which is where every country kind of sits down and says, this is what I’m willing to contribute.

These are the targets I’m setting. These are the coverage requirements. And we have reporting and transparency about that at the UN. We also have a court of public opinion where you come and you bring your report, you have this thing called a biennial report, you bring it to the UN, you stand up in front of the UN, you report on your progress and you defend it and countries ask you questions about your report.

What are you saying? What are you doing? How is this policy working? What happened here? Why did you do that? But that is now the kind of international architecture and system we have. Emissions trading has always been a contentious part of the UN system as well. Historically, under Kyoto, it was a one-way flow. There was the Clean Development Mechanism and a variety of other instruments, but mostly the EU, which was really the only one that had a significant carbon price and obligation, bought from developing countries, who generated credits and sold them through a UN-administered mechanism called the Clean Development Mechanism.

Because we are now in a world where all countries have emission reduction obligations, accounting gets a little more complicated, and so do the flows of credits and the potential flows of credits. So, there are three paragraphs under Article 6, which are the emissions trading provisions of the Paris Agreement. Article 6.2 covers bilateral agreements. It's where countries are now able to sort of write their own rules for how they want to do emissions trading.

If two countries want to cooperate on a bilateral basis, they can do that. They set up their program, they set up their systems, and they report to the UN on what they're doing. Article 6.4 is the centralized UN mechanism. So, we have a centralized UN body that works a little bit like the CDM did, where there is a decision-making group that sets the rules, approves the methodologies, and issues the credits, supported by the UN secretariat.

It's called the Article 6.4 Supervisory Body, and they are responsible for writing and approving the rules under Article 6.4. So that's for countries that would really like to have that UN oversight and want a centralized, administered system, you know, one that's all kind of taken care of for them. That's what 6.4 is. And then the third paragraph is 6.8, which covers non-market-based approaches, which is in there at the insistence of countries that don't believe in market-based mechanisms and wanted to have a note in there. So that's a story for over a beer. The important ones are really 6.2 and 6.4.

Mike Toffel:

Great, so what's recent? What happened in Baku? Because were these paragraphs in from the beginning, or are these paragraphs new? Yeah, okay.

Alexia Kelly:

Those are the original paragraphs. And then what we needed to do was write all the rules for how you did that because 6.4 just says we’re establishing this body. And then you had to go out and write all of the what we call modalities and procedures, all of the rules for how the body was going to work, who was going to sit on the body, how many times was it going to meet, who was going to be responsible for making the decisions, what were the criteria that you’re using to make the decisions, like all of that had to be written up. And that ended up taking 10 years.

So, they worked for the last nine years. So, what happened in Baku that’s so significant is that we finally agreed to the rule set. So that means that in the next year or so, we should start to see credit flowing through this UN system as methodologies get approved and as projects and pipelines start to come in. So that’s net-net a great thing because it hopefully means that we’ll get that money going again into mitigation globally.

But we’re going to need to watch it as well because the thing I personally don’t like about the Paris Agreement is that it is normative, and it is largely voluntary, and it doesn’t have as much oversight as I would like of stuff that’s happening in different parts of the world. We rely very heavily on transparency. So, it’s going to be important for civil society and others to keep an eye particularly on 6-2 and these bilateral agreements that countries are writing because here you have two parties that are both interested in potentially inflating baselines or creating credits because it’s going to make it cheaper and easier for them to meet their international commitments.

So that's a place where I think folks are going to need to keep a close watch, and that we're certainly watching closely and will be seeking to advocate on. I think it's also a place where the ICVCM is going to help a lot, because we are becoming truly a global benchmark, for both voluntary and compliance systems, of what right looks like and what good looks like. So, making sure that we push the whole world towards that threshold is going to be incredibly important in the next few years.

Mike Toffel:

Yeah, a very interesting enmeshing of the voluntary institutions and the compliance markets that are emerging. OK, so before I let you go, let me ask you the same question I ask all of our guests, which is about advice. For folks who are listening to this podcast, some might be interested in learning more, maybe thinking about career opportunities in the voluntary carbon markets, or maybe now in the compliance carbon markets, now that there's some interesting evolution going on. What advice do you have for them as far as podcasts or conferences or websites or newsletters, whatever it might be?

Alexia Kelly:

Yeah, well, before I give you my quick list, I will say I really encourage people to go work in government. Go work in government. You learn a tremendous amount. Our civil society and government institutions are really fragile right now. And whether it's the local level or the federal level, we need good people filling those jobs. It is the best way to develop your career that I can possibly imagine, and it's incredibly important work for keeping the fabric of our society knit together. If you want to get more involved in carbon markets, there are lots of good resources out there. There's the Navigating the American Carbon World conference that happens every year in Los Angeles in March.

That’s hosted by the Climate Action Reserve and the International Emissions Trading Association. The International Emissions Trading Association is an industry association that is basically the carbon market. And they host a number of conferences as well that are quite good. Then if you can, go to a COP or go to the UN because that is where a lot of these conversations end up happening. So that’s a really excellent place to learn more as well.

Mike Toffel:

Great. Terrific. And you’re launching a podcast?

Alexia Kelly:

I am launching a podcast, yep, just in the next couple of months. It’ll be called Navigating Net Zero and we’ll cover both corporate decarbonization perspectives as well as really dig into some of these meaty issues that you did such a great job raising here today, Mike, in terms of how does the carbon market work, what’s happening, where is it going, and how do I stay abreast of the rapidly moving and evolving spaces.

Mike Toffel:

Well, I look forward to subscribing once that's launched. It was a wonderful conversation, Alexia. Very wide-ranging, and I've learned a lot. So, thank you so much.

Alexia Kelly:

Thanks so much and thanks so much for having me today. This is great.


Science Storytelling with TILclimate – Farm to table, with a side of fossil fuels

Climate change stories can be complex, especially when they’re full of technical descriptions and lists of numbers. But when these stories are linked to our daily lives, we can understand the issue more clearly. So I had to laugh when listening to this episode of the TILclimate Podcast, as I happen to be a fan of tortilla chips — artisan style, of course — and this story highlights how fossil fuels are part of the journey, from start to finish, as tortilla chips travel from the farm to the store.

While this narrative involves a specific food product, you can easily see how the process applies, with minor variations, to a host of other items that end up on our table. In this case, we have farm machinery, giant fans, trucks, fermenters, grinders, dryers, fryers, fertilizer, and even the plastic bag the chips come in.

Instead of people, the cast of characters includes objects, chemicals, and processes, but we still get a visual sense of how everything works as raw ingredients are grown, processed, packaged, and delivered. Instead of a “This is what climate change is doing to the planet” story, we have a “Behind the scenes look at how the things we consume contribute to the problem of climate change” type of story.

There are no villains here, no finger pointing or blame, just a real-life example of how a food manufacturing process works. And since there are many steps in the tortilla chip supply chain, finding a more sustainable solution involves solving a number of problems.

If you’re working on a personal story that’s grounded in science, think about how your technology or research can be explained within the context of a story that your audience can relate to. And if you need help creating and presenting that story, reach out. I enjoy working with scientists who are making an impact!

Transcript

LHF: Hello, and welcome to Today I Learned: Climate from the Massachusetts Institute of Technology. I’m Laur Hesse Fisher.

If you’re like many of our listeners, you might be wondering: okay, so the CO2 from burning fossil fuels is warming the planet, right? So why haven’t we just gotten rid of all these fossil fuels already?

Because we live in a world that’s currently dependent on fossil fuels, yet a lot of that dependence is invisible to us. So we collaborated with TABLE, an international coalition of universities that helps the public understand our food system. Their recent podcast miniseries is called Fuel to Fork, and it explores all the many ways that fossil fuels are involved in putting food on our plates.

And today, we’re going to get a glimpse into the hard work that is happening to eliminate pollution from the food system—and in doing so, explore the very real ways that our food now depends on fossil fuels.

Even to produce the simplest thing, like a tortilla chip.

JC: I love tortilla chips. In fact, I had some on the weekend, and they were very tasty.

LHF: That’s Jennifer Clapp.

JC: I’m a professor and Canada research chair in global food security and sustainability at the University of Waterloo in Canada. I’m also a member of IPES-Food, which is the international panel of experts on sustainable food systems.

LHF: She’s here to help us follow the journey of a tortilla chip from farm to grocery store, taking note of all the ways fossil fuels are used along the way. So let’s get started.

JC: Well, tortilla chips have relatively few ingredients. They’re made of corn, or in the rest of the world, it’s called maize.

LHF: Here in the U.S., we have over 90 million acres of cornfields. If that were a state, it would be the fifth largest, just barely behind Montana. And if you took a drive through this great state of corn, the first thing you might notice above the vast, waving expanses of green are the machines that tend the corn from planting to harvest.
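
As a quick back-of-the-envelope check of that comparison (using the standard 640 acres per square mile and approximate state areas):

```latex
90{,}000{,}000\ \text{acres} \times \frac{1\ \text{mi}^2}{640\ \text{acres}} \approx 140{,}600\ \text{mi}^2
```

Montana covers roughly 147,000 square miles (the fourth-largest state) and New Mexico roughly 121,600 (the fifth), so a hypothetical state made of cornfields really would slot in at number five, just behind Montana.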

JC: Farm machinery typically runs on diesel fuel. And that’s the machinery used to plow the fields, drill the seeds, spread the fertilizer, spread the pesticides, spread the herbicides. Also for harvesting crops, big machinery is used, you know, combine harvesters and other kinds of machines that thresh the grain.

LHF: It’s probably no surprise that these great machines need fuel to run. But what about the quieter parts of a corn farm—like the barns?

JC: Corn has a lot of moisture in it. It’s a heavy crop, and to store it properly it needs to be dried. And farmers typically use giant fans in a barn to dry out the corn and typically heat those barns with propane fuel.

LHF: The two things we’ve mentioned so far—the farming machinery, and drying the crop—make up about half of the fossil energy use on a typical corn farm. There’s one last big chunk of emissions that we’re going to come back to a little bit later in this episode.

For now, though, we’re packing up our corn for sale.

JC: Commodities like corn do travel around a fair bit. If it’s trucked, it’s typically using diesel fuel. And also, if it’s shipped, it’s definitely using oil.

LHF: Those fossil fuels get our dried corn to a factory, where it will be turned into masa, the delicious dough that makes a tortilla.

JC: And what it involves is soaking and simmering, like cooking, these dried kernels of corn for up to 12 hours. And that process is called wet milling.

LHF: For our tortilla chips, this is almost the end of the line: the masa from the wet mill is ready to be shaped, baked and fried. Other corn products will keep passing through more screens and grinders and dryers and fermenters, on their way to becoming things like cornstarch, and corn syrup, and even the ethanol we add to gasoline.

There isn’t good recent data on this, but back in 2001 the US Energy Information Administration did a study of corn wet mills and found that they used 15% of all the energy in, not just corn, but the entire U.S. food industry.

JC: So that gives you a sense of just how energy consumptive it is.

LHF: When you hear about “ultra-processed” foods, this is what it means: the ingredients go through a whole bunch of machines to break them down to their proteins and fibers and oils and such. And it tends to use a lot of fossil fuels—and be less healthy for us, too.

With our tortilla chips, the last machine would be the fryer that makes them nice and crispy and snackable. But there’s one more step before they’re shipped to the grocery store, and that’s packaging.

JC: In my local community I can buy corn chips that come in a paper bag, which really makes me happy. But most corn chips that you’re going to find on a grocery store shelf are packaged in plastic.

LHF: And that plastic is made of—do you know? It’s oil!

Yeah, our food system doesn’t rely on fossil fuels just for energy. Tons of stuff—packaging, farm equipment—is also made of fossil fuels.

JC: You might have seen large sheets of plastic covering farm fields that sort of keep in moisture and keep temperatures warm in the soil, or covering a greenhouse. Herbicides, pesticides: they’re all fossil fuel, sort of oil-based, chemicals. So when we think about fossil fuels on the farm, they’re just, they’re everywhere.

LHF: Remember earlier, when we found that the farming and drying machinery added up to about half of a farm’s fossil energy use? Well, most of the remaining half comes from just one of those fossil fuel-based chemicals alone.

JC: The fertilizer use is probably the biggest use of fossil energy when we’re talking about growing corn.

LHF: For as long as there’s been farming, people have been adding fertilizers like manure and wood ash to soil to revitalize it.

JC: These products really started to be used much more frequently after around the 1840s, when scientific developments led to an understanding about the importance of nitrogen, phosphorus, and potassium as key nutrients that plants need for better plant growth.

Phosphorus and potash are actually today typically mined from the earth and processed to make fertilizers.

LHF: But the third nutrient, nitrogen, is trickier: there’s no nitrogen rock that we can mine. On the other hand, there is one very abundant source of nitrogen very close to hand. It’s in the air we’re breathing. Earth’s atmosphere is almost 80% nitrogen gas.

JC: And scientists knew that nitrogen was in the air. They just didn’t know how to capture it and make it into a physical, usable form that could be applied to soil.

LHF: And then, in the early 1900s, two German chemists, Fritz Haber and Carl Bosch, figured it out. If you react nitrogen with hydrogen, they mix to make NH3, also known as ammonia. And this became the main ingredient for modern fertilizers.

The catch is that the hydrogen comes from yet another fossil fuel: natural gas.
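
For readers who want the chemistry spelled out, the two steps Jennifer describes can be sketched roughly as follows: natural gas (methane) is reformed with steam to produce hydrogen, and that hydrogen is then combined with nitrogen from the air under high temperature and pressure.

```latex
% Hydrogen from natural gas (steam reforming plus water-gas shift, net reaction):
\mathrm{CH_4 + 2\,H_2O \;\longrightarrow\; CO_2 + 4\,H_2}
% Haber-Bosch ammonia synthesis (high temperature and pressure, iron catalyst):
\mathrm{N_2 + 3\,H_2 \;\longrightarrow\; 2\,NH_3}
```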

JC: So the Haber-Bosch process really changed everything because people didn’t have to worry about where the nitrogen was going to come from to fertilize crops. And the use of synthetic nitrogen increased massively. And what that’s meant is that more crops can be grown. More land around the world can be cultivated for agriculture, because the nutrients can be continually replenished.

LHF: And on that land, humans are supplying a regular stream of nitrogen, provided mostly by natural gas. Where, unfortunately, it continues to impact the climate in yet another way.

JC: There’s been a tendency to over-apply fertilizer, just as kind of an insurance policy: farmers want to be sure that they’re putting enough on the field to ensure plant growth.

But not all nitrogen that’s put down in the field is taken up by the plant. And then soil microbes eat up the nitrogen and convert it into a gas called nitrous oxide, which is more damaging than carbon dioxide when we’re talking about climate change. And corn uses a lot of fertilizer, so it has a lot of nitrous oxide emissions.

LHF: Fertilizer is by far the biggest way that humans create nitrous oxide, this highly climate-warming gas. If you add both the manufacturing process and the nitrous oxide, fertilizer has the same impact on the climate as a major country—in fact, it contributes as much to climate change each year as Japan does, which is the world’s seventh-largest climate polluter.

JC: So all in all, the fertilizer industry is pretty significant.

LHF: Okay, so what do we do about all this? You might ask: is it even possible to have our tortilla chips without the climate pollution?

JC: Can I imagine a fossil fuel free bag of corn chips? I think, in this current world that we live in, that’s a bit hard to imagine, given all of the places in the whole production process that have relied on and continue to rely on fossil energy.

LHF: Let’s take farming machinery for a moment. You might say, well, couldn’t we just run these machines on electricity, like switching a gas-powered car for an electric car? And, yeah—we probably could.

JC: But it’s not straightforward, because a tractor has to have a lot of horsepower, especially for plowing, and especially for these sorts of harvesting and threshing activities.

LHF: That means that an electric tractor would need to hold a lot of energy in its battery. For the heaviest equipment like combine harvesters, the industry is still waiting on more powerful motors and batteries to hit the market—and to be affordable.

But don’t throw up your hands. There is a lot we can do right now. Like in the drying barns, which can be heated electrically, and the wet mills that can switch to clean power sources. Or what about the problem of overapplying nitrogen? That’s no good for anybody who cares about our climate—but it’s also especially bad for the people buying all this fertilizer that just ends up being wasted.

JC: Because it’s a big cost for farmers. And the big companies are all investing in digital technology that can analyze the type of soil and its fertility, and then provide advice to farmers that says you should only put this much fertilizer in this part of your field. Maybe you want to use a little bit more in that part of your field.

LHF: There are also these things called “slow release” fertilizers, which are coated in a slow-dissolving plastic so all the nitrogen doesn’t get dumped on the field at once. Or, could we produce the nitrogen our corn needs without using natural gas? There are emerging processes that use clean electricity instead, or even engineered microbes in the soil. All of these ideas are being actively pursued right now—and also studied to see what kinds of unintended effects might arise if we start doing things like treating our soils with plastics, or using a lot of energy for AI-powered digital farming tools.

So today, we wanted to highlight the often hidden fossil fuel use in our food system—but we also wanted to highlight the often-invisible solutions that are happening. Because as more and more of us get activated and equipped to tackle this issue, researchers, innovators, investors, and folks working across the food system get creative, and solutions like these become possible.

JC: So it’s a big ask to say, okay, throw that model out the window and start from scratch with something else. But there are models of other things that can work, such as agroecology, which is using nature’s own processes to provide the fertilization of soils by growing different crops next to each other. It’s a big change. And so it’s not going to happen overnight.

But I always think about the fact that the way that we ended up with the agriculture we have today took about 200 years. Farmers did adopt synthetic fertilizers. They did adopt hybrid seeds. You know all of the aspects that we think of as conventional farming today were at one point new technologies. So we shouldn’t think necessarily that farmers are going to be resistant to change. But that change has to be tangible for them in terms of the benefits, and it has to be easy, and it has to be affordable.

LHF: And that’s harder than just saying, keep the fossil fuels in the ground. But in the end, this hard, steady work is what it’s going to take to have a clean economy that offers us a good living and the things that we need. And even the things that we like, like a bag of chips.

That is our show. But if you’re interested in learning more about fossil fuels in the food system, I invite you to check out the entire Fuel to Fork miniseries from TABLE, in collaboration with IPES-Food and the Global Alliance for the Future of Food. Just look up Fuel to Fork on Apple Podcasts, Spotify, or wherever you get your podcasts.

And hey, you can also look up TILclimate there and follow us—there are lots more episodes to brush up on your climate knowledge. Or get in touch and ask us your climate change questions! Email us at tilclimate@mit.edu, or leave us a voicemail at 617 253 3566.

TILclimate is the climate change podcast of the Massachusetts Institute of Technology. Aaron Krol is our Writer and Executive Producer. David Lishansky is our Audio Producer. Michelle Harris is our fact-checker. Grace Sawin is our Student Production Assistant. The music is by Blue Dot Sessions. And I’m your Host and Senior Editor, Laur Hesse Fisher.

A big thanks to Prof. Jennifer Clapp for speaking with us, and to you, our listeners. Keep up your climate curiosity.

And if you want to dive deeper into this topic:

  • Read more about Prof. Clapp.
  • For a deeper dive into where fossil fuels are used in the global food system, check out the Fuel to Fork podcast mini-series produced by TABLE, IPES-Food and the Global Alliance for the Future of Food.
  • For detailed data on the sources of greenhouse gas emissions in the global food system, see this scientific publication from the Food and Agriculture Organization of the United Nations. The data is also summarized in this report, and made available in an interactive tool where you can break down emissions by source, country, and type of greenhouse gas.
  • Learn more about how fertilizer is produced and why it contributes to climate change with this Explainer from the MIT Climate Portal.
  • This episode breaks down the use of fossil energy on a typical corn farm. You can find data on this question from the University of Minnesota and Iowa State University.
  • TILclimate has covered related topics in our episodes on farming a warmer planet and what I eat.
  • For an overview of climate change, check out our climate primer: Climate Science and Climate Risk (by Prof. Kerry Emanuel).
  • For more episodes of TILclimate by the MIT Climate Project, visit tilclimate.mit.edu.


Santa Fe Institute – Nature of Intelligence – Complexity Wrap Up

Hopefully you have been along for the ride and have listened to all six episodes. It’s been a lot to digest, a lot to think about. While the field of neuroscience has made great strides, when it comes to the subject of human intelligence there’s still so much to learn. Which is why I’ve appreciated this podcast.

And now we have AI entering the picture. Will it augment our IQ, or surpass us, to our detriment? It’s a mystery. So much upside, yet there’s a dark side to how AI can be used by bad actors operating behind the scenes.

Since a key focus of this series is an exploration of AI, I asked Google’s NotebookLM to provide some insights as to the key points that were explored over the series. Does this synopsis align with your impressions? Here’s the cast of characters:

  • Melanie Mitchell (host) – Professor at the Santa Fe Institute working on artificial intelligence and cognitive science. In the final episode, she is interviewed about her background, views on AI, AGI, and the future of the field.
  • Abha Eli Phoboo (host) – Abha is a writer and an obsessive rewriter. Interested in the arts and sciences, she explores the weak interaction between the two. A CERN Press Officer, she translates physics into English and helps scientists communicate their research to the world.
  • Alison Gopnik – Professor of psychology and philosophy, member of the Berkeley AI Research group, external professor with the Santa Fe Institute, who studies how children learn.
  • John Krakauer – Professor of neurology, neuroscience, physical medicine, and rehabilitation at Johns Hopkins University School of Medicine, who researches intelligence and physical movement in animals, machines, and humans.
  • Ev Fedorenko – Featured in the second episode discussing the relationship between language and thought. Her work includes using fMRI brain scans to examine the relationship between language and other forms of cognition.
  • Steve Piantadosi – Featured in the second episode discussing the relationship between language and thought. He provides examples of how language can make learning more efficient.
  • Gary Lupyan – Featured in the second episode discussing the relationship between language and thought. He believes language is one of the major reasons for human intelligence, potentially more of a cause than a result.
  • Murray Shanahan – Professor of cognitive robotics at Imperial College London and principal research scientist at Google DeepMind.
  • Tomer Ullman – Psychologist at Harvard University studying computation, cognition, and development.
  • Linda Smith – Chancellor’s Professor of Psychological and Brain Sciences at Indiana University, a developmental psychologist and pioneer of head-mounted camera research with infants.
  • Mike Frank – Professor of psychology at Stanford, who studies how children learn and uses large datasets and new methodologies.
  • Erica Cartmill – Professor of cognitive science, animal behavior, anthropology, and psychology at Indiana University, who studies cognition and communication across a wide range of species, including great apes and human children.
  • Ellie Pavlick – Discusses how we assess intelligence, particularly in machines, and the challenges of applying human tests to AI. She also talks about the difficulty of understanding how LLMs work internally.

Santa Fe Institute Complexity Podcast

AI Summary via NotebookLM

This podcast series explores the complex question: What is intelligence? It highlights that defining intelligence is difficult and that there is no single, simple definition; it’s more like a “suitcase word” packed with various capabilities. The series draws on insights from cognitive scientists, child development specialists, animal researchers, and AI experts.

Human intelligence involves many facets. It includes learning about cause and effect by experimenting and interacting with the world. Humans are good at generalizing knowledge and making analogies, applying what they learn in one situation to new ones without needing vast amounts of retraining. Common sense, which relies on innate understandings of the physical world and flexibility in thinking, is also crucial.

Language is seen as a backbone of human culture and a powerful tool for sharing information and ideas, enabling us to learn without direct experience and understand abstract concepts. There is debate, however, on whether language is a cause or a result of human intelligence, and whether language and thought are fundamentally separate or intertwined. Some evidence suggests they can be separate, at least in adults. Human intelligence also relies heavily on our social nature, drive to collaborate, and the unique role of caregiving in development.

Large Language Models (LLMs) like ChatGPT are a focus of the series. These systems are trained on enormous amounts of human-generated text data from the internet. They work by finding statistical correlations in language and predicting the most likely next word or “token”. While LLMs can produce sophisticated and sometimes creative language, there are significant differences compared to human intelligence.
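
As a rough illustration of that “predict the next word” idea, here is a toy, hypothetical sketch in Python: it counts which word follows which in a tiny corpus and predicts the most frequent follower. Real LLMs use neural networks trained over billions of subword tokens rather than simple counts, so this is only meant to make the prediction objective concrete.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny corpus,
# then predict the most frequent follower. This only illustrates the idea of
# statistical next-token prediction; real LLMs learn far richer patterns.
corpus = "the corn is dried and the corn is milled and the masa is fried".split()

followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequently observed next word, or None if unseen."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'corn' -- seen twice, versus 'masa' once
print(predict_next("is"))   # 'dried' -- a three-way tie, broken by first occurrence
```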

LLMs learn passively from data, unlike humans who learn actively through interaction with the world. They lack an inherent drive to explore or understand the world. There is debate on whether LLMs truly “understand” language in a meaningful sense or simply know how to use words based on patterns. They also cannot engage with the world to update “beliefs” and sometimes make things up, a behavior called “hallucinating”.

Assessing the intelligence of LLMs is challenging. Applying tests designed for humans, like the SAT, might not mean the same thing for a machine. Some researchers suggest LLMs might be learning how to pass the test rather than exhibiting general reasoning ability. Understanding how LLMs actually work internally (“mechanistic understanding”) is seen as crucial but is still a nascent area of research. Some propose thinking of LLMs as sophisticated “role-players” rather than entities with beliefs or consciousness. LLMs might also be better understood as reflecting collective knowledge rather than a single agent’s intelligence.

The concept of Artificial General Intelligence (AGI), often described as human-level intelligence in machines, is discussed, but its definition remains vague and debated. The current path to building powerful AI is seen by some as unsustainable due to the immense data and energy requirements, suggesting that future AI might need to be more “embodied” and learn more like humans or animals.

Beyond theoretical fears, the series highlights real, present risks of AI, including the spread of deepfakes and disinformation, which can erode trust and make it harder to find reliable information online. The unauthorized use of human-generated data for training AI is also raised as an ethical concern.

Top Five Audience Takeaways

  1. Defining “Intelligence” is Surprisingly Difficult. Instead of being a simple, single thing we can measure, intelligence is like a “suitcase word” packed with many different abilities and ways of being. Researchers across various fields agree that there’s no easy, complete definition of what makes something intelligent, whether it’s a person, an animal, or a machine.
  2. Human Intelligence is Deeply Tied to Active Experience and Social Interaction. Humans don’t just passively absorb information; we learn by actively exploring the world, doing “little experiments,” and figuring out cause and effect. Our ability to generalize knowledge to new situations with limited examples is crucial. Furthermore, language, our drive to collaborate, and the unique role of caregiving are fundamental to how our intelligence develops and functions.
  3. Today’s Powerful AI, like ChatGPT (LLMs), Works Very Differently from Human Intelligence. These systems are trained on enormous amounts of text data from the internet, learning by finding statistical patterns and predicting the next word. Unlike humans, they learn passively, lack an inherent drive to explore the world, don’t have beliefs, and can sometimes “hallucinate” or make things up. While they can produce impressive language, there’s a significant debate about whether they truly “understand” in a human sense or are just very sophisticated at using patterns.
  4. Testing AI Intelligence Using Human Standards is Tricky. Applying tests designed for humans, like the SAT or theory-of-mind tasks, to LLMs might not accurately reflect their capabilities. LLMs might simply be learning how to pass the specific test through pattern matching from their vast training data, rather than exhibiting genuine reasoning or understanding. Understanding how these AI systems arrive at their answers – looking “under the hood” – is a crucial but difficult area of research. We also need to be mindful that our human-centric view can limit how we assess intelligence in other entities, including animals.
  5. Current AI Approaches Face Significant Challenges and Present Real Risks. The reliance on massive data and energy to build powerful AI systems may not be sustainable or efficient in the long run. Beyond theoretical fears about Artificial General Intelligence (AGI), there are immediate concerns like the spread of deepfakes and misinformation, which can erode trust and make finding reliable information difficult. There are also ethical questions about using vast amounts of human-generated data to train AI without permission or benefit to the creators. Some researchers suggest future AI development might need to take a different path, perhaps learning more like babies or animals, to be more sustainable and genuinely intelligent.


Nature of Intelligence – Episode Six – AI’s changing seasons

In this final episode of the Complexity podcast, Melanie Mitchell provides us with a bit of her backstory — how she became interested in the topic of AI — and the path she’s been on in the 35 years since she got her PhD. She shares the little-known fact that AI wasn’t always the hot topic it’s been in our recent memory, having been through a few up and down cycles along the way.

The world of AI has gone through several cycles of huge optimism and people thinking that true AI is just around the corner, just a few years away. And then disappointment because the methods that AI is using at the time don’t actually turn out to be as promising as people thought. ~ Melanie Mitchell

When she mentions that “cognitive scientists have been trying to understand what human level intelligence is for a century now,” it’s a stark reminder that it doesn’t make sense to compare human intelligence to artificial intelligence if we’re not sure what’s going on in our own minds.

Intelligence, as we’ve seen throughout the podcast is not a well-defined sort of rigorously mathematically defined notion. It’s what Marvin Minsky, the AI pioneer, called a suitcase word. And by that he meant that it’s like a suitcase that’s packed full of a jumble of different things, some of which are related and some of which aren’t. ~ Melanie Mitchell

And there’s no single thing that intelligence is. It’s a whole bunch of different capabilities and ways of being that perhaps are not just one single thing that you could either have more of or less of, or get to the level of something. It’s just not that kind of simple thing. It’s much more of a complex notion. ~ Melanie Mitchell

The dark side of AI is also brought to light, with mention of deep fakes and voice cloning, alongside the perils of misinformation and disinformation. As to what is on the horizon, a big worry is that impersonating humans will become a thing. The bottom line: as AI gets more intelligent, there’s an upside and a downside.

Hopefully this podcast series gave you some insight as to how the story of our common humanity, and your own story, may unfold.

Transcript

Abha Eli Phoboo: From the Santa Fe Institute, this is Complexity

Melanie Mitchell: I’m Melanie Mitchell

Abha: And I’m Abha Eli Phoboo

Abha: Melanie, it’s so wonderful to be able to sit down and ask you questions this time. Could we maybe get started with, you know, how you got into the business of AI, could you maybe tell us a little bit about that?

Melanie: Yeah, so I majored in math in college. And after college, I worked as a math teacher in a high school in New York City. But while I was there, I didn’t really know what I wanted to do. I knew I didn’t want to teach forever. So I was reading a lot. And I happened to read a book called Gödel, Escher, Bach by Douglas Hofstadter.

And it was a book about, well, Gödel, the mathematician, Escher, the artist, and Bach, the composer, obviously. But it was really much more. It was about how intelligence can emerge from non-intelligent substrate, either in biological systems or perhaps in machines. And it was about the nature of thinking and consciousness. And it just grabbed me like nothing else ever had in my whole life. And I was just so excited about these ideas.

So I decided I wanted to go into AI, which is what Hofstadter himself was working on. So I contacted him. He was at Indiana University and I never heard back. In the meantime, I moved to Boston for a job there and was hanging around on the MIT campus and saw a poster advertising a talk by Douglas Hofstadter. I was so excited.

So I went to the talk and I tried to talk to him afterwards, but there was a huge crowd of people around him. His book was extremely famous and had a big cult following. So then I tried to call him at his office (he was on sabbatical at MIT, it turned out), left messages, and never heard back. So finally I figured out he’s never at his office during the day, so he must be there at night.

So I tried to call him at 10 in the evening and he answered the phone and was in a very good mood and very friendly and invited me to come talk to him. So I did and I ended up being an intern in his group and then going to graduate school to work with him. So that was the story of how I got to my PhD program.

It was actually at the University of Michigan, where he was moving to, and I worked with him for my PhD on how people make analogies and how a machine might be able to make analogies in a similar way.

Abha: That’s so interesting. I mean, you were very tenacious, you kept not giving up.

Melanie: Yeah, exactly. That was the key.

Abha: So when you graduated, I’ve heard you mentioned before that you were discouraged from mentioning AI in your job search. Could you maybe tell a little bit about what the world of AI was like at that point?

Melanie: Yeah, so the world of AI has gone through several cycles of huge optimism and people thinking that true AI is just around the corner, just a few years away. And then disappointment because the methods that AI is using at the time don’t actually turn out to be as promising as people thought.

And so these are called sort of the AI springs and AI winters. And in 1990, when I got my PhD, AI was in the winter phase. I was advised not to use the term artificial intelligence on my job applications. I was advised to use something more like intelligent systems or machine learning or something like that, but the term AI itself was not looked well upon.

Abha: So what do you think now of the fact that the Nobel Prize just recently went to people working in AI? The one for physics went to John Hopfield and Geoffrey Hinton for their work in machine learning. And then Demis Hasabis for chemistry. What do you think of that?

Melanie: Well, obviously we’re in an AI spring or summer right now and the field is very hot and people are again predicting that we’re going to have, you know, general human and level machine intelligence any day now. I think it’s really interesting that the Nobel prizes this year were sort of, you know, the AI sweep.

There were a lot of people joking that ChatGPT would get the literature prize. But I was a little surprised at the physics prize, not so much at the chemistry prize. You know, the chemistry prize was for AlphaFold, which is a program from Google DeepMind, which is better than anything that ever came before in predicting protein structure. That was obviously a huge, huge success and incredible achievement.

So I think that was not surprising to me at all that the DeepMind people got that award. The physics award, you know, Hopfield is a physicist and the work that he did on what are now called Hopfield networks was very inspired by physics. Hinton I was a little more confused about, just because I didn’t really see the physics connection so much. I think it is just more the impact that machine learning is having on physics. And machine learning today is all about neural networks, and Hinton was obviously a big pioneer in that field. So I think that’s the thinking behind that. But I know a lot of physicists who have grumbled that that’s not physics.

Abha: Yes, it’s been very interesting to see that debate in the physics community. You and I, you know, we’ve talked to so many researchers over the course of the season, and I wanted to ask if there was something you were hoping to learn when we first started building this podcast together?

Melanie: Well, I think one reason I was excited to do this podcast was because I wanted to talk to people, not just in AI, but also in cognitive science. The voices of cognitive science and AI haven’t been given as much sort of airtime as people who are at big AI companies or big AI labs. I think that they’ve been missing a key element, which is, what is this thing we’re calling intelligence?

What is the goal of something like general AI or AGI? What’s the thing we’re trying to get to when we talk about human-level intelligence? Cognitive scientists have been trying to understand what human-level intelligence is for a century now. The ideas that these people have about intelligence seem to be very different from those of people sort of leading the pack in the AGI world. So I think that’s an interesting contrast.

Abha: I agree. I think I learned a lot too. And John Krakauer, one of the first guests we had in the first episode of the season, you and he are currently going through a three-year discussion project to understand the nature of intelligence. And I’m curious about what you’ve learned. I know you had your first meeting. So what you learned in that first meeting and why do you think it is so important that you want to put this exercise together for a number of years, not just a couple of sessions that end in a month or two.

Melanie: Well, I think there are several aspects to this. So John Krakauer and I have been talking for years about intelligence and AI and learning, and we finally decided that we should really have a set of very focused workshops that include people from all these different fields, similar to this podcast, about the nature of intelligence. AI and machine learning, it’s a very fast moving field.

You hear about new progress every day. There’s many, many new papers that are published or submitted to preprint servers. And it’s just overwhelming. It’s very fast. But there’s not a lot of more slow thinking, more long-term, more in-depth thinking about what it is that we’re actually trying to do here. What is this thing called intelligence? And what are its implications, especially if we imbue machines with it?

So that’s what we decided we would do: kind of slow thinking rather than the very fast research that is taking over the machine learning and AI fields. And that’s what, in some sense, SFI, the Santa Fe Institute, is really all about: trying to foster this kind of very in-depth thinking about difficult topics. And that’s one of the reasons we wanted to have it here at the Santa Fe Institute.

Abha: It almost seems counterintuitive to think of AI now in slower terms because the world of AI is moving at such speed and people are trying to figure out what it is. But going back to our original question in this podcast, what do we know about intelligence right now?

Melanie: Well, intelligence, as we’ve seen throughout the podcast is not a well-defined sort of rigorously mathematically defined notion. It’s what Marvin Minsky, the AI pioneer, called a suitcase word. And by that he meant that it’s like a suitcase that’s packed full of a jumble of different things, some of which are related and some of which aren’t.

And there’s no single thing that intelligence is. It’s a whole bunch of different capabilities and ways of being that perhaps are not just one single thing that you could either have more of or less of, or get to the level of something. It’s just not that kind of simple thing. It’s much more of a complex notion. There’s a lot of different hallmarks that people think of. For me, it’s generalization, the ability to generalize, to not just understand something specific, but to be able to take what you know and apply it in new situations without having to be retrained with vast numbers of examples.

So just as an example, AlphaGo, the program that is so good at playing Go. If you wanted to teach it to play a different game, it would have to be completely retrained. It really wouldn’t be able to use its knowledge of Go, or its knowledge of sort of game playing, to apply to a new kind of game. But we humans take our knowledge and we apply it to new situations. And that’s generalization, that’s to me one of the hallmarks of intelligence.

Abha: Right. I’d like to go into your research now, and if you could tell us a little bit about the work you’ve done in conceptual abstraction, analogy making, and visual recognition and AI systems. The problems you’re working on right now, could you tell us a little bit about that?

Melanie: Sure. So I started my career working on analogy making. And when I got to Doug Hofstadter’s group, he was working on building a computer system that could make analogies in a very idealized domain, what he called letter string analogies. So I’ll give you one. If the string ABC changes to the string ABD, what did the string IJK change to?

Abha: IJL.

Melanie: Okay, very good. So you could have said, ABC changes to ABD, that means change the last letter to a D, and you would say IJD. Or you could have said, ABC changes to ABD, but there’s no Cs or Ds in IJK, so just leave it alone. But instead, you looked at a more abstract description. You said, okay, the last letter changed to its alphabetic successor.

That’s more abstract. That’s sort of ignoring the details of what the letters are and so on and applying that rule to a new situation, a new string. And so people are really good at this. You can make up thousands of these little letter string problems that do all kinds of transformations and people get the rules instantly.

But how do you get a machine to do that? How do you get a machine to perceive things more abstractly and apply what they perceive to some new situation? That’s sort of the key of analogy. And it turned out it’s quite difficult because machines don’t have the kind of abstraction abilities that we humans have. So that was back when I was first starting my PhD, that was back in the 1980s.

So that was a long time ago in AI years. But even now, we see that even the most advanced AI systems like ChatGPT still have trouble with these kinds of analogies, and there’s a new kind of idealized analogy benchmark that was recently developed called the Abstraction and Reasoning Corpus, which features more visual analogies, but similar to the ones that I just mentioned.

You have to try and figure out what the rule is and apply it to a new situation. And there’s no machine that’s able to do these anywhere near as well as people. The organizers of this benchmark have offered a prize, right now it’s at $600,000 for anybody who can write a program or build some kind of machine learning system that can get to the level of humans on these tasks. And that prize is still unclaimed.
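
To make the letter-string example concrete, here is a small, hypothetical Python sketch of what the abstract rule “replace the rightmost letter with its alphabetic successor” does once it has been discovered. The hard part that Copycat-style systems and ARC solvers tackle is discovering such rules from one or two demonstrations; this sketch only applies a rule that is already written down.

```python
import string

ALPHABET = string.ascii_uppercase

def successor(letter):
    """Next letter in the alphabet, wrapping Z back around to A."""
    return ALPHABET[(ALPHABET.index(letter) + 1) % len(ALPHABET)]

def apply_rule(s):
    """The abstract rule from the ABC -> ABD demonstration:
    replace the rightmost letter with its alphabetic successor."""
    return s[:-1] + successor(s[-1])

print(apply_rule("ABC"))  # ABD -- reproduces the demonstration pair
print(apply_rule("IJK"))  # IJL -- the analogous answer Abha gave
```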

Abha: I hope one of our listeners will work on it. It would be very cool to have that solved.

Melanie: We’ll put the information in the show notes.

Abha: So can you tell me know how do you go about testing these abilities?

Melanie: So the key for the letter string analogies and also for the abstraction and reasoning corpus problems that’s abbreviated to ARC is to show a few demonstrations of a concept. So like when I said ABC changes to ABD, the concept is, change the rightmost letter to its successor.

Okay, and so I showed you an example and now say, here’s a new situation. Do the same thing. Do something analogous. And the issue is, I haven’t shown you millions of examples, I’ve just shown you one example or sometimes with these problems you can give two or three examples. That’s not something that machine learning is built to do. Machine learning is built to pick up patterns after seeing hundreds to millions to billions of examples, not just one to three examples. So this is what’s called few-shot learning or few-shot generalization.

The few-shot being you just get a few examples. And this is really the key to a lot of human intelligence, is being able to look at a few examples, and then figure out what’s going on and apply that to new kinds of situations. And this is something that machines still haven’t been able to do in any general way.

Abha: So say, if a child sees a dog, right, of a certain kind, but then it sees a Dalmatian, which has different kinds of spots, they can still tell it’s a dog and not a cow, even though they’ve seen a cow with those kinds of patterns on their bodies before. So when you do that in machines, what do you actually find out? What have you found out in your testing of the ARC?

Melanie: We found out that machines are very bad at this kind of abstraction. We’ve tested both humans and machines on these problems. And humans tend to be quite good and are able to explain what the rule is they’ve learned and how they apply it to a new task. And machines are not good at figuring out what the rule is or how to apply a rule to a new task.

That’s what we found so far. Why can’t machines do this well? That’s a big question. And what do they need to do it well? That’s another big question that we’re trying to figure out. And there’s a lot of research on this. Obviously, people always love it when there’s a competition and a prize. So there’s a lot of people working on this. But I don’t think the problem has been solved in any general way yet.

Abha: I want to ask about this other workshop you’ve done quite frequently is the understanding workshop, which actually came out of the barriers of meaning. If you could tell a little bit about what the idea of understanding there was, I thought that was fascinating. Could you maybe recount a little bit?

Melanie: Yeah, so, many decades ago, the mathematician John Carlo Rota wrote an essay about AI. This was long before I was even in AI. And he asked: When will AI crash the barrier of meaning? And by that he meant like, we humans, language and visual data and auditory data, mean something to us. We seem to be able to abstract meaning from these inputs.

But his point was that machines don’t have this kind of meaning. They don’t live in the world, they don’t experience the world, and therefore they don’t get the kind of meaning that we get and he thought of this as a barrier, this is their barrier to general intelligence.

So we had a couple of workshops called AI and the barrier of meaning, because I kind of like that phrase, about what it would take for machines to understand, and what “understand” even means. And we heard from many different people in many different kinds of fields. And it turns out the word understand itself is another one of those suitcase words that I mentioned.

Words that can mean many different things to different people in different contexts. And so we’re still trying to nail down exactly what it is we want to mean when we say, do machines understand? And I don’t think we’ve come to any consensus yet, but it certainly seems that there are some features of understanding that are still missing in machines, features that people want machines to have: this idea of abstraction, this idea of being able to predict what’s gonna happen in the world, this idea of being able to explain oneself, explain one’s own thinking processes, and so on.

So understanding is still kind of this ill-defined word that we use to mean many different things and we have to really understand in some sense what we mean by understanding.

Abha: Right. Another question that you asked one of our guests, you posted Tomer and Murray. Some AI researchers are worried about what’s known as the alignment problem, as in, if we have an AI system that is told to, for example, fix global warming, and you have said, what’s to stop it from deciding that humans are the problem and the best solution is to kill us all. What’s your take on this and are you worried?

Melanie: Well, I find it… mysterious when people pose this kind of question, because often the way it’s posed is, imagine you had a super intelligent AI system, one that’s smarter than humans across the board, including in theory of mind and understanding other people and so on. Because it’s super intelligent, you give it some intractable problem like fixed climate change.

And then it says, okay, humans are the source of the problem. Therefore, let’s kill all the humans. Well, this is a popular science fiction trope, right? We’ve seen this in different science fiction movies. But does it even make sense to say that something could be super intelligent across the board and yet try to solve a problem for humans in a way that it knows humans would not support?

So, there’s so much packed into that. There are so many assumptions packed into that, that I really want to question a lot of the assumptions about whether intelligence could work that way. I mean, it’s possible. We’ve certainly seen machines do unintended things. Remember a while ago, there was the stock market flash crash, which was due to allowing machines to do trading and them doing very unintended things.

But the assumption that you could do that with a super intelligent machine, that you would be willing to hand over control of the world and say, go fix climate change, do whatever you want. Here’s all the resources of the world to do it and then have it not have that kind of sort of understanding or… lack of, in some sense, common sense. It really seems strange to me.

So every time I talk about this with people who worry about this, they say things like, well, the machine doesn’t care what we want. It’s just going to try and maximize its reward. And its reward is, does it achieve its goal? And so it will try and create sub goals to achieve its reward. The sub-goal might be, kill all the humans, and it doesn’t care because it’s going to try and achieve its reward in any way possible.

I don’t think that’s how intelligence works or could work. And I guess it’s all speculation right now. And the question is, how likely is that to happen? And should we really put a whole lot of resources into preventing that kind of scenario? Or is that incredibly far-fetched, and should we put our resources into much more concrete and known risks of AI?

And this was a debate going on, for instance, just in California recently with a California Senate bill to regulate AI. And it was very much influenced by this notion of existential threat to humanity. And it was vetoed by the California governor, and one of the reasons was that the assumptions that it was based on, he felt, were too speculative.

Abha: What do you think are the real risks of the way we would function with AI if AI would be flourishing in the world at the pace it is?

Melanie: Well, we’re already seeing all kinds of risks of AI happening right now. We have deep fakes in both visual and auditory modalities. We have voice cloning, AI voices that can convince you that they are actually a real person or even a real person that you personally know. And this has led to scams and spread of disinformation and all kinds of terrible consequences. And I think it’s just gonna get worse.

We’ve also seen that AI can flood the internet with what people are calling slop, which is just AI generated content that then things like Google search engine picks up on and returns as the answer to somebody’s search, even though it was generated by AI and it’s totally untrue. We see AI being used, for instance, to undress women in photographs.

You can take a photograph of a woman, run it through a particular AI system, and she comes out looking naked. And people are using this online. And it’s just lots and lots of current risks. You know, Daniel Dennett, the late philosopher, wrote an article very shortly before he died about the risks of artificial people.

The idea of AI impersonating humans and convincing other humans that it is human, and then people kind of believing it and trusting it and giving it the kind of agency it doesn’t have and shouldn’t have. These are the real risks of AI.

Abha: Is there any way to keep the quality of information at a certain standard, even with AI in the loop?

Melanie: I fear not. I really worry about this. The quality of information, for instance, online never has been great. It’s always been hard to know who to trust. One of the whole purposes of Google in the first place was to have a search algorithm that used methods that allowed us to trust the results.

This was the whole idea of what they called PageRank, trying to rank web pages in terms of how much we should trust their results, how good they were and how trustworthy they were. But that’s really fallen apart through the commercialization of the internet, I think, and also the motivation for spreading disinformation. But I think that it’s getting even worse with AI and I’m not sure how we can fix that, to be honest.
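
For the curious, the core of the PageRank idea Melanie mentions can be sketched in a few lines of Python: a page’s score depends on the scores of the pages linking to it, computed by simple iteration. The link graph and damping value below are illustrative; the production algorithm involves many refinements this sketch ignores.

```python
# Minimal power-iteration sketch of PageRank on a made-up three-page web:
# a page is ranked higher when higher-ranked pages link to it.
links = {
    "A": ["B", "C"],  # page A links to pages B and C
    "B": ["C"],
    "C": ["A"],
}
damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # iterate until the scores settle
    new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share
    rank = new_rank

print({page: round(score, 3) for page, score in rank.items()})
# C comes out slightly ahead of A: both A and B link to it.
```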

Abha: Let’s go back to the idea of intelligence. A lot of people talk about the importance of embodiment. Also, you know, our guests mentioned this to be able to function as intelligent beings in the world because of the input we receive and experiences we have. Why is it important to think of this as a factor?

Melanie: Well, the history of AI has been a history of disembodied intelligence. Even at the very beginning, the idea was that we could somehow sift off intelligence or rationality or any of these things and implement it in a computer. You could upload your intelligence into a computer without having any body or any direct interaction with the world.

So that has gone very far with today’s large language models, which don’t have direct interaction with the world except through conversing with people, and are clearly disembodied. But some people, I guess, including myself, think that there’s only so far that that can go, that there is something unique about being able to actually do things in the world and interact with the real world in a way that we humans do that machines don’t, that forms our intelligence in a very deep way.

Now it’s possible with vast, almost infinite amounts of data, training data and compute power that machines could come close to getting the knowledge that would approximate that, what humans do. And we’re seeing that kind of happening now with these systems that are trained on everything online, everything digitized, and that companies like Microsoft and Google are now building nuclear power plants to power their systems because there’s not enough energy currently to power these systems.

But that’s a crazy, inefficient, and non-sustainable way to get to intelligence, in my opinion. And so I think that if you have to train your system on everything that’s ever been written, and get all the power in the world, and even, like Sam Altman says, have to get to nuclear fusion energy in order to get to sort of human-level intelligence, then you’re just doing it wrong. You’re not achieving intelligence in any way that’s sustainable, and we humans are able to do so much with so little energy compared to these machines that we really should be thinking about different ways to approach intelligence and AI.

And I think that’s what some of our guests have said, that there are other ways to do it. For instance, Alison Gopnik is looking at how to train machines in the way that children learn. And this is what Linda Smith and Mike Frank and others are looking at too: aren’t there better ways to get systems to be able to exhibit intelligent behavior?

Abha: Right. So let’s move on to AGI. There are a lot of mixed opinions out there about what it is and how it could come into being. What in your view is artificial general intelligence?

Melanie: I think the term has always been a bit vague. It was first coined to mean something like human-like intelligence. The idea is that in the very early days of AI, the pioneers of AI like Minsky and McCarthy, their goal was to have something like the AI we see in the movies, robots that can do everything that people do. But then AI became much more focused on particular specific tasks, like driving a car or translating between languages or diagnosing diseases.

These systems could do a particular task, but they weren’t the sort of general purpose robots that we saw in the movies that we really wanted. And that’s what AGI was meant to capture was that vision. So AGI was a movement in AI back in the early 2000s. It had conferences, they had papers and discussions and stuff, but it was kind of a fringe movement. But it’s now come back in a big way because now AGI is at the center of the goals of all of the big AI companies.

But they define it in different ways. For instance, I think DeepMind defines it as a system that could do all what they call cognitive tasks as well as or better than humans. So that notion of a robot that can do everything has now been narrowed into, oh well, we don’t mean all that physical stuff, but only the cognitive stuff, as if those things could be separated. Again, the notion of disembodiment of intelligence.

OpenAI defined it as a system that can do all economically valuable tasks. That’s how they have it on their website, which is kind of a strange notion, because it’s sort of unclear what is and what isn’t an economically valuable task. You might not be getting paid to raise your child, but raising a child seems to be something of economic value eventually. So I don’t know, I think that it’s ill defined, that people have an idea of what they want, but it’s not clear what exactly the target is or how we’ll know when we get there.

Abha: So do you think we will ever get to the point of AGI in that definition of the ability to do general things?

Melanie: In some sense, we already have machines that can do some degree of general things. You know, ChatGPT can write poetry, it can write essays, it can solve math problems, it can do lots of different things. It can’t do them all perfectly for sure.

And it’s not necessarily trustworthy or robust, but it certainly is in some sense more general than anything we’ve seen before. But I wouldn’t call it AGI. I think the problem is, you know, AGI is one of those things that might get defined into existence, if you will. That is, the definition of it will keep changing until, okay, we have AGI. Sort of like now we have self-driving cars.

Of course, they can’t drive everywhere and in every condition. And if they do run into problems, we have people who can operate them remotely to get them out of trouble. Do we want to call that autonomous driving? To some extent, yeah. To some extent, no. But I think the same thing is happening with AI, that we’re going to keep redefining what we mean by this. And finally, it’ll be there just because we defined it into existence.

Abha: Going back to the Nobel Prize in physics, physics has a theoretical component that proposes different theories and hypotheses that groups of experimentalists then go and try to see if it’s true or, if they can try it out and see what happens. In AI so far, the tech industry seems to be hurtling ahead without any theoretical component to it necessarily. How do you think academia and industry could work together?

Melanie: There’s a lot of people trying to do what you say, trying to kind of come up with a more theoretical understanding of AI and of intelligence more generally. It’s difficult because the term intelligence, as I said, isn’t that rigorously defined. I think academia and industry are working together especially in the field of applying AI systems to scientific problems.

But one problem is that it’s going much more in the big data direction than in the theoretical direction. So we talked about AlphaFold, which basically won the chemistry prize. AlphaFold is a big data system. It learns from huge amounts of data about proteins and the evolutionary histories of different proteins and similarity between proteins. And nobody can look at AlphaFold’s results and explain exactly how it got there or reduce it to some kind of theory about protein folding and why certain proteins fold the way they do.

So it’s kind of a black box, big data method to do science. And I fear in a way that that’s the way a lot of science is going to go. That some of the problems that we have in science are going to be solved, not because we have a deep theoretical understanding, but more because we throw lots and lots of data at these systems and they are able to do prediction, but aren’t able to do explanation in any way that would be sort of theoretically useful for human understanding.

So maybe we’ll lose that quality of science that is human understanding in favor of just big data prediction.

Abha: That sounds incredibly tragic.

Melanie: Well, maybe the next generation won’t care so much. If you could cure cancer, let’s say, as we’ve been promised by people like Sam Altman that AI is going to do. Do we need to understand why these things work? You know, some kind of magic medicine for curing cancer? Do we need to understand why it works? Well, I don’t know. Lots of medications, we don’t totally understand how they work. So that may be something lost to AI: the human understanding of nature.

Abha: Right. Ted Chiang wrote an article in the New Yorker, which I think you must have read, about the pursuit of art and what art is and how AI approaches it versus how we approach it. And even though art does not have the same kind of impact as curing cancer would, it does have a purpose in our human existence.

And to have AI take that away, you must have seen the memes coming out about these things, that one had expected artificial intelligence to take care of the housework, but it’s gone and taken away our creative work instead.

How do you look at that? Does that mean that as humans, we continue trying to pursue these artistic endeavors of understanding, or understanding more deeply things that we feel have meaning for our lives, or do we just give that over to AI?

Melanie: That sounds even more tragic to me than giving science over to AI. Ted Chiang wrote that he didn’t think AI generated art was really art because to make art, he said you need to be able to make choices and AI systems don’t really make choices in the human-like sense.

Well, that’s gotten a lot of pushback, as you would imagine. People don’t buy it. I don’t think that art will be taken over by AI, at least not any time soon, because a big part of art is the artist being able to judge what it is that they created and decide whether it’s good or not, decide whether it conveys the meaning that they want it to convey. And I don’t think AI can do that.

And I don’t think it will be able to do that anytime soon, maybe in the very far future. It may be that AI will be something that artists use as a tool. I think that’s very likely already true. Now, one big issue about AI art is that it works by having been trained on huge amounts of human-generated art. And unfortunately, the training data mostly came without permission from the artists. And the artists didn’t get paid for having their artwork being used as training data. They’re still not getting paid.

And I think that’s a moral issue that we really have to consider when thinking about using AI as a tool. To what extent are we willing to have it be trained on human generated content without the permission of the humans who generated the content and without them getting any benefit?

Abha: Right. I think with your own book, something was done by AI, right?

Melanie: Yeah, my book, which is called Artificial Intelligence: A Guide for Thinking Humans. Well, like with many books, someone used an AI system to generate a book with the same title that really was pretty terrible, but was for sale on Amazon.

Abha: So if you’re looking to buy that book, make sure you get the correct one.

Melanie: I put in a message to Amazon saying, please take this off. It’s, you know, plagiarized. And nothing happened until I got interviewed by a reporter from Wired Magazine about it. And then Amazon deleted that other book. But this is a broad problem.

We’re getting more and more AI generated books for sale that either have content related to an actual human-generated book, or whatever other content. When you buy a book, you don’t know whether it was generated by AI. And often these books are quite bad. And so this is part of the so-called slop from AI that’s just sort of littering all of our digital spaces.

Abha: Littering is a good word for this phenomenon, I think. I want to go into the idea of complexity science and AI research. You’ve also written a book on complexity science, and you’ve had a long history with the Santa Fe Institute. You’ve been with us for many years now in different capacities. Why do you think AI is a complex system? And what keeps you in the complexity realm with this research?

Melanie: Well, I think AI, at many different levels and dimensions, involves complex systems. One is just the systems themselves. Something like ChatGPT is a big neural network that is very complex, and we don’t understand how it works. People claim that it has so-called emergent behavior, which is a buzzword in complex systems.

And it’s something that complex systems people who think about large networks and large systems with emergent behavior might be able to offer some insight on. The first notion of emergence came from physics, and now AI is part of physics, it’s won a Nobel Prize.

So I think these things are all tied up together. But also another dimension is sort of the interaction of AI and society. And clearly that’s a socio-technological complex system of the kind that many people here at the SFI are interested in studying.

So I think there’s many ways in which AI relates to complex systems research. I think SFI in particular is a great place for people to take this slower approach to thinking about these complex problems rather than the more quick incremental improvements that we see in the machine learning literature without very much deep thinking about how it all works and what it all means. So that’s what I’m hoping that SFI will be able to contribute to this whole discussion.

And my colleague David Krakauer here at the SFI and I wrote a paper about the notion of understanding in AI that I think has been influential because it really laid out the complexities of the topic. I do think that we people in complex systems have a lot to contribute to this field.

Abha: So Melanie, we’ve talked about, you know, AI as a complex adaptive system. We’ve talked about AGI, the possibility and where we stand. Where do you think the research will lead us, eventually, say in another 10 years, having seen the progress we’ve made in the last 10 years?

Melanie: I think that one of the big things I mentioned is that the current approach to AI is just not sustainable in terms of the amount of data it requires, the amount of energy it requires. And what we’ll see in the next 10 years is ways to try and reduce the amount of data needed and reduce the amount of energy needed.

And that I think will take some ideas from the way people learn or the way animals learn. And it may even require AI systems to get more embodied. So that might be an important direction that AI takes, I think, in the next decade so that we can reduce this ridiculous dependence on so much data, so much energy, and make it a lot more sustainable and ecologically friendly.

Abha: Great. Thank you so much, Melanie. This has been wonderful as a season and to have you as a co-host was such a privilege. I’ve really enjoyed working with you and I hope we continue to discuss this over time. Maybe we’ll have another season back when you and John have finished your workshop that’s going to happen for the next three years.

Melanie: Yeah, that would be great. It’s been an incredible experience doing a podcast. I never thought I would do this, but it’s been fantastic and I’ve loved working with you. So thanks, Abha.

Abha: Likewise. Thank you, Melanie.

Complexity is the official podcast of the Santa Fe Institute. This episode was produced by Katherine Moncure. Our theme song is by Mitch Mignano, and additional music from Blue Dot Sessions. I’m Abha, thanks for listening.


Nature of Intelligence – Episode Five – How do we assess intelligence?

I don’t know about you, but my brain is starting to hurt, though in a good way. What seems clear to me was summed up when Abha Eli Phoboo reminded us in this episode that “we don’t fully understand human intelligence or animal intelligence.”

And there’s much discussion regarding how we’re trying to evaluate machines — and associated LLMs — based on measurements that we use on humans. It may feel ridiculous on one level, but at the moment humans can only understand the world through the lens of being human.

We use medicines all the time that we don’t understand the mechanisms that they work on. And that’s true. And I don’t think we cannot deploy LLMs until we understand how they work under the hood. ~ Ellie Pavlick

But is understanding what LLMs are, or how they operate, all that important? As Ellie Pavlick reminds us, there’s much about the world we don’t fully understand. We just know whether something works or not.

But I found the discussion comparing humans to animals to be just as fascinating. Even if you don’t own a pet, I’m sure you’ve been around a number of animals at various times in your life. Did they seem “intelligent”, in one way or another? Did you feel they possessed a personality? I have a friend who’s owned horses most of her life, and when I hear her talking to folks at the stables, they describe each horse as though they were human. Will we describe LLM personas in the same way some day?

Transcript

Abha Eli Phoboo: The voices you’ll hear were recorded remotely across different countries, cities and work spaces.

Erica Cartmill: I often think that humans are very egotistical as a species, right? So we’re very good at particular things and we tend to place more value on the things that we’re good at.

Abha: From the Santa Fe Institute, this is Complexity

Melanie Mitchell: I’m Melanie Mitchell

Abha: And I’m Abha Eli Phoboo

Melanie: As we enter our fifth episode of this season on intelligence, we’ve explored quite a few complicated and controversial ideas. But one thing has become really clear: intelligence is a murky concept. And that’s the point of this series — it’s something that we think we know when we see it, but when we break it down, it’s difficult to define rigorously.

Abha: Today’s episode is about how we assess intelligence. When it comes to testing humans, we have all kinds of standardized measures: IQ tests, the SAT, and so on. But these tests are far from perfect, and they’ve even been criticized as limited and discriminatory.

Melanie: To understand where our desire to test intelligence comes from — and also the way we talk about it as an inherent personality trait — it’s useful to look at the history of intelligence in Western society. In ancient Greece, the concept was described as “reason” or “rationality,” which then evolved into “intelligence” more broadly when the discipline of psychology arose. Philosophers like Socrates, Plato, and Aristotle highly valued one’s ability to think. And at first glance, that seems like a noble perspective.

Abha: But Aristotle took this a step further. He used the quote unquote “rational element,” as justification for a social hierarchy. He placed European, educated men at the top, and women, other races, and animals below them.

Melanie: Other Western philosophers like Descartes and Kant embraced this hierarchy too, and they even placed a moral value on intelligence. By claiming that a person or an animal wasn’t intelligent, it became morally acceptable to subjugate them. And we know how the rest of that European expansion story goes.

Abha: So today’s notions about intelligence can be traced in part to the ways men distinguished themselves from… non-men.

Melanie: Or, to give the philosophers a more generous interpretation, the history of thought around intelligence centers on the idea that it is a fundamentally human quality.

Abha: So if intelligence, in theory, stems from humanity, how do we decide the degree to which other entities, like animals and large language models, are intelligent? Can we rely on observations of their behavior? Or do we need to understand what’s going on under the hood — inside their brains or software circuits?

Melanie: One scientist trying to tackle such questions is Erica Cartmill.

Erica: So my name is Erica Cartmill. I’m a professor of cognitive science, animal behavior, anthropology, and psychology at Indiana University. You know, I really study cognition, particularly social cognition, and the kinds of cognition that allow communication to happen across a wide range of species.

Abha: Erica has extensive experience observing intelligent behavior in beings that are very different from humans.

Erica: So I got the animal bug when I was a kid. And we had a whole range of different kinds of animals. It’s sort of a menagerie. We had horses, we had dogs, we had a turtle, we had a parrot. And I was always out watching lizards and butterflies and birds, mice in our barn. And sometimes I would catch a lizard, put it in a terrarium for two days, observe it, let it go again.

And that kind of wanting to observe the natural world and then have an opportunity to more closely observe it, under you might say controlled circumstances, even as a child, and then release it back into its natural environment is really something that I’ve continued to do as an adult in my scientific career. And that’s what I do mostly with my lab now, kind of split between studying great apes and human children.

But I’ve done work on a range of other species as well, Darwin’s finches in the Galapagos. I’m doing a project now that also includes dolphins and dogs and kea, which is a New Zealand parrot. And I’m starting a dog lab at IU. So I’m excited about some of those other species, but I would say the core of my work really focuses on comparing the cognitive and communicative abilities of great apes and humans.

Melanie: Much of Erica’s research has been on the evolution of language and communication. As we’ve said before, complex language is unique to our species. But other animals communicate in many ways, so researchers have been trying to narrow down what exactly makes our language so distinct.

Erica: So I think humans have always been really focused on this question of what separates us from other species. And for a long time, answers to that question centered around language as the defining boundary. And a lot of those arguments about language really focused on the structural features of language.

And if you look at sort of the history of these arguments, you would see that every time a linguist proposed a feature of language that say, human language is different because X, then people would go out and study animals and they would say, “Well, starlings have that particular feature” or, “A particular species of monkey has that feature.” And then linguists would sort of regroup and say, “Okay, well, actually this other feature is the real dividing line.”

And I think probably the boring answer or interesting answer, depending on how you look at it, is that there probably isn’t one feature. It’s the unique constellation of features combined with a constellation of cognitive abilities that make language different and make it so powerful. But I will say in recent years, the focus of these arguments about “language is unique because” has shifted from language is unique because of some particular structural feature to language is unique because it is built on a very rich social understanding of other minds.

It’s built on inferences about others’ goals, about what others know and don’t know. It’s built on what we call pragmatics in linguistics. So actually it’s very unlike a structured program that you can sort of apply and run anywhere. It’s actually something that relies on rich inferences about others’ intentions.

Melanie: When we humans communicate, we’re often trying to convey our own internal thoughts and feelings, or we’re making inferences about someone else’s internal state. We naturally connect external behavior with internal processes. But when it comes to other beings, our ability to make judgments about intelligence isn’t as straightforward.

Abha: So today we’re going to first look at what we can learn from external behavior and applying human notions of intelligence to animals and machines, which can pass tests at levels that are deceptively similar to humans.

Abha: Part 1: Assessing Intelligence in Humans, Animals, and Machines

Abha: If you have a pet at home, you’ve probably had moments when you’ve wanted to know what it’s trying to say when it barks, meows, or squawks. We anthropomorphize pets all the time, and one of the ways we do that is by envisioning them saying things like, “I’m hungry!” or “I want to go outside!” Or we might wonder what they say to each other.

Melanie: Animals most definitely communicate with one another. But there’s been a lot of debate about how sophisticated their communications are. Does a chimp’s hoot or a bird’s squawk always mean the same thing? Or are these signals flexible, like human words, communicating different meanings depending on context, including the animal’s understanding of the state of its listeners’ minds? In her work, Erica has critiqued the assumptions people often make in experiments testing animal communication.

She’s noted that the methods used won’t necessarily reveal the possible meaning of both vocal and other kinds of signals, especially if those meanings depend on particular contexts.

Erica: Authors recently, ranging from cognitive scientists to philosophers to linguists have argued that human communication is unique because it relies on these very rich psychological properties that underlie it. But this in turn has now led to new arguments about the dividing line between humans and other animals.

Which is that animals use communication that is very code-like, that one animal will produce a signal and another animal will hear that signal or see that signal and decode its meaning. And that it doesn’t rely on inferences about another’s intentions or goals, that the signals can be read into and out of the system. If you record, say, an auditory signal, like a bird call, and then you hide a speaker in a tree, and you play that call back, and you see how other birds respond. So this is called the playback method, unsurprisingly.

And that’s been one of the strongest things in the toolkit that animal communication researchers have to demonstrate that those calls in fact have particular meanings. That they’re not just, I’m singing because it’s beautiful, but that this call means go away and this other call means come and mate with me, and this other call means there’s food around, et cetera, et cetera.

And so decontextualizing those signals and then presenting them back to members of the species to see how they respond is the dominant method by which scientists demonstrate that a call has a particular meaning. That’s been incredibly important in arguing that animals really are communicating things. But that method, and the underlying model that is used to design experiments to ask questions about animal communication, is also very limiting.

Abha: An auditory signal taken out of context, whether a word or an animal call, is a very narrow slice of all the different ways animals — and humans — communicate with each other.

Erica: So it’s very good at demonstrating one thing, but it also closes off doors about the kinds of inferences that animals might be making. If Larry makes this call and I’m friends with Larry, versus Bob makes that call and I’m enemies with Bob, how do I respond? Does Bob know that I’m there? Can he see me? Is he making that call because I am there and he sees me and he’s directing that call to me? Versus, is he making that call to someone else and I’m eavesdropping on it?

Those are kinds of inferences that animals can make. I’m not saying all animals in all cases, but the ways that we ask questions about animal communication afford certain kinds of answers.

And we need, I think, to be more, I don’t know if humble is the right word, but we need to recognize the ways in which they limit the conclusions that we can draw, because this is very different from the way that we ask questions about human language.

And so when we draw conclusions about the difference between human language and animal communication based on the results of studies that are set up to ask fundamentally different questions, I think that leaves a lot to be desired.

Abha: And focusing on abilities that are relevant to humans’ intelligence might mislead us in how we think about animal intelligence.

Erica: I often think that humans are very egotistical as a species, right? So we’re very good at particular things and we tend to place more value on the things that we’re good at. And I think that in many cases, that’s fine, that’s one of our unique quirks as a species. But it also often limits the way that we ask questions and attribute kinds of intelligence to other species.

So it can be quite difficult, I think, for humans to think outside of the things that we’re good at or indeed outside of our own senses. I mean, sort of five senses, biological senses. So elephants… we’ve known for a long time that elephants are able to converge at a particular location, show up, far away at this tree on this day at this time from different starting points. And people really didn’t know how they were doing it.

They were starting too far apart to be able to hear one another. People were asking, are they planning? Do they have the sense of, two Tuesdays from now we’re going to meet at the watering hole? And it wasn’t until people said maybe they’re using senses that fall outside of our own perceptual abilities. In particular, they measured very, very low frequencies and basically asked, okay, maybe they’re vocalizing in a way that we can’t perceive, right?

And so once they did that and greatly lowered the frequency of their recording equipment, they found that elephants were in fact vocalizing at very, very long distances, but they were doing it through this rumble vocalization that actually propagates through the ground rather than through the air.

And so they produce these, I can’t imitate it because you wouldn’t hear it even if I could, but they produce these very low rumbles that other elephants, kilometers away, perceive not through their ears but they perceive through specialized cells in the pads of their feet, where they can feel the vibrations.

And so I think this is a nice example of the way that we have to, in effect, not even necessarily think like an elephant, but imagine hearing like an elephant, having a body like an elephant, thinking, I like to call it thinking outside the human.

Humans are good at particular things, we have particular kinds of bodies, we perceive things on particular time scales, we perceive things at particular light wavelengths and auditory frequencies. Let’s set those aside for a second and think about, okay, what did that species evolve to do? What do its perceptual systems allow it to perceive? And then try to ask questions that are better tailored to the species that we’re looking at.

Melanie: There’s been a lot of work throughout the many decades on trying to teach human language to other species like chimps or bonobos or African gray parrots. And there’s been so much controversy over what they have learned. What’s the current thinking on the language abilities of these other species and those experiments in general?

Erica: It’s almost hard to answer the question with the current thinking, because there’s very little current research. A lot of that research was done 20 or even 40 years ago. Compared to the work that was being done 30 years ago, there’s very little current work with apes and parrots and dolphins, all species that people were trying to teach human language back then.

And I think it was a really interesting area of inquiry. I would say people differ a little bit, but I think that probably the sort of most dominant opinion or maybe the discussion is best characterized by saying that people today, I think, largely believe that those animals were able to learn, understand, and productively use words, but that they were limited in the scope of the words they could learn, and that they weren’t combining them into productive sentences.

And this was part of the argument that syntax, the combining of words according to particular rules, was something that human language did that was very different from what animals could produce. And so I think with the animal language studies that were showing largely that animals could learn words, they could produce words, sometimes produce words together, but they weren’t doing it in reliable sentence-like structures.

Melanie: But do you think that the fact that we were trying to teach them human language in order to assess their cognitive abilities was a good approach to understanding animal cognition or should we more do what you said before, sort of take their point of view, try to understand what it’s like to be them rather than train them to be more like us?

Erica: I think that’s a great question. My answer probably hinges around the limitations of human imagination. I think that teaching animals to communicate on our terms allows us to ask better questions and better interpret their answers than us trying to fully understand their communication systems. People certainly are using things like machine learning to try to quote unquote “decode” whale song or bird song, and those approaches are more sort of on the animals’ terms, using their natural communication.

And I think that those are very interesting approaches. I think they’ll be good at finding patterns in what animals are producing. The question I think still remains whether animals themselves are perceiving those patterns and are using them in ways that have meaning to them.

Abha: And the way we’ve tried to assess intelligence in today’s AI systems also hinges around the limitations of human imagination, perhaps even more so than animals, given that by default, LLMs speak our language. We’re still figuring out how to evaluate them.

Ellie Pavlick: Yeah, I mean, I would say they’re evaluated very… I would say badly.

Abha: This is Ellie Pavlick. Ellie’s an assistant professor of computer science and linguistics at Brown University. Ellie has done a lot of work on trying to understand the capabilities of large language models.

Ellie: They’re evaluated right now using the things that we can conveniently evaluate, right? It is very much a, what can we measure? And that’s what we will measure. There’s a lot of repurposing of existing kind of evaluations that we use for humans. So things like the SAT or the MCAT or something like that.

And so it’s not that those are completely uncorrelated with the things we care about, but they’re not very deep or thoughtful diagnostics. Things like an IQ test or an SAT have long histories of problems for evaluating intelligence in humans. But they also just weren’t designed with models of this type being the subjects.

I think what it means when a person passes the MCAT or scores well on the SAT is not the same thing as what it might mean when a neural network does that. We don’t really know what it means when a neural network does it, and that’s part of the problem.

Melanie: So why do you think it’s not the same thing? I mean, what’s the difference between humans passing a bar exam and a large language model?

Ellie: Yeah, I mean, that’s a pretty deep question, right? So I would say, compared to a lot of my peers, I’m not as quick to say the language models are obviously not doing what humans do, right?

I tend to reserve some space for the fact that they might actually be more human-like than we want to admit. A lot of times processes that people might be using to pass these exams might not be as deep as we like to think. So when a person, say, scores well on the SAT, we might like to think that there’s some more general mathematical reasoning abilities and some general verbal reasoning abilities. And then that’s going to be predictive of their ability to do well in other types of tasks. That’s why it’s useful for college admission.

But we know in practice that humans often are just learning how to take an SAT, right? And I think we very much would think that these large language models are mostly learning how to take an SAT.

Melanie: So just to clarify, when you say, I mean, I know what it means when a human is learning how to pass a test, but how does a language model learn how to pass a test?

Ellie: Yeah, so we can imagine this simple setting. I think people are better at thinking about, let’s pretend we just trained the language model on lots of examples of SATs. They’re going to learn certain types of associations that are not perfect, but very reliable.

And I always had this joke with my husband, from when we were in college, about how you could pass a multiple choice test without having ever taken the subject. And we would occasionally try to pass his qualifying exams in med school. I think he took an econ exam with me. So there’s certain things like, whenever there’s something like “all of the above” or “none of the above,” that’s more likely to be the right answer than not, because it’s not always there. So it’s only there when that’s the right thing.

Or it’s a good way for the professor to test that you know all three of these things efficiently. Similarly, when you see answers like “always” or “never” in them, those are almost always wrong because they’re trying to test whether you know some nuanced thing.

Then there’s some, and none of these is perfect, but you can get increasingly sophisticated kinds of heuristics and things based on the words: this one seems more or less related, this seems kind of topically off base, whatever. So you can imagine there’s patterns that you can pick up on. And if you stitch many, many of them together, you can pretty quickly get to possibly perfect performance, with enough of them.

So I think that’s a kind of common feeling about how language models could get away with looking like they know a lot more than they do by kind of stitching together a very large number of these kinds of heuristics.
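
To make the stitched-heuristics idea concrete, here is a minimal, purely illustrative sketch in Python. The rules, weights, and the sample question below are invented for this example; they are not drawn from Ellie’s research or from any real benchmark.

```python
# A deliberately simple sketch of "stitching heuristics together" to answer a
# multiple-choice question with shallow cues instead of subject knowledge.
# The cue list and weights are made up purely for illustration.
def heuristic_score(question: str, option: str) -> float:
    q_words = set(question.lower().split())
    o_lower = option.lower()
    score = 0.0
    if "all of the above" in o_lower or "none of the above" in o_lower:
        score += 1.0          # such options tend to show up when they are correct
    if "always" in o_lower or "never" in o_lower:
        score -= 1.0          # absolute wording is often a distractor
    score += 0.1 * len(q_words & set(o_lower.split()))   # topical word overlap
    return score

question = "Which factors shift a demand curve?"
options = [
    "Prices never affect demand",
    "Income, tastes, and the prices of related goods",
    "All of the above",
]
best = max(options, key=lambda o: heuristic_score(question, o))
print(best)   # an answer chosen without any understanding of economics
```

Each cue on its own is weak, but Ellie’s point is that enough of them stacked together can look like competence on a test, which is why a test score alone does not settle what a model knows.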

Abha: Would it help if we knew what was going on under the hood with LLMs? We don’t really know a whole lot about our brains either, and we don’t know anything about LLMs, but would it help in any way if we could sort of look under the hood?

Ellie: I mean, that’s where I’m placing my bets. Yeah.

Melanie: In Part 2, we’ll look at how researchers are actually looking under the hood. And many of them are trying to understand LLMs in a way that’s analogous to how neuroscientists understand the brain.

Melanie: Part 2: Going Under the Hood

Abha: Okay, so wait a minute. If we’re talking about mechanistic understanding in animals or humans — that is, understanding the brain circuits that give rise to behavior — it makes sense that it’s something we need to discover. It’s not obvious to us, in the same way that it’s not obvious how a car works if you just look at the outside of it.

But we do know how cars work under the hood because they’re human inventions. And we’ve spent a lot of this season talking about how to learn more about artificial intelligence systems and understand what they’re doing. It’s a given that they’re so-called “black boxes.”

But… we made AI. Human programmers created large language models. Why don’t we have a mechanistic understanding? Why is it a mystery? We asked Ellie what she thought.

Ellie: The program that people wrote was written to train the model; it’s not the model itself, right? So the model itself is this series of linear algebraic equations. Nobody sat down and wrote, “Okay, in the 118th cell of the 5,000th matrix, there’ll be a point zero two,” right? Instead there’s a lot of mathematical theory that says, why is this the right function to optimize? And how do we write the code? And how do we parallelize it across machines?

There’s a ton of technical and mathematical knowledge that goes into this. There’s all of these other variables that factor in, they’re very much part of this process, but we don’t know how they map out in this particular thing. You kind of set up some rules and constraints to guide a system, but the system itself is on its own. So if you’re routing a crowd through a city or something for a parade, right?

And now you come afterward and you’re trying to figure out why there’s a particular cup on the ground in a particular orientation or something. But you set up, you knew where the people were going to go. But there’s all of this other stuff that, it’s constrained by what you set up, but that’s not all that there is. There’s many different ways to meet those constraints.

And some of them will have some behavioral effects and others will have others, right? There’s a world where everyone followed your rules and there wasn’t a cup there. And there’s a world where those cars crashed or didn’t crash, and all of those other things are subject to other processes. So it’s kind of an under specified problem, right, that was written down. And there are many ways to fill in the details, and we don’t know why we got this one that we got.

Melanie: So when we’re assessing LLMs, it’s not quite the same as humans because we don’t know what happens between the constraints we set up and, for example, ChatGPT’s SAT score at the end.

And we don’t always know how individual people are passing the SAT either — how much someone’s score reflects their underlying reasoning abilities versus how much it reflects their ability to sort of “game” the test. But at the very least, when we see an SAT score on a college application, we do know that behind that SAT score, there’s a human being.

Ellie: We can take for granted that we all have a human brain. It’s true. We have no idea how it works, but it is a known entity because we’ve evolved dealing with humans. You live a whole life dealing with humans. So when you pick somebody to come to your university, or you hire someone for a job, it’s not just a thing that passes the SAT, it’s a human that passes the SAT, right?

That is one relevant feature. Presumably the more relevant feature is that it’s a human. And so with that comes a lot of inferences you can make about what humans who pass the SAT or score a certain score probably also have the ability to do, right? It’s a completely different ball game when you’re talking about somebody who’s not a human, because that’s just not what we’re used to working with.

And so it’s true, we don’t know how the brain works, but now you’re in the reality of having another thing that’s scoring well, and you have no idea how it works. To me, the only way to start to chip away at that is we need to ask if they’re similar at a mechanistic level. Asking whether a score on the SAT means the same thing when an LLM achieves it as when a human does is 100% dependent on how it got there.

Abha: Now, when it comes to assessing artificial intelligence, there’s another question here: How much do we need to understand how it works, or how intelligent it is, before we use it? As we’ve established, we don’t fully understand human intelligence or animal intelligence — people debate how effective the SAT is for us — but we still use it all the time, and the students who take it go on to attend universities and have careers.

Ellie: We use medicines all the time that we don’t understand the mechanisms that they work on. And that’s true. And I don’t think we cannot deploy LLMs until we understand how they work under the hood. But if we’re interested in these questions of, “Is it intelligent?” Just the fact that we care about that question. Answering that question probably isn’t relevant for whether or not you can deploy it in some particular use case.

If you have a startup for LLMs to handle customer service complaints, it’s not really important whether the LLM is intelligent. You just care whether it can do this thing, right? But if you want to ask that question, we’re opening up this very big can of worms, and we can’t ask the big questions and then not be willing to do the big work, right?

Melanie: And answering the question of mechanistic understanding is really big work. As in other areas of science, you have to decide what level of understanding you’re actually aiming for.

Ellie: Right, I mean, this kind of idea of levels of description has existed in cognitive science. I think cognitive scientists talk about it a lot, which is kind of what is the right language for describing a phenomenon? And sometimes you can have simultaneous consistent accounts, and they really should be consistent with one another, but it doesn’t make sense to answer certain types of questions at certain levels.

And so I think a favorite example in cognitive science is quantum physics versus classical mechanics, right? It would be really cumbersome and bizarre and highly unintuitive, and we can’t really do it, to take rolling one billiard ball into another and try to describe it at the level of quantum mechanics. It would be an absurd thing to do, and you would be missing a really important part of how physics works.

And there’s a lot of debate about whether you could explain the billiard ball in terms of quantum mechanics. But the point is there are laws at the lower level that tell you that the ball will exist. And once you know that the ball is there, it makes sense to explain things in terms of the ball, because the ball has the causal force in this thing, not the individual things that make up the ball.

But you would want to have the rules that combine the small things together in order to get you to the ball. And then when you know that the ball is there, then you can just talk in terms of the ball and you don’t have to appeal to the lower level things. And sometimes it just makes more sense to talk about the ball and not talk about the lower level things.

And I think the feeling is we’re looking for those balls within the LLM, so that you can explain why the language model answered this way on this prompt, but then suddenly got the answer wrong when you changed the period to have a space before it.

That’s because it’s thinking in terms of these balls, right? And if we’re trying to understand it at the level of these low level things, it just seems random. If you’re missing the key causal thing, it just seems random. It could be that there is no key causal thing, right? That’s kind of part of the problem. I’m thinking there is, and if we find it, this will be so cool, and the common, legitimate point of skepticism is there might just not be one, right?

Abha: So we’re trying to find the shape and size of these “billiard balls” in LLMs. But as Ellie said, whether or not the billiard balls even exist is not certain. We’re assuming and hoping that they’re there and then going in and looking for them.

Melanie: And if we were to think about how these levels apply to humans, one way we try to gain mechanistic understanding of human intelligence is by looking inside our brains.

If you think back to Ev Fedorenko’s work from our episode about language, Ev’s use of fMRI brain scanning is exactly this — she’s looked at the pathways in the brain that light up when we use language. But imagine if we were to try to go even further and describe human language in terms of the protons, electrons, and neutrons within our brain cells. If you go down to that level of detail, you lose the order that you can see in the larger brain structures. It’s not coherent.

Abha: LLMs work by performing vast numbers of matrix multiplications — at the granular, detailed level, it’s all math. And we could look at those matrix operations, in the same way we can observe the quantum mechanics of billiard balls. And they’ll probably show us that something’s happening, but not necessarily what we’re looking for.
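
To make the “it’s all math” point concrete, here is a minimal, purely illustrative sketch of a single self-attention step written as plain matrix multiplications. The sizes and weights are random; this is not any real model, just the kind of low-level arithmetic being described.

```python
import numpy as np

# A toy self-attention step: at this level of description an LLM is
# nothing but arrays of numbers flowing through matrix multiplications.
rng = np.random.default_rng(0)
d_model, seq_len = 8, 4                     # embedding size, tokens in the prompt
x = rng.normal(size=(seq_len, d_model))     # stand-in token embeddings

W_q = rng.normal(size=(d_model, d_model))   # in a real model these are learned
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v         # three matrix multiplications
scores = Q @ K.T / np.sqrt(d_model)         # another one
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
out = attn @ V                              # and one more

print(out.shape)  # (4, 8): just numbers, with no obvious "concepts" to point at
```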

Ellie: And maybe part of when we’re very frustrated with large language models and they seem like quote “black boxes” is because that’s kind of what we’re trying to do, right? We’re trying to describe these higher level behaviors in terms of the matrix multiplications that implement them, which obviously they are implemented by matrix multiplications, but it doesn’t correspond to anything that looks like anything that we can grab onto.

So I think there’s this kind of higher level description that we all want. It’s useful for understanding the model for its own sake. It’s also really useful for these questions about similarity to humans, right? Because humans aren’t gonna have those exact same matrix multiplications. And so it’s kind of like, what are the higher level abstractions that are being represented? How are they being operated on?

And that’s where the similarity is likely to exist. It’s like we kind of need to invent fMRIs and EEGs, and we’ve got to figure out how to do that. And I think there are some things that exist. They’re good enough to start chipping away, and we’re starting to get some interesting converging results, but they’re definitely not the last word on it.

So I would say one of the most popular tools that we use a lot that I think was really invented maybe back around 2019, 2020 or something is called path patching, but that paper I think called it causal mediation analysis. I think there are a lot of papers that kind of have simultaneously introduced and perfected this technique.

But it basically is saying try to find which components in the model are, like, maximally contributing to the choice of predicting A over B. So that’s been a really popular technique. There have been a lot of papers that have used it, and it has produced very reproducible types of results.

And what you basically get is some kind of an fMRI: it lights up parts of the network, saying these ones are highly active in this decision, these ones are less active.
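
Here is a toy illustration of the idea Ellie is describing, a rough sketch rather than the actual path-patching code used on transformer models. A tiny hand-built network stands in for the LLM: we cache activations from a “clean” run, patch them one unit at a time into a “corrupted” run, and measure how much each unit shifts the preference for answer A over answer B.

```python
import numpy as np

# Toy sketch of activation patching (the idea behind path patching /
# causal mediation analysis), not any library's real API.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))   # input -> six hidden "components"
W2 = rng.normal(size=(6, 2))   # hidden -> logits for answers [A, B]

def forward(x, patch=None):
    """Run the toy net; optionally overwrite one hidden unit with a cached value."""
    h = np.tanh(x @ W1)
    if patch is not None:
        i, value = patch
        h = h.copy()
        h[i] = value
    return h @ W2, h

clean = np.array([1.0, 0.0, 1.0, 0.0])     # the "clean" input
corrupt = np.array([0.0, 1.0, 0.0, 1.0])   # a minimally changed, "corrupted" input

_, clean_h = forward(clean)
corrupt_logits, _ = forward(corrupt)

def logit_diff(logits):
    return logits[0] - logits[1]           # preference for answer A over B

# Patch each hidden unit's clean activation into the corrupted run and see
# how much it shifts the A-vs-B preference toward the clean run's behavior.
for i in range(clean_h.shape[0]):
    patched_logits, _ = forward(corrupt, patch=(i, clean_h[i]))
    effect = logit_diff(patched_logits) - logit_diff(corrupt_logits)
    print(f"hidden unit {i}: shifts A-vs-B preference by {effect:+.3f}")
```

The units whose patched activations do most to restore the clean behavior are the ones “lighting up” in the fMRI sense Ellie mentions.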

Abha: So then, how do we get from path patching — this fMRI for large language models — to higher-level concepts like understanding, intentions, and intelligence?

We often wonder if LLMs “understand,” but what it means to “understand” something can depend on how you define it.

Melanie: Let me jump up from the matrix multiplication discussion to the highest philosophical level. So there was a paper in 2022 that was a survey of the natural language processing community.

And it asked people to agree or disagree with the following statement: “Some generative models trained only on text, given enough data and computational resources, could understand natural language in some non-trivial sense.” So this is in principle, trained only on language. So would you agree or disagree with that?

Ellie: I would say maybe I would agree. To me, it feels almost trivial because I think what’s nice about this question is it doesn’t treat understanding as a binary. And I think that’s the first place where I usually start when people ask this question. To me, a lot of the debate we’re having right now is not about large language models, it’s about distributional semantics, and it’s whether we thought distributional semantics could go this far.

Melanie: Can you explain what distributional semantics is?

Ellie: Yeah. You know, natural language processing has just been using text, and so using this idea that the words that occur before and after a word are a really good signal of its meaning. And so if you get a lot of text and you cluster things based on the words they co-occur with, cat and dog, or maybe dog and puppy and Dalmatian, will all occur together. Cat and dog and bird and other pets will co-occur together. Zebra and elephant, those will co-occur together.

And as you get bigger models and more text, the structure becomes more sophisticated. So you can cut similarity along lots of different dimensions. It’s not just one dimension of, are these things similar or different. I’ve differentiated pets from zoo animals, but in this other dimension, I’ve just differentiated carnivores from herbivores, right?

So it’s obviously missing some stuff. It might know a lot about “cat” and how it relates to other words, but it doesn’t know what a cat actually is, right? It wouldn’t be able to point out a cat. It can’t see. So it doesn’t know what cats look like and doesn’t know what they feel like.
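
As a tiny, purely illustrative sketch of distributional semantics, the snippet below counts which words co-occur with which in a made-up six-sentence corpus and then compares words by the company they keep. Real systems work from vastly more text and far richer models, but the basic principle is the same.

```python
import numpy as np
from itertools import combinations

# Represent each word only by the words it co-occurs with (a crude
# sentence-level context window), then compare words by those profiles.
sentences = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "my dog and my cat are pets",
    "the puppy is a young dog",
    "the zebra and the elephant live at the zoo",
    "the elephant watched the zebra graze",
]

vocab = sorted({w for s in sentences for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

for s in sentences:
    for a, b in combinations(s.split(), 2):
        counts[index[a], index[b]] += 1
        counts[index[b], index[a]] += 1

def similarity(w1, w2):
    v1, v2 = counts[index[w1]], counts[index[w2]]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9))

# Words that keep similar company end up with similar vectors,
# even though nothing here "knows" what a dog or a zebra is.
print("dog ~ cat  ", round(similarity("dog", "cat"), 3))
print("dog ~ puppy", round(similarity("dog", "puppy"), 3))
print("dog ~ zebra", round(similarity("dog", "zebra"), 3))
```

Which is exactly the limitation Ellie names: the vectors capture how words are used around one another, not what the things they name look or feel like.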

Melanie: So I think the results of that survey were interesting. That was in 2022. So it might be different now, but half the people agreed and half the people disagreed. And so the disagreement, I think the question was, could something trained only on language in principle understand language in a non-trivial sense? And I guess it’s just a kind of a difference between how people interpret the word understand.

And the people who disagreed would, I’d say, point to what you said: these systems know how to use the word cat, but they don’t know what a cat is. Some people would say that’s not understanding.

Ellie: Right, I think this gets down to people’s definition of understand and people’s definition of trivial. And I think this is where I feel like it’s an interesting discussion to have over drinks or something like that, but is it a scientific discussion right now? And I often find it’s not a scientific discussion. Some people just feel like this is not understanding and other people feel sure it is.

And there’s no moving their opinions because I don’t know how you speak to that. So the way you have to speak to it is to try to figure out what’s really going on in humans. Assuming we all agree that humans really understand and that’s the only example we all agree on. We need to figure out whether it is.

And then we have to figure out what’s different in the LLMs and then we have to figure out whether those differences are important or not. And I don’t know. That’s just a really long game.

So as much as I kind of love this question, I’ve increasingly gotten annoyed at having to answer it, cause I just don’t feel like it’s a scientific question. But it could be. It’s not asking about the afterlife or something. It’s not outside of the realm of answerable questions.

Abha: In our previous episodes, we’ve talked about how one of the big questions around artificial intelligence is whether or not large language models have theory of mind, which researchers first started assessing with human psychology tests like the Sally-Anne scenario.

And a second question arose out of that process: if LLMs can pass our human theory of mind tests — if they pass Sally-Anne when the details and the names are changed — are they actually doing complicated reasoning, or are they just getting more sophisticated at matching patterns in their training data?

As Ellie said, she cares that we’re intentional and scientific when we say things like, an LLM “understands” or “doesn’t understand.” And yet —

Ellie: They’re learning much more interesting structure than I would have guessed. So I would say my general, coming into this work, I would have called myself a neural network skeptic, and I still kind of view myself as that, right? I very often get annoyed when I hear people say stuff like they understand or they think.

And yet I actually spend more of my time writing papers saying, there is an interesting structure here. They do have some notion of compositionality. Or they… and I actually do use those words a lot. I really try not to in papers, but when I’m talking, I just don’t have another word for it. And it is so inefficient for me to come up with some new jargon, so I anthropomorphize like crazy in my talks, and it’s terrible, and I give a blanket apology at the beginning, and I keep doing it.

But one big takeaway is I’m not willing to say that they think or they understand or any of these other words, but I definitely have stopped making claims about what they obviously can’t do or even obviously aren’t doing, right? Because I had to eat my words a couple of times and I think it’s just we understand so little that we should all just stop trying to call it and just take a little bit of time to study it.

I think that’s okay, we don’t need an answer right now on whether they’re intelligent or not. What is the point of that? It’s just guaranteed to be wrong. And so, let’s just take some time and figure out what we’re trying to even do by asking that question and do it right.

I think right now seeing LLMs on the scene, it’s too similar to humans in all the wrong kinds of ways to make intelligence the right way to be thinking about this. And so I would be happy if we just could abandon the word. The problem, like I said, is then you get bogged down in a ton of jargon and I think we should all just be in agreement that we are in the process, and it might take a while of redefining that word.

I hope it’ll get fractured up into many different words, and that a decade from now, you just won’t even see that in the papers anywhere, but you will see other types of terms where people are talking about other kinds of much more specific abilities.

Melanie: Well, also just being sort of willing to put up with uncertainty, which very few people in this field seem to be able to do.

Ellie: It would be nice if we could all just wait a decade. I get the world wouldn’t allow that, but I wish we could just do that, right?

Abha: And Erica agrees. Her work with animals has made her pause before making assumptions about what other entities can and can’t do.

Erica: I keep going to new talks and I sort of have an opinion and I get a new talk and then I go, well, that’s really interesting. And I have to kind of revise my opinion. And I push back a lot on human scientists moving the bar on, what makes humans unique? What makes human language unique?

And then I sort of find myself doing that a little bit with LLMs. And so I need to have a little bit of humility in that. So I don’t think they have a theory of mind, but I think demonstrating, one, that they don’t, and two, why they don’t, are not simple tasks. And it’s important to me that I don’t just sort of dogmatically say, “Well, I believe that they don’t,” right?

Because I think people believe a lot of stuff about animals and then go into it saying, “Well, I believe animals don’t have concepts.” And then you say, “Well, why not?” “Well, because they don’t have language.” And it’s okay. So I think that LLMs are fundamentally doing next token prediction.

And I know you can build them within systems that do more sophisticated things, but they’re fundamentally, to the extent that I understand as a layperson, I mean, I do not build these systems, and you know much more about this than I do.

But I think that they’re very good at predicting the ways that humans would answer those questions based on the corpora of how humans answer either exactly those questions or questions that are similar in form, that are sort of analogous, structurally and logically similar.

And I mean, I’ve been spending quite a bit of time trying to argue that chimpanzees have a theory of mind and people are historically, I mean, now I think they’re becoming a little more open to it, but historically have been quite opposed to that idea. But we’ll very readily attribute those ideas to an LLM simply because they can answer verbal questions about it.

Abha: We’ll readily attribute human characteristics to LLMs because, unlike the chimpanzees Erica studies, they speak like us. They’re built on our language. And that makes them both more familiar to us on a surface level, and more alien when we try to figure out how they’re actually doing things.

Melanie: Earlier, Erica described a tradeoff in studying intelligence in animals: how much do we gain by using the metrics we’re familiar with, like human language, versus trying to understand animals on their own terms, like elephants that rumble through the ground to communicate?

Abha: And we asked Ellie how this applies to large language models. Does that tradeoff exist with them too?

Ellie: Yeah, totally. From the point of view of LLMs, I actually think within our lab, we do a little bit of both of these. I often talk more about trying to understand LLMs in human terms. Definitely much more so than with animals. LLMs were invented to communicate with us and do things for us. So it is not unreasonable or it’s not unnatural to try to force that analogy, right?

Unlike elephants, which existed long before us and are doing their own thing, and they could care less and would probably prefer that we weren’t there at all, right?

Melanie: On the other hand, Erica finds them more difficult to interpret, because even though they can perform on our terms, the underlying “stuff” that they’re made of is less intuitive for her than animals.

Erica: Again, I’m not sure, because an LLM is not fundamentally a single agent, right? It’s a collective. It’s reflecting collective knowledge, collective information. I feel like I know much more how to interpret a single parrot or a single dolphin or a single orangutan performing on a task. How do they, sort of, how do they interpret it? How do they respond?

To me, that question is very intuitive. I know that mind might be very different from my own, but there is a mind there. There is a self. And whether that self is conscious, whether that self is aware of itself, those I think are big questions, but there is a self. There is something that was born into the world that has narrative continuity and one day will die, we all will, right? LLMs don’t have that.

They aren’t born into the world. They don’t have narrative continuity and they don’t die in the same way that we do. And so I think it’s a collective of a kind that humans have never interacted with before.

And I don’t think that our thinking has caught up with technology. So I just don’t think that we’re asking the right questions about them because I don’t, these are entities or collectives or programs unlike anything else that we have ever experienced in human history.

Abha: So Melanie, let’s recap what we’ve done in this episode. We’ve looked at the notion of assessing intelligence in humans, non-human animals, and machines. The history of thought concerning intelligence is very much human centered. And our ideas about how to assess intelligence have always valued the things that are most human-like.

Melanie: Yeah, I really resonated with Erica’s comment about our lack of imagination doing research on animals. And she showed us how a human-centered view has really dominated research in animal cognition and that it might be blinding us to important aspects of how animals think, not giving them enough credit.

Abha: But sometimes we give animals too much credit by anthropomorphizing them. When we make assumptions about what our dog or cat is quote unquote thinking or feeling, we project our emotions and our notions of the world onto them, right?

Melanie: Yeah, our human-centered assumptions can definitely lead us astray in many ways. But Ellie pointed out similar issues for assessing LLMs. We give them tests that are designed for humans, like the SAT or the bar exam, and then if they pass the test, we make the mistake of assuming the same things that we would for humans passing that test. But it seems that they can pass these tests without actually having the general underlying skills that these tests were meant to assess.

Abha: But Ellie also points out that humans often game these tests. Maybe it’s not the tests themselves that are the problem. Maybe it’s the humans or the animals or the machines that take them.

Melanie: Sure, our methods of assessing human intelligence have always been a bit problematic. But on the other hand, there’s been decades of work on humans trying to understand what general abilities correlate with these test scores while we’re just beginning to figure out how to assess AI systems like LLMs. Ellie’s own work in trying to understand what’s going on under the hood in AI systems, as we described before, is called mechanistic understanding or mechanistic interpretability.

Abha: The way I understood this is that she’s looking at ways to understand LLMs at a higher level than just weights and activations in a neural network. It’s analogous to what neuroscientists are after, right? Understanding the brain without having to look at the activation of every neuron or the strength of every synapse.

Melanie: Yeah, as Ellie said, we need something like fMRIs for LLMs. Or maybe we actually need something entirely different, since as Erica pointed out, an LLM might be better thought of as a collective kind of intelligence rather than an individual. But in any case, this work is really at its inception.

Abha: Yeah, and also as both Ellie and Erica pointed out, we need to understand better what we mean by words like intelligence and understanding, which are not yet rigorously defined, right?

Melanie: Absolutely not. And maybe instead of making grand proclamations like, LLMs understand the world or LLMs can’t understand anything, we should do what Ellie urges us to do. That is to be willing to put up with uncertainty.

Abha: In our final episode of the season, I’ll ask Melanie more about what she thinks about all these topics. You’ll hear about her background in the field of intelligence, her views on AGI and if we can achieve it, how sustainable the industry is, and if she’s worried about AI in the future.

That’s next time, on Complexity. Complexity is the official podcast of the Santa Fe Institute. This episode was produced by Katherine Moncure. Our theme song is by Mitch Mignano, and additional music from Blue Dot Sessions. I’m Abha, thanks for listening.
