The Japan Times - Inner workings of AI an enigma - even to its creators

Photo: Kirill KUDRYAVTSEV - AFP

ECONOMY 13.05.2025

Even the greatest human minds building generative artificial intelligence that is poised to change the world admit they do not comprehend how digital minds think.

"People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work," Anthropic co-founder Dario Amodei wrote in an essay posted online in April.

"This lack of understanding is essentially unprecedented in the history of technology."

Unlike traditional software programs that follow pre-ordained paths of logic dictated by programmers, generative AI (gen AI) models are trained to find their own way to success once prompted.

In a recent podcast, Chris Olah, who was part of ChatGPT-maker OpenAI before joining Anthropic, described gen AI as "scaffolding" on which circuits grow.

Olah is considered an authority in so-called mechanistic interpretability, a method of reverse engineering AI models to figure out how they work.

This science, born about a decade ago, seeks to determine exactly how AI gets from a query to an answer.

"Grasping the entirety of a large language model is an incredibly ambitious task," said Neel Nanda, a senior research scientist at the Google DeepMind AI lab.

It was "somewhat analogous to trying to fully understand the human brain," Nanda told AFP, noting that neuroscientists have yet to succeed on that front.

Delving into digital minds to understand their inner workings has gone from a little-known field just a few years ago to being a hot area of academic study.

"Students are very much attracted to it because they perceive the impact that it can have," said Boston University computer science professor Mark Crovella.

The area of study is also gaining traction due to its potential to make gen AI even more powerful, and because peering into digital brains can be intellectually exciting, the professor added.

- Keeping AI honest -

Mechanistic interpretability involves not just studying the results served up by gen AI but also scrutinizing the calculations performed while the technology mulls queries, according to Crovella.

"You could look into the model...observe the computations that are being performed and try to understand those," the professor explained.

Startup Goodfire uses AI software capable of representing data in the form of reasoning steps to better understand gen AI processing and correct errors.

The tool is also intended to prevent gen AI models from being used maliciously or from deciding on their own to deceive humans about what they are up to.

"It does feel like a race against time to get there before we implement extremely intelligent AI models into the world with no understanding of how they work," said Goodfire chief executive Eric Ho.

In his essay, Amodei said recent progress has made him optimistic that the key to fully deciphering AI will be found within two years.

"I agree that by 2027, we could have interpretability that reliably detects model biases and harmful intentions," said Auburn University associate professor Anh Nguyen.

According to Boston University's Crovella, researchers can already access representations of every digital neuron in AI brains.

"Unlike the human brain, we actually have the equivalent of every neuron instrumented inside these models," the academic said. "Everything that happens inside the model is fully known to us. It's a question of discovering the right way to interrogate that."

Harnessing the inner workings of gen AI minds could clear the way for its adoption in areas where tiny errors can have dramatic consequences, like national security, Amodei said.

For Nanda, better understanding what gen AI is doing could also catapult human discoveries, much as DeepMind's chess-playing AI, AlphaZero, revealed entirely new chess moves that none of the grandmasters had ever thought about.

A gen AI model that is properly understood and carries a stamp of reliability would gain a competitive advantage in the market.

Such a breakthrough by a US company would also be a win for the nation in its technology rivalry with China.

"Powerful AI will shape humanity's destiny," Amodei wrote.

"We deserve to understand our own creations before they radically transform our economy, our lives, and our future."

M.Saito--JT
