The Japan Times - Inbred, gibberish or just MAD? Warnings rise about AI models

Photo: Fabrice COFFRINI - AFP/File

Inbred, gibberish or just MAD? Warnings rise about AI models

TECHNOLOGY 05.08.2024

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content, though also more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
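The idea of choosing the "most likely" next token can be sketched with a toy example. This is only an illustration under heavy assumptions: it counts which word follows which in a single made-up sentence, whereas real systems use neural networks trained on trillions of tokens. The function names and the sample sentence are hypothetical and do not come from any real product.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.split()  # crude whitespace "tokenizer", for illustration only
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def most_likely_next(follows, token):
    """Return the continuation seen most often after `token`, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]

# Hypothetical training text, invented for this sketch.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(most_likely_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

Because the toy model only knows what it has counted, anything absent from its training text is simply unavailable to it.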

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
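The loss of rarer elements can be illustrated with a small simulation. This is not the method used in the Nature paper, only a sketch under simplifying assumptions: the "model" is nothing more than word frequencies, and each generation is trained solely on a sample drawn from the previous one. The vocabulary, sample size and seed are invented for the example.

```python
import random
from collections import Counter

def next_generation(counts, sample_size, rng):
    """Fit token frequencies to `counts`, then sample a new dataset from that fit."""
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return Counter(rng.choices(tokens, weights=weights, k=sample_size))

def simulate(generations=30, sample_size=200, seed=0):
    rng = random.Random(seed)
    # Hypothetical "human" data: one common word and 50 rare ones.
    data = Counter({"common": 150})
    data.update({f"rare{i}": 1 for i in range(50)})
    history = [len(data)]
    for _ in range(generations):
        data = next_generation(data, sample_size, rng)
        history.append(len(data))  # distinct tokens that survive this generation
    return history

history = simulate()
print(history)  # number of distinct tokens per generation
```

In this setup a word that fails to appear in one generation's sample has zero weight afterwards and can never return, so the count of distinct words can only stay level or fall.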

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and strafed with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

M.Saito--JT
