The Japan Times - Inbred, gibberish or just MAD? Warnings rise about AI models

Inbred, gibberish or just MAD? Warnings rise about AI models / Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content and more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
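The idea of picking the "most likely" next token can be illustrated with a toy sketch. This is not how ChatGPT is built: real models use neural networks over subword tokens and usually sample rather than always taking the top choice. The corpus and the function names `train_bigram` and `most_likely_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text, split into whole-word "tokens".
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

# "the" is followed by "cat" twice and "mat" once in the corpus.
print(most_likely_next(model, "the"))
```

In this toy corpus the token "cat" follows "the" more often than "mat" does, so it is the one selected.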

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
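The loss of rarer elements can be mimicked with a very small simulation. This is not the experiment from the Nature paper, only a toy under stated assumptions: the "model" is a frequency table, each generation is trained solely on samples drawn from the previous one, and the token names, probabilities and sample size are invented.

```python
import random
from collections import Counter

def fit(samples):
    """'Train' a model: estimate token probabilities from observed samples."""
    total = len(samples)
    return {tok: count / total for tok, count in Counter(samples).items()}

def generate(dist, n, rng):
    """Draw n tokens from the model's current distribution."""
    tokens = list(dist)
    weights = [dist[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=n)

def simulate(initial, generations, n, seed=0):
    """Retrain repeatedly on the previous generation's own output."""
    rng = random.Random(seed)
    dist = dict(initial)
    support_sizes = [len(dist)]  # number of distinct tokens still known
    for _ in range(generations):
        dist = fit(generate(dist, n, rng))
        support_sizes.append(len(dist))
    return dist, support_sizes

# Hypothetical starting data: one common, one uncommon and one rare token.
initial = {"common": 0.90, "uncommon": 0.09, "rare": 0.01}
final, sizes = simulate(initial, generations=50, n=100)
print(sizes[0], sizes[-1], sorted(final))
```

Once a token fails to appear in one generation's samples, its estimated probability is zero and no later generation can produce it again, so the number of distinct tokens can only stay the same or shrink. Mixing fresh real data into each round, which this sketch deliberately leaves out, would let lost tokens return.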

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

M.Saito--JT