Jimmy Wales is considering using GPT to write Wikipedia

Jimmy Wales is weighing up whether to begin having large language model AIs like GPT write Wikipedia.
–

By Loz Blain
2023 Apr 03

–

Despite its frequent and serious inaccuracies, GPT has got Wikipedia founder Jimmy Wales thinking seriously about how AI might become part of the workflow at the largest and most-read reference repository in the history of mankind.

In an interview with Daniel Hambury at London’s Evening Standard newspaper, Wales chews over some of the issues inherent with the technology – particularly its tendency to “hallucinate,” or flat-out make things up – but points out that “using AI to triple the number of Wikipedia entries wouldn’t increase our running costs by more than £1,000 a year.”

One early use case, says Wales, might be to use a large language model (LLM) like GPT to compare multiple articles, looking for points that contradict one another, and use its findings to flag pieces that Wikipedia’s army of human volunteers might need to put some work into.
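To make that idea a little more concrete, here is a minimal Python sketch of the cross-checking step only. It is not anything Wales or Wikipedia has described: it assumes some earlier stage (perhaps an LLM) has already pulled simple (subject, attribute, value) claims out of each article, and the function name `find_contradictions`, the article titles and the claim values are all invented for illustration.

```python
from collections import defaultdict


def find_contradictions(claims_by_article):
    """Return every (subject, attribute) pair on which articles disagree.

    claims_by_article maps an article title to a list of
    (subject, attribute, value) tuples. How those tuples are extracted
    is out of scope here and assumed to have happened already.
    """
    # (subject, attribute) -> value -> set of article titles stating it
    seen = defaultdict(lambda: defaultdict(set))
    for article, claims in claims_by_article.items():
        for subject, attribute, value in claims:
            seen[(subject, attribute)][value].add(article)

    # Keep only the pairs for which more than one distinct value was stated
    return {
        key: {value: sorted(articles) for value, articles in by_value.items()}
        for key, by_value in seen.items()
        if len(by_value) > 1
    }


# Hypothetical input: placeholder articles and made-up values
claims_by_article = {
    "Article A": [("Example Tower", "floors", "102"),
                  ("Example Tower", "opened", "1931")],
    "Article B": [("Example Tower", "floors", "103")],
    "Article C": [("Example Tower", "floors", "102"),
                  ("Example Tower", "opened", "1931")],
}

flagged = find_contradictions(claims_by_article)
for (subject, attribute), by_value in flagged.items():
    print(f"Check '{attribute}' of {subject}: {by_value}")
```

In this toy input only the "floors" claim is flagged, since the three placeholder articles agree on "opened". The output is a worklist for human editors rather than a correction: the sketch says nothing about which value is right.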

But he’s definitely considering just having these LLMs write pages.

“I think we’re still a way away from: ‘ChatGPT, please write a Wikipedia entry about the Empire State Building,'” he tells Hambury, “but I don’t know how far away we are from that, certainly closer than I would have thought two years ago.”

One possible scenario is to have the AI go through looking for all the many gaps on Wikipedia – potentially useful pages that have never been written – and attempting to create summary entries for them using information from the Web.

But Wales is aware that Wikipedia’s entire reputation is founded on the perception of accuracy, and that this is currently a huge problem with LLMs like GPT.

“It has a tendency to just make stuff up out of thin air which is just really bad for Wikipedia,” he says. “That’s just not OK. We’ve got to be really careful about that.”

If LLMs begin writing a central knowledge repository like Wikipedia, hallucinations or lies that aren’t immediately caught will begin to snowball. People will use those non-facts in their own writing, and subsequent AIs will be trained with these non-facts baked in, making it difficult to correct them in the longer term and driving us deeper into this “post-truth” era.

Wales is also concerned about whether using LLMs to expand the resource could help with, or exacerbate, Wikipedia’s issues of systemic and unconscious bias; the resource is currently written and maintained by volunteers, an overwhelming majority of whom are white males, so the site has a tendency to ignore topics that aren’t of interest to this group, and cover other topics from a certain perspective.

ChatGPT has been explicitly designed to attempt a balanced perspective on topics where it can, in an attempt to bring some nuance back to discussion areas where people from different sides are increasingly finding it harder to start from any common ground. But GPT has its own bias problems inherent in its training data.

It’s a thorny topic, and it’s certainly got me considering whether I keep donating to the site if it goes down that road. But realistically, any organization that isn’t re-orienting around the phenomenal capabilities of next-gen LLMs is putting itself at a huge disadvantage, and it’s crazy to expect this stuff won’t start getting rolled in everywhere.

Source: Evening Standard
–


Loz Blain

Loz has been one of our most versatile contributors since 2007, and has since proven himself as a photographer, videographer, presenter, producer and podcast engineer, as well as a senior features writer. Joining the team as a motorcycle specialist, he’s covered just about everything for New Atlas, concentrating lately on eVTOLs, hydrogen, energy, aviation, audiovisual, weird stuff and things that go fast.
–

4 comments

NL_01 – 2023 Apr 04 – 08:50 AM

Thanks for your ongoing excellent coverage of all the developments around LLMs/generative models Loz.
(I’m a long-time reader of New Atlas but recently subscribed now that it’s possible in the UK, happy to finally be able to support this site)

paul314 – 2023 Apr 04 – 10:11 AM

It seems that the current (and short-term foreseeable) generative AIs can be really useful in producing first drafts for skilled, knowledgeable humans to edit/polish. But everyone wants to skip that step. With something like Wikipedia, you might be able to at least use some kind of basic old-school language processing to determine whether, say, the claimed citations existed and/or supported the statements made in a generated article.

Cryptonoetic – 2023 Apr 04 – 11:50 AM

This would be a disaster. Already there are several cases where GPT produced answers that were not only demonstrably (and sometimes risibly) false, but GPT then proceeded to cite non-existent sources as authoritative evidence for its false claims. GPT not only lies, it lies about its lies. The problem with LLMs (large language models) is that they are, by definition, based on human communication, which is not exactly known for absence of malice and mendacity. In reflecting the human communication model which it mimics, GPT will dauntlessly obfuscate, inveigle, deceive, misinform, and outright lie without the possibility of ever recognizing that it is doing so. Almost by definition there can never be an ethical or reliably truthful LLM-based AI.

Fairly Reasoner – 2023 Apr 04 – 12:45 PM

Not enough inaccuracies in Wikipedia already?
–