Why Aren't We Using AI to Save Dying Knowledge?

When I was young, our neighbor in Vietnam was a quiet medicine man. Long white hair, long white beard, the kind of figure you’d expect to step out of an old East Asian story. He kept to himself mostly, except when someone in the village needed healing. He’d often venture deep into the mountains to collect plants, roots, and herbs, returning with remedies that had been passed down through generations.
Before my family moved to the United States, he handed my father a handwritten book: pages of careful notes and illustrations of plants, documenting his knowledge and experience.
“My children don’t really read much,” he said. “Why don’t you keep this and pass it to someone who might find it useful?”
That book is almost certainly lost now. Somewhere in the chaos of zealous Communist Party members purging my father’s library of anything they considered tied to the former South Vietnamese government, of my family’s immigration to the US, of starting over, of survival, it disappeared.
In Boston today, a family friend carries a similar burden, though his knowledge lives in muscle memory and historical legends rather than books or medicine. He inherited Wing Chun from a master who fled China for Vietnam, carrying a martial arts lineage that traced back centuries. Our friend refined that inheritance through decades of street fighting and underground rings. Deep, embodied knowledge that takes a lifetime to acquire.
He practices alone in his garage after work now, striking a wooden dummy made from a type of wood that is no longer available on the market. There’s no one to pass it to. Eventually, that art will be lost too.
We built seed vaults for biodiversity. We haven’t built knowledge vaults for human memory, and AI is the first tool that makes that possible at scale.
What We’re Losing
My neighbor and our family friend aren’t anomalies. This is happening across every continent.
A language dies every 40 days. According to Ethnologue, 3,193 languages are endangered today, representing 44% of the world’s 7,168 living languages. Over 88 million people speak languages at risk of extinction. At current rates, 90% of the world’s languages could disappear by the end of this century. When a language dies, entire frameworks for understanding the world vanish with it: medicinal knowledge encoded in plant names, spiritual concepts that don’t translate, ways of seeing that took millennia to develop.
The digital world mirrors this decay. Pew Research found that 38% of webpages from 2013 are no longer accessible. One quarter of all webpages that existed between 2013 and 2023 have disappeared entirely. Link rot affects 66.5% of URLs in academic papers. Even Supreme Court opinions contain broken links 49% of the time.
The internet feels permanent. But we’re living through the largest information loss event in human history, so gradual it barely registers as noise.
Where Archives Fall Short
Even after reaching one trillion archived webpages in October 2025, the Internet Archive captures less than 0.1% of all content published online. It processes about 500 million pages per day, against 1.7 billion websites that generate new content continuously. The gap keeps widening.
For physical knowledge, archives aren’t even an option. My neighbor’s medicine book contained decades of observations about which plants worked, in what combinations, at what times of year. The Wing Chun master’s lineage includes internal mechanics, timing instincts, and combat principles passed down orally, knowledge that only reveals itself through practice. This kind of knowledge resists being written down. Recording it by traditional means takes years, if it’s possible at all.
Much of this knowledge lives with people already past the age of training successors.
What AI Changes
Large Language Models (LLMs) offer a different approach to preservation, one based on scale and interaction rather than exhaustive documentation.
Traditional documentation creates static records. A dictionary captures vocabulary. A video captures a performance. But an LLM trained on a language can interact, generate new sentences, and help learners practice conversation. It is the difference between a photograph of a person and a simulation that can respond.
Governments are starting to invest accordingly. In June 2025, India launched BharatGen, the country’s first multimodal LLM, covering 22 Indian languages. More striking is Adi Vaani, the world’s first AI-powered tribal language bridge, currently supporting Santali, Mundari, Gondi, and Bhili for 22 million speakers, with plans to expand to 50 endangered languages by 2030. Twelve Latin American countries are collaborating on Latam-GPT, which prioritizes Indigenous language preservation, including an initial translation tool for Rapa Nui. Nigeria unveiled its first multilingual LLM at GITEX 2025 with $3.5 million in funding. In New Zealand, Te Hiku Media’s Papa Reo continues advancing Māori language preservation with 92% transcription accuracy.
These projects show that even with limited training data, AI can create interactive preservation of languages that would otherwise exist only in aging memories. The models aren’t perfect, but they’re functional. And they improve as more data becomes available.
Beyond Language
Language gets the attention, but other knowledge faces the same cliff.
Traditional medicine systems that took centuries to develop through careful observation. The specific knowledge of which plants treat which conditions, gathered through generations of trial and error, exists primarily in the minds of aging practitioners in rural villages worldwide.
Martial arts lineages that preserve not just techniques but principles of movement, timing, and strategy refined through actual combat over generations. When the last master of a style dies without students, that entire branch of human physical knowledge disappears.
Craft traditions that encode deep understanding of materials, from blacksmithing to textile weaving to boat building. The tacit knowledge of how wood behaves, how metal moves, how fibers interact, knowledge that can’t be fully captured in written instructions.
Each of these represents accumulated human learning that took generations to develop and can vanish in a single generation of neglect.
Seed Vaults Exist. Knowledge Vaults Don’t.
The Svalbard Global Seed Vault in Norway stores over 1.3 million seed samples from around the world. It exists because we recognized that agricultural biodiversity was disappearing faster than natural systems could regenerate, and that losing crop varieties meant losing genetic information we might need to adapt to future conditions.
The same logic applies to human knowledge. Some of it is irreplaceable. Loss is accelerating. Preservation requires deliberate infrastructure investment.
Why don’t we have knowledge vaults?
Not archives of recordings and documents, but trained AI models that can interact with endangered knowledge. Models that could help a young person in the Amazon learn their grandmother’s language. Models that could help a martial artist in Vietnam access techniques from a lineage otherwise lost. Models that could help a healer in Nigeria understand plant medicine traditions that were never written down.
We have the technical capability. We lack the infrastructure and intention.
Funding incentives favor commercial models, and preservation work sits awkwardly between academia, governments, and communities. No one has owned the problem.
Ethics and Ownership
Language communities have legitimate concerns about who controls their linguistic data. Indigenous groups have experienced centuries of extraction, where outsiders documented their knowledge and profited from it while the communities themselves gained nothing.
Te Hiku Media in New Zealand created the Kaitiakitanga License specifically to ensure that Māori data can only be used in ways that benefit Māori people. Any knowledge preservation effort must center the communities whose knowledge is being preserved, giving them control over access and use.
There’s also the question of accuracy. AI models can hallucinate. In December 2024, AI-generated language learning books containing incorrect translations of Indigenous languages were discovered being sold on Amazon. Bad preservation might be worse than no preservation if it replaces authentic knowledge with confident errors.
Community oversight, rigorous validation, and transparent limitations can address these risks. Letting irreplaceable knowledge disappear because preservation is complicated doesn’t avoid the problem. It guarantees the worst outcome.
What Would It Take?
Building knowledge vaults would require coordination between AI researchers, linguists, anthropologists, and, most importantly, the communities holding endangered knowledge. It would require funding at a scale that treats cultural preservation as infrastructure, not charity. And it would require new methodologies for capturing tacit knowledge, the kind that lives in bodies and habits rather than words.
India’s Adi Vaani demonstrates how government-backed AI can serve tribal communities rather than extract from them. Te Hiku Media shows how Indigenous-led initiatives can maintain data sovereignty while leveraging cutting-edge technology. Quebec’s FLAIR initiative suggests that academic institutions can partner with communities effectively. The Internet Archive’s designation as a Federal Depository Library in July 2025 signals growing recognition that digital preservation is critical infrastructure.
These remain scattered efforts. We need a coordinated global initiative, the knowledge equivalent of the seed vault network.
Every year we wait means more practitioners gone, more languages silent, more lineages broken.
My neighbor’s medicine book is probably in a landfill somewhere. That knowledge is gone. But there are still medicine practitioners in villages throughout Vietnam, throughout the world, whose knowledge could be preserved if we built the systems to do it.
The Wing Chun master is still practicing in his garage. His knowledge could still be captured, modeled, made available to future generations who might want to learn.
AI can help preserve endangered knowledge. Whether we’ll invest in building the infrastructure before it’s too late is a different question.
What irreplaceable knowledge exists only in someone’s memory near you, and what would it take to preserve it before it’s gone?