“We’ve achieved peak data and there’ll be no more,” OpenAI’s former chief scientist told a crowd of AI researchers.
Ilya Sutskever, co-founder and former chief scientist of OpenAI, recently shared his view that AI development is about to undergo a significant shift. Speaking to a room of AI researchers, he said, “We’ve achieved peak data and there’ll be no more.”
Earlier this year, Sutskever made headlines when he left OpenAI to start his own AI lab, Safe Superintelligence Inc. He has mostly stayed out of the public eye since, which made his appearance on Friday at the Neural Information Processing Systems (NeurIPS) conference in Vancouver a rarity.
On stage, Sutskever remarked, “Pre-training as we know it will undoubtedly come to an end.” Pre-training is the first phase of AI model development, in which a large language model learns patterns from massive amounts of unlabeled data, typically sourced from the internet, books, and other text. He repeated the point: “We’ve achieved peak data and there’ll be no more.”
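For readers who want to see the mechanics, pre-training boils down to next-token prediction over raw text. The sketch below is a toy, character-level illustration of that objective under placeholder assumptions (a one-line corpus, a bigram-style model, made-up hyperparameters); it is not any lab’s actual training code, only the loss it shares with one.

```python
# Toy illustration of the pre-training objective: next-token prediction
# on unlabeled text. Everything here is a placeholder at miniature scale;
# real pre-training applies the same loss over trillions of tokens.
import torch
import torch.nn as nn

text = "there is only one internet "          # stand-in for a web-scale corpus
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}  # character -> integer id
data = torch.tensor([stoi[ch] for ch in text])

class TinyLM(nn.Module):
    """A bigram-style model: predicts the next character from the current one."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        return self.head(self.embed(idx))      # logits over the next character

model = TinyLM(len(vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(data[:-1])                  # inputs: characters 0..n-1
    loss = loss_fn(logits, data[1:])           # "labels" are just the next characters
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sutskever’s argument is about the `text` variable: the loss and the architecture can keep improving, but the corpus that feeds them cannot grow past what humans have already written.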
During his talk, he acknowledged that while existing data can still drive AI progress, the industry is running out of new data to train on. He compared the situation to fossil fuels: just as oil is a finite resource, the internet contains a finite amount of human-created content. “We have to work with the data we have. There is only one internet,” he emphasized.
Sutskever predicted that the next generation of models will become agentic in “real ways.” While he didn’t define the term in his talk, “agents” in AI usually refer to autonomous systems capable of performing tasks, making decisions, and interacting with software independently.
In addition to being agentic, Sutskever said future systems would have reasoning capabilities. Unlike today’s AI, which primarily relies on pattern-matching based on previously seen data, future AI systems will be able to solve problems step by step in a way that more closely resembles human thought. He explained, “The more reasoning a system does, the more unpredictable it becomes.”
To illustrate, he compared the unpredictability of truly reasoning systems to advanced chess-playing AI, which often surprises even the best human players. He added, “They will understand things from limited data. They won’t get confused.”
Sutskever also drew a parallel between AI scaling and evolutionary biology, citing research on the relationship between brain and body mass across species. Most mammals follow one consistent scaling pattern, but hominids (human ancestors and relatives) break from it: on a log-log plot of brain mass against body mass, their line has a distinctly different, steeper slope. Just as evolution found a new scaling pattern for hominid brains, Sutskever suggested, AI might move beyond today’s pre-training methods.
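To make the chart he is describing concrete: a power law of the form brain ≈ c · body^k appears as a straight line of slope k once both axes are logarithmic, so “a different scaling pattern” literally means a different slope. The snippet below demonstrates this with invented exponents; the 0.75 and 1.00 are placeholders for illustration, not figures from Sutskever’s slide.

```python
# Why a power law is a straight line on log-log axes: taking logs of
# brain = c * body**k gives log(brain) = log(c) + k*log(body), a line
# with slope k. The exponents below are invented for illustration.
import numpy as np

body = np.logspace(0, 5, 6)  # body masses spanning five decades
groups = {
    "mammals (hypothetical k=0.75)": 0.01 * body ** 0.75,
    "hominids (hypothetical k=1.00)": 0.01 * body ** 1.00,
}

for name, brain in groups.items():
    slope = np.polyfit(np.log(body), np.log(brain), 1)[0]  # recover k from the line
    print(f"{name}: fitted log-log slope = {slope:.2f}")
```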
After his talk, an audience member asked how researchers could create incentive systems that align AI development with humanity’s values, ensuring it respects the freedoms we enjoy as humans. Sutskever paused thoughtfully before responding, “I think these are questions people should think about more deeply.” He admitted he didn’t feel confident answering such questions, as they would likely require “a top-down governmental structure.”
When the audience member mentioned cryptocurrency as a possible solution, the room erupted in laughter. Sutskever replied, “I don’t think I’m the right person to comment on cryptocurrency, but there’s a chance that what you’re describing could happen.” He continued, “In some ways, it wouldn’t be a bad outcome if AI coexisted with us peacefully and wanted rights. Maybe that would be okay… I think things are incredibly unpredictable. I hesitate to comment, but I encourage speculation.”