The K Prize: A New Bar for AI-Powered Software Engineers
A recent AI coding challenge has crowned its first champion, setting a new standard for AI-powered software engineers. Laude Institute, a nonprofit organization, revealed the winner of the K Prize, a multi-round AI coding challenge initiated by Databricks and Perplexity co-founder Andy Konwinski. The victory went to Eduardo Rocha de Andrade, a prompt engineer from Brazil, who will be awarded $50,000 for his achievement. What made his win even more remarkable was his final score – he answered just 7.5% of the test questions correctly.
Konwinski expressed his satisfaction with the difficulty level of the benchmark, stating, “Benchmarks should be challenging to be meaningful.” He further explained that the K Prize, which operates offline with limited computing resources, favors smaller and open models, thus leveling the playing field. In a bold move, Konwinski has pledged $1 million to the first open-source model that can achieve a score higher than 90% on the test.
Like the well-known SWE-Bench, the K Prize tests models against flagged issues from GitHub, simulating real-world programming challenges. But where SWE-Bench relies on a fixed set of problems that models can be trained against, the K Prize enforces a “contamination-free” environment through a timed entry system that rules out benchmark-specific training: the test for round one was built only from GitHub issues flagged after March 12.
The top score of 7.5% on the K Prize test starkly contrasts with SWE-Bench’s current top scores of 75% and 34% on its “Verified” and “Full” tests, respectively. Konwinski is uncertain whether this difference is due to contamination in SWE-Bench or the difficulty of sourcing new GitHub issues. However, he anticipates that the K Prize project will provide clarity on this matter in the near future.
As more rounds of the K Prize are run, Konwinski expects a clearer picture of how the competition plays out, with regular participants adapting and improving their performance over time.
While there is no shortage of AI coding tools on the market, there is growing concern over whether existing benchmarks are hard enough to measure them meaningfully. Projects like the K Prize aim to address this issue and improve how AI systems are evaluated.
Princeton researcher Sayash Kapoor believes that developing new tests for existing benchmarks is crucial for identifying and resolving evaluation challenges. He emphasizes the importance of experiments to determine the root cause of issues such as contamination or strategic targeting of leaderboard rankings.
For Konwinski, the K Prize represents not only a better benchmark but also a challenge to the entire industry. He emphasizes the need for a reality check in the face of inflated expectations surrounding AI technologies. The fact that no model can yet score even 10% on a contamination-free SWE-Bench, he argues, is a stark reminder of the current limitations of AI development.