A new AI coding challenge just published its first results — and they aren’t pretty

The K Prize: A New Bar for AI-Powered Software Engineers

A new AI coding challenge has named its first winner, and in doing so has set a new bar for AI-powered software engineers. Laude Institute, a nonprofit organization, announced the winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The prize went to Eduardo Rocha de Andrade, a prompt engineer from Brazil, who will receive $50,000. More surprising than the win itself was his final score: he answered just 7.5% of the test questions correctly.

Konwinski said he was pleased that the benchmark turned out to be difficult: “Benchmarks should be challenging to be meaningful.” Because the K Prize runs offline with limited computing resources, its scoring favors smaller and open models, which levels the playing field. Konwinski has also pledged $1 million to the first open-source model that scores higher than 90% on the test.

Like the well-known SWE-Bench benchmark, the K Prize tests models against flagged issues from GitHub, as a measure of how well they handle real-world programming problems. But while SWE-Bench is built on a fixed set of problems that models can be trained against, the K Prize is designed as a “contamination-free” alternative, using a timed entry system to guard against benchmark-specific training. For round one, entries were due by March 12, and the organizers then built the test using only GitHub issues flagged after that date, so no submitted model could have seen them during training.

The K Prize’s top score of 7.5% stands in stark contrast to SWE-Bench, whose current top scores are 75% on its easier “Verified” test and 34% on its harder “Full” test. Konwinski is not yet sure whether the gap reflects contamination in SWE-Bench or simply the difficulty of collecting new issues from GitHub, but he expects the K Prize to answer that question soon.

Konwinski expects the picture to become clearer as the K Prize runs more rounds, since he anticipates that participants will adapt to the dynamics of competing on the benchmark every few months.


Falling short here may seem surprising given the wide range of AI coding tools already available. But many critics argue that existing benchmarks have become too easy, and they see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

Princeton researcher Sayash Kapoor is optimistic about building new tests for existing benchmarks as a way to diagnose evaluation problems. Without such experiments, he argues, there is no way to tell whether high scores stem from contamination or from teams targeting the SWE-Bench leaderboard with a human in the loop.

For Konwinski, the K Prize is not only a better benchmark but also a challenge to the rest of the industry. He argues that the hype around AI calls for a reality check: if models cannot score more than 10% on a contamination-free SWE-Bench, that is a stark reminder of the technology’s current limitations.
