Nous Research, an innovative artificial intelligence startup known for its contributions to the open-source AI movement, recently unveiled Hermes 4. This new family of large language models is touted to rival leading proprietary systems in performance while offering unparalleled user control and minimal content restrictions.
The launch of Hermes 4 marks a notable development in the ongoing debate between open-source AI proponents and major tech companies over who controls advanced AI capabilities. Unlike models from industry giants such as OpenAI, Google, and Anthropic, Hermes 4 is designed to respond to a wide range of requests without the safety constraints typical of commercial AI systems.
“Hermes 4 builds on our legacy of user-aligned models with expanded test-time compute capabilities,” announced Nous Research on X (formerly Twitter). “Special attention was given to making the models creative and interesting to interact with, unencumbered by censorship, and neutrally aligned while maintaining state of the art level math, coding, and reasoning performance for open weight models.”
Enhanced Performance with Hybrid Reasoning
Hermes 4 introduces what Nous Research refers to as “hybrid reasoning,” allowing users to switch between quick responses and more in-depth, step-by-step thinking processes. By activating this feature, the models generate internal reasoning within special <think> tags before delivering a final answer, providing transparency into the AI’s thinking process.
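This structure is straightforward to work with downstream. As a rough sketch, assuming only the `<think>` tag format described above (the helper name and parsing logic are ours, not from Nous Research), a client could split a reasoning-mode response into its hidden trace and final answer:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate a hybrid-reasoning response into (trace, answer).

    Assumes the model wraps its chain of thought in <think>...</think>
    tags before the final answer, as Hermes 4 does in reasoning mode.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()          # fast mode: no reasoning block
    trace = match.group(1).strip()
    answer = response[match.end():].strip()  # everything after the trace
    return trace, answer

raw = "<think>2 groups of 3 is 2 * 3 = 6.</think>The answer is 6."
trace, answer = split_reasoning(raw)
```

In fast mode the trace comes back empty, so the same parsing code serves both response styles.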
The benchmark results are striking. In testing, the largest 405-billion-parameter model scored 96.3% on the MATH-500 benchmark in reasoning mode and 81.9% on the challenging AIME’24 mathematics competition, rivaling or surpassing many far more expensive proprietary systems.
According to AI researcher Rohan Paul on X, one of the release’s technical breakthroughs was solving the challenge of making thinking traces useful and verifiable without falling into endless reasoning loops.
Notably, Hermes 4 achieved the highest score among all models tested on “RefusalBench,” a new benchmark introduced by Nous Research to measure AI systems’ reluctance to answer questions. The model scored 57.1% in reasoning mode, significantly outperforming GPT-4o (17.67%) and Claude Sonnet 4 (17%).
Unveiling DataForge and Atropos: Revolutionary Training Systems Powering Hermes 4
Behind the exceptional capabilities of Hermes 4 lies a sophisticated training infrastructure developed by Nous Research over several years. The models underwent training using two innovative systems: DataForge, a graph-based synthetic data generator, and Atropos, an open-source reinforcement learning framework.
DataForge generates training data through “random walks” on directed graphs, converting simple pre-training data into intricate instruction-following examples. For instance, it can transform a Wikipedia article into a rap song and then generate questions and answers based on this transformation.
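The report describes this pipeline only at a high level. As a hedged toy sketch of the idea (the node names, transforms, and graph below are invented for illustration and are not DataForge’s actual operators), a random walk over a directed graph of text transformations might look like:

```python
import random

# A toy directed graph of text transformations. Each node rewrites the
# text it receives; edges define which transformation can follow which.
TRANSFORMS = {
    "source":  lambda text: text,                         # pass through
    "summary": lambda text: "Summary: " + text[:40],
    "rap":     lambda text: "Yo, listen up: " + text.lower(),
    "qa":      lambda text: f"Q: What does this say?\nA: {text}",
}
EDGES = {
    "source":  ["summary", "rap"],
    "summary": ["qa"],
    "rap":     ["qa"],
    "qa":      [],          # terminal node: emit a question-answer pair
}

def random_walk(seed_text: str, rng: random.Random) -> list[str]:
    """Walk the graph from 'source', applying each node's transform."""
    node, text, path = "source", seed_text, []
    while True:
        text = TRANSFORMS[node](text)
        path.append(text)
        if not EDGES[node]:
            return path
        node = rng.choice(EDGES[node])

samples = random_walk("The Nile is the longest river in Africa.",
                      random.Random(0))
```

Because each walk composes transformations in a different order, a single seed document can yield many structurally distinct instruction-following examples.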
Atropos functions as a series of specialized training environments where AI models practice various skills, such as mathematics, coding, tool usage, and creative writing. Feedback is provided only when correct solutions are produced, ensuring that high-quality responses are included in the training data through a “rejection sampling” approach.
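Rejection sampling itself is a standard technique. A minimal sketch of the core loop follows; the toy verifier and candidate model here are stand-ins for illustration, not Atropos internals:

```python
import random

def verifier(question: str, answer: str) -> bool:
    """Toy reward signal: accept only exactly correct answers."""
    return answer == "4"

def candidate_model(question: str, rng: random.Random) -> str:
    """Stand-in for an LLM sampling diverse completions."""
    return rng.choice(["3", "4", "5", "4"])

def rejection_sample(question: str, n: int, rng: random.Random) -> list[dict]:
    """Sample n completions; keep only those the verifier accepts."""
    kept = []
    for _ in range(n):
        answer = candidate_model(question, rng)
        if verifier(question, answer):
            kept.append({"prompt": question, "completion": answer})
    return kept

data = rejection_sample("What is 2 + 2?", n=8, rng=random.Random(0))
```

Only verified completions survive the filter, so the resulting dataset contains exclusively high-quality responses, which is the property the article attributes to Hermes 4’s training data.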
“Nous utilized these environments to generate the dataset for Hermes 4!” explained Tommy Shaughnessy, a venture capitalist at Delphi Ventures and an investor in Nous Research. “The dataset contains 3.5 million reasoning samples and 1.6 million non-reasoning samples! Hermes was trained on RL data, not just static question and answer datasets!”
The training process involved 192 Nvidia B200 GPUs and 71,616 GPU hours for the largest model, a substantial computational investment that is nonetheless modest by big-tech standards, illustrating how specialized techniques can compete with raw scale.
Challenging AI Safety Norms: The Philosophy of Nous Research
Nous Research has established itself on a philosophy that prioritizes user control over corporate content regulations. The company’s models are designed to be “steerable,” allowing for fine-tuning and specific behavioral prompts without the rigid safety constraints prevalent in commercial AI systems.
“Hermes 4 is not constrained by disclaimers, rules, and excessive caution, which can impede innovation and usability,” wrote Shaughnessy in an in-depth analysis. “Being open-source but refusing all requests renders it pointless. This is not an issue with Hermes 4.”
This approach has garnered Nous Research significant support among AI researchers and developers seeking maximum flexibility. However, it has also placed the company in the midst of ongoing discussions about AI safety and content moderation. While the models could potentially be misused, Nous Research argues that transparency and user control are preferable to corporate restrictions.
The company’s technical report, released alongside the models, offers unprecedented insight into the training process, evaluation outcomes, and actual text outputs from benchmark assessments. “We believe this report sets a new benchmark for transparency in evaluation,” the company stated.
Competing Against Tech Giants with Innovative Approaches
The introduction of Hermes 4 arrives at a crucial juncture in the AI industry. While major tech companies have invested heavily in developing advanced AI systems, proponents of the open-source movement argue that these capabilities should not be monopolized by a few corporations.
Recent advancements in open-source AI, including models like Meta’s Llama 3.1, DeepSeek’s R1, and Alibaba’s Qwen series, have demonstrated performance on par with proprietary systems. Hermes 4 represents another leap in this progression, particularly in the realm of reasoning, a domain traditionally dominated by closed systems like OpenAI’s o1.
“Firstly, Nous is a startup with numerous highly skilled individuals,” noted Shaughnessy. “They do not possess the $100b+ annual capex spending of large tech companies or thousands of employees, yet they continue to deliver innovative models and research at a remarkable pace.”
The startup, which secured $65 million in funding earlier this year led by Paradigm, is also working on Psyche Network, a decentralized training system aiming to coordinate AI training across internet-connected devices using blockchain technology.
Addressing Endless Thinking Loops: Hermes 4’s Technical Breakthrough
One of the notable technical achievements of Hermes 4 tackles a common issue in reasoning models: thinking processes that spiral into endless loops. The researchers found that their smaller 14-billion-parameter model would hit maximum context length 60% of the time during reasoning, essentially getting stuck in perpetual thought.
Their solution involved a secondary training phase that instructs models to cease reasoning at precisely 30,000 tokens, reducing excessively long generation by 65-79% while retaining most of the reasoning performance. This “length control” technique holds promise for the broader AI research community.
“Smaller models (<14B) tend to overthink when distilled, but larger models don’t,” observed AI researcher Muyu He on X, citing insights from the technical report.
However, despite impressive benchmark results, Hermes 4 still faces limitations common to open-source models: the models demand significant computational resources to run, and they may not match the reliability or ease of use of commercial AI services for many applications.
Exploring Hermes 4 and Comparing Costs
Nous Research offers access to Hermes 4 through various channels, reflecting its open-source ethos. Model weights are available for free download on Hugging Face, and the company provides API access through its revamped chat interface and collaborations with inference providers like Chutes, Nebius, and Luminal.
“You can experience Hermes 4 in the new, redesigned Nous Chat UI,” the company announced, highlighting features such as parallel interactions and a memory system.
For enterprise users and researchers, these models present a compelling alternative to paying for API access to proprietary systems, especially for applications requiring extensive customization or handling of sensitive content.
Looking Ahead: Implications of Hermes 4 for AI Development
The launch of Hermes 4 signifies more than just the introduction of a new AI model—it reflects a stance on the future of artificial intelligence. In an industry dominated by a few tech behemoths with vast resources, Nous Research has demonstrated that innovation can emerge from unexpected sources.
The company’s approach raises fundamental questions about the balance between safety and capability, corporate control and user empowerment. While major tech firms argue for cautious content moderation and safety measures in AI deployment, Nous Research contends that transparency and user autonomy outweigh corporate constraints.
Whether this philosophy proves beneficial or problematic remains to be seen. Nevertheless, one thing is certain: Hermes 4 underscores that the future of AI will not be dictated solely by entities with deep pockets.
In an industry where what was once deemed impossible quickly becomes commonplace, Nous Research has shown that perhaps the only thing more consequential than an AI that refuses is one that complies.
