News

Opus 4 is Anthropic’s new crown jewel, hailed by the company as its most powerful effort yet and the “world’s best coding ...
Anthropic’s latest AI model, Claude Opus 4, has surpassed OpenAI’s GPT-4.1 in coding abilities, marking a significant shift ...
As AI capabilities continue advancing, researchers are developing evaluation methods that test for genuine understanding.
Imagine asking a conversational bot like Claude or ChatGPT a legal question in Greek about local traffic regulations. Within ...
'Dieselgate' scandal, new research suggests that AI language models such as GPT-4, Claude, and Gemini may change their ...
When tested, Anthropic’s Claude Opus 4 displayed troubling behavior when placed in a fictional work scenario. The model was ...
The new Gemini 2.5 Pro shows a 24-point Elo score increase on LMArena, holding a top score of 1470 and maintaining its ...
The Allen Institute for AI updated its reward model evaluation benchmark, RewardBench, to better reflect real-life scenarios for enterprises.
As large language models (LLMs) rapidly evolve, so does their promise as powerful research assistants. Increasingly, they’re ...
Alibaba introduces a new benchmark aimed at evaluating how well AI translation systems perform in real-world industry ...
Apple's WWDC keynote is next week. You may have heard the rumours about a significant redesign and naming scheme changes for iOS, ...