VideoGameBench Finds AI Models’ Struggles With Retro Games



Written by Aaron S., Editor-In-Chief

Last Updated: April 23, 2025

Key Takeaways

VideoGameBench shows that top AI models still struggle with older games like Doom because of slow response times and poor in-game decisions.
The benchmark uses Game Boy and MS-DOS titles with simple graphics to test how well AI models can process visuals and react in real time.
AI models often miss key tasks or repeat the same actions, which suggests they are not yet ready for fast-paced gaming challenges.



VideoGameBench, a new benchmark built to test how well artificial intelligence (AI) models can play video games, has revealed that even advanced models still struggle with older, simpler titles.

The benchmark was designed to evaluate vision-language models such as GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro on a set of 20 popular games, including Doom, Prince of Persia, and Warcraft II.

Instead of relying on code or special inputs, the models are given only the visual game screen to decide their next move. The AI takes a screenshot, analyzes it, suggests an action, and then tries to carry it out.
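The screenshot, analyze, act cycle described above can be pictured with a short sketch. This is only an illustration under stated assumptions, not the benchmark's actual code: `ToyGame`, `toy_model`, and `run_episode` are hypothetical names, the "screen" is a line of text rather than pixels, and the "model" is a hard-coded rule standing in for a real vision-language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyGame:
    """Hypothetical stand-in for an emulator (not part of VideoGameBench)."""
    position: int = 0
    goal: int = 3
    history: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return pixels; text keeps the sketch runnable.
        return f"player at {self.position}, door at {self.goal}"

    def press(self, button: str) -> None:
        # Record the button and move the player one step.
        self.history.append(button)
        if button == "RIGHT":
            self.position += 1
        elif button == "LEFT":
            self.position -= 1

    def done(self) -> bool:
        return self.position == self.goal


def toy_model(screen: str) -> str:
    """Hypothetical stand-in for a vision-language model call."""
    parts = screen.replace(",", "").split()
    player, door = int(parts[2]), int(parts[5])
    return "RIGHT" if player < door else "LEFT"


def run_episode(game: ToyGame, model: Callable[[str], str], max_steps: int = 10) -> int:
    """Repeat: take a screenshot, ask the model for an action, carry it out."""
    steps = 0
    while not game.done() and steps < max_steps:
        screen = game.screenshot()   # 1. capture what is on screen
        action = model(screen)       # 2. analyze it and suggest an action
        game.press(action)           # 3. try to carry the action out
        steps += 1
    return steps


game = ToyGame()
steps_taken = run_episode(game, toy_model)
print(steps_taken, game.history)
```

In this toy setup the game waits for the model between steps. A real-time game does not, which is where response time starts to matter.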


The delay between capturing a screenshot and carrying out an action is especially noticeable in fast-paced games like Doom, where quick reactions are key. If the AI takes too long to respond, the situation on the screen has already changed, which makes its decision outdated. For example, an enemy might have moved, or the player may already be in danger before the model responds.
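A rough back-of-the-envelope calculation shows why slow responses make decisions outdated. The numbers below are illustrative assumptions, not measurements from the benchmark: a 2-second model response, a game updating 35 times per second, and an enemy moving 100 pixels per second. The function names are hypothetical.

```python
import math


def frames_elapsed(latency_s: float, updates_per_s: float) -> int:
    """Whole game updates that pass while the model is still deciding."""
    return math.floor(latency_s * updates_per_s)


def enemy_drift_px(latency_s: float, enemy_speed_px_per_s: float) -> float:
    """How far an enemy moving at constant speed travels during the delay."""
    return latency_s * enemy_speed_px_per_s


# Illustrative, assumed values (not taken from the benchmark).
latency = 2.0        # seconds the model takes to respond
update_rate = 35.0   # game updates per second
enemy_speed = 100.0  # pixels per second

print(frames_elapsed(latency, update_rate))   # game updates missed
print(enemy_drift_px(latency, enemy_speed))   # pixels the enemy has moved
```

Under these assumptions the game advances 70 updates and the enemy moves 200 pixels before the chosen action lands, so the screenshot the model reasoned about no longer matches the screen.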

According to the research team, current models are not only slow to react but also struggle with basic tasks. They often miss items, fail to interact with the environment properly, or keep repeating the same actions without making progress.

The team used older Game Boy and MS-DOS games because their simple graphics and variety of control types provide a good way to test how well models understand space and timing.

The benchmark was developed by computer scientist Alex Zhang, who explained that these games help reveal how much work is still needed before AI can play games reliably in real time.
