AI Security Incident Case: Both Grok and Gemini Hallucinated When Verifying Minab Cemetery Photo

by NSFOCUS on June 8, 2026

Overview

A core risk within AI security threats lies in the reliability of AI models, manifested as distorted outputs, hallucinations, and the generation of misleading content. While these issues may seem like mere technical flaws, they have evolved into tangible harms in real-world information ecosystems. AI-generated misinformation can be presented as authoritative analysis, contaminate fact-checking processes, and even be exploited to deny real-world events.

Incident Recap

In March 2026, an airstrike hit a school in Minab, southern Iran. Photographs of a cemetery for the victims began circulating widely on social media shortly afterwards. The images showed dozens of freshly dug graves arranged in orderly rows, with additional burial plots marked out in chalk on the ground. The photograph soon became one of the most direct visual records documenting civilian casualties in the conflict.

The controversy emerged when users submitted the cemetery photograph to Gemini and Grok for authenticity verification. Both models produced confident but entirely incorrect assessments.

Gemini claimed that the image had been taken two years earlier in Kahramanmaraş, Turkey, depicting mass burials following the devastating magnitude-7.8 earthquake in 2023 and bearing no connection to Iran.

Grok, meanwhile, asserted that the image originated in Jakarta, Indonesia, and was an archival photograph from large-scale COVID-19 funerals conducted during the pandemic in 2021.

Both answers were delivered with high confidence and cited alleged “original sources,” and both were fabricated. Attempts to trace the cited material led either to images that did not exist or to reports that could not be found.

Subsequent cross-verification by researchers against satellite imagery confirmed the photos were indeed captured in Minab. Multiple images and video clips from different angles showed no signs of digital tampering.

Journalists from The Guardian later challenged Gemini regarding its assessment. After being informed that its answer was incorrect, Gemini revised its conclusion, claiming the image was taken in Gaza in November 2023. When corrected again, it suggested the photograph originated from Tehran during the COVID-19 pandemic. After yet another correction, it said the image had been taken after an earthquake in southern Iran.

Throughout the exchange, the model consistently supplied specific dates, locations, and contextual narratives. It remained highly confident at every step, yet every statement was incorrect.

Threat Analysis

This case exposes more than just AI hallucinations. The real danger stems from the combination of hallucinations and the authoritative tone adopted by AI outputs.

Large language models (LLMs) generate responses based on statistical probability rather than factual verification. For tasks such as image provenance analysis, which require actual retrieval and validation capabilities, models often prioritize generating a plausible-sounding explanation over acknowledging uncertainty.

This creates a particularly hazardous failure mode: misinformation is packaged in the format of an investigative report, complete with dates, locations, contextual details, and even fabricated source references.

For many users, such formal presentation inherently lends false credibility. According to Shayan Sardarizadeh, Senior Journalist on the BBC Verify team, over half of the viral misinformation cases currently investigated by fact-checkers involve AI-generated content. Fact-checking teams increasingly face a dual burden: verifying both the original false claims and the misleading AI-generated verification results produced in response to them. Each requires independent validation from scratch.

Even more concerning, AI-generated “fact checks” can become tools for atrocity denial. When authentic photographs of casualties are labeled as fake, or when verified events are confidently dismissed as misinformation, accountability becomes obscured. Families of victims may find themselves confronted with AI-generated claims suggesting that their loved ones never died in the first place.

Findings from an international study published in 2025 highlight the severity of the issue. Approximately half of AI-generated summaries contained at least one significant sourcing or factual error, with Gemini exhibiting an error rate as high as 76%. At the same time, the number of people relying on generative AI for information retrieval doubled over the preceding year. Taken together, these trends suggest a measurable deterioration of the current information ecosystem.

Discrepancy Between Confidence and Accuracy

A structural weakness common to current AI systems is the absence of a reliable relationship between output confidence and factual accuracy. Regardless of whether an answer is correct, models frequently present conclusions with identical certainty and tone.
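One common way to quantify the gap described above is expected calibration error (ECE): answers are grouped by stated confidence, and each group's average confidence is compared with the fraction of answers that were actually correct. The sketch below is a minimal illustration, not a measurement of any real model. The function name, the bin count, and the sample numbers are all assumptions made for the example.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one answer is required")

    # Group answers into equal-width confidence bins: [0, 0.1), [0.1, 0.2), ...
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes its confidence/accuracy gap, weighted by its size.
        ece += (len(bucket) / total) * abs(mean_conf - accuracy)
    return ece


# Made-up data: four answers stated at 95% confidence, all of them wrong.
overconfident = expected_calibration_error([0.95] * 4, [False] * 4)

# Made-up data: four answers stated at 75% confidence, three of them right.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
```

With these illustrative inputs, the first case yields an error of about 0.95 and the second yields 0. A system that sounds equally certain whether it is right or wrong will score closer to the first case than to the second.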

This issue stems from the core design of LLMs. Fundamentally, these systems are optimized to predict the most probable next token in a sequence, enabling them to generate coherent, natural, and contextually appropriate text. They are not inherently designed to verify facts or trace sources. As a result, when faced with questions beyond their knowledge scope or tasks requiring real-time retrieval and external evidence, models may still generate logically consistent and seemingly reasonable responses, rather than openly acknowledging uncertainty.
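The mechanism can be shown with a toy example. The sketch below assumes a simplified greedy decoder over four hypothetical candidate continuations. The candidate names and logits are invented, and real models work over far larger vocabularies and usually sample rather than always taking the top token. The point it illustrates is narrow: the decoding step returns a token whether the underlying distribution is sharply peaked or nearly flat, and the emitted text carries no trace of that difference.

```python
import math


def softmax(logits: list) -> list:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: list) -> float:
    """Shannon entropy in bits; higher means the distribution is flatter."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def greedy_pick(candidates: list, logits: list) -> tuple:
    """Return (chosen candidate, its probability, entropy of the distribution)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return candidates[best], probs[best], entropy_bits(probs)


# Hypothetical candidates and made-up logits, for illustration only.
places = ["place_a", "place_b", "place_c", "place_d"]

# One candidate clearly dominates: low entropy, high top probability.
well_supported = greedy_pick(places, [6.0, 0.5, 0.5, 0.5])

# Nearly flat scores: the decoder still returns an answer, but it is close to a guess.
near_guess = greedy_pick(places, [1.0, 1.1, 1.0, 1.0])
```

In both calls the decoder names a single place. Only the probability and entropy values, which a chat interface does not normally show to the user, reveal that the second answer is barely better than a coin toss among four options.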

In public information scenarios, the coexistence of high confidence and low accuracy easily misleads people. Users often interpret well-structured, professionally worded, and assertive responses as verified facts, thereby amplifying both the dissemination risk and decision-making risk associated with AI hallucinations.

Recommendations

Do not treat AI fact-checks as final judgments, especially for tasks such as image provenance tracing and real-time event verification, which require genuine retrieval capabilities.

Specific details provided by AI systems—including dates, locations, and cited sources—must be independently verified. A well-formatted output does not guarantee factual accuracy.

Adopt a default skeptical attitude towards AI outputs.

Any conclusions generated by AI should be treated as unverified claims rather than established facts until the information has been independently cross-validated through reliable sources. AI systems should also be encouraged to communicate uncertainty explicitly when confidence is low or evidence is insufficient.

For internal AI-assisted information processing within organizations, clearly label all AI-involved segments and mandate manual review procedures.

AI-generated content must not bypass human verification, particularly when the output may be publicly disseminated or used to support fact-checking conclusions.
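The labeling and review recommendations above could be enforced in an internal tool along the following lines. This is a minimal sketch under stated assumptions, not a description of any existing product. The class names, the two-source threshold, and the sample strings are all hypothetical and would need to be adapted to an organization's own editorial policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Status(Enum):
    UNVERIFIED = "unverified"
    VERIFIED = "verified"
    REFUTED = "refuted"


@dataclass
class AIClaim:
    """An AI-generated statement that stays unverified until a human signs off."""
    text: str
    model: str
    status: Status = Status.UNVERIFIED
    confirming_sources: Set[str] = field(default_factory=set)
    reviewer: Optional[str] = None

    def add_confirmation(self, source: str) -> None:
        # Record an independent source that a human has checked against the claim.
        self.confirming_sources.add(source)

    def sign_off(self, reviewer: str, min_sources: int = 2) -> None:
        # Assumed policy: at least `min_sources` independent confirmations.
        if len(self.confirming_sources) < min_sources:
            raise ValueError("not enough independent confirmations")
        self.reviewer = reviewer
        self.status = Status.VERIFIED

    def refute(self, reviewer: str) -> None:
        self.reviewer = reviewer
        self.status = Status.REFUTED

    def label(self) -> str:
        # Every rendering of the claim carries its AI origin and review status.
        return f"[AI-generated by {self.model}; {self.status.value}] {self.text}"


def can_publish(claim: AIClaim) -> bool:
    """Only human-reviewed, verified claims may leave the organization."""
    return claim.status is Status.VERIFIED and claim.reviewer is not None


# Hypothetical walk-through with placeholder values.
claim = AIClaim(text="Model's stated photo location", model="example-model")
blocked_before_review = can_publish(claim)

claim.add_confirmation("satellite imagery comparison")
claim.add_confirmation("footage from a second angle")
claim.sign_off(reviewer="desk editor")
allowed_after_review = can_publish(claim)
```

The design choice here is that "unverified" is the default state: a claim can only become publishable through an explicit human action, so forgetting a step fails closed rather than open.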

The Minab cemetery photograph incident is not merely an isolated case of model hallucination. It serves as a broader illustration of the growing credibility challenges facing AI-generated information. As generative AI becomes an increasingly common tool for information retrieval and fact verification, the risks associated with model outputs continue to expand. At its core, the problem lies in the disconnect between the format of AI output and its factual reliability. To address this challenge, technical improvements are needed to help models better express uncertainty. On the user side, standardized workflows for manual review and independent verification must be established. AI should be treated as an auxiliary tool, not an ultimate arbiter of truth.