Artificial Image Detection using Vanishing Point Geometry 🔍
Artificial images often break the geometric rules that real cameras follow. Even when they look similar at the pixel level, these geometric differences can make them easy to separate from natural photos. One example is vanishing points. Real scenes usually have one main vanishing point because the whole image comes from a single camera perspective. Synthetic images, by contrast, often show many vanishing points spread across the frame. This happens because generative models build scenes piece by piece and do not enforce one consistent perspective across the whole image.
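To make the idea concrete, here is a minimal sketch of the basic geometric step. It assumes straight line segments have already been extracted from the image (for example with a Hough transform or a line-segment detector, not shown). In homogeneous coordinates, the line through two points is their cross product, and the intersection of two lines is again a cross product; if the third coordinate comes out as zero, the lines are parallel in the image and the vanishing point lies at infinity. The names `line_through` and `vanishing_point` are illustrative, not from any particular library.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3-vectors in homogeneous coordinates."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(seg: Segment) -> Vec3:
    """Homogeneous line l = p1 x p2 through a segment's two endpoints."""
    (x1, y1), (x2, y2) = seg
    return cross((x1, y1, 1.0), (x2, y2, 1.0))


def vanishing_point(seg_a: Segment, seg_b: Segment, eps: float = 1e-9) -> Optional[Point]:
    """Image point where the extensions of two segments meet (v = l_a x l_b).

    Assumption: segments come from an upstream line detector (not shown).
    Returns None when the lines are parallel in the image (w == 0 up to eps),
    i.e. the vanishing point lies at infinity.
    """
    x, y, w = cross(line_through(seg_a), line_through(seg_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)


if __name__ == "__main__":
    # The lines y = x and y = 10 meet at (10, 10).
    print(vanishing_point(((0, 0), (5, 5)), ((0, 10), (5, 10))))  # (10.0, 10.0)
```

In a real photo, segments that are parallel in the 3D scene should all yield nearly the same point; under the post's hypothesis, generated images would instead produce intersections scattered over many locations.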
This addresses one of my complaints with AI-generated content. The other is the lack of scale consistency in the spaces, floor plans, and buildings AI visualizes. It is getting better (especially compared to early examples I experimented with a couple of years ago), but this analysis is helpful in quantifying exactly what is amiss.
This geometric vulnerability you've pointed out really shows how generative models still struggle with fundamental spatial logic. It's fascinating that while these systems can create incredibly detailed textures and lighting, they still trip up on basic perspective rules that our eyes naturally expect to be unified across an image.
This is a good point to consider when identifying fake content. Even in the convincing AI-generated content we see today, you can often spot cases where single-vanishing-point perspective is broken. However, just as AI has resolved the awkward finger problem over time, it seems likely that AI will soon begin generating content that properly reflects single-vanishing-point perspective as well. 😬
Some have suggested simply using this as a constraint in the model, but I'm skeptical that would work. Vanishing point consistency is a global constraint, but diffusion models build the image iteratively from noise. You can't enforce it meaningfully until the image has enough coherence, and by then the model is already committed to local decisions that might violate it. Computing it at every denoising step gets expensive, and the model wasn't designed to encode global geometric consistency anyway.
Really interesting! Synthetic images often have trouble keeping a consistent perspective, which can show up as multiple vanishing points. It would be cool to try a machine learning approach to catch this. You could extract the lines, find where they intersect to get vanishing points, and use that information to help a classifier spot generated images.
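The pipeline suggested above could start from a feature extractor like the following minimal sketch. It assumes segments are already extracted, intersects every pair of lines (skipping near-parallel pairs), groups the intersections with a deliberately simple greedy clustering, and reports how many well-supported vanishing points there are and what share of intersections falls in the largest group. `vp_features`, `radius` and `min_support` are hypothetical names with untuned values; a practical system would more likely use a robust estimator such as RANSAC and feed these numbers, alongside other features, into a classifier such as logistic regression (not shown).

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def intersect(a: Segment, b: Segment, min_sin: float = 0.05) -> Optional[Point]:
    """Intersection of the infinite lines through two segments.

    Near-parallel pairs (|sin(angle)| < min_sin) are skipped because their
    intersection is numerically unstable and usually far outside the frame.
    """
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    dx1, dy1 = x2 - x1, y2 - y1
    dx2, dy2 = x4 - x3, y4 - y3
    denom = dx1 * dy2 - dy1 * dx2  # 2D cross product of the directions
    norm = math.hypot(dx1, dy1) * math.hypot(dx2, dy2)
    if norm == 0 or abs(denom) / norm < min_sin:
        return None
    t = ((x3 - x1) * dy2 - (y3 - y1) * dx2) / denom
    return (x1 + t * dx1, y1 + t * dy1)


def vp_features(segments: List[Segment], radius: float = 20.0,
                min_support: int = 3) -> Dict[str, float]:
    """Hypothetical features describing how line intersections group into VPs.

    Greedy clustering: each intersection joins the first cluster whose mean
    lies within `radius` pixels, otherwise it starts a new cluster.
    """
    points: List[Point] = []
    for a, b in combinations(segments, 2):
        p = intersect(a, b)
        if p is not None:
            points.append(p)

    clusters: List[List[Point]] = []
    for p in points:
        for c in clusters:
            cx = sum(q[0] for q in c) / len(c)
            cy = sum(q[1] for q in c) / len(c)
            if math.hypot(p[0] - cx, p[1] - cy) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])

    strong = [c for c in clusters if len(c) >= min_support]
    largest = max((len(c) for c in clusters), default=0)
    return {
        "num_intersections": len(points),
        "num_vps": len(strong),
        "largest_share": largest / len(points) if points else 0.0,
    }
```

Under the post's hypothesis, a single-camera photo should concentrate intersections in a few clusters (high `largest_share`, low `num_vps`), while generated images would show more scatter; whether that separation holds on real data would have to be measured.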
Interesting idea, but it feels a bit like assuming “current AI models behave this way, so they always will.” Vanishing-point inconsistency isn’t some inherent flaw of generative models; it’s just a side effect of not forcing them to follow real camera geometry. If you condition the model on a single viewpoint or enforce projective constraints, producing perfectly consistent one-point-perspective synthetic images is trivial. And detecting VPs reliably isn’t guaranteed either — real photos with wide-angle lenses, distortion, or weak line structure will fail this test just as easily. So yes, this may catch today’s models, but tomorrow they’ll pass it effortlessly. Doesn’t look like a very future-proof detector. ... says chatgpt :)
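The lens-distortion point can be illustrated with the common one-coefficient radial model, a simplified form of the Brown-Conrady model, in which a point at radius r from the distortion center moves to p(1 + k1·r²). In this sketch (coordinates assumed normalized and centered on the principal point; `distort` and `max_deviation_from_chord` are hypothetical names), a straight line is sampled, distorted, and measured for how far it bows away from the chord joining its endpoints. A negative k1 gives barrel distortion, typical of wide-angle lenses, so straight edges bend and any vanishing points fitted to them are biased unless the image is undistorted first.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def distort(p: Point, k1: float, center: Point = (0.0, 0.0)) -> Point:
    """One-coefficient radial model: p_d = c + (p - c) * (1 + k1 * r^2).

    Assumption: coordinates are normalized and `center` is the principal point.
    """
    x, y = p[0] - center[0], p[1] - center[1]
    s = 1.0 + k1 * (x * x + y * y)
    return (center[0] + x * s, center[1] + y * s)


def max_deviation_from_chord(points: List[Point]) -> float:
    """Largest perpendicular distance of the points from the endpoint-to-endpoint line."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
               for x, y in points)


if __name__ == "__main__":
    # A straight horizontal line at y = 0.5, sampled from x = -1 to x = 1.
    line = [(-1 + i * 0.1, 0.5) for i in range(21)]
    print(max_deviation_from_chord([distort(p, 0.0) for p in line]))   # 0.0
    print(max_deviation_from_chord([distort(p, -0.2) for p in line]))  # ≈ 0.1
```

With k1 = -0.2 the line's midpoint sits about 0.1 normalized units off the chord, which is enough to shift a fitted intersection noticeably.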
I’m skeptical about this as both an evaluation criterion and a training constraint, because real images and videos often contain occlusions that prevent one from accurately estimating those geometric invariants.
In AI-generated content, I have observed inaccuracies in every aspect of geometric construction. Beyond failing to represent single- or multiple-vanishing-point perspective correctly, AI generally does not maintain geometric accuracy in the other systems within an image: lighting structure, scale consistency, lens distortion, object consistency, and often color consistency across the value spectrum. For example, we often see objects that don't match their cast shadows, reflections unrelated to the reflecting object, building floors of randomly different sizes, opposite sides of automobiles that are not perfectly symmetrical, and, most often of all, color saturation that is inaccurate across shading values. Some of this is, of course, harder to detect in purely organic subject matter.
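The cast-shadow inconsistency can be quantified with the same kind of geometry. Assuming a single point light source, the image lines running from each cast-shadow point through the object point that casts it should all pass through one point, the projected position of the light (for sunlight, the vanishing point of the light direction). The minimal sketch below, using the hypothetical function `convergence_point`, finds the least-squares point closest to all such lines and reports the RMS distance of the lines from it; a large residual suggests the shadows are not consistent with one light. The shadow/object correspondences are assumed to be picked by hand.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def convergence_point(pairs: List[Tuple[Point, Point]]) -> Tuple[Point, float]:
    """Least-squares point closest to all lines through (shadow_point, object_point).

    Assumption: a single point light, and each shadow point is paired with the
    object point that casts it. Returns the point and the RMS line distance.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    lines = []
    for (sx, sy), (ox, oy) in pairs:
        dx, dy = ox - sx, oy - sy
        length = math.hypot(dx, dy)
        if length == 0:
            raise ValueError("shadow and object points must differ")
        nx, ny = -dy / length, dx / length  # unit normal of this line
        c = nx * sx + ny * sy               # line equation: n . p = c
        lines.append((nx, ny, c))
        # Accumulate the normal equations (sum n n^T) p = sum n c.
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; convergence point is at infinity")
    px = (a22 * b1 - a12 * b2) / det
    py = (a11 * b2 - a12 * b1) / det
    rms = math.sqrt(sum((nx * px + ny * py - c) ** 2 for nx, ny, c in lines) / len(lines))
    return (px, py), rms
```

The residual is only meaningful if correspondences are chosen carefully: pairing a shadow point with the wrong object point will produce a large residual even in a genuine photo.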