It is also possible for hackers to change an image’s AI fingerprint so it falsely appears to come from a different model. This could be used to wrongly blame legitimate tech companies for harmful images their systems never actually created.
Edinburgh experts say improvements in AI fingerprinting techniques combined with watermarking of AI-generated images would strengthen the detection of deepfakes.
Tackling misinformation
Generative AI is now capable of creating images nearly indistinguishable from real photos, raising concerns about the use of these technologies for scams and misinformation campaigns.
One promising approach to mitigating these risks is AI fingerprinting – a group of techniques that detect the unique, invisible traces AI models leave in their images, helping to identify the specific generator that produced them. Removing these fingerprints would therefore hinder forensic investigations into deepfakes.
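To make the idea concrete, the sketch below shows one simple, entirely illustrative way a fingerprint detector might work: extract the high-frequency noise residual of an image and match it against precomputed reference fingerprints for known generators. The median-filter residual and the correlation scoring here are assumptions for illustration, not the specific methods evaluated in the study.

```python
import numpy as np
from scipy.ndimage import median_filter

def noise_residual(image: np.ndarray) -> np.ndarray:
    """Return the high-frequency residual of a greyscale image.

    Subtracting a denoised copy leaves the subtle noise pattern
    where model-specific traces tend to live.
    """
    denoised = median_filter(image, size=3)
    return image.astype(np.float64) - denoised.astype(np.float64)

def attribute(image: np.ndarray, fingerprints: dict) -> str:
    """Attribute an image to the generator whose reference fingerprint
    (hypothetical, precomputed per model) best matches its residual."""
    residual = noise_residual(image)
    residual /= np.linalg.norm(residual) + 1e-12  # normalise for comparison

    def score(fp: np.ndarray) -> float:
        # Cosine-style similarity between the residual and a reference.
        return float(np.sum(residual * (fp / (np.linalg.norm(fp) + 1e-12))))

    return max(fingerprints, key=lambda model: score(fingerprints[model]))
```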
Scientists from the University have found that these fingerprints can be removed or manipulated using various modes of attack.
Evaluating security
In the first part of the study, the researchers carried out a security evaluation of fingerprinting techniques for generative AI. They then developed adversarial attacks aimed at removing or forging fingerprints across a range of threat scenarios.
These scenarios ranged from powerful attackers with full access to the inner workings of the image generator to low-resource attackers with no special access.
Fingerprint removal
The scientists simulated these attacks on 12 image generators and 14 fingerprinting methods in the largest evaluation of such techniques to date.
Many fingerprinting methods were found to achieve high accuracy in detecting unaltered deepfake images, but their performance dropped dramatically once an image was attacked.
Fingerprint removal was found to be highly effective, often achieving more than 80 per cent success for attackers with full knowledge of an image generator and just over 50 per cent for simple attacks with no knowledge of the generator’s inner workings.
In several cases, simple changes to an image, such as JPEG compression, resizing or blurring, were enough to 'smudge' the fingerprints.
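Each of these perturbations is straightforward with standard tooling. The sketch below, assuming the Pillow imaging library, applies all three; the quality, scale and radius values are illustrative defaults, not the settings used in the study.

```python
import io
from PIL import Image, ImageFilter

def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode as JPEG; lossy compression disturbs subtle noise patterns."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()

def resize_cycle(img: Image.Image, scale: float = 0.5) -> Image.Image:
    """Downscale and upscale back; interpolation resamples the pixel grid."""
    w, h = img.size
    small = img.resize((max(1, int(w * scale)), max(1, int(h * scale))),
                       Image.BICUBIC)
    return small.resize((w, h), Image.BICUBIC)

def gaussian_blur(img: Image.Image, radius: float = 1.0) -> Image.Image:
    """Apply a mild Gaussian blur, averaging away high-frequency detail."""
    return img.filter(ImageFilter.GaussianBlur(radius))
```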
Vulnerable systems
Fingerprint forgery – misrepresentation of the AI model used to generate the image – was less effective than removal overall, but half of the image generators evaluated were vulnerable to this kind of attack.
All attacks were imperceptible to the human eye, leaving no visible evidence on the images. None of the evaluated fingerprinting techniques delivered both high accuracy and resistance to attack across all threat scenarios.
Hidden signatures
By pinpointing where and why current approaches fail, the study should help researchers build stronger methods of deepfake detection, especially when fingerprinting is paired with watermarking – the process of embedding a hidden digital signature into AI-generated content.
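As a rough illustration of what embedding a hidden digital signature means, the toy scheme below hides a bit string in the least significant bits of an image's pixel values. Production watermarks are far more sophisticated and are designed to survive compression and resizing, which this sketch is not; it is an assumption-laden illustration only.

```python
import numpy as np

def embed_watermark(pixels: np.ndarray, bits: list) -> np.ndarray:
    """Hide a bit string in the least significant bits of the first
    len(bits) pixel values (toy scheme, not robust to re-encoding)."""
    out = pixels.flatten().copy()
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | b  # clear the lowest bit, then set it to b
    return out.reshape(pixels.shape)

def extract_watermark(pixels: np.ndarray, n_bits: int) -> list:
    """Read back the first n_bits least significant bits."""
    return [int(v) & 1 for v in pixels.flatten()[:n_bits]]
```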
The findings from this work were peer reviewed and will be presented at the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) in Munich. A copy of the final version of the paper is available here: https://arxiv.org/abs/2512.11771
This work was supported by the Edinburgh International Data Facility, the Data-Driven Innovation Programme and the Generative AI Laboratory, all at the University of Edinburgh.