Based on work presented in Squint Hard Enough:
Attacking Perceptual Hashing with Adversarial Machine Learning

Authored by Jonathan Prokos*, Neil Fendley, Matthew Green,
Roei Schuster§, Eran Tromer, Tushar Jois, Yinzhi Cao

*Two Six Technologies, Johns Hopkins University Applied Physics Laboratory,
Johns Hopkins University, §Vector Institute, Tel Aviv University, Columbia University, City College of New York

USENIX Security 2023 Presentation

tl;dr: End-to-end (e2e) encrypted messaging systems have become widely adopted. In a rush to provide content moderation, providers have begun using perceptual hash functions to detect illicit content. However, we show that perceptual hashing is vulnerable to adversarial attacks, introducing a side channel into these e2e systems.


What is Perceptual Hashing?

To start, a hash function is a function that takes input of arbitrary length and produces an output of a fixed length. This is useful for compressing data or verifying its integrity. Most people are familiar with cryptographic hash functions such as SHA-256 or MD5. Cryptographic hash functions are one-way functions, meaning it is computationally infeasible to find an input that produces a given output. They are also designed to be collision resistant, meaning it is computationally infeasible to find two inputs that produce the same output. Perceptual hash functions (PHFs) do not share these cryptographic guarantees.
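
To see the one-way behavior concretely, here is a quick sketch using Python's standard hashlib module: changing a single byte of the input changes nearly every hex digit of the SHA-256 digest (the avalanche effect expected of a cryptographic hash).

```python
import hashlib

# Two inputs differing by a single byte.
m1 = b"the quick brown fox"
m2 = b"the quick brown fox!"

h1 = hashlib.sha256(m1).hexdigest()
h2 = hashlib.sha256(m2).hexdigest()
print(h1)
print(h2)

# Nearly all 64 hex digits differ despite the one-byte change.
diff = sum(a != b for a, b in zip(h1, h2))
print(f"{diff}/64 hex digits differ")
```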

Although we show that perceptual hash functions are vulnerable to adversarial attacks, they remain useful for their intended purpose. The "perceptual" part of the name alludes to the design goal of embedding image semantics into the resulting digest, which makes the hash locality sensitive. To understand what this means, observe the example below:

PHF: Perceptual Hash Function (PhotoDNA) | SHA: SHA-256, a Cryptographic Hash Function

The table above shows the inputs (top row) and outputs (table body) for the two types of hash functions (left column). The inputs are a set of cat images, where cat1' is an occluded version of cat1 with a red square placed on top using MS Paint.

Because perceptual hash functions are designed to be locality sensitive, the red square barely changes the digest: the square is not semantically important. The cryptographic hash function is not locality sensitive, so the same red square changes its output drastically. This locality sensitivity is the key property that makes perceptual hash functions useful for so many applications.
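
PhotoDNA itself is not publicly available, so as an illustration here is a rough recreation of the cat1 / cat1' comparison using pHash from the open-source Python imagehash library as a stand-in PHF (the file name and square coordinates below are placeholders):

```python
from PIL import Image, ImageDraw
import imagehash  # pip install ImageHash

# Hypothetical input image standing in for cat1.
cat1 = Image.open("cat1.png")

# Recreate cat1': occlude a copy of cat1 with a red square.
cat1_occluded = cat1.copy()
ImageDraw.Draw(cat1_occluded).rectangle([20, 20, 80, 80], fill="red")

h_orig = imagehash.phash(cat1)
h_occl = imagehash.phash(cat1_occluded)

# imagehash overloads subtraction to return the Hamming distance between
# digests: small for perceptually similar images, large otherwise.
print(h_orig, h_occl, "distance:", h_orig - h_occl)
```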


How are PHFs used?

Cryptographic hash functions - such as the SHA-256 shown above - have many applications in privacy and security. However, their lack of locality sensitivity makes them a poor fit for content moderation: trivially modifying an image produces a completely different digest, defeating any exact-match lookup. Perceptual hash functions fill this gap.

Imagine the scenario illustrated above: client A wants to send an image to client B through some provider (Facebook in this example). Before sending this content across its network, the provider first checks the image for illicit content.

The provider receives the image from client A and computes its perceptual hash. This hash is compared against a database of known illicit hashes; if a match is found, the provider refuses to deliver the image to client B. If no match is found, the image is sent along to client B.

This setup works well for detecting known illicit content, but what if we want to capture unknown illicit content? To achieve this, providers additionally use neural networks to classify arbitrary images as illicit before sending them along to client B. A sketch of this pipeline appears below.
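
Putting the two paragraphs above together, here is a minimal sketch of the server-side check. The helper names, the classifier stub, and the Hamming-distance threshold are all our own illustrative choices (again using pHash as a stand-in, since real deployments and thresholds are not public):

```python
from typing import Iterable
from PIL import Image
import imagehash

# Illustrative threshold -- real providers' match thresholds are not public.
MATCH_THRESHOLD = 10

def matches_known_illicit(img: Image.Image,
                          db: Iterable[imagehash.ImageHash]) -> bool:
    """Compare the image's perceptual hash against known illicit hashes."""
    h = imagehash.phash(img)
    return any(h - known <= MATCH_THRESHOLD for known in db)

def classifier_flags(img: Image.Image) -> bool:
    """Stub for a provider's neural-network classifier for unknown content."""
    return False  # hypothetical: run a trained model and threshold its score

def forward_to_recipient(img: Image.Image, db) -> bool:
    """Return True if the provider should deliver the image to client B."""
    if matches_known_illicit(img, db):
        return False  # matches a known illicit hash: block
    if classifier_flags(img):
        return False  # flagged by the classifier: block
    return True       # clean as far as the provider can tell: deliver
```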

More updates coming soon...


Below you can find two GIFs showing the progress of our attack over time with PhotoDNA or PDQ as the target perceptual hash function. The attack's progression is shown on the left; the target image and its hash are shown on the right.

PhotoDNA
PDQ

USENIX Slides (there are speaker notes!)


The following figures are pulled from our USENIX submission. They will be included in the forthcoming ePrint update.

Figure 1 shows the results of our targeted second-preimage attack at both low and high threshold levels. Figure 2 shows our detection-avoidance attack at the baseline collision threshold as well as at a significantly higher threshold. Figures 3 & 4 show the progress of our targeted second-preimage attack as the collision threshold is lowered. Figures 5 & 6 compare this attack's performance under various gradient optimization methods.
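
For a sense of how such an attack is mounted, here is a minimal PyTorch sketch of a targeted second-preimage attack. It assumes access to a differentiable surrogate phf of the target hash function; the loss, the Adam optimizer, and the epsilon bound are illustrative choices, not the exact configuration evaluated in the paper.

```python
import torch

def second_preimage_attack(phf, src, target_hash,
                           steps=1000, lr=0.01, eps=0.05):
    """Perturb `src` so its perceptual hash collides with `target_hash`.

    phf         -- differentiable surrogate of the hash (image -> vector)
    src         -- source image tensor in [0, 1], shape (C, H, W)
    target_hash -- hash vector of the image we want to collide with
    eps         -- L-infinity bound keeping the perturbation subtle
    """
    delta = torch.zeros_like(src, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        adv = (src + delta).clamp(0, 1)
        # Drive the adversarial image's hash toward the target hash.
        loss = torch.norm(phf(adv) - target_hash)
        loss.backward()
        opt.step()
        # Project the perturbation back into the epsilon-ball around src.
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    return (src + delta).detach().clamp(0, 1)
```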