Breaking Perceptual Hashing
Based on work presented in
Squint Hard Enough:
Attacking
Perceptual Hashing with Adversarial Machine Learning
Authored by
Jonathan
Prokos ,
Neil Fendley ,
Matthew Green ,
Roei Schuster ,
Eran Tromer ,
Tushar Jois ,
Yinzhi Cao
*Two Six Technologies,
†Johns Hopkins University Applied Physics Laboratory,
‡Johns Hopkins University,
§Vector Institute,
‖Tel Aviv University,
◊Columbia University,
¶City College of New York
tldr; End-to-end (e2e) encrypted messaging systems are becoming increasingly popular. Perceptual hashing is being used to detect illicit content in these systems. We show perceptual hashing is vulnerable to adversarial attacks, introducing a side-channel in these e2e systems.
Perceptual hashing?
To start, a hash function is a function that takes an arbitrary input of and produces an output of a fixed length. This is useful for compressing data or verifying its integrity. Most people are familiar with cryptographic hash functions such as SHA-256 or MD5. Cryptographic hash functions are one-way functions meaning that it is computationally infeasible to find an input that produces a given output. Additionally, these algorithms are designed to be collision resistant meaning that it is computationally infeasible to find two inputs that produce the same output. Perceptual hash functions share neither of these properties.