GPS for GPUs

Accurately tracking supercomputer clusters around the world

Mar 08, 2023

NPTs -- or nuclear non-proliferation treaties -- are agreements between countries that aim to prevent the spread of nuclear weapons and promote peaceful uses of nuclear energy. One critical component of an NPT is the verification of nuclear activities, which involves monitoring and inspecting nuclear facilities.

If AI is as important as nuclear technology, we might want a way to verify where and how it is being developed and used, by identifying the locations of the various supercomputers. How many are owned by Amazon? How many have been exported to China? And so forth.

GPS has a fundamental issue: it is based on trust. A device looks up into the sky, finds 3-4 satellites, calculates its position from the delays of the received signals, and then reports that position itself. This is a one-way conversation that can easily be spoofed, because nobody checks the device's claim. There are ways of developing a "secure location" chip which we could embed in each GPU cluster.

One example is how an Apple Watch unlocks a MacBook using Bluetooth. As an added layer of security, the Mac uses the speed of light to verify that the watch is actually near the laptop. It sends an additional request to the watch over wireless and times how long the reply takes to arrive. Since the Mac knows the watch has to be within 3 meters, the allowed delay has a very tight tolerance. Theoretically, a malicious person could intercept the Bluetooth signal and relay it to pretend to be the watch. However, the added packet delay would be too great, so the Mac would not unlock.

This is a basic "distance bounding protocol".
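The arithmetic behind such a check is small enough to sketch. The function names below are hypothetical, and the fixed `processing_s` allowance is an assumption about how long the responder takes to reply; this is an illustration of the idea, not how Apple implements it.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def max_distance_m(round_trip_s: float, processing_s: float = 0.0) -> float:
    """Upper bound on how far away the responder can be.

    The signal travels out and back, so only half of the flight time
    counts towards distance. processing_s is the verifier's own fixed
    allowance for the responder's reply time (an assumption here).
    """
    flight_s = max(round_trip_s - processing_s, 0.0)
    return SPEED_OF_LIGHT_M_PER_S * flight_s / 2


def within_bound(round_trip_s: float, limit_m: float, processing_s: float = 0.0) -> bool:
    """True if the measured round trip is consistent with being within limit_m."""
    return max_distance_m(round_trip_s, processing_s) <= limit_m
```

A 20 nanosecond round trip is consistent with a 3 meter limit, while a relay that adds even one microsecond pushes the implied distance past 150 meters and fails the check.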

Say an NVIDIA H100 cluster has been purchased by a Turkish company. We want to ensure the cluster remains in Ankara and is not exported elsewhere. Since we know where the cluster is supposed to be, we also know which satellite is nearest to it at any given time of day.

The protocol could work like this:

A specific satellite sends a cryptographically signed challenge request to the H100 cluster.
The cluster sends a response back to the satellite as fast as it can.
The satellite times the arrival of the response.
The satellite compares the measured delay to the expected delay based on its distance to the H100 cluster.
If the delay is too large, then the satellite sends a warning to a central server.
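The steps above can be sketched in a few functions. Everything here is illustrative: the function names, the message prefixes, and the use of a shared-key HMAC as a stand-in for the "cryptographically signed" challenge are assumptions, and a real design would more likely use asymmetric signatures. The measured round-trip time is passed in as a number rather than taken from a radio link.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def sign(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 tag, standing in for a real signature scheme."""
    return hmac.new(key, message, hashlib.sha256).digest()


def make_challenge(key: bytes) -> Tuple[bytes, bytes]:
    """Satellite side: a fresh random nonce plus a tag over it."""
    nonce = os.urandom(16)
    return nonce, sign(key, b"challenge" + nonce)


def respond(key: bytes, nonce: bytes, tag: bytes) -> Optional[bytes]:
    """Cluster side: answer only challenges that carry a valid tag."""
    if not hmac.compare_digest(tag, sign(key, b"challenge" + nonce)):
        return None
    return sign(key, b"response" + nonce)


def expected_rtt_s(distance_m: float, processing_s: float) -> float:
    """Round-trip flight time at light speed plus a fixed reply allowance."""
    return 2 * distance_m / SPEED_OF_LIGHT_M_PER_S + processing_s


def check(key: bytes, nonce: bytes, response: Optional[bytes],
          measured_rtt_s: float, distance_m: float,
          processing_s: float, tolerance_s: float) -> str:
    """Satellite side: returns "ok", "bad_response", or "too_slow"."""
    expected = sign(key, b"response" + nonce)
    if response is None or not hmac.compare_digest(response, expected):
        return "bad_response"
    if measured_rtt_s > expected_rtt_s(distance_m, processing_s) + tolerance_s:
        return "too_slow"  # this is the case that triggers the warning
    return "ok"
```

For a satellite assumed to be 600 km from the cluster, the expected round trip is about 4 milliseconds, so an extra 5 milliseconds of delay would correspond to roughly 750 km of additional distance. The check is one-sided: it bounds the distance from above, so it can show that a cluster is too far from the satellite but not pin down its position on its own.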

This solution is not foolproof. An attacker could refuse to respond or remove the chip. It is hard to guarantee an H100 cluster can't be "jailbroken", but the protocol would at least let governments know how many "missing" clusters have been exported and where they were last located.

Governments could ask GPU manufacturers to embed this chip in the clusters, so they can be tracked in case of misuse. This could help reduce the threat of stolen GPUs being used in illegal activities or malicious deep learning research.

[1] To mitigate an attack where the location chip is removed from the cluster, the challenge request could include a workload that has to be solved by the cluster in a predictable amount of time, which would make spoofing a little bit more tricky.
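As a rough sketch of that idea, the challenge could carry a seed and an iteration count, and the verifier could accept only a correct answer that arrives within a time budget. The sequential SHA-256 chain below is a placeholder workload chosen for simplicity, and the function names are hypothetical; a real scheme would need a workload that only the genuine cluster hardware can finish in time.

```python
import hashlib
import hmac


def hash_chain(seed: bytes, iterations: int) -> bytes:
    """Apply SHA-256 repeatedly; each step depends on the previous one."""
    digest = seed
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest


def check_workload(seed: bytes, iterations: int, answer: bytes,
                   elapsed_s: float, budget_s: float) -> bool:
    """Accept only a correct answer delivered within the time budget."""
    on_time = elapsed_s <= budget_s
    correct = hmac.compare_digest(answer, hash_chain(seed, iterations))
    return on_time and correct
```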

[2] There is still the gaping open issue of supervising what the cluster does (versus where it is). Perhaps the chip could also have a limited view into the cluster's memory and report any suspicious patterns or anomalies, but this is obviously a much harder problem.

[3] Thank you to Yonadav Shavit for hive-minding this idea with me a few weeks ago.

Daniel's Corner
