BIP37 privacy problems
BIP37 is a protocol for lightweight Bitcoin wallets. It was designed with privacy in mind, but serious design flaws leave its users with essentially no privacy. Using it is roughly equivalent to sending all of the wallet's addresses to a random server, which can then easily spy on the wallet.
BIP37 has at least three major flaws:
- Due to a design flaw, the stated false positive rate (fp) is reduced to fp^2. For example, a stated false positive rate of 20% yields an effective false positive rate of only 4%.
- The same wallet may produce different bloom filters. If an adversary collects two or more of these filters, it can compute their intersection, which removes false positives.
- If a bloom filter is combined with information about the transaction graph on the blockchain, the adversary can eliminate false positive addresses by observing that they don't make any transactions to other addresses matched by the filter.
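The intersection attack from the second point can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the BIP37 wire format: `ToyBloom`, `build_filter` and `matches` are hypothetical names, SHA-256 stands in for the MurmurHash3 function that BIP37 specifies, and the two filters are assumed to differ only in their hash tweak.

```python
import hashlib


class ToyBloom:
    """Toy Bloom filter. SHA-256 is a stand-in for BIP37's MurmurHash3 (assumption)."""

    def __init__(self, size_bits, num_hashes, tweak):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.tweak = tweak  # plays the role of a per-filter random tweak
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from the item and the tweak.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{self.tweak}:{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_filter(addresses, tweak, size_bits=512, num_hashes=3):
    bloom = ToyBloom(size_bits, num_hashes, tweak)
    for addr in addresses:
        bloom.add(addr)
    return bloom


def matches(bloom, universe):
    """Adversary's view: every known address that the filter matches."""
    return {addr for addr in universe if addr in bloom}


# Hypothetical data: 20 wallet addresses hidden among 5000 unrelated ones.
wallet = {f"wallet-{i}" for i in range(20)}
universe = wallet | {f"other-{i}" for i in range(5000)}

seen_first = matches(build_filter(wallet, tweak=1), universe)
seen_second = matches(build_filter(wallet, tweak=2), universe)

# Real addresses match both filters; most false positives match only one.
intersection = seen_first & seen_second
print(len(seen_first), len(seen_second), len(intersection))
```

Every real wallet address is guaranteed to survive the intersection, while an unrelated address survives only if it happens to be a false positive in both filters, so each additional filter the adversary collects shrinks the anonymity set further.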
This last privacy break is the most comprehensive. Mike Hearn[1] describes it more eloquently:
The reason bitcoinj doesn't use the obfuscation capabilities of the Bloom filtering protocol is that lying consistently is hard. Let's elaborate on what this means. The Bloom filtering protocol lies about what it's interested in from a remote node. But anyone who ever watched a cop show knows that lying is one thing, but lying without getting caught is something else entirely. Usually in these shows, the detective cleverly puzzles out whodunnit from inconsistencies and mistakes in the suspect's story. Common problems that let the detective catch the bad guy include: constantly changing their story, telling different lies to different observers, telling lies that contain elements of the truth and so on.
Subgraph traversal
As we all know, Bloom filters allow us to match more transactions and script elements than we actually need, and in a way that the server doesn't know which we genuinely care about and which we don't. The problem starts when we realise that what we actually care about is not transactions but rather transaction chains. If we observe a payment to our address A whilst scanning the chain, we must also observe any transaction that spends that payment. Otherwise if someone cloned their wallet to another machine and we were in a watching wallet configuration, we would have no idea the money was actually spent. And it's recursive - having observed the transaction that spends the payment, if it had a change address we must also observe any transaction that spends that transaction, and so on, following the "peel chain" of transactions all the way to the bottom.
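The chain-following requirement described above, and the way an adversary can turn it against the wallet, can be illustrated with a small Python sketch. Everything here is hypothetical: the `TXS` graph, the address labels, and the helper names `follow_chain` and `linked_matches` are invented for illustration, and the linking rule is a deliberately crude heuristic rather than a description of any real analysis tool.

```python
from collections import deque

# Hypothetical transaction graph: inputs are (txid, output_index) pairs,
# outputs are lists of address labels.
TXS = {
    "t1": {"inputs": [], "outputs": ["A", "X"]},
    "t2": {"inputs": [("t1", 0)], "outputs": ["M1", "C1"]},  # spends A, change to C1
    "t3": {"inputs": [("t2", 1)], "outputs": ["M2", "C2"]},  # spends C1, change to C2
    "t4": {"inputs": [("t1", 1)], "outputs": ["Y"]},         # unrelated: spends X
}


def follow_chain(txs, start_outpoint, our_addresses):
    """Transactions a wallet must observe, starting from one payment to it."""
    spent_by = {op: txid for txid, tx in txs.items() for op in tx["inputs"]}
    chain = [start_outpoint[0]]
    queue = deque([start_outpoint])
    while queue:
        spender = spent_by.get(queue.popleft())
        if spender is None:
            continue  # this output is still unspent
        if spender not in chain:
            chain.append(spender)
        # Keep following any output that pays back to us (change).
        for index, addr in enumerate(txs[spender]["outputs"]):
            if addr in our_addresses:
                queue.append((spender, index))
    return chain


def linked_matches(txs, matched):
    """Adversary heuristic: keep matched addresses that pay, or are paid by,
    a different matched address in the same transaction."""
    linked = set()
    for tx in txs.values():
        senders = {txs[prev]["outputs"][i] for prev, i in tx["inputs"]} & matched
        receivers = set(tx["outputs"]) & matched
        for sender in senders:
            for receiver in receivers:
                if sender != receiver:
                    linked.update((sender, receiver))
    return linked


ours = {"A", "C1", "C2"}
chain = follow_chain(TXS, ("t1", 0), ours)

# Suppose the filter matches our addresses plus one false positive, Y.
survivors = linked_matches(TXS, ours | {"Y"})
print(chain, sorted(survivors))
```

In this toy graph the wallet has to observe `t1`, `t2` and `t3`, so its filter must match the whole chain. The false positive `Y` is matched by the filter but has no transaction link to any other matched address, so the heuristic discards it and only the wallet's real addresses remain.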
Lessons from the failure of BIP37 are useful when designing and evaluating other privacy solutions. The most important one concerns data fusion: the information leaked by a BIP37 bloom filter becomes far more revealing once it is combined with the transaction information leaked by the blockchain.