A Bloom filter is a probabilistic data structure that
approximately and efficiently answers the *set membership*
question. Given a Bloom filter B constructed out of the elements
of a set , returns or ,
indicating the membership of in .
Bloom filters are *approximate* since they can return
false positives. The false positive rate for a Bloom filter
can be expressed by:

An interesting question is:
*Can Bloom filters be used to compute whether two sets are disjoint?*

That is, given a set represented by a Bloom filter , and another set , can we use the Bloom filter to accurately determine whether ? An obvious procedure to compute disjointness is:

Procedure :Iterate through each element of the set . If for any element , then return , else return .

Before we can reason about the accuracy of this procedure, let’s define the following events:

- : the event that and are disjoint (and to be its complement), and
- : the event that is accurate (and to be its complement).

Now, we make the following claims about our procedure :

Claim 1: .

In other words, our procedure is guaranteed to be correct when
and are *not* disjoint.
This claim follows easily from the fact that a Bloom filter cannot have
false negatives, *i.e.*, if does exist in , then will
return .

Claim 2: .

In other words, *can* be inaccurate when
and are disjoint.
Specifically, this procedure can be inaccurate if
the Bloom filter returns for at least one element in ,
given that no element in belongs to .
But *how* inaccurate?
It is often easier to compute the probability of the complementary event
,
which is the event that the Bloom filter returns for every element of .
The probability that the Bloom filter returns for a given element
can be expressed by:

Assuming independence of Bloom filter output for the different elements of , and assuming that has elements, we can write:

Further, notice that the likelihood of the sets being disjoint has nothing whatsoever to do with the Bloom filter accuracy (I like to picture it as: ). In other words, and are independent! If we wanted to be explicit, we could apply Bayes’ Theorem:

from which it follows that:

and:

Since , the inaccuracy grows with .
This makes intuitive sense since the larger the set ,
the more the chances that we will run into a Bloom filter false positive.
In plain English: Bloom filters are probably **not** a good idea for computing
whether two sets are disjoint.