Anonymous Communications – CMSC 414
Anonymous Communications
CMSC 414
Why Anonymity?
Does anonymity mean you have something to hide?
What kinds of behaviors does anonymity enable?
I Illegal activities
I Harassment
I Expression of unpopular ideas
I Association with members of marginalized/dissident groups
I Looking up potentially embarassing or stigmatizing
information
I Privacy
What is Anonymity
Anonymity refers to the inability of an adversary to determine
who is communicating
We can divide this into two main types:
Sender Anonymity Cannot determine the true sender from a set of
potential senders
Receiver Anonymity Cannot determine the true receiver from a set
of potential receivers
k-anonymity means that the adversary cannot distinguish
communicant from a pool of k potential parties
Sender Anonymity
Ransom note
Flier on a utility pole
Message in a bottle
Pirate radio station
Receiver Anonymity
Coded classified ads
Number stations
TV/radio receivers (over-the-airwaves, at least…)
Public library
Dining Cryptographers
David Chaum
Given k parties, how can one transmit a message without any of
them knowing who it was?
We’ll start with 3, and generalize
Alice, Bob, and Carol want to determine who paid for dinner
anonymously (one of them or a third party) without violating any
of their anonymity
Dining Cryptographers
Each flips a coin visible to only two of
them
Each then announces XOR of coins they
see (heads=1, tails=0)
If Alice paid, she flips her bit
XOR of all announced bits ⇒ 1=some-
one paid
We call this protocol a dc-net
Alice
Bob
Carol
kA
kC
kB
Dining Cryptographers
Number the participants
A participant P transmits TP , where
TP = kP ⊕ kP+1 ⊕mP
XOR all these together
T =TA⊕TB⊕TC
= kA⊕ kB ⊕mA
⊕ kB⊕ kC⊕mB
⊕ kA ⊕ kC⊕mC
=mA⊕mB⊕mC
Alice
Bob
Carol
kA
kB
kC
Generalizing dc-nets
More than 1 bit
{k} and {m} all must be the same size
The rest of the math holds
This requires a way to generate sufficiently long shared keys
⇒ We’ve seen ways to do this
These keys need to be regenerated for every message
⇒ Significant set-up costs
Otherwise, we can eventually distinguish between mP = 0 and
mP 6= 0
Generalizing dc-nets
More than 3 participants
Basic extension is simple ⇒ pairwise keys between participants
I Pi and Pj share kij = kji
I Ti = Mi ⊕
[⊕
j kij
]
∀j 6= i
I T =
⊕
i Ti =
⊕
i Mi
For n participants, this means n(n − 1)/2 shared keys to establish
for every message, and n broadcasts (one from each participant) to
send a (possibly null) message
Can make this more efficient, but not without loss of anonymity
guarantees
Generalizing dc-nets
More than 1 message
Issues:
1. Continuing communications
2. Collisions
We solve both simultaneously
Scheduled message time slices ⇒ keys exchanges, bcasts
Any participant Pi sending a message can verify T
?= Mi
Any participant Pj not sending a message has no knowledge of
collisions
Pi detects collision ⇒ wait a random number of time slices before
trying again
If messages are infrequent, eventually will be sent without collision
Mixnets
Also David Chaum
Given NS senders, NR receivers, and a mailserver M, how can we
prevent an observer from determining what sender is
communicating with what receiver?
By now, should be evident that cryptographers are extremely
verbose very precise
We’ll use our notation, which assumes encryption and signing are
done with proper random padding, etc.
Mixnet Messages
We have a Mix, which we will call X
B → X : EX (EA(M), A)
X → A: EA(M)
X cannot actually read messages
On its own, not terribly helpful
X
B
A
E
X
(E
A
(M
),A
)
E
A
(M
)
Mixnet Messages
Combine for a lot of senders and
receivers
Can see that messages are being
sent and received
But every potential sender is
sending, and every potential re-
ceiver is receiving
Everything is encrypted
⇒ can’t connect inputs/outputs
X
B
A
E
X
(E
A
(M
),A
)
E
A
(M
)
M
essages
In
M
essages
O
ut
Mixnet Drawbacks
Mix has to wait for every sender before it can deliver
What if they don’t have messages to send?
What if two senders have a message for the same receiver?
Receiver can’t send a reply to the sender
Mix knows who pairs of communicants are, even if not what they
are communicating
Mixnet Drawbacks
No message from Si
Similar to what we saw with DC-nets
Senders can encrypt and send garbage
X can throw this out after decrypting
Do this in well-synchronized time slices
Similarly, send encrypted garbage to message-less receivers
Mixnet Drawbacks
Multiple messages for Ri
Much harder to deal with
Cannot send more than one message without revealing something
⇒ Two senders sent to the same receiver
⇒ Neither is a dummy message
Mix can hold all but one message for Ri until a later time slice
If messages are infrequent, eventually all messages will be sent
Message exchange protocol is synchronous, but actual message
delivery is asynchronous
Mixnet Drawbacks
Replies
{S} = {R}
We define a return address EX (K ′′A, A), K
′
A
K ′A and K
′′
A are ephemeral public keys, with E
′
A(x) and E
′′
A(x)
B → X : EX (EA(M, EX (K ′′B, B), K
′
B), A)
X → A: EA(M, EX (K ′′B, B), K
′
B)
A→ X : EX (K ′′B, B), E
′
B(M
′)
X → B: E ′′B(E
′
B(M
′))
Note that we need lots of padding, but we can make each message
repliable and the same length
Mixnet Drawbacks
Matching senders/receivers
We still have a single TTP
Can improve this with a cascade of
mixes
Have to wrap the encryption in
multiple layers
Outermost is first hop from sender
Innermost is last hop to receiver
X1
X2
XN
B
A
EXN (…EX2(EX1(EA(M), A), X1), X2…)
EX2(EX1(EA(M), A), X1)
EX1(EA(M), A)
EA(M)
Tor: The Onion Router
Inspired by Mixnets, with slightly different goals:
Deployability Minimal burden (technical/legal) on users, no burden
on normal servers
Usability Want as many users as possible
Flexibility Composable with new research and techniques
Simple Design Design/implementation can be validated relatively
easily
Provides low-latency anonymity, with different security model
How it Works
Alice wants to create a stream
to Bob
She sets up a circuit through Tor
Chooses the relays
⇒ How many
⇒ Can exit to Bob at any time
(leaky pipe)
Diffie-Hellman with each relay
Wraps encryptions hop-by-hop
Similar to mixnet cascade
Alice
OR1
OR2
OR3
Bob
{{{Bob, M}A,OR3}A,OR2}A,OR1
{{Bob, M}A,OR3}A,OR2
{Bob, M}A,OR3
M
{{Bob, M′}A,OR2}A,OR1
{Bob, M′}A,OR2
M′
Security Features
All relays maintain TLS connections to one another
Protects against external adversary for confidentiality/integrity
Not against timing attacks (correlate messages from Alice/to Bob)
What about malicious insider?
Add integrity checks between Alice and last relay before Bob
Generally sufficient ⇒ exit early/re-route to isolate malicious relay
Performance Features
Rate limiting
I enforces long-term average
I allows for short-term bursts
Congestion control
I prevent bottleneck overload
I distinguishes between relay/egress congestion
I both for circuits and streams
Hidden Services
Sergio wants to run a hidden
server
Selects an Introduction Point
(IP)
Advertises IP w/ KS
Alice creates a Rendezvous
Point (RP)
Hands start of D-H handshake to
RP, with IP as target
IP relays handshake message to
Sergio
Sergio
IP
RP
Alice
se
le
ct
se
tu
p
1
2
3
45
6
7
Who Knows What?
Alice
Connecting to a hidden service X, but not who runs it
Sergio
Running a hidden service X, but not who connects to
it
Introduction Point
Someone is connecting to X, but not who
Rendezvous Point
Someone is connecting to a hidden service, but not
who or what
Comparison with Mixnets
Threat Model
Mixnet assumes a Global Observer
I Sees everything
I Passive wiretapper
Tor assumes a Limited-visibility Observer
I Sees only some fraction of the network
I Active wiretapper
I Can run their own onion routers (relays)
I Can compromise existing routers
Comparison with Mixnets
Performance/Anonymity Trade-offs
Mixnet batches transmissions
I Cannot correlate senders and receivers based on message times
I Must synchronize based on slowest nodes ⇒ high latency
I Makes interactive use (web browsing) difficult; OK for email
Tor transmits immediately
I Can potentially correlate senders and receivers
I Suitable for web browsing & other latency-sensitive apps
I Still adds considerable latency, though
I Relies on enough users to provide cover even without batching
Breaking Tor’s Anonymity
Given one communicant Alice
⇒ Infeasible to determine who she is talking to
Given suspected communicants Alice and Bob
⇒ Can watch both ends and confirm based on
I Transmission/delivery times
I Message volume
Attacking Tor
Adversary can control Tor relay nodes
More nodes ⇒ more likely to be included in a circuit
DoS other relays ⇒ more likely to be included in a circuit
DoS also forces circuits to be rebuilt
⇒ Relays spend more resources on crypto handshakes
Hold exit nodes legally accountable for traffic ⇒ discourages use
Exploit misconfiguration and poor choice of options
Deanonymizing
Even with anonymizing techniques, often possible to fingerprint
users
Lots of browser/machine-specific information
I Fonts
I Screen dimensions
I Clock skew
I Browser User-Agent
I Operating system
Individually, not much information, but
I Combining these narrows potential users considerably
I Requires no client-side tricks (such as cookies)
I Private-mode browsing provides no protection
When Deanonymization is Good
I Detect sources of DoS attacks
I Identify sources of fraud
I Identify account hijackers
I Identify content scrapers
Great, but bad guys can also use this
I No user consent needed (ie, no opt-in)
I No opt-out
Fingerprinting with Fonts
Javascript lets you load fonts
Presence/absence can be sent back to website
Used frequently by analytics companies
Fifield, David, and Serge Egelman. “Fingerprinting web users through
font metrics.” International Conference on Financial Cryptography and
Data Security. Springer Berlin Heidelberg, 2015.
Boda, Károly, et al. “User tracking on the web via cross-browser
fingerprinting.” Nordic Conference on Secure IT Systems. Springer,
Berlin, Heidelberg, 2011.
Fingerprinting with Display Characteristics
Javascript can draw in
Can then retrieve pixel-based details, or at least a hash
Will depend on OS, graphics drivers, screen resolution, browser
window size, etc
Provides remarkably unique identifier
Mowery, Keaton, and Hovav Shacham. “Pixel perfect: Fingerprinting
canvas in HTML5.” Proceedings of W2SP (2012): 1-12.
Cookies
Third-party cookies can be used to track across websites
Used by advertisers
Advertising services sometimes share cookies ⇒ Cookie Synching
Zombie Cookies come back, even when deleted
Zombie Cookies
Lots of tricks for storing cookies, such as:
I Request to a bogus URL with cookie hash ⇒ read web history
I Browser stores HTTP ETags for sites that use them
(ostensibly for page freshness)
I HTML5 supports local data storage, independent of cookies
I TCP Fast Open uses a connection-based cookie for
reestablishing connections with a server
If a 3rd-party cookie is deleted, any of these can be used by server
to put them back
Evercookie is an open-source Javascript package that creates
zombie cookies
Zombie cookies can be used on their own, even when 3rd-party
cookies blocked
Cross-Device Targeting
Can we identify the same user on different devices?
Account logins
Similar search or access patterns from the same location
If you have it
I Visual characteristics
I Facial recognition
I Lighting/color patterns in environment
I Audio characteristics
I Voice recognition
I Environmental sounds
I One device emits a sound that the other picks up
Especially problematic for mobile devices
Countermeaures
To hide in a crowd, have to have a crowd:
I Common OS/browser/settings
I Using Tor puts you in a smaller crowd of a different kind
Can’t really disable canvas or fonts
Can block Javascript, but a lot of things will break
Zombie cookies are in a lot of places — Can you find/remove them
all? Does your browser let you?
Block 3rd-party cookies, but if zombified before you block them,
they might come back!