Encryption in practice
======================

Recap of where we are: big building blocks for cryptography.
  Hashes: Authenciating data.
    CRHF.
  MAC: Authentication with shared keys.
    EUF-CMA.
  Signatures: Authentication with public keys.
    EUF-CMA.
    Constructions: hash-based, RSA-based.
  Secret-key encryption: Encryption with shared keys.
    IND-CPA security.
    Use MACs to achieve IND-CCA security.
  Key exchange (last lecture): Generate shared keys from public keys.
    Diffie-Hellman key exchange.
    Constructions: Z_p^*, elliptic curves.

Last lecture also touched on public-key encryption.
  Combine key exchange and secret-key encryption.
  Sometimes used directly by applications, esp. historically.
  Sometimes applications build their own abstraction using the above blocks.

Public-key encryption: definition.
  Gen() -> (sk, pk)
  Enc(pk, msg) -> ct
  Dec(sk, ct) -> msg (or error)

Correctness: Dec is the inverse of Enc.
  \forall (sk, pk) <- Gen(),
    \forall msg,
    Dec(sk, Enc(pk, msg)) = msg

Security: IND-CCA.
  Challenger picks a random b and generates (sk, pk).
  Adversary gets pk.
  Adversary can ask for encryptions of pairs of same-length messages;
    the challenger picks one of two messages to encrypt based on b.
  Adversary can ask for decryptions of ciphertexts of its choice, as
    long as it's not a ciphertext that it got above.
  Adversary cannot guess b.

Construction: El-Gamal public-key encryption.
  Builds on a key-exchange scheme KE,
    a shared-key encryption scheme SE,
    and a hash function Hash.

  Gen(): generate (x, g^x) using KE.Gen().
  Enc(g^x, msg):
    Choose random r (mod q, the key-exchange group order).
    Let k := Hash((g^x)^r).
    Return (g^r, SE.Enc(k, msg)).
  Dec(x, ct = (g^r, e)):
    Let k := Hash((g^r)^x).
    Return SE.Dec(k, e).

However, public-key encryption is often not used directly in applications.
  Applications often use key-exchange and other building blocks above,
    to achieve their own security goals.
  Will look at several example applications in this lecture.

Closest application to public-key encryption: messaging.
  Signal, WhatsApp, etc.
  Users exchange messages via central server.
  Messages encrypted so that server cannot decrypt contents.
  Central server typically keeps track of user public keys.

What security properties might we want from text messaging?
  Confidentiality: adversary cannot obtain message.
  Authenticity: recipient knows who the message came from.
  Forward secrecy.
    Adversary that compromises device cannot decrypt past messages.
  Post-compromise security.
    If attacker gets a snapshot of a user's device, eventually
    adversary won't be able to decrypt messages to/from that user.
  Deniability: recipient can't prove to others what he received.
  In-order delivery.
  ...
  Group messaging has lots of complex goals.

Strawman plan: public-key encryption and signatures.
  Every user has a secret/public key pair for encryption and for signatures.
    Users can query server to get another user's public keys.
  To send a message m from Alice to Bob, Alice's device sends:
    ct = Enc(PK_bob, m)
  Need to also authenticate the message as coming from Alice:
    sig = Sign(SK_alice, ct)
      OR
    sig = Sign(SK_alice, m)
  Potential problem with either plan?
    Signing ct: Eve could sign same ct, as if message came from Eve.
      E.g., "please pay my bill from cc# xxx ..."
      Eve can pretend this message came from her, without knowing m!
    Signing m: Eve could send this message to many recipients.
      Alice sent m, but signature doesn't specify who it was sent to.
  No forward-secrecy: compromising SK_bob allows decrypting old messages.
  No post-compromise security: compromising SK_bob decrypts future messages.
  No deniability: signature proves Alice sent this message.

Ratchet construction (simplified):
  Users have a long-term signing key (SK_alice, SK_bob, ...).
  Users generate key-exchange handshakes registered at the server.
    E.g., Alice generates random value a1.
    Publishes g^a1 on the server, along with Sign(SK_alice, g^a1).
  When one user wants to chat with another, requests a handshake from server.
    E.g., Bob wants to chat with Alice, gets g^a1, Sign(SK_alice, g^a1)
    Server records that handshake as now being used up.
    In practice, Alice would register multiple handshakes at the server.
  Bob generates their own key-exchange handshake random value b1.
  Bob computes shared key k11 = (g^a1)^b1 = g^(a1*b1).
  To send message m, Bob sends g^b1, Sign(SK_bob, g^b1), AEnc(k11, m).
  Alice can decrypt:
    Check signature, that this message is from Bob.
    Compute k11 = (g^b1)^a1 = g^(a1*b1).
    Decrypt and authenticate message (authenticated encryption).
  Alice can then respond with some other message m, in the same way:
    Generates a fresh a2, compute k21 = g^(a2*b1).
    Send g^a2, Sign(SK_alice, g^a2), AEnc(k21, m).

What properties does this ratchet have?
  Confidentiality.
  Authenticity.
  Deniability: cannot cryptographically prove Alice or Bob sent message m.
  Forward secrecy: if Alice and Bob forget their secrets (a1, b1, ...),
    cannot decrypt old messages anymore, because no way to compute shared key.
  Post-compromise security: Alice and Bob generate fresh randomness for each
    message, so old keys not sufficient to decrypt future messages.

Another big use of encryption: secure channels.
  Client <--[TLS]--> Server
  Uses certificates to authenticate server, and optionally client.
  Unlike secure messaging, this is relatively easy to add to applications.
    Replaces existing connection abstraction with a TLS secure channel.
    HTTPS, SMTPS, IMAPS, SIP-over-TLS, some VPN protocols, ...
  Surprisingly tricky to get right: many iterations of protocols!
    SSL 2, SSL 3, TLS 1.1, TLS 1.2, TLS 1.3.
  Many features:
    Flexible cipher selection.
    Backwards compatibility: SSL 3 client should talk to TLS 1.1 server, etc.
    Client certificates (like at MIT), but optional.
    Low-latency operation (don't want many round-trips).
    Session resumption (avoid public-key operations).
    ...

What properties might we want?
  Confidentiality.
  Authenticity: each party knows who the other one is.
  Forward secrecy.
  Freshness: connection should not be replayed.
  ...
  Turns out to be even more complex.
  Does not provide some properties that messaging tries to provide
    (e.g., post-compromise security for a given connection).

TLS 1.3 structure.
  Handshake: establish a connection, which means a shared session key.
  Record layer: send application data encrypted with session key.
  Handshake could resume at some point, e.g., to generate fresh keys.
    Perhaps after large amount of data has been sent.

TLS 1.3 RFC defines 8 security properties for the handshake.
  [ Ref: https://datatracker.ietf.org/doc/html/rfc8446#appendix-E.1 ]
  Correctness: same session keys.
  Secrecy of session keys.
  Peer authentication: certificates authenticate names, when supplied.
  Uniqueness: fresh session keys.
  Downgrade protection: same parameters as if no attacker.
  Forward secrecy.
  Key-compromise impersonation (KCI) resistance.
    If adversary has my MIT certificate, they can't trick my browser into
      incorrectly authenticating MIT servers.
  Protection of endpoint identities: only against passive adversary.

Mix of different threat models and security goals under those attacks.

Simplified TLS handshake protocol:
  Client: choose "client random"
  Client: choose r_c at random
  Client: guess what key-exchange scheme server likely supports

  C -> S: ClientHello: client random, supported ciphers, g^(r_c)

  Server: choose "server random"
  Server: choose r_s at random
  Server: pick a cipher suite to use (AES/Chacha, SHA256/SHA384, ..)

  S -> C: ServerHello: server random, chosen cipher, g^(r_s)

  Server: compute session key k = H(g^(r_c r_s))

  S -> C: AEnc(k, server certificate ||
                  Sign(SK_server, transcript so far))

  Client: compute session key k = H(g^(r_c r_s))
  Client: decrypt certificate, check that it is signed by CA.
  Client: verify server signature on protocol transcript

  S -> C, C -> S: send application data using k.
    Actually, multiple keys derived from k.
    Separate keys for authentication vs encryption.
    Separate keys for each direction of communication.

Why does this protocol achieve its goals?
  Correctness: key exchange.
  Secrecy of session keys: key exchange.
  Peer authentication: certificates, CAs, signed transcript.
  Uniqueness/freshness: client/server random.
  Downgrade protection: signed transcript.
  Forward secrecy: fresh key exchange, long-term secret used only to sign.
  KCI resistance: independently sign with client vs server keys.
  Protection of endpoint identities: cert sent under encryption.

How could we have screwed up KCI?
  Suppose each party sends their public key + certificate.
  Establish session key k = H(g^(sk_client sk_server)).
  If adversary knows sk_client: can compute k without knowing server secret key!
  Can impersonate server just by providing its certificate (public key),
    without actually knowing server's secret key.

Despite this extensive list of properties, there's things TLS doesn't provide..

Authenticated EOF.
  Streaming protocol: data sent to application as it arrives.
  What happens if a connection breaks (e.g., TCP reset or timeout)?
    TLS reports end-of-file to the application.
  Can have unexpected consequences for the application.
    E.g., curl https://sh.rustup.rs | sh
    What if the shell script has "rm -rf /tmp/installer"?
    Adversary might cut off the connection right at "tmp".
    Application will get "rm -rf /", then EOF.

Hiding data length.
  TLS does not hide length of data sent; that in itself could be problematic.
  Even worse when used in combination with compression.
  Used to be common to gzip-compress HTTP request data sent via TLS (i.e., HTTPS).
  Subtle problem: adversary-supplied data compressed together w/ app secrets.

CRIME attack.
  Victim browser visits attacker.com.
  attacker.com page tries to load an image from google.com.
    <img src="https://google.com/filename">
  Victim browser sends request to google.com:
    GET /filename HTTP/1.1
    Cookie: sessionid=secret
  attacker.com tries all possible single characters in image pathname:
    <img src="https://google.com/sessionid=a">
    <img src="https://google.com/sessionid=b">
    ...
    <img src="https://google.com/sessionid=z">
  One of them compresses by 1 more byte than others!
  Then proceed to guess the second byte:
    /sessionid=ca
    /sessionid=cb
    ..
    /sessionid=cz

Summary.
  We've covered most of the building blocks used by applications.
    Hashes, MACs, signatures, authenticated encryption, key exchange.
  Many network protocols now use cryptography over the Internet.
  Application security quite a bit more complex than these building blocks.
    Richer properties, richer threat models, ...
  Will look at open problems in next lecture.