Introduction
============

Welcome to 6.1600 -- Foundations of Computer Security.
  Broadly speaking, an introduction to computer security.
  Big ideas on how to build secure computer systems.
  Cryptography, system design, software, hardware, ...

Course staff.
  Lecturers: Henry, Nickolai.
  TA: Anna.
  Interactive lectures.
    Interrupt, ask questions, point out mistakes.

Course logistics.
  Five big topics, each spanning several lectures:
    Authentication.
    Transport security.
    Platform security.
    Software security.
    Human security considerations.
  Roughly two parts: applied crypto, and systems security.
  Lectures will be MW 11-12:30, in 4-237.  No recitation.
  One midterm exam, one final exam.
  6 lab assignments.
    Apply ideas from class to practical problems.
    Combination of attacks and defenses.
    Office hours for lab/lecture help.
    Lab 0 due next week: start early!
  Sign up for Piazza (link on the course web site).
    Mostly questions/answers about labs.
    We will post any important announcements there.

Warning about security work/research on MITnet (and in general).
  You will learn how to attack systems, so that you know how to defend them.
  Know the rules: https://ist.mit.edu/network/rules
  Don't mess with other people's data/computers/networks w/o permission.
  Ask course staff for advice if in doubt.

What does it mean for a computer system to be secure?
  Secure = achieves some property despite attacks by adversaries.
  Systematic thought is required for successful defense.
  Details matter!

High-level plan for thinking about security:
  Goal: what your system is trying to achieve.
    e.g. only Alice should read file F.
    Categories of goals: confidentiality, integrity, availability.
      Confidentiality: no way for the adversary to learn secret information.
      Integrity: no way for the adversary to corrupt the state of the system.
      Availability: the system keeps working despite the adversary.
  Threat model: assumptions about what the attacker can do.
    e.g. can guess passwords, cannot physically steal our server.
    Part of the plan for how to achieve the goal; "ignoring" some attacks.
  Implementation: how to achieve your goal under your threat model.
    e.g. set permissions on F so it's readable only by Alice's processes.
    e.g. require a password and two-factor authentication.
    e.g. implementing user accounts, passwords, file permissions, encryption.
    An important part of implementation is deployment/operations.
      e.g. how to decommission servers, how to update software.
      e.g. how to handle the situation of an admin forgetting their password.
  Goal defines the security property you want to achieve.
  Threat model specifies which attacks are out of scope.
  Threat model and goal are part of the "definition" of security.
    In practice, a wrong threat model or goal can lead to security problems.
    But it is hard to formally say that a threat model or goal is right (or wrong).
  Implementation is how your system tries to achieve security.
    Can talk about it achieving (or not achieving) some goal / threat model.

Building secure systems is hard -- why?
  Example: grades file, stored on an Athena AFS server.
    Goal: only TAs should be able to read and write the grades file.
  Easy to implement the *positive* aspect of the goal:
    There just has to be one code path that allows a TA to get at the file
      (see the sketch below).
  But security is a *negative* goal:
    We want no tricky way for a non-TA to get at the file.
    There are a huge number of potential attacks to consider!
      Exploit a bug in the server's code.
      Guess a TA's password.
      Steal a TA's laptop, maybe it has a local copy of the grades file.
      Intercept grades when they are sent over the network to the registrar.
      Break the cryptographic scheme used to encrypt grades over the network.
      Trick the TA's computer into encrypting grades with the attacker's key.
      Get a job in the registrar's office, or as a TA in this class.
  Achieving security requires thinking about the adversary's capabilities.
    Need a mindset of considering all possible attacks an adversary might try.
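  To make the positive/negative asymmetry concrete, here is a minimal,
  hypothetical sketch (the function and the TA list are made up, not the
  actual AFS setup).  The positive goal fits in one short code path; the
  negative goal is invisible in this code, because it depends on everything
  *around* it: server bugs, guessed passwords, stolen laptops, backups,
  network captures, and so on.

    # Hypothetical access check for the grades file.
    TAS = {"alice", "bob"}                 # the only users who should touch grades

    def can_access_grades(user: str) -> bool:
        # The *positive* goal ("a TA can get at the file") is one code path:
        return user in TAS

    # The *negative* goal ("there is NO other way to get at the file") does
    # not appear anywhere in this function -- and cannot.
    print(can_access_grades("alice"))      # True
    print(can_access_grades("mallory"))    # False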
  More formally, security is often defined as a game with an adversary.
    You get to set up the system in some way.
    The adversary gets to interact with the system, based on your threat model.
      Users concurrently keep using the system too.
    There should be no way for the adversary to violate your security goal.
    This framing is useful both for theory (precise reasoning about crypto)
      and for system design.
    Powerful but hard-to-achieve statement: *any* adversary!
  Most of this lecture is about failures, to help you start building this mindset.

Security is rarely perfect.
  Cannot get the goal / threat model / implementation right on the first try.
  Typically must iterate:
    Design, watch attacks, update understanding of threats and goals.
    Use well-understood components, designs, etc.
    Post-mortems are important for understanding failures.
      Public databases of vulnerabilities (e.g., https://cve.mitre.org/).
      Encourage people to report vulnerabilities (e.g., bounty programs).
    Threat models change over time.
  Defender is often at a disadvantage in this game.
    Defender usually has limited resources, other priorities.
    Defender must balance security against convenience.
    A determined attacker can usually win!
      Defense in depth.
      Recovery plan (e.g., secure backups).

What's the point if we can't achieve perfect security?
  Perfect security is rarely required.
  Make the cost of attack greater than the value of the information.
    So that perfect defenses aren't needed.
  Make our systems less attractive than other people's.
    Works well if the attacker e.g. just wants to generate spam.
  Find techniques that have a big security payoff (i.e., not merely patching holes).
    We'll look at techniques that cut off whole classes of attacks.
    Successful: popular attacks from 10 years ago are no longer very fruitful.
  Sometimes security *increases* value for the defender:
    Cryptography allows secure communication over the Internet (e.g., public wifi).
    VPNs might give employees more flexibility to work at home.
    Sandboxing (JavaScript) might give confidence to run software I don't fully understand.
  No perfect physical security either.
    But that's OK: cost, deterrence.
    One big difference in computer security: attacks are cheap.

What goes wrong #1: problems with the goal.
  I.e., the system designer missed a whole class of problems.

  Example: business-class airfare.
    Airlines allow business-class tickets to be changed at any time, no fees.
    Is this a sufficient goal?
    Turns out, in some systems the ticket could be changed even AFTER boarding.
      Adversary can keep boarding a plane and changing the ticket to the next
        flight, ad infinitum.
    Revised goal: the ticket cannot be changed once the passenger has boarded the flight.
    Sometimes requires changes to the system architecture.
      Need the computer at the aircraft gate to send updates to the reservation system.

  Example: hardware side channels.
    CPU manufacturers (Intel, AMD, ARM, etc.) provide an ISA specification.
    Goal: the physical CPU implementation meets the ISA spec.
    ISA specs typically didn't talk in detail about timing, cache effects, etc.
      Didn't seem relevant: instructions produce the correct result even if timing varies.
    However, timing information can leak confidential information!
      [ Ref: https://css.csail.mit.edu/6.858/2022/readings/spectre-meltdown.pdf ]
    The traditional goal of a timing-agnostic ISA might not be a good plan.
    Still an open question of how to deal with side channels, speculative execution, etc.
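    Spectre and Meltdown exploit much subtler microarchitectural channels
    (caches and speculative execution), but the basic claim that timing leaks
    secrets shows up even in a toy example.  The sketch below is my own
    illustration, not from the paper: a comparison that returns early on the
    first mismatch, so its running time reveals how long the correct prefix
    of a guess is.

      import time

      SECRET = "s3cr3t"   # hypothetical secret the attacker wants to learn

      def insecure_compare(secret: str, guess: str) -> bool:
          # Early exit on the first mismatching character -- so the running
          # time depends on the length of the matching prefix.
          if len(secret) != len(guess):
              return False
          for a, b in zip(secret, guess):
              if a != b:
                  return False
              time.sleep(0.001)   # exaggerate per-character work to make timing visible
          return True

      def measure(guess: str) -> float:
          start = time.perf_counter()
          insecure_compare(SECRET, guess)
          return time.perf_counter() - start

      # Guesses with longer correct prefixes take measurably longer.
      for guess in ["x00000", "s00000", "s30000", "s3c000"]:
          print(guess, f"{measure(guess):.4f}s")

    Real attacks have to average over noise, but the structure is the same:
    the implementation's timing is observable behavior, even though the spec
    (an ISA, or an API) says nothing about it.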
  Example: Fairfax County, VA school system.
    [ Ref: https://catless.ncl.ac.uk/Risks/26.02.html#subj7.1 ]
    A student can access only his/her own files in the school system.
    The superintendent has access to everyone's files.
    Teachers can add new students to their class.
    Teachers can change the password of students in their class.
    What's the worst that could happen if a student gets a teacher's password?
      Student adds the superintendent to the compromised teacher's class.
      Changes the superintendent's password, since they're now a student in that class.
      Logs in as the superintendent and gets access to all files.
    Policy amounts to: teachers can do anything.

  Example: Sarah Palin's email account.
    [ Ref: https://en.wikipedia.org/wiki/Sarah_Palin_email_hack ]
    Yahoo email accounts have a username, password, and security questions.
    A user can log in by supplying the username and password.
    If the user forgets the password, they can reset it by answering the security questions.
    Some adversary guessed Sarah Palin's high school, birthday, etc.
    Policy amounts to: can log in with either the password *or* the security
      questions (see the sketch below).
      No way to enforce "only if the user forgets the password, then ..."
    Thus the user should ensure that the password *and* the security questions
      are both hard to guess.
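    A minimal sketch of that "OR" (the account data and helper functions are
    made up, not Yahoo's actual code): even though the intent is "security
    questions only when the password is forgotten", the interface the
    adversary sees is a disjunction, so the account is only as strong as the
    weaker of the two factors.

      # Hypothetical account record and login/reset logic.
      ACCOUNT = {
          "password": "correct horse battery staple",
          "security_answers": {"high school": "Example High", "birthday": "1970-01-01"},
      }

      def login_with_password(password: str) -> bool:
          return password == ACCOUNT["password"]

      def reset_via_security_questions(answers: dict) -> bool:
          # Intended only for "forgot my password", but nothing restricts who calls it.
          return answers == ACCOUNT["security_answers"]

      def adversary_gets_in(guessed_password: str, guessed_answers: dict) -> bool:
          # From the adversary's point of view, the effective policy is an OR:
          return (login_with_password(guessed_password)
                  or reset_via_security_questions(guessed_answers))

    The effective strength is the minimum of the two factors, which is why
    both the password and the answers have to be hard to guess.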
  Example: Mat Honan's accounts at Amazon, Apple, Google, etc.
    [ Ref: https://www.wired.com/gadgetlab/2012/08/apple-amazon-mat-honan-hacking/all/ ]
    Honan was an editor at wired.com; someone wanted to break into his gmail account.
    Gmail password reset: send a verification link to a backup email address.
      Google helpfully prints part of the backup email address.
      Mat Honan's backup address was his Apple @me.com account.
    Apple password reset: need the billing address and the last 4 digits of a credit card.
      The address is easy, but how to get the 4 digits?
    Call Amazon and ask to add a credit card to an account.
      No authentication required, presumably because this didn't seem like a
        sensitive operation.
    Call Amazon tech support again, and ask to change the email address on the account.
      Authentication required!
      Tech support accepts the full number of any credit card registered with the account.
      Can use the credit card just added to the account.
    Now go to Amazon's web site and request a password reset.
      Reset link sent to the new e-mail address.
    Now log in to the Amazon account and view the saved credit cards.
      Amazon doesn't show full numbers, but DOES show the last 4 digits of all cards.
      Including the account owner's original cards!
    Now the attacker can reset the Apple password, read the gmail reset e-mail,
      and reset the gmail password.
    Lesson: attacks often assemble apparently unrelated trivia.
    Lesson: the individual policies are OK, but the combination is not.
      Apple views the last 4 digits as a secret, but many other sites do not.
    Lesson: big sites cannot hope to identify which human they are talking to;
      at best "the same person who originally created this account".
      Security questions and e-mailed reset links are examples of this.

  Example: verifying domain ownership for TLS certificates.
    Browser verifies the server's certificate to ensure it is talking to the right server.
      Certificate contains the server's host name and cryptographic key,
        signed by some trusted certificate authority (CA).
      Browser has the CA's public key built in to verify certificates.
    The CA is in charge of ensuring that a certificate is issued only to the
      legitimate owner of the domain (hostname).
    Typical approach: send email to the contact address for the domain.
      Some TLDs (like .eu) do not reveal the contact address in plain text.
        Most likely to prevent spam to domain owners.
        Instead, they reveal an image of the email address.
      One CA (Comodo) decided to automate this by OCR'ing the image.
      Turns out, some of the images are ambiguous!
        E.g., foo@a1telekom.at was mis-OCRed as foo@altelekom.at.
      Adversary can register the mis-parsed domain name and get a certificate
        for someone else's domain.
      [ Ref: https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg04654.html ]

  Goals often go wrong in "management" or "maintenance" cases.
    Who can change permissions or passwords?
    Who can access the audit logs?
    Who can access the backups?
    Who can upgrade the software or change the configuration?
    Who can manage the servers?
    Who revokes the privileges of former admins / users / ...?

What goes wrong #2: problems with the threat model / assumptions.
  I.e., the designer assumed an attack wasn't feasible (or didn't think of it).

  Example: users will not give their two-factor authentication codes to the adversary.
    Two-factor authentication defends against password compromises.
      E.g., an authenticator app (TOTP), a code sent via SMS or email, a
        hardware token, ... (see the TOTP sketch below).
    Assumes the user will keep their codes secret.
      Only enter the code into the legitimate application or web site.
    Adversary can try to confuse / trick the user into giving out their code.
      The user doesn't have a good way to tell the legitimate web site from the adversary's.
      Especially if the adversary asks over the phone rather than via a web site.
    [ Ref: https://www.vice.com/en/article/y3vz5k/booming-underground-market-bots-2fa-otp-paypal-amazon-bank-apple-venmo ]
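    For concreteness, here is a minimal sketch of how TOTP codes are typically
    computed (per RFC 6238 / RFC 4226; the shared secret below is made up).
    The code is just a short function of a shared secret and the current time,
    so its security rests entirely on the threat-model assumption above:
    nothing in the algorithm stops a user from reading the code out to a
    phisher, who can replay it within the same time window.

      import hmac, hashlib, struct, time

      def hotp(key: bytes, counter: int, digits: int = 6) -> str:
          # HMAC-SHA1 over the counter, then RFC 4226 "dynamic truncation".
          mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
          offset = mac[-1] & 0x0F
          value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
          return str(value % (10 ** digits)).zfill(digits)

      def totp(key: bytes, period: int = 30) -> str:
          # The "time-based" part: the counter is just the current 30-second window.
          return hotp(key, int(time.time()) // period)

      shared_secret = b"hypothetical-shared-secret"   # provisioned to both server and phone
      print(totp(shared_secret))   # same 6-digit code on both ends, for ~30 seconds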
  Example: information availability changes over time.
    It used to be difficult to learn personal information about an individual.
      So "security questions" for password reset were a reasonable design.
    Nowadays it is easy to find information about someone online (e.g., Facebook).

  Example: computational assumptions change over time.
    MIT's Kerberos system used 56-bit DES keys, since the mid-1980s.
      At the time, it seemed fine to assume an adversary can't check all 2^56 keys.
    No longer reasonable: now costs about $100.
      [ Ref: https://www.cloudcracker.com/dictionaries.html ]
      Several years ago, a 6.858 final project showed it's possible to get any
        key in a day.
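    Back-of-the-envelope arithmetic for why 2^56 stopped being a safe margin
    (the keys-per-second figure is an assumed round number for illustration,
    not a benchmark of any particular hardware):

      keyspace = 2 ** 56                 # all possible 56-bit DES keys
      rate_per_device = 10 ** 10         # ASSUMED keys/second for one cracking device
      devices = 100                      # rent this many in parallel

      seconds = keyspace / (rate_per_device * devices)
      print(f"{keyspace:.2e} keys, ~{seconds / 3600:.0f} hours to exhaust")   # ~20 hours

    With 1980s hardware, the same arithmetic gave times measured in centuries,
    which is why the assumption looked fine when Kerberos was designed.  The
    keyspace is fixed; the hardware is not.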
  Example: distributed DoS attacks.
    Availability is challenging in an open setting.
    Rate-limiting by IP address seemed like a plausible plan at some point.
    However, adversaries often control many compromised computers or devices.
      Easy for an adversary to launch a DoS attack from a million sources.
      Makes IP-level rate-limiting much less effective.

  Example: assuming a particular kind of solution to the problem.
    Many services use CAPTCHAs to check if a human is registering for an account.
      Requires decoding an image of some garbled text, for instance.
    Goal is to prevent mass registration of accounts, to limit spam, prevent a
      high rate of password guessing, etc.
    Assumed the adversary would try to build OCR software to solve the puzzles.
      Good plan because it's easy to change the image to break the OCR algorithm.
      Costly for the adversary to develop new OCR!
    Turns out adversaries found another way to solve the same problem.
      Human CAPTCHA solvers in third-world countries.
      Human solvers are far better at solving CAPTCHAs than OCR software, or
        even than regular users.
      Cost is very low (a fraction of a cent per CAPTCHA solved).
      [ Ref: https://www.cs.uic.edu/pub/Kanich/Publications/re.captchas.pdf ]

  Example: assuming the design/implementation is secret.
    "Security through obscurity."
    Clipper chip.
      [ Ref: https://en.wikipedia.org/wiki/Clipper_chip ]
    Broken secret crypto functions.

  Example: most users are not thinking about security.
    User gets an e-mail saying "click here to renew your account", then a
      plausible-looking page asks for their password.
    Or a dialog box pops up: "Do you really want to install this program?"
    Or tech support gets a call from a convincing-sounding user asking for a password reset.

  Example: all TLS CAs are fully trusted.
    If an attacker compromises a CA, they can generate a fake certificate for
      any server name.
    Originally there were only a few CAs; it seemed unlikely that an attacker
      could compromise one.
    But now browsers fully trust hundreds of CAs!
    In 2011, two CAs were compromised and issued fake certs for many domains
      (google, yahoo, tor, ...), apparently used in Iran (?).
      [ Ref: https://en.wikipedia.org/wiki/DigiNotar ]
      [ Ref: https://en.wikipedia.org/wiki/Comodo_Group ]
    In 2012, a CA inadvertently issued a root certificate valid for any domain.
      [ Ref: http://www.h-online.com/security/news/item/Trustwave-issued-a-man-in-the-middle-certificate-1429982.html ]
    Several other high-profile incidents since then too.
    Mistake: maybe reasonable to trust one CA, but not hundreds.

  Example: assuming you are running the expected software.
    1. In the 1980s, the military encouraged research into secure operating systems.
       Surprise: successful attacks by gaining access to the development systems.
       Mistake: implicit trust in the compiler, developers, distribution, etc.
    2. Apple's development tools for iPhone applications (Xcode) are large.
       Downloading them from China required going to Apple's servers outside
         of China, which takes a long time.
       Unofficial mirrors of the Xcode tools appeared inside China.
       Some of these mirrors contained a modified version of Xcode that
         injected malware into the resulting iOS applications.
       Found in a number of high-profile, popular iOS apps!
       [ Ref: https://en.wikipedia.org/wiki/XcodeGhost ]
    Classic paper: Reflections on Trusting Trust.

  Example: assuming users can unambiguously understand the UI.
    [ Ref: https://en.wikipedia.org/wiki/IDN_homograph_attack ]
    [ Ref: https://www.trojansource.codes/trojan-source.pdf ]

  Example: decommissioned disks.
    Many laptops, desktops, and servers are thrown out without deleting sensitive data.
    One study reports large amounts of confidential data on disks bought via ebay, etc.
    [ Ref: https://simson.net/page/Real_Data_Corpus ]

  Example: software updates.
    Apple iPhone software updates vs. the FBI.
      [ Ref: https://www.apple.com/customer-letter/ ]
    Chrome extensions bought by malware/adware vendors.
      [ Ref: https://arstechnica.com/security/2014/01/malware-vendors-buy-chrome-extensions-to-send-adware-filled-updates/ ]
    Node.js library updated to include code that steals Bitcoin keys.
      [ Ref: https://www.theregister.co.uk/2018/11/26/npm_repo_bitcoin_stealer/ ]

  Example: machines disconnected from the Internet are secure?
    Stuxnet worm spread via specially-constructed files on USB drives.

  Example: assuming your hardware is trustworthy.
    If the NSA is your adversary, that turns out not to be a good assumption.
    [ Ref: https://www.schneier.com/blog/archives/2013/12/more_about_the.html ]

What to do about threat model problems?
  More explicit threat models, to understand possible weaknesses.
  Simpler, more general threat models.
    E.g., should a threat model assume that the system design is secret?
      May be incrementally useful, but hard to recover once the design leaks.
      Probably not a good foundation for security.
  Better designs may eliminate / lessen reliance on certain assumptions.
    E.g., alternative trust models that don't have fully-trusted CAs.
    E.g., authentication mechanisms that aren't susceptible to phishing.
  Defense in depth (a good idea for problems with goals and implementations too).
    Compensate for possibly having the wrong threat model.
    Provide different levels of security under different levels of assumptions.
    E.g., audit everything, in case your enforcement threat model was wrong.
      Ideally the audit system has a simpler, more general threat model.
    E.g., enforce coarse-grained isolation between departments in a company,
      even if fine-grained permissions get misconfigured by admins.

What goes wrong #3: problems with the implementation -- bugs or misconfigurations.
  Bugs routinely undermine security.
    Rule of thumb: one bug per 1000 lines of code.
    Bugs in the implementation of the security policy.
    But also bugs in code that may seem unrelated to security -- yet is not.
  Good mindset: any bug is a potential security exploit.
    Especially if there is no isolation around the bug.

  Example: buffer overflows.
    You have already seen this in 6.033.
    Mistakes in the handling of buffers in C code can allow arbitrary code execution.
    Buffer overflow lessons:
      Bugs are a problem in all parts of the code, not just in the security mechanism.
      Everything else may be irrelevant if the implementation has a bug.
      But stay tuned; there is hope for the defense.
        Used to be the main attack vector, but things have gotten better.

  Example: Apple's iCloud password-guessing rate limits.
    [ Ref: https://github.com/hackappcom/ibrute ]
    People often pick weak passwords; can often guess one within a few attempts (1K-1M).
    Most services, including Apple's iCloud, rate-limit login attempts.
    Apple's iCloud service has many APIs.
      One API (the "Find my iPhone" service) forgot to implement rate-limiting.
      An attacker could use that API for millions of guesses per day.
    Lesson: if many checks are required, one will be missing.
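    A hypothetical sketch of that lesson (the endpoint names, limits, and
    account data are made up, not Apple's actual code): when the same password
    check is reachable through many entry points, every entry point must
    remember to apply the rate limit, and the attacker only needs the one
    that forgot.

      from collections import defaultdict

      MAX_ATTEMPTS = 10
      attempts = defaultdict(int)
      PASSWORDS = {"alice": "hunter2"}          # made-up account database

      def check_password(user: str, password: str) -> bool:
          return PASSWORDS.get(user) == password

      def web_login(user: str, password: str) -> str:
          attempts[user] += 1                   # rate limit enforced here...
          if attempts[user] > MAX_ATTEMPTS:
              return "locked out"
          return "ok" if check_password(user, password) else "bad password"

      def find_device_login(user: str, password: str) -> str:
          # ...but this endpoint forgot the rate limit entirely, so an
          # attacker can push millions of guesses through it.
          return "ok" if check_password(user, password) else "bad password"

    One defense is to centralize the check: a single authentication path that
    every endpoint has to go through, rather than N programmers all having to
    remember the same thing.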
  Meta-example: insecure defaults.
    Well-known default passwords in routers.
    Public default permissions in cloud services (e.g., objects in an AWS S3 bucket).
    Secure defaults are crucial because of the "negative goal" aspect.
      Large systems are complicated, with lots of components.
      The operator might forget to configure some component in their overall system.
      Important for components to be secure if the operator forgets to configure them.

  Example: MS Office document signing.
    [ Ref: https://www.usenix.org/system/files/sec23summer_235-rohlmann-prepub.pdf ]
    Microsoft Office provides a way to cryptographically sign documents (OOXML).
      Stronger guarantee than just including a scanned pen-and-paper signature.
    There is an ISO standard for how to sign documents (OOXML Signatures).
    Complex document format, complex signature verification process.
    One bug: MS Office doesn't check that the signed document is the top-level document.
      Documents can embed other content, including from other documents.
      Adversary constructs a document of their choice and embeds the victim's
        signed document inside.
        The victim document might not even be visible when viewing the adversary's document.
      The adversary's document uses the victim's signature on the embedded document.
      The buggy MS Office implementation shows the document as correctly signed!
    Another bug: MS Office for Mac always displays "Document protected by digital signature".
      As long as there's a signature, it doesn't matter what the signature signs, etc.
    Hard to track down such implementation bugs: tricky to write tests for different attacks.

  Example: missing access control checks in Citigroup's credit card web site.
    [ Ref: https://www.nytimes.com/2011/06/14/technology/14security.html ]
    Citigroup allowed credit card users to access their accounts online.
    Login page asks for username and password.
    If the username and password are OK, the user is redirected to the account info page.
    The URL of the account info page included some numbers, e.g. x.citi.com/id=1234.
      The numbers were (related to) the user's account number.
    Adversary tried different numbers, got different people's account info.
    The server didn't check that you were logged into that account!
    Lesson: programmers tend to think only of the intended operation.

  Example: poor randomness for cryptography.
    Need high-quality randomness to generate keys that can't be guessed.
    Incorrect implementation in the Sony PlayStation 3.
      [ Ref: https://arstechnica.com/gaming/2010/12/ps3-hacked-through-poor-implementation-of-cryptography/ ]
      Did not use fresh randomness for signatures.
    Debian accidentally "disabled" randomness in the OpenSSL library.
      [ Ref: https://www.debian.org/security/2008/dsa-1571 ]
      The randomness was initialized using C code that wasn't strictly correct.
        A program analysis tool flagged this as a problem.
        Debian developers fixed the warning by removing the offending lines.
        Everything still worked, but it turned out this also prevented seeding the PRNG.
      A pseudo-random number generator is deterministic after you set the seed.
        So the seed had better be random!
      The API still returned "random" numbers, but they were guessable.
      Adversary can guess keys, impersonate servers, users, etc.
    Android's Java SecureRandom weakness leads to Bitcoin theft.
      [ Ref: https://bitcoin.org/en/alert/2013-08-11-android ]
      [ Ref: https://www.nilsschneider.net/2013/01/28/recovering-bitcoin-private-keys.html ]
      Bitcoins can be spent by anyone who knows the owner's private key.
      Many Bitcoin wallet apps on Android used Java's SecureRandom API.
        Turns out the system sometimes forgot to seed the PRNG!
      As a result, some Bitcoin keys turned out to be easy to guess.
        Adversaries searched for guessable keys and spent any corresponding bitcoins.
      Really it was the nonce in the ECDSA signature that wasn't random; a
        repeated nonce allows the private key to be deduced (see the sketch below).
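      A small algebra-only sketch of why a repeated ECDSA nonce is fatal (my
      own illustration with made-up values; there are no actual elliptic-curve
      operations here -- n is the group order and r is taken as given).  Two
      signatures (r, s1) and (r, s2) on message hashes h1 and h2 that reuse
      the same nonce k give two linear equations mod n, which the attacker
      solves for k and then for the private key d.

        import secrets

        # Order of the secp256k1 group (the curve Bitcoin uses).
        n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
        inv = lambda x: pow(x % n, -1, n)      # modular inverse (Python 3.8+)

        # Made-up signer state: private key d, nonce k (wrongly reused), and r
        # (in real ECDSA, r is the x-coordinate of k*G mod n).
        d = secrets.randbelow(n - 1) + 1
        k = secrets.randbelow(n - 1) + 1
        r = secrets.randbelow(n - 1) + 1
        h1, h2 = secrets.randbelow(n), secrets.randbelow(n)   # hashes of two messages

        # Two ECDSA signatures with the same k:  s = k^-1 * (h + r*d) mod n
        s1 = inv(k) * (h1 + r * d) % n
        s2 = inv(k) * (h2 + r * d) % n

        # The attacker sees (r, s1, h1) and (r, s2, h2), recovers the nonce, then the key:
        k_recovered = (h1 - h2) * inv(s1 - s2) % n
        d_recovered = (s1 * k_recovered - h1) * inv(r) % n
        assert k_recovered == k and d_recovered == d

      This is exactly the failure behind the PS3 and Android Bitcoin incidents above.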
    Lesson: be careful.
    Lesson: prefer crypto that has fewer randomness requirements.
      E.g., EdDSA does not require randomness for signing.
    Embedded devices generate predictable keys.
      Problem: embedded devices and virtual machines may not have much randomness available.
      As a result, many keys are similar or susceptible to guessing attacks.
      [ Ref: https://factorable.net/weakkeys12.extended.pdf ]
    Casino slot machines.
      [ Ref: https://www.wired.com/2017/02/russians-engineer-brilliant-slot-machine-cheat-casinos-no-fix/ ]

  Example: Moxie's SSL certificate name-checking bug.
    [ Ref: https://www.wired.com/2009/07/kaminsky/ ]
    Certificates use length-encoded strings, but C code often uses null-terminated strings.
    CAs would grant a certificate for amazon.com\0.nickolai.org.
    Browsers saw the \0 and interpreted it as a certificate for amazon.com.
      (A small sketch of this mismatch appears at the end of these notes.)
    Lesson: parsing code is a huge source of security bugs.

  Example: loss of important cryptographic keys.
    [ Ref: https://arstechnica.com/gadgets/2022/12/samsungs-android-app-signing-key-has-leaked-is-being-used-to-sign-malware/ ]

What to do about implementation bugs?
  Keep the implementation simple.
  Factor out the security-critical parts from the rest of the code.
    But make sure the factoring is sensible.
    See buffer overflows for how "non-security-critical" bugs still matter.
  Reuse well-designed implementations, tools, libraries, etc.
  Understand the corner cases of the systems you are using.
  Take an adversarial mindset when evaluating dependencies you want to use.
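To make the Moxie name-checking bug concrete, here is a small hypothetical
sketch (not the actual CA or browser code) of the corner case: the certificate
carries a length-encoded name with an embedded NUL byte, but a C-style
consumer treats the first NUL as the end of the string, so the CA and the
browser disagree about which name was certified.

    # The name as a length-encoded field in the certificate.  The CA signs it
    # because the requester legitimately controls nickolai.org.
    cert_name = b"amazon.com\x00.nickolai.org"

    # What a C-style, NUL-terminated consumer (the browser's name check) sees:
    def c_string(buf: bytes) -> str:
        return buf.split(b"\x00", 1)[0].decode()

    print(cert_name)             # b'amazon.com\x00.nickolai.org'  -- the certified name
    print(c_string(cert_name))   # 'amazon.com'                    -- what the check compares

Two correct-looking pieces of code, two interpretations of the same bytes: the
adversary who owns nickolai.org ends up with a certificate the browser accepts
for amazon.com.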