Software trust
==============

General problem: how do we know that our computers are trustworthy?
  For this lecture: how do we know that our computers are running expected code?
  Partially next lecture: is our hardware trustworthy?
  Next module: how do we know our code isn't buggy?
    Software security; how to gain confidence in code.
  Not in this class: is our (non-buggy) code trustworthy?
    Specifications, formal verification, cryptographic protocols, ...

Scenario: user is about to type in their password to log into their computer.
  What attacks might cause user's computer to be running the wrong software?
    Adversary tricks user into installing bad software (malware).
    Adversary tricks developer into including malware in library.
    Adversary compromises distribution process.
    Adversary tricks software update process.
    Adversary tricks user into running wrong application.
    ...

Broadly, path of software from developer to user's computer:
  Developers write code.
  Applications incorporate many libraries.
  Compile software into binaries.
  Distribute binaries over the network.
  Users download software to install new application / OS.
  Users download updates over time.
  Users interact with device to launch application.
  Applications interact with remote servers running some code.

Overall, this problem is about scoping trust.
  Inevitable that we will need to trust some parties that supply our software.
  Goal: limit who needs to be trusted.
  Goal: start enumerating / making explicit what the trusted components are.
  Goal: allow auditing if we are running the right software.

Starting point: software development.
  For the purposes of this discussion, developers are necessarily trusted.
  But developers also include code from others (libraries, modules, ...).
  How do the developers know this library code comes from the right source?

  Example in Golang: import "github.com/grpc/grpc-go".
    URL explicitly specifies HTTPS server to contact.
    Server authenticated using TLS certificate.
    Server is trusted to provide the correct version of that library.
    Benefit: explicit. Clear what source is being trusted.
    Benefit: decentralized. Do not need to trust central package repository.
    Downside: server trusted to distribute software on behalf of library dev.

  Example in Rust: "cargo add rand".
    Implicit global server in charge of packages.
    Uses HTTPS to connect to server, checks TLS certificate.
    Server tracks information about packages, lets clients download them.
    Unspecified plan for authenticating software going into cargo repository.
      (Perhaps fetch via HTTPS URL, and first-come-first-serve name registration.)
    Benefit: centralized repository simplifies discovery.
    Downside: centralized repository is fully trusted.
    Downside: repo is a fuzzy link in trusting the source of the software package.

  Similar dependencies in Python, npm, etc.: "pip install foo", "npm install foo".

  What happens for private packages, libraries?
    E.g., Yelp might have their own logging library, "yelp-logger".
    With flat namespace like Cargo, Python, npm: typically added "on the side".
      Don't upload these private packages to global repositories.
      Separate repository of private packages added by developers.
    Subtle issue: what happens if the same name appears in both public, private repo?
      Some package managers might install the wrong package.
      [[ Ref: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610 ]]
      Precedence between repos is probably not quite right.
      Might want to add someone's custom repo, but not trust it for all packages.
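  A rough sketch of one mitigation, regardless of which repository serves a package:
  once a dependency has been vetted, pin a hash of it, and refuse to install anything
  under that name whose bytes hash differently. Package names and data below are
  hypothetical; real tools record such hashes in lockfiles (e.g., go.sum, Cargo.lock).

    // pin_check.go: check a downloaded dependency against a pinned hash,
    // independent of which repository (public or private) served it.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // pinned maps a package@version to the hash recorded when it was first vetted.
    var pinned = map[string]string{}

    func pin(name string, data []byte) {
        sum := sha256.Sum256(data)
        pinned[name] = hex.EncodeToString(sum[:])
    }

    // verify rejects a package with no pin (unknown provenance) or whose bytes
    // do not match the pinned hash (e.g., a look-alike from another repository).
    func verify(name string, data []byte) error {
        want, ok := pinned[name]
        if !ok {
            return fmt.Errorf("%s: no pinned hash; refuse to install", name)
        }
        sum := sha256.Sum256(data)
        if got := hex.EncodeToString(sum[:]); got != want {
            return fmt.Errorf("%s: hash mismatch (got %s, want %s)", name, got, want)
        }
        return nil
    }

    func main() {
        // First install: vet the private package and record its hash.
        private := []byte("the real yelp-logger 1.2.3 source archive")
        pin("yelp-logger@1.2.3", private)

        // Later: a public repository offers a package with the same name.
        impostor := []byte("a different yelp-logger 1.2.3 from the public repo")
        if err := verify("yelp-logger@1.2.3", impostor); err != nil {
            fmt.Println("rejected:", err)
        }
        if err := verify("yelp-logger@1.2.3", private); err == nil {
            fmt.Println("ok: matches the pinned hash")
        }
    }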
Could we do better in terms of trust assumptions?
  Have library developers sign their library source code.
    E.g., git supports signing commits in repository.
    Sign the hash of all files, perhaps as a hash tree following directory structure.
    Applications need to know developer's public key: TOFU or explicitly configured.
    Avoids trusting server that distributes library code.
  How to update libraries?
    Typically the application's source code tracks imported version of library.
    With server, can ask server if a newer version exists, download it if so.
    With explicit developer signatures, developer should sign the version number.
      Not sufficient to sign software itself: do not know version.
      Adversary might be able to trick clients into installing older software.
    Common challenge in authenticating software distribution: rollback.
      Will see it more prominently in other places in the trust chain.

Next step: building binaries from software source code.
  Who compiles the software to produce a binary that can be installed?
  In general, lose track of how source code was authenticated.
    Need to fully trust the machine that compiles the software.
  One idea that may help: reproducible builds.
    Make the build process deterministic: compilation, linking, dependencies, ...
    If builds are deterministic, could at least audit to check for consistency.
    Quite a bit of interest in this recently, starting to get some deployment.
    [[ Ref: https://go.dev/blog/rebuild ]]
  Tricky issue: compilers are an important dependency for this process.
    Compiler itself might have back doors.
    [[ Ref: https://dl.acm.org/doi/pdf/10.1145/358198.358210 ]]
    [[ Ref: https://research.swtch.com/nih ]]
    Some potential ideas for how to deal with this problem, but mostly not deployed.

Installing software.
  What software should be installed on a computer?
    Two questions: what software to install in the first place, and how to update?
  Several different plans (and combinations thereof).

  Signatures by software developer.
    Software developer signs their released application (compiled binaries, etc.).
    Doesn't help so much with the question of what to install.
      Unclear if developer is trustworthy or not.
      Unclear if we're seeing the latest version.
    Helps with the update question.
      Install updates signed by the same developer as original installed app.
      Track last installed version to avoid rollback (adversary giving old buggy app).
    For example, this is how Android applications work.

  Signatures by repository.
    Centralized organization packages up software, builds it, etc.
    Signature by centralized organization's key on the software packages.
    Somewhat helps with the question of what to install.
      Helps discovery (all packages in one place).
      Slightly helps with malicious software (but at large scale, hard to filter).
      Helps with identifying the latest version.
        Central organization signs both individual packages and overall list of versions.
        Can identify what's the latest version of a package (if we see latest list).
    Helps with updates.
      Check latest version in central repo, install update if it's newer.
    Downside: centralized trust.
      Compromised central repository can force devices to run arbitrary software.
    For example, this is how many Linux package managers work (apt, pacman, ...).
      Signatures on repository data; can be served by any mirror.
    A variant of this is how Python's pip software manager works.
      Instead of signing the repository data, there's a central server that speaks HTTPS.
      Server is trusted to respond to HTTPS queries, authenticated with TLS.
      Downside: more trust, because the main key has to be online on the server.
      Downside: harder to scale (cannot add untrusted mirrors).
      Upside: stronger freshness guarantees (if we get a server response).
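  A minimal sketch of the repository-signature idea, using a made-up manifest format
  (real repositories like apt sign more elaborate metadata): the repository signs a
  list of packages, their latest versions, and their hashes; a client checks the
  signature, rejects rollback relative to what it already has, and checks the hash
  of the downloaded package.

    // repo_verify.go: sketch of verifying a repository-signed manifest.
    package main

    import (
        "crypto/ed25519"
        "crypto/sha256"
        "encoding/hex"
        "encoding/json"
        "fmt"
    )

    // PackageEntry describes one package in the signed manifest.
    type PackageEntry struct {
        Version int
        SHA256  string
    }

    // Manifest is what the repository signs: package name -> latest version + hash.
    type Manifest struct {
        Packages map[string]PackageEntry
    }

    // checkUpdate verifies the repository signature over the manifest, rejects
    // rollback relative to the locally installed version, and checks that the
    // downloaded package bytes match the hash in the manifest.
    func checkUpdate(repoKey ed25519.PublicKey, manifestJSON, sig []byte,
        name string, installedVersion int, pkg []byte) error {
        if !ed25519.Verify(repoKey, manifestJSON, sig) {
            return fmt.Errorf("bad repository signature")
        }
        var m Manifest
        if err := json.Unmarshal(manifestJSON, &m); err != nil {
            return err
        }
        entry, ok := m.Packages[name]
        if !ok {
            return fmt.Errorf("%s: not in manifest", name)
        }
        if entry.Version < installedVersion {
            return fmt.Errorf("%s: manifest has version %d, older than installed %d (rollback?)",
                name, entry.Version, installedVersion)
        }
        sum := sha256.Sum256(pkg)
        if hex.EncodeToString(sum[:]) != entry.SHA256 {
            return fmt.Errorf("%s: package hash does not match manifest", name)
        }
        return nil
    }

    func main() {
        // Demo: the repository signs a manifest; a client verifies an update.
        repoPub, repoPriv, _ := ed25519.GenerateKey(nil)
        pkg := []byte("pretend these are the package bytes")
        sum := sha256.Sum256(pkg)
        manifest := Manifest{Packages: map[string]PackageEntry{
            "hello": {Version: 2, SHA256: hex.EncodeToString(sum[:])},
        }}
        manifestJSON, _ := json.Marshal(manifest)
        sig := ed25519.Sign(repoPriv, manifestJSON)

        if err := checkUpdate(repoPub, manifestJSON, sig, "hello", 1, pkg); err != nil {
            fmt.Println("reject:", err)
            return
        }
        fmt.Println("ok: signed, not a rollback, and hash matches")
    }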
  Lots of interesting considerations not mentioned here.
    See "The Update Framework" for an extreme example.
    [[ Ref: https://theupdateframework.io/ ]]

  Signatures by validator.
    Some entity is in charge of sanity-checking software packages.
      Auditing for malware.
      Auditing for correct builds.
    This entity signs approved packages.
    Your computer requires packages to be signed by such a validator.
    E.g., this is how Google's Play Store works for Android apps.
      Many phones by default require apps signed by developer and Play Store servers.
    E.g., Windows driver signing.

  Signatures by an audit log.
    Instead of a validator, require that installed binary be publicly visible.
      Goal: if you get tricked into installing malicious software, at least it should be publicly logged.
    Global log containing published software.
    Installing software requires a certificate that this software appears in global log.
    Can be combined with other schemes.
    Helps audit for potential compromises.
      E.g., suppose developer's signing key is compromised.
      Adversary can sign a malicious software update.
      But can't quietly install it onto victim device.
      Must log a copy of the malicious update on a public server.
    "Binary transparency." Some interest but not widely deployed.

What software to run: install-time vs. boot-time verification.
  So far we've talked about checks that happen on install or update.
  Downside: what if we make a mistake at install/update time?
    E.g., software bug or attack that installs software incorrectly?
  Alternative plan: secure boot.

Secure boot.
  Verify what software is loaded on computer starting from boot.
    Advantage: reboot will clear any unwanted software.
    Advantage: user gets a stronger assurance computer is in a "good" state.
  Initial boot code comes from ROM, hard-wired at manufacture time.
  ROM code loads boot loader, checks signature on boot loader.
    ROM comes with public key that is used for verification.
  Boot loader loads OS kernel, similarly checks signature.
  OS kernel checks integrity of all other data it loads.
    One technical complication: past OS kernel, too costly to check all data.
      E.g., entire OS image including all libraries.
      Don't want to load it all from disk just to check a signature.
    Instead, signature is on a Merkle root.
      Check Merkle proofs when loading some data from disk later on.
      Effectively deferring checks.
  Many systems look like this secure boot story.
    Apple iOS devices.
    Game consoles (PlayStation, Xbox, etc.).
    Chrome OS Verified Boot.
    [[ Ref: https://www.chromium.org/chromium-os/chromiumos-design-docs/verified-boot ]]
    UEFI Secure Boot works similarly.
    [[ Ref: https://docs.microsoft.com/en-us/windows/security/information-protection/secure-the-windows-10-boot-process ]]
  Complication: rollback.
    Boot ROM is stateless, has no idea what version it might have seen before.
    Naive design could be tricked into booting old OS that has security bugs.
    One fix: monotonic counter in hardware tracks last seen OS version.
    More clever trade-offs possible: see Apple's design in case study next week.
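  A minimal sketch of this verification chain, with a made-up image format (a version
  number, the next stage's code, and a signature by the vendor key baked into the boot
  ROM); rollback is handled with a hardware monotonic counter as suggested above.

    // secure_boot.go: sketch of a secure-boot verification chain.
    package main

    import (
        "crypto/ed25519"
        "encoding/binary"
        "fmt"
    )

    // Image is a hypothetical boot image: a version (for rollback protection),
    // the next stage's code, and a vendor signature over version || code.
    type Image struct {
        Version uint32
        Code    []byte
        Sig     []byte
    }

    // signedBytes is the byte string the vendor signs for an image.
    func signedBytes(img Image) []byte {
        buf := make([]byte, 4)
        binary.BigEndian.PutUint32(buf, img.Version)
        return append(buf, img.Code...)
    }

    // verifyStage is what each stage (starting from the boot ROM) does before
    // handing control to the next stage: check the vendor signature, and reject
    // rollback against a hardware monotonic counter.
    func verifyStage(vendorKey ed25519.PublicKey, img Image, counter *uint32) error {
        if !ed25519.Verify(vendorKey, signedBytes(img), img.Sig) {
            return fmt.Errorf("bad signature: refuse to boot")
        }
        if img.Version < *counter {
            return fmt.Errorf("version %d older than counter %d: rollback", img.Version, *counter)
        }
        *counter = img.Version // advance the monotonic counter
        return nil
    }

    func main() {
        vendorPub, vendorPriv, _ := ed25519.GenerateKey(nil)

        // The vendor signs a boot loader and a kernel (stand-in blobs).
        bootloader := Image{Version: 7, Code: []byte("boot loader code")}
        bootloader.Sig = ed25519.Sign(vendorPriv, signedBytes(bootloader))
        kernel := Image{Version: 7, Code: []byte("OS kernel code")}
        kernel.Sig = ed25519.Sign(vendorPriv, signedBytes(kernel))

        counter := uint32(5) // last version recorded by the hardware counter
        for _, img := range []Image{bootloader, kernel} {
            if err := verifyStage(vendorPub, img, &counter); err != nil {
                fmt.Println("boot halted:", err)
                return
            }
        }
        fmt.Println("boot chain verified; handing control to the kernel")
    }

  In a real chain, each stage verifies the next (possibly under different keys and
  with Merkle-tree checks for the OS image); the loop above collapses that for brevity.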
Alternative: measured boot.
  Secure boot supposes you know what key must sign the software.
  What if the hardware doesn't know what software is good vs. bad?
  Idea: measure what software gets booted.
    Hash the boot loader, then hash the OS kernel, then hash the OS image, etc.
  Cannot prevent bad software from loading, but can generate different secrets!
    System has some durably-stored secret in hardware (same across reboots).
    When system boots up, it derives a secret key based on its hardware secret.
      Derivation based on hash of boot loader, OS kernel, etc.
    OS gets secret key to decrypt its data, to authenticate to remote servers, etc.
    Booting different OS (e.g., due to malware corruption?) generates different key.
    (See the sketch after the summary below.)

Last consideration: what software is the user interacting with?
  Secure / measured boot ensures we booted correctly.
  But once OS boots, lots of applications might be running.
  How do we know what the user is interacting with?
    Bad application could be running and impersonating login window, etc.
    Could steal user's password.
  Idea: secure attention key (SAK).
    Windows: Ctrl-Alt-Del to log in.
    iPhone: push front button to access launcher / home screen.
    Applications not allowed to hijack secure attention key.
    OS will bring up a trustworthy home screen / launcher on SAK.
    Safe for users to start interacting with computer, enter secrets / passwords, ...

Summary.
  Many trust assumptions to believe that computer is running correct software.
  Several techniques for carefully reasoning about this trust.
    Signing / authenticating software components, libraries.
    Reproducible builds.
    Signed software distribution / updates.
    Binary transparency.
    Secure boot, measured boot.
    Secure attention key.
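A sketch of the measured-boot key derivation mentioned above. The hardware secret and
boot components are stand-in byte strings, and HMAC-SHA256 stands in for whatever key
derivation a given device actually uses; the point is only that the derived key depends
on everything that was measured.

    // measured_boot.go: sketch of deriving a boot-specific key from a hardware
    // secret and measurements of the software that was booted.
    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // extend mimics a PCR-style measurement register:
    //   measurement = H(measurement || H(component)).
    func extend(measurement, component []byte) []byte {
        h := sha256.Sum256(component)
        m := sha256.New()
        m.Write(measurement)
        m.Write(h[:])
        return m.Sum(nil)
    }

    // deriveKey binds a key to both the durable hardware secret (same across
    // reboots) and the measurement of everything that was booted.
    func deriveKey(hardwareSecret, measurement []byte) []byte {
        mac := hmac.New(sha256.New, hardwareSecret)
        mac.Write(measurement)
        return mac.Sum(nil)
    }

    // measureAll runs the measurement register over each boot component in order.
    func measureAll(components ...[]byte) []byte {
        m := make([]byte, sha256.Size) // measurement register starts at zero
        for _, c := range components {
            m = extend(m, c)
        }
        return m
    }

    func main() {
        hardwareSecret := []byte("durable per-device secret")

        // Boot path 1: the expected software stack.
        good := measureAll([]byte("boot loader"), []byte("OS kernel"), []byte("OS image"))
        fmt.Println("expected OS key:", hex.EncodeToString(deriveKey(hardwareSecret, good)))

        // Boot path 2: same hardware, but a tampered kernel still boots...
        bad := measureAll([]byte("boot loader"), []byte("evil kernel"), []byte("OS image"))
        fmt.Println("tampered OS key:", hex.EncodeToString(deriveKey(hardwareSecret, bad)))
        // ...yet it derives a different key, so it cannot decrypt the legitimate
        // OS's data or authenticate to remote servers as the legitimate OS.
    }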