Software security
=================

Software bugs lead to security problems.
  Rule of thumb: one bug per 1000 lines of code.
  Surprisingly often, bugs lead to security compromises.
  Good mindset to have: any bug can lead to a potential security exploit.
    Even bugs in code that might not seem to be security-critical.
  Another view: security requires much of the program to work correctly.

This module in the class: security in the presence of software bugs.
  Today: overview, motivation, broad approaches.
  Next 3 lectures: specific techniques.

What kinds of bugs might lead to security problems?
  Bugs can be arbitrary, so how do we make some progress here?
  Turns out, we (broadly speaking) have lots of experience with bugs.
  Common classes of bugs that programmers make, leading to security issues.

Memory corruption.
  Used to be extremely prevalent; still reasonably common in some software.
  Sloppy memory operations translate into arbitrary code execution.

Simple example: buffer overflow.

    void f() {
      char buf[128];
      gets(buf);
    }

  What does gets() do?
    Keeps writing input bytes to the buffer, which is passed as a pointer.
    When the end of input is reached, writes a zero byte indicating
    end-of-string.
  What happens if the input is longer than 128 bytes?
    gets() keeps writing data, incrementing the pointer.
    What does that do?  Depends on what's around buf in memory.
    Typically there is stack-frame data, including the return address for f.
    The return address determines what code gets executed when f returns.
    Adversary can completely control execution on f's return.
    (A bounds-checked alternative to gets() is sketched at the end of this
    section.)
  What happens if the stack grows up, instead of down?
    The overflow then runs into another return address (the one for gets).
    Adversary gets to control what code executes on return from gets.

Checked example:

    void g() {
      char buf[N];
      uint32_t n = get_input();   // will get n 16-byte chunks
      for (uint32_t i = 0; i < n; i++) {
        // read into buf[i*16] .. buf[i*16+15]
      }
    }

  What check should we add?
    Candidate: if (n * 16 >= N) { return; }
    Potential problem: what if n = 2^30?
      In 32-bit arithmetic, 2^30 * 16 wraps around to 0.
      The check passes just fine, but the overflow still happens.
    (An overflow-free version of this check is sketched at the end of this
    section.)

Many variants of memory corruption.
  C requires programmers to follow many rules to ensure memory safety.
  Easy to make a mistake in C code, with dramatic consequences.

Use-after-free example:

    void h() {
      char *buf = malloc(N);
      int err = 0;
      read(0, buf, N);
      if (strncmp(buf, "GET", 3)) {
        // not a GET request: flag an error and release the buffer
        err = 1;
        free(buf);
      }
      ...
      if (err) {
        printf("Error processing request: %s\n", buf);
      }
    }

  What might go wrong?
    The error path prints the contents of buf.
    But buf was freed and might have been reused for something else.
    E.g., another code path might allocate memory for a cryptographic key.
    Adversary could send other concurrent requests to trigger such code paths.
    Could reveal sensitive data (e.g., a crypto key) to the adversary!

Use-after-free bugs are the most prevalent memory errors today.
  Lead either to leakage of sensitive data or to corruption (e.g., of
  function pointers).
  Tricky to prevent with just range checks; need lifetime checks.
    Either in the type system at compile time (e.g., Rust).
    Or in "band-aid" runtime checks (but tricky with memory re-allocation,
    etc.).

What if you write all of your code in Python?
  Python runtime written in C.
  Python modules use libraries written in C.
  Underlying OS kernel, etc., written in C.

Some hope: newer languages like Rust provide stronger memory-safety
guarantees.
  Harder to make mistakes that lead to these kinds of memory corruption.
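Aside (not in the original notes): a minimal sketch of how f() could bound its
read using the standard fgets() function, which writes at most size-1 input
bytes plus a terminating zero byte and therefore cannot run past the end of
buf.

    #include <stdio.h>

    void f_safe(void) {
      char buf[128];
      // fgets() stops after sizeof(buf)-1 bytes (or at end-of-line / EOF)
      // and always zero-terminates, so the write stays inside buf.
      if (fgets(buf, sizeof(buf), stdin) == NULL) {
        return;   // end of input, or a read error
      }
      // ... use buf (it may include a trailing newline) ...
    }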
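Aside (not in the original notes): one way to write the missing check for g()
is to compare n against N/16, so that no multiplication can wrap around before
the check.  Sketch only: N and get_input() are taken from the example above
(N given an arbitrary concrete value here), and the per-chunk read is left as
a comment, as in the original.

    #include <stdint.h>

    #define N 4096                 /* arbitrary buffer size for this sketch */

    uint32_t get_input(void);      /* untrusted chunk count, as in the example */

    void g_checked(void) {
      char buf[N];
      uint32_t n = get_input();

      /* Reject any n for which n*16 would exceed N, without ever computing
         n*16 (which could wrap around in 32-bit arithmetic). */
      if (n > N / 16) {
        return;
      }

      for (uint32_t i = 0; i < n; i++) {
        // read into buf[i*16] .. buf[i*16+15]; the index now provably
        // stays within buf
      }
    }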
Another common category of problems: encoding / decoding.
  Challenging to correctly encode or decode untrusted data.

Encoding example: SQL injection.
  Applications often store data in a SQL database.
  Database is accessed over a text-oriented query interface.
  Application formulates a query, sends it to the database.
    E.g., SELECT name FROM users WHERE phone="617-253-6005"
    Might be used by the application to look up the name for a phone number.
  Common pattern (perhaps becoming less so): just use string concatenation.
  What if the adversary supplies the phone number?
    Suppose the adversary supplies a phone number of:
      617" OR email="nz@mit.edu
    Can find the name for a given email address.
    Or even:
      617"; DELETE FROM users
    Might cause the database to select some name, then delete all user data.

Encoding example: cross-site scripting.
  Web pages can contain Javascript code.
  Javascript code can access sensitive state in the user's web browser.
    E.g., an HTTP cookie often contains a secret token for accessing the
    user's login session.
  Web applications might embed user data when constructing web pages.
  Setup:
    Adversary --[adversary's data]--> Web application --[web page]--> Victim
  Suppose the web application wants to include a list of user names
  (incl. the adversary's).
  Build up a list like this: