CS 161 Textbook Computer Security
An online version is available at https://textbook.cs161.org. Textbook by , , ,
Additional contributions by , , and Shomil Jain Last update: August 23, 2021
Contact for corrections:
It would not be too much of a stretch to say that much of today’s world is built upon the Internet. Many of the services that run on top of the Internet come with their own class of vulnerabilities and defenses to match. In particular, we will be focusing on web security, which covers a class of attacks that target web pages and web services.
17 SQL Injection
17.1 Code Injection
SQL injection is a special case of a more broad category of attacks called code injections.
As an example, consider a calculator website that accepts user input and calls eval in Python in the server backend to perform the calculation. For example, if a user types 2+3 into the website, the server will run eval(“2+3″) and return the result to the user.
If the web server is not careful about checking user input, an attacker could provide a malicious input like
2+3”); os.system(“rm -rf /
When the web server plugs this into the eval Python function, the result looks like eval(“2+3”); os.system(“rm -rf /”)
If interpreted as code, this statement causes the web server to delete all its files!
The general idea behind these attacks is that a web server uses user input as part of the code it runs. If the input is not properly checked, an attacker could create a special input that causes unintended code to run on the server.
CS 161 Notes 1 of 23
17.2 SQL Injection Example
Many modern web servers use SQL databases to store information such as user logins or uploaded files. These servers often allow users to interact with the database through HTTP requests.
For example, consider a website that stores a SQL table of course evaluations named evals: id course rating
1 cs61a 4.5
2 cs61b 4.4
3 cs161 5.0
A user can make an HTTP GET request for a course rating through a URL:
http://www.berkeley.edu/evals?course=cs61a
To process this request, the server performs a SQL query to look up the rating corresponding
to the course the user requested:
SELECT rating FROM evals WHERE course = ’cs61a’
Just like the code injection example, if the server does not properly check user input, an attacker could create a special input that allows arbitrary SQL code to be run. Consider the following malicious input:
garbage’; SELECT password FROM passwords WHERE username = ’admin
When the web server plugs this into the SQL query, the resulting query looks like
SELECT rating FROM evals WHERE course = ’garbage’;
SELECT password FROM passwords WHERE username = ’admin’
If interpreted as code, this causes the query to return the password for the admin user! 17.3 SQL Injection Strategies
Writing a malicious input that creates a syntactically valid SQL query can be tricky. Let’s break down each part of the malicious input from the previous example:
• garbage is a garbage input to the intended query so that it doesn’t return anything. • ’ closes the opening quote from the intended query. Without this closing quote, the
rest of our query would be treated as a string, not SQL code.
• ; ends the intended SQL query and lets us start a new SQL query.
• SELECT password FROM passwords WHERE username = ’admin is the malicious SQL query we want to execute. Note that we didn’t add a closing quote to ’admin, because the intended SQL query will automatically add a closing quote at the end of our input.
CS 161 Notes 2 of 23
Consider another vulnerable SQL query. This time, we have a users table that contains the username and password of every user.
When the web server receives a login request, it creates a SQL query by plugging in the username and password from the request. For example, if you make a login request with username alice and password password123, the resulting SQL query would be
SELECT username FROM users WHERE username = ’alice’
AND password = ’password123’
If the query returns more than 0 rows, the server registers a successful login.
Suppose we want to login to the server, but we don’t have an account, and we don’t know anyone’s username. How might we achieve this using SQL injection?
First, in the username field, we should add a dummy username and a quote to end the opening quote from the original query:
SELECT username FROM users WHERE username = ’alice’ ’ AND password = ’ ’
Next, we need to add some SQL syntax so that this query returns more than 0 rows (since we don’t know if alice is a valid username). One trick for forcing a SQL query to always return something is to add some logic that always evaluates to true, such as OR 1=1:
SELECT username FROM users WHERE username = ’alice’ OR 1=1 ’ AND password = ’ ’
Next, we have to add some SQL so that the rest of the query doesn’t throw a syntax error. One way of doing this is to add a semicolon (ending the previous query) and write a dummy query that matches the remaining SQL:
SELECT username FROM users WHERE username = ’alice’ OR 1=1;
SELECT username FROM users WHERE username = ’alice’ AND password = ’ ’
The second query might not return anything, but the first query will return a nonzero number of entries, which lets us perform a login. The last step is to add some garbage as the password:
SELECT username FROM users WHERE username = ’alice’ OR 1=1;
SELECT username FROM users WHERE username = ’alice’ AND password = ’garbage’
Thus, our malicious username and password should be
username = alice’ OR 1=1; SELECT username FROM users WHERE username = ’alice
password = garbage
Another trick to make SQL injection easier is the — syntax, which starts a comment in SQL.
This tells SQL to ignore the rest of the query as a comment.
In our previous example, we can instead start a comment to ignore parts of the query we don’t want to execute:
CS 161 Notes 3 of 23
SELECT username FROM users WHERE username = ’alice’ OR 1=1
–’ AND password = ’garbage’
Thus, another malicious username and password is
username = alice’ OR 1=1–
password = garbage Further reading: SQL Injection Attacks by Example
17.4 Defense: Escape Inputs
One way of defending against SQL injection is to escape any potential input that could be used in an attack. Escaping a character means that you tell SQL to treat this character as part of the string, not actual SQL syntax.
For example, the quote ’ is used to denote the end of a string in SQL. However, the escaped quote \’ is treated as a literal quote character in SQL, and it does not cause the current string to end.
By properly replacing characters with their escaped version, malicious inputs such as the ones we’ve been creating will be treated as strings, and the SQL parser won’t try to run them as actual SQL commands.
For example, in the previous exploit, if the server replaces all instances of the quote ’ and the dash – with escaped versions, the SQL parser will see
SELECT username FROM users WHERE username = ’alice\’ OR 1=1\-\-’ AND password = ’garbage’
The escaped quote won’t cause the username string to end, and the escaped dashes won’t cause a comment to be created. The parser will try to look up someone with a username alice’ OR 1=1– and find nothing.
However, we have to be careful with escaping. If an attacker inputs a backslash followed by a quote \’, the escaper might “escape the escape” and give the input \\’ to the SQL parser. The parser will treat the two backslashes \\ as an escaped backslash, and the quote won’t be escaped!
The key takeaway here is that building a good escaper can be tricky, and there are many edge cases to consider. There is almost no circumstance in which you should try to build an escaper yourself; secure SQL escapers exist in SQL libraries for almost every programming language. However, if you are running SQL statements with raw user input, escapers are often an ineffective solution, because you need to ensure that every call is properly escaped. A far more robust solution is to use parameterized SQL.
17.5 Defense: Parameterized SQL/Prepared Statements
A better defense against SQL injection is to use parameterized SQL or prepared statements. This type of SQL compiles the query first, and then plugs in user input after the query has
CS 161 Notes 4 of 23
already been interpreted by the SQL parser. Because the user input is added after the query is compiled and interpreted, there is no way for any attacker input to be treated as SQL code. Parameterized SQL prevents all SQL injection attacks, so it is the best defense against SQL injection!
In most SQL libraries, parameterized SQL and unsafe, non-paramaterized SQL are provided as two different API functions. You can ensure that you’ve eliminated all potential SQL vulnerabilities in your code by searching for every database query and replacing each API call with a call to the parameterized SQL API function.
The biggest problem with parameterized SQL is compatibility. SQL is a (mostly) generic language, so SQL written for MySQL can run on Postgres or commercial databases. Pa- rameterized SQL requires support from the underlying database (since the processing itself happens on the database side), and there is no common standard for expressing parameter- ized SQL. Most SQL libraries will handle the translation for you, but switching to prepared statements may make it harder to switch between databases.
In practice, most modern SQL libraries support parameterized SQL and prepared statements. If the library you are using does not support parameterized SQL, it is probably best to switch to a different SQL library.
Further reading: OWASP Cheat Sheet on SQL Injection
CS 161 Notes 5 of 23
18 Introduction to the Web 18.1 URLs
Every resource (webpage, image, PDF, etc.) on the web is identified by a URL (Uniform Re- source Locator). URLs are designed to describe exactly where to find a piece of information on the Internet. A basic URL consists of three mandatory parts:
http://www.example.com/index.html
The first mandatory part is the protocol, located before :// in the URL. In the example above, the protocol is http. The protocol tells your browser how to retrieve the resource. In this class, the only two protocols you need to know are HTTP, which we will cover in the next section, and HTTPS, which is a secure version of HTTP using TLS (refer to the networking unit for more details). Other protocols include git+ssh://, which fetches a git archive over an encrypted tunnel using ssh, or ftp://, which uses the old FTP (File Transfer Protocol) to fetch data.
The second mandatory part is the location, located after :// but before the next forward slash in the URL. In the example above, the location is www.example.com. This tells your browser which web server to contact to retrieve the resource.
Optionally, the location may contain an optional username, which is followed by an @ char- acter if present. For example, is a location with a username evanbot. All locations must include a computer identifier. This is usually a domain name such as www.example.com. Sometimes the location will also include a port number, such as www.example.com:81, to distinguish between different applications running on the same web server. We will discuss ports a bit more when we talk about TCP during the networking section.
The third mandatory part is the path, located after the first single forward slash in the URL. In the example above, the path is /index.html. The path tells your browser which resource on the web server to request. The web server uses the path to determine which page or resource should be returned to you.
One way to think about paths is to imagine a filesystem on the web server you’re contacting. The web server can use the path as a filepath to locate a specific page or resource. The path must at least consist of /, which is known as the “root”1 of the filesystem for the remote web site.
Optionally, there can be a ? character after the path. This indicates that you are supplying additional arguments in the URL for the web server to process. After the ? character, you can supply an optional set of parameters separated by & characters. Each parameter is usually encoded as a key-value pair in the format key=value. Your browser sends all this information to the web server when fetching a URL. See the next section for more details on URL parameters.
1It is called the root because the filesystem can be treated as a tree and this is where the tree starts.
CS 161 Notes 6 of 23
Finally, there can be an optional anchor after the arguments, which starts with a # character. The anchor text is not sent to the server, but is available to the web page as it runs in the browser.
The anchor is often used to tell your browser to scroll to a certain part of the webpage when loading it. For example, try loading https://en.wikipedia.org/wiki/Dwinelle Hall#Floor plan and https://en.wikipedia.org/wiki/Dwinelle Hall#Construction and note that your browser skips to the section of the article specified in the anchor.
In summary, a URL with all elements present may look like this:
where http is the protocol, evanbot is the username, www.cs161.org is the computer loca- tion (domain), 161 is the port, /whoami is the path, k1=v1&k2=v2 are the URL arguments, and anchor is the anchor.
Further reading: What is a URL? 18.2 HTTP
The protocol that powers the World Wide Web is the Hypertext Transfer Protocol, abbre- viated as HTTP. It is the language that clients use to communicate with servers in order to fetch resources and issue other requests. While we will not be able to provide you with a full overview of HTTP, this section is meant to get you familiar with several aspects of the protocol that are important to understanding web security.
18.3 HTTP: The Request-Response Model
Fundamentally, HTTP follows a request-response model, where clients (such as browsers) must actively start a connection to the server and issue a request, which the server then responds to. This request can be something like “Send me a webpage” or “Change the password for my user account to foobar.” In the first example, the server might respond with the contents of the web page, and in the second example, the response might be something as simple as “Okay, I’ve changed your password.” The exact structure of these requests will be covered in further detail in the next couple sections.
The original version of HTTP, HTTP 1.1, is a text-based protocol, where each HTTP request and response contains a header with some metadata about the request or response and a payload with the actual contents of the request or response. HTTP2, a more recent version of HTTP, is a binary-encoded protocol for efficiency, but the same concepts apply.
For all requests, the server generates and sends a response. The response includes a series of headers and, in the payload, the body of the data requested.
18.4 HTTP: Structure of a Request
Below is a very simple HTTP request.
CS 161 Notes 7 of 23
GET / HTTP/1.1
Host: squigler.com
The first line of the request contains the method of the request (GET), the path of the request (/), and the protocol version (HTTP/1.1). This is an example of a GET request. Each line after the first line is a request header. In this example, there are two headers, the DNT header and the Host header. There are many HTTP headers defined in the HTTP spec which are used to convey various pieces of information, but we will only be covering a couple of them through this lab.
Here is another HTTP request:
POST /login HTTP/1.1
Host: squigler.com
Content-Length: 40
Content-Type: application/x-url-formencoded
Here, we have a couple more headers and a different request type: the POST request.
18.5 HTTP: GET vs. POST
While there are quite a few methods for requests, the two types that we will focus on for this course are GET requests and POST requests. GET requests are are generally intended for “getting” information from the server. POST requests are intended for sending information to the server that somehow modifies its internal state, such as adding a comment in a forum or changing your password.
In the original HTTP model, GET requests are not supposed to change any server state. However, modern web applications often change server state in response to GET requests in query parameters.
Of note, only POST requests can contain a body in addition to request headers. Notice that the body of the second example request contains the username and password that the user alice is using to log in. While GET requests cannot have a body, it can still pass query parameters via the URL itself. Such a request might look something like this:
GET /posts?search=security&sortby=popularity
Host: squigler.com
In this case, there are two query parameters, search and sortby, which have values of security and popularity, respectively.
CS 161 Notes 8 of 23
18.6 Elements of a Webpage
The HTTP protocol is designed to return arbitrary files. The response header usually spec- ifies a media type that tells the browser how to interpret the data in the response body.
Although the web can be used to return files of any type, much of the web is built in three languages that provide functionality useful in web applications.
A modern web page can be thought of as a distributed application: there is a component running on the web server and a component running in the web browser. First, the browser makes an HTTP request to a web server. The web server performs some server-side com- putation and generates and sends an HTTP response. Then, the browser performs some browser-side computation on the HTTP response and displays the result to the user.
18.7 Elements of a Webpage: HTML
HTML (Hypertext Markup Language) lets us create structured documents with paragraphs, links, fillable forms, and embedded images, among other features. You are not expected to know HTML syntax for this course, but some basics are useful for some of the attacks we will cover.
Here are some examples of what HTML can do:
• Create a link to Google: Click me
• Embed a picture in the webpage:
• Include JavaScript in the webpage:
• Embed the CS161 webpage in the webpage:
Frames pose a security risk, since the outer page is now including an inner page that may be from a different, possibly malicious source. To protect against this, modern browsers enforce frame isolation, which means the outer page cannot change the contents of the inner page, and the inner page cannot change the contents of the outer page.
18.8 Elements of a Webpage: CSS
CSS (Cascading Style Sheets) lets us modify the appearance of an HTML page by using different fonts, colors, and spacing, among other features. You are not expected to know CSS syntax for this course, but you should know that CSS is as powerful as JavaScript when used maliciously. If an attacker can force a victim to load some malicious CSS, this is functionally equivalent to the attacker forcing the victim to load malicious JavaScript.
18.9 Elements of a Webpage: JavaScript
JavaScript is a programming language that runs in your browser. It is a very powerful language–in general, you can assume JavaScript can arbitrarily modify any HTML or CSS
CS 161 Notes 9 of 23
on a webpage. Webpages can include JavaScript in their HTML to allow for dynamic features such as interactive buttons. Almost all modern webpages use JavaScript.
When a browser rec