10.2. Authenticating on the Web

On the web, a web server is usually authenticated to users by using SSL or TLS and a server certificate - but it's not as easy to authenticate who the users are. SSL and TLS do support client-side certificates, but there are many practical problems with actually using them (e.g., web browsers don't support a single user certificate format and users find it difficult to install them). Using Java or JavaScript has its own problems, since many users disable them, some firewalls filter them out, and they tend to be slow. In most cases, requiring every user to install a plug-in is impractical too, though if the system is only for local use this may be appropriate.

Many techniques don't work or don't work very well. One approach that works in some cases is to use ``basic authentication'', which is built into essentially all browsers and servers. Unfortunately, basic authentication sends passwords unencrypted, so it makes passwords easy to steal; basic authentication by itself is really useful only for worthless information. You could wrap all basic authentication passwords in an SSL/TLS connection (which would encrypt them), but this hurts performance. You could also use ``digest authentication'', which is a good technique but not universally supported by browsers. You could store authentication information in the URLs selected by the users, but for most circumstances you shouldn't do this - there are too many ways that this information can leak to others (e.g., through the history logs stored by many browsers, through the logs of proxies, and to other web sites through the Referer: field).
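To make the weakness of basic authentication concrete, here is a minimal sketch showing that the Authorization header is merely base64-encoded, so anyone who observes it on the wire can recover the password. The function name and the sample credentials ("alice", "s3cret") are hypothetical and chosen only for illustration.

```python
import base64


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authentication header")
    # base64 is an encoding, not encryption: no key is needed to reverse it.
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# What a browser would send for the hypothetical user "alice".
header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(decode_basic_auth(header))
```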

Thus, the most common technique for authenticating on the web today is through cookies. Cookies weren't really designed for this purpose, but they can be used for authentication. However, there are many wrong ways to use them that create security vulnerabilities, so be careful. For more information about cookies, see IETF RFC 2965, along with the older specifications about them. Note that to use cookies, some browsers (e.g., Microsoft Internet Explorer 6) may insist that you have a privacy profile (named p3p.xml on the root directory of the server).

Note that some users don't accept cookies, so this solution still has some problems. Ideally, you should send this authentication information back and forth via HTML form hidden fields (since nearly all browsers support them without concern). You'd use the same approach as with cookies - you'd just use a different technology to have the data sent from the user to the server. Naturally, if you implement this approach, you need to include settings to ensure that these pages aren't cached for use by others. However, while I think avoiding cookies is preferable, in practice these other approaches often require much more development effort. Since this is so hard for many application developers to implement on a large scale, I'm not currently stressing these approaches. I would rather describe an approach that is reasonably secure and reasonably easy to implement than emphasize approaches that are too hard to implement correctly (by either developers or users). However, if you can do so without much effort, by all means support sending the authentication information using form hidden fields and an encrypted link (e.g., SSL/TLS).

Fu [2001] discusses client authentication on the web, along with a suggested approach, and this is the approach I suggest for most sites. The basic idea is that client authentication is split into two parts, a ``login procedure'' and ``subsequent requests.'' In the login procedure, the server asks for the user's username and password, the user provides them, and the server replies with an ``authentication token''. In the subsequent requests, the client (web browser) sends the authentication token to the server (along with its request); the server verifies that the token is valid, and if it is, services the request.

10.2.1. Authenticating on the Web: Logging In

The login procedure is typically implemented as an HTML form; I suggest using the field names ``username'' and ``password'' so that web browsers can automatically perform some useful actions. Make sure that the password is sent over an encrypted connection (using SSL or TLS, through an https: connection) - otherwise, eavesdroppers could collect the password. Make sure all password text fields are marked as passwords, so that the password text is not visible to anyone who can see the user's screen.

When the username and password are sent, they must be checked against the user account database. This database shouldn't store the passwords ``in the clear'', since if someone got a copy of this database they'd suddenly get everyone's password (and users often reuse passwords). Some use crypt() to handle this, but crypt() can only handle a small input, so I recommend using a different approach (this is my approach - Fu [2001] doesn't discuss this). Instead, the user database should store a username, salt, and the password hash for that user. The ``salt'' is just a random sequence of characters, used to make it harder for attackers to determine a password even if they get the password database - I suggest an 8-character random sequence. It doesn't need to be cryptographically random, just different from other users. The password hash should be computed by concatenating ``server key1'', the user's password, and the salt, and then running a cryptographically secure hash algorithm. Server key1 is a secret key unique to this server - keep it separate from the password database. Someone who has server key1 could run programs to crack user passwords if they also had the password database; since server key1 doesn't need to be memorized, it can be long and complex. Most secure would be HMAC-SHA-1 or HMAC-MD5; you could use SHA-1 (most web sites aren't really worried about the attacks it allows) or MD5 (but see the discussion about MD5).

Thus, when users create their accounts, the password is hashed and placed in the password database. When users try to log in, the purported password is hashed and compared against the hash in the database (they must be equal). When users change their password, they should type in both the old and new password, and the new password twice (to make sure they didn't mistype it); and again, make sure none of these passwords' characters are visible on the screen.
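The account-creation and login checks described above might be sketched as follows. This is only one reading of the scheme: it uses HMAC-SHA-1 keyed with server key1 over the password followed by the salt, which is the HMAC variant the text calls most secure rather than a plain concatenate-and-hash. The names SERVER_KEY1, new_salt, password_hash, create_account and check_login are hypothetical, the key value is a placeholder, and a plain dictionary stands in for the user account database. The salt here happens to come from a cryptographic source, although the text notes that this is not required.

```python
import hashlib
import hmac
import secrets
import string

# Assumption: a real server would load this secret from a location kept
# separate from the password database; the value here is a placeholder.
SERVER_KEY1 = b"example-server-key1-long-and-complex"

SALT_ALPHABET = string.ascii_letters + string.digits


def new_salt(length=8):
    """Return a random salt of the suggested 8 characters."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))


def password_hash(password, salt):
    """HMAC-SHA-1, keyed with server key1, over the password plus the salt."""
    message = (password + salt).encode("utf-8")
    return hmac.new(SERVER_KEY1, message, hashlib.sha1).hexdigest()


def create_account(db, username, password):
    """Store (salt, hash) for the user; the password itself is never stored."""
    salt = new_salt()
    db[username] = (salt, password_hash(password, salt))


def check_login(db, username, password):
    """Hash the purported password and compare it with the stored hash."""
    record = db.get(username)
    if record is None:
        return False
    salt, stored = record
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(stored, password_hash(password, salt))
```

For example, after create_account(db, "alice", "correct horse"), a call to check_login(db, "alice", "correct horse") succeeds while any other password fails, and db holds only the salt and the hash.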

By default, don't save the passwords themselves on the client's web browser using cookies - users may sometimes use shared clients (say at some coffee shop). If you want, you can give users the option of ``saving the password'' on their browser, but if you do, make sure that the password is set to only be transmitted on ``secure'' connections, and make sure the user has to specifically request it (don't do this by default).

Make sure that the page is marked to not be cached, or a proxy server might re-serve that page to other users.
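One way to mark a page as uncacheable is to send the standard HTTP caching headers with it. The sketch below simply merges such headers into a response's header dictionary; the helper name add_no_cache_headers and the dictionary-based response representation are assumptions made for illustration, not part of any particular web framework.

```python
# Standard HTTP headers asking browsers and proxies not to store the page.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate, private",
    "Pragma": "no-cache",  # for older HTTP/1.0 caches
    "Expires": "0",        # an invalid date is treated as already expired
}


def add_no_cache_headers(headers):
    """Return a copy of the response headers with caching disabled."""
    merged = dict(headers)
    merged.update(NO_CACHE_HEADERS)
    return merged


print(add_no_cache_headers({"Content-Type": "text/html"}))
```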

Once a user successfully logs in, the server needs to send the client an ``authentication token'' in a cookie, which is described next.

10.2.2. Authenticating on the Web: Subsequent Actions

Once a user logs in, the server sends back to the client a cookie with an authentication token. A suggested token would look like this:

Here t is the expiration time of the token (say, several hours from now), data identifies the user or session (say, the user name or session id), and digest is a keyed digest. Feel free to change the field name of ``data'' to be more descriptive (e.g., username or sessionid). The keyed digest should be a cryptographic hash of the expiration time and the data concatenated. If you have more than one field of data (e.g., both a username and a sessionid), make sure the digest uses both the field names and data values of all fields you're authenticating; concatenate them with a pattern (say ``%%'', ``+'', or ``&'') that can't occur in any of the field data values. The keyed digest should use HMAC-MD5 or HMAC-SHA1, using a different server key (key2). If this key2 is compromised, anyone can authenticate to the server, but it's easy to change key2 - when you do, it'll simply force currently ``logged in'' users to re-authenticate. See Fu [2001] for more details.
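A minimal sketch of building such a token follows. The exact layout used here (fields named exp, data and digest, written as name=value pairs joined with ``&'') is an assumption made for illustration, as are the names SERVER_KEY2, keyed_digest and make_token and the placeholder key value. The digest is HMAC-SHA1 over both the field names and the field values, and values containing the separator are rejected so the concatenation stays unambiguous.

```python
import hashlib
import hmac
import time

# Assumption: a placeholder for the server's second secret key (key2).
SERVER_KEY2 = b"example-server-key2"
SEP = "&"  # separator that must not occur in any field value


def keyed_digest(fields):
    """HMAC-SHA1 over the names and values of all authenticated fields."""
    for name, value in fields.items():
        if SEP in name or "=" in name or SEP in value:
            raise ValueError("field contains a reserved character")
    message = SEP.join(f"{name}={value}" for name, value in fields.items())
    return hmac.new(SERVER_KEY2, message.encode("utf-8"), hashlib.sha1).hexdigest()


def make_token(data, lifetime_seconds=4 * 3600, now=None):
    """Build a token that expires lifetime_seconds after now."""
    now = int(time.time()) if now is None else now
    fields = {"exp": str(now + lifetime_seconds), "data": data}
    body = SEP.join(f"{name}={value}" for name, value in fields.items())
    return body + SEP + "digest=" + keyed_digest(fields)


print(make_token("alice"))
```

Changing SERVER_KEY2 changes every digest, which is what forces all currently logged-in users to re-authenticate.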

From then on, the server should check the expiration time and the digest of this authentication token, and only serve the data if both checks pass. If there's no token, it could reply with the user login page (with a hidden form field to show where the successful login should go).
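The server-side check could look roughly like this. The sketch assumes a token laid out as name=value pairs joined with ``&'', with fields exp and data followed by a final digest field holding HMAC-SHA1 (keyed with key2) of everything before it; that layout, the name verify_token, and the placeholder key are assumptions for illustration. The digest is checked before any field is trusted, and then the expiration time is checked.

```python
import hashlib
import hmac
import time

# Assumption: a placeholder for the server's second secret key (key2).
SERVER_KEY2 = b"example-server-key2"


def verify_token(token, now=None):
    """Return the token's data field if it is authentic and unexpired, else None."""
    now = int(time.time()) if now is None else now
    body, sep, digest = token.rpartition("&digest=")
    if not sep:
        return None  # no digest field at all
    expected = hmac.new(SERVER_KEY2, body.encode("utf-8"), hashlib.sha1).hexdigest()
    # Compare as bytes in constant time; reject before parsing any field.
    if not hmac.compare_digest(expected.encode("ascii"), digest.encode("utf-8")):
        return None
    fields = dict(part.partition("=")[::2] for part in body.split("&"))
    try:
        expires = int(fields["exp"])
    except (KeyError, ValueError):
        return None
    if now >= expires:
        return None  # token has expired
    return fields.get("data")
```

A caller that gets None back would respond with the login page rather than the requested data.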

If you include a sessionid in the authentication token, you can limit access further. Your server could ``track'' what pages a user has seen in a given session, and only permit access to other appropriate pages (e.g., only those directly linked from those page(s)). For example, if a user is granted access to page foo.html, and page foo.html has pointers to resources bar1.jpg and bar2.png, then accesses to bar4.cgi can be rejected. You could even kill the session, though only do this if the authentication information is valid (otherwise, this would make it possible for attackers to cause denial-of-service attacks on other users). This would somewhat limit the access an attacker has, even if they successfully hijack a session, though clearly an attacker with time and an authentication token could ``walk'' the links just as a normal user would.
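As a rough sketch of this kind of session tracking, the class below records which resources were linked from pages already served in a session and rejects requests for anything else. The class name SessionTracker, the notion of ``entry pages'' that are always reachable, and the simple regular expression used to find href and src attributes are all assumptions for illustration; a real server would use a proper HTML parser and resolve relative URLs.

```python
import re

# Simplistic link extraction, assumed adequate only for this illustration.
LINK_PATTERN = re.compile(r'(?:href|src)="([^"]+)"')


class SessionTracker:
    """Track, per session id, which resources the user may request next."""

    def __init__(self, entry_pages):
        self.entry_pages = set(entry_pages)  # reachable without a prior link
        self.allowed = {}                    # session id -> set of resources

    def record_page(self, session_id, html):
        """Remember every resource linked from a page served in this session."""
        links = set(LINK_PATTERN.findall(html))
        self.allowed.setdefault(session_id, set()).update(links)

    def may_access(self, session_id, resource):
        """Permit entry pages and anything linked from pages already seen."""
        if resource in self.entry_pages:
            return True
        return resource in self.allowed.get(session_id, set())


tracker = SessionTracker(entry_pages=["foo.html"])
tracker.record_page("session-1", '<img src="bar1.jpg"><a href="bar2.png">x</a>')
print(tracker.may_access("session-1", "bar1.jpg"))
print(tracker.may_access("session-1", "bar4.cgi"))
```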

One decision is whether or not to require the authentication token and/or data to be sent over a secure connection (e.g., SSL). If you send an authentication token in the clear (non-secure), someone who intercepts the token could do whatever the user could do until the expiration time. Also, when you send data over an unencrypted link, there's the risk of unnoticed change by an attacker; if you're worried that someone might change the data on the way, then you need to authenticate the data being transmitted. Encryption by itself doesn't guarantee authentication, but it does make corruption more likely to be detected, and typical libraries can support both encryption and authentication in a TLS/SSL connection. In general, if you're encrypting a message, you should also authenticate it. If your needs vary, one alternative is to create two authentication tokens - one is used only in a ``secure'' connection for important operations, while the other is used for less-critical operations. Make sure the token used for ``secure'' connections is marked so that only secure connections (typically encrypted SSL/TLS connections) are used. If users aren't really different, the authentication token could omit the ``data'' entirely.
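Marking a cookie so that it is only sent over secure connections is done with the cookie's Secure attribute. Here is a small sketch using Python's standard http.cookies module; the cookie name ``auth'', the function name, and the sample token value are hypothetical.

```python
from http.cookies import SimpleCookie


def secure_auth_cookie_header(token, cookie_name="auth"):
    """Return a Set-Cookie header value that browsers send only over https."""
    cookie = SimpleCookie()
    cookie[cookie_name] = token
    cookie[cookie_name]["secure"] = True  # never sent over plain http
    cookie[cookie_name]["path"] = "/"
    return cookie[cookie_name].OutputString()


print("Set-Cookie: " + secure_auth_cookie_header("abc123"))
```

A second, less-critical token would be issued the same way under a different cookie name, without the Secure attribute.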

Again, make sure that the pages with this authentication token aren't cached. There are other reasonable schemes also; the goal of this text is to provide at least one secure solution. Many variations are possible.

10.2.3. Authenticating on the Web: Logging Out

You should always provide users with a mechanism to ``log out'' - this is especially helpful for customers using shared browsers (say at a library). Your ``logout'' routine's task is simple - just unset the client's authentication token.
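With a cookie-based token, ``unsetting'' it usually means sending the browser a replacement cookie of the same name and path that is empty and already expired. A minimal sketch with Python's standard http.cookies module follows; the cookie name ``auth'' and the function name are hypothetical, and the path must match whatever path was used when the token was originally set.

```python
from http.cookies import SimpleCookie


def logout_cookie_header(cookie_name="auth"):
    """Return a Set-Cookie header value that makes the browser drop the token."""
    cookie = SimpleCookie()
    cookie[cookie_name] = ""           # overwrite the token with nothing
    cookie[cookie_name]["path"] = "/"  # must match the original cookie's path
    cookie[cookie_name]["max-age"] = 0
    cookie[cookie_name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    return cookie[cookie_name].OutputString()


print("Set-Cookie: " + logout_cookie_header())
```

Note that this only removes the token from a cooperating browser; a copied token remains valid until its expiration time unless the server also tracks and invalidates sessions.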