In "ordinary" proxying, the client specifies the hostname and port number of a proxy in their web browsing software. The browser then makes requests to the proxy, and the proxy forwards them to the origin servers. This is all fine and good, but sometimes one of several situations arises. Either
This is where transparent proxying comes in. A web request can be intercepted by the proxy, transparently. That is, as far as the client software knows, it is talking to the origin server itself, when it is really talking to the proxy server. (Note that the transparency applies only to the client; the server knows that a proxy is involved, and will see the IP address of the proxy, not the IP address of the user. However, squid may pass an X-Forwarded-For header, so that the server can determine the original user's IP address if it groks that header.)
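To illustrate the X-Forwarded-For idea, here is a minimal Python sketch of how a proxy might record the client's address before forwarding a request. This is not squid's code; the function name add_forwarded_for and the plain-dictionary representation of headers are assumptions made for the example, and header names are treated as case-sensitive for simplicity.

```python
def add_forwarded_for(headers, client_ip):
    """Return a copy of headers with client_ip appended to X-Forwarded-For.

    Hypothetical helper for illustration only. Assumes headers is a dict
    whose keys use exactly the spelling "X-Forwarded-For".
    """
    updated = dict(headers)
    existing = updated.get("X-Forwarded-For")
    if existing:
        # An earlier proxy already added an entry: append ours to the list.
        updated["X-Forwarded-For"] = existing + ", " + client_ip
    else:
        updated["X-Forwarded-For"] = client_ip
    return updated


if __name__ == "__main__":
    request_headers = {"Host": "www.example.com"}
    print(add_forwarded_for(request_headers, "192.168.1.50"))
```

An origin server that understands the header can read the user's address from it, while a server that ignores it still sees only the proxy's IP address.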
Cisco routers support transparent proxying. So do many switches. But (surprisingly enough) Linux can also act as a router, and can perform transparent proxying by redirecting TCP connections to local ports. However, we also need to make our web proxy aware of the effect of the redirection, so that it can make connections to the proper origin servers. There are two general ways this works:
The first is when your web proxy is not aware of transparent proxying. You can use a nifty little daemon called transproxy that sits in front of your web proxy and takes care of all the messy details for you. transproxy was written by John Saunders, and is available from
ftp://ftp.nlc.net.au/pub/linux/www/ or your local metalab mirror. transproxy will not be discussed further in this document.
A cleaner solution is to get a web proxy that is aware of transparent proxying itself. The one we are going to focus on here is squid. Squid is an Open Source caching proxy server for Unix systems. It is available from www.squid-cache.org.
Alternatively, instead of redirecting the connections to local ports, we could redirect the connections to remote ports. This is discussed in the Transparent Proxy to a Remote Box section. Readers interested in this approach should skip down to that section. Readers interested in doing everything on one box can safely ignore that section.
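To make the point about proxy awareness more concrete: a browser that has not been configured to use a proxy sends only a path in its request line (for example "GET /index.html HTTP/1.1") and names the server in the Host header. A proxy that receives such a redirected request therefore has to work out the origin server for itself. The Python sketch below shows one way to do that from the Host header. It is an illustration only, not how squid or transproxy are actually implemented, and the function name origin_from_request is hypothetical.

```python
def origin_from_request(raw_request, default_port=80):
    """Return (host, port, path) for an intercepted HTTP request.

    Hypothetical helper for illustration only. raw_request is the bytes
    received from the client, up to and including the blank line that
    ends the headers.
    """
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # Request line: method, request target (a path here), HTTP version.
    _method, target, _version = lines[0].split(" ", 2)

    host_value = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            host_value = value.strip()
            break
    if host_value is None:
        raise ValueError("no Host header; origin server cannot be determined")

    # The Host header may carry an explicit port, as in "example.com:8080".
    host, sep, port = host_value.rpartition(":")
    if sep and port.isdigit():
        return host, int(port), target
    return host_value, default_port, target


if __name__ == "__main__":
    request = (b"GET /index.html HTTP/1.1\r\n"
               b"Host: www.example.com\r\n"
               b"\r\n")
    print(origin_from_request(request))
```

A request without a Host header cannot be resolved this way, which is why the sketch raises an error in that case rather than guessing.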
This document will focus on squid version 2.4 and Linux kernel version 2.4, the most current stable releases as of this writing (December 2001). It should also work with most of the later 2.3 kernels. If you need information about earlier releases of squid or Linux, you can find some earlier documents at www.unxsoft.com/transproxy.html.
If you are using a development kernel or a development version of squid, you are on your own. This document may help you, but YMMV.
Note that this document focuses only on HTTP proxying. I get many emails asking about transparent FTP proxying. Squid can't do it. Now, allegedly a program called Frox can. I have not tried this myself, so I cannot say how well it works. You can find it at http://www.hollo32.fsnet.co.uk/frox/.
I only focus on squid here, but Apache can also function as a caching proxy server. (If you are not sure which to use, I recommend squid, since it was built from the ground up to be a caching proxy server; Apache's caching proxy features are more of an afterthought added to an already existing system.) If you want to use Apache instead of squid, follow all the instructions in this document that pertain to the kernel and iptables rules. Ignore the squid-specific sections, and instead look at http://lupo.campus.uniroma2.it/progetti/mod_tproxy/ for source code and instructions for a transparent proxy module for Apache (thanks to Cristiano Paris (firstname.lastname@example.org) for contributing this).
Finally, as far as transparently proxying HTTPS (i.e. secure web pages using SSL) goes, you can't do it. Don't even ask. For the explanation, do a search for 'man-in-the-middle attack'. Note that you probably don't really need to transparently proxy HTTPS anyway, since squid does not cache secure pages.