Linux WWW HOWTO : Apache

7. Apache

The current version of Apache is 1.3.9. The main Apache site is at http://www.apache.org/. Another good source of information is Apacheweek at http://www.apacheweek.com/. The Apache documentation is OK, so I'm not going to go into detail on setting up Apache. The documentation is on the website and is included with the source (in HTML format). There are also text files included with the source, but the HTML version is better. The documentation should get a whole lot better once the Apache Documentation Project gets under way. Right now most of the documents are written by the developers. Not to discredit the developers, but their documents are a little hard to understand if you don't know the terminology.

7.1 Where to get

Apache is included in the Red Hat, Slackware, and OpenLinux distributions. Although they may not be the latest version, they are very reliable binaries. The bad news is you will have to live with their directory choices (which are totally different from each other and the Apache defaults).

The source is available from the Apache web site at http://www.apache.org/dist/. Binaries are also available from Apache at the same place. You can also get binaries from sunsite at ftp://sunsite.unc.edu/pub/Linux/apps/www/servers/. And for those of us running Red Hat, the latest binary RPM file can usually be found in the contrib directory at ftp://ftp.redhat.com/pub/contrib/i386/.

If your server is going to be used for commercial purposes, it is highly recommended that you get the source from the Apache website and compile it yourself. The other option is to use a binary that comes with a major distribution, for example Slackware, Red Hat, or OpenLinux. The main reason for this is security. An unknown binary could have a back door for hackers, or an unstable patch that could crash your system. Compiling it yourself also gives you more control over what modules are compiled in, and allows you to set the default directories. It's not that difficult to compile Apache, and besides, you're not a real Linux user until you compile your own programs ;)

7.2 Compiling and Installing

First untar the archive to a temporary directory. Next change to the src directory. Then edit the Configuration file if you want to include any special modules. The most commonly used modules are already included. There is no need to change the rules or makefile stuff for Linux. Next run the Configure shell script (./Configure). Make sure it says Linux platform and gcc as the compiler.

Next you may want to edit the httpd.h file to change the default directories. The server home (where the config files are kept; Apache's ServerRoot) default is /usr/local/etc/httpd/, but you may want to change it to just /etc/httpd/. The document root (where the HTML pages are served from; Apache's DocumentRoot) default is /usr/local/etc/httpd/htdocs/, but I like the directory /home/httpd/html (the Red Hat default for Apache). If you are going to be using su-exec (see special features below) you may want to change that directory too. The document root can also be changed from the config files, but it is good to compile it in as well, just in case Apache can't find or read the config file. Everything else should be changed from the config files. Finally run make to compile Apache.

If you run into problems with include files missing, check the following things. Make sure you have the kernel headers (include files) installed for your kernel version. Also make sure you have these symbolic links in place:


/usr/include/linux should be a link to /usr/src/linux/include/linux
/usr/include/asm should be a link to /usr/src/linux/include/asm
/usr/src/linux should be a link to the Linux source directory (e.g. linux-2.0.30)

Links can be made with ln -s. It works just like the cp command except that it makes a link instead of a copy: the existing directory comes first and the name of the new link second (ln -s source-dir destination-link).
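
If you want to see the argument order before touching the real /usr/include, the three links can be rehearsed in a scratch directory. This is only a sketch: the scratch directory made by mktemp and the linux-2.0.30 name are stand-ins, not paths from your system.

```shell
# Rehearse the header symlinks inside a scratch directory (hypothetical layout).
root=$(mktemp -d)
mkdir -p "$root/usr/src/linux-2.0.30/include/linux" \
         "$root/usr/src/linux-2.0.30/include/asm" \
         "$root/usr/include"

# ln -s TARGET LINKNAME: the existing thing comes first, the new link second.
ln -s linux-2.0.30 "$root/usr/src/linux"
ln -s "$root/usr/src/linux/include/linux" "$root/usr/include/linux"
ln -s "$root/usr/src/linux/include/asm" "$root/usr/include/asm"

# List the links to confirm where each one points.
ls -l "$root/usr/src/linux" "$root/usr/include/linux" "$root/usr/include/asm"
```

On the real system the same commands are run as root without the scratch prefix.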

When make is finished there should be an executable named httpd in the directory. This needs to be moved into a bin directory. /usr/sbin or /usr/local/sbin would be good choices.

Copy the conf, logs, and icons sub-directories from the source to the server home directory. Next rename 3 of the files in the conf sub-directory to get rid of the -dist extension (e.g. httpd.conf-dist becomes httpd.conf).
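
The renaming can be done with a short loop instead of three separate mv commands. The sketch below works on empty stand-in files in a scratch directory so it is safe to try; CONF_DIR is a name I made up, and on a real install it would point at your conf sub-directory.

```shell
# Strip the -dist suffix from the sample config files.
# CONF_DIR is an assumed location; a scratch directory with empty
# stand-in files is used here so the loop can be tried safely.
CONF_DIR=$(mktemp -d)
touch "$CONF_DIR/httpd.conf-dist" "$CONF_DIR/srm.conf-dist" "$CONF_DIR/access.conf-dist"

for f in "$CONF_DIR"/*.conf-dist; do
    # ${f%-dist} removes the trailing "-dist" from the file name.
    mv "$f" "${f%-dist}"
done

ls "$CONF_DIR"
```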

There are also several support programs that are included with Apache. They are in the support directory and must be compiled and installed separately. Most of them can be made by using the makefile in that directory (which is made when you run the main Configure script). You don't need any of them to run Apache, but some of them make the administrator's job easier.

7.3 Configuring

Now you should have four files in your conf sub-directory (under your server home directory). The httpd.conf sets up the server daemon (port number, user, etc). The srm.conf sets the root document tree, special handlers, etc. The access.conf sets the base case for access. Finally mime.types tells the server what mime type to send to the browser for each extension.

The configuration files are pretty much self-documented (plenty of comments), as long as you understand the lingo. You should read through them thoroughly before putting your server to work. Each configuration item is covered in the Apache documentation.

The mime.types file is not really a configuration file. It is used by the server to translate file extensions into mime-types to send to the browser. Most of the common mime-types are already in the file. Most people should not need to edit this file. As time goes on, more mime types will be added to support new programs. The best thing to do is get a new mime-types file (and maybe a new version of the server) at that time.

Always remember that when you change the configuration files you need to restart Apache or send it the SIGHUP signal with kill for the changes to take effect. Make sure you send the signal to the parent process and not to any of the child processes. The parent usually has the lowest process id number. The process id of the parent is also in the httpd.pid file in the logs directory. If you accidentally send it to one of the child processes, the child will die and the parent will start a replacement.
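
Reading the process id from httpd.pid is the safest way to hit the parent. Here is a small sketch of that idea; the function name reload_apache and the default pid file path are my own assumptions, so adjust the path to wherever your logs directory lives.

```shell
# Send SIGHUP to the Apache parent process named in the pid file.
# The default path below is an assumption; pass your own as the first argument.
reload_apache() {
    pidfile=${1:-/etc/httpd/logs/httpd.pid}
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -HUP makes the parent re-read its configuration files.
    kill -HUP "$pid"
}
```

After editing a config file you would run reload_apache, or reload_apache /path/to/httpd.pid, as root.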

I will not be walking you through the steps of configuring Apache. Instead I will deal with specific issues, choices to be made, and special features.

I highly recommend that all users read through the security tips in the Apache documentation. They are also available from the Apache website at http://www.apache.org/docs/misc/security_tips.html.

7.4 Hosting virtual websites

Virtual Hosting is when one computer has more than one domain name. The old way was to have each virtual host have its own IP address. The new way uses only one IP address, but it doesn't work correctly with browsers that don't support HTTP 1.1.

My recommendation for businesses is to go with the IP based virtual hosting until most people have browsers that support HTTP 1.1 (give it a year or two). This also gives you a more complete illusion of virtual hosting. While both methods can give you virtual mail capabilities (can someone confirm this?), only IP based virtual hosting can also give you virtual FTP as well.

If it is for a club or personal page, you may want to consider shared IP virtual hosting. It should be cheaper than IP based hosting and you will be saving precious IP addresses.

You can also mix and match IP and shared IP virtual hosts on the same server. For more information on virtual hosting visit Apacheweek at http://www.apacheweek.com/features/vhost.

IP based virtual hosting

In this method each virtual host has its own IP address. By determining the IP address that the request was sent to, Apache and other programs can tell what domain to serve. This is an incredible waste of IP space. Take for example the servers where my virtual domain is kept. They have over 35,000 virtual accounts, which means 35,000 IP addresses. Yet I believe at last count they had fewer than 50 servers running.

Setting this up is a two part process. The first is getting Linux set up to accept more than one IP address. The second is setting up Apache to serve the virtual hosts.

The first step in setting up Linux to accept multiple IP addresses is to make a new kernel. This works best with a 2.0 series kernel (or higher). You need to include IP networking and IP aliasing support. If you need help with compiling the kernel see the Kernel HOWTO.

Next you need to set up each interface at boot. If you are using the Red Hat distribution then this can be done from the control panel. Start X-windows as root and you should see a control panel. Then double click on network configuration. Next go to the interfaces panel and select your network card. Then click alias at the bottom of the screen. Fill in the information and click done. This will need to be done for each virtual host/IP address.

If you are using other distributions you may have to do it manually. You can just put the commands in the rc.local file in /etc/rc.d (really they should go in with the networking stuff). You need to have an ifconfig and a route command for each device. The aliased addresses are given a sub device of the main one. For example eth0 would have aliases eth0:0, eth0:1, eth0:2, etc. Here is an example of configuring an aliased device:


ifconfig eth0:0 192.168.1.57
route add -host 192.168.1.57 dev eth0:0

You can also add a broadcast address and a netmask to the ifconfig command. If you have a lot of aliases you may want to make a for loop to make it easier. For more information see the IP Alias mini-HOWTO.
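
One way to write that for loop is sketched below. The function name alias_commands and the example addresses are placeholders of mine. The function only prints the ifconfig and route commands, so you can read them over before running anything as root.

```shell
# Print the ifconfig/route pair for each aliased address on one interface.
# Usage: alias_commands DEVICE IP [IP ...]
alias_commands() {
    dev=$1
    shift
    n=0
    for ip in "$@"; do
        echo "ifconfig $dev:$n $ip"
        echo "route add -host $ip dev $dev:$n"
        n=$((n + 1))
    done
}

# Example addresses are placeholders; review the output before using it.
alias_commands eth0 192.168.1.57 192.168.1.58 192.168.1.59
```

Once the output looks right, pipe it to sh as root or paste it into rc.local.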

Then you need to set up your domain name server (DNS) to serve these new domains. And if you don't already own the domain names, you need to contact the Internic to register them. See the DNS HOWTO for information on setting up your DNS.

Finally you need to set up Apache to serve the virtual domain correctly. This is done in the httpd.conf configuration file, near the end. They give you an example to go by. All commands specific to that virtual host are put in between the VirtualHost directive tags. You can put almost any command in there. Usually you set up a different document root, script directory, and log files. You can have an almost unlimited number of virtual hosts by adding more VirtualHost directive tags.
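
To give an idea of what goes between the tags, here is a sketch that prints one VirtualHost section. The function name vhost_block and the htdocs/cgi-bin/logs layout under the site directory are conventions I made up for the example; the directives themselves (ServerName, DocumentRoot, ScriptAlias, ErrorLog, TransferLog) are standard Apache ones.

```shell
# Print a <VirtualHost> section for one IP-based virtual host.
# Usage: vhost_block IP SERVER_NAME SITE_DIR
# The htdocs/cgi-bin/logs layout under SITE_DIR is a hypothetical convention.
vhost_block() {
    cat <<EOF
<VirtualHost $1>
ServerName $2
DocumentRoot $3/htdocs
ScriptAlias /cgi-bin/ $3/cgi-bin/
ErrorLog $3/logs/error_log
TransferLog $3/logs/access_log
</VirtualHost>
EOF
}

vhost_block 192.168.1.57 www.mysite.com /home/httpd/mysite
```

The output can be appended to httpd.conf with >> and Apache restarted afterwards.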

In rare cases you may need to run separate servers if a directive is needed for a virtual host, but is not allowed in the VirtualHost tags. This is done using the BindAddress directive. Each server will have a different name and setup files. Each server only responds to one IP address, specified by the BindAddress directive. This is an incredible waste of system resources.

Shared IP virtual hosting

This is a new way to do virtual hosting. It uses a single IP address, thus conserving IP addresses for real machines (not virtual ones). In the same example used above those 35,000 virtual hosts would only take 50 IP addresses (one for each machine). This is done by using the new HTTP 1.1 protocol. The browser tells the server which site it wants in the Host: header of the request. The problem is that browsers that don't support HTTP 1.1 will get the server's main page, which could be set up to provide a menu of the virtual hosts available. That ruins the whole illusion of virtual hosting: the illusion that you have your own server.

The setup is much simpler than IP based virtual hosting. You still need to get your domain from the Internic and set up your DNS. This time the DNS points to the same IP address as the original domain. Then Apache is set up the same as before, with one addition: Apache 1.3 needs a NameVirtualHost directive naming the shared IP address. The VirtualHost tags that use that address are then matched by name instead of by address.

There are several workarounds for older browsers. I'll explain the best one. First you need to make your main pages a virtual host (either IP based or shared IP). This frees up the main page for a link list to all your virtual hosts. Next you need to make a back door for the old browsers to get in. This is done using the ServerPath directive for each virtual host inside the VirtualHost directive. For example by adding ServerPath /mysite/ to www.mysite.com old browsers would be able to access the site by www.mysite.com/mysite/. Then you put a default page on the main server that politely tells them to get a new browser, and lists links to all the back doors of all the sites you host on that machine. When an old browser accesses the site it will be sent to the main page, and get a link to the correct page. New browsers will never see the main page and will go directly to the virtual hosts. You must remember to keep all of your links relative within the web sites, because the pages will be accessed from two different URLs (www.mysite.com and www.mysite.com/mysite/).

I hope I didn't lose you there, but it's not an easy workaround. Maybe you should consider IP based hosting after all. A very similar workaround is also explained on the Apache website at http://www.apache.org/manual/host.html.

If anyone has a great resource for Shared IP hosting, I would like to know about it. It would be nice to know what percent of browsers out there support HTTP 1.1, and to have a list of which browsers and versions support HTTP 1.1.

7.5 CGI scripts

There are two different ways to give your users CGI script capability. The first is to make everything ending in .cgi a CGI script. The second is to make script directories (usually named cgi-bin). You could also use both methods. For either method to work the scripts must be executable by the web server's user. Interpreted scripts (shell, Perl, etc.) must also be readable, so use chmod 755 for them; chmod 711 is enough only for compiled programs. By giving your users script access you are creating a big security risk. Be sure to do your homework to minimize the security risk.
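
For readers who have never seen one, a CGI script is just a program that prints a header, an empty line, and then the page. Below is a minimal sketch in shell. The file name hello.cgi and the scratch directory are made up for the example, and the script is run by hand rather than through Apache.

```shell
# Create a tiny CGI script in a scratch directory and run it by hand.
dir=$(mktemp -d)
cat > "$dir/hello.cgi" <<'EOF'
#!/bin/sh
# A CGI response is: header lines, one empty line, then the body.
echo "Content-type: text/html"
echo ""
echo "<html><body><p>Hello from CGI</p></body></html>"
EOF

# Interpreted scripts need read and execute permission for the server's user.
chmod 755 "$dir/hello.cgi"

# Running it from the shell shows the response the script hands to the server.
sh "$dir/hello.cgi"
```

Testing a script from the command line like this is a quick way to catch a missing header before you blame the server configuration.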

I prefer the first method, especially for complex scripting. It allows you to put scripts in any directory. I like to put my scripts with the web pages they work with. For sites with a lot of scripts it looks much better than having a directory full of scripts. This is simple to set up. First uncomment the .cgi handler at the end of the srm.conf file. Then make sure all your directories have Options ExecCGI or Options All in the access.conf file.

Making script directories is considered more secure. To make a script directory you use the ScriptAlias directive in the srm.conf file. The first argument is the alias and the second is the actual directory. For example ScriptAlias /cgi-bin/ /usr/httpd/cgi-bin/ would make /usr/httpd/cgi-bin able to execute scripts. That directory would be used whenever someone asked for the directory /cgi-bin/. For security reasons you should also change the properties of the directory to Options None and AllowOverride None in the access.conf (just uncomment the example that is there). Also do not make your script directories subdirectories of your web page directories. For example if you are serving pages from /home/httpd/html/, don't make the script directory /home/httpd/html/cgi-bin; instead make it /home/httpd/cgi-bin.

If you want your users to have their own script directories you can use multiple ScriptAlias commands. Virtual hosts should have their ScriptAlias command inside the VirtualHost directive tags. Does anyone know a simple way to allow all users to have a cgi-bin directory without individual ScriptAlias commands?

7.6 Users Web Directories

There are two different ways to handle user web directories. The first is to have a subdirectory under the user's home directory (usually public_html). The second is to have an entirely different directory tree for web directories. With both methods make sure to set the access options for these directories in the access.conf file.

The first method is already set up in Apache by default. Whenever a request for /~bob/ comes in it looks for the public_html directory in bob's home directory. You can change the directory with the UserDir directive in the srm.conf file. This directory must be world readable and executable. This method creates a security risk because for Apache to access the directory the user's home directory must be world executable.

The second method is easy to set up. You just need to change the UserDir directive in the srm.conf file. It has many different formats; you may want to consult the Apache documentation for clarification. If you want each user to have their own directory under /home/httpd/, you would use UserDir /home/httpd. Then when the request /~bob/ comes in it would translate to /home/httpd/bob/. Or if you want to have a subdirectory under bob's directory you would use UserDir /home/httpd/*/html. This would translate to /home/httpd/bob/html/ and would allow you to have a script directory too (for example /home/httpd/bob/cgi-bin/).

7.7 Daemon mode vs. Inetd mode

There are two ways that Apache can be run. One is as a daemon that is always running (Apache calls this standalone). The second is from the inetd super-server.

Daemon mode is far superior to inetd mode. Apache is set up for daemon mode by default. The only reason to use inetd mode is for very low use applications, such as internal testing of scripts or a small company Intranet. Inetd mode will save memory because Apache will be loaded only as needed. Only the inetd daemon will remain in memory.

If you don't use Apache that often you may just want to keep it in daemon mode and start it when you need it. Then you can kill it when you are done (be sure to kill the parent and not one of the child processes).

To set up inetd mode you need to edit a few files. First look in /etc/services to see if http is already in there. If it's not then add it:


http    80/tcp

Right after 79 (finger) would be a good place. Then you need to edit the /etc/inetd.conf file and add the line for Apache:


http    stream  tcp     nowait  root    /usr/sbin/httpd httpd

Be sure to change the path if you have Apache in a different location. And the second httpd is not a typo; the inet daemon requires that. If you are not currently using the inet daemon, you may want to comment out the rest of the lines in the file so you don't activate other services as well (FTP, finger, telnet, and many other things are usually run from this daemon).

If you are already running the inet daemon (inetd), then you only need to send it the SIGHUP signal (via kill; see kill's man page for more info) or reboot the computer for changes to take effect. If you are not running inetd then you can start it manually. You should also add it to your init files so it is loaded at boot (the rc.local file may be a good choice).
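
The check for an existing entry in /etc/services can be scripted before you edit the file. This is a rough sketch and the function name has_http_service is my own. It looks for any uncommented line that assigns port 80/tcp, whatever the service happens to be called.

```shell
# Report whether a services file already defines a TCP service on port 80.
# Usage: has_http_service [FILE]   (FILE defaults to /etc/services)
has_http_service() {
    grep -q '^[^#]*[[:space:]]80/tcp' "${1:-/etc/services}" 2>/dev/null
}

if has_http_service; then
    echo "port 80/tcp is already listed in /etc/services"
else
    echo "add a line such as: http    80/tcp"
fi
```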

7.8 Allowing put and delete commands

The newer web publishing tools support this new method of uploading web pages by HTTP (instead of FTP). Some of these products don't even support FTP anymore! Apache does support this, but it is lacking a script to handle the requests. This script could be a big security hole, so be sure you know what you are doing before attempting to write or install one.

If anyone knows of a script that works let me know and I'll include the address to it here.

For more information go to Apacheweek's article at http://www.apacheweek.com/features/put.

7.9 User Authentication/Access Control

This is one of my favorite features. It allows you to password protect a directory or a file without using CGI scripts. It also allows you to deny or grant access based on the IP address or domain name of the client. That is a great feature for keeping jerks out of your message boards and guest books (you get the IP or domain name from the log files).

To allow user authentication the directory must have AllowOverride AuthConfig set in the access.conf file. To allow access control (by domain or IP address) AllowOverride Limit must be set for that directory.

Setting up the directory involves putting an .htaccess file in the directory. For user authentication it is usually used with an .htpasswd and optionally a .htgroup file. Those files can be shared among multiple .htaccess files if you wish.
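
As a rough illustration of what such an .htaccess file looks like, the sketch below writes a minimal one for password protection. The function name write_htaccess, the realm name, and both paths are placeholders of mine; AuthType, AuthName, AuthUserFile, and require are the standard Apache directives.

```shell
# Write a minimal .htaccess that asks for a user name and password.
# Usage: write_htaccess DIRECTORY PASSWORD_FILE
# The realm name "Members Only" is an arbitrary example.
write_htaccess() {
    cat > "$1/.htaccess" <<EOF
AuthType Basic
AuthName "Members Only"
AuthUserFile $2
require valid-user
EOF
}

# Demonstration in a scratch directory; both paths are placeholders.
site=$(mktemp -d)
write_htaccess "$site" /home/httpd/passwords/.htpasswd
cat "$site/.htaccess"
```

The password file itself is created with the htpasswd support program (htpasswd -c file username). Keep it outside the directories you serve pages from.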

For security reasons I recommend that everyone use these directives in their access.conf file:


# Apache 1.3 matches Files against the file name only, so anchor at its start
<Files ~ "^\.ht">
order deny,allow
deny from all
</Files>

If you are not the administrator of the system you can also put it in your .htaccess file if AllowOverride Limit is set for your directory. This directive will prevent people from looking into your access control files (.htaccess, .htpasswd, etc).

There are many different options and file types that can be used with access control. Therefore it is beyond the scope of this document to describe the files. For information on how to setup User Authentication see the Apacheweek feature at http://www.apacheweek.com/features/userauth or the NCSA pages at http://hoohoo.ncsa.uiuc.edu/docs-1.5/tutorials/user.html.

7.10 su-exec

The su-exec feature runs CGI scripts as the user who owns them. Normally they are run as the user of the web server (usually nobody). This allows users to access their own files in CGI scripts without making them world writable (a security hole). But if you are not careful you can create a bigger security hole by using the su-exec code. The su-exec code does security checks before executing the scripts, but if you set it up wrong you will have a security hole.

The su-exec code is not for amateurs. Don't use it if you don't know what you are doing. You could end up with a gaping security hole where your users can gain root access to your system. Do not modify the code for any reason. Be sure to read all the documentation carefully. The su-exec code is hard to set up on purpose, to keep the amateurs out (everything must be done manually; there is no makefile and there are no install scripts).

The su-exec code resides in the support directory of the source. First you need to edit the suexec.h file for your system. Then you need to compile the su-exec code with this command:


gcc suexec.c -o suexec

Then copy the suexec executable to the proper directory. The Apache default is /usr/local/etc/httpd/sbin/. This can be changed by editing httpd.h in the Apache source and recompiling Apache. Apache will only look in this directory; it will not search the path. Next the file's owner needs to be changed to root (chown root suexec) and the suid bit needs to be set (chmod 4711 suexec). Finally restart Apache; it should display a message on the console that su-exec is being used.
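
Since a wrong owner or a missing suid bit is easy to overlook, here is a small sketch that checks both. The function name check_suexec is my own invention and it only looks at the two settings mentioned above; it is no substitute for the checks in the su-exec documentation.

```shell
# Check that a suexec binary is owned by root and has the setuid bit set.
# Usage: check_suexec PATH
check_suexec() {
    if [ ! -f "$1" ]; then
        echo "$1: not found"
        return 1
    fi
    # Field 3 of "ls -ln" is the numeric owner id; root is 0.
    owner=$(ls -ln "$1" | awk '{print $3}')
    if [ "$owner" != "0" ]; then
        echo "$1: not owned by root (chown root)"
        return 1
    fi
    if [ ! -u "$1" ]; then
        echo "$1: setuid bit missing (chmod 4711)"
        return 1
    fi
    echo "$1: ok"
}
```

With the default location you would call it as check_suexec /usr/local/etc/httpd/sbin/suexec.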

CGI scripts should be set world executable like normal. They will automatically be run as the owner of the CGI script. If you set the SUID (set user id) bit on the CGI scripts they will not run. If the directory or file is world or group writable the script will not run. Scripts owned by system users will not be run (root, bin, etc.). For other security conditions that must be met see the su-exec documentation. If you are having problems see the su-exec log file named cgi.log.

Su-exec does not work if you are running Apache from inetd; it only works in daemon mode. It will be fixed in the next version because there will be no inetd mode. If you like playing around in source code, you can edit http_main.c. You want to get rid of the line where Apache announces that it is using the su-exec wrapper (it wrongly prints this in front of the output of everything).

Be sure to read the Apache documentation on su-exec. It is included with the source and is available on the Apache web site at http://www.apache.org/docs/suexec.html.

7.11 Imagemaps

Apache has the ability to handle server side imagemaps. Imagemaps are images on webpages that take users to different locations depending on where they click. To enable imagemaps first make sure the imagemap module is installed (it's one of the default modules). Next you need to uncomment the .map handler at the end of the srm.conf file. Now all files ending in .map will be imagemap files. Imagemap files map different areas on the image to separate links. Apache uses map files in the standard NCSA format. Here is an example of using a map file in a web page:


<a href="/map/mapfile.map">
<img src="picture.gif" ISMAP>
</a>

In this example mapfile.map is the mapfile, and picture.gif is the image to click on.
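
For completeness, here is a sketch of what the map file itself might contain. The URLs and coordinates are made up. Each line names a shape, the link it leads to, and the points that define the shape, and default covers clicks that land outside every shape.

```shell
# Write a small NCSA-format map file; URLs and coordinates are made up.
map=$(mktemp -d)/mapfile.map
cat > "$map" <<'EOF'
# Used when a click falls outside every shape below.
default /index.html
# rect: upper-left and lower-right corners.
rect /products/ 0,0 100,50
# circle: center point, then any point on the edge.
circle /contact.html 150,25 150,50
EOF

cat "$map"
```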

There are many programs that can generate NCSA compatible map files or you can create them yourself. For a more detailed discussion of imagemaps and map files see the Apacheweek feature at http://www.apacheweek.com/features/imagemaps.

7.12 SSI/XSSI

Server Side Includes (SSI) add dynamic content to otherwise static web pages. The includes are embedded in the web page as comments. The web server then parses these includes and passes the results to the browser. SSI can add headers and footers to documents, add the date the document was last updated, or execute a system command or a CGI script. With the new eXtended Server Side Includes (XSSI) you can do a whole lot more. XSSI adds variables and flow control statements (if, else, etc). It's almost like having a programming language to work with.

Parsing all HTML files for SSI commands would waste a lot of system resources. Therefore you need to distinguish normal HTML files from those that contain SSI commands. This is usually done by changing the extension of the SSI enhanced HTML files. Usually the .shtml extension is used.

To enable SSI/XSSI first make sure that the includes module is installed. Then edit srm.conf and uncomment the AddType and AddHandler directives for .shtml files. Finally you must set Options Includes for all directories where you want to run SSI/XSSI files. This is done in the access.conf file. Now all files with the extension .shtml will be parsed for SSI/XSSI commands.
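
To show what the commands look like inside a page, the sketch below writes a sample .shtml file with two SSI commands (echo and include) and one XSSI if/else test. The file name, footer.html, and the Lynx test are only examples of mine; see the documentation listed below for the full command list.

```shell
# Write a sample .shtml page that uses two SSI commands and one XSSI test.
# The file name and footer.html are hypothetical.
page=$(mktemp -d)/index.shtml
cat > "$page" <<'EOF'
<html>
<body>
<p>Last changed: <!--#echo var="LAST_MODIFIED" --></p>
<!--#if expr="$HTTP_USER_AGENT = /Lynx/" -->
<p>Text-only version.</p>
<!--#else -->
<p>Graphical version.</p>
<!--#endif -->
<!--#include virtual="footer.html" -->
</body>
</html>
EOF

# Each directive is an HTML comment that starts with "<!--#".
grep -c '<!--#' "$page"
```

A server without SSI enabled sends the page as is, and because the commands are comments the browser shows nothing in their place.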

Another way of enabling includes is to use the XBitHack directive. If you turn this on it looks to see if the file is executable by user. If it is and Options Includes is on for that directory, then it is treated as an SSI file. This only works for files with the mime type text/html (.html .htm files). This is not the preferred method.

There is a security risk in allowing SSI to execute system commands and CGI scripts. Therefore it is possible to lock that feature out with Options IncludesNOEXEC instead of Options Includes in the access.conf file. All the other SSI commands will still work.

For more information see the Apache mod_include documentation that comes with the source. It is also available on the website at http://www.apache.org/docs/mod/mod_include.html.

For a more detailed discussion of SSI/XSSI implementation see the Apacheweek feature at http://www.apacheweek.com/features/ssi.

For more information on SSI commands see the NCSA documentation at http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html.

For more information on XSSI commands go to ftp://pageplus.com/pub/hsf/xssi/xssi-1.1.html.

7.13 Module system

Apache can be extended to support almost anything with modules. There are a lot of modules already in existence. Only the general interest modules are included with Apache. For links to existing modules go to the Apache Module Registry at http://www.zyzzyva.com/module_registry/.

For module programming information go to http://www.zyzzyva.com/module_registry/reference/.
