The starting point in this will be to consider where you are and what you want to do. The typical home system starts out with existing hardware and the newly converted Linux user will want to get the most out of existing hardware. Someone setting up a new system for a specific purpose (such as an Internet provider) will instead have to consider what the goal is and buy accordingly. Being ambitious I will try to cover the entire range.
Various purposes will also have different requirements regarding file system placement on the drives, a large multiuser machine would probably be best off with the
/home directory on a separate disk, just to give an example.
In general, for performance it is advantageous to split most things over as many disks as possible but there is a limited number of devices that can live on a SCSI bus and cost is naturally also a factor. Equally important, file system maintenance becomes more complicated as the number of partitions and physical drives increases.
With the cheap hardware available today it is possible to have quite a big system at home that is still cheap, systems that rival major servers of yesteryear. While many started out with old, discarded disks to build a Linux server (which is how this HOWTO came into existence), many can now afford to buy 20 GB disks up front.
Size remains important for some, and here are a few guidelines:
Linux is simple and you don't even need a hard disk to try it out, if you can get the boot floppies to work you are likely to get it to work on your hardware. If the standard kernel does not work for you, do not forget that often there can be special boot disk versions available for unusual hardware combinations that can solve your initial problems until you can compile your own kernel.
about operating system is something Linux excels in, there is plenty of documentation and the source is available. A single drive with 50 MB is enough to get you started with a shell, a few of the most frequently used commands and utilities.
use or more serious learning requires more commands and utilities but a single drive is still all it takes, 500 MB should give you plenty of room, also for sources and documentation.
software development or just serious hobby work requires even more space. At this stage you have probably a mail and news feed that requires spool files and plenty of space. Separate drives for various tasks will begin to show a benefit. At this stage you have probably already gotten hold of a few drives too. Drive requirements gets harder to estimate but I would expect 2-4 GB to be plenty, even for a small server.
come in many flavours, ranging from mail servers to full sized ISP servers. A base of 2 GB for the main system should be sufficient, then add space and perhaps also drives for separate features you will offer. Cost is the main limiting factor here but be prepared to spend a bit if you wish to justify the "S" in ISP. Admittedly, not all do it.
Basically a server is dimensioned like any machine for serious use with added space for the services offered, and tends to be IO bound rather than CPU bound.
With cheap networking technology both for land lines as well as through radio nets, it is quite likely that very soon home users will have their own servers more or less permanently hooked onto the net.
Big tasks require big drives and a separate section here. If possible keep as much as possible on separate drives. Some of the appendices detail the setup of a small departmental server for 10-100 users. Here I will present a few consideration for the higher end servers. In general you should not be afraid of using RAID, not only because it is fast and safe but also because it can make growth a little less painful. All the notes below come as additions to the points mentioned earlier.
Popular servers rarely just happens, rather they grow over time and this demands both generous amounts of disk space as well as a good net connection. In many of these cases it might be a good idea to reserve entire SCSI drives, in singles or as arrays, for each task. This way you can move the data should the computer fail. Note that transferring drives across computers is not simple and might not always work, especially in the case of IDE drives. Drive arrays require careful setup in order to reconstruct the data correctly, so you might want to keep a paper copy of your
fstab file as well as a note of SCSI IDs.
Estimate how many drives you will need, if this is more than 2 I would recommend RAID, strongly. If not you should separate users across your drives dedicated to users based on some kind of simple hashing algorithm. For instance you could use the first 2 letters in the user name, so
jbloggs is put on
/u/j is a symbolic link to a physical drive so you can get a balanced load on your drives.
This is an essential service if you are serious about service. Good servers are well maintained, documented, kept up to date, and immensely popular no matter where in the world they are located. The big server ftp.funet.fi is an excellent example of this.
In general this is not a question of CPU but of network bandwidth. Size is hard to estimate, mainly it is a question of ambition and service attitudes. I believe the big archive at ftp.cdrom.com is a *BSD machine with 50 GB disk. Also memory is important for a dedicated FTP server, about 256 MB RAM would be sufficient for a very big server, whereas smaller servers can get the job done well with 64 MB RAM. Network connections would still be the most important factor.
For many this is the main reason to get onto the Internet, in fact many now seem to equate the two. In addition to being network intensive there is also a fair bit of drive activity related to this, mainly regarding the caches. Keeping the cache on a separate, fast drive would be beneficial. Even better would be installing a caching proxy server. This way you can reduce the cache size for each user and speed up the service while at the same time cut down on the bandwidth requirements.
With a caching proxy server you need a fast set of drives, RAID0 would be ideal as reliability is not important here. Higher capacity is better but about 2 GB should be sufficient for most. Remember to match the cache period to the capacity and demand. Too long periods would on the other hand be a disadvantage, if possible try to adjust based on the URL. For more information check up on the most used servers such as
Harvest, Squid and the one from Netscape.
Handling mail is something most machines do to some extent. The big mail servers, however, come into a class of their own. This is a demanding task and a big server can be slow even when connected to fast drives and a good net feed. In the Linux world the big server at
vger.rutgers.edu is a well known example. Unlike a news service which is distributed and which can partially reconstruct the spool using other machines as a feed, the mail servers are centralised. This makes safety much more important, so for a major server you should consider a RAID solution with emphasize on reliability. Size is hard to estimate, it all depends on how many lists you run as well as how many subscribers you have.
Big mail servers can be IO limited in performance and for this reason some use huge silicon disks connected to the SCSI bus to hold all mail related files including temporary files. For extra safety these are battery backed and filesystems like
udf are preferred since they always flush metadata to disk. This added cost to performance is offset by the very fast disk.
Note that these days more and more switch over from using
POP to pull mail to local machine from mail server and instead use
IMAP to serve mail while keeping the mail archive centralised. This means that mail is no longer spooled in its original sense but often builds up, requiring huge disk space. Also more and more (ab)use mail attachments to send all sorts of things across, even a small word processor document can easily end up over 1 MB. Size your disks generously and keep an eye on how much space is left.
This is definitely a high volume task, and very dependent on what news groups you subscribe to. On Nyx there is a fairly complete feed and the spool files consume about 17 GB. The biggest groups are no doubt in the
alt.binary.* hierarchy, so if you for some reason decide not to get these you can get a good service with perhaps 12 GB. Still others, that shall remain nameless, feel 2 GB is sufficient to claim ISP status. In this case news expires so fast I feel the spelling IsP is barely justified. A full newsfeed means a traffic of a few GB every day and this is an ever growing number.
There are many services available on the net and even though many have been put somewhat in the shadows by the web. Nevertheless, services like archie, gopher and wais just to name a few, still exist and remain valuable tools on the net. If you are serious about starting a major server you should also consider these services. Determining the required volumes is hard, it all depends on popularity and demand. Providing good service inevitably has its costs, disk space is just one of them.
Servers today require large numbers of large disks to function satisfactorily in commercial settings. As mean time between failure (MTBF) decreases rapidly as the number of components increase it is advisable to look into using RAID for protection and use a number of medium sized drives rather than one single huge disk. Also look into the High Availability (HA) project for more information. More information is available at
High Availability HOWTO and also at related web pages.
There is also an article in Byte called How Big Does Your Unix Server Have To Be? with many points that are relevant to Linux.
The dangers of splitting up everything into separate partitions are briefly mentioned in the section about volume management. Still, several people have asked me to emphasize this point more strongly: when one partition fills up it cannot grow any further, no matter if there is plenty of space in other partitions.
In particular look out for explosive growth in the news spool (
/var/spool/news). For multi user machines with quotas keep an eye on
/var/tmp as some people try to hide their files there, just look out for filenames ending in gif or jpeg...
In fact, for single physical drives this scheme offers very little gains at all, other than making file growth monitoring easier (using '
df') and physical track positioning. Most importantly there is no scope for parallel disk access. A freely available volume management system would solve this but this is still some time in the future. However, when more specialised file systems become available even a single disk could benefit from being divided into several partitions.
For more information see section Troubleshooting.