Linuxdoc Linux Questions
Click here to ask our community of linux experts!
Custom Search
Next Previous Contents

3. Fundamentals

3.1 What is the kernel?

The kernel is the "core" of any computer system: it is the "software" which allows users to share computer resources.

The kernel can be thought as the main software of the OS (Operating System), which may also include graphics management.

For example, under Linux (like other Unix-like OSs), the XWindow environment doesn't belong to the Linux Kernel, because it manages only graphical operations (it uses user mode I/O to access video card devices).

By contrast, Windows environments (Win9x, WinME, WinNT, Win2K, WinXP, and so on) are a mix between a graphical environment and kernel.

3.2 What is the difference between User Mode and Kernel Mode?

Overview

Many years ago, when computers were as big as a room, users ran their applications with much difficulty and, sometimes, their applications crashed the computer.

Operative modes

To avoid having applications that constantly crashed, newer OSs were designed with 2 different operative modes:

  1. Kernel Mode: the machine operates with critical data structure, direct hardware (IN/OUT or memory mapped), direct memory, IRQ, DMA, and so on.
  2. User Mode: users can run applications.
                      
               |          Applications           /|\
               |         ______________           |
               |         | User Mode  |           |  
               |         ______________           | 
               |               |                  |  
Implementation |        _______ _______           |   Abstraction
    Detail     |        | Kernel Mode |           |
               |        _______________           |
               |               |                  |
               |               |                  | 
               |               |                  |
              \|/          Hardware               |

Kernel Mode "prevents" User Mode applications from damaging the system or its features.

Modern microprocessors implement in hardware at least 2 different states. For example under Intel, 4 states determine the PL (Privilege Level). It is possible to use 0,1,2,3 states, with 0 used in Kernel Mode.

Unix OS requires only 2 privilege levels, and we will use such a paradigm as point of reference.

3.3 Switching from User Mode to Kernel Mode

When do we switch?

Once we understand that there are 2 different modes, we have to know when we switch from one to the other.

Typically, there are 2 points of switching:

  1. When calling a System Call: after calling a System Call, the task voluntary calls pieces of code living in Kernel Mode
  2. When an IRQ (or exception) comes: after the IRQ an IRQ handler (or exception handler) is called, then control returns back to the task that was interrupted like nothing was happened.

System Calls

System calls are like special functions that manage OS routines which live in Kernel Mode.

A system call can be called when we:

                                 |                |
                         ------->| System Call i  | (Accessing Devices)
|                |       |       |  [sys_read()]  |
| ...            |       |       |                |
| system_call(i) |--------       |                |
|   [read()]     |               |                |
| ...            |               |                |
| system_call(j) |--------       |                |  
|   [get_pid()]  |       |       |                |
| ...            |       ------->| System Call j  | (Accessing kernel data structures)
|                |               |  [sys_getpid()]|
                                 |                | 
 
    USER MODE                        KERNEL MODE
 
  
                        Unix System Calls Working 

System calls are almost the only interface used by User Mode to talk with low level resources (hardware). The only exception to this statement is when a process uses ''ioperm'' system call. In this case a device can be accessed directly by User Mode process (IRQs cannot be used).

NOTE: Not every ''C'' function is a system call, only some of them.

Below is a list of System Calls under Linux Kernel 2.4.17, from [ arch/i386/kernel/entry.S ]

        .long SYMBOL_NAME(sys_ni_syscall)       /* 0  -  old "setup()" system call*/
        .long SYMBOL_NAME(sys_exit)
        .long SYMBOL_NAME(sys_fork)
        .long SYMBOL_NAME(sys_read)
        .long SYMBOL_NAME(sys_write)
        .long SYMBOL_NAME(sys_open)             /* 5 */
        .long SYMBOL_NAME(sys_close)
        .long SYMBOL_NAME(sys_waitpid)
        .long SYMBOL_NAME(sys_creat)
        .long SYMBOL_NAME(sys_link)
        .long SYMBOL_NAME(sys_unlink)           /* 10 */
        .long SYMBOL_NAME(sys_execve)
        .long SYMBOL_NAME(sys_chdir)
        .long SYMBOL_NAME(sys_time)
        .long SYMBOL_NAME(sys_mknod)
        .long SYMBOL_NAME(sys_chmod)            /* 15 */
        .long SYMBOL_NAME(sys_lchown16)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old break syscall holder */
        .long SYMBOL_NAME(sys_stat)
        .long SYMBOL_NAME(sys_lseek)
        .long SYMBOL_NAME(sys_getpid)           /* 20 */
        .long SYMBOL_NAME(sys_mount)
        .long SYMBOL_NAME(sys_oldumount)
        .long SYMBOL_NAME(sys_setuid16)
        .long SYMBOL_NAME(sys_getuid16)
        .long SYMBOL_NAME(sys_stime)            /* 25 */
        .long SYMBOL_NAME(sys_ptrace)
        .long SYMBOL_NAME(sys_alarm)
        .long SYMBOL_NAME(sys_fstat)
        .long SYMBOL_NAME(sys_pause)
        .long SYMBOL_NAME(sys_utime)            /* 30 */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old stty syscall holder */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old gtty syscall holder */
        .long SYMBOL_NAME(sys_access)
        .long SYMBOL_NAME(sys_nice)
        .long SYMBOL_NAME(sys_ni_syscall)       /* 35 */                /* old ftime syscall holder */
        .long SYMBOL_NAME(sys_sync)
        .long SYMBOL_NAME(sys_kill)
        .long SYMBOL_NAME(sys_rename)
        .long SYMBOL_NAME(sys_mkdir)
        .long SYMBOL_NAME(sys_rmdir)            /* 40 */
        .long SYMBOL_NAME(sys_dup)
        .long SYMBOL_NAME(sys_pipe)
        .long SYMBOL_NAME(sys_times)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old prof syscall holder */
        .long SYMBOL_NAME(sys_brk)              /* 45 */
        .long SYMBOL_NAME(sys_setgid16)
        .long SYMBOL_NAME(sys_getgid16)
        .long SYMBOL_NAME(sys_signal)
        .long SYMBOL_NAME(sys_geteuid16)
        .long SYMBOL_NAME(sys_getegid16)        /* 50 */
        .long SYMBOL_NAME(sys_acct)
        .long SYMBOL_NAME(sys_umount)                                   /* recycled never used phys() */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old lock syscall holder */
        .long SYMBOL_NAME(sys_ioctl)
        .long SYMBOL_NAME(sys_fcntl)            /* 55 */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old mpx syscall holder */
        .long SYMBOL_NAME(sys_setpgid)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old ulimit syscall holder */
        .long SYMBOL_NAME(sys_olduname)
        .long SYMBOL_NAME(sys_umask)            /* 60 */
        .long SYMBOL_NAME(sys_chroot)
        .long SYMBOL_NAME(sys_ustat)
        .long SYMBOL_NAME(sys_dup2)
        .long SYMBOL_NAME(sys_getppid)
        .long SYMBOL_NAME(sys_getpgrp)          /* 65 */
        .long SYMBOL_NAME(sys_setsid)
        .long SYMBOL_NAME(sys_sigaction)
        .long SYMBOL_NAME(sys_sgetmask)
        .long SYMBOL_NAME(sys_ssetmask)
        .long SYMBOL_NAME(sys_setreuid16)       /* 70 */
        .long SYMBOL_NAME(sys_setregid16)
        .long SYMBOL_NAME(sys_sigsuspend)
        .long SYMBOL_NAME(sys_sigpending)
        .long SYMBOL_NAME(sys_sethostname)
        .long SYMBOL_NAME(sys_setrlimit)        /* 75 */
        .long SYMBOL_NAME(sys_old_getrlimit)
        .long SYMBOL_NAME(sys_getrusage)
        .long SYMBOL_NAME(sys_gettimeofday)
        .long SYMBOL_NAME(sys_settimeofday)
        .long SYMBOL_NAME(sys_getgroups16)      /* 80 */
        .long SYMBOL_NAME(sys_setgroups16)
        .long SYMBOL_NAME(old_select)
        .long SYMBOL_NAME(sys_symlink)
        .long SYMBOL_NAME(sys_lstat)
        .long SYMBOL_NAME(sys_readlink)         /* 85 */
        .long SYMBOL_NAME(sys_uselib)
        .long SYMBOL_NAME(sys_swapon)
        .long SYMBOL_NAME(sys_reboot)
        .long SYMBOL_NAME(old_readdir)
        .long SYMBOL_NAME(old_mmap)             /* 90 */
        .long SYMBOL_NAME(sys_munmap)
        .long SYMBOL_NAME(sys_truncate)
        .long SYMBOL_NAME(sys_ftruncate)
        .long SYMBOL_NAME(sys_fchmod)
        .long SYMBOL_NAME(sys_fchown16)         /* 95 */
        .long SYMBOL_NAME(sys_getpriority)
        .long SYMBOL_NAME(sys_setpriority)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old profil syscall holder */
        .long SYMBOL_NAME(sys_statfs)
        .long SYMBOL_NAME(sys_fstatfs)          /* 100 */
        .long SYMBOL_NAME(sys_ioperm)
        .long SYMBOL_NAME(sys_socketcall)
        .long SYMBOL_NAME(sys_syslog)
        .long SYMBOL_NAME(sys_setitimer)
        .long SYMBOL_NAME(sys_getitimer)        /* 105 */
        .long SYMBOL_NAME(sys_newstat)
        .long SYMBOL_NAME(sys_newlstat)
        .long SYMBOL_NAME(sys_newfstat)
        .long SYMBOL_NAME(sys_uname)
        .long SYMBOL_NAME(sys_iopl)             /* 110 */
        .long SYMBOL_NAME(sys_vhangup)
        .long SYMBOL_NAME(sys_ni_syscall)       /* old "idle" system call */
        .long SYMBOL_NAME(sys_vm86old)
        .long SYMBOL_NAME(sys_wait4)
        .long SYMBOL_NAME(sys_swapoff)          /* 115 */
        .long SYMBOL_NAME(sys_sysinfo)
        .long SYMBOL_NAME(sys_ipc)
        .long SYMBOL_NAME(sys_fsync)
        .long SYMBOL_NAME(sys_sigreturn)
        .long SYMBOL_NAME(sys_clone)            /* 120 */
        .long SYMBOL_NAME(sys_setdomainname)
        .long SYMBOL_NAME(sys_newuname)
        .long SYMBOL_NAME(sys_modify_ldt)
        .long SYMBOL_NAME(sys_adjtimex)
        .long SYMBOL_NAME(sys_mprotect)         /* 125 */
        .long SYMBOL_NAME(sys_sigprocmask)
        .long SYMBOL_NAME(sys_create_module)
        .long SYMBOL_NAME(sys_init_module)
        .long SYMBOL_NAME(sys_delete_module)
        .long SYMBOL_NAME(sys_get_kernel_syms)  /* 130 */
        .long SYMBOL_NAME(sys_quotactl)
        .long SYMBOL_NAME(sys_getpgid)
        .long SYMBOL_NAME(sys_fchdir)
        .long SYMBOL_NAME(sys_bdflush)
        .long SYMBOL_NAME(sys_sysfs)            /* 135 */
        .long SYMBOL_NAME(sys_personality)
        .long SYMBOL_NAME(sys_ni_syscall)       /* for afs_syscall */
        .long SYMBOL_NAME(sys_setfsuid16)
        .long SYMBOL_NAME(sys_setfsgid16)
        .long SYMBOL_NAME(sys_llseek)           /* 140 */
        .long SYMBOL_NAME(sys_getdents)
        .long SYMBOL_NAME(sys_select)
        .long SYMBOL_NAME(sys_flock)
        .long SYMBOL_NAME(sys_msync)
        .long SYMBOL_NAME(sys_readv)            /* 145 */
        .long SYMBOL_NAME(sys_writev)
        .long SYMBOL_NAME(sys_getsid)
        .long SYMBOL_NAME(sys_fdatasync)
        .long SYMBOL_NAME(sys_sysctl)
        .long SYMBOL_NAME(sys_mlock)            /* 150 */
        .long SYMBOL_NAME(sys_munlock)
        .long SYMBOL_NAME(sys_mlockall)
        .long SYMBOL_NAME(sys_munlockall)
        .long SYMBOL_NAME(sys_sched_setparam)
        .long SYMBOL_NAME(sys_sched_getparam)   /* 155 */
        .long SYMBOL_NAME(sys_sched_setscheduler)
        .long SYMBOL_NAME(sys_sched_getscheduler)
        .long SYMBOL_NAME(sys_sched_yield)
        .long SYMBOL_NAME(sys_sched_get_priority_max)
        .long SYMBOL_NAME(sys_sched_get_priority_min)  /* 160 */
        .long SYMBOL_NAME(sys_sched_rr_get_interval)
        .long SYMBOL_NAME(sys_nanosleep)
        .long SYMBOL_NAME(sys_mremap)
        .long SYMBOL_NAME(sys_setresuid16)
        .long SYMBOL_NAME(sys_getresuid16)      /* 165 */
        .long SYMBOL_NAME(sys_vm86)
        .long SYMBOL_NAME(sys_query_module)
        .long SYMBOL_NAME(sys_poll)
        .long SYMBOL_NAME(sys_nfsservctl)
        .long SYMBOL_NAME(sys_setresgid16)      /* 170 */
        .long SYMBOL_NAME(sys_getresgid16)
        .long SYMBOL_NAME(sys_prctl)
        .long SYMBOL_NAME(sys_rt_sigreturn)
        .long SYMBOL_NAME(sys_rt_sigaction)
        .long SYMBOL_NAME(sys_rt_sigprocmask)   /* 175 */
        .long SYMBOL_NAME(sys_rt_sigpending)
        .long SYMBOL_NAME(sys_rt_sigtimedwait)
        .long SYMBOL_NAME(sys_rt_sigqueueinfo)
        .long SYMBOL_NAME(sys_rt_sigsuspend)
        .long SYMBOL_NAME(sys_pread)            /* 180 */
        .long SYMBOL_NAME(sys_pwrite)
        .long SYMBOL_NAME(sys_chown16)
        .long SYMBOL_NAME(sys_getcwd)
        .long SYMBOL_NAME(sys_capget)
        .long SYMBOL_NAME(sys_capset)           /* 185 */
        .long SYMBOL_NAME(sys_sigaltstack)
        .long SYMBOL_NAME(sys_sendfile)
        .long SYMBOL_NAME(sys_ni_syscall)               /* streams1 */
        .long SYMBOL_NAME(sys_ni_syscall)               /* streams2 */
        .long SYMBOL_NAME(sys_vfork)            /* 190 */
        .long SYMBOL_NAME(sys_getrlimit)
        .long SYMBOL_NAME(sys_mmap2)
        .long SYMBOL_NAME(sys_truncate64)
        .long SYMBOL_NAME(sys_ftruncate64)
        .long SYMBOL_NAME(sys_stat64)           /* 195 */
        .long SYMBOL_NAME(sys_lstat64)
        .long SYMBOL_NAME(sys_fstat64)
        .long SYMBOL_NAME(sys_lchown)
        .long SYMBOL_NAME(sys_getuid)
        .long SYMBOL_NAME(sys_getgid)           /* 200 */
        .long SYMBOL_NAME(sys_geteuid)
        .long SYMBOL_NAME(sys_getegid)
        .long SYMBOL_NAME(sys_setreuid)
        .long SYMBOL_NAME(sys_setregid)
        .long SYMBOL_NAME(sys_getgroups)        /* 205 */
        .long SYMBOL_NAME(sys_setgroups)
        .long SYMBOL_NAME(sys_fchown)
        .long SYMBOL_NAME(sys_setresuid)
        .long SYMBOL_NAME(sys_getresuid)
        .long SYMBOL_NAME(sys_setresgid)        /* 210 */
        .long SYMBOL_NAME(sys_getresgid)
        .long SYMBOL_NAME(sys_chown)
        .long SYMBOL_NAME(sys_setuid)
        .long SYMBOL_NAME(sys_setgid)
        .long SYMBOL_NAME(sys_setfsuid)         /* 215 */
        .long SYMBOL_NAME(sys_setfsgid)
        .long SYMBOL_NAME(sys_pivot_root)
        .long SYMBOL_NAME(sys_mincore)
        .long SYMBOL_NAME(sys_madvise)
        .long SYMBOL_NAME(sys_getdents64)       /* 220 */
        .long SYMBOL_NAME(sys_fcntl64)
        .long SYMBOL_NAME(sys_ni_syscall)       /* reserved for TUX */
        .long SYMBOL_NAME(sys_ni_syscall)       /* Reserved for Security */
        .long SYMBOL_NAME(sys_gettid)
        .long SYMBOL_NAME(sys_readahead)        /* 225 */


IRQ Event

When an IRQ comes, the task that is running is interrupted in order to service the IRQ Handler.

After the IRQ is handled, control returns backs exactly to point of interrupt, like nothing happened.


           
              Running Task 
             |-----------|          (3)
NORMAL       |   |       | [break execution] IRQ Handler
EXECUTION (1)|   |       |     ------------->|---------| 
             |  \|/      |     |             |  does   |         
 IRQ (2)---->| ..        |----->             |  some   |      
             |   |       |<-----             |  work   |       
BACK TO      |   |       |     |             |  ..(4). |
NORMAL    (6)|  \|/      |     <-------------|_________|
EXECUTION    |___________|  [return to code]
                                    (5)
               USER MODE                     KERNEL MODE

         User->Kernel Mode Transition caused by IRQ event
     

The numbered steps below refer to the sequence of events in the diagram above:

  1. Process is executing
  2. IRQ comes while the task is running.
  3. Task is interrupted to call an "Interrupt handler".
  4. The "Interrupt handler" code is executed.
  5. Control returns back to task user mode (as if nothing happened)
  6. Process returns back to normal execution

Special interest has the Timer IRQ, coming every TIMER ms to manage:

  1. Alarms
  2. System and task counters (used by schedule to decide when stop a process or for accounting)
  3. Multitasking based on wake up mechanism after TIMESLICE time.

3.4 Multitasking

Mechanism

The key point of modern OSs is the "Task". The Task is an application running in memory sharing all resources (included CPU and Memory) with other Tasks.

This "resource sharing" is managed by the "Multitasking Mechanism". The Multitasking Mechanism switches from one task to another after a "timeslice" time. Users have the "illusion" that they own all resources. We can also imagine a single user scenario, where a user can have the "illusion" of running many tasks at the same time.

To implement this multitasking, the task uses "the state" variable, which can be:

  1. READY, ready for execution
  2. BLOCKED, waiting for a resource

The task state is managed by its presence in a relative list: READY list and BLOCKED list.

Task Switching

The movement from one task to another is called ''Task Switching''. many computers have a hardware instruction which automatically performs this operation. Task Switching occurs in the following cases:

  1. After Timeslice ends: we need to schedule a "Ready for execution" task and give it access.
  2. When a Task has to wait for a device: we need to schedule a new task and switch to it *

* We schedule another task to prevent "Busy Form Waiting", which occurs when we are waiting for a device instead performing other work.

Task Switching is managed by the "Schedule" entity.

 
Timer    |           |
 IRQ     |           |                            Schedule
  |      |           |                     ________________________
  |----->|   Task 1  |<------------------>|(1)Chooses a Ready Task |
  |      |           |                    |(2)Task Switching       |
  |      |___________|                    |________________________|   
  |      |           |                               /|\
  |      |           |                                | 
  |      |           |                                |
  |      |           |                                |
  |      |           |                                |      
  |----->|   Task 2  |<-------------------------------|
  |      |           |                                |
  |      |___________|                                |
  .      .     .     .                                .
  .      .     .     .                                .
  .      .     .     .                                .
  |      |           |                                |
  |      |           |                                |
  ------>|   Task N  |<--------------------------------
         |           |
         |___________| 
    
            Task Switching based on TimeSlice
 

A typical Timeslice for Linux is about 10 ms.


 

 |           |            
 |           | Resource    _____________________________
 |   Task 1  |----------->|(1) Enqueue Resource request |
 |           |  Access    |(2)  Mark Task as blocked    |
 |           |            |(3)  Choose a Ready Task     |
 |___________|            |(4)    Task Switching        |
                          |_____________________________|
                                       |
                                       |
 |           |                         |
 |           |                         |
 |   Task 2  |<-------------------------
 |           |  
 |           |
 |___________|
 
     Task Switching based on Waiting for a Resource
 

3.5 Microkernel vs Monolithic OS

Overview

Until now we viewed so called Monolithic OS, but there is also another kind of OS: ''Microkernel''.

A Microkernel OS uses Tasks, not only for user mode processes, but also as a real kernel manager, like Floppy-Task, HDD-Task, Net-Task and so on. Some examples are Amoeba, and Mach.

PROs and CONTROs of Microkernel OS

PROS:

CONS:

My personal opinion is that, Microkernels are a good didactic example (like Minix) but they are not ''optimal'', so not really suitable. Linux uses a few Tasks, called "Kernel Threads" to implement a little microkernel structure (like kswapd, which is used to retrieve memory pages from mass storage). In this case there are no problems with perfomance because swapping is a very slow job.

3.6 Networking

ISO OSI levels

Standard ISO-OSI describes a network architecture with the following levels:

  1. Physical level (examples: PPP and Ethernet)
  2. Data-link level (examples: PPP and Ethernet)
  3. Network level (examples: IP, and X.25)
  4. Transport level (examples: TCP, UDP)
  5. Session level (SSL)
  6. Presentation level (FTP binary-ascii coding)
  7. Application level (applications like Netscape)

The first 2 levels listed above are often implemented in hardware. Next levels are in software (or firmware for routers).

Many protocols are used by an OS: one of these is TCP/IP (the most important living on 3-4 levels).

What does the kernel?

The kernel doesn't know anything (only addresses) about first 2 levels of ISO-OSI.

In RX it:

  1. Manages handshake with low levels devices (like ethernet card or modem) receiving "frames" from them.
  2. Builds TCP/IP "packets" from "frames" (like Ethernet or PPP ones),
  3. Convers ''packets'' in ''sockets'' passing them to the right application (using port number) or
  4. Forwards packets to the right queue
frames         packets              sockets
NIC ---------> Kernel ----------> Application
                  |    packets
                  --------------> Forward
                        - RX - 

In TX stage it:

  1. Converts sockets or
  2. Queues datas into TCP/IP ''packets''
  3. Splits ''packets" into "frames" (like Ethernet or PPP ones)
  4. Sends ''frames'' using HW drivers
sockets       packets                     frames
Application ---------> Kernel ----------> NIC
              packets     /|\    
Forward  -------------------
                        - TX -  


3.7 Virtual Memory

Segmentation

Segmentation is the first method to solve memory allocation problems: it allows you to compile source code without caring where the application will be placed in memory. As a matter of fact, this feature helps applications developers to develop in a independent fashion from the OS e also from the hardware.

     
            |       Stack        |
            |          |         |
            |         \|/        |
            |        Free        | 
            |         /|\        |     Segment <---> Process    
            |          |         |
            |        Heap        |
            | Data uninitialized |
            |  Data initialized  |
            |       Code         |
            |____________________|  
 
                   Segment  

We can say that a segment is the logical entity of an application, or the image of the application in memory.

When programming, we don't care where our data is put in memory, we only care about the offset inside our segment (our application).

We use to assign a Segment to each Process and vice versa. In Linux this is not true. Linux uses only 4 segments for either Kernel and all Processes.

Problems of Segmentation

 
                                 ____________________
                          ----->|                    |----->
                          | IN  |     Segment A      | OUT
 ____________________     |     |____________________|   
|                    |____|     |                    |   
|     Segment B      |          |     Segment B      |
|                    |____      |                    |   
|____________________|    |     |____________________|   
                          |     |     Segment C      |   
                          |     |____________________|
                          ----->|     Segment D      |-----> 
                            IN  |____________________| OUT 
 
                     Segmentation problem


In the diagram above, we want to get exit processes A, and D and enter process B. As we can see there is enough space for B, but we cannot split it in 2 pieces, so we CANNOT load it (memory out).

The reason this problem occurs is because pure segments are continuous areas (because they are logical areas) and cannot be split.

Pagination

 
             ____________________
            |     Page 1         |
            |____________________|
            |     Page 2         |
            |____________________| 
            |      ..            |     Segment <---> Process    
            |____________________|
            |     Page n         |
            |____________________|
            |                    |
            |____________________|
            |                    |
            |____________________|  
 
                   Segment  
 

Pagination splits memory in "n" pieces, each one with a fixed length.

A process may be loaded in one or more Pages. When memory is freed, all pages are freed (see Segmentation Problem, before).

Pagination is also used for another important purpose, "Swapping". If a page is not present in physical memory then it generates an EXCEPTION, that will make the Kernel search for a new page in storage memory. This mechanism allow OS to load more applications than the ones allowed by physical memory only.

Pagination Problem

             ____________________
   Page   X |     Process Y      |
            |____________________|
            |                    |
            |       WASTE        |
            |       SPACE        |
            |____________________|  
   
              Pagination Problem
 

In the diagram above, we can see what is wrong with the pagination policy: when a Process Y loads into Page X, ALL memory space of the Page is allocated, so the remaining space at the end of Page is wasted.

Segmentation and Pagination

How can we solve segmentation and pagination problems? Using either 2 policies.

 
                                  |      ..            |
                                  |____________________|
                            ----->|      Page 1        |
                            |     |____________________|
                            |     |      ..            |
 ____________________       |     |____________________|
|                    |      |---->|      Page 2        |
|      Segment X     |  ----|     |____________________|
|                    |      |     |       ..           |
|____________________|      |     |____________________|
                            |     |       ..           |
                            |     |____________________|
                            |---->|      Page 3        |
                                  |____________________|
                                  |       ..           |
 

Process X, identified by Segment X, is split in 3 pieces and each of one is loaded in a page.

We do not have:

  1. Segmentation problem: we allocate per Pages, so we also free Pages and we manage free space in an optimized way.
  2. Pagination problem: only last page wastes space, but we can decide to use very small pages, for example 4096 bytes length (losing at maximum 4096*N_Tasks bytes) and manage hierarchical paging (using 2 or 3 levels of paging)
 
 

                          |         |           |         |
                          |         |   Offset2 |  Value  |
                          |         |        /|\|         |
                  Offset1 |         |-----    | |         |
                      /|\ |         |    |    | |         |
                       |  |         |    |   \|/|         | 
                       |  |         |    ------>|         |
                      \|/ |         |           |         |
 Base Paging Address ---->|         |           |         |
                          | ....... |           | ....... |
                          |         |           |         |    
 
                     Hierarchical Paging

Next Previous Contents