Friday, November 12, 2010

kernel/proc.c - sys_call() internals

I gave some idea about sys_call() in the preceding diagrams. sys_call() does a number of checks before it allows a message or notification to be passed from one process to another. These are all part of the security mechanisms in the kernel that keep processes from doing things they are not allowed to do and from corrupting each other's memory maps. Assuming that a process has just made a system call, here is the list of checks that happen:

  1. Does the process have the privileges to make this particular system call? (the priv structure for this process in the process table is looked up to determine this)
  2. In the case of a 'send' or 'notify', is the destination a valid process? In the case of a 'receive' is the source a valid process? Note that a caller can specify 'ANY' as its source in which case it is looking for a message or notification from any process, so this check is unnecessary for that case. 'ECHO' is also a special case where this check doesn't make sense.
  3. Does the message pointer point to a valid region in memory i.e. is it really within the memory map limits of the calling process? Again, the memory map limits are available in the memmap struct for this process in the process table.
  4. If the request is to 'send' to some process, is the caller allowed to send to the destination process?
  5. Is the destination process running?
  6. Has system shutdown started?
If all the checks above have passed then call one of the following functions: mini_notify(), mini_receive() or mini_send() depending upon the nature of the request.

Refer to the 'Message Primitives and Process States' diagrams (preceding posts) for details on what happens within the above functions.

Comments welcome.

Monday, November 1, 2010

Process scheduling in Minix



The diagram above shows the state transitions for any process in Minix. A running process can become blocked when it makes a system call (which translates into the INT assembly instruction, also known as a software interrupt). Once blocked, it has to become ready before it can be run again. The scheduling algorithm only picks ready processes for execution.

The kernel maintains 16 queues of runnable processes. These queues are numbered 0 (highest priority queue) to 15 (lowest priority queue). Different processes have different priorities and the priority of a process may change during its execution. The changed priority is reflected in the fact that the process is moved to a different scheduling queue.

The clock and system tasks in Layer 1 of Minix have the highest priority. The device drivers in Layer 2 get lower priority but they are all not equal. Server processes (process manager, file system etc.) get lower priority than device drivers. User processes get lower priority than the server processes. Refer to this post for more on these layers.

Scheduling is round robin in each queue. If a running process uses up its quantum, it is moved to the tail of its queue and given a new quantum (Transition 4 in the diagram above). When a blocked process is awakened it is put at the head of its queue if it had any part of its quantum left when it blocked (Transition 2 in the diagram above). But it does not get a new quantum. It gets only as much quantum as it was left with when it blocked.

Whenever a running process becomes blocked or a runnable or running process is killed by a signal, it is removed from the scheduling queues. What this means is that the scheduling queues contain only ready or runnable processes.

Given this structure, the rule is very simple: Find the highest priority queue that is not empty and pick and continue running the process that is at the head of that queue.

So here is the chain of events at a high level: Process A is running -> Hardware and software interrupts occur -> interrupt handling code runs -> messages or notifications are sent or received -> the message processing results in process state transitions which causes changes to the scheduling queues in the kernel -> so Process B gets selected for execution.

We have discussed in detail how context switching happens and how the scheduling logic works. It is clear that the state transitions in the above diagram really drive the context switching. But what really causes these state transitions? The answer lies in the messaging layer of Minix. That is the final missing piece of the puzzle and I will discuss it with another diagram in an upcoming post.

Comments welcome.

Saturday, October 23, 2010

Minix Context Switching Detail

Download and view this PDF file.

I have shown both hardware and software interrupt workflows here. Since the diagram is large and complex, it may have some errors due to oversight. Comments/corrections welcome.

Wednesday, October 20, 2010

Context Switching

I think I now understand exactly how context switching happens in Minix. I am working on a detailed diagram which I will upload once I am done.

Sunday, October 17, 2010

Minix Interrupt Handling





Context Switching - The illusion of multiple processes executing simultaneously

In a single processor system, at any instant, only one process is executing - be it a system process (part of the OS) or a user process. The illusion of multiple processes running simultaneously is achieved by constantly switching from process to process. Now, when does a context switch happen in minix? What causes the CPU to suspend execution of one process and resume another? There are three ways in which this can happen:

1. Hardware interrupt (one of the devices in the computer sends a signal to the interrupt controller which in turn sends an interrupt signal to the INT pin of the CPU)
2. Software interrupt (i. e. system call made by the executing process)
3. The 'quantum' of time assigned to the executing process has been used up, so this process is pre-emptively suspended and another one is picked for execution. (This is really a special case of hardware interrupt where the interrupting device is the clock.)

In upcoming posts, I will first explain interrupt processing and context switching and then give more details on scheduling of processes.

Friday, October 15, 2010

The Global Descriptor Table

The GDT (Global Descriptor Table), LDTs (Local Descriptor Tables) and the IDT (Interrupt Descriptor Table) , all located in memory provide protected access to system resources. The GDT and IDT are pointed to by special registers within the CPU (so the processor can access them), and the GDT entries point both to the operating system segments and to the LDTs. The IDT contains pointers to code that needs to be executed when interrupts occur.

Each running program has its own LDT but there is a single GDT for the entire system.

The LDT describes segments local to each program including its text (code), data, stack and so on. The GDT describes system segments including the operating system itself. (Segments are independent address spaces. Segmentation is one of the techniques use to manage memory on computers. More on segments when we discuss memory management but for now just understand that with segmented memory, we need a two part address to reach any memory location - the first part is the address of the segment and the second part is the location (offset) within the segment.)

So the GDT basically points to (i.e. contains the physical addresses of) segments that contain the operating system code and data but also provides a way to get to all the LDTs in the system.

Wednesday, September 29, 2010

Privilege Levels

The architecture of 32-bit Intel processors provides four privilege levels of which Minix takes advantage of three. These are defined as C macros in kernel/protect.h

#define INTR_PRIVILEGE 0 /* kernel and interrupt handlers */
#define TASK_PRIVILEGE 1 /* kernel tasks */
#define USER_PRIVILEGE 3 /* servers and user processes */

The kernel contains code that runs during interrupts and during context switches, so it always runs with INTR_PRIVILEGE. Every address in memory and every register in the CPU can be accessed by a process with this privilege level.

The System and Clock tasks run with the TASK_PRIVILEGE level which allows them to access I/O but not to use instructions that modify special registers (like those that point to descriptor tables).

Servers and user processes run at USER_PRIVILEGE level. Such processes are unable to execute certain instructions such as those that access I/O ports, change memory assignments or change privilege levels themselves.

Note that of the four layers of processes in the Minix OS, all the layers except Layer 1 run with USER_PRIVILEGE level (i.e. user mode).

Tuesday, September 28, 2010

Fork

Just for the heck of it, I added some print statements to the code in forkexit.c in the Minix source. After recompiling, linking, creating a boot image and installing in the boot directory when I restarted the machine, all I could see was thousands and thousands of those messages I had introduced. After startup, no matter what I did the screen was littered with those messages. Which is natural since fork happens (one or more times) whenever you execute a command.

Tuesday, July 6, 2010

The Banker's Algorithm for avoiding deadlocks

The Banker's Algorithm for avoiding deadlocks is about as effective as banker's hours are in the software development industry. Read it for academic interest.

Thursday, July 1, 2010

Deadlocks

I am back after quite a while. Time-slicing works well for processors but not for the human brain. It is better to do a few things well by spending enough time on each of them than to do a hundred things without making much progress in any.

Anyway, I am studying deadlocks right now. The formal definition (unlike most formal definitions!) is quite easy to comprehend: A set of processes is deadlocked if each process in the set is waiting for an event that only another process in the set can cause.

But all of the conditions below should hold for a deadlock to occur:

  1. Mutual exclusion: Each resource is either currently assigned to exactly one process or is available.
  2. Hold and wait: Processes currently holding resources that were granted earlier can request new resources
  3. No preemption: Resources already granted cannot be forcibly taken away from a process. (The process has to explicitly release them.)
  4. Circular wait: There must be a circular chain of two or more processes, each of which is waiting for a resource held by the next member of the chain.

The 'directed graph' technique is used to model these conditions. All approaches that attempt to prevent deadlocks try and eliminate one or more of the above conditions.

Friday, January 8, 2010

The microkernel design of Minix

In a monolithic OS, the entire OS is in a single process. But Minix has a microkernel design in which the operating system is composed of a set of processes (the kernel being just one of them) that work together by passing messages to each other. In this case it becomes necessary to distinguish between 'system calls' and 'kernel calls'. Unlike in a monolithic OS, user processes in Minix do not request services directly from the kernel. System calls from user processes result in 'messages' to the server processes (File System, Process Manager etc.). In turn, the server processes communicate with each other, the device drivers and with the kernel again using 'messages'. These messages to the kernel are really the 'kernel calls'. User processes do not have the privileges to make kernel calls.

In summary, the conventional 'system call' in Minix results in one or more 'kernel calls' which will be initiated by either the server processes or device drivers. The ipc primitives: send, receive and notify are the 'infrastructure' on top of which kernel and system calls are built. The technical term for these ipc primitives is 'traps'.

Tanenbaum gives a nice analogy to drive home the meaning of traps in Minix. The ipc primitive is like a carrier wave in a radio communications system. A send message contains some data which conveys useful information to the receiving process. It is like a modulated radio wave. The notify message conveys no information other than its origin. This is like an unmodulated radio wave (like a radio beacon that guides airplanes to an airport).