Embedded Systems.. EASY questions!

profileWolverineX
linux_rt_issues.pptx

Week 11

Linux Internals

Processes, scheduling

Lecture organization

Kernel Structure

Process structure

Process creation

Signals

Process management, scheduling

Linux for embedded systems

References

Linux Internals (to the power of -1)

Simone Demblon, Sebastian Spitzner

http:// www.tutorialized.com/view/tutorial/Linux-kernel-internals-from-Process-Birth-to-Death/40955

Linux Knowledge Base and Tutorial

http :// linux-tutorial.info/modules.php?name=MContent&pageid=224

Inside the Linux scheduler IBM Developer Works

http://www.ibm.com/developerworks/linux/library/l-scheduler /

The Linux Kernel

A Unix kernel fulfills 4 main management tasks:

• Memory management

• Process management

• File system management

• IO management

For our study of real time implications, we will examine Process management

A Rather brief look at the kernel

Part 1

Process structure

Process data structure

A process is represented by a rather large structure called task_struct.

It contains all of the necessary data to represent the process, along with data for accounting and to maintain relationships with other processes (parents and children).

The actual structure of the task_struct is many pages long

A short sample of task_struct is shown in the next slide

Task_struct

Pointers to open files

Memory map

Signals: received, masked

Register contents

Everything defining the state of the computation

Task_struct detail

The sample contains the state of execution, a stack, a set of flags, the parent process, the thread of execution (of which there can be many), and open files.

The state variable: the state of the task.

Typical states:

the process is running

In a run queue about to be running (TASK_RUNNING),

sleeping (TASK_INTERRUPTIBLE),

sleeping but unable to be woken up (TASK_UNINTERRUPTIBLE),

stopped (TASK_STOPPED), or a few others.

Flags: the process is being created (PF_STARTING) or exiting (PF_EXITING), or currently allocating memory (PF_MEMALLOC).

The comm (command) field: the name of the executable

Priority: (called static_prio). The actual priority is determined dynamically based on loading and other factors.

More Task_struct

The tasks field is a linked-list

mm and active_mm fields The process's address space. mm represents the process's memory descriptors, while the active_mm is the previous process's memory descriptors

The thread_struct identifies the stored state of the process: The CPU state (hardware registers, program counter, etc.).

Part 2

Process creation

System call functions

user-space tasks and kernel tasks, rely on a function called do_fork to create the new process.

In the case of creating a kernel thread, the kernel calls a function called kernel_thread

In user-space, a program calls fork, which results in a system call to the kernel function called sys_fork

The function relationships are shown graphically in the next slide.

Function hierarchy for process creation

do_fork

The do_fork function begins with a call to alloc_pidmap, which allocates a new PID.

The do_fork function then calls copy_process, passing the flags, stack, registers, parent process, and newly allocated PID.

The copy_process function is where the new process is created as a copy of the parent. This function performs all actions except for starting the process

Copying the process

Next, dup_task_struct allocates a new task_struct and copies the current process's descriptors into it.

After a new thread stack is set up, control returns to copy_process. A sequence of copy functions is then invoked to copy, open file descriptors, signal information, process memory and finally the thread.

The new task is then assigned to a processor. The priority of the new process inherits the priority of the parent, and control returns to do_fork. At this point, your new process exists but is not yet running.

The do_fork function fixes this with a call to wake_up_new_task. This function places the new process in a run queue, then wakes it up for execution.

Finally, upon returning to do_fork, the PID value is returned to the caller and the process is complete.

Part 3

Process management

Process Management

A program is an executable file, whereas a process is a running program.

A process is an instance of a program in memory, executing instructions on the processor.

The only way to run a program on a Unix/Linux system is to request the kernel to execute it via an exec() system call.

Remember that the only things that can make system calls are processes (binary programs that are executing.)

Scheduler

The scheduler manages processes and the fair distribution of CPU time between them.

The kernel classifies processes as being in one of two possible queues at any given time: the sleep queue and the run queue.

Run Queue

Processes in the run queue compete for access to the CPU.

The processes in the run queue compete for the processor(s). It is the schedulers' job to allocate a time slice to each process, and to let each process run on the processor for a certain amount of time in turn.

Each time slice is so short (fractions of a second) that each process in the run queue gets to run often every second. It appears as though all of these processes are "running at the same time". This is called round robin scheduling.

On a uniprocessor system, only one process can execute instructions at any one time.

Only on a multiprocessor system can true multiprocessing occur, with more than one process (as many as there are CPUs) executing instructions simultaneously.

Classes of scheduling

There are different classes of scheduling besides round-robin.

An example would be real-time scheduling.

Different Unix systems have different scheduling classes and features, and Linux is no exception.

Sleep queue

Processes waiting for a resource wait on the sleep queue. These processes do not take up a slot on the run queue

Once the resource becomes available, it is reserved by the process, which is then moved back onto the run queue to wait for a turn on the processor.

Every process waiting for a resource is placed on the sleep queue, including new processes, whose resources still have to be allocated, even if they are readily available.

Queue management

Time-slicing

Each 10 milli-seconds (This may change with the HZ value) an Interrupt comes on IRQ0, which helps us in a multitasking environment.

The interrupt signal to the CPU comes from PIC (Programmable Interrupt Controller).

With each Time-slice the current process execution is interrupted (without task switching), and the processor does housekeeping then the current process is re-activated unless a higher priority process is waiting.

When does switching occur?

Some examples are:

• When a Time Slice ends the scheduler gives access to another process

• If needing a resource, the process will have to go back into the sleep queue to wait for or to be given access to that resource, and only then would it be ready to be scheduled access to the processor again.

• If we have a process waiting for information from another process in the form of piped information. That process would have to run before this process can continue, so the other process would be given priority for the processor.

Process destruction

Process destruction can be driven by several events:

normal process termination,

through a signal

through a call to the exit function.

However process exit is driven, the process ends through a call to the kernel function do_exit

This process is shown graphically in the next slide

Function hierarchy for process destruction

do_exit

do_exit removes all references to the current process from the operating system

The cycle of detaching the process from the various resources that it attained during its life is performed through a series of calls, including exit_mm (to remove memory pages) to exit_keys (which disposes of per-thread session and process security keys).

The do_exit function performs various accountings for the disposal of the process, then a series of notifications

Finally, the process state is changed to PF_DEAD, and the schedule function is called to select a new process to execute.

Time slice

When a process's time slice has run out or for some other reason another process gets to run, it is suspended (placed on the sleep or run queue).

It’s state is stored in task_struct, so that when it gets a turn again. This enables the process to return to the exact place where it left off.

System Processes

In addition to user processes, there are system processes running.

Some deal with managing memory and scheduling turns on the CPU. Others deal with delivering mail, printing.

In principle, both of these kinds of processes are identical. However, system processes can run at much higher priorities and therefore run more often than user processes.

Typically a system process of this kind is referred to as a daemon process because they run without user intervention.

Filesystems

Under Linux, files and directories are grouped into units called filesystems. Filesystems exist within a section of the hard disk called a partition.

Each hard disk can be broken down into multiple partitions and a filesystem is created within the partition. (Some dialects of UNIX allow multiple filesystems within a partition.)

The Life Cycle of Processes http:// linux-tutorial.info/modules.php?name=MContent&pageid=84

From the time a process is created with a fork() until it has completed its job and disappears from the process table, it goes through many different states. The state a process is in changes many times during its "life." These changes can occur, for example, when the process makes a system call, it is someone else's turn to run, an interrupt occurs, or the process asks for a resource that is currently not available.

A commonly used model shows processes operating in one of six separate states:

executing in user mode

executing in kernel mode

ready to run

sleeping

newly created, not ready to run, and not sleeping

issued exit system call (zombie)

Device Files

In Linux, almost everything is either treated as a file or is only accessed through files.

For example, to read the contents of a data file, the operating system must access the hard disk. Linux treats the hard disk as if it were a file. It opens it like a file, reads it like a file, and closes it like a file. The same applies to other hardware such as tape drives and printers. Even memory is treated as a file. The files used to access the physical hardware are device files.

When the Linux wants to access any hardware device, it first opens a file that "points" toward that device (the device node). Based on information it finds in the inode, the operating system determines what kind of device it is and can therefore access it in the proper manner. This includes opening, reading, and closing, just like any other file.

Process states

State Diagram

State transitions

A newly created process enters the system in state 5.

If an exec() is made, then this process will end up in kernel mode (2).

When a process is running, an interrupt may be generated (more often than not, this is the system clock) and the currently running process is pre-empted (3).

This is the same state as state 3 because it is still ready to run and in main memory.

More Transitions

When the process makes a system call while in user mode (1), it moves into state 2 where it begins to run in kernel mode.

Assume at this point that the system call made was to read a file on the hard disk. Because the read is not carried out immediately, the process goes to sleep, waiting on the event that the system has read the disk and the data is ready. It is now in state 4.

When the data is ready, the process is awakened. This does not mean it runs immediately, but rather it is once again ready to run in main memory (3).

If a process that was asleep is awakened (perhaps when the data is ready), it moves from state 4 (sleeping) to state 3 (ready to run). This can be in either user mode (1) or kernel mode (2).

End

A process can end its life by either explicitly calling the exit() system call or having it called for them.

The exit() system call releases all the data structures that the process was using.

One exception is the slot in the process table, which is the responsibility of the init process. The slot in the process table is used for the exit code of the exiting process. This can be used by the parent process to determine whether the process did what it was supposed to do or whether it ran into problems.

The process shows that it has terminated by putting itself into state 8, and it becomes a "zombie." Once here, it can never run again because nothing exists other than the entry in the process table.

This is why you cannot "kill" a zombie process. There is nothing there to kill. To kill a process, you need to send it a signal (more on signals later). Because there is nothing there to receive or process that signal, trying to kill it makes little sense. The only thing to do is to let the system clean it up.

Access to a resource

Processes waiting for a common resource all wait on the same wait channel. All are woken up when the resource is ready. Only one process gets the resource. The others put themselves back to sleep on the same wait channel.

When a process puts itself to sleep, it voluntarily gives up the CPU. It may be that this process had just started its turn when it couldn’t access the resource it needed.

Signals

Signals are a way of sending simple messages to processes.

Most of these messages are already defined and can be found in <linux/signal.h>. Most are generated in the kernel

Signals can only be processed when the process is in user mode. If a signal has been sent to a process that is in kernel mode, it is dealt with immediately on returning to user mode.

Signals -2

Signals are used to signal asynchronous events to one or more processes. A signal could be generated by a keyboard interrupt or an error condition such as the process attempting to access a non-existent location in its virtual memory.

Signals are also used by the shells to signal job control commands to their child processes.

There are a set of defined signals that the kernel can generate or that can be generated by other processes in the system, provided that they have the correct privileges.

Signals - 3

Linux implements signals using information stored in the task_struct for the process. The number of supported signals is limited to the word size of the processor.

The currently pending signals are kept in the signal field with a mask of blocked signals held in blocked. With the exception of SIGSTOP and SIGKILL, all signals can be blocked.

If a blocked signal is generated, it remains pending until it is unblocked.

Linux also holds information about how each process handles every possible signal and this is held in an array of sigaction data structures pointed at by the task_struct for each process. Amongst other things it contains either the address of a routine that will handle the signal or a flag which tells Linux that the process either wishes to ignore this signal or let the kernel handle the signal for it.

The process modifies the default signal handling by making system calls and these calls alter the sigaction for the appropriate signal as well as the blocked mask.

Problems with earlier Linux schedulers

Before the 2.6 kernel, the scheduler had a significant limitation when many tasks were active. This was due to the scheduler being implemented using an algorithm with O(n) complexity. In other words, the more tasks (n) are active, the longer it takes to schedule a task. At very high loads, the processor can be consumed with scheduling and devote little time to the tasks themselves.

The pre-2.6 scheduler also used a single runqueue for all processors in a symmetric multiprocessing system (SMP). This meant a task could be scheduled on any processor -- which can be good for load balancing but bad for memory caches

Finally, preemption wasn't possible in the earlier scheduler; this meant that a lower priority task could execute while a higher priority task waited for it to complete.

Introducing the Linux 2.6 scheduler

The 2.6 scheduler resolves the primary three issues found in the earlier scheduler (O(n) and SMP scalability issues), as well as other problems. Now we'll explore the basic design of the 2.6 scheduler.

Major scheduling structures

Each CPU has a runqueue made up of 140 priority lists that are serviced in FIFO order.

Tasks that are scheduled to execute are added to the end of their respective runqueue's priority list.

Each task has a time slice that determines how much time it's permitted to execute.

The first 100 priority lists of the runqueue are reserved for real-time tasks, and the last 40 are used for user tasks (see next slide).

Priority lists expired active

Runqueues

In addition to the CPU's runqueue, which is called the active runqueue, there's also an expired runqueue.

When a task on the active runqueue uses all of its time slice, it's moved to the expired runqueue. During the move, its time slice is recalculated.

If no tasks exist on the active runqueue for a given priority, the pointers for the active and expired runqueues are swapped, thus making the expired priority list the active one.

The job of the scheduler is simple: choose the task on the highest priority list to execute.

http:// www.linuxjournal.com/article/7477 By Aseem R. Deshpande

With the release of kernel 2.6, Linux now poses serious competition to major RTOS vendors in the embedded market space.

Linux 2.6 introduces many new features that make it an excellent operating system for embedded computing.

Among these new features are enhanced real-time performance, easier porting to new computers, support for large memory models, support for microcontrollers and an improved I/O system.

Characteristics of Embedded Systems

Embedded systems often need to meet timing constraints reliably.

,embedded systems have access to far fewer resources than does a normal PC. They have to squeeze maximum value out of whatever is available.

. The OS should perform reliably and efficiently, if possible, under the cases of extreme load.

If a crash occurs in one part of the module, it should not effect other parts of the system. Furthermore, recovery from crashes should be graceful.

How Linux 2.6 Satisfies the Requirements

Although Linux 2.6 is not yet a true real-time operating system, it does contain improvements that make it a worthier platform than previous kernels when responsiveness is desirable.

Three significant improvements are preemption points in the kernel, an efficient scheduler and improved synchronization.

Kernel Preemption

As with most general-purpose operating systems, Linux always has forbidden the process scheduler from running when a process is executing in a system call. Therefore, once a task is in a system call, that task controls the processor until the system call returns, no matter how long that might take.

As of kernel 2.6, the kernel is preemptible. A kernel task now can be preempted, so that some important user application can continue to run.

In Linux 2.6, kernel code has been laced with preemption points, instructions that allow the scheduler to run and possibly block the current process so as to schedule a higher priority process.

Kernel Preemption

Linux 2.6 avoids unreasonable delays in system calls by periodically testing a preemption point. During these tests, the calling process may block and let another process run.

Thus, under Linux 2.6, the kernel now can be interrupted mid-task, so other applications can continue to run even when something low-level and complicated is going on in the background.

Embedded software often has to meet deadlines that renders it incompatible with virtual memory demand paging, in which slow handling of page faults would ruin responsiveness.

To eliminate this problem, the 2.6 kernel can be built with no virtual memory system. Of course, it then becomes the software designer's responsibility to ensure enough real memory always is available to get the job done.

An Efficient Scheduler

The process scheduler has been rewritten in the 2.6 kernel to eliminate the slow algorithms of previous versions.

Formerly, in order to decide which task should run next, the scheduler had to look at each ready task and make a computation to determine that task's relative importance.

After all computations were made, the task with the highest score would be chosen.

Because the time required for this algorithm varied with the number of tasks, complex multitasking applications suffered from slow scheduling.

2.6 improvements

The scheduler in Linux 2.6 no longer scans all tasks every time. Instead, when a task becomes ready to run, it is sorted into position on a queue, called the current queue.

Then, when the scheduler runs, it chooses the task at the most favorable position in the queue. As a result, scheduling is done in a constant amount of time.

When the task is running, it is given a time slice, or a period of time in which it may use the processor, before it has to give way to another thread.

When its time slice has expired, the task is moved to another queue, called the expired queue.

The task is sorted into this expired queue according to its priority. This new procedure is substantially faster than the old one, and it works equally as well whether there are many tasks or only a few in queue. This new scheduler is called the O(1) scheduler.

New Synchronization Primitives

Applications involving the use of shared resources, such as shared memory or shared devices, have to be developed carefully to avoid race conditions.

The solution implemented in Linux, called Mutex, ensured that only one task is using the resource at a time. Mutex involved a system call to the kernel to decide whether to block the thread or allow it to continue executing.

But when the decision is to continue, the time-consuming system call was unnecessary. The new implementation in Linux 2.6 supports Fast User-Space Mutexes (Futex). These functions can check from user space whether blocking is necessary and perform the system call to block the thread only when it is required.

When blocking is not required, avoiding the unneeded system call saves time. It also supports setting priorities to allow applications or threads of higher priority to have first access to the contested resource.

Improved Threading Model and Support for NPTL

threading in 2.6 is based on a 1:1 threading model, one kernel thread for one user thread. It also includes in-kernel support for the new Native Posix Threading Library (NPTL).

NPTL brings an eight-fold improvement over its predecessor. Tests conducted by its authors have shown that Linux, with this new threading, can start and stop 100,000 threads simultaneously in about two seconds. This task took 15 minutes on the old threading model.

Along with POSIX threads, 2.6 provides POSIX signals and POSIX high-resolution timers as part of the mainstream kernel. POSIX signals are an improvement over UNIX-style signals, which were the default in previous Linux releases. Unlike UNIX signals, POSIX signals cannot be lost and can carry information as an argument. Also, POSIX signals can be sent from one POSIX thread to another, rather than only from process to process, like UNIX signals.

Embedded systems often need to poll hardware or do other tasks on a fixed schedule. POSIX timers make it easy to arrange any task to be scheduled periodically. The clock that the timer uses can be set to tick at a rate as fine as one kilohertz, so software engineers can control the scheduling of tasks with precision.

Subarchitecture Support

Hardware designs in the embedded world often are customized for special applications

For example, a purpose-built board may use different IRQ management than what a similar reference design uses.

In order to run on the new board, Linux has to be ported or altered to support the new hardware. This porting is easier if the operating system is made of components that are well separated, making it necessary to change only the code that has to change.

The components of Linux 2.6 that are likely to be altered for a custom design have been refactored with a concept called Subarchitecture. Components are separated clearly and can be modified or replaced individually with minimal impact on other components of the board support package.

By formalizing Linux's support for the slightly different hardware types, the kernel can be ported more easily to other systems, such as dedicated storage hardware and other components that use industry-dominant processor types.

A small portion of task_struct

struct task_struct {

volatile long state;

void *stack;

unsigned int flags;

int prio, static_prio;

struct list_head tasks;

struct mm_struct *mm, *active_mm;

pid_t pid;

pid_t tgid;

struct task_struct *real_parent;

char comm[TASK_COMM_LEN];

struct thread_struct t hread;

struct files_struct *files;

...

};

The states listed here describe what is happening conceptually and do not indicate what "official"

state a process is in. The official states are listed below:

TASK_RUNNING task (process) currently running

TASK_INTERRUPTABLE process is sleeping but can b e woken up (interrupted)

TASK_UNINTERRUPTABLE process is sleeping but can not be woken up (interrupted)

TASK_ZOMBIE process terminated but its status was not collected (it was not waited for)

TASK_STOPPED process stopped by a debugger or job control

TASK_SWAPPING (removed in 2.3.x kernel)

Table - Process States in sched.h