Embedded RTOSes

Not all RTOSes are meant for embedded applications; however, embedded programming is where RTOSes are most often used. RTOSes designed for embedded applications are a specialized kind of RTOS. Most embedded RTOSes, including ChibiOS/RT, share a set of common features:

  • Multi-Threaded Single-Application model.
  • Fixed-Priority Scheduling.
  • ISR Handling as part of the API.

We will focus on the embedded kind from now on.

Priorities and Scheduling

In the fixed priority scheduling model, tasks are allocated to fixed priority levels. Task priorities usually form a continuous numeric interval ranging from a minimum to a maximum; the exact numeric range is unimportant. ISRs are usually treated exactly as tasks placed on a numeric interval above the task priority range.

The lowest priority level is usually reserved for a task commonly called the “idle task”: the task executed when no other task or ISR is ready for execution. The scheduling rule is very simple: at any instant, the task being executed is the ready task with the highest priority level. In the proposed model this is true for both tasks and ISRs.

Static and Modifiable Priorities

In simple RTOSes priorities are usually assigned statically and cannot be changed at runtime; in this case we talk about Static Priorities.

More complex RTOSes often allow priorities to be changed at runtime in order to implement particular scheduling strategies; in this case we talk about Modifiable Priorities.

Note that, unlike in general purpose OSes, priorities never change automatically as a function of the system load or other parameters (Dynamic Priorities); priority changes are either procedurally commanded or the result of deterministic scheduling strategies. See the Priority Inheritance section below for an example of this.

Interrupts and Tasks

The two kinds of processes we can find in embedded RTOSes (remember the previous chapter?) are called ISRs and tasks. Fundamentally, ISRs and tasks can be considered similar, with the difference that an ISR cannot wait internally: it can only be executed and then terminate. There are usually differences in the API as well; some functions can be called from task level only, others from ISR level only.

Interrupts

In an embedded application interrupts are the primary source of events. Interrupts directly trigger ISRs which, in turn, can wake up tasks.
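
As a concrete sketch of this pattern in ChibiOS/RT (the device IRQ handler and the semaphore are hypothetical, and the exact macro names may vary between ChibiOS versions), an ISR can signal a semaphore that a task is waiting on:

  #include "ch.h"

  /* Semaphore signaled by the ISR and waited on by a task; assumed to be
     initialized elsewhere with chSemObjectInit(&data_ready, 0). */
  static semaphore_t data_ready;

  CH_IRQ_HANDLER(MyDeviceIRQHandler) {   /* hypothetical IRQ handler */
    CH_IRQ_PROLOGUE();

    chSysLockFromISR();
    chSemSignalI(&data_ready);           /* wakes up the waiting task */
    chSysUnlockFromISR();

    CH_IRQ_EPILOGUE();
  }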

Many RTOSes split interrupts into two categories (names may vary):

Interrupt Classes
  • Fast Interrupts: processed very efficiently but cannot directly call any OS function.
  • OS Interrupts: can use OS services but there can be some OS-related overhead in the ISR code.

Depending on the HW architecture and on the OS capabilities, interrupts can be processed differently:

Interrupt Handling
  • No Priorities: simplest handling, all interrupt sources have the same priority, no preemption.
  • Multiple Priorities: multiple priority levels, but ISRs cannot preempt each other.
  • Nested Interrupts: multiple priority levels, and ISRs can preempt lower priority ISRs.

Another important difference is the possible existence of a dedicated stack for interrupt processing: a dedicated stack means that it is not necessary to allocate space for interrupts in each task's stack area. Dedicated stacks can be implemented in SW or handled directly by the CPU; the latter is of course the most efficient solution in terms of cycles and latency.

If interrupt processing is an important requirement for your system then you should look for an RTOS/core combination able to efficiently handle nested interrupts on a dedicated interrupt stack.

Tasks and Threads

Tasks are the fundamental entities in an RTOS environment. A task can be seen as a virtual CPU inside the system, with its own register bank and stack area. Tasks are scheduled by the RTOS based on their priority, as described before. Some RTOSes, like ChibiOS for example, use the term threads for their tasks. The distinction is subtle: the term thread is used more often in more complex systems, for example Win32 and POSIX have threading functionality. Personally, I prefer to use the term threads when the system is able to support the following minimal features:

  • Create. Threads can be started at runtime, not just declared or configured statically. Note that this does not imply dynamic allocation: a thread can be statically allocated and started/terminated at runtime.
  • Exit. Threads are able to terminate returning an exit value, much like the value returned by a C function.
  • Join. Threads are able to spawn other threads and then wait for the result of their execution.

Threads can be seen as functions that can be executed in parallel with the spawning function; the spawning function is then able to re-synchronize with the spawned thread and retrieve the result value. If a system does not support all of the above features then the term tasks is more appropriate.
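
As a sketch of these three features using the ChibiOS/RT API (the stack size, the priority and the thread body are only illustrative; exact signatures may vary between ChibiOS versions):

  #include "ch.h"

  static THD_WORKING_AREA(waWorker, 256);   /* statically allocated stack */

  static THD_FUNCTION(worker, arg) {
    (void)arg;
    /* ...do some work... */
    chThdExit((msg_t)42);                   /* Exit: terminate with a value */
  }

  void spawn_and_join(void) {
    /* Create: the statically allocated thread is started at runtime. */
    thread_t *tp = chThdCreateStatic(waWorker, sizeof(waWorker),
                                     NORMALPRIO, worker, NULL);

    /* Join: wait for termination and retrieve the exit value. */
    msg_t result = chThdWait(tp);
    (void)result;
  }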

Task States

It is pretty standard to represent the possible states that a task can take as a state machine. A common set of states is:

  • INIT. The task has not yet been created.
  • READY. The task is eligible for execution but not running.
  • RUNNING. The task is currently running.
  • WAITING. The task is waiting for events in order to resume execution.

Different RTOS implementations can have different states and state machines but the above is a pretty common one.

Task Types

Some RTOSes make a clear distinction between types of tasks based on how tasks are triggered (the event waking them up).

Periodic Tasks

A periodic task is a task triggered periodically with a fixed time interval. The task is mostly waiting and becomes ready for execution when its internal timer triggers it. The task then performs a brief action and returns to the waiting state.
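
For example, a minimal periodic task sketch in ChibiOS/RT (the 500 ms period and the action performed are hypothetical):

  /* Periodic task: wakes up every 500 ms, performs a brief action,
     then returns to the waiting state. */
  static THD_FUNCTION(periodic, arg) {
    (void)arg;
    while (true) {
      do_periodic_action();            /* hypothetical brief action */
      chThdSleepMilliseconds(500);     /* back to the waiting state */
    }
  }

Note that sleeping for a fixed time after the action lets the period drift by the action's execution time; for drift-free periods ChibiOS also offers absolute-deadline sleeps such as chThdSleepUntil().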

Non-periodic Tasks

This kind of task is triggered by an external event, for example from an ISR, and is thus not periodic. After performing its programmed action the task returns to the waiting state.
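
A sketch of such a task, the waiting counterpart of the hypothetical ISR shown earlier (process_data() is a placeholder for the programmed action):

  /* Event-driven task: sleeps until the ISR signals the semaphore. */
  static THD_FUNCTION(event_handler, arg) {
    (void)arg;
    while (true) {
      chSemWait(&data_ready);          /* wait for the triggering event */
      process_data();                  /* perform the programmed action */
    }
  }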

Continuous Tasks

Tasks should never take the CPU indefinitely: a task running an empty loop would prevent the execution of tasks at lower priority levels. The rule is that tasks should wait for events, perform their programmed action, and then go back to waiting for events. If there are tasks that never release the CPU then they must be placed at the lowest priority level in the system.
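
As a sketch, a continuous task should be created at the lowest usable priority (LOWPRIO in ChibiOS/RT) so that every other task can preempt it; the workload function is hypothetical:

  /* Continuous background task: it never waits, so it must be created
     at LOWPRIO in order not to starve the other tasks. */
  static THD_FUNCTION(background, arg) {
    (void)arg;
    while (true) {
      crunch_numbers();                /* hypothetical CPU-bound work */
    }
  }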

Most RTOSes do not make a strong distinction between task types: all tasks are equal, and it is their internal code that defines their behavior. We will see examples of tasks in various roles in the chapter describing the ChibiOS/RT kernel.

Synchronization

Tasks can use areas of memory as shared data, usually also called Shared Resources. Concurrent access to shared resources can lead to corruption of the data because the operations performed by the tasks may not be atomic. For example, let's assume there are two tasks and a shared variable initialized to 10. One task increments the shared variable by executing the following pseudo-instructions:

  1. WAIT event
  2. LOAD r0,var
  3. INC r0
  4. STORE r0,var
  5. JUMP 1

Another task instead decrements the same variable:

  1. WAIT event
  2. LOAD r1,var
  3. DEC r1
  4. STORE r1,var
  5. JUMP 1

If the events are triggered at almost the same time, the two sequences can get interleaved, producing an incorrect result. For example:

  1. LOAD r0,var
  2. INC r0
  3. <preemption by the second task>
  4. LOAD r1,var
  5. DEC r1
  6. STORE r1,var
  7. <returns to the first task>
  8. STORE r0,var

The correct result should still be 10 but, because the sequence of instructions got interleaved by preemption, the end result is 11. Note that you would have the same problem if, instead of tasks, the involved entities were ISRs in a system where ISRs can preempt each other. Handling shared data requires synchronized access to the resource; the usual solution is called Mutual Exclusion and there are multiple ways to achieve it. We will examine several possible mechanisms when the ChibiOS kernel primitives are explained.
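
The same race can be expressed in C (a sketch with hypothetical names): on a RISC machine the innocent-looking increment and decrement operators typically compile to exactly the load/modify/store sequences shown above, so the interleaving problem is identical:

  static volatile int var = 10;        /* shared resource */

  void increment_task_body(void) {
    var++;                             /* LOAD, INC, STORE: not atomic */
  }

  void decrement_task_body(void) {
    var--;                             /* LOAD, DEC, STORE: not atomic */
  }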

Atomic Operations

The above problem exists because we assumed the increment and decrement operations on a memory variable to be non-atomic. This is generally true on RISC machines but could be false on some CISC architectures: any operation can be atomic or not depending on the CPU architecture, and at high level there is no way to know whether an operation is atomic, especially if the code is generated by a compiler. All operations should be considered non-atomic, although it is generally safe to assume that load and store operations are atomic if the data size is less than or equal to the machine's preferred word size. For example, 8 and 16 bit load and store operations can be considered atomic on 16 bit CPUs, while 32 and 64 bit ones should not, because the compiler could generate multiple load and store operations. Anyway, the safest approach is to consider everything non-atomic.

Failure to understand atomicity and to implement proper mutual exclusion is a recipe for disaster: the resulting errors are usually random in nature and very hard to detect and debug. Ideally the problem should be resolved in the analysis and design phases, by carefully identifying the shared resources and defining correct protocols for concurrent access.
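
Where the toolchain supports C11, one portable option, sketched below, is making such counters explicitly atomic through <stdatomic.h>; whether a given operation is lock-free on a given core must still be verified:

  #include <stdatomic.h>

  static atomic_int var = 10;          /* shared atomic counter */

  void increment_task_body(void) {
    atomic_fetch_add(&var, 1);         /* atomic read-modify-write */
  }

  void decrement_task_body(void) {
    atomic_fetch_sub(&var, 1);
  }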

Critical Sections

Critical Sections (or critical zones) are probably the simplest and most common way to make a non-atomic sequence of code behave atomically. In simple RTOSes on simple architectures critical sections are often implemented by disabling/enabling interrupts; this makes sure that the code inside the critical section cannot be preempted by an IRQ or by other tasks. Of course things are not always so simple, and critical sections should not be considered just an abstraction around the act of disabling/enabling interrupts. There are some things to be considered:

  • Globally disabling interrupts is generally bad: it increases the jitter in the system and can potentially make the worst-case response time even worse. On some architectures it could be preferable to mask interrupts only up to a certain priority level instead of masking them globally.
  • Disabling interrupts does not work if tasks are run by multiple physical cores or hardware threads. In those cases the critical section implementation also needs to use some kind of HW semaphore in order to coordinate the multiple cores/threads.
  • The action of entering a critical section could trigger checks at OS level or be used for statistics purposes.
  • Compiler optimizations can move generated instructions around functions or inlined code. This means that some code could potentially be moved out of, or into, a critical section: code that looks to be inside a critical section at source level would not really be atomic. A well designed RTOS implements compiler barriers inside the critical section API; a simplistic approach can lead to disastrous results.

Most RTOSes have a specific API for handling critical sections; the right approach is to use the RTOS-provided API and not make assumptions about how critical sections are, or should be, implemented. A good RTOS should take care of the best implementation on any given architecture.
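
As a sketch using the ChibiOS/RT critical section API (the protected variables are hypothetical; on Cortex-M ports chSysLock() typically masks interrupts only up to a configured priority level rather than globally):

  static volatile int shared_a, shared_b;   /* hypothetical shared data */

  void update_shared_state(void) {
    chSysLock();                  /* enter the critical section */
    shared_a++;                   /* short, loop-free protected code */
    shared_b = shared_a;
    chSysUnlock();                /* leave the critical section */
  }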

Proper Use of Critical Sections

Critical sections are the most lightweight and efficient way to implement mutual exclusion; let's see the advantages and the restrictions.

Advantages

  • Very lightweight, minimum runtime overhead.
  • Can implement mutual exclusion between tasks and ISRs.

Potential restrictions

  • Can increase the jitter figure in the system.
  • Can make the worst-case response time even worse if the protected zones are too long.
  • Some RTOSes do not allow the use of any API from within critical sections.
  • Some RTOSes do not have a clear abstraction of critical sections.
  • Should only be used for mutual exclusion on very short code paths with finite execution time and, preferably, with no inner loops.

Mutual Exclusion

One of the most common problems in systems with multiple tasks is synchronizing access to shared resources. Critical sections are an easy solution to this problem, which is known under the more general name of Mutual Exclusion. Mutual exclusion can be implemented using a variety of mechanisms commonly found in RTOSes, for example (and not limited to): Counting Semaphores, Binary Semaphores, Mutexes, Priority Escalation, Messages, Critical Sections. Each mechanism has its advantages and disadvantages; of the mentioned mechanisms, Mutexes are the one specifically meant to solve the mutual exclusion problem in the most general way. We will see ways of implementing mutual exclusion when the various mechanisms are discussed.
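
As a preview, a sketch of mutual exclusion on the shared counter from the earlier example, using the ChibiOS/RT mutex API (names as in recent ChibiOS versions; the mutex is assumed to be initialized once with chMtxObjectInit()):

  static mutex_t var_mtx;         /* protects 'var' */
  static int var = 10;

  void safe_increment(void) {
    chMtxLock(&var_mtx);          /* blocks if another task owns the mutex */
    var++;                        /* effectively atomic under the lock */
    chMtxUnlock(&var_mtx);
  }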

Priority Inversion

A well known side effect of using Mutual Exclusion is the possibility of triggering the Priority Inversion problem. Priority inversion can cause a task with a certain priority to delay tasks with higher priority, when a resource is protected by mutual exclusion and is accessed by tasks at various priority levels. This is an example:

Time |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
   0 S+++++AL++++------------------------------------++++++++++++++++++AU----------++++++G
   1 ..................S+++++------++++++++++++++++++G
   2 ........................S+++++AL..................................++++++AU++++G
   3 ............S+++++G
   A       ******************************************************************

Legend:
  0..3 - Priority levels
  ***  - Mutex taken
  +++  - Running
  ---  - Ready
  ...  - Waiting
  xL   - Lock operation on mutex 'x'
  xU   - Unlock operation on mutex 'x'
  S,G  - Start, Goal

In the above example this is what happens:

  • The low priority task (0) starts executing and takes the resource A using a mutex.
  • The very high priority task (3) starts executing and terminates; it is unaffected by (0).
  • The medium priority task (1) starts executing.
  • The high priority task (2) starts executing and tries to take the shared resource A, which is already taken by (0); task (2) starts waiting for the resource A to be released.
  • Task (1) resumes execution, delaying both tasks (0) and (2): task (1) delays task (0) because of its higher priority, and task (0) delays task (2) because it keeps the resource locked and is unable to release it.

The end result is that task (1) delays the higher priority, unrelated task (2) for an unbounded time because of a conflict on the access to resource A. The order of priorities is not respected: the goals of the four tasks are not reached in priority order.

Possible Solutions

There are several approaches that can be used in order to solve the Priority Inversion problem:

  • Avoid Mutual Exclusion zones or sharing of resources in general; not always possible.
  • Reorder priorities in order to have all tasks accessing a shared resource at adjacent priority levels; not always possible.
  • Implement the Mutual Exclusion as a Critical Section where preemption is forbidden; not always possible.
  • Use mitigation algorithms like Priority Ceiling or Priority Inheritance.

Priority Ceiling

Imagine attributing a priority level to a Mutex, higher than the priority of all the tasks that can try to acquire it. Tasks locking the Mutex temporarily acquire its priority. This is how the timing diagram changes under this scenario:

Time |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
   0 S+++++AL4+++++++++++++++++++++AU0---------------------------------------------++++++G
   1 ..................S-----------------------------------++++++++++++++++++++++++G
   2 ........................S-----------++++++AL4+++AU3+++G
   3 ............S-----------------++++++G
   A       ^***********************^...........^*****^

Legend:
  0..3 - Priority levels
  4    - Priority associated to Mutex A
  ***  - Mutex taken
  +++  - Running
  ---  - Ready
  ...  - Waiting or Terminated
  xLn  - Lock operation on mutex 'x' with priority boost to level 'n'
  xUn  - Unlock operation on mutex 'x' with priority returning to level 'n'
  S,G  - Start, Goal
  ^    - Priority transition (boost or return).

You can see that priority inversion no longer occurs: all tasks reach their goal in priority order. The Priority Ceiling algorithm resolves the priority inversion but in a non-optimal way, because it affects all the tasks with priority below the ceiling; see how the task at priority 3 is delayed. Priority Ceiling is best used in simple systems; it is the solution adopted in well known RTOSes like those implementing the OSEK specification.

Priority Inheritance

Priority Inheritance is an algorithm where a task holding a mutex is temporarily given the priority of the highest priority task among all tasks waiting for the same mutex. This is how the above timing diagram changes under this scenario:

Time |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
   0 S+++++AL++++------------------2+++++++++++++++++AU0---------------------------++++++G
   1 ..................S+++++------------------------------------++++++++++++++++++G
   2 ........................S+++++AL----------------++++++AU++++G
   3 ............S+++++G
   A       ************************^*****************^*****

Legend:
  0..3 - Priority levels
  ***  - Mutex locked
  +++  - Running
  ---  - Ready
  ...  - Waiting or Terminated
  xLn  - Lock operation on mutex 'x' with priority boost to level 'n'
  xUn  - Unlock operation on mutex 'x' with priority returning to level 'n'
  S,G  - Start, Goal
  ^    - Priority transition (boost or return).

Again, priority inversion no longer occurs: all tasks reach their goal in priority order. The Priority Inheritance algorithm resolves the priority inversion in a more general way because it only affects the tasks trying to access the same mutex. This is the solution implemented in more complex RTOSes, including ChibiOS/RT.