RT Debug

One of the most important features that an RTOS must provide is support for development. The development phase is where coding errors are expected to be found. Note that this is not about testing: it is about errors that must be caught and handled during the design and implementation phase.

ChibiOS/RT provides a comprehensive set of debug options meant to assist the developer during the system implementation and debug phase. All the debug options are located in the kernel configuration file chconf.h; each project has its own copy of this file.
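As an illustration, a development-time chconf.h could enable the options described in the following sections as sketched here. The option names are those used in this document; the chosen values and the TRUE/FALSE spelling are assumptions that should be checked against the chconf.h template shipped with the ChibiOS version in use.

```c
/* Excerpt of a project-local chconf.h (sketch, values are assumptions). */
#define CH_DBG_STATISTICS           TRUE
#define CH_DBG_SYSTEM_STATE_CHECK   TRUE
#define CH_DBG_ENABLE_CHECKS        TRUE
#define CH_DBG_ENABLE_ASSERTS       TRUE
#define CH_DBG_TRACE_MASK           CH_DBG_TRACE_MASK_ALL
#define CH_DBG_TRACE_BUFFER_SIZE    128
#define CH_DBG_ENABLE_STACK_CHECK   TRUE
#define CH_DBG_FILL_THREADS         TRUE
#define CH_DBG_THREADS_PROFILING    FALSE  /* superseded by CH_DBG_STATISTICS */
```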

Compile Time Checks

Configuration errors are, by design, detected at compile time: the system headers include logic checks that result in compilation errors in case of a wrong configuration. If you see an error during compilation, read the message carefully; it could be a failed configuration check rather than a normal syntax error.
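The general technique can be sketched with plain preprocessor checks. The DEMO_ names below are hypothetical and are not ChibiOS settings; the sketch only shows how a header can turn a wrong setting into a readable compilation error.

```c
#include <assert.h>

/* Hypothetical project settings, normally kept in a configuration header. */
#define DEMO_CFG_TRACE_BUFFER_SIZE  128
#define DEMO_CFG_USE_STATISTICS     1

/* Logic checks: a wrong setting stops the build with a readable message. */
#if DEMO_CFG_TRACE_BUFFER_SIZE < 1
#error "DEMO_CFG_TRACE_BUFFER_SIZE must be at least 1"
#endif

#if (DEMO_CFG_USE_STATISTICS != 0) && (DEMO_CFG_USE_STATISTICS != 1)
#error "DEMO_CFG_USE_STATISTICS must be 0 or 1"
#endif

/* Returns the size that passed the checks above. */
int demo_trace_buffer_size(void) {
  int size = DEMO_CFG_TRACE_BUFFER_SIZE;

  assert(size >= 1);  /* cannot fail: already enforced at compile time */
  return size;
}
```

Changing DEMO_CFG_TRACE_BUFFER_SIZE to 0 makes the compiler stop at the first #error line, which is the kind of message that should not be mistaken for a syntax error.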

Runtime Checks

Most debug options operate at runtime in order to catch design or programming errors. If a problem is detected, the system is halted inside the function chSysHalt() and the global variable ch.dbg_panic_msg points to an error message string.

If the system is halted, the correct procedure is:

  1. Halt the application using the debugger.
  2. Verify that the application has indeed stopped inside chSysHalt().
  3. Retrieve the error message using the debugger memory view or the ChibiOS/RT Eclipse debug plugin. Messages can be either descriptive strings like “stack overflow” or “NULL parameter”, or encoded error codes like “SV#4”, depending on the debug option that triggered the halt.
  4. Inspect the stack trace in order to understand from which point of the code chSysHalt() has been called. The condition that triggered the halt could give a hint about the nature of the problem.
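When messages are collected by a host-side tool, for example from a log of halted sessions, the two message kinds of step 3 can be told apart with a small helper. This is a sketch only: demo_parse_sv_code() is a hypothetical function, not part of ChibiOS, and the accepted range 1 to 11 is taken from the error codes listed later in this document.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns the numeric part of an encoded "SV#n" message, or -1 when the
 * message is a descriptive string or is not a documented code.
 */
int demo_parse_sv_code(const char *msg) {
  char *end;
  long n;

  assert(msg != NULL);
  if (strncmp(msg, "SV#", 3) != 0) {
    return -1;                        /* descriptive message */
  }
  if (!isdigit((unsigned char)msg[3])) {
    return -1;                        /* prefix without a number */
  }
  n = strtol(msg + 3, &end, 10);
  if ((*end != '\0') || (n < 1) || (n > 11)) {
    return -1;                        /* not one of the documented codes */
  }
  return (int)n;
}
```

A descriptive message such as “stack overflow” yields -1, so the caller can print it as is, while an encoded message yields the number to look up in the error code list.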

Kernel Statistics

The debug option CH_DBG_STATISTICS enables support for kernel statistics. Statistics include:

  • Number of served IRQs.
  • Number of context switches.
  • Time measurement of thread-level critical sections: best, worst, last cases are stored.
  • Time measurement of ISR-level critical sections: best, worst, last cases are stored.
  • For each thread the following counters are kept:
    • Longest execution time.
    • Shortest execution time.
    • Last execution time.
    • Cumulative execution time.

Times are measured using the realtime counter and are clock cycle accurate. The ChibiOS/RT Eclipse plugin is able to show the runtime statistics of the application under debug.

Statistics View in ChibiStudio

Statistics are available in specific views in the ChibiStudio debug perspective:

From the threads tab:

 Threads View

  • Switches is the number of times that the thread has been switched in.
  • Worst Path is the longest time, in CPU cycles, that the thread has executed without going to sleep.
  • Cumulative Time is the CPU time, in CPU cycles, that the thread consumed since its start.

From the statistics tab:

 Statistics View

  • Number of IRQs is the number of interrupts served since the system start. Note that in tick-less mode the number can be quite low in a well designed system.
  • Number of Context Switches is the total number of thread context switches since the system start.
  • Threads Critical Zones is the time, in CPU cycles, spent by the system inside critical sections at thread level.
  • ISRs Critical Zones is the time, in CPU cycles, spent by the system inside critical sections at ISR level.

System State Checks

The debug option CH_DBG_SYSTEM_STATE_CHECK enables a unique ChibiOS/RT feature, the System State Checker. This option is able to detect any call protocol violation. Calling OS APIs out of the proper context is one of the greatest sources of hard to detect problems and random crashes.

Purpose

Not all RTOS functions can be called from any context. For example, it makes no sense to invoke, from an ISR, a function that puts the current task/thread to sleep. One of the most common problems while using an RTOS is making sure that RTOS services are invoked from the proper context and/or system state.

Usually the documentation of RTOS functions describes the proper way and place to invoke them; the problem is that the rule is documented but not actually checked by the system. The problem is exacerbated by the fact that this kind of error often does not result in a clear malfunction but can cause rare and inexplicable crashes or malfunctions in the application.

Error Codes

When the State Checker is enabled, the system state is checked each time an RTOS function is called at runtime. When a violation is detected, execution is stopped and the variable ch.dbg_panic_msg points to one of the following messages:

  • SV#1. The function chSysDisable() has been called from ISR context or from within a critical zone.
  • SV#2. The function chSysSuspend() has been called from ISR context or from within a critical zone.
  • SV#3. The function chSysEnable() has been called from ISR context or from within a critical zone.
  • SV#4. The function chSysLock() has been called from ISR context or from within a critical zone. This function is meant to start a critical zone from thread context. This can also happen when a normal API is called from within a critical zone.
  • SV#5. The function chSysUnlock() has been called from ISR context or from outside a critical zone. This function is meant to end a critical zone from thread context.
  • SV#6. The function chSysLockFromISR() has been called from thread context or from within a critical zone. This function is meant to start a critical zone from ISR context.
  • SV#7. The function chSysUnlockFromISR() has been called from thread context or from outside a critical zone. This function is meant to end a critical zone from ISR context.
  • SV#8. Misplaced CH_IRQ_PROLOGUE() macro. It is not placed at the beginning of the ISR or it is called from a critical zone.
  • SV#9. Misplaced CH_IRQ_EPILOGUE() macro. It is not placed at the end of the ISR or it is called from a critical zone.
  • SV#10. I-Class function called out of a critical zone.
  • SV#11. S-Class function called out of a critical zone or from an ISR.

All the above errors are the result of programming errors outside the RT kernel. If an application is able to run with the state checker enabled in all of its parts, then there is a very high confidence level that the application is free from RTOS-related integration problems. It is advisable to keep this option enabled through the whole development process.
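The rules behind the codes can be modeled on a host with two counters, one for the ISR nesting level and one for the critical zone. This is a sketch derived from the list above, not the kernel code: every demo_ name is hypothetical, and a violation is only recorded instead of halting so that the model can be exercised in a test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side model of the checker state; names are hypothetical. */
typedef struct {
  int isr_cnt;            /* ISR nesting level, 0 in thread context */
  int lock_cnt;           /* 1 inside a critical zone, 0 outside    */
  const char *panic_msg;  /* first violation detected, NULL if none */
} demo_state_t;

/* Records the first violation; a real system would halt here. */
void demo_violation(demo_state_t *s, const char *code) {
  assert((s != NULL) && (code != NULL));
  if (s->panic_msg == NULL) {
    s->panic_msg = code;
  }
}

/* Returns nonzero when the recorded violation equals the given code. */
int demo_has_violation(const demo_state_t *s, const char *code) {
  return (s->panic_msg != NULL) && (strcmp(s->panic_msg, code) == 0);
}

void demo_isr_enter(demo_state_t *s) {  /* models CH_IRQ_PROLOGUE() */
  if ((s->isr_cnt < 0) || (s->lock_cnt != 0)) { demo_violation(s, "SV#8"); return; }
  s->isr_cnt++;
}

void demo_isr_leave(demo_state_t *s) {  /* models CH_IRQ_EPILOGUE() */
  if ((s->isr_cnt <= 0) || (s->lock_cnt != 0)) { demo_violation(s, "SV#9"); return; }
  s->isr_cnt--;
}

void demo_lock(demo_state_t *s) {       /* models chSysLock() */
  if ((s->isr_cnt != 0) || (s->lock_cnt != 0)) { demo_violation(s, "SV#4"); return; }
  s->lock_cnt = 1;
}

void demo_unlock(demo_state_t *s) {     /* models chSysUnlock() */
  if ((s->isr_cnt != 0) || (s->lock_cnt <= 0)) { demo_violation(s, "SV#5"); return; }
  s->lock_cnt = 0;
}

void demo_lock_from_isr(demo_state_t *s) {    /* models chSysLockFromISR() */
  if ((s->isr_cnt <= 0) || (s->lock_cnt != 0)) { demo_violation(s, "SV#6"); return; }
  s->lock_cnt = 1;
}

void demo_unlock_from_isr(demo_state_t *s) {  /* models chSysUnlockFromISR() */
  if ((s->isr_cnt <= 0) || (s->lock_cnt <= 0)) { demo_violation(s, "SV#7"); return; }
  s->lock_cnt = 0;
}

void demo_check_class_i(demo_state_t *s) {    /* entry check of an I-Class function */
  if (s->lock_cnt <= 0) { demo_violation(s, "SV#10"); }
}

void demo_check_class_s(demo_state_t *s) {    /* entry check of an S-Class function */
  if ((s->isr_cnt != 0) || (s->lock_cnt <= 0)) { demo_violation(s, "SV#11"); }
}
```

In this model, calling demo_lock() twice in a row records “SV#4”, which mirrors the case of a normal API being called from within a critical zone.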

Functions Parameters Checks

The debug option CH_DBG_ENABLE_CHECKS enables the parameter checks at API level. This option is able to detect application errors that cause invalid parameters to be passed to the RTOS; a typical example is a NULL pointer passed where a reference to a valid object is expected. It is advisable to keep this option enabled through the whole development process. Safety concerns may require keeping this kind of check in place in the final code as well, as a defensive measure.
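The idea of a check that can be compiled in or out can be sketched as follows. All names are hypothetical (DEMO_DBG_ENABLE_CHECKS, DEMO_CHECK, demo_halt, demo_sem_signal), and the halt is replaced by a function that only records the reason, which is an assumption made so that the sketch can run on a host.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DBG_ENABLE_CHECKS 1  /* hypothetical switch for this sketch */

const char *demo_panic_msg = NULL;

/* Stand-in for a halt: records the first reason so a host test can continue. */
void demo_halt(const char *reason) {
  assert(reason != NULL);
  if (demo_panic_msg == NULL) {
    demo_panic_msg = reason;
  }
}

#if DEMO_DBG_ENABLE_CHECKS
/* Evaluates to 1 when the condition holds, otherwise reports and yields 0. */
#define DEMO_CHECK(c, msg) ((c) ? 1 : (demo_halt(msg), 0))
#else
/* Checks removed: the condition is not evaluated at all. */
#define DEMO_CHECK(c, msg) 1
#endif

typedef struct {
  int count;
} demo_sem_t;

/* Example API: the parameter is validated before it is dereferenced. */
int demo_sem_signal(demo_sem_t *sp) {
  if (!DEMO_CHECK(sp != NULL, "NULL parameter")) {
    return -1;
  }
  sp->count++;
  return 0;
}
```

With DEMO_DBG_ENABLE_CHECKS set to 0 the same call with a NULL argument would dereference the pointer, which is the failure that the check is meant to catch early.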

System Assertions

The debug option CH_DBG_ENABLE_ASSERTS enables system-wide integrity checks on the RTOS data structures. The system is also checked for unexpected runtime situations. It is advisable to keep this option enabled through the whole development process. Safety concerns may require keeping this kind of check in place in the final code as well, as a defensive measure.
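An integrity check on a data structure can look like the following sketch, which verifies a circular doubly linked queue. The queue type and the function names are hypothetical and are not the kernel structures; the condition tested is simply that every forward link is mirrored by a backward link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue node; the header is a node that links to itself when empty. */
typedef struct demo_node {
  struct demo_node *next;
  struct demo_node *prev;
} demo_node_t;

void demo_queue_init(demo_node_t *head) {
  assert(head != NULL);
  head->next = head;
  head->prev = head;
}

/* Appends a node at the tail of the queue. */
void demo_queue_append(demo_node_t *head, demo_node_t *n) {
  assert((head != NULL) && (n != NULL));
  n->next = head;
  n->prev = head->prev;
  head->prev->next = n;
  head->prev = n;
}

/* Integrity condition: for every node p, p->next->prev must be p. */
bool demo_queue_is_consistent(const demo_node_t *head) {
  const demo_node_t *p = head;

  assert(head != NULL);
  do {
    if ((p->next == NULL) || (p->next->prev != p)) {
      return false;
    }
    p = p->next;
  } while (p != head);
  return true;
}
```

In a kernel such a condition would be evaluated by an assertion at the points where the structure is modified, so that a corrupted link is reported close to its cause.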

Trace Buffer

The option CH_DBG_TRACE_MASK is an “or” of various option flags. Each flag selects an event to be traced:

  • CH_DBG_TRACE_MASK_NONE. No events traced by default (but tracing can be activated at runtime for any event).
  • CH_DBG_TRACE_MASK_SWITCH. Context switches are traced by default.
  • CH_DBG_TRACE_MASK_ISR. ISR enter and leave events are traced by default.
  • CH_DBG_TRACE_MASK_HALT. The halt event is traced; of course, it is the last event recorded.
  • CH_DBG_TRACE_MASK_USER. User events are recorded. Application code can trace events using the chDbgWriteTrace() API.
  • CH_DBG_TRACE_MASK_SLOW. All events are enabled except IRQ-related ones.
  • CH_DBG_TRACE_MASK_ALL. All events are enabled.
  • CH_DBG_TRACE_MASK_DISABLED. The trace subsystem is removed entirely from the OS image.

The trace buffer stores the last N recorded events. It can be used to determine the sequence of operations that led to a halt condition. The option CH_DBG_TRACE_BUFFER_SIZE changes the size of the trace buffer; the default is 128 entries.

Trace View In ChibiStudio

The trace buffer is a memory structure, but the ChibiOS/RT Eclipse debug plugin is able to show its content in a table. This is how the trace buffer is shown in the debug perspective:

 Trace Buffer View

Note that both IRQ-related and threads-related events are shown.

Stack Overflow Checks

The debug option CH_DBG_ENABLE_STACK_CHECK enables checks on stack overflow conditions. The implementation of the detection is port-dependent: it can differ in each port or not be supported at all. The most common implementation checks the stack position when a context switch is about to be performed; if the calculated new stack position overflows the stack limit, the system is halted. Safety concerns may require keeping this kind of check in place in the final code as well, as a defensive measure.

Working Area Fill

The debug option CH_DBG_FILL_THREADS fills the threads working area with a fixed 0x55 pattern before the thread is executed; this makes it possible to calculate the effective stack usage of the various threads. The ChibiOS/RT Eclipse plugin is able to calculate the unused stack size for each thread if this option is enabled. Optimizations of the unused stacks should only be performed:

  • At development end.
  • With all other debug options disabled or in their final settings.
  • Using the final compiler version.
  • Using the final compiler options.

Use care when optimizing stacks because different compiler options or compiler versions can change stack usage dramatically, and there is the risk of introducing errors that are not easily detected in the final code (with checks disabled).
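The unused stack size can be estimated by counting how many bytes still hold the fill pattern. The following host-side sketch assumes a stack that grows toward lower addresses, so the untouched bytes are at the start of the working area; the function names are hypothetical and only the 0x55 value comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FILL_PATTERN 0x55U

/* Fills the whole working area with the pattern, as done before thread start. */
void demo_fill_working_area(uint8_t *wa, size_t size) {
  assert(wa != NULL);
  memset(wa, (int)DEMO_FILL_PATTERN, size);
}

/* Counts the bytes, from the lowest address upward, that still hold the pattern. */
size_t demo_unused_stack(const uint8_t *wa, size_t size) {
  size_t n = 0;

  assert(wa != NULL);
  while ((n < size) && (wa[n] == DEMO_FILL_PATTERN)) {
    n++;
  }
  return n;
}
```

The result is an estimate: a stacked byte that happens to be 0x55 right at the boundary is counted as unused, so some margin should be kept when sizing the stacks.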

Threads Profiling

The debug option CH_DBG_THREADS_PROFILING enables a system tick counter in each thread. After a long execution time, the relative values of the counters indicate the relative “weight” of all threads. This option has been superseded by CH_DBG_STATISTICS and is not compatible with the tick-less mode.
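Once the counters have been read, for example with the debugger, the relative weight of each thread can be computed as its share of the total. This is a sketch with a hypothetical function name that works on a plain array of tick counts copied from the target; it does not access any kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the share of thread `index`, in permille of all counted ticks. */
unsigned demo_thread_weight(const uint32_t *ticks, size_t n, size_t index) {
  uint64_t total = 0;
  size_t i;

  assert((ticks != NULL) && (index < n));
  for (i = 0; i < n; i++) {
    total += ticks[i];
  }
  if (total == 0U) {
    return 0U;  /* nothing counted yet */
  }
  return (unsigned)(((uint64_t)ticks[index] * 1000U) / total);
}
```

With counters of 7500, 2000 and 500 ticks the three threads get 750, 200 and 50 permille respectively.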