Followup projects for SIL2LinuxMP

The following project outlines describe spin-off projects from the SIL2LinuxMP project. Each outline covers a technology or topic that needs to be developed or refined to achieve the overall goal of qualifying GNU/Linux for mid integrity levels according to IEC 61508. Based on the current results of SIL2LinuxMP, we are in the process of defining further projects with the intent of giving a clear picture of the overall effort that will be necessary to achieve certification of GNU/Linux in a long-term maintainable manner.

False positive management infrastructure

For all generic bug-detection tools, false positives are a serious issue - results of the current mainline tools (e.g. sparse, coccicheck, checkpatch.pl, smatch etc.) commonly consist almost entirely of false positives, or of findings that are formally correct but for some reason do not merit ever being fixed (so-called "won't fix" cases).

False positives have serious side effects at the process level:

  1. After checking a few positives and finding that all of them are false, developers tend to consider all other findings as false positives and skip their inspection;
  2. As the effort of running some tools (e.g. all Coccinelle scripts) is quite high, some developers may simply abandon using these tools altogether.

This project proposes to build a public database of false-positive findings in order to make the current mainline tools easier to use and, thus, more attractive for developers.
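As a minimal sketch of how such a database could be consumed on the developer side - assuming a hypothetical JSON dump format, a hypothetical "file:line: check-id: message" finding format, and hypothetical verdict labels - a filter could suppress findings already classified as false positive or "won't fix":

    #!/usr/bin/env python3
    """Filter static-analysis findings against a shared false-positive database.

    Assumes a hypothetical JSON database of the form:
      [{"tool": "...", "check": "...", "file": "...", "context": "...", "verdict": "..."}, ...]
    and findings supplied as "file.c:123: check-id: message" lines on stdin.
    """
    import hashlib
    import json
    import sys

    def finding_key(tool, check, path, context):
        """Stable key for a finding; line numbers are deliberately excluded so an
        entry survives unrelated edits that merely shift the code."""
        blob = "\x00".join((tool, check, path, context))
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def load_db(db_path):
        with open(db_path) as fh:
            entries = json.load(fh)
        return {finding_key(e["tool"], e["check"], e["file"], e["context"]): e["verdict"]
                for e in entries}

    def main():
        db = load_db(sys.argv[1])       # path to the shared database dump (assumption)
        tool = sys.argv[2]              # e.g. "smatch" or "coccicheck"
        for line in sys.stdin:
            try:
                path, _line_no, check, context = line.split(":", 3)
            except ValueError:
                print(line, end="")     # pass through anything we cannot parse
                continue
            verdict = db.get(finding_key(tool, check.strip(), path, context.strip()))
            if verdict in ("false-positive", "wont-fix"):   # hypothetical labels
                continue                # suppress known noise
            print(line, end="")

    if __name__ == "__main__":
        main()

The key deliberately ignores line numbers so that a recorded verdict keeps matching after unrelated code movement; the exact matching strategy is one of the open design questions of this project.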

More details

LOPA tooling and infrastructure development

Currently only manual proof-of-concept methods are available. However, first results look principally usable and may provide significant architectural flexibility as well as some level of modularity (notably for hardening). The LOPA technologies currently studied also provide significant security-related capabilities that will be relevant to all practical systems and are of relevance in the context of IEC 61508 Ed 2 (see part 1 7.4.2.3). Security, though, is explicitly not considered in this proposal for LOPA (see possible extensions).

The formal basis for LOPA at the system level can be found in 61508-1 7.6.2.7 as well as 61508-2 Ed 2 7.4.3.1-4, respectively by reference in 61508-3 Ed 2 7.4.2.11, which addresses the coverage of systematic capabilities by elements that themselves provide a lower SC than the target safety function. Note that this differs significantly from ISO 26262 Ed 1 ASIL decomposition. Software LOPA itself is not a concept established directly in the context of IEC 61508 Ed 2 but is utilized in the process industry sector standard IEC 61511 Ed 1 Clause 9 (main reference), which serves as guidance for this LOPA interpretation - with the significant difference that LOPA in 61511 refers to low-complexity elements, while we intend to apply the concept to high-complexity elements in order to cover residual analytical uncertainty as well as incompleteness issues.
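For orientation only, the classic LOPA arithmetic as used in the process industry (IEC 61511) multiplies the initiating event frequency by the probabilities of failure on demand (PFD) of the independent protection layers; the sketch below uses purely hypothetical numbers and does not yet reflect the high-complexity interpretation proposed here:

    # Minimal sketch of classic (process-industry) LOPA arithmetic:
    # residual frequency = initiating event frequency * product of layer PFDs.
    # All numbers are hypothetical placeholders.
    from math import prod

    initiating_event_frequency = 0.1    # events per year (hypothetical)
    layer_pfds = [1e-1, 1e-2]           # PFD of each independent protection layer
    tolerable_frequency = 1e-4          # events per year (hypothetical target)

    residual_frequency = initiating_event_frequency * prod(layer_pfds)
    print(f"residual frequency: {residual_frequency:.2e} /year")
    if residual_frequency > tolerable_frequency:
        gap = residual_frequency / tolerable_frequency
        print(f"additional risk reduction factor needed: {gap:.1f}")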

More details

Tools qualification for kernel verification tool-set

Utilizing the Linux kernel for safety-related systems hinges on the claim of adequate development rigor in general and on the ability to detect and mitigate defects. Tools used for kernel development thus play a key role. The kernel development life-cycle process has largely been outlined by key kernel developers (Documentation/process as of October 2016), and this forms the basis. Tools are mentioned at a number of points in the development work-flow, which addresses many of the formal needs, but an assessment of their actual usage and effectiveness cannot rest on process records as it would in a bespoke process; it must instead be made under the assumption of incomplete and possibly incorrect application of tools and methods. Even provided the process is adequate and the use of tools is confirmed to be effective - that is, classes of faults claimed to be addressed are actually addressed by developers - the question of tool completeness remains open. This is to be addressed by an adequate qualification of the individual tools - in the context of the Linux kernel - which will also uncover possible gaps and inconsistencies. Based on this assessment, possible mitigations (e.g. by extension) are to be developed and finally executed, providing a complete assessment of the tools and the tool-generated artifacts for the certification process of the Linux kernel.
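One possible shape of such a qualification step - sketched under the assumption of a seeded-defect test suite and a generic checker command line, both hypothetical - is a small harness that confirms each claimed fault class is actually reported by the tool under test:

    #!/usr/bin/env python3
    """Tiny qualification harness: run a checker on seeded-defect test cases and
    confirm that the expected diagnostic appears (hypothetical layout and cases)."""
    import subprocess
    import sys

    # Each case: (file with a deliberately seeded defect,
    #             fault class it represents,
    #             substring expected in the tool output).
    CASES = [
        ("tests/null_deref.c",      "null pointer dereference", "dereference"),
        ("tests/uninit_read.c",     "uninitialised read",       "uninitialized"),
        ("tests/signed_overflow.c", "signed overflow",          "overflow"),
    ]

    def run_case(checker_cmd, path, expected):
        """Return True if the checker flags the seeded defect in `path`."""
        result = subprocess.run(checker_cmd + [path], capture_output=True, text=True)
        return expected in (result.stdout + result.stderr)

    def main():
        checker_cmd = sys.argv[1:]      # the qualified invocation of the tool
        failures = [fault for path, fault, expected in CASES
                    if not run_case(checker_cmd, path, expected)]
        if failures:
            print("fault classes NOT detected:", ", ".join(failures))
            sys.exit(1)
        print("all seeded fault classes detected")

    if __name__ == "__main__":
        main()

Such a harness records tool effectiveness as an executable artifact rather than as a process claim, which matches the assumption of possibly incomplete tool application stated above.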

More details

Route 3 H - preliminary evaluation and investigation of a compliance route option for pre-existing hardware

IEC 61508 and derived standards assume that ASICs comply with one of two compliance routes:

  • Compliance route option 1 for HW: Route 1 H (FT + SFF)
      a) Calculated safe failure fraction (SFF) of each element (treated as HFT 0 element).
      b) Architectural protection is in place, with each of the (semi-)independent channels having a comparable SFF.
      c) The SFF for each element in the SIL2 system is 60% < SFF < 90%.

This requires, as a basis, a clear understanding of the safe and dangerous failures of the elements and of the achievable (estimated) diagnostic coverage (DC).

  • Compliance route option 2 for HW: Route 2 H (reliability data)
      a) Field feedback for elements in similar applications and environments - AND
      b) Based on data collected in accordance with IEC 60300-3-2 or ISO 14224 - AND
      c) Evaluated according to:
         i. the amount of field feedback - AND
         ii. the exercise of expert judgment - AND, where needed,
         iii. the undertaking of specific tests.

Note that IEC 61508 assumes an HFT of 1 for type-B systems operating in continuous mode in compliance with SIL2 (see 61508-2 Ed 2 7.4.4.3.1), and all systems we are referring to here are conceptually type-B systems (see 61508-2 Ed 2 7.4.4.1.3). Essentially it is the vagueness of 61508-2 Ed 2 7.4.6 that is problematic on the one hand but on the other hand allows the freedom to assess the effective measures taken for mass-production microprocessors during design, to take credit for documented testing, and to generate field data of adequate quality together with the post-processing of the same. The proposed route 3 H explicitly assumes the re-use of a pre-existing processor, not the generation of a specific processor for a project or system. All requirements for control of systematic faults critically hinge on the ability to understand the system elements (see notably the 7.4.7.3 note on following good human-factor practice).
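For reference, the safe failure fraction and diagnostic coverage mentioned as the basis for route 1 H follow directly from the per-element failure rates as defined in IEC 61508-2; the sketch below uses hypothetical failure rates only:

    # Safe failure fraction (SFF) and diagnostic coverage (DC) per IEC 61508-2,
    # computed from per-element failure rates (hypothetical values, failures/hour).
    lambda_safe = 4.0e-7                # safe failures
    lambda_dd   = 3.0e-7                # dangerous failures detected by diagnostics
    lambda_du   = 1.0e-7                # dangerous failures left undetected

    lambda_dangerous = lambda_dd + lambda_du
    lambda_total     = lambda_safe + lambda_dangerous

    dc  = lambda_dd / lambda_dangerous                 # diagnostic coverage
    sff = (lambda_safe + lambda_dd) / lambda_total     # safe failure fraction

    print(f"DC  = {dc:.1%}")
    print(f"SFF = {sff:.1%}")
    # SFF band referred to above for the SIL2 case: 60% < SFF < 90%
    print("within 60%..90% band:", 0.60 < sff < 0.90)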

More details

DLCDM - prototype rework of the DLCDM tool and extension by a community interface

The Linux kernel development life-cycle (DLC) is not a static process but undergoes continuous modification - in general, improvement. Considerable information on developers and commits can be extracted from commit meta-data and from the evolution of these meta-data attributes over time. To allow harvesting this information for initial selection, for the assessment of subsystems or developers, and for judging process stability and process impact, the DLC should be systematically and continuously monitored. The DLC evolution is captured in a database with the intent to allow data mining on this meta-data - hence the name DLCDM. The information that can be extracted depends not only on the extraction mechanism but also on the verification of the same - from the prototype we know that there are defects (e.g. dates: Jan 1 1970, 14 Aug 2030, 25 Apr 2037...) and ambiguities (e.g. names: Peter Zijlstra, Peter Zijlstra (Intel)). While some of this can be filtered out automatically, some needs manual confirmation and some incorrect data elements cannot be detected at all (e.g. if dates are within reasonable ranges but still incorrect). The DLCDM rework is therefore not just a database but also a community verification interface that allows the contained data to be verified. Based on this data, appropriate statistical modeling can then extract anomalies as well as detect process-level deviations.
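A minimal sketch of the extraction and plausibility-filtering step - assuming nothing beyond a local clone of the kernel tree, and flagging only obviously broken dates of the kind listed above - could look as follows:

    #!/usr/bin/env python3
    """Extract commit meta-data from a local kernel clone and flag implausible
    entries for manual confirmation (minimal DLCDM-style sketch)."""
    import subprocess
    from datetime import datetime, timezone

    REPO = "linux"      # path to a local clone (assumption)
    # Assumption: the mainline git history starts in 2005; earlier dates are suspect.
    HISTORY_START = datetime(2005, 4, 1, tzinfo=timezone.utc)

    def iter_commits(repo):
        """Yield (sha, author_name, author_email, author_date, committer_date)."""
        fmt = "%H%x09%an%x09%ae%x09%at%x09%ct"
        out = subprocess.run(
            ["git", "-C", repo, "log", f"--pretty=format:{fmt}"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.splitlines():
            sha, name, email, adate, cdate = line.split("\t")
            yield (sha, name, email,
                   datetime.fromtimestamp(int(adate), tz=timezone.utc),
                   datetime.fromtimestamp(int(cdate), tz=timezone.utc))

    def suspect(author_date, committer_date):
        """Flag obviously broken dates; plausible-but-wrong dates cannot be caught here."""
        now = datetime.now(timezone.utc)
        return not (HISTORY_START <= author_date <= now
                    and HISTORY_START <= committer_date <= now)

    for sha, name, email, adate, cdate in iter_commits(REPO):
        if suspect(adate, cdate):
            print(f"needs manual review: {sha[:12]} {name} <{email}> {adate.isoformat()}")

Anything this automatic filter flags would be routed to the community verification interface; name ambiguities of the "Peter Zijlstra" kind need manual confirmation in any case.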

More details

Statistics - Model cleanup and analysis improvements

Uncovering systematic software development process faults by looking at metrics and attributes of the resulting code has been in use since the 1980s. The methods essentially consist of more or less well defined metrics, heuristics for attributes or indications of sound processes, and trending for process stability. None of the metrics/attributes/trends are calibrated, and it is questionable whether they can be calibrated in any meaningful generic sense. They do, however, possess the ability to indicate change, and trending and analysis of metric development can pinpoint areas for intensified review and analysis.

Of greatest importance for the SIL2LinuxMP approach is the ability to detect process risks and/or code base areas of elevated risk that can be excluded from safety related systems; this offers a high-level approach to risk elimination rather than focusing on technical measures for risk reduction only. The statistical methods introduced here do not try to "calculate" residual bugs or directly estimate risk - in other words, this is not proven-in-use (route 2 S) through the back-door - rather, we seek to minimize risk by selection and - crucially, as this is a characteristic of complex systems - to extract an estimate of maintenance effort so that monitoring and incident response planning is realistic. Complex safety related systems using pre-existing software elements must expect a significantly higher incident rate as well as change rate over time compared to a small traditional safety RTOS - statistical methods will be a key factor in planning realistically as well as in evaluating the economic viability of a given system. The methods proposed here are potentially of interest beyond safety related systems - both HA and business-critical systems may profit from such methods, as could continuous improvement efforts for the kernel development life-cycle.
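As one illustration of the kind of trending intended here, a simple control-chart style check against a rolling baseline can flag releases whose metric value deviates strongly from recent history; the metric name and values below are placeholders:

    #!/usr/bin/env python3
    """Flag deviations of a per-release process metric from its rolling baseline
    (simple 3-sigma control-chart sketch; the metric values are placeholders)."""
    from statistics import mean, stdev

    # Hypothetical metric, e.g. fixes backported to -stable per release.
    metric = [118, 131, 125, 140, 122, 135, 129, 133, 210, 127]
    WINDOW = 6          # number of preceding releases used as baseline

    for i in range(WINDOW, len(metric)):
        baseline = metric[i - WINDOW:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma and abs(metric[i] - mu) > 3 * sigma:
            print(f"release index {i}: value {metric[i]} deviates from "
                  f"baseline {mu:.1f} +/- {sigma:.1f} -- review recommended")

The point of such trending is not to quantify residual risk but to direct review effort and to feed realistic maintenance-effort estimates, as argued above.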

More details

Root cause analysis (statistical severity estimation basis)

The term root cause analysis is quite contested in the safety community, so a short (imprecise) definition: root cause analysis - identify the logical life-cycle phase in which the error that manifested itself was introduced, and what the subjective causal factors of the committer were. The goal of the root cause analysis effort is to allow quantification of defect severity (e.g. for contingency planning) as well as improved detection of those defects that are of elevated severity. This effort would result in an initial assessment against historic defects (classification and contingency estimation) as well as in the initiation of a continuous (low volume) project to conduct root-cause analysis on defects in -stable (or -cip).
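A minimal data model for such a classification - with illustrative, hypothetical phase and causal-factor labels that a real scheme would have to agree with the community - could look like this:

    #!/usr/bin/env python3
    """Minimal record and tally for root-cause classified defects
    (illustrative phases, causal factors and severity classes only)."""
    from collections import Counter
    from dataclasses import dataclass

    # Illustrative enumerations (placeholders, not an agreed scheme).
    PHASES = ("requirements", "design", "implementation", "integration", "maintenance")
    CAUSES = ("misunderstood-interface", "incomplete-change", "typo", "concurrency", "other")

    @dataclass
    class DefectRecord:
        fixing_commit: str      # SHA of the -stable (or -cip) fix
        introduced_phase: str   # life-cycle phase in which the error was introduced
        causal_factor: str      # subjective causal factor of the committer
        severity: int           # estimated severity class, e.g. 1 (low) .. 4 (high)

    def severity_distribution(records):
        """Tally severities per introduction phase, e.g. for contingency planning."""
        return dict(Counter((r.introduced_phase, r.severity) for r in records))

    # Hypothetical usage:
    records = [
        DefectRecord("deadbeef", "implementation", "incomplete-change", 2),
        DefectRecord("cafebabe", "design", "misunderstood-interface", 3),
    ]
    print(severity_distribution(records))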

More details

L2S (Line-To-SHA) - a tool to map each parsed/compiled line of a binary to its commit hash (SHA)

The goal of L2S (Line-to-SHA) is to extract not only the minimal set of code lines but also the effective set of commits behind a particular configuration of the kernel. This makes it possible to inspect the meta-data of those commits and to analyze the patches that comprise the actual binary (phase: initial selection), and it supports the impact analysis during system maintenance, as L2S allows identifying the subset of -stable bug-fix commits that actually affect the binary of a particular system. This minimizes the monitoring and impact analysis effort for a running system, which we believe is key to making GNU/Linux manageable under reasonable economic constraints.
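A rough sketch of the core mapping step - assuming the set of (file, line) pairs that actually end up in the binary has already been extracted elsewhere (e.g. from debug information) - could use git blame to collect the effective commit set and intersect it with candidate -stable fixes:

    #!/usr/bin/env python3
    """Map (file, line) pairs that ended up in a kernel binary to the commits that
    last touched them (rough L2S-style sketch; extracting the compiled line set
    from the build/debug information is assumed to happen elsewhere)."""
    import subprocess

    REPO = "linux"      # local kernel clone (assumption)

    def blame_sha(repo, path, line):
        """Return the commit that last modified `path:line`."""
        out = subprocess.run(
            ["git", "-C", repo, "blame", "--porcelain", "-L", f"{line},{line}", path],
            capture_output=True, text=True, check=True,
        ).stdout
        return out.split(maxsplit=1)[0]     # first token of porcelain output is the SHA

    def effective_commits(repo, compiled_lines):
        """Collect the set of commits backing the compiled lines."""
        return {blame_sha(repo, path, line) for path, line in compiled_lines}

    def affected_fixes(effective, stable_fixes):
        """Subset of -stable fix commits that touch the binary's effective commit set."""
        return effective & set(stable_fixes)

    # Hypothetical usage with a tiny line set:
    compiled_lines = [("kernel/sched/core.c", 100), ("mm/slub.c", 200)]
    commits = effective_commits(REPO, compiled_lines)
    print(len(commits), "effective commits")
    # affected = affected_fixes(commits, candidate_stable_fix_shas)

Note that git blame yields the commit that last touched a line, so relating a line to the full chain of commits behind it, and doing this efficiently for a whole kernel configuration, is part of the actual L2S tooling work rather than of this sketch.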