Multiprocessor computing system reliability analysis

A multiprocessor system is composed of two computing modules, CM1 and CM2. Each module contains one processor (P1 and P2, respectively), one memory module (M1 and M2, respectively) and two disks: a primary disk (D11 and D21, respectively) and a backup disk (D12 and D22, respectively).

Initially, the primary disk is accessed by the corresponding computing module, while the backup disk holds a copy of the primary disk's data and is accessed only periodically for update operations. If the primary disk fails, the backup disk takes over its function. In terms of reliability the disks are identical: they are characterized by the same failure rate, or equivalently the same reliability cumulative distribution function (cdf). The computing modules are connected by means of the bus B; moreover, P1 and P2 are energized by the power supply PS, so a failure of PS forces both P1 and P2 to fail.
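Since the two disks of a module are identical in reliability, with the same failure rate whether accessed or standing by, one possible reading (an assumption about the model, not stated outright) is that each primary/backup pair behaves, for reliability purposes, like plain parallel redundancy with a common failure rate \(\lambda_D\):

\[
R_{\text{pair}}(t) \;=\; 1 - \left(1 - e^{-\lambda_D t}\right)^2 \;=\; 2e^{-\lambda_D t} - e^{-2\lambda_D t}.
\]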

M3 is a spare memory that replaces M1 or M2 in case of failure. While both M1 and M2 are operational, M3 is kept alive in warm standby so that the data stored in it are maintained, but it is not accessed for reading or writing. When M1, M2, or both fail, M3 substitutes for the failed unit.
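To see how a standby failure rate enters the analysis, consider the standard result for a single active unit with failure rate \(\lambda\) backed by one warm spare that fails at rate \(\lambda_s\) while dormant and at rate \(\lambda\) once switched in (which matches M3 here, since its active rate equals that of M1 and M2). Conditioning on the failure time \(u\) of the active unit gives

\[
R(t) \;=\; e^{-\lambda t} + \int_0^t \lambda e^{-\lambda u}\, e^{-\lambda_s u}\, e^{-\lambda (t-u)}\, du
\;=\; e^{-\lambda t}\left[1 + \frac{\lambda}{\lambda_s}\left(1 - e^{-\lambda_s t}\right)\right].
\]

This covers only an isolated active/spare pair; in the system above M3 is shared between CM1 and CM2, so the full model must account for that dependence, but the same conditioning idea applies.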

To operate properly, the multiprocessor computing system requires that at least one computing module (CM1 or CM2), the power supply PS and the bus B work correctly. Moreover, a computing module is operational if its processor (P1 or P2, respectively), at least one memory between the local one (M1 or M2, respectively) and the shared spare M3, and at least one of its two disks (D11 or D12 for CM1, D21 or D22 for CM2) are not failed.
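As a formalization aid, here is a minimal Python sketch of the success logic just described, written as a Boolean structure function. The encoding (component names, the system_up helper) is my own, not part of the problem statement; note that the shared spare M3 appears in both module terms.

def system_up(up):
    """up maps a component name to True (working) or False (failed)."""
    # A module works if its processor, one memory (local or the shared
    # spare M3), and at least one of its two disks are all working.
    cm1 = up["P1"] and (up["M1"] or up["M3"]) and (up["D11"] or up["D12"])
    cm2 = up["P2"] and (up["M2"] or up["M3"]) and (up["D21"] or up["D22"])
    # The system needs at least one module plus the power supply and bus.
    return (cm1 or cm2) and up["PS"] and up["B"]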

Assuming that all the components have exponentially distributed failure times and that the memory module M3 has different failure rates depending on whether it is in warm standby or active, as reported in Table 1, compute the system reliability function and the MTTF. Failure rates in Table 1 are expressed in failures in time (FIT), i.e. the number of failures per billion (10^9) device hours.
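One hedged way to get numbers is to simplify the standby dependence away: assume all eleven components fail independently with exponential lifetimes and charge M3 its active rate (30 FIT) for the whole mission, which is slightly pessimistic since the standby rate is lower; the exact answer requires a Markov model of the memory subsystem. Under that assumption, a sketch reusing system_up from above can compute R(t) exactly by enumerating the 2^11 component states, and the MTTF by numerically integrating R(t):

import math
from itertools import product
from scipy.integrate import quad

FIT = 1e-9  # 1 FIT = one failure per 1e9 device-hours
rates = {"B": 2*FIT, "P1": 500*FIT, "P2": 500*FIT, "PS": 6000*FIT,
         "M1": 30*FIT, "M2": 30*FIT, "M3": 30*FIT,   # M3 at its active rate
         "D11": 80000*FIT, "D12": 80000*FIT,
         "D21": 80000*FIT, "D22": 80000*FIT}
names = list(rates)

def reliability(t):
    """R(t) by exact enumeration of all component up/down states."""
    r = {c: math.exp(-rates[c] * t) for c in names}
    total = 0.0
    for state in product([True, False], repeat=len(names)):
        up = dict(zip(names, state))
        if system_up(up):
            p = 1.0
            for c in names:
                p *= r[c] if up[c] else 1.0 - r[c]
            total += p
    return total

# MTTF = integral of R(t) over [0, inf); 1e7 h is far past the decay.
mttf, _ = quad(reliability, 0.0, 1e7, limit=200)
print(f"R(1e4 h) = {reliability(1e4):.6f}, MTTF ~ {mttf:.3e} h")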

Moreover, compute the system availability assuming that the system is repairable and that the component repair rates are as in Table 1 (remark: a repair rate equal to 0 means the component is considered perfectly reliable).
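For the availability, a common simplification (again an assumption, not the statement's prescribed method) is to treat components as independently repaired, so each repairable component has steady-state availability mu/(lambda + mu), components with repair rate 0 get availability 1 per the remark, and M3 is again charged its active failure rate. Plugging these per-component availabilities into the same enumeration (reusing rates, names and system_up from the sketches above) gives the steady-state system availability:

# Repair rates in repairs/h from Table 1 (0 = treated as always available).
repair = {"B": 0.0, "P1": 3.85e-2, "P2": 3.85e-2, "PS": 1.0,
          "M1": 4.00e-2, "M2": 4.00e-2, "M3": 4.00e-2,
          "D11": 3.45e-2, "D12": 3.45e-2, "D21": 3.45e-2, "D22": 3.45e-2}

# Per-component steady-state availability: mu / (lambda + mu).
avail = {c: 1.0 if repair[c] == 0.0 else repair[c] / (rates[c] + repair[c])
         for c in names}

a_sys = 0.0
for state in product([True, False], repeat=len(names)):
    up = dict(zip(names, state))
    if system_up(up):
        p = 1.0
        for c in names:
            p *= avail[c] if up[c] else 1.0 - avail[c]
        a_sys += p
print(f"steady-state availability ~ {a_sys:.9f}")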


Component       Failure rate (FIT)   Repair rate (repairs/h)
B                      2             0
P1, P2               500             3.85 · 10^−2
PS                  6000             1
M1, M2                30             4.00 · 10^−2
M3 (active)           30             4.00 · 10^−2
M3 (standby)          25             4.00 · 10^−2
D11                80000             3.45 · 10^−2
D21                80000             3.45 · 10^−2
D12                80000             3.45 · 10^−2
D22                80000             3.45 · 10^−2

Table 1: System parameter values
