Multiprocessor Operating Systems

For the most part, multiprocessor operating systems are just regular operating systems. They handle system calls, do memory management, provide a file system, and manage I/O devices. Nevertheless, there are some areas in which they have unique features. These include process synchronization, resource management, and scheduling. This chapter excerpt takes a brief look at multiprocessor hardware and then moves on to these operating system issues.

Multiprocessor Hardware

A shared-memory multiprocessor (or just multiprocessor henceforth) is a computer system in which two or more CPUs share full access to a common RAM. A program running on any of the CPUs sees a normal (usually paged) virtual address space. The only unusual property this system has is that a CPU can write some value into a memory word and then read the word back and get a different value, because another CPU has changed it. When organized correctly, this property forms the basis of interprocessor communication: one CPU writes some data into memory and another one reads the data out. Below we will first take a brief look at multiprocessor hardware.

Although all multiprocessors have the property that every CPU can address all of memory, some have the additional property that every memory word can be read as fast as every other memory word. These machines are called UMA (Uniform Memory Access) multiprocessors. In contrast, NUMA (Nonuniform Memory Access) multiprocessors do not have this property. Why this difference exists will become clear later. We will first examine UMA multiprocessors and then move on to NUMA multiprocessors.

UMA Bus-Based SMP Architectures

The simplest multiprocessors are based on a single bus, as illustrated in Fig. 8-1(a). Two or more CPUs and one or more memory modules all use the same bus for communication. When a CPU wants to read a memory word, it first checks to see if the bus is busy. If the bus is idle, the CPU puts the address of the word it wants on the bus, asserts a few control signals, and waits until the memory puts the desired word on the bus.

If the bus is busy when a CPU wants to read or write memory, the CPU just waits until the bus becomes idle. Herein lies the problem with this design. With two or three CPUs, contention for the bus will be manageable; with 32 or 64 it will be unbearable. The system will be totally limited by the bandwidth of the bus, and most of the CPUs will be idle most of the time.

The solution to this problem is to add a cache to each CPU, as depicted in Fig. 8-1(b). The cache can be inside the CPU chip, next to the CPU chip, on the processor board, or some combination of all three. Since many reads can now be satisfied out of the local cache, there will be much less bus traffic, and the system can support more CPUs. In general, caching is not done on an individual word basis but on the basis of 32- or 64-byte blocks. When a word is referenced, its entire block is fetched into the cache of the CPU touching it.

Figure 8-1. Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.

Each cache block is marked as being either read-only (in which case it can be present in multiple caches at the same time) or read-write (in which case it may not be present in any other caches). If a CPU attempts to write a word that is in one or more remote caches, the bus hardware detects the write and puts a signal on the bus informing all other caches of the write. If other caches have a clean copy, that is, an exact copy of what is in memory, they can just discard their copies and let the writer fetch the cache block from memory before modifying it. If some other cache has a dirty (i.e., modified) copy, it must either write it back to memory before the write can proceed or transfer it directly to the writer over the bus. Many cache transfer protocols exist.
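The write-invalidate behavior just described can be made concrete with a small simulation. The sketch below is not from the original text: the state names, the fixed cache and block counts, and the simplified bus model are assumptions chosen only to illustrate how clean copies are discarded and dirty copies are written back when another CPU writes.

    /*
     * A sketch, not from the original text, of the write-invalidate behavior
     * described above.  The state names, the fixed cache and block counts, and
     * the bus model are assumptions made only for illustration; real snooping
     * protocols (MESI and its relatives) have more states.
     */
    #include <stdio.h>

    #define NCPUS   4
    #define NBLOCKS 8

    enum state { INVALID, CLEAN, DIRTY };  /* CLEAN = read-only copy, DIRTY = modified and exclusive */

    static enum state cache[NCPUS][NBLOCKS];   /* state of each block in each CPU's cache */

    /* CPU 'cpu' wants to write into 'block'.  The write is announced on the bus
       and every other cache reacts before the write is allowed to proceed. */
    static void bus_write(int cpu, int block)
    {
        for (int other = 0; other < NCPUS; other++) {
            if (other == cpu)
                continue;
            if (cache[other][block] == CLEAN) {
                /* A clean copy is identical to memory, so it can simply be discarded. */
                cache[other][block] = INVALID;
            } else if (cache[other][block] == DIRTY) {
                /* A dirty copy must be written back to memory (or handed directly
                   to the writer over the bus) before the new write may proceed. */
                printf("CPU %d writes block %d back to memory\n", other, block);
                cache[other][block] = INVALID;
            }
        }
        cache[cpu][block] = DIRTY;         /* the writer now holds the only valid copy */
    }

    /* CPU 'cpu' reads 'block': fetch it read-only if it is not already cached.
       (A full protocol would also have to snoop here for a dirty copy elsewhere.) */
    static void bus_read(int cpu, int block)
    {
        if (cache[cpu][block] == INVALID)
            cache[cpu][block] = CLEAN;
    }

    int main(void)
    {
        bus_read(0, 3);                    /* CPUs 0 and 1 cache block 3 read-only */
        bus_read(1, 3);
        bus_write(2, 3);                   /* their clean copies are discarded */
        bus_write(1, 3);                   /* CPU 2's dirty copy is written back first */
        return 0;
    }

Real snooping protocols refine these states further, but the basic decision on a remote write, discard a clean copy or write back a dirty one, is the same idea.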
Yet another possibility is the design of Fig. 8-1(c), in which each CPU has not only a cache, but also a local, private memory which it accesses over a dedicated (private) bus. To use this configuration optimally, the compiler should place all the program text, strings, constants and other read-only data, stacks, and local variables in the private memories. The shared memory is then used only for writable shared variables. In most cases, this careful placement will greatly reduce bus traffic, but it does require active cooperation from the compiler.

UMA Multiprocessors Using Crossbar Switches

Even with the best caching, the use of a single bus limits the size of a UMA multiprocessor to about 16 or 32 CPUs. To go beyond that, a different kind of interconnection network is needed. The simplest circuit for connecting n CPUs to k memories is the crossbar switch, shown in Fig. 8-2. Crossbar switches have been used for decades within telephone switching exchanges to connect a group of incoming lines to a set of outgoing lines in an arbitrary way.

At each intersection of a horizontal (incoming) and vertical (outgoing) line is a crosspoint. A crosspoint is a small switch that can be electrically opened or closed, depending on whether the horizontal and vertical lines are to be connected or not. In Fig. 8-2(a) we see three crosspoints closed simultaneously, allowing three (CPU, memory) pairs to be connected at the same time. Many other combinations are also possible. In fact, the number of combinations is equal to the number of different ways eight rooks can be safely placed on a chessboard.

Figure 8-2. (a) An 8 X 8 crossbar switch. (b) An open crosspoint. (c) A closed crosspoint.

One of the nicest properties of the crossbar switch is that it is a nonblocking network, meaning that no CPU is ever denied the connection it needs because some crosspoint or line is already occupied. Furthermore, no advance planning is needed. Even if seven arbitrary connections are already set up, it is always possible to connect the remaining CPU to the remaining memory.

One of the worst properties of the crossbar switch is the fact that the number of crosspoints grows as n^2. With 1000 CPUs and 1000 memory modules we need a million crosspoints. Such a large crossbar switch is not feasible. Nevertheless, for medium-sized systems, a crossbar design is workable.

UMA Multiprocessors Using Multistage Switching Networks

A completely different multiprocessor design is based on the humble 2 X 2 switch shown in Fig. 8-3(a). This switch has two inputs and two outputs. Messages arriving on either input line can be switched to either output line. For our purposes, messages will contain up to four parts, as shown in Fig. 8-3(b). The Module field tells which memory to use. The Address field specifies an address within a module. The Opcode gives the operation, such as READ or WRITE. Finally, the optional Value field may contain an operand, such as a 32-bit word to be written on a WRITE. The switch inspects the Module field and uses it to determine if the message should be sent on X or on Y.

Figure 8-3. (a) A 2 X 2 switch. (b) A message format.

Our 2 X 2 switches can be arranged in many ways to build larger multistage switching networks (Adams et al., 1987; Bhuyan et al., 1989; and Kumar and Reddy, 1987). One possibility is the no-frills, economy-class omega network, illustrated in Fig. 8-4. Here we have connected eight CPUs to eight memories using 12 switches. More generally, for n CPUs and n memories we would need log2 n stages, with n/2 switches per stage, for a total of (n/2) log2 n switches, which is far better than n^2 crosspoints.

Figure 8-4. An omega switching network.

The wiring pattern of the omega network is often called the perfect shuffle, since the mixing of the signals at each stage resembles a deck of cards being cut in half and then mixed card-for-card. To see how the omega network works, suppose that CPU 011 wants to read a word from memory module 110. The CPU sends a READ message to switch 1D containing 110 in the Module field. The switch takes the first (i.e., leftmost) bit of 110 and uses it for routing. A 0 routes to the upper output and a 1 routes to the lower one. Since this bit is a 1, the message is routed via the lower output to 2D.

All the second-stage switches, including 2D, use the second bit for routing. This, too, is a 1, so the message is now forwarded via the lower output to 3D. Here the third bit is tested and found to be a 0. Consequently, the message goes out on the upper output and arrives at memory 110, as desired. The path followed by this message is marked in Fig. 8-4 by the letter a.

As the message moves through the switching network, the bits at the left-hand end of the module number are no longer needed. They can be put to good use by recording the incoming line number there, so the reply can find its way back. For path a, the incoming lines are 0 (upper input to 1D), 1 (lower input to 2D), and 1 (lower input to 3D), respectively. The reply is routed back using 011, only reading it from right to left this time.

At the same time all this is going on, CPU 001 wants to write a word to memory module 001. An analogous process happens here, with the message routed via the upper, upper, and lower outputs, respectively, marked by the letter b. When it arrives, its Module field reads 001, representing the path it took. Since these two requests do not use any of the same switches, lines, or memory modules, they can proceed in parallel.

Now consider what would happen if CPU 000 simultaneously wanted to access memory module 000. Its request would come into conflict with CPU 001's request at switch 3A. One of them would have to wait. Unlike the crossbar switch, the omega network is a blocking network. Not every set of requests can be processed simultaneously. Conflicts can occur over the use of a wire or a switch, as well as between requests to memory and replies from memory.
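The stage-by-stage routing rule just described is easy to express in code. The sketch below is not from the original text; it assumes the eight-CPU, eight-memory network of Fig. 8-4 and uses the observation, visible in paths a and b above, that the input port a message arrives on at each stage is simply the corresponding bit of the sending CPU's number. The message layout mirrors the four fields of Fig. 8-3(b); everything else is illustrative.

    /*
     * A sketch, not from the original text, of the omega network routing rule for
     * the eight-CPU, eight-memory network of Fig. 8-4 (three stages).  The struct
     * layout and the printed trace are assumptions made only for illustration.
     */
    #include <stdio.h>

    #define STAGES 3                       /* log2(8) stages for 8 memory modules */

    enum opcode { READ, WRITE };

    struct message {                       /* the four fields of Fig. 8-3(b) */
        unsigned module;                   /* which memory module to use (3 bits) */
        unsigned address;                  /* address within that module */
        enum opcode op;                    /* READ or WRITE */
        unsigned value;                    /* operand, used only for WRITE */
    };

    /* Route a request from CPU 'cpu' toward msg->module.  Each stage examines one
       bit of the Module field, leftmost first: 0 selects the upper output, 1 the
       lower one.  The bit just consumed is overwritten with the number of the
       input port the message arrived on, so that when the message reaches the
       memory the Module field, read right to left, is the path back to the CPU. */
    static unsigned route(unsigned cpu, struct message *msg)
    {
        for (int s = 0; s < STAGES; s++) {
            unsigned shift = STAGES - 1 - s;
            unsigned bit   = (msg->module >> shift) & 1;   /* routing bit for this stage */
            unsigned port  = (cpu >> shift) & 1;           /* input port at this stage */
            printf("stage %d: bit %u -> %s output (arrived on %s input)\n",
                   s + 1, bit, bit ? "lower" : "upper", port ? "lower" : "upper");
            msg->module = (msg->module & ~(1u << shift)) | (port << shift);
        }
        return msg->module;                /* now holds the return path */
    }

    int main(void)
    {
        /* The example traced in the text: CPU 011 reads a word from module 110. */
        struct message msg = { 6 /* 110 */, 0, READ, 0 };
        unsigned back = route(3 /* 011 */, &msg);
        printf("return path recorded in Module field: %u%u%u\n",
               (back >> 2) & 1, (back >> 1) & 1, back & 1);
        return 0;
    }

Running it for CPU 011 reading from module 110 reproduces the lower, lower, upper sequence of path a and leaves 011 in the Module field as the return path.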
It is clearly desirable to spread the memory references uniformly across the modules. One common technique is to use the low-order bits as the module number. Consider, for example, a byte-oriented address space for a computer that mostly references 32-bit words. The 2 low-order bits will usually be 00, but the next 3 bits will be uniformly distributed. By using these 3 bits as the module number, consecutively addressed words will be in consecutive modules. A memory system in which consecutive words are in different modules is said to be interleaved. Interleaved memories maximize parallelism because most memory references are to consecutive addresses. It is also possible to design switching networks that are nonblocking and that offer multiple paths from each CPU to each memory module, to spread the traffic better.

NUMA Multiprocessors

Single-bus UMA multiprocessors are generally limited to no more than a few dozen CPUs, and crossbar or switched multiprocessors need a lot of expensive hardware and are not that much bigger. To get to more than 100 CPUs, something has to give. Usually, what gives is the idea that all memory modules have the same access time. This concession leads to the idea of NUMA multiprocessors, as mentioned above. Like their UMA cousins, they provide a single address space across all the CPUs, but unlike the UMA machines, access to local memory modules is faster than access to remote ones. Thus all UMA programs will run without change on NUMA machines, but the performance will be worse than on a UMA machine.

NUMA machines have three key characteristics that all of them possess:

1. There is a single address space visible to all CPUs.
2. Access to remote memory is via LOAD and STORE instructions.
3. Access to remote memory is slower than access to local memory.
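These characteristics can be observed directly on a modern NUMA machine. The sketch below is not from the original text and rests on several assumptions: a Linux system with libnuma installed (compile with -lnuma), at least two memory nodes, and a buffer large enough to defeat the caches. The point is only that the code touching remote memory is identical to the code touching local memory, ordinary loads and stores; the hardware, not the instruction set, makes it slower.

    /*
     * A sketch, not from the original text, illustrating NUMA characteristics 2
     * and 3: memory on any node is reached with ordinary loads and stores, but
     * remote accesses take longer.  Assumes Linux with libnuma (link with -lnuma)
     * and at least two memory nodes; buffer size and node choice are illustrative.
     */
    #include <numa.h>
    #include <stdio.h>
    #include <time.h>

    #define BYTES (64UL * 1024 * 1024)

    /* Touch one byte per cache block and return the elapsed time.  The access
       code is the same regardless of which node holds the memory. */
    static double sweep(volatile char *buf)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < BYTES; i += 64)
            buf[i]++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        if (numa_available() == -1) {
            fprintf(stderr, "this machine has no NUMA support\n");
            return 1;
        }
        char *local  = numa_alloc_local(BYTES);                    /* memory on this CPU's node */
        char *remote = numa_alloc_onnode(BYTES, numa_max_node());  /* memory on the highest-numbered node, assumed remote */
        if (local == NULL || remote == NULL)
            return 1;

        sweep(local); sweep(remote);       /* first pass only faults the pages in */
        printf("local  sweep: %.3f s\n", sweep(local));
        printf("remote sweep: %.3f s\n", sweep(remote));   /* typically slower */

        numa_free(local, BYTES);
        numa_free(remote, BYTES);
        return 0;
    }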