## Section 7.5 Memory Organization

In this section we will discuss how registers, SRAM, and DRAM are organized and constructed. Keeping with the intent of this book, the discussion will be introductory only.

### Subsection 7.5.1 Registers

Registers are used in places where small amounts of very fast memory are required. The CPU contains many registers, which are used for numerical computations, temporary data storage, etc. Some registers are directly accessible by the programmer (Section 8.2); others are hidden. Registers are also used in the hardware that interfaces between the CPU and other devices in the computer system.

We begin with a design for a simple 4-bit register, which allows us to store four bits. Figure 7.5.1 shows a design for implementing a 4-bit register using D flip-flops.

As described above, each time the clock cycles, the state of each of the D flip-flops is set according to the value of $d = d_{3}d_{2}d_{1}d_{0}\text{.}$ The problem with this circuit is that any changes in any $d_{i}$ will change the state of the corresponding bit in the next clock cycle, so the contents of the register are essentially valid for only one clock cycle.

One-cycle buffering of a bit pattern is sufficient for some applications, but there is also a need for registers that will store a value until it is explicitly changed, perhaps billions of clock cycles later. The circuit in Figure 7.5.2 adds a $store$ signal and feedback from the output of each bit. When $store = 1$ each bit is set according to its corresponding input, $d_{i}\text{.}$ When $store = 0$ the output of each bit, $r_{i}\text{,}$ is used as the input, giving no change. So this register can be used to store a value for as many clock cycles as desired. The value will not be changed until $store$ is set to $1\text{.}$
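The behavior of the register in Figure 7.5.2 can be sketched in Python. This is a hypothetical behavioral model, not a gate-level one: each tick, a bit loads its input when $store = 1$ and otherwise feeds its own output back, so the contents are held until $store$ is asserted again.

```python
class Register4:
    """Behavioral sketch of a 4-bit register with a store signal."""
    def __init__(self):
        self.state = [0, 0, 0, 0]   # [d3, d2, d1, d0]

    def tick(self, d, store):
        """Simulate one clock cycle; d is a list of 4 bits."""
        if store:
            self.state = list(d)
        # store == 0: each flip-flop's input is its own output, so the
        # state is unchanged no matter what appears on the d inputs
        return self.state

r = Register4()
r.tick([1, 0, 1, 1], store=1)   # load a value
r.tick([0, 0, 0, 0], store=0)   # inputs change, but contents are held
print(r.state)                  # [1, 0, 1, 1]
```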

Most computers need many general purpose registers, which are grouped together into a Register File.

Register File

A group of two or more registers.

A mechanism must be provided for addressing each of the registers in the register file. Consider a register file composed of eight 4-bit registers, r0–r7. We could build eight copies of the circuit shown in Figure 7.5.2. Let the 4-bit data input, $d\text{,}$ be connected in parallel to all of the corresponding data pins, $d_{3}d_{2}d_{1}d_{0}\text{,}$ of each of the eight registers. Three bits are required to address one of the registers ($2^{3} = 8$). If each of the eight outputs from a $3 \times 8$ decoder is connected to the $store$ input of one of the registers, $d$ will be stored into one, and only one, of the registers during the next clock cycle. All the other registers will have $store = 0\text{,}$ and they will simply maintain their current state. Selecting the output from one of the eight registers can be done with four 8-input multiplexers. One such multiplexer is shown in Figure 7.5.3. The inputs $r0_{i}$–$r7_{i}$ are the $i^{th}$ bits from each of the eight registers, r0–r7.
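This addressing scheme can be modeled in Python as a sketch. The decoder and multiplexer names here are illustrative, not part of any real hardware description: a list comprehension plays the role of the $3 \times 8$ decoder, asserting $store$ on exactly one register, and the read method stands in for the four 8-input multiplexers sharing one $Reg\_Sel$.

```python
class RegisterFile:
    """Behavioral sketch: eight 4-bit registers addressed with 3 bits."""
    def __init__(self):
        self.regs = [[0, 0, 0, 0] for _ in range(8)]

    def write(self, addr, d):
        # The 3x8 decoder asserts store on exactly one register;
        # d is wired in parallel to all eight registers.
        decoder_out = [1 if i == addr else 0 for i in range(8)]
        for i, store in enumerate(decoder_out):
            if store:
                self.regs[i] = list(d)

    def read(self, reg_sel):
        # Four 8-input multiplexers, all driven by the same Reg_Sel,
        # pass the four bits of the selected register to the output.
        return [self.regs[reg_sel][bit] for bit in range(4)]

rf = RegisterFile()
rf.write(5, [1, 0, 1, 0])
print(rf.read(5))   # [1, 0, 1, 0]
```

Note that the write leaves the other seven registers untouched, mirroring the registers whose $store$ input remains $0$.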

Keep in mind that four of these output circuits would be required for 4-bit registers. The same $Reg\_Sel$ would be applied to all four multiplexers simultaneously in order to output all four bits of the same register. Larger registers would, of course, require correspondingly more multiplexers.

There is another important feature of this design that follows from the master/slave property of the D flip-flops. The state of the slave portion does not change until the second half of the clock cycle. So the circuit connected to the output of this register can read the current state during the first half of the clock cycle, while the master portion is preparing to change the state to the new contents.

### Subsection 7.5.2 Shift Registers

There are many situations where it is desirable to shift a group of bits. A shift register is a common device for doing this. Common applications include:

• Inserting a time delay in a bit stream.

• Converting a serial bit stream to a parallel group of bits.

• Converting a parallel group of bits into a serial bit stream.

• Shifting a parallel group of bits left or right to perform multiplication or division by powers of $2\text{.}$

Serial-to-parallel and parallel-to-serial conversion is required in I/O controllers because most I/O communication is done via serial bit streams, while data processing in the CPU is performed on groups of bits in parallel. A simple 4-bit serial-to-parallel shift register is shown in Figure 7.5.4.

A serial stream of bits is input at $s_{i}\text{.}$ At each clock tick, the output of $Q_{0}$ is applied to the input of $Q_{1}\text{,}$ thus copying the previous value of $r_{0}$ to the new $r_{1}\text{.}$ The state of $Q_{0}$ changes to the value of the new $s_{i}\text{,}$ thus copying this to be the new value of $r_{0}\text{.}$ The serial stream of bits continues to ripple through the four bits of the shift register. At any time, the last four bits in the serial stream are available in parallel at the four outputs, $r_{3}, r_{2}, r_{1}, r_{0}\text{,}$ with $r_{3}$ being the oldest in time.

The same circuit could be used to provide a time delay of four clock ticks in a serial bit stream. Simply use $r_{3}$ as the serial output.
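Both uses of the circuit can be illustrated with a short Python sketch, under the assumption that the register starts cleared. Each tick shifts the contents toward $r_{3}$ and loads the new serial input into $r_{0}\text{;}$ collecting $r_{3}$ at every tick gives the delayed serial stream.

```python
def shift_register(stream):
    """Behavioral sketch of the 4-bit serial-to-parallel shift register."""
    r = [0, 0, 0, 0]            # [r0, r1, r2, r3]; r3 is oldest in time
    delayed = []
    for s_i in stream:
        r = [s_i] + r[:-1]      # ripple: old r0 -> r1, r1 -> r2, r2 -> r3
        delayed.append(r[3])    # r3 doubles as the delayed serial output
    return r, delayed

parallel, serial_out = shift_register([1, 0, 1, 1, 0, 0, 1])
print(parallel)     # last four input bits, newest first: [1, 0, 0, 1]
print(serial_out)   # the input stream, delayed through the four bits
```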

### Subsection 7.5.3 Static Random Access Memory (SRAM)

There are several problems with trying to extend the multiplexer design to large memory systems. Although a multiplexer works for selecting the output from several registers, one that selects from many millions of memory cells is simply too large. From Figure 7.1.10, we see that such a multiplexer would need an AND gate for each memory cell, plus an OR gate with an input for each of these millions of AND gate outputs.

We need another logic element called a tri-state buffer.

Tri-State Buffer

A device that has three possible outputs: $0\text{,}$ $1\text{,}$ and “no connection.”

The “no connection” output is actually a very high impedance connection (Section 6.1.2), also called “High Z” or “open.”

A tri-state buffer takes two inputs—data input and enable. The truth table describing a tri-state buffer is:

| $Enable$ | $In$ | $Out$  |
|----------|------|--------|
| 0        | 0    | high Z |
| 0        | 1    | high Z |
| 1        | 0    | 0      |
| 1        | 1    | 1      |

and its circuit symbol is shown in Figure 7.5.5.

When $Enable = 1$ the output, which is equal to the input, is connected to whatever circuit element follows the tri-state buffer. But when $Enable = 0\text{,}$ the output is essentially disconnected. Be careful to realize that this is different from $0\text{;}$ being disconnected means it has no effect on the circuit element to which it is connected.
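The distinction between a logical $0$ and "disconnected" can be made concrete in a Python sketch. Here the high-Z state is modeled as `None`, a deliberate (and hypothetical) choice to keep it distinct from both bit values.

```python
HIGH_Z = None   # models "no connection"; distinct from logic 0 and 1

def tri_state(enable, data):
    """Behavioral sketch of a tri-state buffer."""
    return data if enable else HIGH_Z

print(tri_state(1, 1))   # 1 -- output follows the input
print(tri_state(1, 0))   # 0 -- a driven logic low, not a disconnect
print(tri_state(0, 1))   # None -- high Z, regardless of the input
```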

A 4-way multiplexer using a $2 \times 4$ decoder and four tri-state buffers is illustrated in Figure 7.5.6.

The tri-state buffer design may not be an advantage for small multiplexers. But an n-way multiplexer without tri-state buffers requires an n-input OR gate, and a gate with such a large fan-in presents some technical electronic problems.
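The multiplexer of Figure 7.5.6 can be sketched in Python. The function names are illustrative: the decoder enables exactly one tri-state buffer, and the single enabled buffer drives the shared output line, so no OR gate is needed.

```python
def decoder_2x4(sel):
    """A 2x4 decoder: exactly one of the four outputs is asserted."""
    return [1 if i == sel else 0 for i in range(4)]

def mux4(sel, inputs):
    """4-way multiplexer built from a decoder and tri-state buffers."""
    enables = decoder_2x4(sel)
    # Disabled buffers are high Z and do not drive the line; only the
    # enabled buffer's data value reaches the shared output.
    driven = [d for en, d in zip(enables, inputs) if en]
    return driven[0]

print(mux4(2, [0, 1, 1, 0]))   # 1 -- selects input 2
```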

Figure 7.5.7 shows how tri-state buffers can be used to implement a single memory cell.

This circuit shows only one 4-bit memory cell so you can compare it with the register design in Figure 7.5.1, but it scales to much larger memories. $Write$ is asserted to store data in the D flip-flops. $Read$ enables the output tri-state buffer in order to connect the single output line to $Mem\_data\_out\text{.}$ The address decoder is also used to enable the tri-state buffers to connect a memory cell to the output, $r_{3}r_{2}r_{1}r_{0}\text{.}$

This type of memory is called Static Random Access Memory (SRAM).

Static Random Access Memory (SRAM)

Memory that retains its stored values as long as power to the circuit is maintained.

A 1 MB memory requires a 20-bit address. This requires a $20 \times 2^{20}$ address decoder as shown in Figure 7.5.8.

Recall from Section 7.1.3 that an $n \times 2^{n}$ decoder requires $2^{n}$ AND gates. We can simplify the circuitry by organizing memory into a grid of rows and columns as shown in Figure 7.5.9.

Although two decoders are required, each requires $2^{n/2}$ AND gates, for a total of $2 \times 2^{n/2} = 2^{(n/2) + 1}$ AND gates for the decoders. Of course, memory cell access is slightly more complex, and some complexity is added in order to split the 20-bit address into two 10-bit portions.
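The savings are easy to check numerically. For the 1 MB memory with $n = 20$ address bits:

```python
# Decoder AND-gate counts for a 1 MB (2^20 cell) memory, n = 20 address bits
n = 20

single_decoder_gates = 2 ** n        # one n x 2^n decoder: 2^20 gates
row_col_gates = 2 * 2 ** (n // 2)    # two (n/2) x 2^(n/2) decoders: 2^11 gates

print(single_decoder_gates)   # 1048576
print(row_col_gates)          # 2048
```

The row/column organization cuts the decoder gate count by a factor of $512$ for this memory size.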

### Subsection 7.5.4 Dynamic Random Access Memory (DRAM)

Each bit in SRAM requires about six transistors for its implementation. A less expensive solution is found in Dynamic Random Access Memory (DRAM).

Dynamic Random Access Memory (DRAM)

Memory in which each bit value decays in time, even while power remains on. The bit values must be refreshed periodically.

A common implementation of DRAM is to store each bit value by charging a capacitor to one of two voltages. The circuit requires only one transistor to charge the capacitor, as shown in Figure 7.5.10, which shows only four bits in a single row.

When the “Row Address Select” line is asserted all the transistors in that row are turned on, thus connecting the respective capacitor to the Data Latch. The value stored in the capacitor, high voltage or low voltage, is stored in the Data Latch. There, it is available to be read from the memory. Since this action tends to discharge the capacitors, they must be refreshed from the values stored in the Data Latch.

When new data is to be stored in DRAM, the current values are first stored in the Data Latch, just as in a read operation. Then the appropriate changes are made in the Data Latch before the capacitors are refreshed.
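The read-then-refresh behavior of a DRAM row can be sketched in Python. This is a hypothetical model: capacitor charge is reduced to a single bit, and the destructive read followed by a write-back from the data latch mirrors the sequence described above.

```python
class DRAMRow:
    """Behavioral sketch of one DRAM row with a data latch."""
    def __init__(self, bits):
        self.caps = list(bits)   # capacitor states (1 = charged)

    def read_row(self):
        latch = list(self.caps)            # row is copied into the latch
        self.caps = [0] * len(self.caps)   # reading discharges the caps...
        self.caps = list(latch)            # ...so refresh them from the latch
        return latch

    def write_bit(self, i, value):
        latch = self.read_row()   # first read the row, as in a read operation
        latch[i] = value          # modify the latch
        self.caps = list(latch)   # refresh the caps with the new contents

row = DRAMRow([1, 0, 1, 1])
row.write_bit(1, 1)
print(row.read_row())   # [1, 1, 1, 1]
```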

These operations take more time than simply switching flip-flops, so DRAM is appreciably slower than SRAM. In addition, capacitors lose their charge over time, so each row of capacitors must be read and refreshed on the order of every 60 msec. This requires additional circuitry and further slows memory access. But the much lower cost of DRAM compared to SRAM warrants the slower access time.

This has been only an introduction to how switching transistors can be connected into circuits to create a CPU and memory. We leave the details to more advanced books, e.g., [8], [11], [12], [13], [14], and [16].