Section 2.15 Low-level Character Handling

In Section 2.9 we used the printf and scanf functions from the C Standard Library to convert between C data types and single characters written on the screen or read from the keyboard. In this section, we introduce the two system call functions, write and read. We will use the write function to send bytes to the screen and the read function to get bytes from the keyboard.

When these low-level functions are used, it is the programmer's responsibility to convert between the individual characters and the C/C++ data type storage formats. Although this clearly requires more programming effort, we will use them instead of printf and scanf for most of the programs in this book in order to better illustrate data storage formats.

We start with the C program in Listing 2.15.1, which shows how to display the character 'A' on the screen.

/* oneChar.c
 * Writes a single character on the screen.
 * 2017-09-29: Bob Plantz
 */

#include <unistd.h>

int main(void)
{
  char aLetter = 'A';
  write(STDOUT_FILENO, &aLetter, 1); // STDOUT_FILENO is
                                     // defined in unistd.h
  return 0;
}

Listing 2.15.1.

This program allocates one byte of memory as a char variable and names it aLetter. This byte is initialized to the bit pattern 0x41 ('A' from Table 2.13.1). The write function is invoked to display the character on the screen. The arguments to write are:

STDOUT_FILENO is defined in the system header file, unistd.h. It is the GNU/Linux file descriptor for standard out (usually the screen). GNU/Linux sees all devices as files. When a program is started the operating system opens a path to standard out and assigns it file descriptor number 1.
&aLetter is a memory address. The sequence of one-byte bit patterns starting at this address will be sent to standard out by the write function.
1 (one) is the number of bytes that will be sent (to standard out) as a result of this call to write.

The program returns 0 to the operating system.

File descriptor symbolic names for terminal I/O can be found in the man page for stdio. Here is a summary:

STDIN_FILENO = 0. This is used to read from the keyboard.
STDOUT_FILENO = 1. This is where normal screen text is written.
STDERR_FILENO = 2. This is used for writing error messages to the screen.

It is much better to use the symbolic names than the integers. This provides some implicit documentation for your code and keeps it from depending on the numeric values, even though POSIX fixes them at 0, 1, and 2 and they are unlikely ever to change. You should read the man page for further information.

Now let's consider a program that echoes each character entered from the keyboard.

/* echoChar1.c
 * Echoes a character entered by the user.
 * 2017-09-29: Bob Plantz
 */

#include <unistd.h>

int main(void)
{
  char aLetter;

  write(STDOUT_FILENO, "Enter one character: ", 21); // prompt user
  read(STDIN_FILENO, &aLetter, 1);                   // one character
  write(STDOUT_FILENO, "You entered: ", 13);         // message
  write(STDOUT_FILENO, &aLetter, 1);                 // echo character
      
  return 0;
}

Listing 2.15.2.

As in Listing 2.15.1, we allocate one char variable to hold the character. In Listing 2.15.2 we also need to read the character that the user types on the keyboard, which is accomplished by calling the read function. Since we wish to read from the keyboard, we read from STDIN_FILENO.

The read function will store the keyboard character in memory, so we need to pass it the address of the place to store it. We get this address by applying the “address-of” operator, &, to the variable: &aLetter. The third argument, 1, is the maximum number of bytes to read.

A run of this program gave:

pi@rpi3:~/chp02 $ ./echoChar1
Enter one character: a
You entered: api@rpi3:~/chp02 $
pi@rpi3:~/chp02 $

which probably looks like the program is not working correctly.

Look more carefully at the program behavior. It illustrates some important issues when using the read function. First, how many keys did the user hit? There were actually two keystrokes, the ‘a’ key and the return key. In fact, the program waits until the user hits the return key. The user could have used the delete key to change the character before hitting the return key.

Next, the program correctly echoes the first key hit, then terminates. Upon program termination the shell prompt, pi@rpi3:~/chp02 $, is displayed. But the return character is still in the input buffer, and the shell program reads it. The result is the same as if the user had simply pressed the return key in response to the shell prompt.

Here is another run where I entered three characters before hitting the return key:

pi@rpi3:~/chp02 $ ./echoChar1
Enter one character: abc
You entered: api@rpi3:~/chp02 $ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
quit
pi@rpi3:~/chp02 $

Again, the program correctly echoes the first character, but the two characters bc remain in the input line buffer. When echoChar1 terminates, the shell program reads the remaining characters from the line buffer and interprets them as a command. In this case, bc is a program that I installed on my Raspberry Pi, so the shell executes that program. (If you have installed bc and wish to learn about the program, use “man bc”.)

An important point of the program in Listing 2.15.2 is to illustrate the simplistic behavior of the write and read functions. They work at a very low level. It is your responsibility to design your program to interpret each byte that is written to the screen or read from the keyboard.