Tangible Computing
6. Strings and Arrays


One common type of data we want to deal with is text. Text such as "Hello World!" is called a string. Each letter in that string, as well as the space and the exclamation point, is called a character. In C, a character variable has the type char, and a string variable is an array of chars:
char c = 'X';
char h[6] = "Hello";
char w[] = "World!";
The first line declares a variable named c of type char whose value is initialized to 'X'. In C, single quotes surround character values, while double quotes surround string values. The second line declares a variable named h as a string with room for six characters, initialized to the letters "Hello". How many characters are in the string "Hello"?

The array that holds a C string must always have room for one character more than the string's length! This last character holds a special null character, '\0', which marks where the string ends.

The last line declares a variable w as a string initialized to "World!". Since no size is given, the compiler uses the initial value to compute it. How many characters does "World!" have? So how many characters of storage does w have?
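If you want to check your answers, the sketch below (standard C, assumed to be compiled on a desktop machine rather than an Arduino) repeats the declarations of h and w. The sizeof operator reports how many chars of storage an array occupies, while strlen(), from the standard <string.h> header, counts only the characters before the '\0'.

```c
#include <string.h>  /* strlen */

char h[6] = "Hello";  /* 5 letters + the '\0' = 6 chars of storage */
char w[] = "World!";  /* no size given: the compiler counts 6 letters + '\0' */

/* For these declarations:
     sizeof h == 6    strlen(h) == 5
     sizeof w == 7    strlen(w) == 6
   and w[6] is the terminating '\0'. */
```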

6.1 Internal Representation

All data in a computer must be represented as bits. How do we represent characters as bits? We number them. ASCII is one system for numbering the keyboard characters. For example, in ASCII 'A' is given the number 65, 'B' is given the number 66, and so on. The end-of-string character '\0' is given the number 0. A char is just a number in C: you can add 1 to it, compare it with <=, and apply bitwise operators such as & and ~ to it. The main difference between a char and an int is size: a char is only 8 bits, while an int is wider (16 bits on the Arduino Uno, typically 32 bits on a desktop computer).
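To see that a char really is just a number, here is a small sketch in standard C. It assumes ASCII, where the letters are numbered consecutively and each lowercase letter is exactly 32 (a single bit, 0x20) above its uppercase partner. The helper names are made up for this illustration; for real programs the standard <ctype.h> header provides isupper(), tolower(), and toupper().

```c
/* Assumes ASCII: 'A'..'Z' and 'a'..'z' are consecutive runs, and each
   lowercase letter is 32 (0x20) more than its uppercase partner. */

/* The character after c, e.g. 'A' -> 'B'. Plain arithmetic. */
char next_char(char c)
{
  return (char) (c + 1);
}

/* 1 if c is 'A'..'Z', else 0. Plain numeric comparisons. */
int is_upper(char c)
{
  return c >= 'A' && c <= 'Z';
}

/* 1 if c is 'a'..'z', else 0. */
int is_lower(char c)
{
  return c >= 'a' && c <= 'z';
}

/* Uppercase letter -> lowercase by setting bit 5 with |.
   Other characters are returned unchanged. */
char to_lower(char c)
{
  return is_upper(c) ? (char) (c | 0x20) : c;
}

/* Lowercase letter -> uppercase by clearing bit 5 with & and ~.
   Other characters are returned unchanged. */
char to_upper(char c)
{
  return is_lower(c) ? (char) (c & ~0x20) : c;
}
```

For example, 'H' is 72 (0x48); setting bit 5 gives 104 (0x68), which is 'h'.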

What about strings? A string is a sequence of characters stored consecutively in memory. You can access individual characters of the sequence by indexing with square brackets:
if (h[0] == 'H') {
     /* Checks if the first character is 'H'. */
}

if (h[1] == 'H') {
     /* Checks if the second character is 'H'. */
}

for(int i=0; h[i]; i++) {
     /* Have i index every character in h. */
}
Notice that indexing starts at 0! This will take some getting used to, and you will undoubtedly forget it. So let me say it again: indexing starts at 0! Now look at the final loop. It starts i at zero and increments it until h[i] is false, which means the character is zero, which, recall, is the end-of-string character '\0'. So this loop visits every character in h.
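The same loop is all you need to measure a string or search it. Here is a sketch in standard C; the function names are made up for illustration, and string_length() does the same job as the standard strlen() in <string.h>.

```c
/* Count the characters before the '\0', using the loop from the text. */
int string_length(const char s[])
{
  int i;
  for (i = 0; s[i]; i++) {
    /* Nothing to do: just move i along until s[i] is '\0'. */
  }
  return i;
}

/* Count how many times the character c appears in s. */
int count_char(const char s[], char c)
{
  int count = 0;
  for (int i = 0; s[i]; i++) {
    if (s[i] == c) {
      count += 1;
    }
  }
  return count;
}
```

Note that count_char("Hello", 'h') is 0: 'h' and 'H' are different characters.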

6.2 String Example

code/readSerial/readSerial.ino

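    /* Forward declarations (the Arduino IDE normally generates these for you). */
    int32_t readLong();
    void readLine(char *s, uint8_t maxlen);

    void setup() {
      Serial.begin(9600);
    }

    void loop() {
      // Type in a number
      int32_t number = readLong();

      Serial.println(number);
    }

    /* Read a long off of the serial port and return it. */
    int32_t readLong()
    {
      char s[128]; /* 128 characters should be more than enough. */
      readLine(s, 128);
      return atol(s);
    }

    /* Read a line off of the serial port into s. At most maxlen-1
       characters are stored, followed by the terminating '\0', so s
       must have room for maxlen characters. The '\n' is not stored. */
    void readLine(char *s, uint8_t maxlen)
    {
      uint8_t i = 0;

      while (i < maxlen - 1) {              /* Leave room for the '\0'. */
        while (Serial.available() == 0) { } /* Wait for a character. */
        char c = Serial.read();
        if (c == '\0' || c == '\n') break;  /* End of the line. */
        s[i] = c;
        i += 1;
      }
      s[i] = '\0';                          /* Terminate the string. */
    }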


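Look at the code for the functions readLine() and readLong(). Can you follow what they are doing? Note that atol() is a C standard library function (declared in <stdlib.h>) that converts a string representation of a number into a long. Similarly, atoi() converts a string into an int. Since an int on the Arduino Uno is only 16 bits, atol() is the better choice for numbers that might not fit in an int.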
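The loop in readLine() can also be tried out on a desktop computer, where there is no serial port. In this sketch a string stands in for the serial port: a pointer into it is advanced one character at a time, much as each call to Serial.read() removes one character. The names read_line_from() and read_long_from() are hypothetical, made up for this illustration.

```c
#include <stdlib.h>  /* atol */

/* Copy one line from *input into s: at most maxlen-1 characters, then
   the terminating '\0'. The '\n' ending the line is consumed but not
   stored. *input is advanced past everything that was read. */
void read_line_from(const char **input, char *s, int maxlen)
{
  int i = 0;

  while (i < maxlen - 1) {      /* leave room for the '\0' */
    char c = **input;           /* peek at the next character */
    if (c == '\0') break;       /* input exhausted */
    (*input)++;                 /* consume it */
    if (c == '\n') break;       /* end of the line */
    s[i] = c;
    i += 1;
  }
  s[i] = '\0';                  /* always terminate the string */
}

/* Read one line and convert it to a long, like readLong(). */
long read_long_from(const char **input)
{
  char s[128];                  /* up to 127 characters plus '\0' */
  read_line_from(input, s, 128);
  return atol(s);
}
```

For example, with const char *in = "123\n-45\n";, two calls to read_long_from(&in) return 123 and then -45. Because maxlen caps how much is copied, a line longer than the buffer is cut short instead of overrunning s.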

Call by Reference
Up until now, all function arguments have been call by value: the parameters receive copies of the values passed by the caller, so changing them inside the function does not affect the caller's variables. Arrays, and therefore strings, behave differently. When you pass an array to a function, C does not copy its contents; instead the function receives the address of the array's first element. The effect is call by reference: changes made to the characters of the string directly change the caller's string. This is what it means for readLine to take char * as the type of s: s points at the caller's characters, and inside the function it is treated as a string of unknown length. This is how readLine "returns" a string read off of the serial port, even though its return type is void.
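The difference is easy to see side by side. This sketch in standard C (the function names are made up for illustration) changes an int parameter, which has no effect on the caller, and changes the characters of a string parameter, which does.

```c
#include <ctype.h>  /* toupper */

/* int parameters are call by value: x is a copy, so this
   assignment changes only the copy, never the caller's variable. */
void set_to_zero(int x)
{
  x = 0;
}

/* An array parameter receives the address of the caller's characters,
   so writing s[i] changes the caller's string in place. */
void shout(char *s)
{
  for (int i = 0; s[i]; i++) {
    s[i] = (char) toupper((unsigned char) s[i]);
  }
}
```

After int n = 5; set_to_zero(n);, n is still 5, but after char word[] = "hello"; shout(word);, word holds "HELLO". Pass shout() a char array like word rather than a string literal such as "hello" directly: modifying a string literal is undefined behavior in C.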

6.3 Arrays

Strings are one example of the more general concept of an array data structure. Arrays are sequences of any data type stored consecutively.
int numbers[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
int primes[100];
This declares an array of 10 integers named numbers, initialized with the listed values. It also declares an array of 100 integers named primes, which is not initialized: if primes is declared inside a function, its values may be anything (arrays declared outside of any function start out as all zeros). Unlike strings, arrays are not terminated by a special value. If you want 10 integers, you declare an array of integers of length 10, not 11, and you can safely access the 10 indices 0 through 9. But as with strings, you can index an array to get the value at a particular index or to assign a new value there.
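Here is a sketch in standard C of reading and assigning array elements by index. The function names are made up for illustration. Because an array carries no terminator, the loops cannot stop at a special value the way the string loop does, so each function takes the number of elements as a separate parameter.

```c
/* Assign the squares 0, 1, 4, 9, ... to a[0] through a[size-1]. */
void fill_squares(int a[], int size)
{
  for (int i = 0; i < size; i++) {
    a[i] = i * i;       /* assign to index i */
  }
}

/* Add up a[0] through a[size-1]. */
int sum(const int a[], int size)
{
  int total = 0;
  for (int i = 0; i < size; i++) {
    total += a[i];      /* read index i */
  }
  return total;
}
```

With the numbers array above, sum(numbers, 10) is 45.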

Length
C does not keep track of an array's (or string's) length for you. It never checks whether you are accessing a valid index, and when an array is passed to a function, the function receives only the address of its first element, with no record of how long it is. It is therefore a good idea to store the size of the array in another integer, say size, to keep track of it. The last index you can safely access is then size-1, since 0 is the first index. For example, the following statements all compile without complaint, although some of them will likely have very unexpected consequences.
int nums[10];

nums[0] = 1; /* Changes the first value in the array to 1 */
nums[1] = 2; /* Changes the second value in the array to 2 */
nums[10] = 11; /* Changes the value AFTER the last value of the array to 11 */
nums[-1] = 0; /* Changes the value BEFORE the first value of the array to 0 */
Changing values outside the declared length of your array is a very, very bad idea. Other values are stored there, and you are overwriting them. You may even be changing which code the program executes next. In many cases your program will crash, ceasing to run, with no error message explaining why. So be very careful when indexing into an array, and always be certain that your indices lie between 0 and the array's declared length minus 1. In fact, one of the most common security vulnerabilities in code is the buffer overrun, where a user is allowed to give input longer than the declared string, allowing other values in memory to be changed, possibly changing what code gets executed and taking over the program and the computer. Even good programmers make buffer overrun mistakes.
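One defensive habit is to route writes through a function that checks the index against the stored size before touching memory. The sketch below, in standard C, is one way to do it; ARRAY_LENGTH and safe_set are hypothetical names made up for this illustration, not part of any standard library.

```c
/* Number of elements in an array. This only works on a real array in
   scope; on a pointer parameter such as int a[] inside a function,
   sizeof gives the size of the pointer instead. */
#define ARRAY_LENGTH(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Store value at a[index] only if 0 <= index < size.
   Returns 1 on success, or 0 (writing nothing) if index is out of range. */
int safe_set(int a[], int size, int index, int value)
{
  if (index < 0 || index >= size) {
    return 0;
  }
  a[index] = value;
  return 1;
}
```

With int nums[10];, ARRAY_LENGTH(nums) is 10, safe_set(nums, 10, 0, 1) stores 1 and returns 1, and safe_set(nums, 10, 10, 11) returns 0 without writing anything.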
Tangible Computing / Version 3.20 2013-03-25