Joey Thornberry
2024-09-28 20:43:18.512644831 UTC

Understanding Counting Sort

tl,dr: When you need to sort a list of integers that are all close together, use Counting Sort to get it done quickly!

So far, we've been using general-purpose comparative sorting algorithms. Algorithms like merge sort and quicksort are staples for software developers because they can sort just about any list that has a concept of "bigger" or "smaller," but, because of this reliance on comparison, they are mathematically limited to an O(n lg n) running time.

Sometimes, however, our input list is already organized in a convenient way, or it's made of convenient elements. In those cases, we can sometimes use non-comparative sorting algorithms to sort it more quickly! Counting Sort is one of those algorithms. Counting Sort is a specialized tool designed to sort a list of integers that are all less than some integer 'k'. For example, one could use it to sort the list 8 3 5 2 1, because all elements in that list are integers less than 9, and we would call that a Counting Sort algorithm with k = 9. Counting Sort becomes more memory-heavy the larger k is, so it's best to use it on lists that don't contain some values that are way bigger than others (For example, Counting Sort might not be the best choice to sort the list 4 7 2 90032145, because k would have to be 90032146, which would be pretty expensive for such a small list).

How it Works

The goal of Counting Sort for a given input list A is to create a new list C, such that for every element e in A, C[e] is the number of items in A that are less than or equal to e. Yes, that's right---we're using e as an index into C. That fact is actually the core of Counting Sort; by using elements as indexes, we can sort lists without comparing elements. Once we have created C, we can use it to create the sorted list: we know how many elements are equal to or less than a given element, so we know where in the output list it should go.

Creating a Counting Sort Algorithm

Our algorithm will work in three steps.

  1. We will loop throught A and count each element, placing that value in the corresponding position in C. For example, C[e] will contain the number of times that e appears in A.

  2. We will loop through C and add the value of each index to that of the index after it. After this, C[e] will contain the number of elements that are less than or equal to e.

  3. We will create the output list by looping through A and using C to decide where each element goes. For example, the element e would go to the position C[e] - 1 in the output array.

After this, the output list will be sorted!

Writing the Code

Starting off, for a given list inputv of length inputc and a given k, we're going to create a counting list C of length k and an output list outputv of length inputc.

long * counting_sort(int inputc, long * inputv, int k) {
    long * c = malloc(k * sizeof(long));
    long * outputv = malloc(inputc * sizeof(long));

Next, we're going to do Step 1: counting each element in inputv, taking the number of times it occurs in inputv, and recording that value in C.

    for (int i = 0; i < inputc; i++) {
        if (inputv[i] > k) continue;
        // ignore elements greater than k
        c[inputv[i]] += 1;
        // count the element
    }

If we did that right, then Step 2: for each element e in inputv, the position C[e] of C should contain the number of times e occurs in inputv. Now, we need to add to C[e] until it contains the number of elements in inputv that are equal or less to e.

    for (int i = 1; i < k; i++) c[i] += c[i - 1];

Now, we do Step 3: we take each element of inputv, we use C to find out where it should go in the sorted list, and we put it into outputv in that position!

    for (int j = inputc - 1; j >= 0; j--) {
        if (inputv[j] > k) continue; 
        // again, skip elements bigger than k
        outputv[c[inputv[j]] - 1] = inputv[j];
        // if c[e] is the number of elements equal
        // or less than e, the final _index_ that
        // e should end up at is c[e] - 1.
        c[inputv[j]]--;
        // decrement c[e], because there is now 
        // one fewer element less than or equal to `e`.
    }

After this, outputv is our answer: the sorted list.

    free(inputv);
    free(c);
    return outputv;
}

Putting it All Together

Here's an example of Counting Sort in use, using my command-line helper code to run Counting Sort on terminal input, with a k-value of 10.

#include "../clilonglist.h" // my command-line input utility functions
#include <stdlib.h>

long * counting_sort(int inputc, long * inputv, int k) {
    long * c = malloc(k * sizeof(long));
    long * outputv = malloc(inputc * sizeof(long));

    for (int i = 0; i < inputc; i++) {
        if (inputv[i] > k) continue;
        c[inputv[i]] += 1;
    }

    for (int i = 1; i < k; i++) c[i] += c[i - 1];

    for (int j = inputc - 1; j >= 0; j--) {
        if (inputv[j] > k) continue;
        outputv[c[inputv[j]] - 1] = inputv[j];
        c[inputv[j]]--;
    }

    free(inputv);
    free(c);
    return outputv;
}

int main(int argc, char * argv[]) {

    int inputc = argc - 1;
    long * inputv = extract_long_list(inputc, argv);
    // use my utility functions to grab the 
    // command line arguments as an array of longs
    
    inputv = counting_sort(inputc, inputv, 10);
    print_long_list(inputc, inputv);

    free(inputv);
}

Let's run it!

Sorted!

Now, What is k?

In Counting Sort, k is our constraint. Counting Sort tells us, "I can sort a list of integers very fast, but only if you tell me what the biggest value I'll need to sort is," and that value is k - 1. If you look at the code, you'll notice two things.

  1. The length of C is k, so we literally cannot sort any values greater than k.
  2. The bigger k is, the bigger C is, so we want k to be as small as possible.

As such, the programmer uses Counting Sort in situations where they can guarantee that not being able to sort items equal to or greater than k won't be an issue; for example, if they know that all of the items in the list are less than 300, in which case they would set k = 300, or if they are willing to throw away values that are equal to or greater than k (our algorithm, for instance, just ignores items that are not less than k). Usually, we can't just ignore values in a list, so we'll usually only use Counting Sort when we know that we'll only need to run it on a list with a defined maximum possible value.

Conclusion

Counting Sort is a fun, niche algorithm that works very well in its defined territory. If one were to ask one million programmers to rank a given programming language on a scale of 1 to 100, what algorithm do you think we would use to sort the programmers from "likes it the least" to "likes it the most?"