So far, we've been using general-purpose comparative sorting algorithms. Algorithms like merge sort and quicksort are staples for software developers because they can sort just about any list that has a concept of "bigger" or "smaller," but, because of this reliance on comparison, they are mathematically limited to an O(n lg n) running time.
Sometimes, however, our input list is already organized in a convenient way, or it's made of convenient elements.
In those cases, we can sometimes use non-comparative sorting algorithms to sort it more quickly!
Counting Sort is one of those algorithms. Counting Sort is a specialized tool designed to sort
a list of integers that are all less than some integer 'k'. For example, one could use it to sort the list 8 3 5 2 1
,
because all elements in that list are integers less than 9
, and we would call that a Counting Sort algorithm with k = 9
.
Counting Sort becomes more memory-heavy the larger k
is, so it's best to use it on lists that don't contain some values that are
way bigger than others (For example, Counting Sort might not be the best choice to sort the list 4 7 2 90032145
, because
k
would have to be 90032146
, which would be pretty expensive for such a small list).
How it Works
The goal of Counting Sort for a given input list A
is to create a new list C
, such that for every element e
in A
,
C[e]
is the number of items in A
that are less than or equal to e
. Yes, that's right---we're using e
as an index into
C
. That fact is actually the core of Counting Sort; by using elements as indexes, we can sort lists without comparing elements.
Once we have created C
, we can use it to create the sorted list: we know how many elements are equal to or less than a given element,
so we know where in the output list it should go.
Creating a Counting Sort Algorithm
Our algorithm will work in three steps.
We will loop throught
A
and count each element, placing that value in the corresponding position inC
. For example,C[e]
will contain the number of times thate
appears inA
.We will loop through
C
and add the value of each index to that of the index after it. After this,C[e]
will contain the number of elements that are less than or equal toe
.We will create the output list by looping through
A
and usingC
to decide where each element goes. For example, the elemente
would go to the positionC[e] - 1
in the output array.
After this, the output list will be sorted!
Writing the Code
Starting off, for a given list inputv
of length inputc
and a given k
, we're going to create a counting list C
of length
k
and an output list outputv
of length inputc
.
long * counting_sort(int inputc, long * inputv, int k) {
long * c = malloc(k * sizeof(long));
long * outputv = malloc(inputc * sizeof(long));
Next, we're going to do Step 1: counting each element in inputv
, taking the number of times it occurs in inputv
, and recording that value in C
.
for (int i = 0; i < inputc; i++) {
if (inputv[i] > k) continue;
// ignore elements greater than k
c[inputv[i]] += 1;
// count the element
}
If we did that right, then Step 2: for
each element e
in inputv
, the position C[e]
of C
should contain the number of times e
occurs in inputv
. Now,
we need to add to C[e]
until it contains the number of elements in inputv
that are equal or less to e
.
for (int i = 1; i < k; i++) c[i] += c[i - 1];
Now, we do Step 3: we take each element of inputv
, we use C
to find out where it should go in the sorted list, and
we put it into outputv
in that position!
for (int j = inputc - 1; j >= 0; j--) {
if (inputv[j] > k) continue;
// again, skip elements bigger than k
outputv[c[inputv[j]] - 1] = inputv[j];
// if c[e] is the number of elements equal
// or less than e, the final _index_ that
// e should end up at is c[e] - 1.
c[inputv[j]]--;
// decrement c[e], because there is now
// one fewer element less than or equal to `e`.
}
After this, outputv
is our answer: the sorted list.
free(inputv);
free(c);
return outputv;
}
Putting it All Together
Here's an example of Counting Sort in use, using my command-line helper code to run Counting Sort on terminal input, with a k-value of 10.
#include "../clilonglist.h" // my command-line input utility functions
#include <stdlib.h>
long * counting_sort(int inputc, long * inputv, int k) {
long * c = malloc(k * sizeof(long));
long * outputv = malloc(inputc * sizeof(long));
for (int i = 0; i < inputc; i++) {
if (inputv[i] > k) continue;
c[inputv[i]] += 1;
}
for (int i = 1; i < k; i++) c[i] += c[i - 1];
for (int j = inputc - 1; j >= 0; j--) {
if (inputv[j] > k) continue;
outputv[c[inputv[j]] - 1] = inputv[j];
c[inputv[j]]--;
}
free(inputv);
free(c);
return outputv;
}
int main(int argc, char * argv[]) {
int inputc = argc - 1;
long * inputv = extract_long_list(inputc, argv);
// use my utility functions to grab the
// command line arguments as an array of longs
inputv = counting_sort(inputc, inputv, 10);
print_long_list(inputc, inputv);
free(inputv);
}
Let's run it!
- ./counting_sort 4 2 7 4 9 1 3
1 2 3 4 4 7 9
Sorted!
Now, What is k?
In Counting Sort, k
is our constraint. Counting Sort tells us, "I can sort a list of integers very fast, but only if you
tell me what the biggest value I'll need to sort is," and that value is k - 1
. If you look at the code, you'll notice two things.
- The length of
C
isk
, so we literally cannot sort any values greater thank
. - The bigger
k
is, the biggerC
is, so we wantk
to be as small as possible.
As such, the programmer uses Counting Sort in situations where they can guarantee that not being able to sort items equal to or greater
than k
won't be an issue; for example, if they know that all of the items in the list are less than 300
, in which case they would
set k = 300
, or if they are willing to throw away values that are equal to or greater than k
(our algorithm, for instance, just ignores
items that are not less than k
). Usually, we can't just ignore values in a list, so we'll usually only use Counting Sort when we know that we'll
only need to run it on a list with a defined maximum possible value.
Conclusion
Counting Sort is a fun, niche algorithm that works very well in its defined territory. If one were to ask one million programmers to rank a given programming language on a scale of 1 to 100, what algorithm do you think we would use to sort the programmers from "likes it the least" to "likes it the most?"