This document is a copy of a set of postings Gareth McCaughan made on comp.sys.acorn.programmer in September 1996. Apart from formatting changes, and changes to the text to reflect the changes to the formatting, this is the original articles just joined together.
This is the first of four parts of a long and boring article about sorting. It's just a brief summary of most of the most important sorting algorithms.
For a pretty full account of this sort of thing, read volume 3 of Knuth. I'm just going to sketch things.
In all that follows we have an array of N items, numbered 1..N or 0..N-1 (whichever is more convenient), and arranged sequentially. (Sometimes linked lists are more efficient; not usually.)
The items could be anything, and the criterion for being "less than" could be anything (more or less). Depending on what these are, it might be very expensive to compare two items, or it might be very expensive to move an item from one place to another. Or both :-(.
In measuring the efficiency of a sorting technique, it's usual to say something like "it takes time of order N^2". This means that for large enough N the amount of work done by the sorting algorithm looks like some constant times N^2, plus smaller things. This work is made up partly of comparisons and partly of actually moving data around; in different applications these will have different relative costs. (So no one set of figures will be good enough for all cases.)
Here's a little summary of each of several important methods. I'll have more to say later.
DON'T USE BUBBLE SORT. Everyone learns it because it's easy to write a bubble sort program, so it's become popular with the writers of textbooks. Unfortunately it's a very very bad algorithm. DON'T USE IT.
I'm certainly not going to tell you how it works. You don't need to know. DON'T USE IT.
This takes time of order N^2 in the worst case, because if the list is in reverse order then you have to do a lot of comparing and moving things around. (Work it out.) It's actually of order N^2 on average too.
Insertion sort is very easy to program, and because of its simplicity it works well when N is very small. For moderate N it is one of the worst plausible algorithms.
The trouble with insertion sort is that everything it does is "local", so that if there are a lot of things in completely the wrong place insertion behaves very badly. So, a guy called Shell invented the following algorithm:
If you're careful, the code for this can be very nearly as short as that for insertion sort; and if you choose your list of values for d carefully it can perform very much better.
No one is quite sure what the theoretical performance of shellsort is. If you choose things carefully, for typical data the time it takes seems to go like N^1.3 or something.
Mergesort takes time of order N log N if you do it right. Unfortunately, it also requires quite a lot of extra storage.
Quicksort is for most applications the fastest known sorting method. It requires a small amount of extra storage, to keep track of the recursion. Its worst-case performance is awful, but if you do things right this more or less never happens. For typical data it takes time of order N log N.
The algorithm for heapsort is more complicated than any of those I've discussed, and won't fit into the amount of space I'm willing to use. You can find it discussed in any number of textbooks.
Heapsort, like quicksort and mergesort, takes time of order N log N. Unlike quicksort it always takes this long (there isn't that awful worst case). Unfortunately, it's usually slower than quicksort by a moderate-sized constant factor (2 or thereabouts). It also requires no extra storage at all.
This only works when your data are numbers in a restricted range, say 0..K-1.
This is clearly blindingly fast: of order N. Unfortunately, it's not common for K to be small enough to do this.
This only works where your data are numbers in a restricted range, but the range can be much larger. Say the range is 0..K^r-1, where K is small enough for the trivial case.
This is very efficient, provided you have some way of adding numbers to bins without moving them around. Linked lists are good for this.
That concludes my brief outline of sorting methods. There are plenty more, but you are unlikely ever to need anything not in the list above.
This is the second of four parts of a long and boring article about sorting. It describes an experiment I did to get some timings for several sorting routines, and briefly summarises the results.
The obvious experiment, really. For each sorting method, and for a range of values of N, make up an array of items and sort them. However, the performance of a sorting method can depend heavily on how the items are arranged (for instance, a really stupid implementation of quicksort may do really badly if it happens to be given an array that's already sorted!), so I did this for a variety of datasets for each N.
All my data items were ordinary 4-byte integers. This was just to make my job a bit easier. It would be interesting to do the same thing again, using some other kinds of data: this might make a significant difference.
The sorting routines I used were as follows. Ones written so as to allow sorting things other than integers are emphasised. Of course they take some performance loss for this.
I'll say more about tuning quicksort later.
I didn't do a radix sort, because that really requires space to add link fields, and I wanted to work with simple arrays of integers.
The complete figures are in the next part of this article. Here's a summary of the most interesting points.
For sorting 300,000 completely random integers, the ranking was as follows. The numbers are timings, in seconds. Blank lines indicate big drops in performance.
Tuned quicksort 3.02 Naive quicksort 3.46 Generic quicksort 5.22 Heapsort 6.20 Mergesort 7.40 OS_HeapSort 9.52 Shellsort (good) 9.58 Shellsort (OKish) 10.08 qsort() 35.76 Shellsort (bad) 47.67 Insertion sort huge (on the order of 20000, at a guess)
This is the third of four parts of a long and boring article about sorting. It gives, in full gory horrible detail, the results of the experiment described in the second part.
Firstly, numbers obtained by running each sorting routine once on each sort of dataset. Small numbers should be taken with a pinch of salt for obvious reasons. All numbers are in seconds; blanks indicate huge times.
I'm afraid the figures for generic OS_HeapSort aren't here. That's because the experiment that produced this output didn't include that. Sorry. The results always seem to be within a few centiseconds of the non-generic OS_Heapsort anyway: see the previous part of this article.
100 300 1000 3000 10000 30000 100000 300000
------ ------ ------ ------ ------ ------ ------ ------
Insertion 0.00 0.01 0.11 1.16 14.09
Shellsort-2 0.00 0.01 0.01 0.06 0.34 1.52 9.36 47.67
Shellsort-2a 0.00 0.00 0.01 0.04 0.17 0.64 2.59 10.08
Shellsort-3 0.01 0.00 0.01 0.03 0.16 0.59 2.41 9.58
Mergesort 0.00 0.00 0.02 0.04 0.17 0.60 2.27 7.40
Quick(Naive) 0.00 0.00 0.00 0.02 0.08 0.29 1.06 3.46
Quick(M3,18I) 0.00 0.00 0.00 0.02 0.08 0.24 0.93 3.02
Quick(Gen.) 0.00 0.00 0.01 0.03 0.12 0.42 1.60 5.22
qsort() 0.00 0.00 0.04 0.13 0.54 2.08 8.79 35.76
OS_HeapSort 0.00 0.01 0.01 0.05 0.20 0.71 2.75 9.52
Heapsort 0.00 0.00 0.00 0.03 0.13 0.43 1.75 6.20
100 300 1000 3000 10000 30000 100000 300000
------ ------ ------ ------ ------ ------ ------ ------
Insertion 0.00 0.00 0.01 0.00 0.01 0.02 0.09 0.26
Shellsort-2 0.00 0.00 0.01 0.03 0.11 0.38 1.44 4.77
Shellsort-2a 0.00 0.00 0.01 0.02 0.10 0.35 1.35 4.53
Shellsort-3 0.00 0.00 0.00 0.01 0.06 0.23 0.89 2.96
Mergesort 0.00 0.01 0.01 0.02 0.10 0.34 1.25 4.03
Quick(Naive) 0.00 0.00 0.01 0.01 0.05 0.19 0.64 2.23
Quick(M3,18I) 0.00 0.00 0.00 0.01 0.04 0.14 0.57 1.82
Quick(Gen.) 0.00 0.00 0.01 0.02 0.08 0.26 1.03 3.27
qsort() 0.00 0.00 0.01 0.05 0.17 0.57 2.17 7.19
OS_HeapSort 0.00 0.00 0.02 0.05 0.20 0.64 2.40 7.86
Heapsort 0.00 0.01 0.00 0.02 0.11 0.38 1.38 4.52
100 300 1000 3000 10000 30000 100000 300000
------ ------ ------ ------ ------ ------ ------ ------
Insertion 0.01 0.02 0.23 2.40 28.36
Shellsort-2 0.00 0.01 0.01 0.03 0.13 0.44 1.68 5.58
Shellsort-2a 0.00 0.00 0.01 0.03 0.12 0.40 1.57 5.24
Shellsort-3 0.00 0.00 0.00 0.03 0.09 0.32 1.18 3.91
Mergesort 0.00 0.00 0.01 0.04 0.14 0.48 1.81 5.82
Quick(Naive) 0.00 0.01 0.00 0.02 0.06 0.20 0.67 2.31
Quick(M3,18I) 0.00 0.00 0.00 0.02 0.05 0.16 0.60 1.90
Quick(Gen.) 0.00 0.00 0.01 0.02 0.08 0.28 1.06 3.35
qsort() 0.00 0.00 0.02 0.08 0.29 1.01 3.69 12.11
OS_HeapSort 0.00 0.00 0.02 0.04 0.18 0.61 2.28 7.56
Heapsort 0.00 0.00 0.01 0.03 0.11 0.36 1.33 4.39
100 300 1000 3000 10000 30000 100000 300000
------ ------ ------ ------ ------ ------ ------ ------
Insertion 0.00 0.00 0.01 0.05 0.32 1.67 11.10 56.22
Shellsort-2 0.00 0.00 0.01 0.03 0.16 0.61 2.54 7.91
Shellsort-2a 0.00 0.00 0.00 0.03 0.13 0.46 1.74 5.93
Shellsort-3 0.00 0.01 0.00 0.02 0.10 0.34 1.35 4.56
Mergesort 0.00 0.00 0.01 0.04 0.14 0.50 1.83 5.91
Quick(Naive) 0.00 0.00 0.01 0.02 0.05 0.19 0.65 2.23
Quick(M3,18I) 0.00 0.00 0.00 0.01 0.04 0.15 0.58 1.83
Quick(Gen.) 0.00 0.00 0.01 0.02 0.07 0.26 1.03 3.27
qsort() 0.00 0.00 0.03 0.08 0.33 1.12 4.40 15.54
OS_HeapSort 0.00 0.01 0.01 0.05 0.19 0.65 2.41 7.88
Heapsort 0.00 0.00 0.00 0.03 0.11 0.38 1.38 4.52
100 300 1000 3000 10000 30000 100000 300000
------ ------ ------ ------ ------ ------ ------ ------
Insertion 0.00 0.00 0.00 0.01 0.08 0.40 2.40 12.37
Shellsort-2 0.00 0.00 0.01 0.03 0.13 0.46 1.83 6.38
Shellsort-2a 0.00 0.00 0.01 0.03 0.11 0.41 1.57 5.30
Shellsort-3 0.00 0.00 0.00 0.02 0.09 0.30 1.16 3.93
Mergesort 0.00 0.00 0.01 0.04 0.13 0.44 1.68 5.47
Quick(Naive) 0.00 0.00 0.01 0.02 0.07 0.23 0.88 2.87
Quick(M3,18I) 0.00 0.00 0.01 0.02 0.07 0.21 0.77 2.52
Quick(Gen.) 0.00 0.01 0.01 0.02 0.11 0.37 1.36 4.47
qsort() 0.00 0.00 0.02 0.07 0.27 0.92 3.61 12.36
OS_HeapSort 0.00 0.00 0.01 0.05 0.19 0.63 2.37 7.87
Heapsort 0.00 0.00 0.01 0.03 0.10 0.37 1.37 4.55
100 300 1000 3000 10000 30000 100000 300000
------ ------ ------ ------ ------ ------ ------ ------
Insertion 0.00 0.01 0.12 0.57 9.38
Shellsort-2 0.00 0.00 0.11 0.54 10.86
Shellsort-2a 0.00 0.00 0.01 0.05 0.21 0.89 4.27 17.62
Shellsort-3 0.01 0.00 0.01 0.03 0.13 0.43 1.77 5.83
Mergesort 0.00 0.00 0.01 0.04 0.17 0.59 2.03 7.10
Quick(Naive) 0.00 0.01 0.01 0.02 0.08 0.29 1.11 3.52
Quick(M3,18I) 0.00 0.01 0.01 0.01 0.07 0.25 1.56 3.07
Quick(Gen.) 0.00 0.01 0.01 0.03 0.12 0.43 2.82 6.04
qsort() 0.00 0.01 0.03 0.11 0.45 1.50 6.11 19.99
OS_HeapSort 0.00 0.00 0.01 0.06 0.22 0.76 2.99 10.35
Heapsort 0.00 0.01 0.01 0.03 0.13 0.47 1.96 6.94
Now, corresponding results for smaller datasets; these numbers are in centiseconds, and are a bit more reliable. Each routine was run enough times to give reasonable accuracy.
100 300 1000 3000
------ ------ ------ ------
Insertion 0.12 1.02 11.34 117.35
Shellsort-2 0.06 0.28 1.45 6.71
Shellsort-2a 0.05 0.21 0.96 3.94
Shellsort-3 0.05 0.18 0.85 3.43
Mergesort 0.09 0.33 1.30 4.67
Quick(Naive) 0.05 0.17 0.66 2.29
Quick(M3,18I) 0.04 0.13 0.54 1.91
Quick(Gen.) 0.06 0.23 0.94 3.27
qsort() 0.17 0.70 3.23 12.97
OS_HeapSort 0.09 0.33 1.36 5.07
Heapsort 0.05 0.18 0.77 3.06
100 300 1000 3000
------ ------ ------ ------
Insertion 0.01 0.02 0.08 0.26
Shellsort-2 0.04 0.16 0.71 2.80
Shellsort-2a 0.04 0.14 0.63 2.55
Shellsort-3 0.03 0.10 0.44 1.72
Mergesort 0.06 0.20 0.78 2.70
Quick(Naive) 0.03 0.10 0.44 1.34
Quick(M3,18I) 0.02 0.06 0.29 1.14
Quick(Gen.) 0.03 0.13 0.51 2.00
qsort() 0.07 0.27 1.16 4.27
OS_HeapSort 0.09 0.34 1.40 5.02
Heapsort 0.05 0.19 0.80 2.92
100 300 1000 3000
------ ------ ------ ------
Insertion 0.22 1.99 22.57 239.92
Shellsort-2 0.05 0.19 0.87 3.28
Shellsort-2a 0.04 0.17 0.78 3.04
Shellsort-3 0.04 0.14 0.61 2.32
Mergesort 0.08 0.28 1.05 3.80
Quick(Naive) 0.03 0.11 0.46 1.41
Quick(M3,18I) 0.02 0.07 0.31 1.22
Quick(Gen.) 0.03 0.14 0.54 2.09
qsort() 0.12 0.46 2.03 7.35
OS_HeapSort 0.08 0.30 1.27 4.60
Heapsort 0.04 0.17 0.72 2.67
100 300 1000 3000
------ ------ ------ ------
Insertion 0.04 0.17 0.97 5.43
Shellsort-2 0.06 0.22 0.95 3.76
Shellsort-2a 0.05 0.19 0.82 3.20
Shellsort-3 0.04 0.16 0.65 2.45
Mergesort 0.10 0.29 1.09 3.85
Quick(Naive) 0.04 0.12 0.45 1.37
Quick(M3,18I) 0.03 0.08 0.32 1.16
Quick(Gen.) 0.04 0.15 0.56 2.04
qsort() 0.14 0.52 2.24 8.18
OS_HeapSort 0.10 0.35 1.42 5.02
Heapsort 0.06 0.20 0.82 2.93
100 300 1000 3000
------ ------ ------ ------
Insertion 0.01 0.07 0.30 1.51
Shellsort-2 0.05 0.20 0.80 3.06
Shellsort-2a 0.04 0.18 0.69 2.78
Shellsort-3 0.03 0.15 0.55 2.26
Mergesort 0.08 0.26 0.99 3.46
Quick(Naive) 0.05 0.16 0.55 2.01
Quick(M3,18I) 0.03 0.12 0.43 1.53
Quick(Gen.) 0.05 0.20 0.80 2.89
qsort() 0.10 0.41 1.73 6.44
OS_HeapSort 0.10 0.36 1.36 4.78
Heapsort 0.05 0.20 0.80 2.89
100 300 1000 3000
------ ------ ------ ------
Insertion 0.11 1.00 11.20 115.92
Shellsort-2 0.15 1.14 11.75 116.92
Shellsort-2a 0.05 0.22 1.07 4.59
Shellsort-3 0.04 0.18 0.78 2.92
Mergesort 0.09 0.33 1.32 4.70
Quick(Naive) 0.05 0.16 0.68 2.36
Quick(M3,18I) 0.04 0.13 0.55 1.92
Quick(Gen.) 0.06 0.23 0.97 3.32
qsort() 0.16 0.69 2.90 10.26
OS_HeapSort 0.09 0.33 1.38 5.17
Heapsort 0.05 0.19 0.79 3.09
This is the fourth and last part of a long and boring article about sorting. It describes how to make a quicksort routine go faster.
There are two problems with quicksort. Firstly, and more importantly, if you choose your "pivot" x badly then you can lose badly: if you chop the list into one very small part and one very big part then you haven't gained much. In the worst case, you chop the list into parts of size 1 and N-1. Ouch.
Secondly, the bookkeeping involved in doing all the recursive stuff gets a bit expensive when you're sorting very short lists. It pays to "truncate" and either (i) sort short lists by insertion, or (ii) not sort short lists at all until your recursion is finished, and then do them all at once by doing a single insertion-sort pass over the whole dataset.
To deal with the first problem, there are two common approaches: you can choose a random element of your list as a pivot each time (ugh), or you can examine more than one element and take something in the middle. What my tuned quicksort does is to look at the first, last and middle elements of the sublist being sorted, and choose the median of those three (i.e., if they're A,B,C and A<=B<=C then use B.)
To deal with the second, it turns out that on my machine it's best to truncate when you get to lists of size 18 or smaller, and to do a single sorting pass at the end. (On machines with virtual memory it's sometimes better to do lots of small insertion sorts instead, to improve locality. On machines without VM, like the Risc PC, that doesn't matter and the deciding factor is the decrease in the amount of bookkeeping that results from just doing one insertion sort. I'm not sure what happens on machines with moderately large caches; this might also make the lots-of-little-insertion-sorts approach a win.)
The other issue with quicksort is stack overflow. If you're really unlucky in your choice of pivots (or if you choose them stupidly) then the recursion can get very deep, and this can be a problem. It's therefore safest to do the recursion explicitly, maintaining a stack of things-to-be-done. Then, if the stack is about to overflow, you can just not bother stacking the relevant thing; that means that a sublist won't get sorted. That's sort of OK, because it will get fixed by the insertion sort at the end. This means that really bad choice of pivots just makes the sort take a long time, rather than crashing your machine. If you make your stack a reasonable size (which needn't actually be very big) then overflow is very very very unlikely ever to occur, anyway.
Original article by Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics, Cambridge University, England. Comments about the content of this document should go to him.
Comments about the formatting and other aspects of the HTML should go to Steven Singer.