In this post, we’ll cover the use of the Python matplotlib package to animate several traditional sorting algorithms. While I’ll briefly touch on the sorting algorithms in question, there’s no shortage of resources and tutorials on these topics floating around the internet, so the purpose of this post is not necessarily to delve into the algorithms themselves, but to focus on Python and matplotlibspecific implementation details, supplemented with a bit of discussion on algorithm complexity. As always, let’s begin with the end result:
Time complexity
The performance of an algorithm is generally measured by its time complexity, which is often expressed in Big O notation (not to be confused with The Big O, an anime featuring a giant robot and a catchy theme song that I find myself whistling whenever reading about algorithmic complexity). Big O notation tells us the worstcase runtime of an algorithm that has \(n\) inputs. For example, an algorithm that takes the same amount of time regardless of the number of the inputs is said to have constant, or \(O(1)\), complexity, whereas an algorithm whose runtime increases quadratically with the number of inputs has a complexity of \(O(n^2)\), and so on.
In this post, we’ll quantify the time complexity with the number of operations performed, where I’ve defined “operations” to include swaps (exchanging the position of one element with that of another element in an array) and/or comparisons (checking if one element of an array is larger than or smaller than another element). This metric will give us an idea of how each sorting algorithm scales as the number of elements that must be sorted increases.
The code and the algorithms
The code is
available on my Github. Download sorts.py
and follow along. Note that this
script takes advantage of Python’s yield from
feature, or generator
delegation, which was
introduced by PEP 380 in Python 3.3. Therefore, you’ll need Python 3.3
or higher for this script to work.
First, the imports:


The random
and time
modules will be used to generate a random array of
numbers to actually demonstrate the sorting algorithms (we’ll get the
current time from time
to seed the random number generator). The matplotlib
pyplot
and animation
modules will allow us to animate the algorithms.


We’ll also define a helper function to swap the positions of two elements in an array. Note that we take advantage of Python’s assignment rules and evaluation order for assignments to assign both elements on one line.
The subsequent subsections will detail the implementations of the different sorting algorithms. For the purposes of this post, the goal is to sort an array of integers in ascending order (smallest value to largest value).
Bubble sort
In bubble sort, each element is compared to the next element. During each iteration, if the next element sorts before the current element (i.e., if the next element is greater than the current element), the two are swapped. This is performed for all elements of the array, causing large elements to “bubble” to the end of the array. It looks something like this:
This process (which involves \(n1\) comparisons on the first iteration, with one less comparison/swap on each subsequent iteration) is repeated \(n1\) times until all the elements are sorted, producing a time complexity of \(O(n^2)\). The function used to create the animation in the video above is as follows:


On each iteration of the outer loop, we check to see if at least one swap was
performed on the previous iteration using the variable swapped
. If no swaps
were performed, that means the array has been sorted. Also note that we use the
yield
statement to turn this function into a generator. A generator is an
iterator, essentially a “frozen function” that remembers the state
of its variables between calls. In this case, the generator stops when the
yield
statement is encountered and returns a reference to the list A
. The
next time the generator is called, it runs through the loop(s) and stops again
at the yield
statement. Another thing to keep in mind is that, in Python,
lists (like A
in the code above) are passed by reference, not by value. So,
when we modify the input list A
by swapping elements, we’re modifying
the original list, not a copy of the list. Thus, it’s not strictly
necessary for us to return A
with yield A
each time—we could simply
yield
. However, it’s useful because it allows us to call the generator
from a function that may not have access to the original list A
.
The exact time complexity of the algorithm can be found from the sequence
\[(n1) + (n2) + \dots + (n(n1))\]
which represents the number of operations performed at each iteration as a
function of the number of elements \(n\) in the list (array) A
. The sequence
can be written as the following sum:
\[T_n = \sum_{i=1}^{n1} (ni)\]
where \(T_n\) is the time (or number of operations) required to sort \(n\) elements. The sum above can be rewritten as two sums:
\[T_n = \sum_{i=1}^{n1}n\  \sum_{i=1}^{n1}i\]
The first sum in the equation above simply evaluates to \(n(n1)\). The second sum is a triangular number sum, which turns out to be \(\frac{1}{2}n(n1)\). Putting these together:
\[T_n = n(n1)\  \frac{n(n1)}{2} = \frac{n(n1)}{2} = \frac{n^2  n}{2}\]
which is where \(O(n^2)\) comes from (keeping only the highest order term and ignoring coefficients). This can be seen in the video above, where bubble sort took 43 operations (swaps) to sort 10 elements and 4291 operations (swaps) to sort 100 elements—both of which match the theoretical worstcase values of \(T_{10} = \frac{1}{2}(10)(101) = 45\) and \(T_{100} = \frac{1}{2}(100)(1001) = 4950\), respectively. The actual numbers can sometimes be lower than the theoretical values because the Big O complexity provides an upper bound on the complexity, and the array may end up being sorted before the maximum number of iterations has been completed.
Insertion sort
Insertion sort works by iterating through the array once, in the process effectively dividing the array into a sorted portion and an unsorted portion. On each iteration, it inserts the current element from the unsorted portion into the correct position in the sorted portion via a series of swaps. Hopefully, a visual helps (it is, after all, the entire point of this post).
The code, shown below, is relatively simple. Like bubble sort, it only contains
one yield
statement, which is encountered after each swap.


What's the time complexity of this algorithm? Up to one swap is performed on the first iteration, up to two on the second, and so on, giving us the following:
\[T_n = \sum_{i=1}^{n1}i = 1 + 2 + \dots + (n1)\]
As with bubble sort, this sums to \(\frac{1}{2}(n^2n)\), for a complexity of \(O(n^2)\).
Merge sort
In merge sort, the array to be sorted is subdivided into two subarrays. Each of those subarrays is sorted by recursively calling the merge sort algorithm on them, after which the final sorted array is produced by merging the two (now sorted) subarrays. Here it is visually:
The following code for merge sort is a little more interesting than the last
several functions because it relies on generator delegation, a relatively new
feature in Python that allows you to easily call and exhaust a generator from
within another generator via yield from
.


Observe that, given an array, we split it down the middle into two subarrays,
call mergesort()
on each of them, then call a helper function merge()
to
combine the results of the two sorted subarrays. Each call to mergesort()
yields from additional calls to mergesort()
, then yields from a call to the
helper function, and finally yields the sorted array. It might help to see what
merge()
is doing.


In merge()
, we're creating a new array (merged
), then appending the next
smallest number from the correct subarray to merged
until all the elements
from one of the subarrays have been appended to the new array. Then, we iterate
through the remaining subarray and add all its elements to merged
, after which
the values are copied from the temporary array to the original array A
.
In other words, the original array is first divided into two arrays, after which each subarray is divided into two arrays, and so on. For example, if our original array had 8 elements, it would be divided into two subarrays with 4 elements each, each of which would be divided into subarrays of 2 elements, and finally 1 element, like this:
8
4, 4
2, 2, 2, 2
1, 1, 1, 1, 1, 1, 1, 1
An array with 1 element is already sorted, so no work needs to be done on the last (bottom) "layer." That leaves three layers containing arrays that must be sorted. There's a pattern here—because we divide each (sub)array in half each time, we end up with \(\mathrm{log}_2 n\) layers, each of which contains \(n\) elements that must be sorted (one array in the top layer containing \(n=8\) elements, two arrays in the second layer each containing \(\frac{1}{2}n=4\) elements, and four arrays in the third layer each containing \(\frac{1}{4}n=2\) elements). Thus, this algorithm requires \(n\mathrm{log}_2 n\) operations and is said to have a worstcase time complexity of \(O(n\mathrm{log}n)\), termed quasilinear time complexity, which is a significant improvement over bubble sort and insertion sort.
Quicksort
Like merge sort, quicksort is also usually implemented as a recursive algorithm that divides an array into smaller subarrays (or subsegments of the original array) and sorts each subarray. Unlike merge sort, it achieves this by selecting one element from the unsorted array, called the "pivot," then swapping values such that all elements smaller than the pivot value are on one side of the array and all elements larger than the pivot are on the other side. This process is known as "partitioning." Quicksort is then applied again to each side. Here it is animated:
Here's the code for this implementation of quicksort.


As both the code above and the code for merge sort have shown, a generator can
contain any number and combination of yield
statements and yield from
statements.
The time complexity of quicksort is, well, a little more complex than the previous examples. In the worstcase scenario, the pivot ends up being the largest (or smallest) value in the array, in which case partitioning hasn't helped and, for an array or subarray of length \(n\), has left us with one subarray of length \(1\) and one of length \(n1\) instead of two subarrays each containing approximately \(\frac{1}{2}n\) elements. Thus, the worstcase time complexity for quicksort is \(O(n^2)\). However, this depends on how the pivot is chosen. It turns out that the averagecase time complexity, in which partitioning roughly splits each array or subarray in half like merge sort, is \(\Theta(n\mathrm{log}n)\) ("Big theta" notation, which uses \(\Theta\), is often used to describe the averagecase complexity of an algorithm).
Selection sort
We now arrive at selection sort, the last sorting algorithm we'll consider in this post. Selection sort is similar to insertion sort in that there's a sorted portion of the array and an unsorted portion. However, in selection sort, the next element to add to the sorted portion is determined by finding the minimum value from the unsorted portion, which looks like this:
And here is the code:


On the first iteration, the algorithm searches through \(n\) elements to find the minimum, then swaps it with the first element. On the second iteration, it searches through \(n1\) elements to find the minimum, then swaps it with the second, and so on for the remaining iterations. Counting each lookup/comparison to find the minimum as one operation and each swap as one operation, we can express the total number of operations with the following sum.
\[T_n = \sum_{i=1}^{n1} (ni+1) + 1\]
Rewriting this as two sums and using the results from earlier in this post:
\[\sum_{i=1}^{n1}(ni) + \sum_{i=1}^{n1}2 = \frac{n^2n}{2} + 2(n1)\]
Rewriting and combining terms:
\[T_n = \frac{1}{2}n^2 + \frac{3}{2}n  2 = \frac{n^2 + 3n  4}{2}\]
Thus, this algorithm also has a time complexity of \(O(n^2)\).
Animating the algorithms with matplotlib
Finally, this brings us to the fun part: animating the algorithms. For brevity's
sake, I'll skip the part of the script that creates and randomizes a list A
of
N
unique, consecutive integers. Instead, let's jump to the part that
instantiates one of the generator functions defined above based on user input:


Having obtained the appropriate generator, let's create a matplotlib figure and axis to serve as the canvas for our animation.


fig
and ax
are handles to our plot figure and axis, respectively, allowing
us to manipulate them in an objectoriented manner. Next, we'll initialize a bar
plot (incidentally, "bar plot" sounds like a scheme hatched at a pub by a couple
of friends who've had a few too many shots of tequila) in which each bar will
correspond to one element of the list, and the height of a given bar will
correspond to its value; the bar corresponding to the integer 7, for example,
will have a height of 7 units measured along the y axis.


This is achieved by calling bar()
on the axis. The argument align="edge"
causes the left side of the bars to line up with the xaxis indices (instead of
the middle of the bars being aligned with the xaxis indices). The method
returns a list of
matplotlib Rectangle objects, which we store in bar_rects
.
These Rectangle
objects contain attributes like the height and width of each
bar (rectangle).
Next, we create a text label on the axis to display the number of operations.


The first two arguments to text()
are the coordinates of the text label origin
as a decimal fraction of the lengths of the x and y axes, respectively. The
third argument is the text to display, which I've initialized to nothing via
""
. The keyword argument transform=ax.transAxes
specifies that the
coordinates given by the first two arguments should be interpreted as axis
fractions rather than data coordinates.
Next, we define a function, update_fig()
, which, as the name suggests, will be
used to update the figure for each frame of the animation. Specifically, this
function will be passed to
matplotlib.animation.FuncAnimation(), the matplotlib animation
module class that calls it to update the figure.


Observe that I've defined update_fig()
to accept the list A
as an input,
which it uses to update the height of each bar (rectangle) using its
set_height()
method. Also observe that, though update_fig()
receives no
reference to the plot axis the way we've defined it, it does take rects
, which
is a reference to a list of bar plot rectangles.
Furthermore, notice how we define a list of one element (0
) called
iteration
. This will be passed to update_fig()
as well. Its single element
will be used to keep track of the number of iterations, or operations, that the
algorithm has performed up to the current frame. Why a list with one element?
Because a simple integer like iteration = 0
cannot be passed by reference,
only by value, unlike a list, which is passed by reference—thus, we can
preserve the number of iterations (number of operations) between frames. If this
were C++, we could simply pass iteration
by reference or via a pointer, but
this is not possible in Python. We could have achieved the same result in Python
in other ways, as well (like use of the global
or nonlocal
statements).


Finally, we instantiate the FuncAnimation
object. We pass it the generator
function corresponding to the algorithm, conveniently named generator
, as the
source for the frames via the frames
keyword argument. Each frame of the
animation corresponds to one iteration of the generator, i.e., to one yield
statement in the generator function. After each yield
, the animation object
passes the value returned by the generator (the list A
from yield A
) to
update_fig()
. Optionally, the animation object can also pass other arguments
to update_fig()
, which can be specified via the fargs
keyword argument. In
this case, I've opted to pass bar_rects
and iteration
so update_fig()
can
actually update the bar plot rectangles and the text label.
On the last line, we call plt.show()
to show the plot and let the animation do
its thing.
Final notes
The implementations of the algorithms discussed in this post have been written
specifically to expose a lot of the inner workings and, more importantly, to
yield
at key operations, allowing us to count the total number of operations
required to sort a given array based on the number of generator iterations. Many
of them could be written more succinctly, but that wasn't the point of this
post. For example, selection sort can be rewritten into a twoline function
using Python's builtin min()
method:
def selectionsort(A):
for i in range(len(A)  1):
swap(A, i, A.index(min(A[i:])))
However, this wouldn't allow us to animate and count the number of operations needed to actually sort the list.
Anyway, I hope this post has helped demonstrate the use of generators, the
yield
statement, and the yield from
statement to produce animations with
matplotlib, as well as provided a better understanding of the time complexity of
several sorting algorithms.