Faulhaber's Formula

It's fairly natural upon encountering the formulas:

And, possibly:

To wonder what pattern is going on here. Formulas for the sums of higher powers, some historical information, and insight into what patterns hold can be found on Wikipedia.

This interactive will focus on a geometric way of deriving these formulas that will involve reasoning in higher dimensions.

Triangle Numbers

A common technique for demonstrating the formula for is to draw a staircase, as below, and then try to compute its area. If each square has area 1, computing the area is the same as adding up the number of squares. Counting by rows gives us that the area is the sum of 1, 2, 3, ..., n.

Adding in some small triangles gives us a large triangle, allowing us to solve for the area of the staircase. Check the other "show" checkbox below to see these smaller triangles.

n:
(n+1) * Area of small triangle Show
+ Area of staircase Show
= Area of large triangle

Pyramidal Numbers

Similarly, we can demonstrate the formula for By drawing a step pyramid: a three dimensional object made of layers. On the top layer is one cube, on the next layer down is a 2 by 2 square of cubes, on the next layer is a 3 by 3 square of cubes, and so on. These squares of cubes are usually lined up at a corner, as shown below.

Just as before, the problem of finding the sum of the first n squares can be viewed as a geometric problem, in this case computing the volume of the step pyramid. And, just as before, we add on pieces to make a large, smooth shape whose volume we know:

n:
Volume of step pyramid Show
+ volume of Show
+ volume of Show
+ (n+1)* volume of Show
= Volume of large pyramid

Volume of a Pyramid

Of course our reasoning above requires us to know the formula for the volume of our small and large pyramids. We could very easily show three of them fitting together to form a cube:
, and thus each one is one-third the volume of the corresponding cube.

If we examine these pieces more carefully, we notice that they consist exactly of those points whose z-coordinates are less than their x- and y- coordinates, where the bottom square is chosen to be on the xy-plane. By symmetry, this region is congruent to the region whose x-coordinates are least, and the region whose y-coordinates are least. This covers all possible points in the cube -- after all, one of their coordinates must be least -- excluding the meager cases of ties. Indeed, the three pieces in our diagram are the regions described above, and ties correspond to faces where they meet.

Going Down a Dimension

When looking into patterns, mathematicians often try to consider trivial cases to see if they can yield insights into a general pattern. For instance, what should a sum of zero terms be? Or what should zero factorial be? In this case, the sum of zeroth powers will give us another data point for trying to find a pattern:

One might even try to consider the sum of negative powers, but that way is more trouble than it's worth.

Going Up a Dimension

Just as we were able to derive a formula for the sum of squares by drawing a pyramid with a square base, we are able to derive a formula for the sum of cubes by drawing a hyperpyramid with a cube for a base. Specifically, we will stack cubes "on top of" (displaced in the fourth dimension) each other to form a "step hyperpyramid". Then, we will add on pieces of small hypercubes to turn our step hyperpyramid into a flat hyperpyramid, whose hypervolume we know. We can then work out the hypervolume of the step hyperpyramid by subtracting the hypervolume of the pieces we added.

There's just one problem: it's really hard to visualize a hyperpyramid! How could we possibly attempt such a feat? By looking at cross-sections!

Each point in four dimensional space has four coordinates, which we will call x, y, z, and w. If we fix a particular value of w, we can draw points at the (x,y,z) coordinates of those points in our shape with that particular value of w.

Cross sections can also be useful in visualizing three dimensional objects, since your screen is only two dimensional. to unlock a cross section visualizer for the sum of squares visualizer above.


w: 3 y: 3 Show Axes
Volume of step hyperpyramid Volume of
Volume of
Volume of
Volume of
Volume of
Volume of
Volume of

If we add up all the pieces above, we get , the hypervolume of the whole smooth hyperpyramid, and so by subtracting, we can get the desired formula .

Generalizing to Higher Dimensions

If we want to talk about shapes in four dimensions or higher, the best way to do so is really by talking about coordinates of points. Suppose we're interested in computing for some d ≥ 1. To compute this, we'll construct a hyperpyramid in d+1 dimensions. We will denote points in this space by vectors and use index notation such that: .

We will call our hyperpyramid P and it will consist of those points whose coordinates are all between 0 and n+1 whose last coordinate is least. If the following makes sense to you:

Our hyperpyramid P falls within the hypercube (which we'll call C) of those points whose coordinates are all between 0 and n+1, and is one d+1st the hypervolume of C. After all, the volume of P/the volume of C is the probability that a random point in C will wind up in P, and the probability that a randomly selected point has its last coordinate smallest is one in the number of coordinates. So the hypervolume of P is given by:

On the other hand, we can calculate the volume of P by splitting C up into an grid of unit hypercubes. In effect, this means looking at the integer parts of the coordinates: all the points in each unit hypercube share all the same integer parts of their coordinates. For each unit hypercube, we'll talk about its far bottom corner, which is obtained by rounding all of the coordinates except the last up to an integer, and rounding the last coordinate (indexed by d+1) down to an integer.

For each unit hypercube, we can ask how much of that unit hypercube is in P. The corners of the unit hypercube containing a point are given by rounding up or down each of the coordinates of that point. Most likely to be contained in P is the far bottom corner, after all, rounding all the coordinates but the last up and then rounding the last coordinate down is most likely to make the last coordinate smallest. Similarly, the corner least likely to be contained in P is given by rounding all but the last coordinates down and then rounding the last coordinate up.

For each unit hypercube (call it u) let Au be the set of coordinate indicies i ≤ d such that xi rounded down is more than xd+1 rounded up. In this case, for any point in u, xd+1 is going to be less than xi (as desired for points in P), so we will call these coordinate indices safe.

Similarly, let Bu be the set of coordinate indices i ≤ d such that xi and xd+1 have the same integer parts. In this case, for some points in u, xd+1 is going to be less than xi and others won't. We'll call these coordinate indicies boundary.

Note that for each hypercube that intersects P, all of its coordinate indices are safe or boundary: none can have points where xi rounded up be less than xd+1 rounded down, because then xd+1 would be more than xi. We will write b(u) to mean the number of boundary indices of u.

We now write another formula for the hypervolume of P:

For a particular unit hypercube u with b(u) = j, what is the hypervolume of the portion of u in P? Since the hypervolume of u is 1, it's the probability that a randomly selected point in u is in P. When picking the coordinates of a point x in u randomly, it doesn't matter what we pick for the safe coefficients. The boundary coefficients are picked from between m and m+1 for some integer m, as is xd+1. The probability that xd+1 is the smallest is 1 in the number of numbers we picked, or . This lets us rewrite our formula for the hypervolume of P:

So now we need to determine how many unit hypercubes in P have b(u) = j. There are ways of selecting which indices are boundary indices. Once we choose which coordinates are boundary, the integer parts of their values (the important part of determining which unit hypercube) are determined. We now ask how many ways there are of choosing the remaining safe coordinates. If xd+1 is between m and m+1 (for m ranging from 0 to n), then there are m different possible integer parts for the safe coordinates. Since we need to pick d-j values from this collection, there are md-j ways (with 00 being, as per combinatoricists, 1). So the total number of unit hypercubes with b(u)=j is . This lets us rewrite our formula for the hypervolume of P:

Since our two formulas for the hypervolume of P are the same:
If one is uncomfortable with the combinatoricist's 00, we can rewrite:
So:

Where the second line is obtained via the replacement k=d-j. Finally, we can solve for the sum of powers of d in terms of sums of lower powers: