Mahmoud Abdelkhalek

We describe what fencepost errors are, how to avoid them, and provide a precise derivation for why they occur.

Suppose that you’re working on a large codebase in your favorite editor, and suppose that you want to find out how many lines of code a particular class in this codebase takes up.

To do this, you find the first line of code that the class starts on, say line $7$, and then you find the last line of code that the class ends on, say line $373$. You then compute $373 - 7 = 366$ and conclude that this class takes up $366$ lines of code.

However, the correct answer is actually $367$ lines of code, since the last line, line $373$, was not counted in the original answer.

This is an example of a fencepost error. The word “fencepost” comes from one of the simplest examples where fencepost errors occur:

If you build a fence that is $10$ feet long with posts spaced $1$ foot apart, and assuming that each fence section consists of $2$ posts, how many posts do you need to build the fence?

If we divide $10$ feet of fence by $1$ foot per fence section, then we obtain $10$ fence sections. However, each fence section consists of $2$ posts, so while there will be $10$ fence sections, there will be $11$ posts.

We can resolve the fencepost error in the example above by writing down the sequence of lines of code as

$$ \begin{equation} 7, 8, \dots, 373 \label{loc_seq} \end{equation} $$

Then, because two sequences have the same length if there exists a bijection between them, we can determine the length $N$ of $\eqref{loc_seq}$ by finding a bijection between it and the sequence $1,2,\dots,N$ as follows,

$$ \begin{align*} &7, 8, \dots, 373 \\ &0, 1, \dots, 366 \\ &1, 2, \dots, 367 \end{align*} $$

Hence, the length $N$ of $\eqref{loc_seq}$ is $367$.

More formally, consider the integers $r$ and $a$ such that $a > r$. What is the length of the sequence $r, r+1, r+2, \dots, a$? That is, how many integers are there in this sequence?

As done above, the length of this sequence can be determined by finding a bijection between this sequence and the sequence $1,2,\dots,N$, where $N \in \mathbb N$ is the length of the sequence and $N > 1$, as follows,

$$ \begin{align} &r,r+1,r+2,\dots,a \label{gen_seq1} \\ &0,1,2,\dots,(a-r) \label{gen_seq2} \\ &1,2,3,\dots,(a-r) + 1 \label{gen_seq3} \end{align} $$

where $\eqref{gen_seq2}$ is obtained from $\eqref{gen_seq1}$ by subtracting $r$ from each element in $\eqref{gen_seq2}$ and $\eqref{gen_seq3}$ is obtained from $\eqref{gen_seq2}$ by adding $1$ to each element in $\eqref{gen_seq2}$. Hence, the length of the sequence $r,r+1,r+2,\dots,a$ is $(a-r) + 1$.

We can generalize this reasoning to the more general sequence $r,r+b,r+2b,\dots,a$, such that the difference between consecutive elements in the sequence is $b \in \mathbb N$ instead of $1$ as assumed above and $a = r + q \cdot b$ for some $q \in \mathbb N$.

The length of this more general sequence can be obtained as follows,

$$ \begin{align*} &r,r+b,r+2b,\dots,a \\ &0,b,2b,\dots,(a-r) \\ &0,1,2,\dots,\frac{a-r}{b} \\ &1,2,3,\dots,\frac{a-r}{b} + 1 \end{align*} $$

Hence, the length of the sequence $r,r+b,r+2b,\dots,a$ is $\frac{a-r}{b} + 1$. It is interesting to note that because $a = r + q \cdot b$, then $a, r, q,$ and $b$ can be interpreted as the dividend, remainder, quotient, and divisor, respectively, when $a$ is divided by $b$ via Euclidean division.

This general case is useful when determining how many “points” there are in a given interval. For example, suppose that you are trying to run a numerical simulation over time. You have the desired total simulation time $T > 0$ and a fixed time step size $\Delta t > 0$. Given this information, you would like to compute how many points in time that the simulation will step through, denoted $N$.

That is, you would like to determine the length of the sequence $0, \Delta t, 2\Delta t, \dots, T$. As done above, we can determine the length of this sequence by finding a bijection between this sequence and the sequence $1,\dots,N$, where $N \in \mathbb N$, as follows:

$$ \begin{align} &0, \Delta t, 2\Delta t, \dots, T \label{sim_seq1} \\ &0, 1, 2, \dots, \frac{T}{\Delta t} \label{sim_seq2} \\ &1, 2, 3, \dots, \frac{T}{\Delta t} + 1 \label{sim_seq3} \end{align} $$

where $\eqref{sim_seq2}$ is obtained from $\eqref{sim_seq1}$ by dividing each element in the sequence by $\Delta t$ and $\eqref{sim_seq3}$ is obtained from $\eqref{sim_seq2}$ by adding $1$ to each element in the sequence. Hence, the number of time points that this simulation will step through is $\frac{T}{\Delta t} + 1$.

To summarize, to compute the correct number of numbers in a sequence and avoid fencepost errors, do the following:

Given the end-points of the sequence, write down the full sequence that you would like to count. Be sure to avoid including any numbers that you do not want to count.
Transform the sequence from step 1 into the canonical sequence $1,\dots,N$, where $N \in \mathbb N$.
The amount of numbers in the sequence is $N$.

Fencepost errors occur when the sequence that is written down in step 1 includes numbers that should not be counted or does not include numbers that should be counted.

For further reading on fencepost errors, see this article.