First Attempts
After thinking about the second of the APR problems on and off, and making no progress for a year, I finally picked up enough courage to contact a mathematician at the University of Geneva. I looked around on their website and found a link to prof. Hugo Duminil, who worked in a domain that I thought might be at least vaguely related to the problem.
Knowing that mathematicians receive letters from quacks trying to prove that π=3.0 and other silly ideas, I was a little wary. To ensure that he would give my message more than two seconds' attention, I carefully crafted an e-mail to prof. Duminil, which to my surprise and satisfaction he actually did read and reply to. A few messages later we decided to meet in Geneva.
We had a fruitful coffee, but after about an hour it was clear that this was not a simple problem. Hugo drew many railway layouts that he was certain would take more than two iterations; this was the counter-example approach. I too was convinced that some of them would need more than two, but back at home my program always ended after the second step…
We were flabbergasted. As we made no more progress at the cafeteria table, we decided to let it rest for a while.
Observations
I tinkered more with the APR program, got some more suggestions from Hugo, and we exchanged some more e-mails, but we did not make much progress. I set out to make special versions of the program that let me observe more closely how the algorithm went about its work.
This approach showed immediately that at each iteration the algorithm already uses information it has just produced during that iteration. But before I go on, we will need some terminology.
- matrix: the square table describing the railway layout.
- dot: a checkmark indicating a connection, usually represented by ◉.
- row: a row in the matrix.
- cell: one of the positions in a row.
- refer: we say that a row i with a dot in column j refers to row j.
Inspect these two matrices (tables):
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   | ◉ |   |   |   |   |
| 2 | ◉ |   | ◉ |   |   |   |
| 3 |   |   |   |   |   | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   | ◉ |   |   |

(0)

→ α

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | α | ◉ | α | α |   | α |
| 2 | ◉ | α | ◉ | α |   | α |
| 3 |   |   |   | α |   | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   | ◉ |   |   |

(1)
The left-hand one, labelled (0), is an initial set-up with five dots. The right-hand one, labelled (1), is the result after the first iteration, which we call α. In it, the dots added during iteration α are shown with the letter α.
The algorithm takes matrix (0), starts at row 1 and progresses from left to right and top to bottom, processing each cell. During an iteration, think of it as running over the matrix, with processed cells "behind" it and cells yet to be processed "in front" of it.
Cell 1,1 is empty and produces nothing, but cell 1,2 has a dot: it refers to row 2, and so the dots of row 2 are added to those of row 1. That leads to a dot in 1,1 and one in 1,3 (two of the α marks).

When the algorithm comes to cell 1,3 it finds one of the dots it has just put there, and processes that as well. It therefore adds the dots of row 3 to those of row 1 (the α in 1,6).

Already during the processing of a row, information is added that can be used immediately! And it is not only the new dots put "in front" of the algorithm that can be used in this iteration: the dot that was added in cell 1,1 will be used by any row lower down that refers to row 1.

There was only one original dot in row 1, yet by the time the algorithm had processed cell 1,3 it had already added the dots of two other rows to row 1, not one!
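To make this concrete, here is a minimal Python sketch of one such iteration. It is my own reconstruction, not the original APR program: the names are mine, and the matrix is simply a list of lists of booleans.

```python
def iterate_in_place(m):
    """One iteration: scan all cells left to right, top to bottom.

    When cell (i, j) holds a dot, the dots of row j are added to row i
    immediately, so cells still "in front" of the scan, and all rows
    lower down, already see the new dots within this same pass.
    Returns True if the pass added at least one dot.
    """
    n = len(m)
    changed = False
    for i in range(n):
        for j in range(n):
            if m[i][j]:                      # row i refers to row j
                for k in range(n):
                    if m[j][k] and not m[i][k]:
                        m[i][k] = True       # new dot, usable at once
                        changed = True
    return changed

# Matrix (0) from the example above (rows and columns 1-based in the text):
m = [[False] * 6 for _ in range(6)]
for r, c in [(1, 2), (2, 1), (2, 3), (3, 6), (6, 4)]:
    m[r - 1][c - 1] = True
iterate_in_place(m)  # m is now matrix (1), including all the α dots
```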
If we had used a "purist" form of the algorithm, in which matrix (1) is produced by looking at matrix (0) only (instead of looking at (1) while it is progressively modified), then matrix (1) would look like:
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | α | ◉ | α |   |   |   |
| 2 | ◉ | α | ◉ |   |   | α |
| 3 |   |   |   | α |   | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   | ◉ |   |   |

(1)
A much less populated matrix indeed, and with the "purist" version we would potentially need n iterations to arrive at a conclusion about the APR property.
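For comparison, a sketch of the "purist" iteration under the same assumptions: it reads only a frozen snapshot of the previous matrix, so dots produced during a pass cannot be reused until the next pass. Repeating it until it returns False reaches the same final matrix as the in-place version, only in more passes.

```python
def iterate_purist(m):
    """One "purist" iteration: matrix (n) is computed from matrix (n-1)
    alone. New dots are written to m but read only from the snapshot."""
    snapshot = [row[:] for row in m]         # matrix (n-1), read-only
    n = len(m)
    for i in range(n):
        for j in range(n):
            if snapshot[i][j]:               # row i referred to row j
                for k in range(n):
                    if snapshot[j][k]:
                        m[i][k] = True
    return m != snapshot                     # did this pass add a dot?
```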
With this "new" information Hugo decided it was indeed possible that two iterations sufficed, and that it might not be too difficult to prove. But neither of us having a lot of time to spend on it, we once again let it rest.
Ex Absurdo
I tried many avenues to find a proof, all in vain, until I decided to try the last resort: a proof ex absurdo.
In other words, I would first assume that there is a configuration that needs three iterations and then show that such a configuration is not possible, i.e. it is impossible to design a matrix (0) such that matrix (3) is different from matrix (2). The following paragraphs try to illuminate a few thoughts that came up.
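Before going through the cases, here is the kind of experiment that made the conjecture convincing in the first place: a small driver, reusing iterate_in_place from the sketch earlier, that counts how many passes add dots before the matrix stops changing. The random-testing setup is my own illustration, not the original program.

```python
import random

def passes_until_stable(m):
    """Count how many in-place passes add at least one dot."""
    count = 0
    while iterate_in_place(m):
        count += 1
    return count

# Hunt for a counter-example: a layout whose matrix (3) differs from
# matrix (2) would need a third productive pass. None ever turns up.
random.seed(2013)
for _ in range(10_000):
    m = [[random.random() < 0.2 for _ in range(8)] for _ in range(8)]
    assert passes_until_stable(m) <= 2
```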
Statement 1.
Suppose iteration γ is needed. Then (3) differs from (2), and there is at least one cell that is empty in (2) and gets filled in (3) during iteration γ.
Statement 2.
To fill an empty cell at position i,j there must be at least one dot in column j; say it is in row k, i.e. cell k,j contains a dot.

As an example, suppose cell 4,2 is empty and there are dots in cells 4,5 and 5,2. It is dot 5,2 that will be propagated to 4,2 when the dot in 4,5 is processed.
[1] To fill a cell i,j there must be dots in cells i,k and k,j, where k≠i and k≠j.

[2] Dots in cells i,k and k,j fill a dot in cell i,j.
Statement 3.
Since cell i,j (4,2 in the example above) must be empty in matrix (2), it must also be empty in matrices (0) and (1): iterations α and β must not have filled it. Therefore the dots k,j and i,k (5,2 and 4,5) cannot both be present in (1); at least one of them must arrive at its position only in (2), during iteration β.
Cases
There are now three cases to explore: i,k came from elsewhere, k,j came from elsewhere, and both came from elsewhere.
Case i,k from elsewhere
If i,k came from elsewhere, it must have been in column k already, since dots can only propagate vertically. Say it was in row l (row 3 in the figure below).
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   |   |   |
| 3 |   |   |   |   | ◉ |   |
| 4 |   |   | ◉ |   |   |   |
| 5 |   | ◉ |   |   |   |   |
| 6 |   |   |   |   |   |   |

(1)

→ β

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   |   |   |
| 3 |   | ◉ |   |   | ◉ |   |
| 4 |   |   | ◉ |   |   |   |
| 5 |   | ◉ |   |   |   |   |
| 6 |   |   |   |   |   |   |

(2)
Whatever row l we place it in, there must be a reference to row l in row i (cell 4,3 in the figure), since we can apply [1] to get:
[3] To fill cell i,k there must be dots in cells i,l and l,k, where l≠i, l≠k and l≠j.
(note that row j, here row 2, is prohibited as a location for that dot: the reference would then have to be in cell i,j, and that cell must remain empty; but this prohibition is a detail).
However, dot l,k (3,5) always makes dot k,j (5,2) propagate to its own row: matrix (2) is shown in the state after processing 3,5 but before processing 4,3. If we make l smaller than i, then k,j will first go up to row l and then, at the processing of row i, be propagated into i,j. So l must be larger than i, or we will not need iteration γ. More formally, according to [2] we can compute the cells that will be filled, given the dots present, by finding combinations where the second index of one dot equals the first index of another:
[4] Having dots in cells k,j, i,l and l,k, the cells that will be filled are i,k and l,j.

[5] Having dots in cells k,j, i,l, i,k and l,j, cell i,j will be filled.
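Rules [2], [4] and [5] are mechanical enough to compute. Here is a hedged sketch (the function name is mine) that closes a set of dots under rule [2], ignoring the order in which the algorithm would actually discover them. Applied to the three dots of [4] it yields exactly the two cells named there, and then the cell of [5].

```python
def close_under_rule_2(dots):
    """Repeatedly apply rule [2]: dots (i, k) and (k, j) fill (i, j).

    Returns every cell that will eventually be filled, regardless of
    how many iterations the algorithm needs to get there.
    """
    filled = set(dots)
    changed = True
    while changed:
        changed = False
        for (i, k) in list(filled):
            for (k2, j) in list(filled):
                if k == k2 and (i, j) not in filled:
                    filled.add((i, j))
                    changed = True
    return filled

# With i=4, j=2, k=5, l=3, the dots k,j / i,l / l,k of rule [4]:
print(close_under_rule_2({(5, 2), (4, 3), (3, 5)}))
# ... also contains (4, 5) and (3, 2) as per [4], and (4, 2) as per [5]
```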
Even if l is larger than i, cell i,k (4,5 in the example) must be empty after α, which implies, just as for i,j, that at least one of the dots that cause i,k to be filled must be elsewhere in (0).

There are again three possible cases. Consider the following figure, which deals with the first one; at each stage only the dots relevant to the intended propagation chain are shown:
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(0)

→ α

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   |   | ◉ |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   | ◉ |   |   |   |

(1)

→ β

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   | ◉ |   |   |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   |   |   |   |

(2)

→ γ

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   | ◉ | ◉ |   |   |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   |   |   |   |

(3)
Iteration γ fills cell i,j (2,2 in the figure). Therefore in (1) at least one of the two dots that cause it must be elsewhere, say the one in row i. Then in (1) there must still be a reference to it in row i. But then the two dots that cause cell 2,3 to be filled by iteration β cannot both be present in (0): move one of them, which moves the reference in row i.

But this once again only displaces the propagation, it does not prevent it: if we now work forward from (0), 2,5 will call up 5,6, which will still be processed during α, not during β, and will in turn call up 6,3 into 2,3, still during α. We arrive at situation (2) already in (1). There is no way to move the dots vertically without also moving the reference horizontally. If a dot is moved to a row above row i, then a single iteration suffices to bring it to the target; if it is moved to a row below row i, it takes two iterations, but never more.
I ended up getting lost in an exploding number of cases, but somehow the approach seemed to be on the right track.
Then, on 29 September 2013, after giving a lecture at CERN's Open Days, I met by chance Valentin Butanescu, a Romanian mathematician who was in the audience. We chatted over coffee about the APR problem, and he went away enthusiastic, convinced it would be a simple thing to prove. He did prove it, though it was not as simple as he had thought, and his proof too has a difficult case. I am also delighted to know that my reflections above helped him see the path through the many cases the proof has to deal with.
Valentin Butanescu's proof
The text below is adapted from a series of e-mails exchanged between Valentin and me over a few months. I was fairly "thick" in understanding some of his arguments, but I can partly blame the nature of communication by e-mail, and the fact that there were sometimes several weeks between messages and the spare moments available to spend on the problem.
One interesting facet of the proof given below is its sequence of cases and sub-cases, which grows quickly in complexity. The formalisms used below may not satisfy mathematical minds, but this is not a page from a mathematics textbook. I hope I wrote something that Valentin approves of, and that Hugo appreciates. To both, my thanks for their help in understanding and confirming that it was not a trivial problem, and for finding a path out of the wilderness of my notes.
Any errors below are mine alone.
Definitions and simple observation
We start with an arbitrary but given matrix. Besides the usual “row-column” notation we use, for a given row, the set whose elements are the numbers of the columns that have dots in that row. For example, if a row has dots in columns 2 and 5, its set would be {2,5}. This set can of course also be empty.
We also need some notation for those sets as they are at different iterations, so we use Ax,y to denote the set associated with row y as it is after iteration x.
If row 3 contains dots in columns 2 and 5 after iteration 1, we have A1,3 = {2,5}
A0,1 , A0,2 , A0,3 , … describes the original matrix as the sequence of the sets associated with its rows
A1,1 , A1,2 , A1,3 , … describes the matrix in the state it is in after the first iteration and so on.
At each iteration dots are only added, so we have this observation:
for all states n (n>0) and rows r we have that An-1,r ⊆ An,r
The set for any row can only “grow”.
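In code this set view is the natural one. Here is a hedged sketch (naming is mine, rows and columns 0-indexed) of the same in-place iteration expressed over sets of column numbers instead of a table of booleans:

```python
def iterate_sets(a):
    """One in-place pass over rows-as-sets: a[y] is the set of columns
    holding dots in row y (the Ax,y of the text, 0-indexed here).

    Processing row y scans its columns in increasing order; a dot in
    column i merges row i's current set into row y immediately.
    """
    n = len(a)
    changed = False
    for y in range(n):
        for i in range(n):                # scan columns left to right
            if i in a[y]:                 # dots added ahead are seen too
                before = len(a[y])
                a[y] |= a[i]              # merge row i's set, in place
                changed |= len(a[y]) > before
    return changed

# Matrix (0) of the earlier examples, as sets:
a = [set() for _ in range(6)]
a[0], a[1], a[2], a[5] = {1}, {0, 2}, {5}, {3}
iterate_sets(a)  # a[0] becomes {0, 1, 2, 3, 5}: row 1 of matrix (1)
```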
We define completeness:
the set of a row is “complete” if it has reached its final configuration, i.e. no further iterations of the algorithm will add dots to it.

Note that this is not the same as saying that a row is complete when An+1,r = An,r, because there is no guarantee that An+2,r = An+1,r; a row is complete when An+1,r = An,r for all n greater than a certain number (and our goal is to prove that this number is at most 2).
Lemma
Let Ax,y be a set as above.
If ∀i ∈ Ax,y ⇒ A0,i ⊆ Ax,y then Ax,y is complete.
If, after iteration x, a row y contains all the dots from the rows it now references, as those rows were in the original matrix, then this is the final configuration of row y. In other words, we can decide completeness on the basis of the sets of the original matrix.
To make it more visually clear what the lemma states, consider the matrices shown below. They are not results of successive iterations.
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(original)

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   | ◉ |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(A)

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   | ◉ | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(B)

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ | ◉ |   | ◉ | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(C)
The first is an example of an original matrix. Let us consider row 3, i.e. Ax,3. It refers to row 2, but it does not contain the dots of row 2, so it does not satisfy the lemma's hypothesis. In matrix A the set A0,2 has been added to Ax,3; now Ax,3 refers to row 5 but does not contain A0,5, so it still does not satisfy the hypothesis. In matrix B we have also added A0,5 to Ax,3, and we find ourselves in the same situation again: Ax,3 refers to row 6 but does not contain A0,6. Finally, matrix C does have a row 3 for which the lemma's hypothesis holds: for every dot in row 3, row 3 contains the set of dots of the referenced row as it is in the original matrix, i.e. Ax,3 is the union of A0,2, A0,3, A0,5 and A0,6.
The lemma states that row 3 as shown in matrix C is complete.
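The lemma's criterion is easy to check mechanically. A sketch under the same assumptions as before (the names are mine), using the rows-as-sets representation with 0-indexed columns:

```python
def is_complete_by_lemma(a0, row_set):
    """Lemma criterion: row_set (some Ax,y) is complete if, for every
    column i holding a dot, the original row set A0,i is contained in
    row_set. a0 is the list of the original matrix's row sets."""
    return all(a0[i] <= row_set for i in row_set)

# The original matrix above, 0-indexed: dots 2,5 / 3,2 / 5,6 / 6,3.
a0 = [set(), {4}, {1}, set(), {5}, {2}]
print(is_complete_by_lemma(a0, {1, 4}))        # row 3 of A: False
print(is_complete_by_lemma(a0, {1, 4, 5}))     # row 3 of B: False
print(is_complete_by_lemma(a0, {1, 2, 4, 5}))  # row 3 of C: True
```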
Proof:
It is clear that if none of the other rows change, then the lemma is true. But we must show that no matter how the rows A0,i change over the iterations, those changes are always already in Ax,y.

A row A0,i can only gain dots from rows that it references. But since A0,i is contained in Ax,y, those referencing dots are already in Ax,y: Ax,y already references all the rows referenced by any of its constituent A0,i. In other words, if A0,i references some A0,j, then that j already occurs in Ax,y (through A0,i) and therefore, by hypothesis, A0,j is also contained in Ax,y. The algorithm will change A0,i so that it includes A0,j, but A0,j is already in Ax,y. Continuing the iterations therefore cannot add more dots to Ax,y. QED.
Theorem
For any arbitrary matrix, the sets A2,y are complete.
In other words: for any arbitrary matrix, only two iterations are ever needed to reach a stable state.
Throughout the proof we use the observation that An-1,r ⊆ An,r
We will also use a second observation: when a row's dots are added to the row being processed and the contributing row is lower down in the matrix, then its dots must already have been present in the previous iteration, because rows lower down have not yet been processed in this iteration and are therefore still in the state of the previous one. This is not true if the contributing row is higher up: that row has already been processed in this iteration and may therefore have changed by having dots from one or more other rows added to it.
Proof:
Reductio ad absurdum: if the theorem does not hold, then after two iterations there is still at least one row with an incomplete set.
Let A2,y be the set of the first row that is incomplete, i.e. A2,i is complete for any i<y, and y can be 1.
We show that ∀i ∈ A2,y ⇒ A0,i ⊆ A2,y therefore A2,y is in fact complete as per Lemma.
There are several cases:
Case a:
i ∈ A1,y (the dot in column i was already in row y before iteration 2).
This is the simplest case, because it means that during the second iteration the algorithm brings in all the dots from row i; more specifically:
A1,i ⊆ A2,y and of course because A0,i ⊆ A1,i we have A0,i ⊆ A2,y.
Case b:
i ∉ A1,y (the dot in column i was not yet in row y before iteration 2).
Since i ∈ A2,y, it was put there by the algorithm when processing row y during the second iteration. So there must be a dot in some other column e ∈ A2,y that “brings” i into “our” row y during the second iteration, with i ∈ A2,e (we write e ∈ A2,y rather than e ∈ A1,y because we do not know whether e was added during the processing of row y in the second iteration or was already there before).
Processing e adds A2,e to A2,y therefore A2,e ⊆ A2,y
e (the number of the row “bringing” i to “our” row) can’t be y (the number of “our” row). We have either e<y or e>y.
Case b1:
If e<y, then A2,e is complete (by assumption all rows before y are complete). And because i ∈ A2,e we have A0,i ⊆ A2,e, which is included in A2,y because e was just used in the second iteration to bring i (and in fact all of A2,e) into A2,y; thus A0,i ⊆ A2,y.
Case b2:
So we are left with the case e>y. Row e will be processed after row y, so what row y receives in the second iteration are the dots of row e as it was after the first iteration; in particular i ∈ A1,e.
Subcase 1 of case b2:
If i was already in A0,e, then A0,i ⊆ A1,e (because the dots of row i were copied into row e in the first iteration), which is just what we need.
Subcase 2 of case b2:
Now the other case is if i ∈ A1,e but i ∉ A0,e (the dot in column i of row e was added during the first iteration).
Then i still must have come from somewhere, and that would be another row that got copied into row e, let’s say row e1. If i also wasn’t originally in row e1 (i.e. not in A0,e1), then again it must have come from somewhere, let’s say a row e2 that got copied into e1. And so on until, say, row en, which had the i originally (there is a dot in column i of row en in the original matrix).
Now we look at these numbers e, e1, e2, … en which are numbers of rows that propagate the dot until the i originally from row en gets copied all the way to row e (and after that to A2,y). We are not worried about the propagation of the dot i itself but about the copying of its associated row A0,i.
The sequence of numbers e, e1, e2, … en is strictly descending, except possibly for the last number. Indeed, any row ep in the sequence gets i from ep+1. If ep+1>ep, i.e. it denotes a row further down, then that row has not yet been processed and must therefore already possess i: a row further down cannot give i to a row being processed further up if it has not yet got it (the second observation). In that case we need not look further: the sequence ends there, with the original dot i found in that row.
We have two cases: one where the series is strictly descending throughout, and one where the last item is greater than the one before it, i.e.:
1) e>e1>e2> … >en
and
2) e>e1>e2> … >en-1 with en somewhere in this series; we don't know where, except that it cannot be smaller than en-1 (because then we would be in case 1).
Case 1: case of only copying down during iteration 1:
The case e>e1>e2> … >en is rather simple: row en has the original dot i and is processed before any of the others. It therefore gets all of A0,i, and we have A0,i ⊆ A1,en. Then, because the rows are processed in order, we have A1,en ⊆ A1,en-1 ⊆ … ⊆ A1,e ⊆ A2,y, so A0,i ⊆ A2,y.
Case 2: case of one copy up during iteration 1:
Row en has the original i dot but is not the first one to be processed.
There are again two interesting positions of the row en:

a) en>e>e1> … >en-1 (i.e. en is further down than e)

b) e> … >ek>en>ej> … >en-1 (i.e. en is somewhere in the middle)
First, the processing of row en will always copy A0,i into it, since row en has the dot in column i in the original matrix: A0,i ⊆ A1,en.
Note also that in both cases the dot en will be copied along to the rows in the series, because the first of them to be processed has its original reference to row en; copying row en at that point brings the dot i but not A0,i (row en itself has not yet been processed). The crucial part of the argument is the propagation not of i but of en.
To help understanding, the table below shows schematically the rows concerned in case a). The column labelled (0) shows the interesting dots from the original state; the column labelled (1) shows the state at the end of the first iteration. Every row, including row e, has a reference to en after the first iteration.
| row  | (0)  | (1)         |
|------|------|-------------|
| en-1 | en   | en, i       |
| en-2 | en-1 | en-1, en, i |
| …    | …    | …           |
| e1   | e2   | e2, en, i   |
| e    | e1   | e1, en, i   |
| en   | i    | i, A0,i     |
The dot for i is copied along too, and it also gets to e. Finally row en gets A0,i, but too late for this first iteration.
As en>e, row e holds the en dot in a column further to the right than column e. When row y is processed in the second iteration, it receives the contents of row e as soon as the scan reaches that column; the en dot therefore lands further along in row y, where it will still be processed and bring in A0,i.
More formally: A0,i ⊆ A1,en and en ∈ A1,en-1, and because the rows are processed in order: A1,en-1 ⊆ A1,en-2 ⊆ … ⊆ A1,e1 ⊆ A1,e ⊆ A2,y, so en ∈ A2,y, and it will be processed because en > e.
Lastly, in case b) we do not care about the rows that come before en (consider the table below, analogous to the previous one). It is when we get to en that A0,i is certainly included, and from then on all further rows in the series, including e, will get it: A0,i ⊆ A1,en ⊆ … ⊆ A1,e. Finally it is copied in the second iteration: A1,e ⊆ A2,y, so A0,i ⊆ A2,y.
| row  | (0)  | (1)               |
|------|------|-------------------|
| en-1 | en   | en, i             |
| en-2 | en-1 | en-1, en, i       |
| …    | …    | …                 |
| ep   | ep+1 | ep+1, en, i       |
| en   | i    | i, A0,i           |
| ep-1 | ep   | ep, en, i, A0,i   |
| …    | …    | …                 |
| e1   | e2   | e2, en, i, A0,i   |
| e    | e1   | e1, en, i, A0,i   |
QED.
Note that the proof relies on the processing order, left to right and top to bottom. This must be so, as that order is characteristic of the algorithm and is the reason it needs only two iterations.