First Attempts
After thinking about the second of the APR problems on and off, and making no progress for a year, I finally picked up enough courage to contact a mathematician at the University of Geneva. I looked around on their website and found a link to prof. Hugo Duminil, who worked in a domain that I thought might be at least vaguely related to the problem.
Knowing that mathematicians receive letters from quacks trying to prove that π=3.0 and other silly ideas, I was a little wary. To ensure that he would give my message more than two seconds' attention, I carefully crafted an e-mail to prof. Duminil, which to my surprise and satisfaction he actually did read and reply to. A few messages later we decided to meet in Geneva.
We had a fruitful coffee, but after about an hour it was clear that this was not a simple problem. Hugo drew many railway layouts that he was certain would take more than two iterations; this was the counter-example approach. I too was convinced that some of them would need more than two, but back at home my program always ended after the second step…
We were flabbergasted. As we made no more progress at the cafeteria table, we decided to let it rest for a while.
Observations
I tinkered more with the APR program, got some more suggestions from Hugo, and we exchanged some more e-mails, but we did not make much progress. I set out to make special versions of the program that let me observe more closely how the algorithm went about its work.
This approach showed immediately that at each iteration the algorithm already uses information it has just produced during that iteration. But before I go on, we will need some terminology.
- matrix: the square table describing the railway layout.
- dot: a checkmark indicating a connection, usually represented by ◉.
- row: a row in the matrix.
- cell: one of the positions in a row.
- refer: we say that a row i with a dot in column j refers to row j.
Inspect these two matrices (tables):
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   | ◉ |   |   |   |   |
| 2 | ◉ |   | ◉ |   |   |   |
| 3 |   |   |   |   |   | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   | ◉ |   |   |

(0)

→ α

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | α | ◉ | α | α |   | α |
| 2 | ◉ | α | ◉ | α |   | α |
| 3 |   |   |   | α |   | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   | ◉ |   |   |

(1)
The left-hand one, labelled (0), is an initial set-up with five dots. The right-hand one, labelled (1), is the result after the first iteration, which we call α. In it, the dots added during iteration α are shown with the letter α.
The algorithm takes matrix (0), starts at row 1 and progresses from left to right and top to bottom, processing each cell. During an iteration, think of it as running over the matrix, with processed cells "behind" it and cells yet to be processed "in front" of it.
Cell 1,1 is empty and produces nothing, but cell 1,2 has a dot: it refers to row 2, and so the dots of row 2 are added to those of row 1. That leads to a dot in 1,1 and one in 1,3 (two of the α marks).

When the algorithm comes to cell 1,3 it finds one of the dots it has just put there, and processes that as well. It therefore adds the dots of row 3 to those of row 1 (the α in 1,6).

Already during the processing of a row, information is added that can be used immediately! And it is not only the new dots put "in front" of the algorithm that can be used in this iteration: the dot that was added in cell 1,1 will be used by any row lower down that refers to row 1.

There was only one original dot in row 1, yet by the time the algorithm had processed cell 1,3 it had already added the dots of two other rows to row 1, not one!
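To make this concrete, here is a minimal Python sketch of one such iteration. It is my own reconstruction, not the original APR program: the names are mine, and the matrix is simply a list of lists of booleans.

```python
def iterate_in_place(m):
    """One iteration: scan all cells left to right, top to bottom.

    When cell (i, j) holds a dot, the dots of row j are added to row i
    immediately, so cells still "in front" of the scan, and all rows
    lower down, already see the new dots within this same pass.
    Returns True if the pass added at least one dot.
    """
    n = len(m)
    changed = False
    for i in range(n):
        for j in range(n):
            if m[i][j]:                      # row i refers to row j
                for k in range(n):
                    if m[j][k] and not m[i][k]:
                        m[i][k] = True       # new dot, usable at once
                        changed = True
    return changed

# Matrix (0) from the example above (rows and columns 1-based in the text):
m = [[False] * 6 for _ in range(6)]
for r, c in [(1, 2), (2, 1), (2, 3), (3, 6), (6, 4)]:
    m[r - 1][c - 1] = True
iterate_in_place(m)  # m is now matrix (1), including all the α dots
```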
If we had used a "purist" form of the algorithm, in which matrix (1) is produced by looking at matrix (0) only (instead of looking at (1) while it is progressively modified), then matrix (1) would look like:
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | α | ◉ | α |   |   |   |
| 2 | ◉ | α | ◉ |   |   | α |
| 3 |   |   |   | α |   | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   | ◉ |   |   |

(1)
A much less populated matrix indeed, and with the "purist" version we would potentially need n iterations to arrive at a conclusion about the APR property.
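For comparison, a sketch of the "purist" iteration under the same assumptions: it reads only a frozen snapshot of the previous matrix, so dots produced during a pass cannot be reused until the next pass. Repeating it until it returns False reaches the same final matrix as the in-place version, only in more passes.

```python
def iterate_purist(m):
    """One "purist" iteration: matrix (n) is computed from matrix (n-1)
    alone. New dots are written to m but read only from the snapshot."""
    snapshot = [row[:] for row in m]         # matrix (n-1), read-only
    n = len(m)
    for i in range(n):
        for j in range(n):
            if snapshot[i][j]:               # row i referred to row j
                for k in range(n):
                    if snapshot[j][k]:
                        m[i][k] = True
    return m != snapshot                     # did this pass add a dot?
```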
With this "new" information Hugo decided it was indeed possible that two iterations sufficed, and that it might not be too difficult to prove. But neither of us having a lot of time to spend on it, we once again let it rest.
Ex Absurdo
I tried many avenues to find a proof, all in vain, until I decided to try the last resort: a proof ex absurdo.
In other words, I would first assume that there is a configuration that needs three iterations and then show that such a configuration is not possible, i.e. it is impossible to design a matrix (0) such that matrix (3) is different from matrix (2). The following paragraphs try to illuminate a few thoughts that came up.
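Before going through the cases, here is the kind of experiment that made the conjecture convincing in the first place: a small driver, reusing iterate_in_place from the sketch earlier, that counts how many passes add dots before the matrix stops changing. The random-testing setup is my own illustration, not the original program.

```python
import random

def passes_until_stable(m):
    """Count how many in-place passes add at least one dot."""
    count = 0
    while iterate_in_place(m):
        count += 1
    return count

# Hunt for a counter-example: a layout whose matrix (3) differs from
# matrix (2) would need a third productive pass. None ever turns up.
random.seed(2013)
for _ in range(10_000):
    m = [[random.random() < 0.2 for _ in range(8)] for _ in range(8)]
    assert passes_until_stable(m) <= 2
```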
Statement 1.
Suppose iteration γ is needed. Then (3) differs from (2), and there is at least one cell that is empty in (2) and gets filled in (3) during iteration γ.
Statement 2.
To fill an empty cell at position i,j there must be at least one dot in column j; say it is in row k, i.e. cell k,j contains a dot.

As an example, suppose cell 4,2 is empty and there are dots in cells 4,5 and 5,2. It is dot 5,2 that will be propagated to 4,2 when the dot in 4,5 is processed.
[1] To fill a cell i,j there must be dots in cells i,k and k,j, where k≠i and k≠j.

[2] Dots in cells i,k and k,j fill a dot in cell i,j.
Statement 3.
Since cell i,j (4,2 in the example above) must be empty in matrix (2), it must also be empty in matrices (0) and (1): iterations α and β must not have filled it. Therefore the dots k,j and i,k (5,2 and 4,5) cannot both be present in (1); at least one of them must arrive at its position only in (2), during iteration β.
Cases
There are now three cases to explore: i,k came from elsewhere, k,j came from elsewhere, and both came from elsewhere.
Case i,k from elsewhere
If i,k came from elsewhere, it must have been in column k already, since dots can only propagate vertically. Say it was in row l (row 3 in the figure below).
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   |   |   |
| 3 |   |   |   |   | ◉ |   |
| 4 |   |   | ◉ |   |   |   |
| 5 |   | ◉ |   |   |   |   |
| 6 |   |   |   |   |   |   |

(1)

→ β

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   |   |   |
| 3 |   | ◉ |   |   | ◉ |   |
| 4 |   |   | ◉ |   |   |   |
| 5 |   | ◉ |   |   |   |   |
| 6 |   |   |   |   |   |   |

(2)
Whatever row l we place it in, there must be a reference to row l in row i (cell 4,3 in the figure), since we can apply [1] to get:
[3] To fill cell i,k there must be dots in cells i,l and l,k, where l≠i, l≠k and l≠j.
(note that row j, here row 2, is prohibited as a location for that dot: the reference would then have to be in cell i,j, and that cell must remain empty; but this prohibition is a detail).
However, dot l,k (3,5) always makes dot k,j (5,2) propagate to its own row: matrix (2) is shown in the state after processing 3,5 but before processing 4,3. If we make l smaller than i, then k,j will first go up to row l and then, at the processing of row i, be propagated into i,j. So l must be larger than i, or we will not need iteration γ. More formally, according to [2] we can compute the cells that will be filled, given the dots present, by finding combinations where the second index of one dot equals the first index of another:
[4] Having dots in cells k,j, i,l and l,k, the cells that will be filled are i,k and l,j.

[5] Having dots in cells k,j, i,l, i,k and l,j, cell i,j will be filled.
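Rules [2], [4] and [5] are mechanical enough to compute. Here is a hedged sketch (the function name is mine) that closes a set of dots under rule [2], ignoring the order in which the algorithm would actually discover them. Applied to the three dots of [4] it yields exactly the two cells named there, and then the cell of [5].

```python
def close_under_rule_2(dots):
    """Repeatedly apply rule [2]: dots (i, k) and (k, j) fill (i, j).

    Returns every cell that will eventually be filled, regardless of
    how many iterations the algorithm needs to get there.
    """
    filled = set(dots)
    changed = True
    while changed:
        changed = False
        for (i, k) in list(filled):
            for (k2, j) in list(filled):
                if k == k2 and (i, j) not in filled:
                    filled.add((i, j))
                    changed = True
    return filled

# With i=4, j=2, k=5, l=3, the dots k,j / i,l / l,k of rule [4]:
print(close_under_rule_2({(5, 2), (4, 3), (3, 5)}))
# ... also contains (4, 5) and (3, 2) as per [4], and (4, 2) as per [5]
```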
Even if l is larger than i, cell i,k (4,5 in the example) must be empty after α, which implies, just as for i,j, that at least one of the dots that cause i,k to be filled must be elsewhere in (0).

There are again three possible cases. Consider the following figure, which deals with the first one; at each stage only the dots relevant to the intended propagation chain are shown:
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(0)

→ α

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   |   | ◉ |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   | ◉ |   |   |   |

(1)

→ β

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   | ◉ |   |   |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   |   |   |   |

(2)

→ γ

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   | ◉ | ◉ |   |   |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |
| 6 |   |   |   |   |   |   |

(3)
Iteration γ fills cell i,j (2,2 in the figure). Therefore in (1) at least one of the two dots that cause it must be elsewhere, say the one in row i. Then in (1) there must still be a reference to it in row i. But then the two dots that cause cell 2,3 to be filled by iteration β cannot both be present in (0): move one of them, which moves the reference in row i.

But this once again only displaces the propagation, it does not prevent it: if we now work forward from (0), 2,5 will call up 5,6, which will still be processed during α, not during β, and will in turn call up 6,3 into 2,3, still during α. We arrive at situation (2) already in (1). There is no way to move the dots vertically without also moving the reference horizontally. If a dot is moved to a row above row i, then a single iteration suffices to bring it to the target; if it is moved to a row below row i, it takes two iterations, but never more.
I ended up getting lost in an exploding number of cases, but somehow the approach seemed to be on the right track.
Then, on 29 September 2013, after giving a lecture at CERN's Open Days, I met by chance Valentin Butanescu, a Romanian mathematician who was in the audience. We chatted over coffee about the APR problem, and he went away enthusiastic, convinced it would be a simple thing to prove. He did prove it, though it was not as simple as he had thought, and his proof too has a difficult case. I am also delighted to know that my reflections above helped him see the path through the many cases the proof has to deal with.
Valentin Butanescu's proof
The text below is adapted from a series of e-mails exchanged between Valentin and me over a few months. I was fairly "thick" in understanding some of his arguments, but I can partly blame the nature of communication by e-mail, and the fact that there were sometimes several weeks between messages and the spare moments available to spend on the problem.
One interesting facet of the proof given below is its sequence of cases and sub-cases, which grows quickly in complexity. The formalisms used below may not satisfy mathematical minds, but this is not a page from a mathematics textbook. I hope I wrote something that Valentin approves of, and that Hugo appreciates. To both, my thanks for their help in understanding and confirming that it was not a trivial problem, and for finding a path out of the wilderness of my notes.
Any errors below are mine alone.
Definitions and simple observation
We start with an arbitrary but given matrix. Besides the usual “row-column” notation we use, for a given row, the set whose elements are the numbers of the columns that have dots in that row. For example, if a row has dots in columns 2 and 5, its set would be {2,5}. This set can of course also be empty.
We also need some notation for those sets as they are at different iterations, so we use Ax,y to denote the set associated with row y as it is after iteration x.
If row 3 contains dots in columns 2 and 5 after iteration 1, we have A1,3 = {2,5}
A0,1 , A0,2 , A0,3 , … describes the original matrix as the sequence of the sets associated with its rows
A1,1 , A1,2 , A1,3 , … describes the matrix in the state it is in after the first iteration and so on.
At each iteration dots are only added, so we have this observation:
for all states n (n>0) and rows r we have that An-1,r ⊆ An,r
The set for any row can only “grow”.
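In code this set view is the natural one. Here is a hedged sketch (naming is mine, rows and columns 0-indexed) of the same in-place iteration expressed over sets of column numbers instead of a table of booleans:

```python
def iterate_sets(a):
    """One in-place pass over rows-as-sets: a[y] is the set of columns
    holding dots in row y (the Ax,y of the text, 0-indexed here).

    Processing row y scans its columns in increasing order; a dot in
    column i merges row i's current set into row y immediately.
    """
    n = len(a)
    changed = False
    for y in range(n):
        for i in range(n):                # scan columns left to right
            if i in a[y]:                 # dots added ahead are seen too
                before = len(a[y])
                a[y] |= a[i]              # merge row i's set, in place
                changed |= len(a[y]) > before
    return changed

# Matrix (0) of the earlier examples, as sets:
a = [set() for _ in range(6)]
a[0], a[1], a[2], a[5] = {1}, {0, 2}, {5}, {3}
iterate_sets(a)  # a[0] becomes {0, 1, 2, 3, 5}: row 1 of matrix (1)
```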
We define completeness:
the set of a row is “complete” if it has reached its final configuration, i.e. no further iterations of the algorithm will add dots to it.

Note that this is not the same as saying that a row is complete when An+1,r = An,r, because there is no guarantee that An+2,r = An+1,r; a row is complete when An+1,r = An,r for all n greater than a certain number (and our goal is to prove that this number is at most 2).
Lemma
Let Ax,y be a set as above.
If ∀i ∈ Ax,y ⇒ A0,i ⊆ Ax,y then Ax,y is complete.
If, after iteration x, a row y contains all the dots from the rows it now references, as those rows were in the original matrix, then this is the final configuration of row y. In other words, we can decide completeness on the basis of the sets of the original matrix.
To make it more visually clear what the lemma states, consider the matrices shown below. They are not results of successive iterations.
|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   |   |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(original)

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   | ◉ |   |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(A)

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ |   |   | ◉ | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(B)

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 |   |   |   |   |   |   |
| 2 |   |   |   |   | ◉ |   |
| 3 |   | ◉ | ◉ |   | ◉ | ◉ |
| 4 |   |   |   |   |   |   |
| 5 |   |   |   |   |   | ◉ |
| 6 |   |   | ◉ |   |   |   |

(C)
The first is an example of an original matrix. Let us consider row 3, i.e. Ax,3. It refers to row 2, but it does not contain the dots of row 2, so it does not satisfy the lemma's hypothesis. In matrix A the set A0,2 has been added to Ax,3; now Ax,3 refers to row 5 but does not contain A0,5, so it still does not satisfy the hypothesis. In matrix B we have also added A0,5 to Ax,3, and we find ourselves in the same situation again: Ax,3 refers to row 6 but does not contain A0,6. Finally, matrix C does have a row 3 for which the lemma's hypothesis holds: for every dot in row 3, row 3 contains the set of dots of the referenced row as it is in the original matrix, i.e. Ax,3 is the union of A0,2, A0,3, A0,5 and A0,6.
The lemma states that row 3 as shown in matrix C is complete.
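The lemma's criterion is easy to check mechanically. A sketch under the same assumptions as before (the names are mine), using the rows-as-sets representation with 0-indexed columns:

```python
def is_complete_by_lemma(a0, row_set):
    """Lemma criterion: row_set (some Ax,y) is complete if, for every
    column i holding a dot, the original row set A0,i is contained in
    row_set. a0 is the list of the original matrix's row sets."""
    return all(a0[i] <= row_set for i in row_set)

# The original matrix above, 0-indexed: dots 2,5 / 3,2 / 5,6 / 6,3.
a0 = [set(), {4}, {1}, set(), {5}, {2}]
print(is_complete_by_lemma(a0, {1, 4}))        # row 3 of A: False
print(is_complete_by_lemma(a0, {1, 4, 5}))     # row 3 of B: False
print(is_complete_by_lemma(a0, {1, 2, 4, 5}))  # row 3 of C: True
```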
Proof:
It is clear that if none of the other rows change, then the lemma is true. But we must show that no matter how the rows A0,i change over the iterations, those changes are always already in Ax,y.

A row A0,i can only gain dots from rows that it references. But since A0,i is contained in Ax,y, those referencing dots are already in Ax,y: Ax,y already references all the rows referenced by any of its constituent A0,i. In other words, if A0,i references some A0,j, then that j already occurs in Ax,y (through A0,i) and therefore, by hypothesis, A0,j is also contained in Ax,y. The algorithm will change A0,i so that it includes A0,j, but A0,j is already in Ax,y. Continuing the iterations therefore cannot add more dots to Ax,y. QED.
Theorem
For any arbitrary matrix, the sets A2,y are complete.
In other words: for any arbitrary matrix, only two iterations are ever needed to reach a stable state.
Throughout the proof we use the observation that An-1,r ⊆ An,r
We will also use a second observation: when a row's dots are added to the row being processed and the contributing row is lower down in the matrix, then its dots must already have been present in the previous iteration, because rows lower down have not yet been processed in this iteration and are therefore still in the state of the previous one. This is not true if the contributing row is higher up: that row has already been processed in this iteration and may therefore have changed by having dots from one or more other rows added to it.
Proof:
Reductio ad absurdum: if the theorem does not hold, then after two iterations there is still at least one row with an incomplete set.
Let A2,y be the set of the first row that is incomplete, i.e. A2,i is complete for any i<y, and y can be 1.
We show that ∀i ∈ A2,y ⇒ A0,i ⊆ A2,y therefore A2,y is in fact complete as per Lemma.
There are several cases:
Case a:
i ∈ A1,y (the dot in column i was already in row y before iteration 2).
This is the simplest case, because it means that during the second iteration the algorithm brings in all the dots from row i; more specifically:
A1,i ⊆ A2,y and of course because A0,i ⊆ A1,i we have A0,i ⊆ A2,y.
Case b:
i ∉ A1,y (the dot in column i was not yet in row y before iteration 2).
Since i ∈ A2,y, it was put there by the algorithm when processing row y during the second iteration. So there must be a dot in some other column e ∈ A2,y that “brings” i into “our” row y during the second iteration, with i ∈ A2,e (we write e ∈ A2,y rather than e ∈ A1,y because we do not know whether e was added during the processing of row y in the second iteration or was already there before).
Processing e adds A2,e to A2,y therefore A2,e ⊆ A2,y
e (the number of the row “bringing” i to “our” row) can’t be y (the number of “our” row). We have either e<y or e>y.
Case b1:
If e<y, then A2,e is complete (by assumption all rows before y are complete). And because i ∈ A2,e we have A0,i ⊆ A2,e, which is included in A2,y because e was just used in the second iteration to bring i (and in fact all of A2,e) into A2,y; thus A0,i ⊆ A2,y.
Case b2:
So we are left with the case e>y. Row e will be processed after row y, so what row y receives in the second iteration are the dots of row e as it was after the first iteration; in particular i ∈ A1,e.
Subcase 1 of case b2:
If i was already in A0,e, then A0,i ⊆ A1,e (because the dots of row i were copied into row e in the first iteration), which is just what we need.
Subcase 2 of case b2:
Now the other case is if i ∈ A1,e but i ∉ A0,e (the dot in column i of row e was added during the first iteration).
Then i still must have come from somewhere, and that would be another row that got copied into row e, let’s say row e1. If i also wasn’t originally in row e1 (i.e. not in A0,e1), then again it must have come from somewhere, let’s say a row e2 that got copied into e1. And so on until, say, row en, which had the i originally (there is a dot in column i of row en in the original matrix).
Now we look at these numbers e, e1, e2, … en which are numbers of rows that propagate the dot until the i originally from row en gets copied all the way to row e (and after that to A2,y). We are not worried about the propagation of the dot i itself but about the copying of its associated row A0,i.
The sequence of numbers e, e1, e2, … en is strictly descending, except possibly for the last number. Indeed, any row ep in the sequence gets i from ep+1. If ep+1>ep, i.e. it denotes a row further down, then that row has not yet been processed and must therefore already possess i: a row further down cannot give i to a row being processed further up if it has not yet got it (the second observation). In that case we need not look further: the sequence ends there, with the original dot i found in that row.
We have two cases: one where the series is strictly descending throughout, and one where the last item is greater than the one before it, i.e.:
1) e>e1>e2> … >en
and
2) e>e1>e2> … >en-1 with en somewhere in this series; we don't know where, except that it cannot be smaller than en-1 (because then we would be in case 1).
Case 1: case of only copying down during iteration 1:
The case e>e1>e2> … >en is rather simple: row en has the original dot i and is processed before any of the others. It therefore gets all of A0,i, and we have A0,i ⊆ A1,en. Then, because the rows are processed in order, we have A1,en ⊆ A1,en-1 ⊆ … ⊆ A1,e ⊆ A2,y, so A0,i ⊆ A2,y.
Case 2: case of one copy up during iteration 1:
Row en has the original i dot but is not the first one to be processed.
There are again two interesting positions of the row en:

a) en>e>e1> … >en-1 (i.e. en is further down than e)

b) e> … >ek>en>ej> … >en-1 (i.e. en is somewhere in the middle)
First, the processing of row en will always copy A0,i into it, since row en has the dot in column i in the original matrix: A0,i ⊆ A1,en.
Note also that in both cases the dot en will be copied along to the rows in the series, because the first of them to be processed has its original reference to row en; copying row en at that point brings the dot i but not A0,i (row en itself has not yet been processed). The crucial part of the argument is the propagation not of i but of en.
To help understanding, the table below shows schematically the rows concerned in case a). The column labelled (0) shows the interesting dots from the original state; the column labelled (1) shows the state at the end of the first iteration. Every row, including row e, has a reference to en after the first iteration.
| row  | (0)  | (1)         |
|------|------|-------------|
| en-1 | en   | en, i       |
| en-2 | en-1 | en-1, en, i |
| …    | …    | …           |
| e1   | e2   | e2, en, i   |
| e    | e1   | e1, en, i   |
| en   | i    | i, A0,i     |
The dot for i is copied along too, and it also gets to e. Finally row en gets A0,i, but too late for this first iteration.
As en>e, row e holds the en dot in a column further to the right than column e. When row y is processed in the second iteration, it receives the contents of row e as soon as the scan reaches that column; the en dot therefore lands further along in row y, where it will still be processed and bring in A0,i.
More formally: A0,i ⊆ A1,en and en ∈ A1,en-1, and because the rows are processed in order: A1,en-1 ⊆ A1,en-2 ⊆ … ⊆ A1,e1 ⊆ A1,e ⊆ A2,y, so en ∈ A2,y, and it will be processed because en > e.
Lastly, in case b) we do not care about the rows that come before en (consider the table below, analogous to the previous one). It is when we get to en that A0,i is certainly included, and from then on all further rows in the series, including e, will get it: A0,i ⊆ A1,en ⊆ … ⊆ A1,e. Finally it is copied in the second iteration: A1,e ⊆ A2,y, so A0,i ⊆ A2,y.
| row  | (0)  | (1)               |
|------|------|-------------------|
| en-1 | en   | en, i             |
| en-2 | en-1 | en-1, en, i       |
| …    | …    | …                 |
| ep   | ep+1 | ep+1, en, i       |
| en   | i    | i, A0,i           |
| ep-1 | ep   | ep, en, i, A0,i   |
| …    | …    | …                 |
| e1   | e2   | e2, en, i, A0,i   |
| e    | e1   | e1, en, i, A0,i   |
QED.
Note that the proof relies on the processing order, left to right and top to bottom. This must be so, as that order is characteristic of the algorithm and is the reason it needs only two iterations.