Last week during a Code Jam at TechEd Barcelona, Witalij Rudnicki threw the glove at me by starting a challenge to solve the “Knight’s Tour” problem using the Graph capabilities of HANA.

The problem is best summarised as the following: “find a succession of moves of a knight piece on a chess board such that the knight visits each square exactly once”.

## Graph Theory

We can model the chess board like a graph with the squares as the vertices and all the possible moves as the edges.

By using this abstraction, the problem actually boils down to a special case of the more general Hamiltonian Path problem, which is basically the search of a path traversing all vertices of a graph exactly once.

## Graph Engine

HANA has built-in support for processing graphs. As of now, it has the following main features:

- Storing and representing graphs (basically you need a vertex and an edge table and then you can create a so-called “workspace” out of them).
- Running some simple algorithms on graphs (neighbourhood, shortest path, connected components, etc).
- Performing queries based on the openCypher query language.
- Writing stored procedures using a dedicated GraphScript language.

To solve this problem, I used the stored procedures together with the graph representation.

## Chess Board

The first step would be to generate the chess board as a graph. We will label the rows from 1 to 8 and the columns from A to H:

To store the graph, I defined the data model as follows:

```
CREATE TABLE nodes (
label VARCHAR(2),
row_number INT NOT NULL,
col_number INT NOT NULL,
PRIMARY KEY (label)
);
CREATE TABLE edges(
label VARCHAR(5) NOT NULL,
from_node VARCHAR(2) NOT NULL,
to_node VARCHAR(2) NOT NULL,
PRIMARY KEY (label)
);
CREATE GRAPH WORKSPACE board
EDGE TABLE edges SOURCE COLUMN from_node
TARGET COLUMN to_node KEY COLUMN label
VERTEX TABLE nodes KEY COLUMN label;
```

I did not want to manually define the board, so I wrote a procedure for generating it for me based on an input size:

```
CREATE OR REPLACE PROCEDURE generate_board_for_knights(
IN size INT,
OUT nodes TABLE(
label VARCHAR(2),
row_number INT,
col_number INT
),
OUT edges TABLE(
label VARCHAR(5),
from_node VARCHAR(2),
to_node VARCHAR(2)
)
) LANGUAGE SQLSCRIPT READS SQL DATA AS BEGIN
nodes = SELECT col_label || row_label AS label,
row_number, col_number
FROM (
-- generate row numbers from 1 to 8
SELECT GENERATED_PERIOD_START AS row_number,
TO_VARCHAR(GENERATED_PERIOD_START) AS row_label
FROM SERIES_GENERATE_INTEGER(1, 1, 9)
) AS row_table
CROSS JOIN (
-- generate column numbers from 1 to 8
-- and column labels from A to H
SELECT GENERATED_PERIOD_START AS col_number,
NCHAR(ASCII('A') + GENERATED_PERIOD_START - 1) AS col_label
FROM SERIES_GENERATE_INTEGER(1, 1, 9)
) AS col_table;
-- the edge label is simply the node labels
-- joined by a hyphen (e.g. A1-C2)
edges = SELECT A.label || '-' || B.label AS label,
A.label AS from_node, B.label AS to_node
FROM :nodes AS A CROSS JOIN :nodes AS B
WHERE (ABS(A.row_number - B.row_number) = 2
AND ABS(A.col_number - B.col_number) = 1)
OR (ABS(A.row_number - B.row_number) = 1
AND ABS(A.col_number - B.col_number) = 2);
END;
```

After creating the procedure, I was able to simply call it to fill my tables:

`CALL "GENERATE_BOARD_FOR_KNIGHTS"(8, nodes, edges) WITH OVERVIEW;`

WebIDE has a built-in graph viewer which provides a pretty nice visualisation of the chess board:

Lastly, I defined a table type for representing paths:

`CREATE TYPE tt_path AS TABLE(label VARCHAR(2));`

## Naive Approach

The first idea that came to my mind was to try to solve it via backtracking. I decided to not do it, because of the fact that it would be simply too slow.

Vanilla backtracking has an exponential asymptotic time complexity. In our graph each node has potentially 8 neighbours and we have to find a 64-node long path, so it would need to do (slightly less than) 8 ^ 64 steps to finish the whole algorithm.

Of course, if we would stop when we find the first solution it will be faster, but I still decided that there must be a better solution.

## Greedy Algorithm

After thinking on it a while, I decided that it should be possible to find a solution using a greedy algorithm with an appropriate selection heuristic.

The first heuristic that I tried was to always try to **go towards the bottom** of the board and slowly fill in from there. After fighting a little with the GraphScript language, I wrote the following procedure:

```
CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH" (
IN iv_origin VARCHAR(2),
OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS
BEGIN
Graph lo_graph = Graph("BOARD");
BigInt lv_size = Count(Vertices(:lo_graph));
-- retrieve the start point from the graph based on the origin label
Vertex lo_current = Vertex(:lo_graph, :iv_origin);
Sequence<Vertex> lt_nodes = [:lo_current];
ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
WHILE (COUNT(:lt_nodes) <= :lv_size) {
lo_current."VISITED" = true;
Boolean lf_found = false;
Int lv_row = -1;
Int lv_col = -1;
FOREACH lo_neighbor IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
-- first try to go on lower rows
-- if the row is the same, go towards the left corner
IF (NOT :lo_neighbor."VISITED"
AND (:lv_row < :lo_neighbor."ROW_NUMBER"
OR (:lv_row == :lo_neighbor."ROW_NUMBER"
AND :lv_col < :lo_neighbor."COL_NUMBER"))) {
lv_row = :lo_neighbor."ROW_NUMBER";
lv_col = :lo_neighbor."COL_NUMBER";
lf_found = true;
lo_current = :lo_neighbor;
}
}
-- stop the loop if we did not find a next node
-- this avoids infinite loops which do not do anything
IF (NOT :lf_found) {
BREAK;
}
-- append the next node to the path
lt_nodes = :lt_nodes || [:lo_current];
}
ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;
```

Unfortunately, this approach failed miserably, yielding mediocre results:

The problem looked to me like once the bottom half of the board was almost full, the knight got into a dead end. To prevent this, I tried applying a different heuristic: to always **go as close as possible to the center** of the board.

```
CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH" (
IN iv_origin VARCHAR(2),
OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS
BEGIN
Graph lo_graph = Graph("BOARD");
-- get the size of the graph and use it throughout the algorithm
BigInt lv_size = Count(Vertices(:lo_graph));
-- retrieve the start point from the graph based on the origin label
Vertex lo_current = Vertex(:lo_graph, :iv_origin);
Sequence<Vertex> lt_nodes = [:lo_current];
ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
WHILE (COUNT(:lt_nodes) <= :lv_size) {
lo_current."VISITED" = true;
Boolean lf_found = false;
BigInt lv_score = 4L * :lv_size;
FOREACH lo_neighbor IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
IF (NOT :lo_neighbor."VISITED") {
-- compute the delta (4.5 would be the middle of the board, but we
-- multiply everything by 2 to be able to use integers)
BigInt lv_row_delta = 2L * BIGINT(:lo_neighbor."ROW_NUMBER") - 9L;
-- do an ABS, because GraphScript does not support math functions
IF (:lv_row_delta < 0L) {
lv_row_delta = :lv_row_delta * -1L;
}
BigInt lv_col_delta = 2L * BIGINT(:lo_neighbor."COL_NUMBER") - 9L;
IF (:lv_col_delta < 0L) {
lv_col_delta = :lv_col_delta * -1L;
}
IF (:lv_score > :lv_col_delta + :lv_row_delta) {
lv_score = :lv_col_delta + :lv_row_delta;
lo_current = :lo_neighbor;
lf_found = true;
}
}
}
IF (NOT :lf_found) {
BREAK;
}
-- append the next node to the path
lt_nodes = :lt_nodes || [:lo_current];
}
ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;
```

I envisioned that the knight will spin around the center of the board. What I did not realise is that this approach would basically cut off the corners of the board leading the knight to be stuck in one of the corners:

Seeing that my heuristics are not working, I decided to read a little literature on the topic.

## Warnsdorff Rule

Source: Squirrel, Douglas, and Paul Çull. “A Warnsdorff-Rule Algorithm for Knight’s Tours.” (1996).

After reading some papers, I found out that I was very close to a correct solution. The heuristic that I was searching for was to always select the next node to be the one with the smallest degree. That is, we select the node which has the smallest amount of not-yet-visited neighbours.

```
CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH_VIA_WARNSDORFF" (
IN iv_origin VARCHAR(2),
OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS
BEGIN
Graph lo_graph = Graph("BOARD");
-- get the size of the graph and use it throughout the algorithm
BigInt lv_size = Count(Vertices(:lo_graph));
-- retrieve the start point from the graph based on the origin label
Vertex lo_current = Vertex(:lo_graph, :iv_origin);
Sequence<Vertex> lt_nodes = [:lo_current];
ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
WHILE (COUNT(:lt_nodes) <= :lv_size) {
lo_current.visited = true;
Boolean lf_found = false;
BigInt lv_min = :lv_size;
FOREACH lo_first IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
IF (NOT :lo_first.visited) {
BigInt lv_first_count = 0L;
-- compute the degree of each neighbour
FOREACH lo_second IN NEIGHBORS(:lo_graph, { :lo_first }, 1, 1) {
IF (NOT :lo_second.visited AND :lo_second != :lo_current) {
lv_first_count = :lv_first_count + 1L;
}
}
-- go for the minimal degree
IF (:lv_first_count < :lv_min) {
lo_current = :lo_first;
lv_min = :lv_first_count;
lf_found = true;
}
}
}
IF (NOT :lf_found) {
BREAK;
}
-- append the next node to the path
lt_nodes = :lt_nodes || [:lo_current];
}
ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;
```

Basically, this would yield the exact opposite result as the image above: first visit the margins of the board, then the center:

I finally got a solution! But what I don’t like at this heuristic is the fact that, when we have to decide on the next node and there are two nodes with the exact same score, we always take the first one. In the literature they usually take one at random, but there is no RANDOM function in GraphScript.

Sometimes this heuristic may not be able to yield a result (although I never encountered this situation), so I wanted to improve it further.

## Pohl’s Tie Breaking

Source: Pohl, Ira. “A method for finding Hamilton paths and Knight’s tours.” (1967).

One possible way of breaking ties would be to repeatedly apply the selection mechanism on the candidate nodes. Because I cannot really write a recursive call using GraphScript, I decided to go for a single level tie-breaking:

- First find all candidate nodes based on the Warnsdorff rule.
- If there is only one candidate, select that one.
- Otherwise, compute the minimal neighbour degree for each candidate and select the one with the smallest value. At this point, we could again reach a tie, so we just select the first one (this would be solved via recursion – except for the perfectly symmetrical arrangements).

```
CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH_VIA_WARNSDORFF_AND_POHL" (
IN iv_origin VARCHAR(2),
OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS
BEGIN
Graph lo_graph = Graph("BOARD");
-- get the size of the graph and use it throughout the algorithm
BigInt lv_size = Count(Vertices(:lo_graph));
-- retrieve the start point from the graph based on the origin label
Vertex lo_current = Vertex(:lo_graph, :iv_origin);
Sequence<Vertex> lt_nodes = [:lo_current];
ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
WHILE (COUNT(:lt_nodes) <= :lv_size) {
lo_current.visited = true;
BigInt lv_min = :lv_size;
-- we will use this sequence to store candidate nodes
Sequence<Vertex> lt_candidates = Sequence<Vertex>(:lo_graph);
FOREACH lo_first IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
IF (NOT :lo_first.visited) {
BigInt lv_first_count = 0L;
-- compute the degree of each candidate
FOREACH lo_second IN NEIGHBORS(:lo_graph, { :lo_first }, 1, 1) {
IF (NOT :lo_second.visited) {
lv_first_count = :lv_first_count + 1L;
}
}
-- add the candidate to the list if needed
IF (:lv_first_count < :lv_min) {
lv_min = :lv_first_count;
lt_candidates = [ :lo_first ];
} ELSE {
IF (:lv_first_count == :lv_min) {
lt_candidates = :lt_candidates || [ :lo_first ];
}
}
}
}
-- terminate the algorithm if no candidates were found
IF (COUNT(:lt_candidates) == 0L) {
BREAK;
} ELSE {
-- select the single candidate if only one was found
IF (COUNT(:lt_candidates) == 1L) {
lo_current = :lt_candidates[1L];
-- otherwise, apply the tie-breaking heuristic
} ELSE {
lv_min = :lv_size;
FOREACH lo_candidate IN :lt_candidates {
BigInt lv_candidate_min = :lv_size;
-- find the minimum neighbour degree for each candidate
FOREACH lo_first IN NEIGHBORS(:lo_graph, {:lo_candidate}, 1, 1) {
BigInt lv_first_count = 0L;
IF (NOT :lo_first.visited) {
FOREACH lo_second IN NEIGHBORS(:lo_graph, {:lo_first}, 1, 1) {
IF (NOT :lo_second.visited AND :lo_second != :lo_candidate) {
lv_first_count = :lv_first_count + 1L;
}
}
}
IF (:lv_first_count < :lv_candidate_min) {
lv_candidate_min = :lv_first_count;
}
}
-- select the candidate with the smallest minimum neighbour degree
IF (:lv_candidate_min <= :lv_min) {
lo_current = :lo_candidate;
lv_min = :lv_candidate_min;
}
}
}
}
-- add the next node to the path
lt_nodes = :lt_nodes || [:lo_current];
}
ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;
```

Running it seems to generate the expected result:

I still don’t like this solution, because even for asymmetrical arrangements, we still have to take a random choice (due to the lack of recursion).

## Roth’s Tie Breaking

Source: Ganzfried, Sam. “A Simple Algorithm for Knight’s Tours.” (2004).

Another idea for the tie breaking is to select from the candidate nodes based on the distance from the center of the board (take the node with the maximal distance to the center):

```
CREATE OR REPLACE PROCEDURE "FIND_HAMILTONIAN_PATH_VIA_WARNSDORFF_AND_ROTH" (
IN iv_origin VARCHAR(2),
OUT ot_path tt_path
) LANGUAGE GRAPH READS SQL DATA AS
BEGIN
Graph lo_graph = Graph("BOARD");
-- get the size of the graph and use it throughout the algorithm
BigInt lv_size = Count(Vertices(:lo_graph));
-- retrieve the start point from the graph based on the origin label
Vertex lo_current = Vertex(:lo_graph, :iv_origin);
Sequence<Vertex> lt_nodes = [:lo_current];
ALTER lo_graph ADD TEMPORARY VERTEX ATTRIBUTE(Boolean visited = false);
WHILE (COUNT(:lt_nodes) <= :lv_size) {
lo_current.visited = true;
BigInt lv_min = :lv_size;
-- we will use this sequence to store candidate nodes
Sequence<Vertex> lt_candidates = Sequence<Vertex>(:lo_graph);
FOREACH lo_first IN NEIGHBORS(:lo_graph, { :lo_current }, 1, 1) {
IF (NOT :lo_first.visited) {
BigInt lv_first_count = 0L;
-- compute the degree of each candidate
FOREACH lo_second IN NEIGHBORS(:lo_graph, { :lo_first }, 1, 1) {
IF (NOT :lo_second.visited) {
lv_first_count = :lv_first_count + 1L;
}
}
-- add the candidate to the list if needed
IF (:lv_first_count < :lv_min) {
lv_min = :lv_first_count;
lt_candidates = [ :lo_first ];
} ELSE {
IF (:lv_first_count == :lv_min) {
lt_candidates = :lt_candidates || [ :lo_first ];
}
}
}
}
-- terminate the algorithm if no candidates were found
IF (COUNT(:lt_candidates) == 0L) {
BREAK;
} ELSE {
-- select the single candidate if only one was found
IF (COUNT(:lt_candidates) == 1L) {
lo_current = :lt_candidates[1L];
-- otherwise, apply the tie-breaking heuristic
} ELSE {
Double lv_max_distance = -1.0;
FOREACH lo_candidate IN :lt_candidates {
-- compute the distance between the vertex and the board center
Double lv_distance = (DOUBLE(:lo_candidate."ROW_NUMBER") - 4.5)
* (DOUBLE(:lo_candidate."ROW_NUMBER") - 4.5)
+ (DOUBLE(:lo_candidate."COL_NUMBER") - 4.5)
* (DOUBLE(:lo_candidate."COL_NUMBER") - 4.5);
-- select the candidate with maximal distance
IF (:lv_distance > :lv_max_distance ) {
lo_current = :lo_candidate;
lv_max_distance = :lv_distance;
}
}
}
}
-- add the next node to the path
lt_nodes = :lt_nodes || [:lo_current];
}
ot_path = SELECT :lo_node.label FOREACH lo_node IN :lt_nodes;
END;
```

Surprisingly, running this for the same starting point generated the same result as Pohl’s tie-breaking (most of the time).

## Bloopers

I don’t want to give a false impression and make people think that everything went smoothly. I actually encountered a few bugs and limitations which made me waste a few hours of my time:

- HANA Express is too large for my computer. I had to clean up 40GB of my laptop to be able to run it. I wanted to use HXE to have access to the latest service pack, but I really started thinking to use an older HANA to which I have access…
- I am not very happy with the fact that GraphScript is a completely different language than SQLScript, especially because of the fact that most functions available in SQLScript are not available in GraphScript. For example, there is no
*ABS*in GraphScript. - I did not find a way to pass a Graph object through the procedure interface (e.g. to create a recursive procedure).
- The WebIDE is always showing a ton of errors as if all the code is completely broken. To be honest, I would prefer to not have any kind of syntax check and just have syntax highlighting.
- Once I introduced the first IF clause, the WebIDE “Run” button would not anymore execute my “CREATE OR REPLACE PROCEDURE” statement. I had to resort to manually sending the code via an
*jQuery.ajax*call from the development console.

```
jQuery.ajax({
method: "POST",
url: "/sap/hana/cst/api/v2/sessions(...)/connections(...)/sqlExecute()",
headers: {"X-CSRF-Token": "..."},
contentType: "application/json",
data: JSON.stringify({
"statements": [{"statement": code, "type": "UPDATE"}],
"maxResultSize": 1000,
"limitLOBColumnSize": 1024,
"includePosColumn": true,
"bExeWithoutPrepare": false,
"bStopOnError": true,
"bRunAsBackgroundActivity": false
})
});
```

## Conclusion

Overall, I think that working with the Graph Engine is fun, but it is still not a fully mature technology. Actually, this might apply to the whole XSA and not only to the Graph capabilities. Coming from the Java ecosystem, where you can have an excellent developer experience, developing on HANA using the WebIDE is somewhat lacklustre.

I doff my hat/helmet/kepi to you. Witalij Rudnicki said you solved this in a ~weekend~. That’s impressive! I could barely describe the problem in my write-up, much know where to begin to solve it. 🙂

Thanks! There is a simple explanation though: I become very competitive when people challenge me in public to solve a coding problem I also really enjoyed Witalij Rudnicki‘s code Jam, so I wanted to play around a little more with the Graph engine…

Follow up question, for benchmarks. You mentioned this was solved in a weekend (of human time). How long does each alternate solution execute? Seconds? Sub-seconds? I understand these runs are on a PC-based express version, not a cloud-deployed state-of-the–art hardware platform.

“Back in the day”, as we say, I compared various platforms that would run sort or sieve (prime numbers – sorry, first said Fibonacci) algorithms, generally getting faster as newer hardware appeared (and sometimes compiler tricks).

https://www.tldp.org/HOWTO/text/Benchmarking-HOWTO

http://www.cs.nthu.edu.tw/~tingting/Archi_07/benchmark/sieve.c

I believe the Sieve benchmark was rigged

[2017]http://forum.6502.org/viewtopic.php?f=2&t=4392

My Sieve commentary from ~2008, but about learning this topic around 1962:

http://jspath55.blogspot.com/2008/02/eight-is-enough-things-to-know-about-me.html

So all three of the final variants run at about 20ms on my beaten down laptop.

That is somewhat understandable, because the time complexities are the following:

Where n is the number of nodes in the graph (64), m is the number of edges going out of each node (max. 8) and x is the complexity of running the neighbourhood function.

X could be anything from O(1) if there is an optimisation on SAP’s side (e.g. storing adjacency lists) or O(n) if they actually have to loop over an adjacency matrix. Seeing how well the algorithm went, I actually expect that X is something constant, i.e. O(1), so the complexities would reduce to:

Which is simply a linear complexity in regard to the number of squares on the board (m is constant for all chess boards). Also, considering the relatively small values of n and m, it is no surprise that it works this fast.

Compare that with backtracking which is exponential (green – linear; dark orange – exponential):

Very clear, at least the parts I understand.

From the statement “linear complexity in regard to the number of squares”, I take it that a larger board (say, 64×64 squares) would be solved in a linear multiple time of the smaller board?

How about, say, 3-D chess, where the board is a series of slices? That would increase the number of connections from each square, though presumably not too much degradation in solve time. Of course, then the next question is n-dimensionality.

I briefly looked up a Star Trek 3-D chess board, and recalled the layers are not cubical, as different steps have different sizes [ https://www.chessvariants.com/3d.dir/startrek.html ]. I guess that variation would only affect the initial provisioning of the board “universe” where edges would be truncated as needed.

Thanks again for the fascinating insights into graphs!

Yes, so if my assumptions related to the cost of running neighbourhood function call are right, solving it for a 16×16 board for example should take at most 4 times as long as running it for a 8×8 board (as the number of squares is 4 times higher).

Probably it will take less than 4 x 20ms = 80ms, because some part of that original 20ms is just initialisation or other kind of constant work.

You really did it! Congrats on this (and on the time it took you to solve this). I was there at the CodeJam when Witalij challenged you. Didn’t think there’d be a solution mere four days later (especially as it was my first contact with HANA Graph)

Well, I challenged more people with this (including myself), but only Serban came with the solution so far 🙂