15 votes

Programming Challenge: Compute the shortest path to visit all target spots on a grid.

Posted September 27, 2018 by Emerald_Knight

Let's do something a little more challenging this time.

Given an MxN grid of arbitrary size, and given a random starting place on that grid and a list of points to visit, find the shortest path such that you visit all of them. Path lengths will be computed using taxicab distances rather than strict coordinate distance calculations.

There are no restrictions on expected input this time. Output should be the total distance traveled between points.

Example

Assume that we use the character # to denote a spot on the grid, the character @ to denote your starting point, and the character * to denote a place on the grid that you're required to visit. One such grid may look something like this:

######
######
**####
#*####
#*#*##
#@####
######

In this case, let's say that the bottom-left point on the grid is point (0, 0) and we're starting on point (1, 1). One valid solution would be to move to point (3, 2), then (1, 2), then (1, 3), then (1, 4), and finally (0, 4). The shortest path available is thus 8. Note that it's not enough just to visit the next nearest point on the grid!

13 comments

[2]
wirelyre
September 28, 2018
Link
Here's a naive solution in J: start =. 1j1 points =. 0j4 1j2 1j3 1j4 3j2 distance =. +/ @ (2&(+/@:|@+.@-/\)) score =. distance @ (start , points A.~ ]) "0 smoutput <./ score i.!#points Try it...
- Exemplary
Here's a naive solution in J:
```
start  =. 1j1
points =. 0j4 1j2 1j3 1j4 3j2

distance =. +/ @ (2&(+/@:|@+.@-/\))
score =. distance @ (start , points A.~ ]) "0
smoutput <./ score i.!#points
```
Try it online!

What?

J is a functional language and a descendant of APL. J is unusual because you don't write loops. Instead, you write code to execute each iteration. Array iteration happens automatically.
```
   | (2 _3 3j4)   NB. absolute value of each in list
2 3 5
```
Complex numbers are a fundamental data type in J. For instance, the absolute value (magnitude) of 3j4, or 3+4i, is 5. We'll encode points on the plane as complex numbers.
```
start  =. 1j1
points =. 0j4 1j2 1j3 1j4 3j2   NB. lists are written with spaces
```
Suppose we had a list of two points (two complex numbers). What is their taxicab distance? We need to subtract them (-/), split the result into real and imaginary components (+.), take the absolute value of each of those (|), and add the result together (+/).
```
   -/   0j4 1j2
_1j2
   +. -/   0j4 1j2
_1 2
   | +. -/   0j4 1j2
1 2
   +/ | +. -/   0j4 1j2
3
   (+/@:|@+.@-/)   0j4 1j2    NB. equivalently
3
```
Now suppose we have a list of many points. What is the distance between each pair of points? We can reuse the function above, but apply it to pairs of numbers using the infix operator \.
```
   2&(+/@:|@+.@-/\) points
3 1 1 4
```
Now we just need to add those together (+/), and we have the distance travelled for a particular path. To form the path start followed by the list of points, we use the comma operator ,.
```
   start,points
1j1 0j4 1j2 1j3 1j4 3j2
   2&(+/@:|@+.@-/\) start,points
4 3 1 1 4
   +/ @ (2&(+/@:|@+.@-/\)) start,points
13
   NB. functions can be assigned like scalars
   distance =. +/ @ (2&(+/@:|@+.@-/\))
   distance start,points
13
```
Here's the plan: We'll find every possible permutation of points. For each one, we'll prepend start to it, then figure out the distance travelled. Once we have all the distances, we'll find the minimum.

In J, permutations are called "anagrams". They are calculated using the operator A.. For any list there are !#list angrams (factorial of number of elements in list). For example,
```
   i. 3
0 1 2
   0 A. (i.3)
0 1 2
   1 A. (i.3)
0 2 1
   i. ! 3
0 1 2 3 4 5
   (i.!3) A. (i.3)   NB. loop over anagram indices on left
0 1 2
0 2 1
1 0 2
1 2 0
2 0 1
2 1 0
```
So given an anagram index, we can find that anagram of points, prepend start, and then calculate the distance for that anagram. Let's call the distance travelled, given an anagram index, the "score" of that index.
```
   (start , points A.~ ]) 0
1j1 0j4 1j2 1j3 1j4 3j2
   (start , points A.~ ]) 1
1j1 0j4 1j2 1j3 3j2 1j4
   distance @ (start , points A.~ ]) 0
13
   distance @ (start , points A.~ ]) 1
15
   score =. distance @ (start , points A.~ ]) "0
```
The "0 makes sure that the function score applies to a single value (the anagram index), instead of trying to run on a list, which produces bizarre behavior.

Finally, we iterate score over each anagram index (i.!#points), find the minimum (<./), and print it out (smoutput).
```
   smoutput <./ score i.!#points
8
```
Exercise

Compute path distances using the Euclidean metric. (Hint 1: All of the operators have already been introduced!) (Hint 2: This can be done by deleting code!)

Why?

This isn't as hard to get used to as you might think. Once you know the syntax, it's not hard to read this kind of program, and it's downright easy to write one. J is, if nothing else, an extremely powerful desk calculator.

People seem to get hung up on the terseness of J. But J merely packs more source code per character. Versions of this program in any language will have the same elements and concepts. You'll always have to read forwards and backwards, inside and out, in order to fully understand a program. In most languages, this happens at the level of lines and functions. In J, this happens at the level of characters and parentheses.

Also you look like a computer sorcerer. A sourcerer.
6 votes
1. Emerald_Knight (OP)
  September 28, 2018
  Link Parent
  Wow. I'm really going to have to dive into this later. I never expected such a short, compact solution. Neat!
  
  Wow. I'm really going to have to dive into this later. I never expected such a short, compact solution. Neat!
  
  2 votes
[7]
Emerald_Knight (OP)
September 28, 2018 (edited September 28, 2018)
Link
It's important to note that this problem is essentially an extension of the Hamiltonian path problem. It's sufficient to calculate the distances between each given target node (including their...
It's important to note that this problem is essentially an extension of the Hamiltonian path problem. It's sufficient to calculate the distances between each given target node (including their distances from the start node) and treat these distances as our edge weights. This gives us an approximate O(n^2) complexity at the start. This is the only "fast" part of the program. The rest is performed as a brute force search, which is O(n!). There exist faster algorithms, but no known algorithm will even be polynomial--they're all exponential at best.

This problem is an NP-complete problem.

Here's my own submission, once again in PHP:
```
<?php

class Node {

    public $x;
    public $y;

    public function __construct($x, $y) {
        $this->x = $x;
        $this->y = $y;
    }

    public function distanceTo($other_node) {
        return abs($this->x - $other_node->x) + abs($this->y - $other_node->y);
    }

}

class Main {

    public function __construct() {
        // Empty.
    }

    public function solve($data) {
        $nodes = array();
        foreach($data as $node_info) {
            array_push($nodes, new Node($node_info['x'], $node_info['y']));
        }

        $start = array_shift($nodes);
        return $this->computePathLength($start, $nodes);
    }

    private function computePathLength($current_node, $remaining_nodes) {
        $num_nodes = count($remaining_nodes);
        if($num_nodes === 0) {
            return 0;
        }

        $current_minimum = PHP_INT_MAX;
        for($i = 0; $i < $num_nodes; $i++) {
            $next_node = array_shift($remaining_nodes);
            $path_length = $current_node->distanceTo($next_node) + $this->computePathLength($next_node, $remaining_nodes);
            array_push($remaining_nodes, $next_node);
            $current_minimum = min($path_length, $current_minimum);
        }

        return $current_minimum;
    }

}

$data = array(
    array('x'=>1, 'y'=>1),
    array('x'=>0, 'y'=>4),
    array('x'=>1, 'y'=>4),
    array('x'=>3, 'y'=>2),
    array('x'=>1, 'y'=>3),
    array('x'=>1, 'y'=>2)
);

$main = new Main();

echo $main->solve($data);

?>
```
I decided to keep my solution simple. I didn't even precompute the edge weights, so there's a ton of waste in this solution. The first element of the array is the starting node. Obviously this program could be extended to accept arbitrary input from a user specifying the grid size and node locations, or the grid could have been passed in as a string with dimensions specified to aid in delimiting start and end points for the individual rows, with all of the data validation that all entails. I didn't feel like doing that this time around.
3 votes
1. [6]
  Emerald_Knight (OP)
  September 28, 2018
  Link Parent
  An interesting aside, for those wondering about maximum complexity: in the very worst case, the number of nodes that need to be traversed will be at most MxN, which means your program could end up...
  
  An interesting aside, for those wondering about maximum complexity: in the very worst case, the number of nodes that need to be traversed will be at most MxN, which means your program could end up needing to handle (MxN)! paths if you use a brute force solution like I did. In the example grid I provided for this challenge, if we replace all remaining instances of # with *, then we find ourselves with a (6x7)! = 42! ~= 1.4 x 10^52 permutations to work through. Thus, even in such a small grid we encounter an absolutely enormous problem that can't be solved in a reasonable amount of time with the brute force approach. Even with the known O(1.657^n) algorithm, we're looking at more than 1.6 billion paths to work through.
  
  2 votes
  1. [5]
    wirelyre
    September 28, 2018
    Link Parent
    I'm not convinced that this is NP hard. It is a version of the Travelling salesman problem, but it's very restricted. In particular: This is not a general graph, but rather a metric space; it's...
    
    I'm not convinced that this is NP hard. It is a version of the Travelling salesman problem, but it's very restricted. In particular:
    
    This is not a general graph, but rather a metric space;
    
    it's planar;
    
    all coordinates are integers.
    
    That makes me suspicious of the time complexity. Suppose we change just one thing: coordinates become 1D. Now the solution is very easy. Use radix sort on the coordinates, then figure out whether it's faster to go left first or right first. Complexity is O(n log max{points}).
    
    So why does the jump to 2D make this NP hard? Intuitively, it seems much closer to the 1D case than to an arbitrary weighted graph.
    
    1 vote
    
    [4]
    Emerald_Knight (OP)
    September 28, 2018
    Link Parent
    Those restrictions don't actually have the effect you think it would. The additional restrictions mean that this problem is reducible to the TSP by removing those restrictions. If anything, your...
    
    Those restrictions don't actually have the effect you think it would. The additional restrictions mean that this problem is reducible to the TSP by removing those restrictions. If anything, your 1D example is a very special case of the TSP where all but one dimension for all nodes is different. That solution would absolutely work in a TSP where all cities are perfectly lined up, but fails for the general case.
    
    There should be no reason for any of those restrictions having a significant impact on the problem complexity. If there is, then that could have some major implications for the TSP--that is, you could probably reduce the TSP to the allegedly simpler problem posed here.
    
    I sincerely doubt that we're discussing a solution to P vs. NP :)
    
    2 votes
    
    [3]
    wirelyre
    September 29, 2018
    Link Parent
    I think we're talking past each other here. Let's call the problems TSP, 2D, and 1D. We're on the same page that TSP > 2D > 1D, in the sense that a solution for TSP implies a solution for 2D, and...
    
    I think we're talking past each other here.
    
    Let's call the problems TSP, 2D, and 1D. We're on the same page that TSP > 2D > 1D, in the sense that a solution for TSP implies a solution for 2D, and a solution for 2D implies a solution for 1D. And TSP is NP complete, while 1D is practically linear (haha).
    
    The additional restrictions mean that this problem is reducible to the TSP by removing those restrictions.
    
    You seem to be saying that, because you can rephrase any problem in 2D to a problem in TSP, 2D must be NP hard. But that's backwards, right? It should be, if you can rephrase any problem in TSP to 2D, then 2D must be NP hard.
    
    Hence my analogy. You can rephrase any problem in 1D to a problem in TSP. So must 1D must be NP hard? No, 1D is obviously P.
    
    As evidence that 2D is not obviously NP hard, here's a closely related unsolved problem. This was very hard to Google for.
    
    2 votes
    
    Emerald_Knight (OP)
    September 29, 2018
    Link Parent
    Fair enough. I'll have to circle back to this at some point when I'm not away from home. Although similar in some respects, there is the restriction that the graphs be "solid". This has some...
    
    It should be, if you can rephrase any problem in TSP to 2D, then 2D must be NP hard.
    
    Fair enough. I'll have to circle back to this at some point when I'm not away from home.
    
    here's a closely related unsolved problem.
    
    Although similar in some respects, there is the restriction that the graphs be "solid". This has some pretty significant implications on the results in that the general form of the graph is somewhat predictable. This would be a very special case of the taxicab Hamiltonian path problem we're looking at here.
    
    I'm pretty confident that this problem is NP-hard, but I'll have to dig around.
    
    1 vote
    
    Emerald_Knight (OP)
    September 29, 2018
    Link Parent
    Alright, I took a second look. In your source, please see the following: Our problem space is all general planar grid graphs. There are cases in which polynomial time may be achieved, but the...
    
    Alright, I took a second look. In your source, please see the following:
    
    [IPS82] show that the problem is NP-complete in general planar grid graphs.
    
    Our problem space is all general planar grid graphs. There are cases in which polynomial time may be achieved, but the general case is NP-complete. Your source is just exploring one of those special cases.
    
    1 vote
[4]
cfabbro
September 27, 2018
Link
Off-topic, but I can't wait to see this topic develop now that syntax highlighting has been added to the site. It's a small change for sure, but a huge QoL improvement for code legibility. p.s....

Off-topic, but I can't wait to see this topic develop now that syntax highlighting has been added to the site. It's a small change for sure, but a huge QoL improvement for code legibility.

p.s. For anyone wondering how to do it.

4 votes
1. [3]
  Emerald_Knight (OP)
  September 27, 2018
  Link Parent
  I'm definitely looking forward to seeing all of the colors. I've already retroactively updated my older submissions :)
  
  I'm definitely looking forward to seeing all of the colors. I've already retroactively updated my older submissions :)
  
  3 votes
  1. [2]
    teaearlgraycold
    September 27, 2018
    Link Parent
    That's the most Emerald_Knight thing I've ever read.
    
    I've already retroactively updated my older submissions
    
    That's the most Emerald_Knight thing I've ever read.
    
    3 votes
    
    Emerald_Knight (OP)
    September 27, 2018
    Link Parent
    Well then I suppose it may be even more so if I mention that I did it the very same day that syntax highlighting was announced ;)
    
    Well then I suppose it may be even more so if I mention that I did it the very same day that syntax highlighting was announced ;)
    
    4 votes

What?

Exercise

Why?