Programming Challenge: Compute the shortest path to visit all target spots on a grid.
Let's do something a little more challenging this time.
Given an MxN grid of arbitrary size, and given a random starting place on that grid and a list of points to visit, find the shortest path such that you visit all of them. Path lengths will be computed using taxicab distances rather than strict coordinate distance calculations.
There are no restrictions on expected input this time. Output should be the total distance traveled between points.
Example
Assume that we use the character #
to denote a spot on the grid, the character @
to denote your starting point, and the character *
to denote a place on the grid that you're required to visit. One such grid may look something like this:
######
######
**####
#*####
#*#*##
#@####
######
In this case, let's say that the bottom-left point on the grid is point (0, 0)
and we're starting on point (1, 1)
. One valid solution would be to move to point (3, 2)
, then (1, 2)
, then (1, 3)
, then (1, 4)
, and finally (0, 4)
. The shortest path available is thus 8
. Note that it's not enough just to visit the next nearest point on the grid!
Here's a naive solution in J:
Try it online!
What?
J is a functional language and a descendant of APL. J is unusual because you don't write loops. Instead, you write code to execute each iteration. Array iteration happens automatically.
Complex numbers are a fundamental data type in J. For instance, the absolute value (magnitude) of
3j4
, or 3+4i, is 5. We'll encode points on the plane as complex numbers.Suppose we had a list of two points (two complex numbers). What is their taxicab distance? We need to subtract them (
-/
), split the result into real and imaginary components (+.
), take the absolute value of each of those (|
), and add the result together (+/
).Now suppose we have a list of many points. What is the distance between each pair of points? We can reuse the function above, but apply it to pairs of numbers using the infix operator
\
.Now we just need to add those together (
+/
), and we have the distance travelled for a particular path. To form the pathstart
followed by the list ofpoints
, we use the comma operator,
.Here's the plan: We'll find every possible permutation of
points
. For each one, we'll prependstart
to it, then figure out the distance travelled. Once we have all the distances, we'll find the minimum.In J, permutations are called "anagrams". They are calculated using the operator
A.
. For anylist
there are!#list
angrams (factorial of number of elements inlist
). For example,So given an anagram index, we can find that anagram of
points
, prependstart
, and then calculate the distance for that anagram. Let's call the distance travelled, given an anagram index, the "score" of that index.The
"0
makes sure that the functionscore
applies to a single value (the anagram index), instead of trying to run on a list, which produces bizarre behavior.Finally, we iterate
score
over each anagram index (i.!#points)
, find the minimum (<./
), and print it out (smoutput
).Exercise
Compute path distances using the Euclidean metric. (Hint 1: All of the operators have already been introduced!) (Hint 2: This can be done by deleting code!)
Why?
This isn't as hard to get used to as you might think. Once you know the syntax, it's not hard to read this kind of program, and it's downright easy to write one. J is, if nothing else, an extremely powerful desk calculator.
People seem to get hung up on the terseness of J. But J merely packs more source code per character. Versions of this program in any language will have the same elements and concepts. You'll always have to read forwards and backwards, inside and out, in order to fully understand a program. In most languages, this happens at the level of lines and functions. In J, this happens at the level of characters and parentheses.
Also you look like a computer sorcerer. A sourcerer.
Wow. I'm really going to have to dive into this later. I never expected such a short, compact solution. Neat!
It's important to note that this problem is essentially an extension of the Hamiltonian path problem. It's sufficient to calculate the distances between each given target node (including their distances from the start node) and treat these distances as our edge weights. This gives us an approximate
O(n^2)
complexity at the start. This is the only "fast" part of the program. The rest is performed as a brute force search, which isO(n!)
. There exist faster algorithms, but no known algorithm will even be polynomial--they're all exponential at best.This problem is an NP-complete problem.
Here's my own submission, once again in PHP:
I decided to keep my solution simple. I didn't even precompute the edge weights, so there's a ton of waste in this solution. The first element of the array is the starting node. Obviously this program could be extended to accept arbitrary input from a user specifying the grid size and node locations, or the grid could have been passed in as a string with dimensions specified to aid in delimiting start and end points for the individual rows, with all of the data validation that all entails. I didn't feel like doing that this time around.
An interesting aside, for those wondering about maximum complexity: in the very worst case, the number of nodes that need to be traversed will be at most
MxN
, which means your program could end up needing to handle(MxN)!
paths if you use a brute force solution like I did. In the example grid I provided for this challenge, if we replace all remaining instances of#
with*
, then we find ourselves with a(6x7)! = 42! ~= 1.4 x 10^52
permutations to work through. Thus, even in such a small grid we encounter an absolutely enormous problem that can't be solved in a reasonable amount of time with the brute force approach. Even with the knownO(1.657^n)
algorithm, we're looking at more than 1.6 billion paths to work through.I'm not convinced that this is NP hard. It is a version of the Travelling salesman problem, but it's very restricted. In particular:
That makes me suspicious of the time complexity. Suppose we change just one thing: coordinates become 1D. Now the solution is very easy. Use radix sort on the coordinates, then figure out whether it's faster to go left first or right first. Complexity is O(n log max{points}).
So why does the jump to 2D make this NP hard? Intuitively, it seems much closer to the 1D case than to an arbitrary weighted graph.
Those restrictions don't actually have the effect you think it would. The additional restrictions mean that this problem is reducible to the TSP by removing those restrictions. If anything, your 1D example is a very special case of the TSP where all but one dimension for all nodes is different. That solution would absolutely work in a TSP where all cities are perfectly lined up, but fails for the general case.
There should be no reason for any of those restrictions having a significant impact on the problem complexity. If there is, then that could have some major implications for the TSP--that is, you could probably reduce the TSP to the allegedly simpler problem posed here.
I sincerely doubt that we're discussing a solution to P vs. NP :)
I think we're talking past each other here.
Let's call the problems TSP, 2D, and 1D. We're on the same page that TSP > 2D > 1D, in the sense that a solution for TSP implies a solution for 2D, and a solution for 2D implies a solution for 1D. And TSP is NP complete, while 1D is practically linear (haha).
You seem to be saying that, because you can rephrase any problem in 2D to a problem in TSP, 2D must be NP hard. But that's backwards, right? It should be, if you can rephrase any problem in TSP to 2D, then 2D must be NP hard.
Hence my analogy. You can rephrase any problem in 1D to a problem in TSP. So must 1D must be NP hard? No, 1D is obviously P.
As evidence that 2D is not obviously NP hard, here's a closely related unsolved problem. This was very hard to Google for.
Fair enough. I'll have to circle back to this at some point when I'm not away from home.
Although similar in some respects, there is the restriction that the graphs be "solid". This has some pretty significant implications on the results in that the general form of the graph is somewhat predictable. This would be a very special case of the taxicab Hamiltonian path problem we're looking at here.
I'm pretty confident that this problem is NP-hard, but I'll have to dig around.
Alright, I took a second look. In your source, please see the following:
Our problem space is all general planar grid graphs. There are cases in which polynomial time may be achieved, but the general case is NP-complete. Your source is just exploring one of those special cases.
Off-topic, but I can't wait to see this topic develop now that syntax highlighting has been added to the site. It's a small change for sure, but a huge QoL improvement for code legibility.
p.s. For anyone wondering how to do it.
I'm definitely looking forward to seeing all of the colors. I've already retroactively updated my older submissions :)
That's the most Emerald_Knight thing I've ever read.
Well then I suppose it may be even more so if I mention that I did it the very same day that syntax highlighting was announced ;)