9 votes

How do I calculate my family's "average family location"?

So, I just listened to a This American Life podcast called Ghost in the Machine. In one of the stories, a man decides to calculate, every week, the Average Family Location of his family. By that, he means: once you add everyone's coordinates for every coordinate in which they've been in that period, what city/location represents the average point between them all?

I decided to do the same for my family, which will be much easier because there are no touring musicians among us. The one complication is that a good chunk of the family is on other continents, and I wouldn't want us to "meet" in the middle of the ocean. So some approximation might be warranted.

I'd be happy if someone could provide me the math, I'm fairly confident I would be able to do it with a calculator or maybe put into some crude Python. I don't think I need to make a weekly report, since we're not that mobile. Maybe twice a year, or once every two months.

Thanks!

Edit: I don't know much math

Edit2: holy shit this is not simple at all! Now I feel kinda bad for throwing this problem at you guys. I really thought it would be quick and easy!

24 comments

  1. [10]
    arghdos
    (edited )
    Link
    If you have access to time series location data you should be able to calculate the average location via scipy’s integrator functions (e.g., the (un?)fortunately named cumtrapz). From calc, the...

    If you have access to time series location data you should be able to calculate the average location via scipy’s integrator functions (e.g., the (un?)fortunately named cumtrapz).

    From calc, the average is simply the integral of the position divided by the time window: newest time stamp in the week minus the oldest time stamp. Use the same time units in as in the integral for this.

    For your purposes, I’d suggest just calculating the average latitude and longitude of each person separately and then compute a the plain old average over all family members.

    There’s probably a service out there to translate that into a nearest town or something, but I’m less sure there

    HTH

    7 votes
    1. [6]
      vektor
      Link Parent
      There's a problem with using lat/lon here:[1] The average doesn't respect that these wrap around. Imagine a person at 179°E and one at 179°W. Their average position should be at180°E/W, i.e. right...

      There's a problem with using lat/lon here:[1] The average doesn't respect that these wrap around. Imagine a person at 179°E and one at 179°W. Their average position should be at180°E/W, i.e. right between them. Your method of averaging lat/lon will put the average at 0°, i.e. the other end of the globe. This should work as long as all your locations are on one half-sphere that doesn't cross the date line, but I don't really trust it still.[2]

      The more robust computation would be to convert to cartesian coordinates (assume earth radius of one, some trig to convert from lat/lon to x/y/z;[3] do the integration/averaging in x/y/z; convert back to lat/lon; you'll get a lower distance than 1 this time around, but that's no problem. This is a result of points on opposite ends of the earth averaging to the center of the earth. Unless you're really unlucky, you'll be able to find the coordinates for the point above. And there's your average location.)

      [1] "Problem" in the sense that I can break it, not necessarily that it breaks for the use case described by OP.

      [2] After writing the rest, I see another failure mode: If you have two points of opposite ends of a pole (canada and russia, e.g.), then the average would not be on the north pole, but rather at the same distance from the pole but either at 3 or 9 o'clock. We'd have to add more constraints to make sure we don't run into this one either, and I'm not sure we even can.

      [3] I've done these conversions before. For now I'll leave them as exercise to the reader. You'll need sin/cos of the angles to get your x/y/z and arccos/arcsin for the reverse. If you, /u/lou , want to do the math here, a simple average of x/y/z can be done by hand, no integrals needed; that is if you just want to average e.g. the location of your family's homes. I can also figure out the formulae later or look them up if you don't want to deal with that.

      4 votes
      1. [5]
        arghdos
        (edited )
        Link Parent
        Ahhh yeah, you’re right, i forgot that lat/lon aren’t real coordinate systems and would screw with the calculation at the date lines or the poles. It’s probably a safe assumption the OP wouldn’t...

        Ahhh yeah, you’re right, i forgot that lat/lon aren’t real coordinate systems and would screw with the calculation at the date lines or the poles. It’s probably a safe assumption the OP wouldn’t hit this as they stated they’re all on the same continent.

        Rather than converting to Cartesian, wouldn’t it be easier to convert to spherical coordinates and arbitrary map longitude to 0-360 and latitude to 0-180? You just choose a solution and roll with it, way less trig :p

        1 vote
        1. [4]
          vektor
          Link Parent
          It absolutely would be easier, correct. It would also be less trig, that's correct as well. But because it's less trig, it's also a lot less fun. (Yes, I'd be a terrible math teacher. Or terrific....

          It absolutely would be easier, correct. It would also be less trig, that's correct as well. But because it's less trig, it's also a lot less fun.

          (Yes, I'd be a terrible math teacher. Or terrific. I'm not quite sure.)

          1 vote
          1. arghdos
            Link Parent
            Heh, I'd take your class

            Heh, I'd take your class

          2. [2]
            Wulfsta
            Link Parent
            I think using spherical coordinates rather than Cartesian vectors will bias the average? I would have to test this to be sure.

            I think using spherical coordinates rather than Cartesian vectors will bias the average? I would have to test this to be sure.

            1. vektor
              Link Parent
              If you're saying that spherical coordinates will give weird results when averaged, exactly. There's lots of examples you can find, but it'll give weird results even for two points; the longer the...

              If you're saying that spherical coordinates will give weird results when averaged, exactly. There's lots of examples you can find, but it'll give weird results even for two points; the longer the distance the weirder the result. I think the root cause of this is that the distance metric for spherical coordinates is just nonsensical. The distance between 0°E and 40°E is dependent on how far north/south you are: At the equator it's 1000s of km, at 89.99°N it's just a few meters away. Manipulating the digits of spherical coordinates does not follow many mathematical laws.

              1 vote
    2. [2]
      lou
      (edited )
      Link Parent
      Thank you for answering! But I don't even know what an "integral" is lol

      Thank you for answering!

      But I don't even know what an "integral" is lol

      3 votes
      1. arghdos
        (edited )
        Link Parent
        Heh. Well this may or may not help, but an attempted ELI5: Imagine you draw a line on an 2D axis: https://i.stack.imgur.com/kCW16.png Then you decide you want to know the area enclosed by the...

        Heh. Well this may or may not help, but an attempted ELI5:

        Imagine you draw a line on an 2D axis:

        https://i.stack.imgur.com/kCW16.png

        Then you decide you want to know the area enclosed by the shape of the line to horizontal axis:

        https://en.m.wikipedia.org/wiki/File:Integral_example.svg

        Then, if you divide that by the length of the curve on the horizontal axis, you calculate the height of a rectangle with the same area:

        https://www.mathopenref.com/calcaveval.html

        Which is the average value of the height of the line!

        This is applies in multiple dimensions and to find the average in time!

        4 votes
    3. Wulfsta
      Link Parent
      Wow, that is truly a terrible/great function name.

      Wow, that is truly a terrible/great function name.

      3 votes
  2. [6]
    Wulfsta
    Link
    This is a much harder problem than you might suspect, due to the fact that you’re dealing with continents. You’ll hit issues if you fail to take into account that you’re on an ellipsoid (though...

    This is a much harder problem than you might suspect, due to the fact that you’re dealing with continents. You’ll hit issues if you fail to take into account that you’re on an ellipsoid (though assuming a sphere will be good enough) as well as choose a proper coordinate system.

    4 votes
    1. [4]
      bub
      Link Parent
      The problem is slightly easier if you allow for the fact that most peoples' average family location is underground. Otherwise, what, are we talking about bounding the results to the surface of the...

      The problem is slightly easier if you allow for the fact that most peoples' average family location is underground.

      Otherwise, what, are we talking about bounding the results to the surface of the Earth? If that's the case, then what if your family manages the unlikely case that they're equally distributed across the surface? What if there are just two members, and they're at exactly opposite sides of the earth? If it's not bound to the surface then you can say the average is just at the center of the Earth. But if it is surface-only, then do you say they're everywhere and nowhere? Has the family ascended? This surface mapping seems problematic.

      1 vote
      1. [3]
        arghdos
        Link Parent
        The average position in the surface-constrained case where you have two members on opposite sides of the globe would have two solutions: both the points equidistant between them. The answer would...

        The average position in the surface-constrained case where you have two members on opposite sides of the globe would have two solutions: both the points equidistant between them.

        The answer would be arbitrary, depending on the coordinate system you choose.

        But, given OPs problem statement, it’s probably not relevant

        1 vote
        1. [2]
          rogue_cricket
          Link Parent
          If you've a North Pole person and South Pole person then the entire equator would be equidistant between them.

          If you've a North Pole person and South Pole person then the entire equator would be equidistant between them.

          4 votes
          1. arghdos
            Link Parent
            Ahhh, yeah good point. Santa's family visiting the Antarctic would be quite fun :)

            Ahhh, yeah good point. Santa's family visiting the Antarctic would be quite fun :)

            1 vote
    2. lou
      Link Parent
      I see. I really thought it would trivial for the smart crowd of Tildes!

      I see. I really thought it would trivial for the smart crowd of Tildes!

  3. [8]
    psi
    (edited )
    Link
    Hmm, this is actually a pretty nontrivial question. As pointed out, simply averaging the longitude/latitude won't give you a desirable result since the spacings aren't equal as you vary your...

    Hmm, this is actually a pretty nontrivial question. As pointed out, simply averaging the longitude/latitude won't give you a desirable result since the spacings aren't equal as you vary your longitude/latitude. (As a pathological example, consider the "average location" of two people: one very close to the north pole at 10 deg longitude and one person near the equator at 30 deg longitude. The arithmetic mean is 20 deg, but if you look at a globe, it should be clear that the average position should be at basically 30 deg longitude).

    In fact, I'm not even sure how "average" position is defined when constraining your average to a sphere.
    Luckily, mathoverflow comes to the rescue [1]: they suggest using the Karcher-mean, which is the point on the sphere that minimizes the sum of the squares of geodesic distances between the data and that point.

    In math

    That is, minimize with respect to \vector p

    \sum_i G(\vector p, \vector r_i)^2
    

    where \vector p is the average position, \vector r_i are the data, and G(., .) computes the geodesic distance between two points. It's straightforward to show that this reduces to the arithmetic average of Euclidean coordinates when your metric is flat.

    Unfortunately, this doesn't give us a neat prescription for calculating the average location on a sphere!

    Intuitively, I think you could use a prescription something like this (assuming you have 2^n points; I'll leave the general case as an exercise to the reader :p)

    1. Pair each of the 2^n points together so that you have 2^n-1 pairs
    2. For each pair, find the midpoint of the geodesic connecting the points of the pair.
    3. Repeat the process using the midpoints until only one point remains.

    A hitch I foresee, however, is that the midpoint is not guaranteed to be unique (consider the midpoint between the two poles). In that case, reshuffle the pairs until they are. If that still doesn't work... well, hopefully your setup is symmetrical enough that the average position is obvious!

    --

    [1] https://mathoverflow.net/questions/231501/the-mean-of-points-on-a-unit-n-sphere-sn

    3 votes
    1. [7]
      vektor
      (edited )
      Link Parent
      Your pathological example of the midpoint of the two poles actually works for any two antipodean points. For any two such points, the "average point" by any reasonable definition will have to be...

      Your pathological example of the midpoint of the two poles actually works for any two antipodean points. For any two such points, the "average point" by any reasonable definition will have to be undefined, as the "equator" will by symmetry offer infinitely many equally reasonable results. Practically, we can ignore this, as it only happens in examples that have been tuned to mathematical perfection to be ideally opposite.

      If you remove yourself from polar coordinates (cartesian, I tell you!), I think you'll find that my algorithm above would in fact come to the same result as yours, except I can cover the general case. I can't prove it, but I've certainly convinced myself that for two points, I'll return the midpoint of the geodesic. For the pathological example, my algorithm will fail gracefully (i.e. noticably) by trying to normalize the zero-vector to length one.

      Or maybe our algorithms will come up with different results, but then I find myself convinced by my solution, as yours is dependent on pairing-order (yikes! ;) ) - consider the positions 91°W, 1°W, 1°E, 91°E, all on the equator. By symmetry, we can restrict the solutions to 0° and 180°. By common sense, we know that it must be 0°C. If we pair up 91 and 91, you'll get 180°, pair that with the 0° from the other two points and you'll get a problem. If we introduce a little bit of noise to break the stalemate, we'll get an unreasonable result that could be anywhere on earth. Pair them up the other way, it works well. Compare the vector-averaging method I show, you'll find the vectors of 91 and 91 basically cancel out while the 1 and 1 give it a strong shove towards 0°.

      3 votes
      1. [5]
        psi
        (edited )
        Link Parent
        Yes, of course. I elided over this point, but in general the Karcher-mean does not have a unique solution. Would you mind elaborating on your algorithm? To be clear, I'm not advocating using any...

        Your pathological example of the midpoint of the two poles actually works for any two antipodean points.

        Yes, of course. I elided over this point, but in general the Karcher-mean does not have a unique solution.

        If you remove yourself from polar coordinates (cartesian, I tell you!), I think you'll find that my algorithm above would in fact come to the same result as yours, except I can cover the general case.

        Would you mind elaborating on your algorithm? To be clear, I'm not advocating using any particular coordinate system -- in fact, the point of my intuitive algorithm was to show how you could (sometimes) find the average by just looking at a globe (well, and guesstimating the geodesics, I suppose). It's definitively not the most robust method! For the general case, one should just solve for the Karcher-mean. If you mean that we should just calculate the arithmetic average of the Cartesian coordinates to find the average, I would have to disagree (unless you're satisfied with the average position being underground).

        1 vote
        1. [4]
          vektor
          Link Parent
          That is pretty much my algorithm, except for one little bit: We normalize the vector back to the surface, to unit length. That'll get you back onto the geodesic, but in telling you how far you had...

          That is pretty much my algorithm, except for one little bit: We normalize the vector back to the surface, to unit length. That'll get you back onto the geodesic, but in telling you how far you had to stretch the average vector to get to the surface, you also know how good of an average it is: If it's far from the surface, that means you have more components that are on different ends of the globe, and the average position is more sensitive to minute changes. If it's near the surface, it's a very robust result, robustness again referring to the change in the outputs resulting from changes in the inputs.

          I know it's not aesthetically pleasing to have to normalize back to the surface, but normalization and (cartesian) interpolation are operations that of themselves have nicer properties than, say, interpolation in spherical coordinates.

          2 votes
          1. [3]
            psi
            Link Parent
            Ah, gotcha. Your algorithm is pretty easy to compute, which I definitely appreciate. And I think it would generally work well, especially if the points are clustered (of course, if the points are...

            Ah, gotcha. Your algorithm is pretty easy to compute, which I definitely appreciate. And I think it would generally work well, especially if the points are clustered (of course, if the points are clustered closely enough, averaging the longitude/latitude might be sufficient). But it's a little hard to visualize the effects of small perturbations when the points are far apart.

            Regardless, I can think of a counterexample. First consider a small patch of the sphere with three points laid out in an equilateral spherical triangle. Intuitively we know where the average position is (in the middle of this triangle).

            Now consider adding a fourth point antipodal to the average of the three points (ie, a point antipodal to the middle of the triangle). Your algorithm would say that the average position of this 4-point system is the same as this 3-point system, which I find unlikely. (If you aren't convinced, consider adding the fourth point instead at the middle of the spherical triangle and slowly dragging it to the antipode; clearly the average position should change over time.)

            In contrast, my algorithm1, would give three different solutions, depending on your pairings; however, the solutions are symmetric, and I wouldn't expect the solution to be unique in this configuration anyway. (In fact, I would expect 3 solutions minimum.)

            --

            1 Which, again, may not be correct. I'm honestly not sure whether my algorithm gives the correct result in this case, either.

            2 votes
            1. [2]
              vektor
              Link Parent
              I'm at peace with that. Let's have the triangle at the north pole and the antipode at the south pole. Clearly it isn't anywhere else: It can't be at the south pole, as that would mean you'd move...

              Your algorithm would say that the average position of this 4-point system is the same as this 3-point system, which I find unlikely.

              I'm at peace with that. Let's have the triangle at the north pole and the antipode at the south pole. Clearly it isn't anywhere else: It can't be at the south pole, as that would mean you'd move it closer to one point at the cost of three other points. It can't be anywhere between the poles either, as you'd have to decide, at random, on a longitude. Ergo, center of the triangle. The three have more weight than the one, and that's ok. Think about it this way: That antipode is pulling the average away from the triangle in every possible direction at the same time.

              (If you aren't convinced, consider adding the fourth point instead at the middle of the spherical triangle and slowly dragging it to the antipode; clearly the average position should change over time.)

              As you move the 4th point in towards the triangle, you'll see the average move out of the triangle as the direction in which the 4th point pulls becomes more clear; then the average will move back in again, as the 4th point gets closer to the triangle. All those results seem valid to me.

              Thinking about it a bit more, I'm not sure my algorithm would give you the "meeting point of minimum travel distances: Consider your setup. The travel distance for the north pole would be half a circumference plus 3x the "radius" of the triangle. Compare that to the edge of the triangle, where the neighboring points of the triangle have less distance to cover, and the antipode saves as much as the 3rd point of the triangle gets extra. Clearly a better choice under both sum and sum-of-squares norms. Not a nice result for my algorithm, but I'm willing to trade that for smoothness and determinism.

              2 votes
              1. psi
                Link Parent
                Well, that's where I'd disagree. The mistake, I believe, is requiring the average point be unique. There can be multiple solutions along different longitudes. Hmm, I also don't find this...

                Let's have the triangle at the north pole and the antipode at the south pole. Clearly it isn't anywhere else: It can't be at the south pole, as that would mean you'd move it closer to one point at the cost of three other points. It can't be anywhere between the poles either, as you'd have to decide, at random, on a longitude. Ergo, center of the triangle.

                Well, that's where I'd disagree. The mistake, I believe, is requiring the average point be unique. There can be multiple solutions along different longitudes.

                As you move the 4th point in towards the triangle, you'll see the average move out of the triangle as the direction in which the 4th point pulls becomes more clear; then the average will move back in again, as the 4th point gets closer to the triangle.

                Hmm, I also don't find this desirable. As you move father from the triangle, the better you can approximate the triangle as three points stacked on top of each other at the center. In this case, the average position is roughly 1/4 the arc length from the center of the triangle to the fourth point, with the maximum arc length occurring at the antipode. Notably, there is a preferred longitude until you reach the antipode, at which point the solutions become degenerate.

                I'm not sure my algorithm would give you the "meeting point of minimum travel distances" [...] Not a nice result for my algorithm, but I'm willing to trade that for smoothness and determinism.

                Right, that's what I'm getting at. Of course, your algorithm is perfectly fine so long as you can live its trade-offs.

                1 vote
      2. Wulfsta
        (edited )
        Link Parent
        I think the ideal way to do this is use a local (since I don’t think you can constrain this to work with a global one) optimization algorithm such that you start with a copy of any one of your...

        I think the ideal way to do this is use a local (since I don’t think you can constrain this to work with a global one) optimization algorithm such that you start with a copy of any one of your points, and minimize the geodesic distance between that variable and every other point.

        Edit: You could probably do this in a few lines of PyTorch code with L-BFGS.