16 votes

What are your unsolved programming problems?

I thought it could be fun to discuss problems that we've encountered in our programming or programming-related work and have never found a solution for. I figure that at worst we can have a lot of fun venting about and scratching our heads at things that just don't make any sense to anyone, and at best we might be able to help each other find answers and, more importantly, some closure.

21 comments

  1. [3]
    jonluca
    Link

    For me it's more of a general request but a fast way to do front end testing. I can test all my back end and controllers in milliseconds, but front end testing usually involves spinning up an entire headless browser, and is orders of magnitude slower. I don't know if there's a more efficient way of doing it but it's always been a pain to me.

    6 votes
    1. [2]
      Emerald_Knight
      Link Parent

      I don't think there would be a better way, honestly. Testing in the intended environment, i.e. a headless browser that can simulate a user traversing your front-end, seems like the most accurate low-overhead option for automated testing to me.

      Do you at least have a way to automate spinning up said headless browser and triggering the tests? If not, then that could be an avenue to explore to at least reduce the overhead of having to manually spin up that browser. Background processing to parallelize some of your automated tasks could also be an avenue to explore.

      Honestly I have no knowledge of your setup so I'm just spitballing here :)

      2 votes
      1. jsx
        Link Parent

        It depends on what "frontend testing" means for this context. It almost sounds like @jonluca is talking about end-to-end testing (like you'd use a tool like Selenium for). There are other quicker and less painful ways to pretty thoroughly test frontend work (and leave e2e tests for pure smoke / system health testing). Unit testing (involving appropriate mocking) and integration / contract tests can be numerous, with good coverage, while remaining easy and fast to write.
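
        For contrast with spinning up a headless browser, here's a toy sketch of the kind of DOM-free unit test being described (formatPrice is a hypothetical helper, just standing in for view logic extracted from a component):

        ```javascript
        // A minimal sketch: pure view logic pulled out of a component can be
        // unit tested with no browser (headless or otherwise) involved at all.
        const assert = require("assert");

        // Hypothetical helper, imagined as extracted from a UI render path.
        function formatPrice(cents) {
          if (!Number.isInteger(cents) || cents < 0) throw new RangeError("invalid price");
          return `$${(cents / 100).toFixed(2)}`;
        }

        // These run in well under a millisecond because no DOM is involved.
        assert.strictEqual(formatPrice(1999), "$19.99");
        assert.strictEqual(formatPrice(0), "$0.00");
        assert.throws(() => formatPrice(-1), RangeError);
        console.log("unit tests passed");
        ```

        The design point is the extraction itself: the more logic lives in pure functions like this, the fewer cases are left for the slow e2e suite.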

  2. [6]
    s4b3r6
    Link

    Type safety.

    I've been building a stack based language for nearly five years now, the goal being a type-safe, fast, stack based language with simple semantics. (And a more consistent syntax than Forth).

    So far... I've found that only partial evaluation and full tree search allow the compiler to ensure even partial safety, and that becomes exponentially slower at each branch point (like an if statement).

    5 votes
    1. [5]
      Emerald_Knight
      Link Parent

      Just to make sure we're on the same page, what do you mean by "type safety"?

      1 vote
      1. [4]
        s4b3r6
        Link Parent

        An object of a certain type, once created, will only be used in a valid scenario.

        Example:

        display [Hello, World!]
        

        This generates a string, and passes it to display, a function whose signature is display(string x).

        However if you did:

        tonumber [21.x]
        

        The compiler should catch this, and tell you at compile time that the string is not valid to be turned into a number. And that string may be generated by other functions.

        cat [10.] [x]
        tonumber
        

        (Of course, IO functions tend to be inherently unsafe, and instead get runtime safety wrappers... But that irritates me a little.)
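
        A rough sketch of the check being described, in JavaScript rather than the stack language (the helper below is hypothetical; it just shows validating a string constant the compiler can see flowing into tonumber):

        ```javascript
        // Hypothetical sketch: if the compiler knows the string constant that
        // reaches `tonumber`, it can reject invalid ones before the program runs.
        function isValidNumberLiteral(s) {
          return s.trim() !== "" && !Number.isNaN(Number(s));
        }

        console.log(isValidNumberLiteral("21.x"));      // the `tonumber [21.x]` case: prints false
        console.log(isValidNumberLiteral("10." + "x")); // the `cat [10.] [x]` case: prints false
        console.log(isValidNumberLiteral("10.5"));      // a genuinely numeric string: prints true
        ```

        The second case is the hard one: the checker only works once constant folding has already turned `cat [10.] [x]` into a known string.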

        4 votes
        1. [3]
          Emerald_Knight
          Link Parent

          That sounds like a tricky one. Compilers aren't really my forte, so please excuse any ignorance on my part, but surely the right form of tokenization would help? I'm envisioning e.g. cat [10.] [x] being tokenized as string {function string string} (string, string), where you could then ensure that the left-most token matches the expected token.

          Feel free to tell me why I'm wrong about this :)

          1. [2]
            s4b3r6
            Link Parent

            It does! Get it right, and all types are applied.

            The issue is detecting that the right types are being used on loop 500 of a for loop with if branches, without doing a little evaluation.

            for 'i 500
              if even i
                set x [Hello]
              else
                set x i
            'endfor
            

            I can do it by analysing the whole program... But it's slooooow.

            But, lots of tips n tricks hidden inside Rust and CakeML.

            2 votes
            1. Emerald_Knight
              Link Parent

              It looks like you're creating a dynamically-typed language, much like Python. In that case I don't think what you're trying to do can be accomplished efficiently. My understanding is that except in cases like the for loop you show here, where the results are guaranteed to be the same no matter how many times you run it, you can't know for sure what the stored type of a variable is, which prevents full type safety. Variable types must ultimately be determined at runtime.

              I would imagine that at best you could get partial type safety, but any type determinations at compile time for a dynamically-typed language would be due to some amount of partial evaluation.

              If I'm completely wrong and you're actually designing a statically-typed language, then I'm not sure why you would need to evaluate an entire for loop to determine type safety.

              Though, again, I could be off the mark.
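
              One way to avoid evaluating the loop at all, sketched in JavaScript (my own toy illustration, not the compiler under discussion): track the set of types each branch can produce and join them, the way abstract interpretation does:

              ```javascript
              // Toy sketch of branch-joining type analysis: instead of running
              // the loop 500 times, merge the types each branch can assign to x.
              function joinTypes(a, b) {
                return new Set([...a, ...b]);
              }

              const thenBranch = new Set(["string"]); // set x [Hello]
              const elseBranch = new Set(["number"]); // set x i

              // After the if/else, x may hold either type, independent of the
              // iteration count, so the loop body is analysed exactly once.
              const xTypes = joinTypes(thenBranch, elseBranch);
              console.log([...xTypes].sort().join(",")); // prints "number,string"
              ```

              The trade-off is precision: the analysis concludes x *may* be either type, which is enough to flag an unconditional `tonumber x` after the loop without any evaluation.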

              3 votes
  3. [10]
    Emerald_Knight
    Link

    To start off with my own, I had a MongoDB collection in a production database wherein I needed to update the document structure from an array of subdocuments to a subdocument where each array element is the value of the new subdocument and its key is the array element's _id.

    In other words, a document like...

    {
        _id: "my_document_id",
        my_field: [
            { _id: "sub_id_1", my_key: "val1" },
            { _id: "sub_id_2", my_key: "val2" }
        ]
    }
    

    ...would be converted to...

    {
        _id: "my_document_id",
        my_field: {
            sub_id_1: { _id: "sub_id_1", my_key: "val1" },
            sub_id_2: { _id: "sub_id_2", my_key: "val2" }
        }
    }
    

    It was a weird use case, and yes, it was necessary. This was the only reasonable workaround, believe it or not.
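
    In plain JavaScript the structural change looks like this (a sketch of the transformation only, not the actual shell command that failed):

    ```javascript
    // Re-key an array of subdocuments by each element's _id, as described above.
    function arrayToKeyedObject(doc) {
      const keyed = {};
      for (const sub of doc.my_field) {
        keyed[sub._id] = sub; // the element's _id becomes the new key
      }
      return { ...doc, my_field: keyed };
    }

    const before = {
      _id: "my_document_id",
      my_field: [
        { _id: "sub_id_1", my_key: "val1" },
        { _id: "sub_id_2", my_key: "val2" },
      ],
    };

    const after = arrayToKeyedObject(before);
    console.log(Object.keys(after.my_field).join(",")); // prints "sub_id_1,sub_id_2"
    ```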

    The problem appeared when I attempted a forEach() call in the MongoDB shell. When converting from an array to a document, some of our documents with a non-empty my_field entry would end up with the value my_field: {}. Weirder still, the affected documents were different on every run: I could restore the collection to its original state, run the exact same command, and get different results.

    In the end I gave up and wrote a one-off PHP script to handle the document structure update. It worked flawlessly. Why the forEach() call continued to fail, despite all of my data validation, with unpredictable results, is still something I've never found the answer to.

    4 votes
    1. [2]
      Comment deleted by author
      Link Parent
      1. Emerald_Knight
        Link Parent
        I'm just glad that the collection was small enough (<400 documents) to use a PHP script. I can't even begin to tell you how much of a nightmare optimizing that database and the code base was!

        I'm just glad that the collection was small enough (<400 documents) to use a PHP script. I can't even begin to tell you how much of a nightmare optimizing that database and the code base was!

        1 vote
    2. [8]
      spit-evil-olive-tips
      Link Parent

      <rant type="slightly offtopic">

      MongoDB

      Maybe this is just me being a database snob (I used to work on the Database Services team at AWS) but MongoDB sucks. Jepsen testing showed that they dropped data during a network failure, even with the strictest possible write consistency settings. They subsequently got their act together, and 4 years later made a big deal about how amazing it was that they passed Jepsen. Good job guys, I'm sure that blog post will be real helpful to anyone who lost data with you previously.

      10gen (the company behind Mongo) is based in NYC and not Silicon Valley, and their attitude reflects that. It's a "fake it 'til you make it" company, a sales/marketing machine that also has an engineering division.

      If your data fits on one machine, use Postgres with JSONB. If your data is larger than one machine, use Cassandra. If you want to push the envelope a bit, you can run CockroachDB as a drop-in replacement for Postgres (including JSONB support) or ScyllaDB as a drop-in replacement for Cassandra.

      </rant>

      2 votes
      1. [7]
        Emerald_Knight
        Link Parent

        I feel like your assessment is a bit unfair here. Saying that "MongoDB sucks" (present tense) instead of "MongoDB sucked" (past tense) for problems in the past is both dismissive and unproductive. Historical issues aside, a database should be chosen based on your given use case. Cassandra is a wide column store database, Postgres is a relational database and JSONB is just a data type to use with it, CockroachDB is a relational database, and ScyllaDB would also appear to be a wide column store database. These have different advantages and disadvantages as compared to a document database like MongoDB.

        Although, when it comes to MongoDB, Inc. (formerly 10gen), I don't really have an opinion. I use whatever tools are best suited for the problems I need to solve and don't really follow the politics of the organizations surrounding them. Hell, despite my strong feelings about Microsoft, I'm still using GitHub until such time that they've screwed it up (which is probably inevitable).

        As for my own work with databases, MongoDB fits my use cases best.

        3 votes
        1. [6]
          super_james
          Link Parent

          As for my own work with databases, MongoDB fits my use cases best.

          Purely in the interests of furthering this discussion (which I am finding interesting whilst having not much to add!)

          How did you come to the conclusion that MongoDB fits your use case better than Cassandra or Postgres + JSONB? What are the missing features you use? Did you do any benchmarking?

          2 votes
          1. [5]
            Emerald_Knight
            Link Parent

            MongoDB itself is designed specifically to handle updating of JSON-based data. While those other options support JSON, they're not designed to handle the kinds of safe updates I perform on a regular basis.

            2 votes
            1. [4]
              Emerald_Knight
              Link Parent

              I feel that I should expand on this a bit. The choice of NoSQL over SQL is pretty clear for a startup without a dedicated DBA. It's flexible and highly conducive to agile development, allowing for sudden changes in data model structure without much effort. It's then a question of what type of NoSQL database to use. Among those options are document databases, wide-column stores, key-value stores, and graph stores.

              Graph stores are definitely out as they're a bit more on the specialized use case end. Simple key-value stores are too restrictive in their lack of embedded data structure support, so they're also a no-go.

              That leaves us with the wide-column and document databases, e.g. Cassandra and MongoDB. The primary benefit of MongoDB over Cassandra is that I have absolutely no fucking clue what the data is going to end up looking like two months from now and I have no way of knowing if that structure is going to change dramatically or not, so it's by far the simplest to use for managing that unpredictability. Both are fairly flexible, but MongoDB's additional flexibility over Cassandra is more desirable for the rapid pace I need to work at.

              3 votes
              1. [3]
                lemon-fresh
                Link Parent

                I find a big gotcha with NoSQL is recognizing when it might be time to start migrating some or all of your data to a relational data store, because things like enforcing data schemas and referential integrity become a nightmare as your business, data model complexity, and code base grow.

                2 votes
                1. [2]
                  Emerald_Knight
                  Link Parent

                  That's a very valid point you bring up, and is unfortunately just one of those things you need to consider when doing a cost/benefit analysis of the options available to you.

                  For my own work, I don't foresee any such problems arising. We're just not going to reach the sort of scale that tech giants like Amazon, Microsoft, or Google operate at, not even in 20 years' time. Our use case will also likely see older documents phased out, making their consistency with updated data models unnecessary. Most of the data that would be problematic will simply end up deprecated away. The few document structure updates that we do end up needing will require very little in the way of manual work.

                  In other words, we really don't need to worry about any of those problems for the sort of business we're in.

                  Not everyone can say that about the kind of work they're doing, though, and in those cases you should probably be starting out with a relational database in the first place.

                  2 votes
                  1. lemon-fresh
                    Link Parent

                    Awesome, glad to hear your use case supports a NoSQL solution for the long term. :)

                    1 vote
  4. [2]
    mironimous
    (edited )
    Link

    So I guess this is more a math problem than a programming problem, but it occurred to me while programming and is about brute force and stuff, so I guess it's close enough.

    So you know maximum length sequences? Basically, you have a repeating sequence of 0s and 1s (I am just looking at the case F_2, which basically just means modulo 2; it trivially extends to other fields) where each subsequence of length n (except all 0s) appears exactly once in one period, which obviously means that the period length is 2^n-1. It turns out you can easily calculate such a sequence by finding a primitive polynomial of degree n over F_2 (irreducibility, which already requires having no zero mod 2, is necessary but not quite sufficient) and using the coefficients for a recurrence relation (a_n = b_{n-1}a_{n-1} + ... + b_0 a_0, with b_k being the coefficient of x^k).
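
    For the 1-D case the construction is short; a sketch in JavaScript using the primitive polynomial x^3 + x + 1 (so a_n = a_{n-2} XOR a_{n-3}), which gives period 2^3 - 1 = 7:

    ```javascript
    // Generate one period of the maximum length sequence for x^3 + x + 1.
    const seq = [1, 0, 0]; // any nonzero seed works (it just rotates the period)
    while (seq.length < 7) {
      const n = seq.length;
      seq.push(seq[n - 2] ^ seq[n - 3]); // a_n = a_{n-2} + a_{n-3} mod 2
    }

    // Every nonzero length-3 window appears exactly once per (cyclic) period.
    const windows = new Set();
    for (let i = 0; i < 7; i++) {
      windows.add(`${seq[i]}${seq[(i + 1) % 7]}${seq[(i + 2) % 7]}`);
    }
    console.log(seq.join("")); // prints "1001011"
    console.log(windows.size); // prints 7, i.e. all nonzero triples
    ```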

    I've had the need of such a thing but over 2 dimensions, meaning you have an n*m tile where every x*y subrectangle appears exactly once. For example, here is a 4*4 tile where every 2*2 subrectangle appears exactly once (note that a subrectangle at the edge behaves like in pacman, extending to the other side):

    +--+--+--+--+
    |  |  |  |##|
    +--+--+--+--+
    |  |  |##|  |
    +--+--+--+--+
    |##|  |##|##|
    +--+--+--+--+
    |  |##|##|##|
    +--+--+--+--+
    

    The problem is that the tile becomes very big very quickly (if you're doing a 4*4 subrectangle, you're already dealing with a 256*256 tile, which has 65536 bits and is basically impossible to brute force).
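
    For what it's worth, the 2*2-in-4*4 case is still comfortably brute-forceable; a sketch of that search in JavaScript (my own illustration of why the approach dies at larger sizes):

    ```javascript
    // Check whether a 4x4 tile (packed into 16 bits, bit r*4+c for cell (r,c))
    // contains every 2x2 subrectangle exactly once, with pacman wraparound.
    function isDeBruijnTile(bits) {
      const g = (r, c) => (bits >> ((r & 3) * 4 + (c & 3))) & 1;
      const seen = new Set();
      for (let r = 0; r < 4; r++) {
        for (let c = 0; c < 4; c++) {
          const pat =
            (g(r, c) << 3) | (g(r, c + 1) << 2) | (g(r + 1, c) << 1) | g(r + 1, c + 1);
          if (seen.has(pat)) return false;
          seen.add(pat);
        }
      }
      return seen.size === 16; // 16 positions, 16 possible 2x2 patterns
    }

    // Exhaustive search over all 2^16 tiles: feasible here, hopeless at 256x256.
    let found = -1;
    for (let bits = 0; bits < 1 << 16; bits++) {
      if (isDeBruijnTile(bits)) {
        found = bits;
        break;
      }
    }
    console.log(found >= 0); // prints true: valid tiles exist at this size
    ```

    Even this small search does 2^16 * 16 window checks; the 4*4-subrectangle version would need 2^65536 candidate tiles, which is the wall being described.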

    I've basically ruled out anything that linearly depends on only the x*y subrectangle (so recurrence relations and matrices) and now I'm trying to do something with the fact that every (x-k)*y subrectangle defines a k*y -> k*y permutation with no success so far.

    Maybe someone knows just a word I have to google to find this out

    4 votes
    1. pleure
      Link Parent

      Ooof this looks like it could be annoying. My instinct is to flatten the grid back into a sequence, then a sub square turns into a subsequence with jumps. That's still awful to work with, but may be slightly more tractable to combinatorial methods.