8 votes

Node's "Single Threaded, Event Driven" programming model seems highly deceptive and farcical

The more I think about it, the more I'm convinced of it.

The biggest selling point for Node folks has been the "single threaded, event driven" model, right? Unlike JavaScript, other languages work on a "blocking" basis, i.e. you run a statement or command and the program "waits" until the I/O is complete. For example, you issue open('xyz.txt', 'rb').read() in Python and the program waits, or blocks, until the underlying driver has read the whole file (which could take a long time if the file is large).

But with the Node.js equivalent, you issue the statement and pass an "event handler", so your program is never in the "waiting state". The whole premise of the Node/JS event-callback model is "you don't call us, we will call you".

This is all nice in theory, but if it were true, Node.js scripts should be blazing fast compared to Python and even Java, considering that most programs we write are I/O heavy and 99% of the time they're just waiting for input from a file/URI/user. If this event callback model indeed worked as effectively as claimed, Node would have been the numero one and only language being used today?

I think I'm starting to understand why that isn't the case. This whole "single threaded, event driven" thing is just a farce. You can also replicate the same thing that Node.js is doing in your Java or Python too by applying multi-threading (i.e. one thread just "waits" for the I/O in the background while the other keeps doing its job). All you've done here is just handed or delegated that complexity of multi-threading to Node.js?

Realistically, it's impossible to wait or block on an I/O request while also letting another part of the code engage in other tasks; that's the very definition of multi-threading. Doing "async" is impossible without multiple threads in that sense. Node must have a thread pool of sorts, where one thread is engaged in the wait/block while another keeps running your JS code. When the wait is over, control is passed to the "event handler" function it was bound to in that other thread.

What Node is selling as "single threaded" applies to application or business logic we are writing, node itself can't be single threaded. I feel it's better to just implement multi-threading in your own code (as needed) instead of using something convoluted and confusing like Node.js. What say you?

15 comments

  1. [2]
    em-dash

    The Python thing you should be comparing to is asyncio, not threads.

    I'm not sure you quite understand what Node is doing. Callbacks aren't just spawning a thread and doing a blocking operation on it. It's an event loop that, when you make an async call, goes off and finds some other pending operation to do on the same thread.

    Node would have been the numero one and only language being used today?

    It, uh, is pretty popular. No language will ever get 100% market share, but it caught on way more than I expected "what if javascript but outside of the browser" to.

    31 votes
    1. unkz

      Callbacks aren't just spawning a thread and doing a blocking operation on it

      They are actually kind of partially right, node does have sneaky hidden threads running in the background that it pretends not to have in certain cases, some of which are file system (fs), dns, crypto, and zlib, maybe others by now. And of course there are the kernel threads that are busy handling network io and other things.

      You’re right though, they do have some fundamental misunderstanding of how node operates.

      11 votes
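
A rough sketch of the two behaviours discussed in this sub-thread, using Python's asyncio as a stand-in for Node's event loop (that substitution is an assumption of this example; nothing here is Node's actual implementation). The coroutines all resume on the loop's own thread, while a call handed to `asyncio.to_thread` runs on a worker thread, loosely analogous to the background threads unkz mentions. The names `wait_on_loop` and `blocking_work` are hypothetical.

```python
import asyncio
import threading


async def wait_on_loop(delay: float) -> int:
    # Awaiting suspends this coroutine; the event loop thread stays free
    # to run other coroutines in the meantime.
    await asyncio.sleep(delay)
    return threading.get_ident()


def blocking_work() -> int:
    # Hypothetical stand-in for a call that has no non-blocking form.
    return threading.get_ident()


async def main() -> dict:
    loop_thread = threading.get_ident()
    coroutine_threads = await asyncio.gather(
        *(wait_on_loop(0.01) for _ in range(3))
    )
    # asyncio.to_thread runs the function on a worker thread (Python 3.9+).
    pool_thread = await asyncio.to_thread(blocking_work)
    return {
        "loop_thread": loop_thread,
        "coroutine_threads": set(coroutine_threads),
        "pool_thread": pool_thread,
    }


result = asyncio.run(main())
print("coroutines stayed on the loop thread:",
      result["coroutine_threads"] == {result["loop_thread"]})
print("offloaded call used another thread:",
      result["pool_thread"] != result["loop_thread"])
```

So both commenters can be right at once: the waiting itself needs no extra thread, and a pool is only involved for calls that cannot be made non-blocking.
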
  2. Macil
    (edited )

    For small amounts of concurrent IO/threads it's relatively true that there's not much performance benefit to async IO over threads, but the difference is when you have a lot. Async IO doesn't use threads in the background but more efficient OS APIs that allow IO to be done without extra threads. This is important because each thread has its own performance and memory cost.

    Async IO became popular in the context of the C10k problem, when it became obvious that having a thread per IO operation performed terribly at handling thousands of concurrent connections. Apache, the most popular webserver, was built on the threaded IO model, and it lost its top position to Nginx, which was built on the async IO model and performed much better at handling many concurrent connections.

    As async IO became popular, developers found that it was hard to program for in existing programming environments (like C, Python, and Java), because most standard library functions and third-party libraries used blocking IO, and calling any blocking IO functions within async code would often silently negate the performance benefit or completely destroy the performance of async code. Developers had to carefully avoid using pre-existing code that used blocking IO.

    The creator of Node, Ryan Dahl, decided that one way to solve this problem would be to create a brand new server programming ecosystem that (mostly) had no blocking IO functions and was built from the ground up for async IO. He realized that this would be hard to do within most existing language ecosystems, and that it would be easiest to do on a language not already commonly used on servers. Javascript was a convenient choice because it fit that, had no blocking IO in its standard library, and already had well-optimized implementations from its use in browsers. Unlike the common misconception, Node did not primarily exist to allow frontend JS developers to program on servers but to be a programming environment built on async IO.

    18 votes
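
Macil's point about blocking calls silently negating the benefit of async code can be illustrated with a small timing sketch. It uses Python's asyncio rather than Node, and the handler names and the 0.05 s delay are assumptions chosen for the example.

```python
import asyncio
import time


async def polite_handler(delay: float) -> None:
    # Non-blocking wait: the loop can run the other handlers meanwhile.
    await asyncio.sleep(delay)


async def blocking_handler(delay: float) -> None:
    # Blocking wait inside async code: the whole loop thread stalls.
    time.sleep(delay)


async def run_batch(handler, count: int, delay: float) -> float:
    # Start `count` handlers concurrently and time the whole batch.
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(count)))
    return time.perf_counter() - start


async_elapsed = asyncio.run(run_batch(polite_handler, 5, 0.05))
blocking_elapsed = asyncio.run(run_batch(blocking_handler, 5, 0.05))
print(f"async waits:    {async_elapsed:.2f}s")
print(f"blocking waits: {blocking_elapsed:.2f}s")
```

The five non-blocking waits overlap, so the batch takes roughly one delay; the five blocking waits run back to back, so the batch takes roughly five. Neither version raises an error, which is why the problem is easy to miss.
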
  3. [2]
    spit-evil-olive-tips

    If this event callback model indeed worked as effectively as claimed, Node would have been the numero one and only language being used today?

    it does work as claimed. there isn't some hidden secret knowledge that asynchronous programming is just 3 threads in a trench coat.

    the reason not everything is in Node is that many people (myself included) don't like JavaScript.

    almost all the other languages people use have similar facilities for asynchronous programming. Python has asyncio. Java has Netty (and I'm sure there are others, it's been a long time since I lived in that world). Golang has it built-in. Rust has Tokio. and so on.

    You can also replicate the same thing that Node.js is doing in your Java or Python too by applying multi-threading (i.e. one thread just "waits" for the I/O in the background while the other keeps doing its job).

    what it's actually doing under the covers is using epoll (or a similar function). specifically Node uses libuv which implements a platform-agnostic abstraction on top of epoll, kqueue, Windows IOCPs, etc.

    if you picture a stereotypical database-backed webserver, it is accepting connections on some TCP port, reading data (HTTP requests) from those connections, translating the HTTP requests into SQL queries, sending those queries off to the database, waiting for the response, and then turning the response data into HTML or JSON and sending that back to the client.

    this is extremely well-suited to the event-driven model, because you have N sockets open for active clients, M sockets open for active connections to the database, and you can have a single thread waiting for events from all of them. the CPU-bound work (translating requests to SQL queries going one way, and database results to HTTP responses going the other way) is relatively small and doesn't need multi-threading.

    if you want to know the history, you can read about the c10k problem. using one thread per connection like you're describing scaled very very poorly. the rise of asynchronous programming models was a direct result of how poorly multi-threading scales for IO-bound work.

    7 votes
    1. sparksbet

      I find it particularly strange that someone refers to "applying multi-threading" in Python when the GIL is such a well-known pain point for Python. Though tbf I think using the threading library with the GIL in place makes it function essentially how OP thinks node.js works (in contrast to asyncio which afaik generally works how node.js does).

      4 votes
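
The "single thread waiting for events from all of them" idea that spit-evil-olive-tips describes can be sketched with Python's standard `selectors` module, which picks epoll, kqueue or a fallback depending on the platform. This is not libuv or Node code; the socket pairs standing in for client connections and the function name `poll_ready_sockets` are assumptions for illustration.

```python
import selectors
import socket


def poll_ready_sockets() -> dict:
    # DefaultSelector uses the most efficient mechanism the OS offers.
    selector = selectors.DefaultSelector()

    # Three connected socket pairs stand in for three client connections.
    pairs = [socket.socketpair() for _ in range(3)]
    for index, (server_side, _client_side) in enumerate(pairs):
        server_side.setblocking(False)
        selector.register(server_side, selectors.EVENT_READ, data=index)

    # Only clients 0 and 2 send something; client 1 stays idle.
    pairs[0][1].sendall(b"GET /a")
    pairs[2][1].sendall(b"GET /c")

    received = {}
    while len(received) < 2:
        # One call reports every ready socket; no thread is parked per connection.
        events = selector.select(timeout=1.0)
        if not events:
            break
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(1024)

    selector.close()
    for server_side, client_side in pairs:
        server_side.close()
        client_side.close()
    return received


print(sorted(poll_ready_sockets().items()))
```

The idle connection costs nothing beyond its registration, which is the property that matters when there are thousands of mostly idle sockets.
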
  4. [6]
    Moonchild

    I feel it's better to just implement multi-threading in your own code (as needed) instead of using something convoluted and confusing like Node.js. What say you?

    I agree, but not for the reasons you said.

    Node's model is a cooperative scheduling model, which inhibits modularity because one concurrency domain can prevent another from making progress for arbitrarily long, which can harm latency.

    It also inhibits scalability, since it can only use one core at a time; you can ameliorate this by using many processes, but then you lose the ability to share data and state between them.

    (I have deliberately avoided using the word 'thread', because it's poorly defined in context and I don't think it matters too much in this case what it means.)

    6 votes
    1. [4]
      teaearlgraycold

      but then you lose the ability to share data and state between them

      This is why Redis and Postgres exist

      4 votes
      1. [3]
        Moonchild

        Sort of—see object-relational mismatch—for this sort of application you're almost certainly using postgres for persistence anyway; but in general insofar as you have the opportunity to decouple your data model from it, you should take it. (Also, a little bird tells me redis doesn't scale, but that's neither here nor there.) Also, the cost of interprocess communication is very great; whereas, I can look up a key in a shared hash table in nanoseconds.

        4 votes
        1. [2]
          teaearlgraycold

          The cost is latency which doesn’t matter if you use it sparingly. The total request latency is measured in milliseconds anyway. Could be tens or hundreds of milliseconds. What does a 100 microsecond lookup matter?

          2 votes
          1. Moonchild

            sparingly

            Yes, that's the problem.

            1 vote
    2. Greg

      I think those design decisions make sense given node's origins as a web server - a task that's generally far more I/O bound than CPU bound when it comes to scheduling, and one that primarily involves handling stateless HTTP requests that largely shouldn't need interprocess communication anyway.

      It's fair to say that decision is probably showing some cracks now that node's become the de facto choice for a lot of local applications via wrappers like electron, though.

      1 vote
  5. winther

    I don't write Node myself but do work heavily with async Python, which works in a similar way. As another commenter has already said, being event-driven or async is not the same as spawning threads. As with every other language, its benefits depend on the application's use case.

    I find that the async event-driven model we use in Python, and I assume the same would apply to Node here, is great for writing scalable APIs. We can process many concurrent requests without the pains of multi-threading, but still process all the requests "in parallel" because most of the time is spent waiting for the network. Running in containers in Kubernetes makes it a non-issue that we only utilize one core at a time; we just start more pods if we are close to hitting the CPU limits. I am sure there are other cases where that model won't work as well.

    4 votes
  6. [2]
    0xSim

    What Node is selling as "single threaded" applies to application or business logic we are writing, node itself can't be single threaded.

    NodeJS itself is single-threaded, but some operations (e.g. disk i/o) are delegated to the system. So even though there are effectively multiple threads of operations, there is always one single thread assigned to Node. The OS is doing its own thing and will just ping Node once it's done.

    Async/await and callbacks are not a magic bullet to make your code fast. If you need to do some heavy computations in JavaScript, writing an async function won't save you: the heavy computations will be done on the single thread, and they will block it. You can mitigate that by, e.g., "breaking" huge loops every x operations to let the single thread breathe a bit and catch up on other operations it has to do, but that's it.

    3 votes
    1. SleepyGary

      You can also use workers, if what you're doing is cpu intensive, which are on a separate thread from the main process.
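
0xSim's suggestion of "breaking" a huge loop every x operations can be sketched as follows, again in Python's asyncio rather than Node. The function names, the loop size and the chunk size are all assumptions; `chunk=0` means the loop never yields.

```python
import asyncio


async def crunch(total: int, chunk: int, log: list) -> int:
    # CPU-bound loop that optionally yields to the event loop every `chunk` steps.
    acc = 0
    for i in range(total):
        acc += i * i
        if chunk and (i + 1) % chunk == 0:
            await asyncio.sleep(0)  # hand control back to the loop
    log.append("crunch done")
    return acc


async def ticker(log: list) -> None:
    # Stand-in for the other work the loop is supposed to keep servicing.
    for n in range(3):
        log.append(f"tick {n}")
        await asyncio.sleep(0)


async def run(chunk: int) -> list:
    log: list = []
    await asyncio.gather(crunch(30_000, chunk, log), ticker(log))
    return log


print("no yielding:  ", asyncio.run(run(0)))
print("with yielding:", asyncio.run(run(10_000)))
```

Without yielding, the ticker cannot start until the computation has finished; with yielding, the ticks interleave with the chunks. The total amount of computation is unchanged either way, which matches the "but that's it" caveat, and is why SleepyGary points to workers for genuinely CPU-heavy tasks.
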

  7. vczf

    Async programming feels like it's multi-threaded, but it's not. Code in callbacks isn't executed until the main synchronous flow is done ("returns"). Then the scheduled callbacks are run in turn.

    I'm not a fan of JavaScript. Dart does async programming much better, one reason being that a sound type system helps with understanding async return values. If you can wrap your head around how Future and Stream work, now you know async.

    Besides what other commenters have mentioned, the reason I like async programming is because it's highly efficient and reduces latency and jank in user interfaces. I often see crappy UIs that lag out when you try to interact with them. That is easily avoidable with an async programming model.

    2 votes
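
vczf's description of callback ordering can be checked with a short sketch. It uses Python's asyncio as a stand-in for the JavaScript event loop (an assumption of this example), and the labels appended to the list are purely illustrative.

```python
import asyncio


async def main() -> list:
    order: list = []
    loop = asyncio.get_running_loop()

    # Schedule two callbacks; they are queued, not run immediately.
    loop.call_soon(order.append, "callback A")
    loop.call_soon(order.append, "callback B")

    order.append("sync code, part 1")
    order.append("sync code, part 2")  # the callbacks still have not run

    await asyncio.sleep(0)  # the synchronous flow gives up control here
    order.append("after yield")
    return order


for step in asyncio.run(main()):
    print(step)
```

Both synchronous entries are recorded before either callback, even though the callbacks were scheduled first. The callbacks then run in the order they were queued, and only afterwards does the coroutine resume.
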