The Python thing you should be comparing to is asyncio, not threads.
I'm not sure you quite understand what Node is doing. Callbacks aren't just spawning a thread and doing a blocking operation on it. Node runs an event loop that, when you make an async call, goes off and finds some other pending work to do on the same thread.
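A minimal sketch of what that looks like from the calling side (reading the script's own file twice, just as a stand-in for any async IO):

    // single-threaded event loop: both reads are in flight at the same time,
    // but every callback runs on the same thread, one after another
    const fs = require('fs');

    console.log('start');

    fs.readFile(__filename, 'utf8', (err, data) => {
      if (err) throw err;
      console.log('first read finished:', data.length, 'chars');
    });

    fs.readFile(__filename, 'utf8', (err, data) => {
      if (err) throw err;
      console.log('second read finished:', data.length, 'chars');
    });

    // this prints before either callback: the synchronous flow runs to
    // completion, then the event loop picks up whatever IO has finished
    console.log('end of synchronous code');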
Node would have been the numero uno and only language being used today?
It, uh, is pretty popular. No language will ever get 100% market share, but it caught on way more than I expected "what if javascript but outside of the browser" to.
Callbacks aren't just spawning a thread and doing a blocking operation on it
They're actually partially right: node does have sneaky hidden threads running in the background that it pretends not to have in certain cases, including file system (fs), dns, crypto, and zlib, and maybe others by now. And of course there are the kernel threads that are busy handling network IO and other things.
You’re right though, they do have some fundamental misunderstanding of how node operates.
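You can actually watch that hidden thread pool at work. A rough sketch (iteration count is arbitrary, and exact timings will vary): crypto.pbkdf2 is offloaded to libuv's pool, which defaults to 4 threads, so four calls started back to back finish in roughly the time of one.

    // four CPU-heavy hashes kicked off together; they run on libuv's
    // background threads, not on the JS thread
    const crypto = require('crypto');

    const start = Date.now();
    for (let i = 1; i <= 4; i++) {
      crypto.pbkdf2('secret', 'salt', 300000, 64, 'sha512', (err) => {
        if (err) throw err;
        console.log(`hash ${i} done after ${Date.now() - start} ms`);
      });
    }
    // all four typically report similar times; a fifth would queue until a
    // pool thread frees up (the pool size is tunable via UV_THREADPOOL_SIZE)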
For small numbers of concurrent IO operations it's true that there's not much performance benefit to async IO over threads; the difference shows up when you have a lot. Async IO doesn't use threads in the background, it uses more efficient OS APIs that let many IO operations be in flight without a dedicated thread for each one. This matters because each thread carries its own performance and memory cost.
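A tiny illustration of the "no extra threads" point, assuming a free local port: one Node process can hold many slow connections open without allocating a stack per connection.

    // every connection is just an entry the OS watches for readiness;
    // the JS thread only wakes up when there is actually something to do
    const http = require('http');

    const server = http.createServer((req, res) => {
      // simulate a slow downstream call without blocking anyone else
      setTimeout(() => res.end('hello\n'), 100);
    });

    server.listen(3000, () => {
      console.log('listening on :3000, try hitting it with many concurrent requests');
    });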
Async IO became popular in the context of the C10k problem, when it became obvious that having a thread per IO operation performed terribly at handling thousands of concurrent connections. Apache, the most popular webserver, was built on the threaded IO model, and it lost its top position to Nginx, which was built on the async IO model and performed much better at handling many concurrent connections.
As async IO became popular, developers found that it was hard to program for in existing programming environments (like C, Python, and Java), because most standard library functions and third-party libraries used blocking IO, and calling any blocking IO functions within async code would often silently negate the performance benefit or completely destroy the performance of async code. Developers had to carefully avoid using pre-existing code that used blocking IO.
The creator of Node, Ryan Dahl, decided that one way to solve this problem would be to create a brand new server programming ecosystem that (mostly) had no blocking IO functions and was built from the ground up for async IO. He realized that this would be hard to do within most existing language ecosystems, and that it would be easiest with a language not already commonly used on servers. JavaScript was a convenient choice: it fit that description, had no blocking IO in its standard library, and already had well-optimized implementations from its use in browsers. Contrary to the common misconception, Node did not primarily exist to let frontend JS developers program on servers; it existed to be a programming environment built on async IO.
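To make the blocking-IO pitfall above concrete, a hedged sketch (the port is arbitrary, and reading the script's own small file won't visibly hurt, but substitute any slow synchronous call and the whole process stalls): one accidental *Sync call in a handler blocks every other in-flight request, while the callback version lets the rest proceed.

    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      if (req.url === '/blocking') {
        // blocking: the whole event loop stops here, so every other
        // request waits until this read completes
        res.end(fs.readFileSync(__filename, 'utf8'));
      } else {
        // non-blocking: other requests keep being served while we wait
        fs.readFile(__filename, 'utf8', (err, data) => {
          if (err) { res.statusCode = 500; return res.end('error'); }
          res.end(data);
        });
      }
    }).listen(8080);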
I find it particularly strange that someone refers to "applying multi-threading" in Python when the GIL is such a well-known pain point for Python. Though tbf I think using the threading library...
I find it particularly strange that someone refers to "applying multi-threading" in Python when the GIL is such a well-known pain point for Python. Though tbf I think using the threading library with the GIL in place makes Python behave essentially the way OP thinks node.js works (in contrast to asyncio, which afaik works roughly the way node.js actually does).
I feel it's better to just implement multi-threading in your own code (as needed) instead of using something convoluted and confusing like Node.js. What say you?
I agree, but not for the reasons you said.
Node's model is a cooperative scheduling model, which inhibits modularity because one concurrency domain can prevent another from making progress for arbitrarily long, which can harm latency.
It also inhibits scalability, since it can only use one core at a time; you can ameliorate this by using many processes, but then you lose the ability to share data and state between them.
(I have deliberately avoided using the word 'thread', because it's poorly defined in context and I don't think it matters too much in this case what it means.)
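A small demonstration of that latency point (the durations are arbitrary): one badly-behaved callback doing heavy synchronous work freezes every other timer and IO callback in the process.

    // the 100 ms ticks stop for ~2 seconds while the busy loop runs:
    // one concurrency domain prevents the others from making progress
    const timer = setInterval(() => console.log('tick', new Date().toISOString()), 100);

    setTimeout(() => {
      const until = Date.now() + 2000;
      while (Date.now() < until) { /* pretend this is real work */ }
      console.log('heavy work done');
    }, 500);

    setTimeout(() => clearInterval(timer), 4000);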
This is why Redis and Postgres exist.
Sort of—see object-relational mismatch—for this sort of application you're almost certainly using postgres for persistence anyway; but in general, insofar as you have the opportunity to decouple your data model from it, you should take it. (Also, a little bird tells me redis doesn't scale, but that's neither here nor there.) And the cost of interprocess communication is very great, whereas I can look up a key in a shared hash table in nanoseconds.
The cost is latency, which doesn't matter if you use it sparingly. The total request latency is measured in milliseconds anyway; it could be tens or hundreds of milliseconds. What does a 100 microsecond lookup matter?
Yes, that's the problem.
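For a rough sense of the in-process side of that trade-off, a quick timing sketch (sizes and counts are arbitrary and results are machine-dependent; a trip to redis or postgres adds at least a network round trip on top of whatever the server does):

    // average cost of a lookup in an in-process hash map
    const cache = new Map();
    for (let i = 0; i < 1000000; i++) cache.set('key' + i, i);

    const t0 = process.hrtime.bigint();
    let sum = 0;
    for (let i = 0; i < 1000000; i++) sum += cache.get('key' + (i % 1000));
    const t1 = process.hrtime.bigint();

    console.log('avg lookup:', Number(t1 - t0) / 1000000, 'ns (checksum', sum, ')');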
I think those design decisions make sense given node's origins as a web server - a task that's generally far more I/O bound than CPU bound when it comes to scheduling, and one that primarily involves handling stateless HTTP requests that largely shouldn't need interprocess communication anyway.
It's fair to say that decision is probably showing some cracks now that node's become the de facto choice for a lot of local applications via wrappers like electron, though.
I don't write Node myself but I do work heavily with async Python, which works in a broadly similar way. As another commenter has already said, being event-driven or async is not the same as spawning threads. As with every other language, its benefits depend on the application's use case.
I find that the async event-driven model we use in Python, and I assume the same would apply to Node here, is great for writing scalable APIs. We can process many concurrent requests without the pains of multi-threading, and still handle all the requests "in parallel" because most of the time is spent waiting on the network. Running in containers on Kubernetes makes it a non-issue that we only utilize one core at a time: we just start more pods if we get close to the CPU limits. I'm sure there are other cases where that model won't work as well.
What Node is selling as "single threaded" applies to the application or business logic we are writing; node itself can't be single threaded.
NodeJS itself is single-threaded, but some operations (e.g. disk i/o) are delegated to the system. So even though there are effectively multiple threads of operations, there is always one single thread assigned to Node. The OS is doing its own thing and will just ping Node once it's done.
Async/await and callbacks are not a magic bullet to make your code fast. If you need to do some heavy computations in JavaScript, writing an async function won't save you: the heavy computations will be done on the single thread, and they will block it. You can mitigate that by, e.g., "breaking" huge loops every x operations to let the single thread breathe a bit and catch up on other operations it has to do, but that's about it.
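A sketch of that loop-breaking trick (the chunk size and the heavyStep function are made up for illustration, and it needs a Node version with timers/promises): yield back to the event loop every so often so timers and IO callbacks get a turn.

    // await a promisified setImmediate every N iterations so the single
    // thread can catch up on whatever else is queued
    const { setImmediate: yieldToLoop } = require('timers/promises');

    function heavyStep(x) { return Math.sqrt(x) * Math.sin(x); }  // stand-in work

    async function crunch(n) {
      let total = 0;
      for (let i = 0; i < n; i++) {
        total += heavyStep(i);
        if (i % 10000 === 0) await yieldToLoop();  // let other callbacks run
      }
      return total;
    }

    crunch(5000000).then((t) => console.log('done', t));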
You can also use workers, if what you're doing is CPU intensive, which run on a separate thread from the main process.
Async programming feels like it's multi-threaded, but it's not. Code in callbacks isn't executed until the main synchronous flow is done ("returns"). Then the scheduled callbacks are run in turn.
I'm not a fan of JavaScript. Dart does async programming much better, one reason being that a sound type system helps with understanding async return values. If you can wrap your head around how Future and Stream work, you know async.
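Riffing on the workers suggestion above, a rough worker_threads sketch (the fib function is just a stand-in for real CPU-heavy work): the main event loop stays free while the worker grinds.

    const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

    if (isMainThread) {
      // spawn this same file as a worker and hand it the input
      const worker = new Worker(__filename, { workerData: 40 });
      worker.on('message', (n) => console.log('fib(40) =', n));
      worker.on('error', (err) => console.error(err));
      console.log('main thread is still free to handle other events');
    } else {
      // runs on a separate thread; blocking here does not block the main loop
      const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
      parentPort.postMessage(fib(workerData));
    }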
Besides what other commenters have mentioned, the reason I like async programming is because it's highly efficient and reduces latency and jank in user interfaces. I often see crappy UIs that lag out when you try to interact with them. That is easily avoidable with an async programming model.