IMO the RISC vs. CISC debate is often just people looking at correlation. I do think that ARM CPUs, particularly Apple's, show noticeably better performance per watt than x86 CPUs from Intel and AMD, but that's because Apple, and ARM designs in general, grew out of the demands of mobile devices, whereas Intel and AMD have their roots in desktops and servers, where power is plentiful.
That's not coincidence, per se - the correlation is more that x86 is an enforced duopoly, so if you wanted to make a CPU, you pretty much had to go with ARM.
In that respect, x86 may have to die, not because it's CISC, but because its licensing ensures that only two companies will ever make x86 processors.
Or in analogous terms, it would be like attributing the number of top universities in English speaking countries to the properties of the English language.
Realistically, x86 could have been around forever and become the de facto instruction set for basically any computing implementation. It isn't going to be, because Intel has been ridiculously litigious with its license and has kept patenting new features and building them into its chips, effectively resetting the patent clock over and over. The 8086 was released in 1978. Patent protection on x86 should have run out decades ago, but in practice it never has.
Processors and computers in general are so complicated that switching to a different instruction set is such a gargantuan effort that they're effectively natural monopolies, and probably should have been regulated that way. Failing that, patent law should have been applied fairly to Intel, and competitors should have been free to make x86 chips since the late 90s. Because neither of those things happened, x86 never really existed in the mobile market, where so much development money has been invested. This is going to hurt Intel in the long run if the continuing trend towards more license-friendly instruction sets holds, no matter what architecture they're built on.
Presumably you can make an 8086 CPU today legally, correct?
I don't see why anybody couldn't make a brand-new 8086 today; the relevant patents have all expired. The problem is that there isn't much use in the modern world for any 20+ year old x86 chip design.
They could, but the heaps of extensions to the instruction set that modern applications all expect wouldn't be implemented.
Intel routinely comes out with those extensions and tacks them onto the base set, then operating systems and applications use them and come to expect them, and Intel and AMD get to keep their duopoly.
The only reason AMD is even in the game at all is that they violated Intel's patents, and after Intel took them to court, the two settled with an agreement allowing AMD to continue making x86 chips. AMD then cemented its position by developing x86-64 and letting Intel make chips with that architecture.
Any lesser company would have just been sued into oblivion, and I doubt that nowadays any company would be able to get away with the stunt AMD pulled.
That means that, effectively, x86 can only be produced by two companies, which kills any real motivation Intel has to compete, because the cost of changing to an entirely new instruction set is so monumental.
Exactly. I am excited by RISC-V because I hope it will encourage more competition, meaning a more diverse chip landscape and cheaper mainstream chips. I thought that it was faster and more energy efficient because I hear that a lot, but in the end I really don't care. If the open standard were CISC and the closed one RISC, I would have been excited for CISC.
I think RISC-V has the problem of insufficient investment to produce parts advanced enough to take the place of a desktop computer or a phone. This StarFive board
https://www.starfivetech.com/en/site/boards
is about as expensive as a Raspberry Pi, and you could do Linux development on it, but you can't quite watch YouTube on it. Somebody has to get behind it in a big way; I can picture China liking the idea of an instruction set which is not encumbered by Western IP. Maybe next year we get a board like that but better.
I can't see it making a practical difference for most of us. There are already a staggering variety of ARM chips for sale. Many of them are very cheap. Someone who is an electronics geek and designs their own circuit boards can buy decent ARM-based microcontrollers for 70 cents each. You can get microcontroller boards with WiFi for a few bucks on Amazon.
Any licensing cost is entirely hidden from us. Only compiler writers and chip designers will have any reason to care what the ISA is; for the rest of us it's a compiler configuration setting or maybe a different binary to download.
(And even that might be hidden away if WebAssembly gets more adoption.)
The reason I am hoping for RISC-V is a bit simpler: I am hoping for an open system. The reason so many phones are so poorly supported is that performant ARM SoCs are built with tons of black boxes in them. It takes a huge amount of effort to reverse engineer them to get them working properly, and even then some peripherals don’t work right. Apple’s M-series chips have a lot of people working on supporting them - probably more than any other ARM SoC - and the M1 has been out for a while, but there are still errata to be ironed out. And predictably we still don’t know much about the GPU.
Ultimately those are mostly separate things, though. It's not the ISA that people have trouble reverse engineering - RISC-V designs can be just as black-boxy as ARM ones.
That is true. But I think having an open core means that more open designs will show up in the market as well.
AFAIK black boxes are still entirely possible with RISC-V due to its modular design, which allows a manufacturer to implement their own custom extensions for specific functionality.
there is no reason that a riscv core would be more open than any other. and the reverse engineering on apple computers mostly has nothing to do with the cpu itself (dougall johnson did do some work on the cpu, but that was just for fun).
there is a difference between the inner workings of the components being open, and the interface to them being open. the reason x86 pcs are 'well supported' is that the interfaces are, all things considered, incredibly well standardised, and you can in many cases just grab a spec document and read it. (sometimes you're supposed to pay for them. i'm not sure where i got my copy of the pci specs, tbh, uh...)
so if all you care about is the interfaces, then the x86 pc is about as open as it gets
if you are looking for an open (internals) cpu project, see libre-soc (which is based on power, not riscv). if you are looking for an open computer project then, um, keep looking
I am well aware of all of this. I am hoping for change, and I am hoping that RISC V is a sign that a completely open computer system is on the horizon.
People who build these chips have known for a while that the distinction is somewhat useless. x86 today is just as divorced from the chips that gave it its name as ARM or MIPS or AVR are, and that transition came about around the time of the Pentium Pro. One company in the early 2000s had a chip designed with a programmable decode stage so the same chip could stand in for multiple architectures.
If you want to know why x86 hasn’t done well in the embedded space or why ARM hasn’t done well in the consumer or server space, it’s rather simple; the companies involved simply haven’t done the same amount of work in the areas important to those niches. ARM prioritized efficiency while x86 prioritized interoperability and performance. Today x86 has made great strides in efficiency and ARM vendors have made great strides in performance.
ARM has an instruction dedicated to Javascript float/int conversions. Not exactly a reduced instruction set at that point.
that's not javascript float/int conversions; it's x86 float/int conversions, which were adopted by js. and arm's squeamish about naming x86 for obvious reasons, so they called it 'js' instead. aarch64 also e.g. has 32 registers partly to make it easier to emulate x86, and they recently standardised apple's instructions for emulating x86 flag handling
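(For the curious: the instruction in question is AArch64's FJCVTZS, and the semantics it bakes into one instruction are essentially ECMAScript's ToInt32 - truncate toward zero, wrap modulo 2^32, map NaN and infinities to 0. A rough C sketch of those semantics, with an illustrative function name:)

    #include <math.h>
    #include <stdint.h>

    /* Rough sketch of ECMAScript ToInt32, i.e. the double -> int32 conversion
       that FJCVTZS performs in a single instruction. NaN and infinities map
       to 0; everything else is truncated toward zero and wrapped modulo 2^32. */
    static int32_t js_to_int32(double d) {
        if (isnan(d) || isinf(d))
            return 0;
        double t = trunc(d);               /* round toward zero */
        double m = fmod(t, 4294967296.0);  /* wrap modulo 2^32 ... */
        if (m < 0)
            m += 4294967296.0;             /* ... into [0, 2^32) */
        uint32_t u = (uint32_t)m;
        /* reinterpret the unsigned result as a signed 32-bit value */
        return u >= 0x80000000u ? (int32_t)(u - 0x80000000u) + INT32_MIN
                                : (int32_t)u;
    }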
that's not what risc means
(ed: it is true that aarch64 is not meaningfully different from x86 as far as instruction count or complexity goes)
My understanding is that Intel has two problems:
(1) An x86 instruction can be up to 15 bytes long; a modern CPU will decode several instructions per clock cycle, so the instruction decoder is a complex machine with wide data paths and lots of parts, sitting at a rate-limiting step before the work is dispatched to multiple long pipelines. Intel has to work hard on that part. They must make it up partially by having long but powerful instructions.
(2) But then Intel does not do a great job of getting the newest features into a wide enough range of chips that developers can regularly exploit them. For instance the SIMD approach manifest in
https://en.wikipedia.org/wiki/AVX-512
requires you to recode or at least recompile your applications - quite likely the former, because SIMD instructions reward assembly-language optimization of the inner loops. If you buy your own servers you can target a particular version, but if you are publishing software to a wide audience you are going to pick the least common denominator. It doesn't help that Intel drags its feet about getting the features out there: maybe they made you a core that has advanced SIMD but fused it off because they'd like to charge more for access, or maybe it sits next to an "efficiency core" that doesn't support it, so it gets turned off in the "performance core", and so on.
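For what it's worth, GCC and Clang do offer a middle ground between "recompile per target" and "hand-maintain N code paths": function multiversioning via the target_clones attribute. A minimal sketch, with an illustrative function and an arbitrary choice of ISA levels:

    #include <stddef.h>

    /* The compiler emits one clone of this function per listed target plus a
       resolver that picks the best supported clone at program load (via CPUID),
       so one binary can still use AVX2/AVX-512 where they exist. */
    __attribute__((target_clones("default", "avx2", "avx512f")))
    double dot(const double *a, const double *b, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }

It doesn't remove the testing burden, but it does take "ship N separate binaries" off the table.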
People don't like it that Windows 11 wants a newer machine, but the Windows culture of backwards compatibility is part of Intel's problem. Microsoft just recently announced that Windows 11 will require the 14-year-old POPCNT instruction. It's a scandal that our computers have been living below their potential all this time.
Meh ... not really.
Decode is a thing, sure, and I don't envy the engineers that have to build fast hardware decoders, but it's worth looking at actual numbers. Typical execution averages about 2 uops/cycle on a good day; intel has been at 4 decode/rename per cycle for a while now (we want the latter number to be larger than the former because it's bursty and there's a big queue in between, but how much larger do we need?); apple decode is 8/cycle. Recently, intel found a new strategy for accelerating parallel decode (don't remember the details, but I think it involves breadcrumbs marking the starts of instructions in the instruction cache?), employed in its little cpus and soon to be incorporated into the big ones, which gets it up to 6 instructions/cycle (and note that, due to memory operands, an x86 instruction not infrequently decodes to two uops). So this is not the biggest deal. This doesn't really affect 'pipeline length' insofar as it matters for branch mispredictions—the penalty for those is similar on intel and apple—because of the uop cache. My biggest complaint about the amd64 encoding is that it's not as dense as it could be, due to shortsighted (though likely justifiable) choices made by amd back in the day.
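(To make the memory-operand aside concrete, here's a trivial accumulate with roughly the code a compiler emits for each ISA sketched in the comment - the assembly is approximate:)

    /* Roughly what -O2 produces for each ISA (approximate):
     *   x86-64:   mov  eax, edi
     *             add  eax, DWORD PTR [rsi]   ; one instruction, but the core
     *                                         ; tracks a load uop + an add uop
     *   AArch64:  ldr  w1, [x1]
     *             add  w0, w1, w0             ; two instructions, two uops
     */
    int add_from_memory(int acc, const int *p) {
        return acc + *p;
    }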
the SIMD approach manifest in AVX-512 requires you to recode or at least recompile your applications
If you're talking about variable-length vectors then 1, those are not a good idea (some exposition here) and 2, the interesting parts of avx512 have nothing to do with the size extension.
Avx512 wasn't fused off of adl big cores because of market segmentation; it was fused off because 1, it wasn't validated and 2, it wasn't supported on the little cores and software couldn't deal with that. (Imo software should be able to deal with it, but the fact is that it couldn't. I also think it should have been a toggleable default-off option.) Upcoming little cores can deal with avx512, now called avx10 and including a mode in which only 256-bit ops are supported.
If the point is that shipping is hard then, well, yes. But that goes for everybody; sve is barely shipping now.
If the point is that isas are bad then, well, yes. But, again, that goes for everybody. Ditto popcnt. It's not like aarch64 doesn't have the exact same problems; they come with the territory.
Requiring popcnt is not the same as detecting whether the hardware supports it and using it only if so (which is what good software does). Yes, that kinda sucks, but it's what you do when you have isas to deal with, and for workloads that really care about such features, that is already what they do; your computer has not been 'living below its potential'. (Edit: worth emphasising: workloads that care—most don't. Somebody attempted to recompile the entire userspace of some linux distro targeting x86-64-v3, and found no significant performance improvements. I'm sure the benchmarks were bad, as most are, and I don't remember the link, but the point stands: you are not going to see massive wins on stuff that doesn't really care, and stuff that really cares is already using the relevant functionality.)
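A minimal sketch of the detect-and-use-it-if-so pattern being described, using GCC/Clang builtins (the wrapper names are made up):

    #include <stdint.h>

    /* Fallback that works on any CPU. */
    static int popcount_portable(uint64_t x) {
        int n = 0;
        while (x) { x &= x - 1; n++; }
        return n;
    }

    /* Compiled as if -mpopcnt were in effect, so the builtin lowers to the
       POPCNT instruction; only reached after the runtime check below. */
    __attribute__((target("popcnt")))
    static int popcount_hw(uint64_t x) {
        return __builtin_popcountll(x);
    }

    int popcount(uint64_t x) {
        /* cached CPUID query at runtime */
        return __builtin_cpu_supports("popcnt") ? popcount_hw(x)
                                                : popcount_portable(x);
    }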
If I had to guess, I would say windows is upping its hardware requirements because it's incredibly slow. Why is it so slow? No fucking idea. My old laptop runs freebsd perfectly fine, and win8 is okay—win10 dragged, and I refuse to ever use a newer version of windows ever again. No, this is just microsoft being dumb and hostile.
Backwards compatibility is a great thing. I am not looking forward to the fragmented hellscape that is arm taking over. (Or, worse, riscv—at least aarch64 is an actually good isa, and at least the core isa is well standardised even if the platform isn't.) x86 (or, I should say, x86-pc) pulled off the bios->uefi transition beautifully, I have no idea how.
make it up partially by having long but powerful instructions
Other than memory operands, this is not meaningfully more true for x86 than any other isa.
There's a couple other features holding intel back. One is the coherent instruction cache; I don't know how it manages that, but it doesn't seem to mind particularly. The other is the stronger concurrency model. It costs apple cpus about 10% to use the x86 concurrency model, so that's an upper bound (and it likely costs intel appreciably less).
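One place the concurrency-model difference shows up from the software side is codegen for atomics; a small C11 sketch (function name illustrative):

    #include <stdatomic.h>

    /* On x86 (TSO) the release store compiles to a plain MOV - the hardware pays
       for ordering on every store. On AArch64 the compiler emits STR for relaxed
       stores and a separate STLR here, so ordering is only paid for where asked.
       The ~10% figure above is roughly the cost of making every store behave the
       x86 way. */
    void publish(int *data, atomic_int *ready, int value) {
        *data = value;
        atomic_store_explicit(ready, 1, memory_order_release);
    }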
People underestimate the generality of SIMD instructions and also the difficulty of actually using them.
Lately people have found ways to greatly speed up parsing of variable-length UTF-8 characters (a little bit like that instruction decoding) with recent SIMD instructions. This is critical in everything from data analysis tools to web browsers and has a definite effect on perceived speed and power consumption. My main beef about parsers today is that an emphasis on performance means parser generators are unergonomic, can’t do obvious things like “unparsing” an AST (steroids for metaprogramming), and limit the kinds of languages people make. Still, I can see why: parsers spend a lot of time grinding on text, and the same tricks used for UTF-8 parsing ought to work for lexical analyzers if somebody makes a lexer generator that compiles to SIMD.
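As a flavour of the trick - not a full UTF-8 validator, which is considerably more involved - the usual first step is just "find the first non-ASCII byte, 16 at a time." A rough SSE2 sketch with illustrative names:

    #include <emmintrin.h>   /* SSE2, baseline on x86-64 */
    #include <stddef.h>
    #include <stdint.h>

    /* Length of the leading all-ASCII run: scan 16 bytes at a time and use
       movemask to collect the high bit of every byte into a 16-bit mask. */
    size_t ascii_prefix_len(const uint8_t *buf, size_t len) {
        size_t i = 0;
        for (; i + 16 <= len; i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
            uint32_t mask = (uint32_t)_mm_movemask_epi8(chunk);
            if (mask)
                return i + (size_t)__builtin_ctz(mask);  /* first non-ASCII byte */
        }
        for (; i < len && !(buf[i] & 0x80); i++)
            ;
        return i;
    }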
As for “test to see if the instruction is there and use it if it is” that’s a major PITA. It’s why Intel’s MKL is bigger than the typical electron app. BLAS is not that complex, but MKL contains multiple implementations of every algorithm.
I worked on a search engine for patents that trained a neural network for similarity search just at the time GPU computing was about to take off. The project had been going in circles for two years before I whipped it into shape and got it in front of customers. We had built the training and inference code with hand-written SIMD instructions (what a hassle to get all the derivatives right!). When we bought the servers to run it on in production, they supported a new generation of SIMD instructions that might have accelerated the network by 70% or more, but there was no way we were going to rewrite that code. (If I had to do it on my own account I would have written a compiler simplified by rules like “all vector lengths are a multiple of 8”.) Other parts of the system were worse in terms of slowness, but we left SIMD performance on the table.
As far as ordinary, quotidian software that is “written once, run on all sorts of customers’ computers” goes, supporting multiple instruction sets is expensive to develop, expensive to test, and expensive to debug. So mostly people don’t do it and stick to the lowest common denominator. All of the delays, missteps (cough… 10 nm) and ill-thought-out products have set Intel’s roadmap back by years, when really Intel should have been doing a lot more to put SIMD power in developers’ hands outside the national labs, Facebook, etc.
Worth noting: that last paragraph doesn’t shrink the number of currently supported CPUs. It only affects people circumventing the hardware requirements.
Right, officially Win 11 requires a much more recent processor than that, but it wasn't really using extra instructions until now.
Contrast that with Apple's policy, which is that you only get major OS upgrades for your Mac for so long, so Apple can actually use newer instructions in its OS after some time passes.