So, they're selling this as an AI accelerator, with drop-in compatibility with existing boards, and no boost to RAM bandwidth.
As I understand things, it would be extremely unusual to ship a chip that was bound by floating point throughput, not uncached memory access, especially in the desktop/laptop space.
I haven't been following the Intel server space too carefully, so it's an honest question: Was the old thing compute and not bandwidth limited, or is this going to be running inference at the same throughput (though maybe with lower power consumption)?
No, they're not selling this as an "AI accelerator":
Here is the quote:
"The company says operators deploying 5G Advanced and future 6G networks increasingly rely on server CPUs for virtualized RAN and edge AI inference, as they do not want to re-architect their data centers in a bid to accommodate AI accelerators."
Edge AI usually means very small models that run fine on CPUs.
Perhaps instead of posting erroneous assertions to HN you could wander over to your LLM of choice and ask it something along the lines of: What are some examples of edge AI applications that achieve good performance on a CPU where memory bandwidth is severely limited compared to a GPU? Please link to publicly available models where possible.
I run AI applications all the time in exactly those situations. The models range from 2 GB (vector models) to 30 GB (small LLMs) to 100 GB (medium LLMs).
None of those fits in 4 MB of cache (the per-core figure on this part), or even 1 GB (the aggregate cache).
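The bandwidth objection above can be made concrete with back-of-envelope arithmetic: when a decoder-only LLM doesn't fit in cache, every generated token has to stream the full weight set from DRAM, so throughput is roughly bandwidth divided by model size. A minimal sketch, using an illustrative bandwidth figure (not a measured spec of this part):

```python
# Back-of-envelope: for a memory-bandwidth-bound decoder, each generated
# token streams every weight from DRAM once, so tokens/s ≈ bandwidth / bytes.
# BANDWIDTH_GBS is an assumed illustrative number, not this chip's spec.
BANDWIDTH_GBS = 300  # hypothetical aggregate DRAM bandwidth, GB/s

for model_gb in (2, 30, 100):  # the model sizes mentioned above
    tokens_per_s = BANDWIDTH_GBS / model_gb
    print(f"{model_gb:>3} GB model: ~{tokens_per_s:.0f} tokens/s upper bound")
```

Even with a generous bandwidth assumption, the 100 GB model tops out at a few tokens per second, which is why cache size barely matters for models of this class.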
What AI models are you actually talking about? Do you mean old-school ML stuff, like decision trees or high dimensional indexes? No one I know calls those "AI", which is generally reserved for big-ish neural networks.
"Exactly those situations" you say while describing an entirely different sort of model. Your first clue that you're missing knowledge should have been the part where the thing that the well financed experts were doing didn't make sense to you. Your second clue should have been the part where what I was saying didn't seem to match up with your experience.
I let you know that you were uninformed and even suggested a very low-effort way you might look into the matter. So why didn't you do that?
A couple of fairly arbitrary examples: a high-performance zero-shot TTS model can weigh in at well under 150 MiB, and you can solve MNIST (i.e. recognize handwritten digits) to better than 99% accuracy with a sub-100 KiB model. Your LLM of choice will be able to provide you with plenty of others.
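The sub-100 KiB MNIST claim is easy to sanity-check by counting parameters. A minimal sketch with an assumed (hypothetical) small-CNN architecture, two conv layers plus a classifier head, just tallying weights rather than training anything:

```python
# Parameter count for a hypothetical small MNIST CNN; the layer shapes
# below are illustrative assumptions, not a specific published model.
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # weights + biases

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

params = (
    conv_params(1, 8, 3)       # 28x28x1 -> 28x28x8, then 2x2 pool -> 14x14x8
    + conv_params(8, 16, 3)    # -> 14x14x16, then 2x2 pool -> 7x7x16
    + fc_params(7 * 7 * 16, 10)  # flatten -> 10 class logits
)
size_fp32_kib = params * 4 / 1024
print(params, f"params, ~{size_fp32_kib:.1f} KiB as float32")
# → 9098 params, ~35.5 KiB as float32
```

Networks of this shape routinely reach 99%+ on MNIST, and at float32 the weights fit in well under 100 KiB; int8 quantization shrinks that by another 4x.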