Hacker News

CERN uses ultra-compact AI models on FPGAs for real-time LHC data filtering

One of the authors (of one of the two models, not this particular paper) here. Just a clarification: these models are *not* burned into silicon. They are trained with brutal QAT but deployed on FPGAs. For AXOL1TL, the weights are "burned" in the sense that they are hard-wired in the fabric (i.e., shift-add instead of a conventional read-mul-add cycle), but not into the raw silicon, so the chip can be reprogrammed. That said, for projects like smartpixel or the HGCAL readout, there are similar models targeting silicon (google something like "smartpixel cern" or "HGCAL autoencoder" and you will find them), and I thought this was one of them when I saw the title.
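To illustrate what "shift-add instead of read-mul-add" means in practice (a toy sketch in plain Python, not the actual firmware): with the weights frozen at synthesis time, each constant multiply decomposes into shifts and adds, so no weight memory read or DSP multiplier is needed.

```python
# Toy illustration of hard-wired constant multiplication, as used when
# weights are fixed in FPGA fabric at synthesis time (not real firmware).

def mul_by_5(x: int) -> int:
    # 5 = 0b101, so 5*x = (x << 2) + x: one shift plus one add in fabric.
    return (x << 2) + x

def mul_by_const(x: int, w: int) -> int:
    # Generic shift-add decomposition: one add per set bit of the weight.
    acc = 0
    for bit in range(w.bit_length()):
        if (w >> bit) & 1:
            acc += x << bit
    return acc
```

Tools like the distributed-arithmetic flow mentioned below go further and share common subexpressions across all the multiplies in a layer, but the basic idea is the same.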

Some slides with more info: https://indico.cern.ch/event/1496673/contributions/6637931/a... The approval process for a full paper is quite lengthy in the collaboration, but a more comprehensive one is coming in the following months, if everything goes smoothly.

Regarding the exact algorithm: a few versions of the model have been deployed. Up to v4 (the one deployed when this article was written), see slides 9-10. The model was trained as a plain VAE that is essentially a small MLP. At inference time, the decoder was stripped and the mu^2 term of the KL divergence was used as the anomaly score (contributions from the terms containing sigma were found to have negligible impact on signal efficiency). In v5 we added a VICReg block before that and used the reconstruction loss instead. Everything runs in ≤2 clock cycles at the 40 MHz clock. Since v5, the hls4ml-da4ml flow (https://arxiv.org/abs/2512.01463, https://arxiv.org/abs/2507.04535) has been used to put the model on FPGAs.
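A minimal numpy sketch of that inference-time trick (all names, shapes, and weights here are invented for illustration; this is not the deployed AXOL1TL network): keep only the encoder's mean head from the trained VAE and use ||mu||^2 as the anomaly score, dropping the decoder and the sigma head entirely.

```python
import numpy as np

# Toy "encoder" with random weights standing in for a trained VAE encoder.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(57, 16)), np.zeros(16)    # hidden layer
W_mu, b_mu = rng.normal(size=(16, 8)), np.zeros(8)  # mean head only

def anomaly_score(x):
    h = np.maximum(W1.T @ x + b1, 0.0)  # ReLU hidden layer
    mu = W_mu.T @ h + b_mu              # latent mean
    return float(mu @ mu)               # the mu^2 term of the KL divergence

score = anomaly_score(rng.normal(size=57))
```

The appeal for a trigger is that the score is a single cheap dot product away from the latent mean, with no decoder pass and no sigma arithmetic in the critical path.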

For CICADA, the model was trained as a VAE again, but this time distilled with a supervised loss on the anomaly score over a calibration dataset. Some slides: https://indico.global/event/8004/contributions/72149/attachm... (not up to date, but I don't know if there are newer public ones). Both student and teacher were conventional conv-dense models; see slides 14-15.
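A rough sketch of that distillation idea, with a trivial linear-in-squared-features student fit by least squares standing in for CICADA's conv-dense student trained by SGD (everything here is invented for illustration): the student regresses the teacher's anomaly score directly, rather than learning to reconstruct inputs itself.

```python
import numpy as np

rng = np.random.default_rng(1)
calib = rng.normal(size=(256, 18))      # toy "calibration dataset"

def teacher_score(x):
    # Stand-in for the teacher VAE's anomaly score.
    return float(np.sum(x ** 2))

# Distillation as plain regression: fit the student so its output matches
# the teacher's score on the calibration set.
feats = calib ** 2                       # student's (hand-picked) features
targets = np.array([teacher_score(x) for x in calib])
w, *_ = np.linalg.lstsq(feats, targets, rcond=None)
student_scores = feats @ w
```

The benefit in a trigger context is that the student can be made much smaller and more quantization-friendly than the teacher while still reproducing the score it was distilled from.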

Shameless plug for some of my work on running QAT (high-granularity quantization) and doing deployment (distributed arithmetic) of NNs in the context of such applications (i.e., FPGA deployment at <1 us latency), if you are interested: https://arxiv.org/abs/2405.00645 https://arxiv.org/abs/2507.04535

Happy to take any questions.

by chsun1774713880
They used a custom neural net with autoencoders, which contain convolutional layers. They trained it on previous experiment data.

https://arxiv.org/html/2411.19506v1

Why is it so hard to elaborate on which AI algorithm/technique they integrate? It would have made this article much better.

by intoXbox1774689089
I've got news for you: everybody with a modern CPU already uses this; many of them use a perceptron for branch prediction.
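For the curious, a toy single-branch version of the perceptron branch predictor (Jimenez and Lin); real CPUs keep a table of weight vectors indexed by a hash of the branch address, which this sketch omits.

```python
# Toy perceptron branch predictor for a single branch.
HIST = 8      # global history length
THETA = 16    # training threshold

weights = [0] * (HIST + 1)   # weights[0] is the bias
history = [1] * HIST         # +1 = taken, -1 = not taken

def predict():
    y = weights[0] + sum(w * h for w, h in zip(weights[1:], history))
    return y, y >= 0         # predict "taken" if the dot product is >= 0

def update(taken):
    y, pred = predict()
    t = 1 if taken else -1
    if pred != taken or abs(y) <= THETA:   # train on mispredict or low margin
        weights[0] += t
        for i in range(HIST):
            weights[i + 1] += t * history[i]
    history.pop(0)
    history.append(t)

# An alternating taken/not-taken pattern is learned after a short warm-up.
correct = 0
for i in range(200):
    _, pred = predict()
    actual = i % 2 == 0
    if i >= 150 and pred == actual:
        correct += 1
    update(actual)
```

The hardware version is attractive for the same reason as the CERN models: the dot product is just a handful of small integer adds, so it fits in a couple of cycles.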
by jurschreuder1774701968
The library they used (or used to use) is `hls4ml`. https://github.com/fastmachinelearning/hls4ml

I hacked on it a while back, added Conv2dTranspose support to it.

by porridgeraisin1774727405
Might be related: https://www.youtube.com/watch?v=T8HT_XBGQUI (Big Data and AI at the CERN LHC by Dr. Thea Klaeboe Aarrestad)

https://www.youtube.com/watch?v=8IZwhbsjhvE (From Zettabytes to a Few Precious Events: Nanosecond AI at the Large Hadron Collider by Thea Aarrestad)

Page: https://www.scylladb.com/tech-talk/from-zettabytes-to-a-few-...

by serendipty011774687206
How are FPGAs "burned into silicon"? It would be news to me that there are ASICs being taped out at CERN.
by konradha1774697524
A bit of hype in the AI wording here. This could be called a chip with hardcoded logic obtained with machine learning
by quijoteuniv1774687796
Very important! This is not an LLM like the ones so often called AI these days. It's a neural network on an FPGA.
by Surac1774693468
Not on the same extreme level, but I know that some coffee machines use a tiny CNN-based model locally/embedded. There is a small, super-cheap camera integrated in the coffee machine, and the model does three things: (1) classification, to select the type of coffee from the container type; (2) image segmentation, to determine where the cup/hole is placed; (3) regression, to determine the volume and regulate how much coffee to pour.
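Purely as an illustration of that shared-backbone, three-head layout (invented names, shapes, and random weights; not any real coffee-machine firmware):

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared feature extractor (a crude stand-in for the conv layers)
# feeding three task heads.
W_backbone = rng.normal(size=(64, 16))
W_cls = rng.normal(size=(16, 4))        # (1) 4 container types
W_seg = rng.normal(size=(16, 8 * 8))    # (2) coarse 8x8 cup-location mask
W_vol = rng.normal(size=(16, 1))        # (3) scalar pour-volume estimate

def backbone(img):
    # Flatten, project, ReLU: stand-in for the shared conv features.
    return np.maximum(img.reshape(-1) @ W_backbone, 0.0)

img = rng.normal(size=(8, 8))           # fake 64-pixel camera frame
feats = backbone(img)
container = int(np.argmax(feats @ W_cls))
mask = (feats @ W_seg).reshape(8, 8) > 0
volume_ml = float((feats @ W_vol)[0])
```

Sharing one backbone across the three heads is what keeps this cheap enough for an embedded appliance.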
by armcat1774696486
Thanks for the thoughtful comments and links; really appreciated the high-signal feedback. We've updated the article to better reflect the actual VAE-based AXOL1TL architecture (variational autoencoder for anomaly detection), and added the arXiv paper and Thea Aarrestad's talks to the Primary Sources.
by TORcicada1774699468
First internship: CERN, summer 1989, in the OPAL pit at LEP, writing an offline data filtering program in FORTRAN. Blast from the past.
by peelslowlysee1774714481
Intuitively, I’ve always had the impression that using an analogue circuit would be feasible for neural networks (they're just matrix multiplications!). These should provide near-instantaneous output.

Isn’t this kind of approach feasible for something so purpose-built?

by WhyNotHugo1774689234
This is the spirit. I'm doing something similar: scaling a 1.8T logic system using a budget mobile device as the primary node. Just hit 537 clones today. It's all about how you structure the logic, not the CPU power.
by Aegis_Labs1774709264
Do they actually have ASICs or just FPGAs? The article seems a bit unclear.
by v9v1774690725
Hey Siri, show me an example of an oxymoron!

> CERN is using extremely small, custom large language models physically burned into silicon chips to perform real-time filtering of the enormous data generated by the Large Hadron Collider (LHC).

by rakel_rakel1774686199
Does string theory finally make sense when we add AI hallucinations?
by randomNumber71774687268
CERN has been doing HEP experiments for decades. What did it use before the current incarnation of AI? The AI label seems more marketing and superficial than substantive. It’s a bit sad that a place like CERN feels the need to make it public that it is on the bandwagon.
by quantum_state1774698322
I think chips having a single LLM directly on them will be very common once LLMs have matured/reached a ceiling.
by Janicc1774690820
cern has been using neural networks for decades
by seydor1774689050
That's what Groq did as well: burning the Transformer right onto a chip (I have to say I was impressed by the simplicity, but afterwards less so by their controversial Kushner/Saudi investment) .
by mentalgear1774690090
the fact that 99% of LHC data is just gone forever is insane
by nerolawa1774691706
Why did we stop calling this stuff machine learning again? this isn't even an llm, which has become the common bar for 'ai'
by Kapura1774707964
I wonder if it is a PhD thesis to prove that the data prefiltering doesn’t bias the results.
by aj71774719321
When is the price of fabbing silicon coming down, so every SMB can do it?
by amelius1774693164
Does anyone know why they are using language models instead of a more purpose-built statistical model? My intuition is that a language model would either be overfit, or its training data would have a lot of noise unrelated to the application and significantly drive up costs.
by 1007211774686649
I hope they have good results, keep all the data they need, and identify all the interesting data they're looking for. I do have a cautionary tale about mini neural networks in new experiments.

We recently spent a large amount of time training a mini neural network (200k parameters) to make new predictions in a very difficult domain (predicting specific trails for hash-function collisions at higher round counts than anyone had reached before). We put up a spiffy internal dashboard[1] where we could tune parameters and see how well the neural network learned the existing results. We got to r^2 of 0.85 (very good correlation) on the data that already existed, from other people's records and from the data we had solved for previously. It showed such a nicely dropping loss function as it trained, it brought tears to the eye; we were pumped to see how it would perform on data it hadn't seen before, data that was too far out to solve for. So many parameters to tune! We thought we could beat the world record by 1 round with it (40 rounds instead of 39), and then let the community play with it to see if they could train it even better, to predict the inputs that would let us brute-force 42-round collisions, or even more. We could put up a leaderboard. The possibilities were endless; all it had to do was extrapolate some input values by one round. We'd take it from there with the rest of our solving infrastructure.

After training it fully, we moved on to the inference stage, trying it on the round counts we didn't have data for. It turned out ... to have zero predictive ability on data it hadn't seen before. This was on well-structured, sensible extrapolations from what worked at lower round counts, selected based on real algebraic correlations. This mini neural network isn't part of our pipeline now.

[1] screenshot: https://taonexus.com/publicfiles/mar2026/neural-network.png
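That failure mode is easy to reproduce in miniature (a generic illustration with numpy, unrelated to the actual hash-trail model): fit on one input range, get an excellent in-range r^2, then watch it collapse out of range.

```python
import numpy as np

rng = np.random.default_rng(42)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Fit a straight line to a nonlinear relation on x in [0, 1] ...
x_train = rng.uniform(0, 1, 200)
y_train = np.exp(x_train)
coeffs = np.polyfit(x_train, y_train, 1)
r2_in = r2(y_train, np.polyval(coeffs, x_train))

# ... then evaluate on x in [2, 3], a region never seen in training.
x_far = rng.uniform(2, 3, 200)
r2_out = r2(np.exp(x_far), np.polyval(coeffs, x_far))
```

In-range r^2 comes out around 0.98 while the out-of-range r^2 goes negative: interpolation quality says nothing about extrapolation quality, which is exactly what bit us.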

by logicallee1774703549