bccdee a day ago

The creation of a model which is "co-state-of-the-art" (assuming it wasn't trained on the benchmarks directly) is not a win for scaling laws. I could just as easily claim that xAI's failure to significantly outperform existing models despite "throwing more compute at Grok 3 than even OpenAI could" is further evidence that hyper-scaling is a dead end which will only yield incremental improvements.

Obviously more computing power makes the computer better. That is a completely banal observation. The rest of this 2000-word article is groping around for a way to take an insight based on the difference between '70s symbolic AI and the neural networks of the 2010s and apply it to the difference between GPT-4 and Grok 3 off the back of a single set of benchmarks. It's a bad article.

  • starspangled 15 hours ago

    > The creation of a model which is "co-state-of-the-art" (assuming it wasn't trained on the benchmarks directly) is not a win for scaling laws.

    Just based on the comparisons linked in the article, it's not "co-state-of-the-art", it's the clear leader. You might argue those numbers are wrong or not representative, but you can't accept them then claim it's not outperforming existing models.

  • horsawlarway a day ago

    I agree.

    There's a lot of attention being paid to metrics that often don't align all that well with actual production use-cases, and frankly the metrics are good but hardly breathtaking.

    They have an absolutely insane outlay of additional compute, which appears to have given them a relatively paltry increase in capabilities.

    15 times the compute for 5-15% better performance is basically the exact opposite of the bitter lesson.

    Hell - it genuinely seems like the author didn't even read the actual bitter lesson.

    The lesson is not "scale always wins"; the lesson was "We have to learn the bitter lesson that building in how we think we think does not work in the long run."

    And somewhat ironically - the latest advances seem to genuinely undermine the lesson. It turns out that building in reasoning/thinking (a heuristic that copies human behavior) is the biggest performance jump we've seen in the last year.

    Does that mean we won't scale out of the current capabilities? No, we definitely might. But we also definitely might not.

    The diminishing returns we're seeing for scale hint strongly that just throwing more compute at the problem is not enough by itself. Possibly still required, but definitely not sufficient.

bambax a day ago

This article is weak and just general speculation.

Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks. And Sabine Hossenfelder says this:

> Asked Grok 3 to explain Bell's theorem. It gets it wrong just like all other LLMs I have asked because it just repeats confused stuff that has been written elsewhere rather than looking at the actual theorem.

https://x.com/skdh/status/1892432032644354192

Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".

  • cardanome a day ago

    > Sabine Hossenfelder

    She really needs to stop commenting on topics outside of theoretical physics.

    Even in physics she does not represent the scientific consensus but has some very questionable fringe beliefs like labeling whole sub-fields as "scams to get funding".

    She regularly speaks with "scientific authority" about topics she barely knows anything about.

    Her video on autism is considered super harmful and misleading by actual autistic people. She also thinks she is an expert on trans-issues and climate change. And I doubt she knows enough about artificial intelligence and computer science to comment on LLMs.

    • Mekoloto a day ago

      Your statement is misleading.

      She doesn't say she is an expert on trans-issues at all! She analyzed the studies, looked at the data, and stated that there is no real trans epidemic, but highlighted a statistical increase in numbers among young women without stating a clear opinion on this finding.

      The climate change videos do the same thing. She evaluates these studies and discusses them to clarify that, for her, certain numbers are unspecific, and she also does not come to a clear conclusion in the sense of climate change yes or no, bad or good.

      She is for sure not an expert in all fields, but her way of discussing these topics is based on studies and numbers, and it is a good viewpoint.

      The funding scam you mention is a reference to "these people get billions for particle research but the outcome for us as a society is way too small"

      • cardanome a day ago

        Having studied physics does not allow you to evaluate studies in a completely unrelated field in any meaningful way.

        Especially not in such politically-charged fields that require deeper knowledge about the historical context, the different interest groups and their biases and so on.

        Her video on trans-issues labels people that advocate for the rights of trans-people as "extremists" and presents transphobic talking points as a valid part of the scientific discussion.

        Her trying to appear "neutral" and "just presenting the science" is exactly the issue: using her authority as a scientist when talking about topics she has no expertise in.

        Here is a debunking of her video on trans-issues: https://www.youtube.com/watch?v=r6Kau7bO3Fw

        Here is a longer criticism of her video on autism: https://www.youtube.com/watch?v=vaZZiX0veFY

        • bflesch a day ago

          So where does your "scientific authority" come from, which is needed before criticizing someone according to your own logic?

          You're not even using your real name here. Nobody knows if you have any scientific qualifications, or a university degree at all.

          • cardanome a day ago

            I am writing here as a random hacker-news user. I don't claim to have any authority.

            Sabine Hossenfelder presents her opinions as "what the science says" and that is the problem.

            • dimal a day ago

              She does not. She specifically speaks against that kind of thinking. Recently, she stated it as, “Believe arguments, not people.” I couldn’t have said it better.

              She makes arguments, forcefully. That’s good. That’s what science is supposed to be. I don’t agree with her on everything, but I find her arguments engaging, and sometimes convincing, sometimes not. But her process is not dogmatic, as you’re trying to make it out to be.

            • anonym29 a day ago

              HN mod dang, if you are reading this, I have a question. I was previously given a warning for a post that levied factual criticisms about the quality of source code contributions performed by a woman who had intentionally put work forward into a highly public open source project in her own name.

              I had specifically mentioned her by name in my criticisms, and I was given a written warning that doing so went against HN's policy on "targeted attacks" or "targeted harassment" or something similar.

              Why is it okay for this user to suggest that the act of this woman presenting her work publicly "is the problem", while it is a HN AUP offense for me to criticize the quality of the source code contributions written by another woman presenting her work publicly?

              I'm not requesting enforcement against this user or a retroactive removal of my warning, I'm just trying to understand the difference better to improve the conformance of my own discourse to the intent of HN's AUP.

              • starspangled 15 hours ago

                Interesting question. I'm not sure what "woman" has to do with it since they're both women, but we'll go with it. It would be helpful if you could link your comment, but I guess it's been nuked. Anyway...

                Just because you can tie a person to work they have performed using public records does not seem like it should put them on the same level as someone who communicates with and creates work directly to the public, or a public figure. Not even if some of the actual work itself is performed in some open and observable space -- For example I don't think one has any more or less moral right to commentate on and publicly critique the work of a carpenter working on a building scaffold that's easily observable from the public street, than one does about a programmer working on their own idea from their own home in private. That seems like the immediate obvious difference between the two situations you describe. They don't sound equivalent at all, so I don't think you can win your case on that angle.

                But work by "non-public-figures" is frequently posted about and commented on at Hackernews. Obviously open source work is a significant source of such discussion simply because it is accessible. Therefore, clearly it's not entirely verboten to talk about that. Is it permitted to criticize? I don't have a particular example at hand but I'm quite certain that I've seen negative opinions about people's work on this site from time to time. I think this is the angle you could argue your case. Was it fair and consistent that yours was called an attack or harassment? Are similar criticisms of work by non-public-figures permitted on here? Without the full context we can't answer that.

              • srid a day ago

                In my understanding, one reason Sabine gets readily attacked (as user `cardanome` does here) is because of her criticism of orthodox physics theories. She has famously exclaimed that all cosmological theories supposing a “beginning” of the universe are essentially “a creation myth written in the language of mathematics”.

                https://news.ycombinator.com/item?id=32618719

                • jiggawatts 19 hours ago

                  Read the linked blog post by Sabine again. That’s not what she says at all.

                  She’s saying something much more specific about the earliest moments (milliseconds!!) of the history of the universe, and before that time, of which we have practically zero observational data, and can’t ever visit even in principle.

                  She’s arguing against woo, against science fiction, against unsupported what-if musings that are fun to talk about — but are not science.

              • latexr 21 hours ago

                It is extremely unlikely Dan is reading that, or the posts above yours. HN mods are only human and can’t see everything, that is why members have tools like flagging.

                If you want to contact Dan, email HN and make your case. The contact information is at the bottom of the page.

            • pyinstallwoes a day ago

              She can’t state her opinion? lol.

              • fx1994 a day ago

                A random Internet user, yes; Sabine... no. It's all hyped. You know, sometimes you just have to come to your own conclusions.

        • kordlessagain 10 hours ago

          Looking at this HN commentator's behavior, we can see the early stages of a troubling pattern:

          They start by attacking a physicist for being "neutral" and "just presenting the science" - exactly the kind of delegitimization of objectivity we see in early stages of information control. Notice how they frame staying neutral as actively harmful - it's not just "wrong," but presented as dangerous because it doesn't take a strong enough stance against what they view as "extremist" positions. Most tellingly, they're not arguing that her analysis is incorrect. Their complaint is that she's even allowing certain viewpoints to be examined objectively at all.

          This maps directly to historical patterns where:

          1. First you attack individuals for being neutral

          2. Then you establish that certain topics are "beyond" neutral analysis

          3. Finally you create an environment where examining data objectively becomes seen as suspicious or harmful

          This HN comment is a perfect micro-example of this - it's not even sophisticated gatekeeping, it's raw "how dare you look at this objectively when you should be taking my side." This kind of thinking, multiplied across society and amplified by modern media, is exactly how larger patterns of information control take hold.

        • mistercheph a day ago

          > Having studied physics does not allow you to evaluate studies in a completely unrelated field in any meaningful way.

          I agree! Before one may touch the pink sceptre, they must be permitted through the gate, and kissed by the doddling sheep, Harry, who will endow them with permission to pass and comment on many a great manor of thing which are simply out of reach of the natural human mind without these great blessings which we bestow. And, amen.

      • jiggawatts 19 hours ago

        > The funding scam you mention is a reference to "these people get billions for particle research but the outcome for us as a society is way too small"

        More specifically, even particle physicists admit that a 2x or even a 10x bigger accelerator is not expected to find anything fundamentally new.

        The core criticism is that it has become a self-licking ice cream cone that serves no real purpose other than keeping physicists employed.

      • bccdee a day ago

        > for her, certain numbers are unspecific, and she also does not come to a clear conclusion in the sense of climate change yes or no, bad or good.

        Climate change is settled science. To claim that "certain numbers are unspecific, so I can't say whether climate change is real or not, or whether it's good or bad" (which, based on your paraphrasing, is what it sounds like she said) is an unacceptable position. It's muddying the waters.

        I'm not going to go watch her content about trans people, but it sounds like the same thing: Muddying the waters by Just Asking Questions about anti-trans "social contagion" talking points.

        ---

        EDIT: Okay I went back and watched some clips of her anti-trans video. She takes a pseudoscientific theory based on an opinion poll of parents active on an anti-trans web forum and suggests we take it seriously because "there is no conclusive evidence for or against it," as if the burden of proof weren't on the party making the positive claim, and as if the preponderance of evidence and academic consensus didn't overwhelmingly weigh against it. It's textbook laundering of pseudoscience. You've significantly misrepresented her position.

        • toolz a day ago

          There's no such thing as "settled science". You can not prove that any scientific consensus has no flaws in the same way you can't prove the absence of bugs in any software. It's unproductive to treat science as anything more than an ongoing, constantly improving process.

          • bccdee a day ago

            Yes, there is. Germ theory is settled science. Is it theoretically possible that we'll overturn it? Sure. Is it likely? No. In the absence of any groundbreaking experimental results, is it worth wasting time entertaining germ theory skepticism? Also no.

            > It's unproductive to treat science as anything more than an ongoing, constantly improving process.

            It's unproductive to constantly re-litigate questions like "is germ theory true" or "is global warming real" in the absence of any experimental results that seriously challenge those theories. Instead, we should put our effort into advancing medicine and fixing climate change, predicated on the settled science which makes both those fields possible.

            • toolz a day ago

              > Germ theory is settled science. Is it theoretically possible that we'll overturn it? Sure.

              You need to understand that every single theory will be improved upon in the future. That means they will change and it's impossible to predict if these improvements will have consequences in different contexts where people incorrectly claim the science is settled.

              > It's unproductive to constantly re-litigate questions like "is germ theory true" or "is global warming real"

              Can you think of any cases where the science had nearly full consensus and it was useful to re-litigate? Galileo isn't the only example. I can think of many.

              • mrguyorama 20 hours ago

                Newtonian physics is still settled science even though we have relativity to give more accurate results in domains where Newtonian mechanics fails. It still holds in all the same places it used to.

                You don't seem to understand how scientific models and theories work.

                In fact, germ theory of medicine is much the same way. Germ Theory does not explain or predict or account for ALL disease, for example PTSD, and if you build a useful theory for mental illnesses that aren't caused by little creatures of some sort, that doesn't overturn germ theory, it complements it. A person creating a new theory of how Long Covid hurts people for example may not stick strictly to germ theory, but that would STILL not overturn germ theory.

                >Can you think of any cases where the science had nearly full consensus and it was useful to re-litigate? Galileo isn't the only example.

                Galileo isn't an example of the science being "settled" and someone radically overturning it. Nobody believed in geocentrism due to "Science", which is also why Galileo had so much difficulty, it was literally a religious matter. Kepler was about as close as we had to any sort of consistent theory to how the heavenly bodies moved, and it was not at all settled, and yet he was still basically right

                In actuality, there are remarkably few times where a theory was entirely overturned, especially by a new theory. When we know little enough about a field that we could get something so wrong, we usually don't have much in the way of "theory" and are still spitballing, and that's not considered settled science. If you want a good feeling for what this looks like, go read up on the debates science had when we first started looking at Statistical Mechanics and the basics of thermodynamics. There were heated (lol) debates about the very philosophy of science, and whether we should really rely on theories that don't seem like they are physical, and that mostly went away as it continued to bear high-quality predictions. The problems and places where theories are not great are usually well understood by the very scientists who work through a theory, because understanding the parameter space and confidence intervals for a theory is a requirement of using that theory successfully.

                "Human CO2 and other pollutants are the near totality of the cause of the globe warming" is settled science.

                "The globe is warming" is settled science

                "Global warming will cause changes in micro and macro climates all over" is settled science

                "A hotter globe will result in more energetic, chaotic, and potentially destructive weather" is settled science and obvious

                "Global warming is going to kill us all in a decade" is NOT settled science. There is no settled science for how bad climate change will make things for us, who will be worst affected, who might benefit, etc. There is comprehensive agreement among climate scientists that global warming is harmful to our future, and something we have to try and reduce the effect of, prepare for the outcomes of, and adapt to the consequences of, and something that, whether we do anything to combat it, will be immensely costly to handle.

                • toolz 20 hours ago

                  > Newtonian physics is still settled science even though we have relativity to give more accurate results..

                  You've proved my entire point in your very first sentence and then go on to say I don't seem to understand how scientific models and theories work.

                  > Nobody believed in geocentrism due to "Science"...

                  This isn't a serious argument. Feel free to look up the works of Aristotle, who is sometimes called the first scientist.

                  I don't have the energy to address the rest of your incorrect conjecture.

            • Matthyze a day ago

              Spot on. Reminds me of that old approach by evangelicals to frame scientific consensus as 'just a theory.'

              • barbazoo a day ago

                “Just a theory” is simply a signal that they have no clue about the basics of the scientific process.

              • toolz a day ago

                Ironically, it's often attributed to religion that they claim settled truths that can't be proven.

            • srid 21 hours ago

              We understand very little about human microbiota (therapies like fecal microbiota transplant, however, are promising) yet germ theory is "settled science"? Interesting.

    • dimal a day ago

      > Her video on autism is considered super harmful and misleading by actual autistic people.

      I’m autistic and I just watched her video. I found it to be one of the best primers on autism I’ve seen. Not complete, of course, and there’s a lot more nuance to it, but very even handed. She doesn’t make any judgements. She just gives the history and talks about the controversies without choosing sides, except to say that the idea of neurodiversity seems reasonable to her. When compared to most of the discourse about autism, it stands up pretty well. Of course, there’s a lot more I want people to know about autism, but it’s an ok start.

      Actually, many autistic people (myself included) would find your statement far more harmful. You assume that all autistic people think alike and believe the same thing. This is false. You try to defend us without understanding us.

      Don’t do that.

      I suppose there’s a possibility that you’re autistic and found it harmful to you. If so, don’t speak for me.

      And she was commenting on an AI’s knowledge of Bell’s Inequality, which is PHYSICS. If she can’t comment on that, who can?

      • cardanome a day ago

        There is a misunderstanding: I did not specify that ALL autistic people think like this. Just that autistic people found it harmful and misleading. There are quite a lot of autistic content creators criticizing the video. It does not mean every autistic person needs to feel the same way.

        I am neurodivergent myself but (probably) not autistic. The first time I watched the video I actually didn't think that it was that bad and had a similar reaction to you. But once I started to think about it and educate myself more on the topic I realized how bad it is.

        Sure it is not the worst video on autism but it still promotes some really bad ableist views.

        Autism Speaks is a horrible hate organization. I don't think there is a spoiler tag here so please skip this paragraph if ableism is triggering to you, but there is a video of the Autism Speaks founder where she talks about how she at one point wanted to kill herself and her autistic child because she couldn't cope with having an autistic child and only didn't do it because of her other non-autistic child. She says that while her autistic child is in the background.

        I also didn't know about "aspie supremacy" and why people still use the term "Asperger" despite it being outdated. Hans Asperger was a Nazi scientist who is responsible for killing thousands of children. He thought some autistic children might be useful as future scientists for the Nazi regime, so he assigned them the diagnosis "Asperger syndrome" while the other autistic children were to be murdered.

        I recommend you watch Ember Green on this topic: https://www.youtube.com/watch?v=vaZZiX0veFY

        • dimal a day ago

          You said "Her video on autism is considered super harmful and misleading by actual autistic people".

          While you didn't say ALL, you didn't clarify, so your wording says that autistic people categorically think her video is super harmful and misleading. That's simply not true.

          I'm in a bit of a minority in that I see autism through the neurodiversity lens, but I also think this tribal us-vs-them mentality is doing us more harm than good. Tarring and feathering people for not getting everything exactly "right" by my standards isn't helping anyone. It just causes people to throw up their hands and vote Trump.

          So, while I disagree with Autism Speaks on pretty much every point and I think they're extremely harmful, labeling them as a hate group is self-defeating. Parents with high-needs autistic children look to Autism Speaks. These parents love their children, but are being misled. When you push people, they push back. If we shout "You're a hate group!" then all dialogue stops, and we can't help their children. And helping those children is the important thing.

          Ironically, I think the problem for Sabine is simply miscommunication, which is due to her probably being autistic and not communicating according to neurotypical standards. She ended the video by showing how she scored high on an online test (which isn't definitive, of course), but then she dismisses it out of hand.

          I dismissed those tests too when I first took them, but that's because I still didn't really know what autism is, even after I did tons of research. I couldn't really know that by reading studies or talking to a psychologist. I didn't really know until I met other autistic people and realized that they actually "got" me in a way that no one else ever has.

          Her behavior is VERY typical of autistic people. Monotone voice, hyperlogical, hypermoral. She quit a successful career in physics for moral reasons. No neurotypical person would do that. An autistic person would. She wears the same velour shirt in every video. Sensory issues(?), repetitive behavior.

          It's up to her to make the call as to whether she's actually autistic or not, but I see her as one of us.

          And... I'm going to need a TLDR on that Ember Green video. It's a 2 hour commentary on a 25 minute video. Instead of asking me to unpack the arguments, make them.

    • dauertewigkeit a day ago

      I agree with you that Sabine often talks about matters far outside of her expertise, but as somebody with a foot in academia, I would bet that a very large number of academics have at least one academic research direction in mind that they would categorize as a "scam to get funding".

    • hitekker 17 hours ago

      I was nodding along to your comment, at first. But then I read your follow-ups, which look like you're covering something up that you fear could be true.

      I don't know what that something is, so I think I should go listen to Sabine Hossenfelder.

    • netbioserror a day ago

      Based on what I've seen of Sabine, virtually all of this post is lies. She regularly positions herself as an outside skeptic and critic. Do you have any examples of her claiming authority or representing consensus?

    • me_me_me a day ago

      But Bell's theorem IS physics! So according to you she absolutely can comment on an LLM's understanding of physics, or lack of it.

      So your whole rant makes no sense.

      • dambi0 21 hours ago

        There is a difference between the question “LLMs don’t understand Bell’s theorem; what does this tell us about physics?” and “LLMs don’t understand Bell’s theorem; what does this tell us about LLMs?”.

    • Der_Einzige a day ago

      She’s very full of shit and feels a lot like a Lex Fridman for women.

      I can’t wait for others to call her out further for being herself the biggest grifter of them all, bigger than most she tries to take down.

    • idiotsecant a day ago

      Yes, she is the worst type of 'vibes based' science communicator and mainly just says edgy things to improve click rate and drive engagement.

  • aubanel a day ago

    > Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".

    Do you have any data to support 1. That grok is not more intelligent than previous models (you gave one anecdotal datapoint), and 2. That it was trained on more data than other models like o1 and Claude-3.5 Sonnet?

    All datapoints I have support the opposite: scaling actually increases intelligence of models. (agreed, calling this "intelligence" might be a stretch, but alternative definitions like "scope, maybe, or flexibility, or coverage, or something" seem to me like beating around the bush to avoid saying that machines have intelligence)

    Check out the technical report of Llama 3 for instance, with nice figures on how scaling up model training does increase performance on intelligence tests (might as well call that intelligence): https://arxiv.org/abs/2407.21783

  • ttoinou a day ago

       Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks
    
    That's something I always wondered about; Goodhart's law so obviously applies to each new AI release. Even the fact that writers and journalists don't mention that possibility makes me instantly skeptical about the quality of the article I'm reading.

    • NitpickLawyer a day ago

      > Many people doubt the actual performance of Grok 3 and suspect it has been specifically trained on the benchmarks

      2 anecdotes here:

      - just before grok2 was released, they put it on livearena under a pseudonym. If you read the topics (reddit,x,etc) when that hit, everyone was raving about the model. People were saying it's the next 4o, that it's so good, hyped, so on. Then it launched, and they revealed the pseudonym, and everyone started shitting on it. There is a lot of bias in this area, especially with anything touching bad spaceman, so take "many people doubt" with a huge grain of salt. People be salty.

      - there are benchmarks that seem to correlate very well with end to end results on a variety of tasks. Livebench is one of them. Models scoring highly there have proven to perform well on general tasks, and don't feel like they cheated. This is supported by the finding in that paper which found models like phi and qwen to lose ~10-20% of their benchmark scores when checked against newly-built, unseen but similar tasks. Models scoring strongly on livebench didn't see that big of a gap.

      • staticman2 a day ago

        I found Arena has a 2000-token limit on inputs.

        I think it even quietly eliminates the input without telling you. Nobody is putting serious work tasks in 2000 tokens on Arena.

        The lesson you should have learned is Arena is a dumb metric, not that people have unfounded biases against Grok 2. (Which I've used on Perplexity and found to be unimpressive.)

        The other thing is dumb, low quality statements are all over reddit and Twitter about any "hype" topic, including mysterious new models on arena. So it isn't surprising you encountered that for Grok 2, but you could have said the same thing for Gemini models.

        If reddit can be believed, Wizard LM 2 was so much better than OpenAI models that Microsoft had to cancel it so OpenAI wouldn't be driven out of business.

        People say all sorts of dumb stuff on social media.

      • Mekoloto a day ago

        I've been following AI news and models for a few years now and I have not read about your Grok 2 controversy.

        Nonetheless, I do not use Grok and I do not try it out due to it being a Musk product.

        I'm also not aware of Grok 2 being communicated as the top model in any relevant timespan at all. Perhaps it just didn't deliver? Or a lot more people are not aware of how to use it, or boycott Musk.

        After all, he clearly doesn't care for any rules or laws, so it is probably a very high risk to send anything to Grok.

      • ttoinou a day ago

        Interesting, thank you !

  • melodyogonna a day ago

    How can it be specifically trained on benchmarks when it is leading on blind chatbot tests?

    The post you quoted is not a Grok problem if other LLMs are also failing; it seems, to me, to be a fundamental failure in the current approach to AI model development.

    • nycdatasci a day ago

      I think a more plausible path to gaming benchmarks would be to use watermarks in text output to identify your model, then unleash bots to consistently rank your model over opponents.

  • aucisson_masque a day ago

    Last time I used Chatbot Arena, I was the one asking questions to the LLM, so I made my own benchmark. There weren't any predefined questions.

    How could Musk LLM train on data that does not yet exist ?

    • JKCalhoun a day ago

      That's true. You can head over to lmarena.ai and pit it against other LLMs yourself. I only tried two prompts but was surprised at how well it did.

      There are "leaderboards" there that provide more anecdotal data points than my two.

    • HenryBemis a day ago

      That. I have used only ChatGPT and I remember asking 4 legacy to write some code. I asked o3 the same question when it came out, and then I compared the code. o3 was 'better': more precise, more detailed, less 'crude'. Now, don't get me wrong, crude worked fine. But when I wanted to do the v1.1 and v1.2, o3 nailed it every time, while 4 legacy was simply bad and full of errors.

      With that said, I assume that every 'next' version of each engine is using my 'prompts' to train, so each new version has the benefit of having already processed my initial v1.0 and then v1.1 and then v1.2. So it is somewhat 'unfair' because for "ChatGPT v2024" my v1.0 is brand new while for "ChatGPT v2027" my v1.0, v1.1, v1.2 is already in the training dataset.

      I haven't used Grok yet, perhaps it's time to pause that OpenAI payment and give Elon some $$$ and see how it works 'for me'.

  • BiteCode_dev a day ago

    It is very up to date, however. I asked it about recent stuff on Python packaging, and it gets it while others don't.

  • jiggawatts a day ago

    People have called LLMs a "blurry picture of the Internet". Improving the focus won't change the subject of the picture, it just makes it sharper. Every photographer knows this!

    A fundamentally new approach is needed, such as training AIs in phases, where instead of merely training them to learn to parrot their inputs, the first AI is used to critique and analyse the inputs, which is then used to train another model in a second pass, which is used to critique the data again, and so on, probably for half a dozen or more iterations. On each round, the model can learn not just what it heard, but also an analysis of the veracity, validity, and consistency.

    Notably, something akin to this was done for training Deepseek, but only in a limited fashion.
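
    A rough sketch of what that phased "critique then retrain" loop could look like (purely illustrative; train and critique are hypothetical placeholders, not anyone's actual pipeline):

        # Illustrative only -- not how xAI or DeepSeek actually train.
        def train(corpus):
            ...  # fit a model on (document, optional critique) pairs

        def critique(model, doc):
            ...  # ask the current model to assess veracity, validity, consistency

        def phased_training(raw_corpus, rounds=6):
            model = train([(doc, None) for doc in raw_corpus])  # pass 1: plain imitation
            for _ in range(rounds):
                # re-annotate the same data with the latest model's analysis
                annotated = [(doc, critique(model, doc)) for doc in raw_corpus]
                model = train(annotated)  # learn the text plus its analysis
            return model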

smy20011 a day ago

Did they? Deepseek spent about 17 months achieving SOTA results with a significantly smaller budget. While xAI's model isn't a substantial leap beyond Deepseek R1, it utilizes 100 times more compute.

If each had $3 billion, xAI would choose to invest $2.5 billion in GPUs and $0.5 billion in talent. Deepseek would invest $1 billion in GPUs and $2 billion in talent.

I would argue that the latter approach (Deepseek's) is more scalable. It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving a 10x increase in compute is more feasible.

  • mike_hearn a day ago

    We don't actually know how much money DeepSeek spent or how much compute they used. The numbers being thrown around are suspect, the paper they published didn't reveal the costs of all models nor the R&D cost it took to develop them.

    In any AI R&D operation the bulk of the compute goes on doing experiments, not on the final training run for whatever models they choose to make available.

    • wallaBBB a day ago

      One thing I (intuitively) don't doubt - that they spent less money for developing R1 than OpenAI spent on marketing, lobbying and management compensation.

      • pertymcpert a day ago

        What makes you say that? Do you think Chinese top tier talent is cheap?

        • wallaBBB a day ago

          I did not refer to the talent directly contributing to the technical progress.

          P.S. - clarification: I meant I was not referring to talent at OpenAI. And yes, I have very little doubt that talent at DeepSeek is a lot cheaper than the things I listed above for OpenAI. I would be interested in a breakdown of the costs of OpenAI and seeing if even their technical talent costs more than the things I mentioned.

          • pertymcpert a day ago

            Do you think 1.5M a year compensation is cheap? That’s in the range of OpenAI offers.

        • anonzzzies a day ago

          What is cheap? But compared to the US, yes. Almost everywhere talent is 'cheap' compared to the US unless they move to the US.

          • pertymcpert a day ago

            How experienced are you with Chinese AI talent compensation?

        • victorbjorklund a day ago

          I'm sure the salaries at Deepseek in China were lower than the salaries at OpenAI.

          • pertymcpert a day ago

            How are you sure about that?

            • victorbjorklund 8 hours ago

              A qualified guess. Do you have something that indicates dev salaries are lower in US vs China?

        • amunozo a day ago

          Definitely cheaper than American top tier talent

          • pertymcpert a day ago

            How much cheaper? I’m curious because I’ve seen the offers that Chinese tech companies pay and it’s in the millions for the top talent.

    • tw1984 a day ago

      > The numbers being thrown around are suspect, the paper they published didn't reveal the costs of all models nor the R&D cost it took to develop them.

      Did any lab release such figures? It will be interesting to see.

  • sigmoid10 a day ago

    >It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving a 10x increase in compute is more feasible.

    The article explains how in reality the opposite is true. Especially when you look at it long term. Compute power grows exponentially, humans do not.

    • llm_trw a day ago

      If the bitter lesson were true we'd be getting sota results out of two layer neural networks using tanh as activation functions.

      It's a lazy blog post that should be thrown out after a minute of thought by anyone in the field.

      • sigmoid10 18 hours ago

        That's not how the economics work. There has been a lot of research that showed how deeper nets are more efficient. So if you spend a ton of compute money on a model, you'll want the best output - even though you could just as well build something shallow that may well be state of the art for its depth, but can't hold up with the competition on real tasks.

        • llm_trw 18 hours ago

          Which is my point.

          You need a ton of specialized knowledge to use compute effectively.

          If we had infinite memory and infinite compute we'd just throw every problem of length n to a tensor of size R^(n^n).

          The issue is that we don't have enough memory in the world to store that tensor for something as trivial as mnist (and won't until the 2100s). And as you can imagine the exponentiated exponential grows a bit faster than the exponential so we never will.
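
          Back-of-the-envelope, taking MNIST's 28 x 28 = 784 raw inputs as n (my arithmetic, purely for scale):

              $$ n^{n} = 784^{784} = 10^{\,784 \log_{10} 784} \approx 10^{2269} $$

          That's roughly 10^2269 real-valued entries, against something like 10^23 bytes of storage existing worldwide today.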

          • sigmoid10 8 hours ago

            Then how does this invalidate the bitter lesson? It's like you're saying if aerodynamics were true, we'd have planes flying like insects by now. But that's simply not how it works at large scales - in particular if you want to build something economical.

    • OtherShrezzing a day ago

      Humans don't grow exponentially indefinitely. But there are only something on the order of 100k AI researchers employed in the big labs right now. Meanwhile, there are around 20 million software engineers globally, and around 200k math graduates per year.

      The number of humans who could feasibly work on this problem is pretty high, and the labs could grow an order of magnitude, and still only be tapping into the top 1-2% of engineers & mathematicians. They could grow two orders of magnitude before they've absorbed all of the above-average engineers & mathematicians in the world.

      • sigmoid10 a day ago

        I'd actually say the market is stretched pretty thin by now. I've been an AI researcher for a decade and what passes as an AI researcher or engineer these days is borderline worthless. You can get a lot of people who can use scripts and middleware like frontend lego sets to build things, but I'd say there are fewer than 1k people in the world right now who can actually meaningfully improve algorithmic design. There are a lot more people out there who do systems design and cloud ops, so only if you choose to go for scaling will you find a plentiful supply of human brainpower.

        • llm_trw a day ago

          Do you know of any places where people who are interested in research congregate? Every forum, meetup or journal gets overwhelmed by bullshit within a year of being good.

          • sigmoid10 18 hours ago

            Universities (at least certain ones) and startups (more in absolute terms than universities, but there's also a much bigger fraction of swindlers). Most blogs and forums are garbage. If you're not inside these ecosystems, try to find out who the smart/talented people are by reading influential papers. Then you can start following them on X, linkedin etc. and often you'll see what they're up to next. For example, there's a pretty clear research paper and hiring trail of certain people that eventually led to GPT-4, even though OpenAI never published anything on the architecture.

            • llm_trw 18 hours ago

              I am in correspondence with a number of worthwhile authors; it's just that there isn't any place where they congregate in the (semi) open, and without the weirdos who do stuff with the models you're missing out on a lot.

              My favorite example I can never share in polite company is that the (still sota) best image segmentation algorithm I ever saw was done by a guy labeling parts of the vagina for his stable diffusion fine tune pipeline. I used what he'd done as the basis for a (also sota 2 years later) document segmentation model.

              Found him on a subreddit about stable diffusion that's now completely overrun by shitesters and he's been banned (of course).

              • sigmoid10 6 hours ago

                It's pretty easy nowadays to come up with a narrow domain SOTA in image tasks. All you need to do is label some pictures and do a bit of hyperparameter search. This can literally be done by high schoolers on a laptop. And that's exactly what they do in those subreddits where everyone primarily cares about creating explicit content. The real frontier for algorithmic development is large domains (which need a lot more data by default as well). But there actually are some big-game explicit content platforms engaged in research in this area and they have shown somewhat interesting results.

    • smy20011 a day ago

      Humans do write code that scales with compute.

      The performance is always raw performance * software efficiency. You can use shitty software and waste all these FLOPs.

    • alecco a day ago

      Algorithmic improvements in new fields are often bigger than hardware improvements.

  • stpedgwdgfhgdd a day ago

    Large teams are very hard to scale.

    There is a reason why startups innovate and large companies follow.

  • mirekrusin a day ago

    Deepseek's innovations are applicable to xAI's setup - results simply multiply with their compute scale.

    Deepseek didn’t have option A or B available; they only had the extreme optimisation option to work with.

    It’s weird that people present those two approaches as mutually exclusive ones.

  • PeterStuer a day ago

    It's not an either/or. Your hiring of talent is only limited by your GPU spend if you can't hire because you ran out of money.

    In reality pushing the frontier on datacenters will tend to attract the best talent, not turn them away.

    And in talent, it is the quality rather than the quantity that counts.

    A 10x breakthrough in algorithm will compound with a 10x scaleout in compute, not hinder it.

    I am a big fan of Deepseek, Meta and other open model groups. I also admire what the Grok team is doing, especially their astounding execution velocity.

    And it seems like Grok 2 is scheduled to be opened as promised.

    • smy20011 a day ago

      Not that simple. It could cause a resource curse [1] for developers. Why optimize algorithms when you have nearly infinite resources? For Deepseek, their constraints are one of the reasons they achieved a breakthrough. One of their contributions, FP8 training, was a way to train models with GPUs whose performance is limited due to export controls.

      [1]: https://www.investopedia.com/terms/r/resource-curse.asp#:~:t...

    • krainboltgreene a day ago

      Have fun hiring any talent after three years of advertising to students that all programming/data jobs are going to be obsolete.

  • SamPatt a day ago

    R1 came out when Grok 3's training was still ongoing. They shared their techniques freely, so you would expect the next round of models to incorporate as many of those techniques as possible. The bump you would get from the extra compute occurs in the next cycle.

    If Musk really can get 1 million GPUs and they incorporate some algorithmic improvements, it'll be exciting to see what comes out.

  • dogma1138 a day ago

    Deepseek didn’t seem to invest in talent as much as it did in smuggling restricted GPUs into China via 3rd countries.

    Also not for nothing scaling compute x100 or even x1000 is much easier than scaling talent by x10 or even x2 since you don’t need workers you need discovery.

    • tw1984 a day ago

      Talent is not something you can just freely pick up from your local Walmart.

  • oskarkk a day ago

    > While xAI's model isn't a substantial leap beyond Deepseek R1, it utilizes 100 times more compute.

    I'm not sure if it's close to 100x more. xAI had 100K Nvidia H100s, while this is what SemiAnalysis writes about DeepSeek:

    > We believe they have access to around 50,000 Hopper GPUs, which is not the same as 50,000 H100, as some have claimed. There are different variations of the H100 that Nvidia made in compliance to different regulations (H800, H20), with only the H20 being currently available to Chinese model providers today. Note that H800s have the same computational power as H100s, but lower network bandwidth.

    > We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore they have orders for many more H20’s, with Nvidia having produced over 1 million of the China specific GPU in the last 9 months. These GPUs are shared between High-Flyer and DeepSeek and geographically distributed to an extent. They are used for trading, inference, training, and research. For more specific detailed analysis, please refer to our Accelerator Model.

    > Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters. Similarly, all AI Labs and Hyperscalers have many more GPUs for various tasks including research and training than they commit to an individual training run due to centralization of resources being a challenge. X.AI is unique as an AI lab with all their GPUs in 1 location.

    https://semianalysis.com/2025/01/31/deepseek-debates/

    I don't know how much slower are these GPUs that they have, but if they have 50K of them, that doesn't sound like 100x less compute to me. Also, a company that has N GPUs and trains AI on them for 2 months can achieve the same results as a company that has 2N GPUs and trains for 1 month. So DeepSeek could spend a longer time training to offset the fact that have less GPUs than competitors.

    • cma a day ago

      Having 50K of them isn't the same thing as 50K in one high bandwidth cluster, right? x.AI has all theirs so far in one connected cluster, and all of homogenous H100s right?

  • wordofx a day ago

    Deepseek was a crypto mining operation before they pivoted to AI. They have an insane amount of GPUs laying around. So we have no idea how much compute they have compared to xAI.

    • oskarkk a day ago

      Do you have any sources for that? When I searched "DeepSeek crypto mining" the first result was your comment, the other results were just about the wide tech market selloff after DeepSeek appeared (that also affected crypto). As far as I know, they had many GPUs because their parent company was using AI algorithms for trading for many years.

      https://en.wikipedia.org/wiki/High-Flyer

      • wordofx 21 hours ago

        You know crypto mining is illegal in China right? Of course they avoid mentioning it. Discussion boards in China had ex employees mention doing crypto mining but it’s all been wiped.

    • miki123211 a day ago

      Crypto GPUs have nothing to do with AI GPUs.

      Crypto mining is an embarrassingly parallel problem, requiring little to no communication between GPUs. To a first approximation, in crypto, 10x-ing the amount of "cores" per GPU, 10x-ing the number of GPUs per rig and 10X-ing the number of rigs you own is basically equivalent. An infinite amount of extremely slow GPUs would do just as well as one infinitely fast GPU. This is why consumer GPUs are great for crypto.

      AI is the opposite. In AI, you need extremely fast communication between GPUs. This means getting as much memory per GPU as possible (to make communication less necessary), and putting all the GPUs all in one datacenter.

      Consumer GPUs, which were used for crypto, don't support the fast communication technologies needed for AI training, and they don't come in the 80gb memory versions that AI labs need. This is Nvidia's price differentiation strategy.

      • miohtama a day ago

        Any relevant crypto has not been mined on GPUs for a long time.

        But a point was made to make it less parallel. For example, Ethereum used a DAG, creating a requirement to hold more than 1 GB in memory, so raw GPU compute alone was not enough.

        https://ethereum.stackexchange.com/questions/1993/what-actua...

        Also, any such GPUs are now several generations old, so their FLOPS/watt is likely irrelevant.

  • _giorgio_ 14 hours ago

    Deepseek spent at least 1.5 billion on hardware.

rfoo a day ago

I'm pretty skeptical of that 75% on GPQA Diamond for a non-reasoning model. Hope that xAI can make Grok 3 API available next week so I can run it against some private evaluations to see if it's really this good.

Another nit-pick: I don't think DeepSeek had 50k Hopper GPUs. Maybe they have 50k now after getting the world's attention and having a state-sponsored grey market back them, but that 50k number is certainly dreamed up. During the past year DeepSeek's intern recruitment ads always just mentioned "unlimited access to 10k A100s", suggesting that they may have very limited H100/H800s, and most of their research ideas were validated on smaller models on an Ampere cluster. The 10k A100 number matches with a cluster their parent hedge fund company announced a few years ago. All in all my estimation is they had more (maybe 20k) A100s, and single-digit thousands of H800s.

  • kgwgk a day ago

    > my estimation is they had more (maybe 20k) A100s, and single-digit thousands of H800s.

    Their technical report on DeepSeek-V3 says that it "is trained on a cluster equipped with 2048 NVIDIA H800 GPUs." If they had even high-single-digit thousands of H800s they would have probably used more computing power instead of waiting almost two months.

  • riku_iki 21 hours ago

    > I'm pretty skeptical of that 75% on GPQA Diamond for a non-reasoning model.

    could that benchmark be simply leaked to training data as many others?

viraptor a day ago

This is a weird takeaway from the recent changes. Right now companies can scale because there's a stupid amount of stupid money flowing into the AI craze, but that's going to end. Companies are already discovering the issues with monetising those systems. Sure, they can "let go" and burn the available cash, but the investors will come knocking eventually. Since everyone figures out similar tech anyway, it's the people with the most tech-improvement experience that will be in the best position long term, while OpenAI will be stuck trying to squeeze adverts and monitoring into their chat for cash flow.

  • az226 a day ago

    Until we see progress slowing down, I don’t see venture capital disappearing in the race to ASI and beyond.

    • podgorniy a day ago

      Adjustment: not progress, the hype.

      People believe that LLM progress will become the foundation of future economic expansion the same way microelectronics did. But for now there are few signs of that economic benefit from AI/LLM stuff. If one does the math on what productivity increase the tech should give in order to have a positive ROI, one would be surprised how far reality is from making the investments feasible https://www.bruegel.org/working-paper/tension-between-explod.... Yes, anecdotally people tell stories of how they can code twice/thrice/ten times faster, or how they automated their whole workflow or replaced support with an LLM. But that's far from enough to make AI investment feasible in existing businesses (AI startups will flourish for a while on venture money). Also, anecdotally, there are many failed attempts to replace people with LLMs (like the McDonald's ordering system, which resulted in crazy orders).

      So what we have is hype on top of a belief in progress as a continuous phenomenon. But progress itself has slowed greatly. Where are all the breakthroughs which change our way of living? Pocket computers and consumer electronics (which is not a discovery rather an optimisation) and the internet (also more about scaling than inventing) were the last. 3D printing, cancer treatment, and robotics were thought to be the new factors. Till AI/LLMs. Now AIs/LLMs are the last resort for believers in progress and techno-optimists like Musk.

      • sgt101 a day ago

        >>Pocket computers and consumer electronics (which is not a discovery rather an optimisation)

        Can you really describe a process like EUV lithography as an optimisation? I mean, it requires matter to be controlled and manipulated in a way that would have been regarded as pure science fiction 20 years ago. Also the material science that provided Gallium Nitride electronics in our communication systems is rather amazing. There are other things as well - I have an electric car with a battery that lets me travel for hundreds of kilometers, if I trusted it, it could take me there without me operating it (much). I know where I am in London and where I am going because of satellites in geosynchronous orbits and calculations that use relativity. Last year I got an mRNA vaccine, that's pretty new... and pretty pervasive. I've seen rockets that fly up into space and then turn around and come back down to land, and I spend my days talking face to face with people on the other side of the world. I've never met half of them, but they greet me as an old friend.

        How is it that you can't see any of these wonders that have sprung into being in our life times?

      • NitpickLawyer a day ago

        > So what we have is hype on top of a belief in progress as a continuous phenomenon. But progress itself has slowed greatly.

        I think we can split the two.

        a) I don't think there's anyone seriously doubting there's a lot of hype going around. But that's to be expected. Trust not the trust me bros, and so on. I think we agree here.

        b) Progress hasn't slowed, and at the same time progress (in research) can stop tomorrow and we'd still have work to do for 20years, simply using what's on the table today. LLMs have solved "intent comprehension". RL seems to have solved "verifiable problems". The two can and will be used together to tackle "open ended questions". And together we'll see advancements in every aspect of our lives, on every repetitive mundane and boring task we have ever done.

        • viraptor a day ago

          > on every repetitive mundane and boring task we have ever done.

          A lot of them could be solved with pre-AI tools. Many were self-inflicted by people badly designing and approving existing processes, and I really don't see how AI is going to get us out of that. I've been paid to automate a few of these self-inflicted repetitive, mundane and boring tasks, and the companies that created them are not even interested in talking about fixing the underlying process. Some work the opposite way: in the US, value keeps being extracted from the tax-filing system, for example, even though we know how to remove the problem itself.

          It's weird to discuss how the AI will automate everything when we're actively fighting simplification right now.

nickfromseattle a day ago

Side question, let's say Grok is comparable in intelligence to other leading models. Will any serious business switch their default AI capabilities to Grok?

  • taf2 a day ago

    We might - I was testing it out on some Salesforce Apex code and it was doing a better job than o3 mini high at getting the job done for me…

  • marcuschong 20 hours ago

    I wouldn't, for two reasons:

    - We've already tested OpenAI's GPT and Gemini a lot. Although they're not deterministic, we have used them enough to trust them and know the pitfalls.

    - Elon's example of Grok outputting 'X is the only place for real information' makes the model almost completely useless for text generation. Even more so than DeepSeek.

  • resfirestar a day ago

    Grok 2's per-token prices are similar to GPT-4o, but since Grok tends to write longer responses than others, it can be significantly more expensive to use depending on the task. If xAI prices Grok 3 to compete with o1, not everyone is going to be lining up to use it even if it's a bit better than the competition. If that's how it goes, I'll be interested in the pricing for Grok 3 mini.
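
    To make the verbosity point concrete, here is a tiny sketch with illustrative prices (not actual published rates for either lab):

      # Same per-token price, different verbosity -> different effective cost.
      price_per_1k_output_tokens = 0.01   # USD; identical for both models here

      terse_tokens = 400      # typical response length of a terser model
      verbose_tokens = 900    # typical response length of a wordier model

      terse_cost = terse_tokens / 1000 * price_per_1k_output_tokens
      verbose_cost = verbose_tokens / 1000 * price_per_1k_output_tokens
      print(f"terse ${terse_cost:.4f} vs verbose ${verbose_cost:.4f} per response")
      # -> the verbose model costs ~2.25x per task despite equal per-token pricing.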

  • gusfoo a day ago

    > Side question, let's say Grok is comparable in intelligence to other leading models. Will any serious business switch their default AI capabilities to Grok?

    Yes, I'd say so. Bear in mind that, outside of the Terminally Online, very few people would deliberately hobble their business by choosing an inferior product.

  • ssijak a day ago

    Why not?

    • chank a day ago

      Because they already have something that works. Why switch if there's no advantage?

      • tucnak a day ago

        The API is compatible, and even if it weren't, it wouldn't matter anyway; everybody has been writing OpenAI API-compatible proxies, including Google, for months now. The only things that matter are availability, throughput, and cost per token (Google is ahead of everyone here: the Vertex API is insanely cheap for what it does, the Batch API gives a 50% discount, prompt caching 75%, it's fully multimodal, and it performs better on multilingual tasks, so it's actually useful outside the U.S., etc.).
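
        To illustrate how small the switching cost is, here is a minimal sketch using the OpenAI Python client; the non-OpenAI base URL and model names are placeholders, so check each provider's docs for real values:

          # The same client code pointed at different OpenAI-compatible endpoints.
          from openai import OpenAI

          providers = {
              "openai": ("https://api.openai.com/v1", "gpt-4o-mini"),
              "other_vendor": ("https://api.example.com/v1", "some-model"),  # placeholder
          }

          base_url, model = providers["openai"]  # swap the key to switch vendors
          client = OpenAI(base_url=base_url, api_key="YOUR_KEY")
          resp = client.chat.completions.create(
              model=model,
              messages=[{"role": "user", "content": "Say hello."}],
          )
          print(resp.choices[0].message.content)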

    • cowpig a day ago

      Because people want to protect their businesses?

      I don't even think it makes sense to use the closed, API-access models for core business functions. Seems like an existential risk to be handing over that kind of information to a third party in the age of AI.

      But handing it over to the guy who is simultaneously running for-profit companies that contract with the government and seizing control of the Treasury? To a guy who has a long history of SEC issues, and who was caught cheating at a video game just to convince everyone he's a genius? That just seems incredibly short-sighted.

      • dmix a day ago

        There are tons of businesses and startups building on closed models already.

        The risk is mitigated by competition in the space and the commoditization of LLMs over time. This isn't like building an app solely on Facebook's platform. It's very easy to swap models; there are platforms that let you use multiple models through the same interface, etc.

        • cowpig 4 hours ago

          None of those other companies are owned by a guy who is ransacking the federal government and firing people who are investigating his companies for fraud.

aqueueaqueue a day ago

How bitter is the bitter lesson when throwing more compute at the problem is costing billions? Maybe the bitter lesson is now more about money than about hardware: you are scaling up investment, not just relying on Moore's law. But I think there is a path for less power-hungry models that people can run affordably without VC money.

  • tankenmate a day ago

    Don't forget it isn't just money; there's a whole heap of infrastructure that already exists to make this cheaper (standards for electricity, production scale for said standards, etc.), as well as socio-economic infrastructure (rule of law, etc.). Imagine trying to build the same scale of infrastructure in a country with rampant corruption or in a war zone.

    In particular, at a strategic level, consider all the benefits of Mahan's "Influence of Sea Power"[0] and the advantage that nations with suitable sea power / access to global markets have in being able to build the necessary infrastructure at scale quickly enough to support such endeavours. And this isn't just about raw "Sea Power" but all the concomitant requirements to achieve it, in particular the "Number of population", "Character of the people", and "Character of the government".

    [0] Alfred Thayer Mahan; "The Influence of Sea Power on History"

    • ttoinou a day ago

      I agree with you, it’s never really about the money. By definition, money is the cheapest good in the economic equation

  • az226 a day ago

    The bitter lesson is all about more data/compute. Doesn’t matter if it costs more because it’s not as efficient, the scale is what matters at the end of the day. Sure you’d reach farther with both scale and optimization, but raw scale still takes the cake.

Rochus a day ago

Interesting, but I think the article’s argument for the "bitter lesson" relies on logical fallacies. First, it misrepresents critics of scaling as dismissing compute entirely, then frames scaling and optimization as mutually exclusive strategies (which creates a false dilemma), ignoring their synergy. E.g. DeepSeek’s algorithmic innovations under export constraints augmented - and not replaced - the scaling efforts. The article also overgeneralizes from limited cases, asserting that compute will dominate the "post-training era" while overlooking potential disruptors like efficient architectures. The CEO's statements are barely suited to support its claims. A balanced view aligning with the "bitter lesson" should recognize that scaling general methods (e.g. learning algorithms) inherently requires both compute and innovation.

user14159265 a day ago

It will be interesting to see how talent acquisition evolves. Many great engineers were put off by strong DEI-focused PR, and even more oppose the sudden opportunistic shift to the right. Will Muslims continue to want to work for Google? Will Europeans work for X? Some may have previously avoided close relations with China for ethical reasons—will the same soon apply to the US?

  • sigmoid10 a day ago

    Thanks to a particular culture in the US, those companies have a very strong argument against morals: money. If you can make $440k per year as an engineer at xAI, do you really care what your boss says in public? How many people sit in jobs right now getting paid much less and still have to endure similar shit from their execs?

    • giladvdn a day ago

      If you're an engineer capable of holding a $440k/year job then you should care where your talent goes and who benefits from it. There are plenty of places that will pay you a good salary where the boss isn't trying to badly play at world domination.

      • NitpickLawyer a day ago

        > There are plenty of places that will pay you a good salary where the boss isn't trying to badly play at world domination.

        Ah, yes. Google? Meta? Amazon? Microsoft? Hahaha. You're right, they aren't doing it in the open, and some certainly aren't doing a bad job about it. But they are all playing at world domination.

      • crimsoneer a day ago

        This isn't really true though... The US tech sector is where the big salaries are right now.

Amekedl a day ago

Another AI hype blog entry. Not even a mention of the differently colored bars on the benchmark results. For me, Grok 3 does not prove or disprove scaling laws in any meaningful way.

ArtTimeInvestor a day ago

It looks like the USA is bringing in-house all the technology needed to build AI.

TSMC has a factory in the USA now, ASML too. OpenAI, Google, xAI and Nvidia are natively in the USA.

Meanwhile, no other country is even close to building AI on its own.

Is the USA going to "own" the world by becoming the keeper of AI? Or is there an alternative future that has a probability > 0?

  • lompad a day ago

    You implicitly assume that LLMs are actually important enough to make a difference at the geopolitical level.

    So far, I haven't seen any indication that this is the case. And I'd say hyped-up speculation by people financially incentivized to hype AI should be taken with an entire mine full of salt.

    • ArtTimeInvestor a day ago

      First, it's not just about LLMs. It's not an LLM that replaced human drivers in Waymo cars.

      Second, how could AI not be the deciding geopolitical factor of the future? You expect progress to stop and AI not to achieve and surpass human intelligence?

      • lompad a day ago

        >First, it's not just about LLMs. It's not an LLM that replaced human drivers in Waymo cars.

        As far as I know, Waymo is still not even remotely able to operate in any kind of difficult environment, even though insane amounts of money have been poured into it. You are vastly overstating its capabilities.

        Is it cool tech? Sure. Is it safely going to replace all drivers? Doubt, very much so.

        Secondly, this only works if progress in AI does not stagnate. And, again, you have no grounds to actually make that claim. It's all built on the fanciful imagination that we're close to AGI. I disagree heavily and think it's much further away than people profiting financially from the hype tend to claim.

        • technocrat8080 a day ago

          Vastly overstating its capabilities? SF is ~crawling~ with them 24/7 and I've yet to meet someone who's had a bad experience in one of them. They operate more than well enough to replace rideshare drivers, and they have been.

          • dash2 a day ago

            But SF is a single US city built on a grid. Try London or Manila.

            • namaria a day ago

              That's usually how it goes with 'AI'. It is very impressive on the golden path, but the world is 80% edge cases.

            • rafaelmn a day ago

              With the nicest weather on the planet, probably.

          • Y-bar a day ago

            SF has pretty much the best weather there is to drive in. Try putting them on Minnesota winter roads, or muddy roads in Kansas for example.

            • fragmede a day ago

              How stupid of Google. Instead of getting their self driving car technology to work in a blizzard first, and then working on getting it working in a city, they choose to get it working in a city first, before getting it to work in inclement weather. What idiots!

              • Y-bar a day ago

                I hope you are being sarcastic! Because it is quite expected that they would test where it is easy first. The stupid ones are those who parrot the incorrect assumption that self-driving cars are comparable to humans at general driving, when statistics on general driving include a lot of driving in suboptimal conditions.

      • Eikon a day ago

        > You expect progress to stop and AI not to achieve and surpass human intelligence?

        A word generator is not intelligence. There’s no “thinking” involved here.

        To surpass human intelligence, you’d first need to actually develop intelligence, and llms will not be it.

        • willvarfar a day ago

          I get that LLMs are just doing probabilistic prediction, etc. It's all Hutter Prize stuff.

          But how are animals with nerve-centres or brains different? What do we think us humans do differently so we are not just very big probabilistic prediction systems?

          A completely different tack: if we develop the technology to engineer animal-style nerves and form them into big lumps called 'brains', in what way is that not artificial intelligence? And if we can do that, what is to stop that manufactured brain from being twice or ten times larger than a human's?

          • grumbel a day ago

            I don't think the probabilistic prediction is a problem. The problem with current LLM is that they are limited to doing "System 1" thinking, only giving you a fast instinctive response to a question. While that works great for a lot of small problems, it completely falls apart on any larger task that requires multiple steps or backtracking. "System 2" thinking is completely missing as is the ability to just self-iterate on their own output.

            Reasoning models are trying to address that now, but monologuing in token space still feels more like a hack than a real solution, though it does improve their performance a good bit nonetheless.

            In practical terms all this means is that current LLMs still need a hell of a lot of hand-holding and fail at anything more complex, even if their "System 1" thinking is good enough for the task (e.g. they can write Tetris in 30 seconds no problem, but they can't write Super Mario Bros at all, since that has numerous levels that would blow past the context window).

            • fragmede a day ago

              give it a filesystem, like you can with Claude's computer use, and you can have it make and forget memories to adapt to a limited context window
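
              Something like this toy sketch is the general idea (the function names and storage layout are invented for illustration; this is not Anthropic's actual tool API):

                # Toy filesystem-backed "memory" an agent could call as tools.
                from pathlib import Path

                MEM = Path("agent_memory")
                MEM.mkdir(exist_ok=True)

                def remember(key, text):
                    # Persist a note outside the context window.
                    (MEM / f"{key}.txt").write_text(text)

                def recall(key):
                    # Load a note back in only when it is needed.
                    p = MEM / f"{key}.txt"
                    return p.read_text() if p.exists() else None

                def forget(key):
                    # Drop a note that is no longer relevant.
                    (MEM / f"{key}.txt").unlink(missing_ok=True)

                remember("world_1_1", "pipes at x=450 and x=900, pit at x=1200")
                print(recall("world_1_1"))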

          • sampo a day ago

            > But how are animals with nerve-centres or brains different?

            In current LLM neural networks, the signal proceeds in one direction: from input, through the layers, to output. To the extent that LLMs have memory and feedback loops, it's that they write the output of the process to text, and then read that text and process it again through their unidirectional calculations.

            Animal brains have circular signals and feedback loops.

            There are Recurrent Neural Network (RNN) architectures, but current LLMs are not these.
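
            A toy numpy sketch of the difference (dimensions and weights are arbitrary; it only shows where the feedback loop sits):

              # Contrast a one-way feedforward pass with a recurrent step.
              import numpy as np

              rng = np.random.default_rng(0)
              W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))

              def feedforward(x):
                  # Signal flows input -> hidden -> output once; nothing is carried over.
                  return W2 @ np.tanh(W1 @ x)

              Wx, Wh = rng.normal(size=(8, 4)), rng.normal(size=(8, 8))

              def rnn_step(x, h):
                  # The hidden state h feeds back into the next step: a loop inside the network.
                  return np.tanh(Wx @ x + Wh @ h)

              h = np.zeros(8)
              for x in rng.normal(size=(5, 4)):   # a sequence of 5 inputs
                  h = rnn_step(x, h)
              print(feedforward(rng.normal(size=4)).shape, h.shape)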

          • dkjaudyeqooe a day ago

            Human (and other animal) brains probably are probabilistic, but we don't understand their structure or mechanism in fine enough detail to replicate them, or simulate them.

            People think LLMs are intelligent because intelligence is latent within the text they digest, process and regurgitate. Their performance reflects this trick.

          • Eikon a day ago

            > But how are animals with nerve-centres or brains different? What do we think us humans do differently so we are not just very big probabilistic prediction systems?

            If you believe in free will, then we are not.

          • habinero a day ago

            > But how are animals with nerve-centres or brains different? What do we think us humans do differently so we are not just very big probabilistic prediction systems?

            I see this statement thrown around a lot and I don't understand why. We don't process information like computers do. We don't learn like they do, either. We have huge portions of our brains dedicated to communication and problem solving. Clearly we're not stochastic parrots.

            > if we develop the technology to engineer animal-style nerves and form them into big lumps called 'brains'

            I think y'all vastly underestimate how complex and difficult a task this is.

            It's not even "draw a circle, draw the rest of the owl", it's "draw a circle, build the rest of the Dyson sphere".

            It's easy to _say_ it, it's easy to picture it, but actually doing it? We're basically at zero.

            • fragmede a day ago

              > Clearly we're not stochastic parrots

              On Internet comment sections, that's not clear to me. Memes are incredibly infectious, as we can see by looking at, say, a thread about Nvidia: it's inevitable that someone will ask about a moat. In a thread about LLMs, the likelihood of stochastic parrots getting a mention approaches one as the thread gets longer. What does it all mean?

              • staticman2 a day ago

                You seem to be confusing brain design with uniqueness.

                If every single human on earth were an identical clone with the same cultural upbringing and similar conversational choices, opinions and feelings, they still wouldn't work like an LLM and still wouldn't be stochastic parrots.

      • ozornin a day ago

        > how could AI not be the deciding geopolitical factor of the future?

        Easily. Natural resources, human talent, land and supply chains all are and will be more important factors than AI

        > You expect progress to stop

        no

        > and AI not to achieve and surpass human intelligence

        yes

    • tankenmate a day ago

      It's an economic benefit. It's not a panacea but it does make some tasks much cheaper.

      On the other hand if the economic benefit isn't shared across the whole of society it will become a destabilising factor and hence reduce the overall economic benefit it might have otherwise borne.

    • fnordsensei a day ago

      They seem popular enough that they could be leveraged to influence opinion and twist perception, as has been done with social media.

      Or, as is already happening, they can be used to influence opinion and twist perception within tools and services that people already use, such as social media.

      • krainboltgreene a day ago

        So has Kendrick Lamar's hit song, but no one is suggesting that it has geopolitical implications.

    • spacebanana7 a day ago

      The same stack is required for other AI stuff like diffusion models as well.

  • cgcrob a day ago

    I would expect it will be the market leader yes. But is there a market large enough to support the investment? That is debatable. If there isn’t then they will be in a deficit that is likely to do serious damage to the economy and investor confidence.

    Currently there is no hard ROI on LLMs, for example, other than forced bundling and using them as leverage for soft outcomes (layoffs) and for generating trash. User interest and revenue drop off fairly quickly. And there are regulations coming in elsewhere.

    It’s really not looking good.

  • OccamsMirror a day ago

    Are LLMs really going to own the world?

    • throw310822 a day ago

      Intelligence is everything. These things are intelligent- already superhuman in speed and a few limited domains, soon they're going to exceed humans in almost every respect. The advantage they give to the country that owns them is nuclear-weapons like.

      • staticman2 20 hours ago

        "The advantage they give to the country that owns them is nuclear-weapons like."

        I think the idea that the United States "owns" Grok 3 would be news to Musk and the idea it "owns" ChatGPT would be news to Altman.

      • habinero a day ago

        This is just flat out not true. They're not intelligent and not capable of becoming so. They aren't reliable, by design.

        They're a wildly overhyped solution in search of a problem.

        • throw310822 a day ago

          I don't understand this attitude and I am not sure where it comes from- either from generic skepticism, or from some sort of psychological refusal.(*) It's just obvious to me that you're completely wrong and you'll have a hard wake up, eventually.

          * "I know how this works and it's just numbers all the way down" is not an argument of any validity, just to be clear- everything eventually is just physics, blind mechanics.

          • Amekedl 9 hours ago

            Check out operations research.

            The amount of “work” done there is staggering, and yet adoption appears abysmal; using such solutions successfully only happens as part of a really “well oiled” machine.

            And what about the sheer difficulty of going from 99% to 99.9%? What percentage are we even talking about today? We don’t know, but very rich people think it is cool and blindly keep investing more billions.

        • Workaccount2 a day ago

          My non-tech company already uses LLMs where we used to contract software people (for 2 years now - no unresolvable issues). I myself also used LLMs to write an app which is now used by people on the production floor (I'm not a programmer and definitely don't know Kotlin).

          Maybe LLMs can't work on huge code bases yet, but for writing bespoke software for individuals who need a computer to do xyz but can't speak the language, it already is working wonders.

          Being dismissive of LLMs while sitting above their current scope of capabilities gives strong Microsoft-in-2007 vibes: "The iPhone is a laughable device that presents no threat to Windows Mobile."

          • riku_iki 20 hours ago

            > Maybe LLMs can't work on huge code bases yet

            It's also not just about code-base size, but about your expectations of output quality/correctness.

    • ben_w a day ago

      LLMs aren't the only kind of AI.

      Having hardware and software suppliers all together makes it more likely even if you assume (like I do) that we're at least one paradigm shift away from the right architecture, despite how impressively general Transformers have been.

      But software is easy to exfiltrate, so I think anyone with hardware alone can catch up extremely fast.

    • ArtTimeInvestor a day ago

      It looks like neural-network-based software is set to surpass humans in intelligence at every task in the foreseeable future.

      If one country moves along this direction faster than the others, no country will stand a chance to compete with them militarily or economically.

      • hagbarth a day ago

        How so? First of all, assuming ASI is developed, as it stands now, it will be owned by a private corporation, not a nation state.

        ASI also will not be magic. What exactly would it be doing that enables one country to subjugate the others? Develop new weapons? We already have the capability to destroy the Earth. Actually, come to think of it, if ASI is an existential threat to other nations, maybe the rational action would be to nuke whichever country develops it first. To save the world.

        You see what I am saying? There is such a thing as the real world with real constraints.

      • viraptor a day ago

        > no country will stand a chance to compete with them militarily or economically.

        It really depends on how they go about it. It can easily instead end up with lots of people without work, no social security and disillusioned with the country. Instead of being economically great, the country may end up fighting uprisings and sabotage.

      • rocmcd a day ago

        If this is true, then shouldn't we expect an economic "bump" from NN/LLMs/AI as they are today?

        I have not noticed companies or colleagues 10x'ing (hell, or even 1.5x'ing) their productivity from these tools. What am I missing?

        • mh- 13 hours ago

          There's an implicit assumption here that if a colleague did figure out how to (e.g.) 10x their output with new tools, the employer would capture all (e.g.) 10x of that increased productivity.

        • ArtTimeInvestor a day ago

          What do your colleagues do?

          I see people getting replaced by AI left and right.

          Translators, illustrators, voice over artists, data researchers, photographers, models, writers, personal assistants, drivers, programmers ...

  • losteric a day ago

    US has been reshoring hardware for a while, but that didn’t stop DeepSeek and certainly won’t prevent presently allied powers from building AIs.

    A big lesson seems to be that one can rapidly close the gap, with much less compute, once paths have been blazed by others. There’s a first-mover disadvantage.

    • ArtTimeInvestor a day ago

      DeepSeek has built their software on Nvidia hardware which needs ASML and TSMC hardware to be built.

      Even China has not managed to even remotely catch up with this hardware stack, even though the trail has been blazed by ASML, TSMC and Nvidia.

      • ZiiS a day ago

        The PRC considers Taiwan, and hence TSMC, to be part of China. Whilst it is easy to disagree with this politically, if push came to shove it would be much harder to disagree practically.

        • quesera 17 hours ago

          The common belief appears to be that PRC can successfully assimilate Taiwan, but not with an intact and operable semiconductor industry.

  • spacebanana7 a day ago

    > Is the USA going to "own" the world by becoming the keeper of AI?

    China has a realistic prospect of developing an independent stack.

    It'll be very difficult, especially at the level of developing good enough semiconductor fabs with EUV. However, they're not starting from scratch in terms of a domestic semiconductor industry. And their software development / AI research capabilities are already near par with the US.

    But they do have a whole of nation approach to this, and are willing to do whatever it takes.

  • ZiiS a day ago

    Even if you believe that all those companies are exclusively working towards the USA's aims, and ignore that the output of TSMC's and ASML's US factories is not yet even a rounding error on their production, do you seriously doubt that espionage still works?

  • wallaBBB a day ago

    What factories are TSMC and ASML operating in the US?

PaulHoule a day ago

Inference cost rules everything around me.

petesergeant a day ago

The author has come up with their own, _unusual_ definition of the bitter lesson. In fact, as a direct quote, the original is:

> Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.

eg: “the study of linguistics doesn’t help you build an LLM” or “you don’t need to know about chicken physiology to make a vision system that tells you how old a chicken is”

The author then uses a narrow and _unusual_ definition of what computation _means_, by saying it simply means access to fast chips, rather than the work you can perform on them, which would obviously include how efficiently you use them.

In short, this article misuses two terms to more simply say “looks like the scaling laws still work”.

s1mplicissimus a day ago

oh what a surprise, a new model performs better on bar charts than the old models. yawn

GaggiX a day ago

The bitter lesson is about the fact that general methods that leverage computation are ultimately the most effective. Grok 3 is not more general than DeepSeek or OpenAI models so mentioning the bitter lesson here doesn't make much sense, it's just the scaling law.

dubeye a day ago

I use chat gpt for general brain dumping

I've compared my last week's queries and prefer Grok 3

graycat a day ago

> Grok 3 performs at a level comparable to, and in some cases even exceeding, models from more mature labs like OpenAI, Google DeepMind, and Anthropic. It tops all categories in the LMSys arena and the reasoning version shows strong results—o3-level—in math,....

"Math"? Fields Medal level? Tenure? Ph.D.? ... high school plane geometry???

As in

'Grok 3 AI and Some Plane Geometry'

at

https://news.ycombinator.com/item?id=43113949

Grok 3 failed at a plane geometry exercise.

_giorgio_ a day ago

Grok is the best LLM on https://lmarena.ai/.

---

No benchmarks involved, just user preference.

  Rank* (UB)  Rank (StyleCtrl)  Model                                Arena Score  95% CI  Votes  Organization  License
  1           1                 chocolate (Early Grok-3)             1402         +7/-6    7829  xAI           Proprietary
  2           4                 Gemini-2.0-Flash-Thinking-Exp-01-21  1385         +5/-5   13336  Google        Proprietary
  2           2                 Gemini-2.0-Pro-Exp-02-05             1379         +5/-6   11197  Google        Proprietary

cowpig a day ago

I haven't seen Grok 3 on any benchmark leaderboard other than LM Arena. Has anyone else?

sylware a day ago

Is the next step ML-inference fusion, aka an artificial small brain?

readthenotes1 a day ago

I had to ask Grok 3 what the bitter lesson was. It gave a plausible answer (compute scale beats human cleverness)

vasco a day ago

That's not what "the exception that proves the rule" means.

  • huijzer a day ago

    In general, from a formal-logic perspective, the whole idea of “an exception that proves the rule” is flawed. If the statement were “an exception that disproves the rule”, then I would agree.

    • OccamsMirror a day ago

      "The exception that proves the rule" does not mean that an exception confirms a rule in a logical sense. Instead, it originates from legal and linguistic contexts where an explicit exception implies the existence of a general rule. E.g. a sign that says "No parking on Sundays" implies that the rule is that parking is fine on other days.

      • huijzer a day ago

        For years I didn't know. Finally. Thanks!

    • vasco a day ago

      It's only flawed because you are also using it wrong!