lgessler 2 days ago

I liked Yoav Goldberg's snarky quote tweet:

> next paper: transformers can solve any problem but on some of them they may compute indefinitely and never provide an answer

> (and you cannot tell in advance which is which!!)

https://twitter.com/yoavgo/status/1835802380589203802

  • teqsun a day ago

    It reminds me of Busy Beaver

lsy 2 days ago

Note that for the purposes of this paper a “problem” just means a formally decidable problem or a formal language, and the proof is that by creatively arranging transformers you can make individual transformer runs behave like individual Boolean circuits. However, this is a long way from any practical application of transformers: for one thing, most problems we care about are not stated as formal languages, and we already have a far more efficient way to implement Boolean circuits.

  • shawntan 2 days ago

    If a "problem we care about" is not stated as a formal language, does it mean it does not exist in the hierarchy of formal languages? Or is it just as yet unclassified?

    • tsimionescu 2 days ago

      It means that there are two problems: one, to formalize the problem as stated while capturing all relevant details, and two, solving the resulting formal problem. Until you solve problem one, you can't use formal methods to say anything about the problem (it's not even clear a priori that a given problem is solvable).

      Unfortunately, the task of formalizing an informal problem is itself an informal problem that we don't know how to formalize, so we can't say much about it. So overall, we can't say much about how hard the general problem "given a problem statement from a human, solve that problem" is, whether any particular system (including a human!) can solve it, or how long that might take and with what resources.

      • viraptor 2 days ago

        > task of formalizing an informal problem is itself an informal problem

        I couldn't find details about this - do you know of a paper or some resource which digs into that idea?

        • tsimionescu 2 days ago

          No, but it's pretty obvious, isn't it? If you have an informal problem statement, say "I want this button to be bigger", formalizing it can't be a formal process.

          • naasking a day ago

            > "I want this button to be bigger", formalizing it can't be a formal process.

                // grow until whoever judges the size says it's big enough
                while (!is_button_big_enough()) {
                    button.scaleUp(1.1);  // 10% larger each pass
                }
            
            This is one trivial way to do it, and it seems like it would be formalizable. is_button_big_enough is simply an input to whatever process is responsible for judging such a thing, whether that be a design specification or perhaps input from a person.
            • tsimionescu a day ago

              You've translated my informal problem statement into a quasi-formal process, using your inherent natural language processing skills, and your knowledge of general human concepts like size. But you haven't explained the formal process you followed to go from my problem statement to this pseudocode.

              And your pseudocode template only works for one particular kind of informal problem statement. If I instead have the problem "how much money do I need to buy this house and this chair?", or "does this byte fit in my mouth?", your general form will not work.

              And what's more, you haven't actually produced a formally solvable problem definition, that we could analyze for complexity and computability, because you rely on two completely unspecified functions. Where is the formal definition of a button? Is it a physical push button or a UI control or a clothing button? What does it mean that it is bigger or smaller? When do we know it's big enough, is that computable? And how do we scale it up? Do we increase its volume? Its surface area? One of its sides? Or maybe the radius? And how do we go about doing that? All of these, and many more, need to be explicitly defined in order to apply any kind of formal analysis to this problem. And there is no formal way to do so in a way that matches the intent of whoever posed the problem.

              • naasking a day ago

                > And what's more, you haven't actually produced a formally solvable problem definition, that we could analyze for complexity and computability, because you rely on two completely unspecified functions. Where is the formal definition of a button?

                Well, your statement was underspecified. You said "I want this button bigger". There are procedures to translate informal statements into formal ones, but one basic step is that underspecified referents are delegated to abstractions that encapsulate those details. So "this button" designates some kind of model of a button, and "I" refers to a subject outside the system, thereby implying some kind of interactive process to query the subject about whether the model is satisfactory, e.g. a dialog prompt asking "Is this button big enough now?"
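
                A minimal sketch of that arrangement in Python (the button model, its scale_up method, the 10% step and the prompt_user helper are all illustrative assumptions, not a real toolkit):

                    def prompt_user(question: str) -> bool:
                        # Hypothetical stand-in for a dialog box; here it just reads stdin.
                        return input(question + " [y/n] ").strip().lower().startswith("y")

                    def enlarge_until_satisfied(button, factor=1.1, max_rounds=20):
                        # "this button" -> an assumed model object with a scale_up() method;
                        # "I" -> the subject outside the system, queried interactively.
                        for _ in range(max_rounds):
                            if prompt_user("Is this button big enough now?"):
                                break
                            button.scale_up(factor)
                        return button  # stop after max_rounds rather than loop forever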

                You call these skills "inherent", but humans are not magical. We employ bug-riddled, poorly specified procedures for doing this kind of interpretive work, and LLMs have already started to do this too, and they'll only get better. Is asking a deterministic LLM to generate a formal specification or program to achieve some result a formal process? I don't think these lines are as clear as many think, not anymore.

                • tsimionescu 17 hours ago

                  I think we're mostly in agreement, actually. I'm not trying to claim that this is an unsolvable problem, just that it's a difficult problem that we don't have a solution for yet. And yes, LLMs are probably our best tool so far. And asking clarifying questions is clearly part of the process.

                  I will say that there is also a possibility the general form of the formal problem is in fact uncomputable. It seems possible to me it might be related to the halting problem. But, until we have a formal specification of it, we won't know, of course.

            • freejazz a day ago

              What is the repeatable method by which you came to that conclusion? That is what needs to be formalized for your response to make sense.

              • naasking a day ago

                There are procedures for translating informal statements to formal ones. If I submit such informal statements to an LLM and ask it to generate a spec or program to achieve some result, that can be made repeatable. There are various arrangements to make this more robust, like having another LLM generate test cases to check the work of the other. Does this qualify?

          • viraptor 2 days ago

            It's... "knee-jerk obvious". But is it actually true? People seem to be interested in the concept in formal logic arguments for example https://www.researchgate.net/publication/346658578_How_to_Fo... (which uses formal process for part of formalization), so maybe it's not as simple as it seems initially. I mean, if we're already talking about formal problems, it could use a stronger proof ;)

            • tsimionescu a day ago

              At best, this is a formal process for manipulating certain kinds of statements. But the general problem, "take a human's statement of a problem and translate it into a formal statement of a problem that, if solved, will address what the human was asking for" is far harder and more nebulous. Ultimately, it's exactly the problem that LLMs have been invented for, so it has been studied in that sense (and there is a broad literature in AI for NLP, algorithm finding, expert systems, etc). But no one would claim that they are even close to having a formal specification of this problem that they could analyze the complexity of.

          • moi2388 2 days ago

            Why not? Bigger is a measure of size and ought to be easy enough to formalise.

            Apply a transformation to B which increases its area and leaves the proportion of its sides equal.

            Why would this statement not be formalisable?

            • tsimionescu a day ago

              I'm not saying that the statement "I want this button to be bigger" can't be formalized. I'm saying that there is no formal process you can follow to get from this problem to a formal problem that is equivalent. There isn't even a formal process you can use to check if a formal definition is equivalent to this problem.

              Consider that if someone asked you solve this problem for them with just this statement, either of the following could be a sketch of a more formal statement of what they actually want:

              1. In a given web page, the css class used for a particular <button> element should be changed to make the button's height larger by 10%, without changing any other <button> element on the page, or any other dimension.

              2. For a particular piece of garment that you are given, the topmost button must be replaced with a different button that appears to have the same color and finish to a human eye, and that has the same 3D shape up to human observational precision, but that has a radius large enough to not slip through the opposing hole under certain forces that are commonly encountered, but not so large that it doesn't fit in the hole when pushed with certain forces that are comfortable for humans.

              I think you would agree that (a) someone who intended you to solve either of these problems might reasonably describe them with the statement I suggested, and (b), that it would be very hard to devise a formal mathematical process to go from that statement to exactly one of these statements.

              • moi2388 21 hours ago

                Ah, gotcha. I agree it would be difficult. I’m still not convinced it would be impossible though.

                LLMs could even formalise what you want in the context, even now.

                Or do you mean that you can’t formalise every statement when given incomplete information about the context of the statement, since then we have a single word pointing to multiple different contexts?

                • tsimionescu 20 hours ago

                  Oh yes, it's not impossible, I'm just saying we don't know how to do it yet. LLMs themselves are probably our best attempt so far.

            • Zhyl 2 days ago

              But here's the thing: it's not that the statement isn't formalisable, it's that the method you used to formalise it isn't formalisable.

            • qwertytyyuu 2 days ago

              Yeah, you could make it one pixel bigger, but if someone asked you that, is that what they actually want?

        • esjeon 2 days ago

          Ah, you are informally inquiring about a formal description concerning the informal nature of formalization of informal questions.

          Joke aside, this is about the nature of the formalization process itself. If the process of formalizing informal problems were fully formalized, it would be possible to algorithmically compute the solution and even optimize it mathematically. However, since this is obviously impossible (e.g. because of the vagueness of human language), it suggests that the formalization process can't be fully formalized.

    • wslh 2 days ago

      My 2 cents: Since LLMs (Large Language Models) operate as at least a subset of Turing machines (which recognize recursively enumerable languages), the chain-of-thought (CoT) approach could be equivalent to, or even more expressive than, that subset. In fact, CoT could well be a Turing machine.

      If we leave CoT aside for a moment, it's worth exploring the work discussed in the paper "Neural Networks and the Chomsky Hierarchy"[1], which analyzes how neural networks (including LLMs) map onto different levels of the Chomsky hierarchy, with a particular focus on their ability to recognize formal languages across varying complexity.

      [1] https://ar5iv.labs.arxiv.org/html/2207.02098v1

      • flir 2 days ago

        > In fact, CoT could well be a Turing machine.

        Are we going to need an infinite number of LLMs, arranged on a tape?

  • julienreszka 2 days ago

    > most problems we care about are not stated as formal languages

    then a way forward would be to translate them into a formal language

larodi 2 days ago

I'm waiting for the people of AI to discover syllogism and inference in the original Prolog sense, which this CoT abomination basically tries to achieve. Interestingly, if all logical content were translated into rules, and then only those rules were fed into the LLM training set, what would the result be? And could the probabilistic magic be made to actually follow reason, without all the dice?
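
For anyone who hasn't met it, inference in the Prolog sense is just repeated application of rules to known facts until nothing new can be derived. A minimal forward-chaining sketch in Python, with facts and rules invented purely for illustration:

    # "All men are mortal" becomes the rule ("man", "mortal"); facts are (predicate, subject).
    rules = [("man", "mortal")]
    facts = {("man", "socrates")}

    def forward_chain(facts, rules):
        # Apply every rule to every known fact until a fixpoint is reached.
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for premise, conclusion in rules:
                for pred, subj in list(derived):
                    if pred == premise and (conclusion, subj) not in derived:
                        derived.add((conclusion, subj))
                        changed = True
        return derived

    print(forward_chain(facts, rules))  # {('man', 'socrates'), ('mortal', 'socrates')}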

  • trescenzi 2 days ago

    Right we’ve now gotten to the stage of this AI cycle where we start using the new tool to solve problems old tools could solve. Saying a transformer can solve any Formally decidable problem if given enough tape isn’t saying much. It’s a cool proof, don’t mean to deny that, but it doesn’t mean much practically as we already have more efficient tools that can do the same.

    • marcosdumay 2 days ago

      What I don't get is... didn't people prove that in the 90s for any multi-layer neural network? Didn't people prove transformers are equivalent in the transformers paper?

  • sunir 2 days ago

    I was thinking about the graphrag paper and prolog. I’d like to extract predicates. The source material will be inconsistent and contradictory and incomplete.

    Using the clustering (community) model, an LLM can summarize the opinions as a set of predicates, which don't have to agree, along with some general weight of how much people agree or disagree with each.

    The predicates won't be suitable for symbolic logic because the language will be loose. However, an embedding model may be able to connect different symbols together.

    Then you could attempt multiple runs through the database of predicates because there will be different opinions.

    Then one could attempt to reason using these loosely stitched predicates. I don’t know how good the outcome would be.

    I imagine this would be better in an interactive decision making tool where a human is evaluating the suggestions for the next step.

    This could be better for planning than problem solving.
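
    A rough sketch of what the stitching step might look like, with the predicate extraction and the embedding model left as caller-supplied pieces; everything here is an assumption about the shape of such a pipeline, not a real API:

        from itertools import combinations

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = sum(x * x for x in a) ** 0.5
            nb = sum(x * x for x in b) ** 0.5
            return dot / (na * nb)

        def stitch_predicates(weighted_predicates, embed, threshold=0.8):
            # weighted_predicates: (predicate_text, agreement_weight) pairs, assumed to
            # come from an LLM summarising each community/cluster of opinions.
            # embed: caller-supplied embedding model, text -> vector.
            vectors = {p: embed(p) for p, _ in weighted_predicates}
            links = []
            for (p1, _), (p2, _) in combinations(weighted_predicates, 2):
                if cosine(vectors[p1], vectors[p2]) >= threshold:
                    links.append((p1, p2))  # treat the two loose wordings as one symbol
            return links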

  • pkoird a day ago

    I've said this before and I'll say it again: Any sufficiently advanced LLM is indistinguishable from Prolog.

  • detourdog 2 days ago

    I’m surprised that understanding how to be thought unfolds is being considered not relevant to the answer. I have done a lot of problem solving in groups and alone. How thoughts develop seems fundamental to understand the solutions.

    The story regarding the banning of terms that can be used with a reasoning system is a big red flag to me.

    This sort of knee-jerk reaction displays immature management and an immature technology product.

sigmoid10 2 days ago

>Remarkably, constant depth is sufficient.

How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.

  • nexustext 2 days ago

    It's 'remarkable' because (a) academic careers are as much about hype as science, (b) arXiv doesn't have a peer review process to quash this, (c) people take arXiv seriously.

  • logicchains 2 days ago

    > How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states

    Only with infinite precision, which is highly unrealistic. Under realistic assumptions, fixed-depth transformers without chain-of-thought are very limited in what they can express: https://arxiv.org/abs/2207.00729 . Chain of thought increases the class of problems that fixed-depth transformers can solve: https://arxiv.org/abs/2310.07923

  • IshKebab a day ago

    The universal approximation theorem has no practical relevance.

wodenokoto 2 days ago

But didn't we already know that NNs can solve any computable problem? The interesting thing is whether they can be trained to solve any (computable) problem.

  • imhoguy 2 days ago

    I don't know why I read that as "HN". Indeed, HN can solve any problem.

  • tossandthrow 2 days ago

    Feed forward NNs can approximate all functions f: X -> Y only for closed domains.

    But recurrent neural networks can solve any computational problem given enough precision.

    • roboboffin 2 days ago

      Does that mean that when we reduce the precision of an NN, for example using bfloat16 instead of float32, we reduce the set of computational problems that can be solved?

      How would that compare with a biological neural network with presumably near-infinite precision?

    • wodenokoto 2 days ago

      On the first day of an introduction to NNs we were asked to create all the logic gates using artificial neurons, and then told "If you have all the gates, you can do all computations".

      I've got to admit, I'm sort of taking that at face value, because I don't know enough computer science to a) discern if that is true and b) know what "f: X -> Y only for closed domains" means.
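
      That gate claim is easy to check for yourself: a single threshold neuron with hand-picked weights computes NAND, and NAND alone suffices to build any Boolean circuit. A minimal sketch, with the weights chosen by hand rather than learned:

          def neuron(inputs, weights, bias):
              # One artificial neuron with a step activation.
              return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

          def nand(a, b):
              # Fires unless both inputs are 1.
              return neuron([a, b], weights=[-1, -1], bias=1.5)

          for a in (0, 1):
              for b in (0, 1):
                  print(a, b, nand(a, b))  # 1, 1, 1, 0 -- the NAND truth table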

      • tossandthrow 2 days ago

        I think the easiest way to think about this is in terms of natural numbers, i.e. 1, 2, 3, 4.

        When you only have a fixed width, i.e. a static feed-forward network, you have an upper limit on the data you can represent and compute on.

        E.g. if the highest number you can represent is 1,000, then you will need a new NN if you want to do computations on 1,001.

        ... or use an inductive structure, like a recurrent neural network has.

nopinsight 2 days ago

In the words of an author:

"What is the performance limit when scaling LLM inference? Sky's the limit.

We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.

http://arxiv.org/abs/2402.12875 (ICLR 2024)"

https://x.com/denny_zhou/status/1835761801453306089

  • tsimionescu 2 days ago

    One question, if anyone knows the details: does this prove that there exists a single LLM that can approximate any function to arbitrary precision given enough CoT, or does it prove that for every function, there exists a Transformer that fits those criteria?

    That is, does this prove that a single LLM can solve any problem, or that for any problem, we can find an LLM that solves it?

    • jstanley 2 days ago

      Doesn't the latter imply the former?

      If it's possible to find an LLM for any given problem, then find an LLM for the problem "find an LLM for the problem and then evaluate it" and then evaluate it, and then you have an LLM that can solve any problem.

      It's the "Universal Turing Machine" for LLMs.

      I wonder what's the LLM equivalent of the halting problem?

      • progval 2 days ago

        > It's the "Universal Turing Machine" for LLMs.

        A closer analogy is the Hutter Search (http://hutter1.net/ai/pfastprg.pdf), as it is also an algorithm that can solve any problem. And, like the Hutter Search, it is probably too inefficient to use in practice.

      • detourdog 2 days ago

        In the late ‘80s they were called expert systems.

        Most demonstrations were regarding troubleshooting large systems, industrial processes, and education.

  • ec109685 2 days ago

    Is this the infinite monkey Shakespeare trope?

    • throwup238 2 days ago

      More like the universal approximation theorem extended to computation rather than network complexity: https://en.wikipedia.org/wiki/Universal_approximation_theore...

      • immibis 2 days ago

        The universal approximation theorem is good to know because it says there's no theoretical upper bound to a function-approximating NN's accuracy. In practice it says nothing about what can be realistically achieved, though.

    • nopinsight 2 days ago

      A key difference is that the way LMMs (Large Multimodal Models) generate output is far from random. These models can imitate or blend existing information, and can imitate (and probably blend) known reasoning methods in the training data. The latter is a key distinguishing feature of the new OpenAI o1 models.

      Thus, the signal-to-noise ratio of their output is generally way better than infinite monkeys.

      Arguably, humans rely on similar modes of "thinking" most of the time as well.

    • CamperBob2 2 days ago

      Yeah. Monkeys. Monkeys that write useful C and Python code that needs a bit less revision every time there's a model update.

      Can we just give the "stochastic parrot" and "monkeys with typewriters" schtick a rest? It made for novel commentary three or four years ago, but at this point, these posts themselves read like the work of parrots. They are no longer interesting, insightful, or (for that matter) true.

      • visarga 2 days ago

        If you think about it, humans necessarily use abstractions, from the edge detectors in the retina to concepts like democracy. But do we really understand? All abstractions leak, and nobody knows the whole stack. For all the poorly grasped abstractions we are using, we are also just parroting. How many times are we doing things because "that is how they are done", never wondering why?

        Take ML itself, people are saying it's little more than alchemy (stir the pile). Are we just parroting approaches that have worked in practice without real understanding? Is it possible to have centralized understanding, even in principle, or is all understanding distributed among us? My conclusion is that we have a patchwork of partial understanding, stitched together functionally by abstractions. When I go to the doctor, I don't study medicine first, I trust the doctor. Trust takes the place of genuine understanding.

        So humans, like AI, use distributed and functional understanding, we don't have genuine understanding as meant by philosophers like Searle in the Chinese Room. No single neuron in the brain understands anything, but together they do. Similarly, no single human understands genuinely, but society together manages to function. There is no homunculus, no centralized understander anywhere. We humans are also stochastic parrots of abstractions we don't really grok to the full extent.

        • throwaway290 2 days ago

          > My conclusion

          Are you saying you understood something? Was it genuine? Do you think LLM feels the same thing?

          • visarga 2 days ago

            Haha, "I doubt therefore I am^W don't understand"

          • exe34 2 days ago

            do llms feel?

            • throwaway290 2 days ago

              seems like that would be the implication, if yes

        • kaechle 2 days ago

          Great points. We're pattern-matching shortcut machines, without a doubt. In most contexts, not even good ones.

          > When I go to the doctor, I don't study medicine first, I trust the doctor. Trust takes the place of genuine understanding.

          The ultimate abstraction! Trust is highly irrational by definition. But we do it all day every day, lest we be classified as psychologically unfit for society. Which is to say, mental health is predicated on a not-insignificant amount of rationalizations and self-deceptions. Hallucinations, even.

      • kaechle 2 days ago

        Every time I read "stochastic parrot," my always-deterministic human brain surfaces this quote:

        > “Most people are other people. Their thoughts are someone else's opinions, their lives a mimicry, their passions a quotation.”

        - Oscar Wilde, a great ape with a pen

        • OKRainbowKid 2 days ago

          Reading this quote makes me wonder why I should believe that I am somehow special or different, and not just another "other".

          • HeatrayEnjoyer 2 days ago

            That's just it. We're not unique. We've always been animals running on instinct in reaction to our environment. Our instincts are more complex than other animals but they are not special and they are replicable.

      • ffsm8 2 days ago

        > novel commentary three or four years ago,

        ChatGPT was released in November 2022. That's one year and 10 months ago. Their marketing started in the summer of the same year, still far off from 3-4 years.

        • Banou 2 days ago

          But ChatGPT wasn't the first; OpenAI had a coding playground with GPT-2, and you could already code with it even before that, around 2020, so I'd say it has been 3-4 years

        • killerstorm 2 days ago

          GPT-3 paper announcement got 200 comments on HN back in 2020.

          It doesn't matter when marketing started, people were already discussing it in 2019-2020.

          Stochastic parrot: The term was coined by Emily M. Bender[2][3] in the 2021 artificial intelligence research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? " by Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell.[4]

          • ffsm8 2 days ago

            Confusing to read your comment. So the term was coined 3 yrs ago, but it's been 4 years out of date? Seems legit

              It could be that the term no longer applies, but there is no way you could honestly make that claim pre-GPT-4, and that's not 3-4 yrs ago

            • killerstorm a day ago

              Text generated by GPT-3 usually makes more sense than your comments.

              • ffsm8 a day ago

                Arw, did I hurt your feelings by pointing out how nonsensical you were?

                Poor boy <3

      • hegFdH 2 days ago

        The infinite monkey post was in response to this claim, which, like the universal approximation theorem, is useless in practice:

        "We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient."

        Like an LLM, you omit the context and browbeat people with the "truth" you want to propagate. Together with the many politically forbidden terms since 2020, let us now also ban "stochastic parrot" in order to have goodbellyfeel newspeak.

        • chaosist 2 days ago

          There is also the problem of "stochastic parrot" being constantly used in a pejorative sense, rather than as a neutral term for staying grounded and skeptical.

          Of course, it is an overly broad stroke that doesn't quite capture all the nuance of the model but the alternative of "come on guys, just admit the model is thinking" is much worse and has much less to do with reality.

      • 93po 2 days ago

        AI news article comments bingo card:

        * Tired ClosedAI joke

        * Claiming it's predictive text engine that isn't useful for anything

        * Safety regulations are either good or bad, depending on who's proposing them

        * Fear mongering about climate impact

        * Bringing up Elon for no reason

        * AI will never be able to [some pretty achievable task]

        * Tired arguments from pro-IP / copyright sympathizers

        • kmeisthax a day ago

          > Tired ClosedAI joke

          > Tired arguments from pro-IP / copyright sympathizers

          You forgot "Tired ClosedAI joke from anti-IP / copyleft sympathizers".

          Remember that the training data debate is orthogonal to the broader debate over copyright ownership and scope. The first people to start complaining about stolen training data were the Free Software people, who wanted a legal hook to compel OpenAI and GitHub to publish model weights sourced from GPL code. Freelance artists took that complaint and ran with it. And while this is technically an argument that rests on copyright for legitimacy, the people who actually own most of the copyrights - publishers - are strangely interested in these machines that steal vast amounts of their work.

        • larodi 2 days ago

          Interestingly, there's one missing which would be well appropriate, unless everyone here is a super-smart, math-professor-level genius:

          These papers become increasingly difficult to properly comprehend.

          …and thus, perhaps, the plethora of arguably nonsensical follow-ups.

          • CamperBob2 2 days ago

            These papers become increasingly difficult to properly comprehend.

            Feed it to ChatGPT and ask for an explanation suited to your current level of understanding (5-year-old, high-school, undergrad, comp-sci grad student, and so on.)

            No, really. Try it.

        • aurareturn 2 days ago

          >* Claiming it's predictive text engine that isn't useful for anything

          This one is very common on HN and it's baffling. Even if it's predictive text, who the hell cares, if it achieves its goals? If an LLM is actually a bunch of dolphins typing on a keyboard made for dolphins, I couldn't care less, as long as it does what I need it to do. For people who continue to repeat this on HN: why? I just want to know, out of curiosity.

          >* AI will never be able to [some pretty achievable task]

          Also very common on HN.

          You forgot the "AI will never be able to do what a human can do in the exact way a human does it so AI will never achieve x".

          • HarHarVeryFunny 2 days ago

            > Even if it's predictive text, who the hell cares if it achieves its goals?

            Haha ... well in the literal sense it does achieve "its" goals, since it only had one goal which was to minimize its training loss. Mission accomplished!

            OTOH, if you mean achieving the user's goals, then it rather depends on what those goals are. If the goal is to save you typing when coding, even if you need to check it all yourself anyway, then I guess mission accomplished there too!

            Whoopee! AGI done! Thank you, Dolphins!

          • peterhadlaw 2 days ago

            I think it's less about what it is and more about what it claims to be. "Artificial Intelligence"... it's not. Dolphin keyboard squad (DKS)? Then sure.

            The "just fancy autocomplete" line is said in response to that, as a criticism.

            • aurareturn 2 days ago

              What's wrong with the phrase "artificial intelligence"? To me, it doesn't imply that it's human-like. It's just human created intelligence to me.

              • danparsonson 2 days ago

                Partly because "artificial intelligence" is a loaded phrase which brings implications of AGI along for the ride, partly because "intelligence" is not a well defined term, so an artificial version of it could be argued to be almost anything, and partly because even if you lean on the colloquial understanding of what "intelligence" is, ChatGPT (and its friends) still isn't it. It's a Chinese Room - or a stochastic parrot.

                • aurareturn 2 days ago

                  Do people really associate AI with AGI?

                  Because we've been using "AI" to describe things many years before AGI became mainstream. Companies used to use "AI" to describe basic ML algorithms.

                  When I see "AI", I just think it's some sort of NL or ML. I never think it's AGI. AGI is AGI.

                • CamperBob2 2 days ago

                  It's a Chinese Room - or a stochastic parrot.

                  Show me a resident of a Chinese Room who can do this: https://chatgpt.com/share/66e83ff0-76b4-800b-b33b-910d267a75...

                  The Chinese Room metaphor was always beneath Searle's intellectual level of play, and it hasn't exactly gotten more insightful with age.

                  • danparsonson a day ago

                      I understand and agree that ChatGPT achieves impressive results, but your appeal to incredulity doesn't make it anything more than it is, I'm afraid.

                    • CamperBob2 a day ago

                      It's not incredulity, just pointing out the obvious. Searle placed very specific limitations on the operator of the Room. He rests his whole argument on the premise that the operator is illiterate in Chinese, or at least has no access to the semantics of the material stored in the Room. That's plainly not the case with ChatGPT, or it couldn't review its previous answers to find and fix its mistakes.

                      And you certainly would not get a different response, much less a better one, from the operator of a Chinese Room simply by adding "Think carefully step by step" to the request you hand him.

                      It's just a vacuous argument from square one, and it annoys me to an entirely-unreasonable extent every time someone brings it up. Add it to my "Stochastic Parrot" and "Infinite Monkeys" trigger phrases, I guess.

                      • danparsonson 12 hours ago

                        > ... He rests his whole argument on the premise that the operator is illiterate in Chinese, or at least has no access to the semantics of the material stored in the Room.

                        ...and yet outputs semantically correct responses.

                        > That's plainly not the case with ChatGPT, or it couldn't review its previous answers to find and fix its mistakes.

                        Which is another way of saying that ChatGPT couldn't produce semantically correct output without understanding the input. Disagreeing with that is the whole point of the Chinese Room argument.

                        Why cannot the semantic understanding be implicitly encoded in the model? That is, why cannot the program I (as the Chinese Room automaton) am following be of sufficient complexity that my output appears to be that of an intelligent being with semantic understanding and the ability to review my answers? That, in my understanding, is where the genius of ChatGPT lies - it's a masterpiece of preprocessing and information encoding. I don't think it needs to be anything else to achieve the results it achieves.

                        A different example of this is the work of Yusuke Endoh, whom you may know for his famous quines. https://esoteric.codes/blog/the-128-language-quine-relay is to me one of the most astonishing feats of software engineering I've ever seen, and little short of magic - but at its heart it's 'just' very clever encoding. Each program in the chain understands nothing and yet encodes every subsequent program, including itself. Another example is DNA; how on Earth does a dumb molecule create a body plan? I'm sure there are lots of examples of systems that exhibit such apparently intelligent and subtly discriminative behaviour entirely automatically. Ant colonies!

                        > And you certainly would not get a different response, much less a better one, from the operator of a Chinese Room simply by adding "Think carefully step by step" to the request you hand him.

                        Again, why not? It has access to everything that has gone before; the next token is f(all the previous ones). As for asking it to "think carefully", would you feel differently if the magic phrase was "octopus lemon wheat door handle"? Because it doesn't matter what the words mean to a human - it's just responding to the symbols it's been fed; the fact that you type something meaningful to you just obscures that fact and lends subconscious credence to the idea that it understands you.

                        > It's just a vacuous argument from square one, and it annoys me to an entirely-unreasonable extent every time someone brings it up. Add it to my "Stochastic Parrot" and "Infinite Monkeys" trigger phrases, I guess.

                        With no intent to annoy, I hope you at least understand where I'm coming from, and why I think those labels are not just apt, but useful ways to dispel the magical thinking that some (not you specifically) exhibit when discussing these things. We're engineers and scientists and although it's fine to dream, I think it's also fine to continue trying to shoot down the balloons that we send up, so we're not blinded by the miracle of flight.

                        • CamperBob2 10 hours ago

                          Why cannot the semantic understanding be implicitly encoded in the model?

                          That just turns the question into "OK, so what distinguishes the model from a machine capable of genuine understanding and reasoning, then?"

                          At some point you (and Searle) must explain what the difference is in engineering terms, not through analogy or by appeals to ensoulment or by redecorating the Chinese Room with furnishings it wasn't originally equipped with. Having moved the goalpost back to the far corner of the parking garage already, what's your next move?

                          It's easy to dismiss a "stochastic parrot" by saying that "The next token is a function of all of the previous ones," but welcome to our deterministic universe, I guess... deterministic, that is, apart from the randomness imparted by SGD or thermal noise or what-have-you. Again, how is this different from what human brains do? Von Neumann himself naturally assumed that stored-program machines would be modeled on networks of neuron-like structures (a factoid I just ran across while reading about McCulloch and Pitts), so it's not that surprising that we're finally catching up to his way of looking at it.

                          At the end of the day we're all just bags of meat trying to minimize our own loss functions. There's nothing special about what we're doing. The magical thinking you're referring to is being done by those who claim "AI isn't doing X" or "AI will never do X" without bothering to define X clearly.

                          I don't think it needs to be anything else to achieve the results it achieves.

                          Exactly, and that's earth-shaking because of the potential it has to illuminate the connection between brains and minds. It's sad that the discussion inevitably devolves into analogies to monkeys and parrots.

  • shawntan 2 days ago

    Theoretical results exist that try to quantify the number of CoT tokens needed to reach different levels of computational expressibility: https://arxiv.org/pdf/2310.07923

    TL;DR: Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size. For a field that constantly harps on parallelism and compute efficiency, this requirement seems prohibitive.

    We really need to get away from constant depth architectures.

    • benkuykendall 2 days ago

      > Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.

      So, as stated, this is impossible since it violates the Time Hierarchy Theorem.

        The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens. Which is... not a particularly impressive bound? Any non-trivial fixed neural network can, for instance, compute the NAND of two inputs. And any polynomial-time computable function can be computed with a polynomial number of NAND gates.

      • shawntan 2 days ago

        > The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens.

        You're right.

        Re: NAND of two inputs. Isn't this doable even by a single layer (no hidden layers) neural network?

        Re: Polynomial computable function. I'm assuming this makes no assumption of constant-depth.

        Because my entire point was that the result of this paper is not actually impressive AND covered by a previous paper. Hopefully that's clearer.

  • __loam 2 days ago

    > We have mathematically proven that transformers can solve any problem

    We should require that you've passed an algorithms and a thermodynamics class before you can post.

    • nopinsight 2 days ago

      To be clear, I think the tweet is a bit exaggerated (and the word 'performance' there doesn't take efficiency into account, for example), but I don't have the time to read the full paper (I just skimmed the abstract and conclusion). I quoted the author's tweet for people to discuss, since it's still a fairly remarkable result.

    • bonoboTP 2 days ago

      This is an accepted ICLR paper by authors from Stanford, Toyota and Google. That's not a guarantee for anything, of course, but they likely know basic algorithms and the second law. You can certainly argue against their claims, but you need to put in the legwork.

      • __loam a day ago

        I don't think I should need to argue with the absurd claim that these can solve any problem.

  • DarkNova6 2 days ago

    The more interesting question is whether the ability to reason and solve problems scales linearly or logarithmically.

  • candiddevmike 2 days ago

    > We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

    That seems like a bit of a leap here to make this seem more impressive than it is (IMO). You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

    Wake me up when an LLM figures out stable fusion or room temperature superconductors.

    • krackers 2 days ago

      I think you're misrepresenting the study. It builds upon previous work that examines the computation power of the transformer architecture from a circuit-complexity perspective. Previous work showed that the class of problems that a "naive" Transformer architecture could compute was within TC0 [1, 2] and as a consequence it was fundamentally impossible for transformers to solve certain classes of mathematical problems. This study actually provides a more realistic bound of AC0 (by analyzing the finite-precision case) which rules out even more problems, including such 'simple' ones as modular parity.

      We also had previous work that hinted that part of the reason why chain-of-thought works from a theoretical perspective is that it literally allows the model to perform types of computations it could not under the more limited setting (in the same way jumping from FSMs to pushdown automata allows you to solve new types of problems) [3].

      [1] https://news.ycombinator.com/item?id=35609652 [2] https://blog.computationalcomplexity.org/2023/02/why-cant-li... [3] https://arxiv.org/abs/2305.15408

      • shawntan 2 days ago

        Generally, literature on the computational power of the SAME neural architecture can differ in its conclusions based on its premises. Assuming finite precision will give a more restrictive result, and assuming arbitrary precision can give you Turing completeness.

        From a quick skim this seems like it's making finite precision assumptions? Which doesn't actually tighten previous bounds, it just makes different starting assumptions.

        Am author of [1].

        • krackers 2 days ago

          Ah my bad, great catch! I've updated my comment accordingly.

          • shawntan 2 days ago

            You can't really be blamed though, the language in the paper does seem to state what you originally said. Might be a matter of taste but I don't think it's quite accurate.

            The prior work they referenced actually did account for finite precision cases and why they didn't think it was useful to prove the result with those premises.

            In this work they simply argued from their own perspective why finite precision made more sense.

            The whole sub-field is kinda messy and I get quoted differing results all the time.

            Edit: Also, your original point stands, obviously. Sorry for nitpicking on your post, but I also just thought people should know more about the nuances of this stuff.

    • Horffupolde 2 days ago

      It is actually impressive.

      One could argue that writing enabled chain of thought across generations.

    • Veedrac 2 days ago

      > Wake me up when an LLM figures out stable fusion or room temperature superconductors.

      Man, the goalposts these days.

      • FeepingCreature 2 days ago

        "I love [goalposts]. I love the whooshing noise they make as they go by." --Douglas Adams, slightly adjusted

      • WalterSear 2 days ago

        Shh!! It's working! It's working!

    • whimsicalism 2 days ago

      it's a TCS result.

      seems like many commenting don't know about computability

    • WalterSear 2 days ago

      > You can say the same thing about humans

      1. Holy shit.

      2. You can't apply Moore's law to humans.

      • Tostino 2 days ago

        You can't apply it to chips any more either.

        Density has continued to increase, but so have prices. The 'law' was tied to the price-to-density ratio, and it's been almost a decade now since it died.

      • gryn 2 days ago

        > 2. You can't apply Moore's law to humans.

        not with that attitude. /s

        if you take reproduction into account and ignore all the related externalities you can definitely double your count of transistors (humans) every two years.

    • aurareturn 2 days ago

      > You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

      Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

      Say humans can solve quantum gravity in 100 years of thinking by 10,000 really smart people. If one AGI is equal to one really smart person, then scale up enough compute for 1 million AGIs and we can solve quantum gravity in a year.

      The major assumption here is that transformers can indeed solve every problem humans can.

      • wizzwizz4 2 days ago

        > Isn’t this a good thing since compute and be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

        But it can't. There isn't enough planet.

        > The major assumption here is that transformers can indeed solve every problem humans can.

        No, the major assumptions are (a) that ChatGPT can, and (b) that we can reduce the resource requirements by many orders of magnitude. The former assumption is highly-dubious, and the latter is plainly false.

        Transformers are capable of representing any algorithm, if they're allowed to be large enough and run large enough. That doesn't give them any special algorithm-finding ability, and finding the correct algorithms is the hard part of the problem!

        • aurareturn 2 days ago

          > But it can't. There isn't enough planet.

          How much resource are you assuming an AGI would consume?

          • wizzwizz4 2 days ago

            Are we talking about "an AGI", or are we talking about overfitting large transformer models with human-written corpora and scaling up the result?

            "An AGI"? I have no idea what that algorithm might look like. I do know that we can cover the majority of cases with not too much effort, so it all depends on the characteristics of that long tail.

            ChatGPT-like transformer models? We know what that looks like, despite the AI companies creatively misrepresenting the resource use (ref: https://www.bnnbloomberg.ca/business/technology/2024/08/21/h...). Look at https://arxiv.org/pdf/2404.06405:

            > Combining Wu’s method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems by just using a CPU-only laptop with a time limit of 5 minutes per problem.

            AlphaGeometry had an entire supercomputer cluster, and dozens of hours. GOFAI approaches have a laptop and five minutes. Scale that inconceivable inefficiency up to AGI, and the total power output of the sun may not be enough.

            • aurareturn 2 days ago

              When computers first became useful, you needed computers the size of rooms to compute. In 2024, my earphones have more compute.

              • krige 2 days ago

                It's always a hindsight declaration though. Currently we can only say that Intel has reused the same architecture several times already and is cranking up the voltage until it breaks, because they seem yet to find the next design leap, while AMD has been toying around with 3D placement but its latest design is woefully unimpressive. We do not know when the next compute leap will happen until it happens.

      • visarga 2 days ago

        > Scale enough compute for 1 million AGI and we can solve quantum gravity in a year.

        That is wrong; it misses the point. We learn from the environment, we don't secrete quantum gravity from our pure brains. It's an RL setting of exploration and exploitation, a search process in the space of ideas based on validation in reality. An LLM alone is like a human locked away in a cell, with no access to test ideas.

        If you take child Einstein and put him on a remote island, and come back 30 years later, do you think he would impress you with his deep insights? It's not the brain alone that made Einstein so smart. His environment also made a major contribution.

        • aurareturn 2 days ago

          Assumption is that the AGI can solve any problem humans can - including learning from the environment if that is what is needed.

          But I think you're missing the point of my post. I don't want to devolve this topic into yet another argument centered around "but AI can't be AGI or can't do what humans can do because so and so".

          • visarga 2 days ago

            I often see this misconception that compute alone will lead us to surpass human level. No doubt it is inspired by the "scaling laws" we heard so much about. People forget that imitation is not sufficient to surpass human level.

        • exe34 2 days ago

          if you told child Einstein that light travels at a constant speed in all inertial frames and taught him algebra, then yes, he would come up with special relativity.

          in general, an AGI might want to perform experiments to guide its exploration, but it's possible that the hypotheses that it would want to check have already been probed/constrained sufficiently. which is to say, a theoretical physicist might still stumble upon the right theory without further experiments.

          • westurner a day ago

            Labeling observations with something better than a list of column-label strings at the top would make it possible to mine for insights in, or produce, a universal theory that covers what has been observed instead of the presumed limits of theory.

            CSVW is CSV on the Web as Linked Data.

            With 7 metadata header rows at the top, a CSV could be converted to CSVW; with URIs for units like metre or meter or feet.

            If a ScholarlyArticle publisher does not indicate that a given CSV or better :Dataset that is :partOf an article is a :premiseTo the presented argument, a human grad student or an LLM needs to identify the links or textual citations to the dataset CSV(s).

            Easy: Identify all of the pandas.read_csv() calls in a notebook,

            Expensive: Find the citation in a PDF, search for the text in "quotation marks" and try and guess which search result contains the dataset premise to an article;

            Or, identify each premise in the article, pull the primary datasets, and run an unbiased automl report to identify linear and nonlinear variance relations and test the data dredged causal chart before or after manually reading an abstract.
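
            The "easy" case really is mechanical. A minimal sketch that walks a notebook's JSON and pulls out read_csv arguments; it assumes the standard .ipynb layout and only catches simple literal pd.read_csv(...) calls:

                import json
                import re

                def find_read_csv_calls(notebook_path):
                    # .ipynb files are JSON; code lives in cells with cell_type == "code".
                    with open(notebook_path) as f:
                        nb = json.load(f)
                    calls = []
                    for cell in nb.get("cells", []):
                        if cell.get("cell_type") == "code":
                            source = "".join(cell.get("source", []))
                            calls += re.findall(r"(?:pd|pandas)\.read_csv\(\s*([^)]*)\)", source)
                    return calls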

  • riku_iki 2 days ago

    > Remarkably, constant depth is sufficient.

    I think the article also says log(n) embedding size (width?) is required, where n is the size of the input.

  • m3kw9 2 days ago

    Sort of like a quantum superposition state? So here is an idea: use quantum computing to produce all possible inferences, and some not-yet-invented algorithm to collapse them to the final result.

  • tooltower 2 days ago

    Constant depth circuits can solve everything? I feel like I missed some important part of circuit complexity. Or this is BS.

    • shawntan 2 days ago

      Using CoT implicitly increases the depth of the circuit. But yes, poorly worded.

JSDevOps 2 days ago

So, given infinite time and resources, it can solve any problem? Hardly groundbreaking, is it?

HarHarVeryFunny 2 days ago

Sure, in the same sense as an infinitely long tape lets a Turing machine solve arbitrary problems. In theory, at least. If one had the right program.

  • falcor84 2 days ago

    It's not clear to me what you're saying; isn't the whole deal here that by performing RL on the CoT (given sufficient size and compute) it would converge to the right program?

    • HarHarVeryFunny 2 days ago

      I was really saying two things:

      1) The theoretical notion that a fixed-depth transformer + CoT can solve arbitrary problems involving sequential computation is rather like similar theoretical notions of a Turing machine as a universal computer, or of an ANN with a hidden layer being able to represent arbitrary functions... it may be true, but at the same time not useful.

      2) The Turing machine, just like the LLM + CoT, is only as useful as the program it is running. If the LLM + CoT is incapable of runtime learning and is just trying to mimic some reasoning heuristics, then that is going to limit its function, even if theoretically such an "architecture" could do more if only it were running a universal AGI program.

      Using RL to encourage the LLM to predict continuations according to some set of reasoning heuristics is what it is. It's not going to make the model follow any specific reasoning logic, but is presumably hoped to generate a variety of continuations that the CoT "search" will be able to utilize to arrive at a better response than it otherwise would have done. More of an incremental improvement (as reflected in the benchmark scores it achieves) than "converging to the right program".

    • __loam a day ago

      Sometimes reading hackernews makes me want to slam my head on the table repeatedly. "Given sufficient size and compute" is one of the most load-bearing phrases I've ever seen.

      • falcor84 a day ago

        But it is load bearing. I mean, I personally can't stop being amazed at how with each year that passes, things that were unimaginable with all the world's technology a decade ago are becoming straightforward to run on a reasonably priced laptop. And at this stage, I wouldn't bet even $100 against any particular computational problem being solved in some FAANG datacenter by the end of the decade.

        • HarHarVeryFunny a day ago

          That's an apples and oranges comparison.

          Technology advances, but it doesn't invent itself.

          CPUs didn't magically get faster by people scaling them up - they got faster by evolving the design to support things like multi-level caches, out-of-order execution and branch prediction.

          Perhaps time fixes everything, but scale alone does not. It'll take time for people to design new ANN architectures capable of supporting AGI.

        • __loam a day ago

          There's unimaginable and there's physically and mathematically impossible.

          • falcor84 a day ago

            Agreed - but would you wager a bet on what in TFA (or the related discussion) is physically/mathematically impossible?

mg 2 days ago

Has it been publicly benchmarked yet whether this approach:

    Hello LLM, please solve this task: <task>
can be improved by performing this afterwards?

    for iteration in range(10):
        Hello LLM, please solve this task: <task>
        Here is a possible solution: <last_reply>
        Please look at it and see if you can improve it.
        Then tell me your improved solution.
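
For anyone who wants to benchmark it, a minimal harness for that loop is sketched below; ask_llm is a hypothetical stand-in for whatever chat-completion call you use, not a real API:

    def ask_llm(prompt: str) -> str:
        # Hypothetical: wrap your favourite chat-completion API here.
        raise NotImplementedError

    def iterative_solve(task: str, rounds: int = 10) -> str:
        reply = ask_llm(f"Hello LLM, please solve this task: {task}")
        for _ in range(rounds):
            reply = ask_llm(
                f"Hello LLM, please solve this task: {task}\n"
                f"Here is a possible solution: {reply}\n"
                "Please look at it and see if you can improve it. "
                "Then tell me your improved solution."
            )
        return reply
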
  • lorepieri 18 hours ago

    Not sure if it has been benchmarked, but I've called this technique the "for-loop of thought". :)

  • Kiro 2 days ago

    Isn't that the whole reason that o1 works?

    • ben_w 2 days ago

      I think o1 is more like "pretend you're doing a job interview, think step by step and show your working".

      I tried something similar to the suggested iterative loop on a blog post I'd authored but wanted help copy-editing; the first few iterations were good enough, but then it got very confused and decided the blog post wasn't actually a blog post to be edited, and that what I really wanted to know was the implications of Florida something something Republican Party.

      Benchmark would be neat, because all I have is an anecdote.

tossandthrow 2 days ago

> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

This is also the case with plain and regular RNNs

  • baq 2 days ago

    Now just need an autoregressive transformer <==> RNN isomorphism paper and we're golden

    • logicchains 2 days ago

      Plain RNNs are theoretically weaker than transformers with CoT: https://arxiv.org/abs/2402.18510 .

      • tossandthrow 2 days ago

        The paper says transformers perform better than RNNs, which is not surprising.

        However, they are both, theoretically, Turing complete computers. So they are equally expressive.

  • Intralexical 2 days ago

    Isn't it also expected to be the case with RNGs?

seydor 2 days ago

'can'

But will they? I believe the frontier has moved to making them make sense instead of just making infinite language.

The infinite monkey problem is not solved yet

cpldcpu 2 days ago

Can any of these tools do anything that GitHub Copilot cannot do (apart from using other models)? I tried Continue.dev and cursor.ai, but it was not immediately obvious to me. Maybe I am missing something workflow-specific?

floppiplopp 2 days ago

They have also mathematically proven that transformers are great randomness generators.

glial 2 days ago

Apologies if this is a dumb question, but aren't all computations inherently serial? In that a Turing machine performs operations serially?

  • joe_the_user 2 days ago

    Aren't all computations inherently serial?

    No. "inherently serial" refers to problems that are specified serially and can't be spend up by parallel processing. The sum of a set of N numbers is an example of a problem that is not inherently serial. You can use parallel reduction to perform the computation in O(log(N)) time on an idealized parallel computer but it takes O(N) time on an idealized serial computer.

    And, it turns, exactly which problems are really are inherently serial is somewhat challenging problem.
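
    A minimal sketch of that O(log(N)) structure: each pass sums disjoint pairs, and the pair sums within a pass are independent, so on an idealized parallel machine each pass costs one step (written serially here just to make the tree shape visible):

        def tree_sum(xs):
            # Pairwise reduction: ceil(log2(N)) passes instead of N-1 serial additions.
            xs = list(xs)
            while len(xs) > 1:
                next_level = [a + b for a, b in zip(xs[0::2], xs[1::2])]
                if len(xs) % 2:              # odd element carries over to the next pass
                    next_level.append(xs[-1])
                xs = next_level
            return xs[0] if xs else 0

        print(tree_sum(range(1, 9)))  # 36, reached in 3 passes rather than 7 serial adds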

    • visarga 2 days ago

      > The sum of a set of N numbers is an example of a problem that is not inherently serial.

      But addition with floats (not reals) is non-associative.
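
      Easy to check at any Python prompt:

          a, b, c = 0.1, 0.2, 0.3
          print((a + b) + c)                 # 0.6000000000000001
          print(a + (b + c))                 # 0.6
          print((a + b) + c == a + (b + c))  # False: float addition is not associative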

      • immibis 2 days ago

        They didn't say floats, and the sum of a set of floats is not uniquely defined as a float for the reason you stated, at least not without specifying a rounding mode. Most people use "round to whatever my naïve code happens to do", which has many correct answers. Adding up a set of floats with only the usual 0.5 ULP imprecision, yes, isn't trivial.

  • tromp 2 days ago

    Turing Machines are just one of many computational models. Others offer more parallelism. Two examples:

    In lambda calculus, disjoint redexes can be reduced in parallel.

    And in interaction nets, all active pairs can be reduced in parallel [1].

    [1] https://en.wikipedia.org/wiki/Interaction_nets

  • ants_everywhere 2 days ago

    You can model parallel computation by an arbitrary finite product of Turing machines. And then, yes, you can simulate that product on a single Turing machine. I think that's the sort of thing you have in mind?

    But I'm not aware of what "inherently serial" means. The right idea likely involves talking about complexity classes. E.g. how efficiently does a single Turing machine simulate a product of Turing machines? An inherently serial computation would then be something like a problem where the simulation is significantly slower than running the machines in parallel.

  • ninetyninenine 2 days ago

    Yeah, it's talking about a new feature for LLMs where the output of an LLM is fed back in as input, again and again, and this produces far more accurate output.

tonii141 2 days ago

A random generator of tokens can also solve any problem if you give it enough time and memory.

qmatch 2 days ago

Is this similar to the Universal Approximation Theorem?

scotty79 2 days ago

Chain of thought GPT is sort of a Turing machine with a tape that it's allowed to write to for purposes other than outputting the answer.

empath75 2 days ago

Is this more general than LLMs? Is it possible to do something Chain-of-Thought-like in a transformer model that _isn't_ trained on language?

CarRamrod 2 days ago

Damn, we just used our entire Round A acquiring an infinite amount of bananas and typewriter ink. The boss is not going to like this.

  • nopinsight 2 days ago

    No worries! With the magic bananas and ink you've acquired, those monkeys will surely produce output with a signal-to-noise ratio rivaling the best LLMs.

    I’m sure your startup will achieve the coveted Apeicorn status soon!

  • dotancohen 2 days ago

    Naturally.

    It's the printer ink that is prohibitively expensive. And the bananas are carbon neutral.

  • imjonse 2 days ago

    Hopefully not Cavendish, as those are too sugary for monkeys and you'll just get hallucinations.

  • bryanrasmussen 2 days ago

    did you get both infinite bananas and infinite typewriter ink, or was there a limited supply of typewriter ink? If the first, it was worth it.

theshrike79 2 days ago

Are we getting to a point where the LLM will just answer "42" and we need to figure out the question? =)

bottlepalm 2 days ago

Forget UBI, we're going to need Universal Basic Compute.