Show HN: Sourcetable – AI Spreadsheet and Data Platform

93 points by mceoin 19 hours ago

Hi HN! I’m Eoin, founder of Sourcetable (https://sourcetable.com).

Sourcetable is an AI-native spreadsheet that syncs with all your data. Users pair with an AI copilot that helps them do their spreadsheet work, as well as more database-centric analysis and SQL.

Sourcetable syncs with databases including Postgres, MySQL, and MongoDB, and 100+ business applications including Stripe, Zendesk, Hubspot, Quickbooks and Google Analytics. That data is available in a spreadsheet, and any models you build automatically update in near-real-time as new data flows in. The core primitives are AI + spreadsheet + data sync + storage + compute.

If you want to play with Sourcetable today, the easiest way is to upload a CSV and start asking questions.

Who is it for? Sourcetable is for analysts, operators and finance folk doing data-centric work in a spreadsheet. Sourcetable’s spreadsheet-based AI assistant understands workbook range selection and can adjust scope context to the datasets you are working with. You can talk directly to your database and SaaS integrations, which is great for analysis, data search and retrieval, SQL writing & editing (including writing joins across different datasets), and automatic chart creation.

Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

Spreadsheets are the most used analytical tool on the planet. AI is a platform shift with broad applications. We are staying open-minded about users and use cases since everything is so new.

Backstory: I spent ten years working in de-facto operations and technical roles at startups. Sourcetable draws from that experience of needing better data tooling inside spreadsheets, and constantly hacking ad hoc solutions to fill the gap. Andrew (CTO / co-founder) previously had a deep learning company and was initially drawn to the idea that Sourcetable could be an operating system for the web. We’re both Aussie expats in the Bay Area, which is how we met. Internally, we think of Sourcetable as an application platform, with AI applications being a useful and interesting place to focus.

Features & Use Cases:

- Talk to your CSV files, spreadsheets, integrations, and datasets using LLMs.

- AI + data work: Text-to-SQL, search and retrieval from databases, LLM-based data analysis. (This is an entirely different experience to what Copilot/Gemini & Excel/Sheets provide, since they are thin workbooks and not data platforms.)

- AI + spreadsheet work: formula assist, workbook analysis, data cleaning, chart creation, error handling, summarization, chat, etc.

- Automated reporting: data is synced, reports you build stay up to date.

- No-code data access: give the business team safe database access so they will leave you alone!

- Centralizing data for cross-channel reporting (e.g. Postgres + Stripe + Mailchimp).

- Analyzing large CSV files: Sourcetable can handle multi-gigabyte files. (Google Sheets can’t handle large data and the experience in Excel is rather cumbersome.)

Technical Details: Sourcetable was built to be fast. It was also built to scale.

AI: Llama 3 (via Groq), Claude, GPT-4o, LiteLLM, custom LLMs

Frontend: DuckDB, React, ShadCN, AntV / Bizcharts, Plotly, CodeMirror, Hookstate

Backend: DuckDB, Python, Cassandra, Redis, NGINX, Cloudflare

Data Eng & Transformations: Fivetran, dbt, Apache Arrow, SQLGlot

Distributed Computing & Scaling: Daft, Ray, CloudFormation

Other: Linux Namespaces, Dill (U.Queensland)

A huge thank you to the open source community, and a special shout-out to DuckDB for being so damn fast. Thank you also to Groq & Anthropic for the rate limit increases in time for this ShowHN post!

Feedback: Product feedback is welcome! eoin@sourcetable.com

primitivesuave 6 hours ago

This is incredible. I uploaded a CSV with ~6000 rows containing campaign finance data for a particularly corrupt local politician and asked "what was the total contributed amount in [year]". Not only did it produce the correct answer (in around the same amount of time it took me to calculate it on my end) but it also seemed to understand that the spreadsheet was related to campaign finance in the "summary" portion of the response.

The most useful aspect was that I could ask "what was the total contributed amount between January and June of 2020" and get an accurate answer for that as well. Since the date column is provided as an "MM/DD/YYYY" string, I would normally have to do some boilerplate work to sanitize this.
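As a rough illustration of the boilerplate involved, here is a minimal Python sketch that parses "MM/DD/YYYY" strings and sums a date range. The column names (`date`, `amount`) and the sample rows are invented for the example and are not taken from the actual dataset.

```python
from datetime import date, datetime

# Hypothetical rows; the column names and amounts are invented for illustration.
rows = [
    {"date": "01/15/2020", "amount": 250.0},
    {"date": "06/30/2020", "amount": 100.0},
    {"date": "07/01/2020", "amount": 500.0},
    {"date": "03/02/2019", "amount": 75.0},
]

def total_between(rows, start, end):
    """Sum `amount` for rows whose MM/DD/YYYY `date` falls in [start, end]."""
    total = 0.0
    for row in rows:
        # Parse the string into a real date so the comparison is chronological,
        # not alphabetical.
        day = datetime.strptime(row["date"], "%m/%d/%Y").date()
        if start <= day <= end:
            total += row["amount"]
    return total

first_half_2020 = total_between(rows, date(2020, 1, 1), date(2020, 6, 30))
print(first_half_2020)
```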

For my particular use case, the charting aspect left a few things to be desired - once I grouped campaign donations by contributor, I could only see the first 10 rows in the AI response, with no option to expand the output. But overall I was truly blown away that something like this is even possible for a small team to build.

  • mceoin 5 hours ago

    > For my particular use case, the charting aspect left a few things to be desired - once I grouped campaign donations by contributor, I could only see the first 10 rows in the AI response, with no option to expand the output.

    Insert it as a table on the page (you should see a button), it will then print the whole table result from that query into the spreadsheet. Also, you can check the SQL first and validate it, then print to table after that.

    Try a few million rows and see what happens!

    • dioptre 5 hours ago

      Also keep an eye out on the limit - we default to 10,000 to keep it snappy but if you want to make it larger it's a click away. The "summarize table" button should auto limit to 1B+ rows.

mmckelvy 7 hours ago

Interesting. I think you're on to something here. I fully agree that the combination of spreadsheets and SQL is the ideal tool for data analysis -- not a SaaS GUI.

> Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

With the rise of AI, companies like Tembo that help you set up all in one databases, and tools like this, I'm increasingly of the mind that many companies should start bringing things like analytics and observability in-house. I don't see the need to pay Mixpanel or Datadog thousands of dollars per month when a self-serve solution that relies on tried and true tech is more or less at your fingertips.

  • mceoin 7 hours ago

    Agree. A general thesis I have is that the API-ification of the web fragmented business information, and with every new SaaS tool we fragment our company's data further. The trend at all company sizes is to be increasingly analytical, but for SMBs it's too hard to get access to your data (mainly due to technical limitations). So it makes sense to centralize data somewhere, and we think that somewhere is inside the data tool that everyone actually uses: the spreadsheet.

    Many other advantages of this data centralization too. Data + spreadsheets + compute is a nice application base for agents.

    • threeseed 7 hours ago

      > So it makes sense to centralize data somewhere

      Modelling and integrating datasets that you don't own is extremely hard.

      Shopify for example updates their API every 3 months.

      How much time and money do you think an SMB can afford to spend on this before the ROI becomes so poor that they abandon it entirely.

      • mceoin 6 hours ago

        Yes some integrations are excellent (hey Stripe : ), some are terrible (no comment on who). We're finding that LLMs are increasingly able to fill the gap around organizing data schema for that initial data prep piece where someone has to build the data tables that others consume. To your specific question/problem set, when a schema updates you end up with a "fuzzy schema matching problem"; we are solving that separately anyway for a separate product feature requirement.
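
        A minimal sketch of what fuzzy schema matching can look like, using Python's standard `difflib`. This is an illustration only, not Sourcetable's implementation; the column names, the `normalize` helper, and the 0.6 cutoff are all assumptions.

```python
import difflib

def normalize(name):
    """Lowercase a column name and drop underscores so naming styles compare equal."""
    return name.replace("_", "").lower()

def match_columns(old_columns, new_columns, cutoff=0.6):
    """Map each old column to its closest new column, or None if nothing is close."""
    by_normalized = {normalize(new): new for new in new_columns}
    mapping = {}
    for old in old_columns:
        close = difflib.get_close_matches(
            normalize(old), list(by_normalized), n=1, cutoff=cutoff
        )
        mapping[old] = by_normalized[close[0]] if close else None
    return mapping

# Hypothetical schema before and after an upstream API update.
old_schema = ["customer_id", "created_at", "total_price"]
new_schema = ["customerId", "createdAt", "total_price_usd", "currency"]
mapping = match_columns(old_schema, new_schema)
print(mapping)
```

        Any column mapped to `None` would still need a human (or a smarter matcher) to decide what happened to it.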

        Strong note here that the current state of technology is much better for SMB scale data and not enterprise scale data with messy schemas.

      • mceoin 6 hours ago

        There is a separate answer here which is many (most?) SMBs can't afford technical folk, so the ability to integrate data at all, talk to it and model it (using SQL or AI), is already a big step forward for them.

        My personal use case tends to involve a lot of Postgres data and transaction events for my reporting. We see "simple" businesses like parts manufacturers, print shops, vineyards, etc. all doing something similar.

  • threeseed 7 hours ago

    Minus the AI part tools like this have existed for decades.

    And companies are not dumping their SaaS tools and switching to them en masse.

    Because (a) data silos have dramatically increased pushing dreams of a unified data schema out of reach, (b) technology stacks have become far more complex necessitating tools like Datadog and (c) competition is stronger than ever meaning that skimping on paying for tools like MixPanel is often short sighted and counter productive.

    Companies like this will do fine and there will always be demand for them, especially in the SMB space. But there simply isn't the business value in bringing a lot of analytics and observability in-house in almost all cases.

    • mmckelvy 6 hours ago

      Not yet. But in the analytics case, suppose you could build a tool that collected data on your own infrastructure, allowed you to write plain SQL against a PostgreSQL database to get whatever analytics data you need, had an AI-driven text-to-SQL option so non-technical users could get whatever analytics data _they_ need, and output everything to a universal interface, i.e. a spreadsheet? No vendor flavored DSL, GUI, or workflows to learn. That product would be tough to beat. It wasn't built in the past because it was hard. But with AI and something like Tembo or Timescale, is it actually hard anymore?

aerosmile 8 hours ago

It’s amazing that Microsoft - given their focus on AI and decades of experience in spreadsheets - doesn’t offer this type of functionality. Corporate bureaucracy vs startup agility!

  • mceoin 8 hours ago

    At risk of poking the bear, they should have done this decades ago. Except for LLMs they have had everything they needed to bundle this stack into a single product solution; this would be much better for users.

    And yes! We're definitely of the opinion that as a startup we can outcompete the two trillion-dollar death stars when it comes to product experience. AI is a platform shift!

longstaff2009 5 hours ago

That's a spicy example dataset!

I like that it's able to infer information from the context of the cells, e.g. being able to run a query across continents when the data only contains the country.

Being able to ask it to interpret the results is helpful, it would be cool if it automatically told you if there was enough data to have statistical significance in the conclusions it was presenting.

  • mceoin 5 hours ago

    You may see that we try to suggest follow-up questions or question improvements where we think better context-in will result in a better result-out.

    Curious what will happen if you modify the question to be more explicit?

    I have seen that PMs and data-trained folk tend to be very articulate in asking for exactly what they want and that tends to lead to significantly better LLM responses.

sim_123 14 hours ago

This is amazing. I’ve been scouting for such a solution as we’ve outgrown excel. Giving it a spin

  • mceoin 14 hours ago

    A very common use case we see is SMBs having outgrown their spreadsheet but not wanting to move to a full-blown BI tool. They want the power, but not the change in interface/medium.

    I didn't go into details above but a nice thing is that we leverage cloud compute and storage, so you can query billion-row data in sub-second time. (Courtesy of Duck!)

Brajeshwar 6 hours ago

You might want to check who is blacklisting you and request to unblock. AdGuard blocked sourcetable.com as "Scam".

https://www.dropbox.com/scl/fi/np92pyo0eb0zphysc9wwz/screens...

  • SoulAuctioneer 5 hours ago

    Thanks for reporting! Taking a look now.

    • dioptre 5 hours ago

      Hey do you mind removing this comment? Seems it might have caused us to be blacklisted?

      • Brajeshwar 4 hours ago

        I'm sorry, I've missed the "delete" window. But may I know how a comment here (after it being blacklisted) about it being blacklisted will be the reason to be blacklisted?

        • mceoin an hour ago

          ¯\_(ツ)_/¯ deciphering magic algorithms.

          Very much appreciate the bug report. Thank you!

yawnxyz 15 hours ago

> Niching down, if you work in operations at a <50 person startup or SMB and your company relies on a Postgres or MySQL database, Sourcetable is an affordable reporting tool with turnkey data infrastructure that doesn’t require code or engineers to set up.

I'm already using Retool for these kinds of tasks- what does sourcetable do that I can't already do with Retool?

edit: also, did you build your own spreadsheet engine, or use an off-the-shelf one? (also will it be open source ;P)

  • mceoin 15 hours ago

    Category Comparison (table-based solutions): "How are you different than Retool/Airtable/Coda/Notion/Zapier Tables, etc."

    The primary difference vs table-based solutions is that Sourcetable is a spreadsheet in the common sense of the word, similar to Excel and Sheets. We have A1 notation and cell-based referencing. This is what most users expect, and this flexibility/familiarity has a big impact on the breadth of users and use cases within a team.

    The formula referencing system of these table-based solutions is usually very limited: it works on columns/rows (not cells), and it is a set of SQL-based queries which are much more limited than the 500+ formulas and functions spreadsheet users commonly expect.
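
    To make "A1 notation and cell-based referencing" concrete, here is a small Python sketch that converts an A1-style reference into zero-based row and column indexes. It is a generic illustration of the notation, not code from Sourcetable, and the function name `a1_to_row_col` is made up for the example.

```python
import re

def a1_to_row_col(ref):
    """Convert an A1-style reference such as "B3" to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    # Column letters are a bijective base-26 number: A=1 ... Z=26, AA=27.
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

print(a1_to_row_col("A1"))
print(a1_to_row_col("B3"))
print(a1_to_row_col("AA10"))
```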

    Retool specifically: I tend to think of Retool as a lightweight custom-ERP software system, whereas Sourcetable is more like Excel + PowerBI + Data Warehouse, so we will generally be much stronger for reporting and analysis. We definitely have some overlap in potential users since technical operators should like us both. FWIW - Retool is an excellent product.

  • dioptre 14 hours ago

    Hi I'm Andy, Cofounder & CTO @ Sourcetable.

    We use a heavily modified licensed engine that prevents us from open sourcing everything (for now). We have plans to open source our agentic/plugin framework, and other parts of the system. We also have a strong ethos of contributing back to open source where we can (contributed back to Arrow, DuckDB etc.).

    I'd also add that while everyone knows how to use and work with spreadsheets, we also provide a SQL layer on top that you can use to query data sources as an advanced user (we developed a nomenclature to work within sheets/across sheets/files/our data-warehouse). This allows more technical users to work side-by-side in the same environment as non-technical users without crossing pythonic or reporting boundaries.

    On top of this, the AI assistant can answer most of the questions you might have of all this data.

    I think as ML gets more sophisticated, we will in general need to be less technical. The "tooling" might even disappear, but we will still need something to communicate important data centric decisions. Whether you like it or not spreadsheets are the foundation of human research and operations and have been for thousands of years, and I feel humanity will need less complicated "tools" and we will keep to our roots.

    • Jayakumark 6 hours ago

      Will you be able to share the name of the engine?

samymov 5 hours ago

Huge congrats on the Launch ! You guys crushed it with all the thought and hustle behind creating such a valuable tool. Wishing you nothing but success on the ride ahead!

sammysidhu 6 hours ago

Congrats on the launch! It's been great working with you from the Daft side

mg1973 6 hours ago

Brilliant work team, great to see this being launched.

alooPotato 8 hours ago

Cool.

How did you build so many integrations so fast?

Selfishly, would love to see Streak (CRM) integration as well.

  • mceoin 8 hours ago

    Mostly Fivetran, a little Airbyte, and a few custom integrations. Would love to add Streak (can you get it into Fivetran? We can usually crank those integrations out within an hour.)

    • mceoin 8 hours ago

      p.s. I was a massive Streak user at a previous (sales-driven) startup. Big fan!

djbiggs 8 hours ago

Awesome, have you got any mining-specific worked examples or spatial examples? Thinking about lidar point clouds and running deltas for stockpile management. Looking at building a new mine, and typically at any mine site there are Excel macros embedded in the operations which might take an hour to run. Often developed by older engineers, who will default to Excel. Any suggestions on how best to drive technical user adoption (aside from dropping it on the kids in the engineering departments, can't wait that long)?

  • dioptre 7 hours ago

    The underlying datatypes we support in our data-warehouse support 3d and 4d data. So we can do vector queries on these and do transformations over different spaces. I think given what you need we can put your data in our data-warehouse, and then present it to the older engineers in an excel format with 3d plotting. We might want to chat about the details though, give me a holler at andrew@sourcetable.com

  • mceoin 7 hours ago

    Yes actually! My cousin is a mining engineer so I spent a bunch of time playing around with mining data during testing. Turns out all New South Wales government data is public. Right now you can talk to any CSV or database using LLMs. I've also played around with a bunch of marine biology datasets too!

    (p.s. I think Andrew, CTO, is going to jump in here as he has more experience in this space.)

    • mceoin 7 hours ago

      Can you email me -- eoin@sourcetable.com -- more about the Excel macros? This might be easy to help you out with agents. A lot of compute-intensive stuff that takes ages in Excel is nearly instant in Sourcetable because we are leveraging cloud compute, but it really depends on your use case.

smcleod 6 hours ago

Are you open sourcing the product for non-commercial use?

  • mceoin 6 hours ago

    Would love to but unfortunately there are pieces we can't open-source for various reasons. We'll open source bits and pieces over time, and generally are excited to start blogging about AI & technical learnings now that the product is out of stealth mode.

    Small plug for the analytics tracker we are using which Andrew (CTO) built and is open source: https://github.com/sfproductlabs/tracker

HeralFacker 7 hours ago

What external checks are included to verify the chatbot output?

  • SoulAuctioneer 7 hours ago

    Wherever possible, the chatbot output is deterministic, in that to answer a query, we're realtime generating and running code or SQL against your data. Our LLM orchestrates that, and finally evaluates whether the output correctly and adequately answers the question.
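
    The general pattern of "generate a query, then run it against the data" can be sketched as follows. This is a simplified illustration using Python's built-in `sqlite3`, not Sourcetable's actual pipeline; the table, the rows, and the hard-coded `generated_sql` string (standing in for model output) are all assumptions.

```python
import sqlite3

def run_generated_sql(conn, sql):
    """Execute a single read-only statement and return all rows."""
    # Crude guard for this sketch: refuse anything that is not a SELECT.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed in this sketch")
    return conn.execute(sql).fetchall()

# Hypothetical in-memory table standing in for an uploaded CSV.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (contributor TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [("A", 2020, 100.0), ("B", 2020, 250.0), ("A", 2021, 50.0)],
)

# Stand-in for model output; a real system would get this string from an LLM.
generated_sql = "SELECT SUM(amount) FROM contributions WHERE year = 2020"
result = run_generated_sql(conn, generated_sql)
print(result)
```

    The number comes from the database engine rather than from the model, so rerunning the same SQL on the same data gives the same answer, and the SQL itself can be inspected before trusting the result.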

    We also extensively use synthetic data and examples to guide and constrain our models.

    Another way we're ensuring good-quality output is to ensure good-quality _input_ -- by enriching the detail and specificity of the user's question, and asking the user to disambiguate when we determine the question is too broad.

SMAAART 7 hours ago

Looks interesting, commenting so that I can remember.

escot 15 hours ago

Very cool. It would be great to have auto complete across cells.

  • mceoin 15 hours ago

    Yes we don't yet have the full auto-suggest magic that Sheets offers, but you can click-drag for auto-complete the same way Excel offers.

    We released Sourcetable today with the AI chatbot & AI data analysis features, but a very limited cell-based AI (only "summarize" and "fix formula"). We'll be releasing a big AI-based magic-autofill solution in the coming weeks.

_hfqa 13 hours ago

Congrats on the launch! It’s wild to see AI stepping into spreadsheets like this. Pretty soon there won’t be a part of our workflow AI hasn’t touched.

  • mceoin 13 hours ago

    Thanks _hfqa! We think there's massive potential here. It's a big platform shift, and spreadsheets weren't really impacted by the mobile or cloud compute waves, so it's a space long-overdue for disruption. (The last shift was back when Google Sheets took spreadsheets to the browser 17 years ago!!)

halfcat 6 hours ago

I always wonder where these spreadsheet/database apps will land. Usually it falls flat for one of a few reasons I’ve observed:

- Fundamental gap in skillset, in that if you want to have ultimate flexibility to slice and dice the data and report on whatever you’re seeking, you’ve ultimately needed SQL skills in the past (which isn’t rocket science, but also isn’t something most accounting users can run with on their own).

- Fundamental desire of users to work with unstructured data. This goes back at least as far as Excel vs Lotus Improv in the early 90’s. Joel Spolsky talked about this, how they were terrified that Lotus Improv was going to kill Excel, because Improv was built to work with structured data, which users could then query and ask questions of to get any answer they want. But it turned out, as they observed people using both apps, there were zero users that used 100% normalized, structured data.

- Imperfect translation between spreadsheet and database. I’ve seen these work well 99.9% of the time, but at some point a column gets added or something that throws off formulas. And 0.1% error is basically catastrophic in accounting.

Maybe LLMs help overcome these challenges. Wish you luck.

  • SoulAuctioneer 5 hours ago

    Agree with you, and we're definitely trying to thread the needle!

    We're generating the SQL to answer natural language questions, so folks can just get answers and results tables if that's all they need, with the option for power users to fiddle with the SQL either directly or via a query editor GUI.

    There's a ton of use cases for working with unstructured and semi-structured data and that's coming down the pipe!

  • mceoin 5 hours ago

    This is 100% the correct insight in my experience.

    TL;DR, most technical people massively overestimate the technical / data abilities of regular spreadsheet users. We find simple use cases are best, and with each new LLM release the UX around more complex data improves significantly.

    The reason we chose to build as a full-blown spreadsheet instead of just a table-based solution was that we saw that most people want the flexibility of a regular spreadsheet, but access to their (structured) business data. Table-based solutions wedge you into AI and you can never get out of that.

petergreen 8 hours ago

great product. congrats on the launch