Show HN: Open-Source Data Anonymization for Developers

docs.neosync.dev

13 points by edrenova 2 days ago

Hey HN, we're Evis and Nick from Neosync (https://www.github.com/nucleuscloud/neosync).

Since we last introduced Neosync on HN 4 months ago, we’ve made a lot of progress and we’re excited to be launching several new features.

As a reminder, Neosync is an open source platform that helps developers anonymize production data, generate synthetic data, subset it and sync it across their environments for better testing, debugging and developer experience.

We do all of this while handling referential integrity. Whether you have primary keys, foreign keys, unique constraints, circular dependencies (within a table and across tables), sequences and more, Neosync preserves those references.
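
To make the referential integrity point concrete, here is a minimal Python sketch of one common way to keep keys consistent while anonymizing: a keyed, deterministic mapping, so the same original value always becomes the same pseudonym in every table. This is an illustration under our own assumptions, not Neosync's actual implementation; the `pseudonymize` helper, the `SECRET` key and the sample tables are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secrets manager.
SECRET = b"example-secret"


def pseudonymize(value: str, secret: bytes = SECRET) -> str:
    """Map a value to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


# Hypothetical parent table (primary key: id) and child table (foreign key: user_id).
users = [
    {"id": "alice@example.com", "plan": "pro"},
    {"id": "bob@example.com", "plan": "free"},
]
orders = [
    {"order_id": 1, "user_id": "alice@example.com"},
    {"order_id": 2, "user_id": "bob@example.com"},
    {"order_id": 3, "user_id": "alice@example.com"},
]

# Apply the same mapping to the primary key and to every foreign key that references it.
anon_users = [{**u, "id": pseudonymize(u["id"])} for u in users]
anon_orders = [{**o, "user_id": pseudonymize(o["user_id"])} for o in orders]

# Every anonymized order still points at an existing anonymized user.
known_ids = {u["id"] for u in anon_users}
assert all(o["user_id"] in known_ids for o in anon_orders)
```

Because the mapping is deterministic, the tables can be processed independently and joins still line up afterwards. Truncating the digest trades a small collision risk for shorter keys, so a real system would need to handle that along with sequences, unique constraints and circular dependencies.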

Our goal is to give every developer production-like, representative data for a better developer experience without any security and privacy issues.

First, we’ve added new integrations. In addition to supporting Postgres and MySQL, we’re introducing first-class support for DynamoDB, MongoDB and SQL Server. You can also sync to object storage like S3 and Google Cloud Storage.

Next, we’ve completely revamped our transformers. Transformers are how you anonymize sensitive data and generate new data. We’ve added new transformers that you can use out of the box, or you can write your own custom one in JavaScript. We’ve also added real-time validation and the ability to combine transformers to create your own anonymization scheme.
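
As a rough illustration of what combining transformers means, the Python sketch below chains two small functions into one anonymization step. Custom transformers in Neosync are written in JavaScript, so treat this purely as a language-neutral sketch of the idea; `mask_local_part`, `replace_domain` and `chain` are hypothetical names, not part of Neosync.

```python
from typing import Callable

# A transformer here is simply a function from one string value to another.
Transformer = Callable[[str], str]


def mask_local_part(email: str) -> str:
    """Keep the first character of the local part and mask the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def replace_domain(email: str) -> str:
    """Swap the real domain for a reserved example domain."""
    local, _, _ = email.partition("@")
    return local + "@example.com"


def chain(*steps: Transformer) -> Transformer:
    """Combine transformers so they run left to right on the same value."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


# One combined scheme built from two simpler transformers.
anonymize_email = chain(mask_local_part, replace_domain)

print(anonymize_email("jane.doe@corp.io"))  # j*******@example.com
```

Keeping each transformer small and single-purpose makes it easier to validate them individually and to reuse them across columns.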

We’ve also added new features to make Neosync easier to use. For example, you can automatically map transformers to your schema, append only new records instead of running a full refresh, and stop jobs from running when the schema changes.
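
For the schema change case, one simple way to picture the check is comparing a fingerprint of the schema the job was configured against with a fingerprint of the live schema, and refusing to run if they differ. The Python sketch below shows that idea only; it is an assumption about how such a check could work, not a description of Neosync's internals, and `schema_fingerprint` and `should_run` are hypothetical helpers.

```python
import hashlib
import json


def schema_fingerprint(columns: dict[str, str]) -> str:
    """Hash a column-name -> type mapping, independent of column order."""
    canonical = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_run(mapped: dict[str, str], live: dict[str, str]) -> bool:
    """Return True only if the live schema matches the one the job was mapped to."""
    return schema_fingerprint(mapped) == schema_fingerprint(live)


# Hypothetical schemas: the live table gained a column the job has no transformer for.
mapped_schema = {"id": "uuid", "email": "text"}
live_schema = {"id": "uuid", "email": "text", "phone": "text"}

if not should_run(mapped_schema, live_schema):
    print("Schema changed; stopping job before any unmapped column is synced.")
```

Stopping here matters for privacy: a newly added column has no transformer mapped to it, so syncing it as-is could leak raw production data.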

We've also upgraded our AI Synthetic Data features. You can use any LLM to generate synthetic data and Neosync will handle the orchestration between your database and the LLM.

Lastly, we’re also announcing Neosync Cloud, our hosted platform that lets you use Neosync without having to run any of the infrastructure yourself. All you have to do is connect your source and destination database(s), configure your schema and you’re done.

Of course, you can also run Neosync Open Source on-prem, and hundreds of companies do. Neosync is written in Go and TypeScript and can be started locally with a single make command.

We'd love any feedback you have and contributions are always welcome.

D_R_Farrell 2 days ago

This is cool – as an open source founder, I've wondered about how to keep things transparent while protecting people's privacy. What’s your general approach to public databases, and when does it make sense to make them public?

  • edrenova 2 days ago

    Thanks! Yeah, we generally recommend not making your databases public and instead connecting to them through a bastion host, which Neosync supports. Ideally, you also wouldn't connect to a live DB at all and would use a snapshot or backup instead. A read replica could work as well, but a snapshot is better.

haroonchoudery 2 days ago

Big fan of what Evis and team are building. We work with lots of orgs in regulated industries at Autoblocks and Neosync's anonymization is a fan favorite for AI teams at these orgs.

dangtony98 2 days ago

Why this over something like Faker?

  • edrenova 2 days ago

    Thanks for the question! Faker is useful but lacks features such as referential integrity, data orchestration and the ability to read from and write to a DB. So Faker can work for simple API schemas, but if you need something more robust for an entire database, that's where we can help.