ATProto for distributed system engineers

200 points by danabramov 8 days ago

openrisk 8 days ago

What might be useful for the re-decentralized web community is a detailed comparison of the ATProto, ActivityStreams/Pub and maybe Solid specifications, protocols, standards, vocabularies (or whatever exactly these blueprints actually are).

As the blog post illustrates quite nicely (literally), ATProto is a fairly complete, bottom-up type specification that makes concrete various server/database aspects that in the ActivityPub spec are somewhere in the remote background, "left to the implementation". One could almost think of implementing AP over ATProto, and sure enough somebody wrote about this [1].

One also cannot miss the (at least) linguistic affinity of a Personal Data Server with a Personal Data Store (Solid), and sure enough somebody noticed and asked [2].

[1] https://berjon.com/ap-at/

[2] https://www.reddit.com/r/BlueskySocial/comments/ywrw3f/whats...

apitman 8 days ago

Has anyone played around with ATProto yet? ActivityPub is pretty easy to get started with, especially if you just ignore JSON-LD and parse what you see.

I'm curious how ATProto compares.

Diti 8 days ago

As an ontology enthusiast, it saddens me to see that ATProto went for their own data model ([link:Lexicon]) instead of using the standard JSON-LD (I wonder if they considered Turtle – which is streamable, unlike JSON).
I get why they did that (graph data is, uh, particular to work with, especially for newcomers who only know JSON), but ATProto not using JSON-LD is actually what made me unwilling to tinker with the protocol.
Not a direct answer to your question, sorry. Mostly a rant.
[link:Lexicon]: https://atproto.com/guides/faq#why-create-lexicon-instead-of...
- str4d 8 days ago
  
  There are a few more details about the reason they didn't use JSON-LD in Paul's blog post [0].
  [0]: https://www.pfrazee.com/blog/why-not-rdf
- apitman 7 days ago
  
  I'm sorry but JSON-LD is a massive pain to work with in statically typed languages. Certainly is in Go at least. The flexibility is the problem, ie you never know if something is going to be an object or an IRI (did we really need a 3rd name for URIs?) to an object. I think you could get most of the benefit while still requiring specific types.
  - Diti 7 days ago
    
    You shouldn’t be having this problem if you use a library which offers normalization (like github.com:piprate/json-gold) so that you get objects when there’s an IRI context, and a simple string when there’s a regular IRI.
    
    apitman 7 days ago
    
    I'm not aware of any such library for Go. Besides, I prefer protocols that are simple enough to implement myself. That's not feasible in every case, but it certainly is for the social media use case.
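
To make the "object or IRI" ambiguity from this thread concrete, here is a minimal hand-rolled sketch that collapses both shapes into one explicit type. It is not the API of json-gold or any other JSON-LD library; `Ref`, `as_ref`, and the example.com URLs are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Ref:
    """A property value that is either a bare IRI or an embedded object."""
    iri: Optional[str]        # the object's id, if one is known
    embedded: Optional[dict]  # the full object, if it was inlined


def as_ref(value: Any) -> Ref:
    """Collapse the 'string or object' ambiguity into one explicit type."""
    if isinstance(value, str):
        return Ref(iri=value, embedded=None)
    if isinstance(value, dict):
        return Ref(iri=value.get("id"), embedded=value)
    raise TypeError(f"expected IRI string or object, got {type(value).__name__}")


# Two hypothetical activities that mean the same thing but differ in shape.
by_reference = {"type": "Like", "actor": "https://example.com/users/alice"}
inlined = {
    "type": "Like",
    "actor": {"id": "https://example.com/users/alice", "type": "Person"},
}

for activity in (by_reference, inlined):
    print(as_ref(activity["actor"]).iri)
```

Callers then branch on `embedded is None` in one place instead of type-checking every field, which is roughly the benefit a normalizing library would provide.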
danabramov 8 days ago

We've just released a new short guide on creating a minimal app on atproto, together with a GitHub example project:
- https://atproto.com/guides/applications
- https://github.com/bluesky-social/statusphere-example-app
FroshKiller 8 days ago

I built a custom feed server for Bluesky that drinks from the firehose. Getting everything working was very fiddly. For a hobby, the friction of it outweighed the entertainment value for me.
Working with the firehose probably isn't feasible for a lot of people who'd like to tinker. There doesn't seem to be any way of subscribing to only certain types of events.
- str4d 8 days ago
  
  For a lower-friction firehose experience, you can use Jetstream [0] (by one of the Bluesky devs) which supports subscribing to specific Collection NSIDs and user repositories, and converts records to JSON for you.
  There's a public instance URL in the README (with bandwidth limits), or you can self-host.
  [0] https://github.com/ericvolp12/jetstream
  - FroshKiller 7 days ago
    
    The firehose itself isn't really the fiddly part since it's just a WebSocket connection. Setting up the feed server, publishing the DID for its web host, then publishing the feed generator to the network were all kind of a low-grade hassle that killed a lot of my enthusiasm. Like none of it was especially complicated if you're doing it for a professional project or whatever, but I was just trying to goof around while watching episodes of Highlander: The Series, and it was taking me away from Duncan.
    I'll check out this Jetstream project for sure, though.
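
As a rough illustration of the filtered subscription mentioned above, the sketch below builds a Jetstream-style subscription URL and applies a client-side fallback filter. The `/subscribe` path and the `wantedCollections` / `wantedDids` parameter names reflect my reading of the Jetstream README and should be checked against it; the host name and the event shape (`commit.collection`) are assumptions.

```python
from urllib.parse import urlencode


def jetstream_url(host, collections=(), dids=()):
    """Build a subscription URL that asks the server to pre-filter events.

    Parameter names are assumptions taken from the Jetstream README.
    """
    params = [("wantedCollections", c) for c in collections]
    params += [("wantedDids", d) for d in dids]
    query = urlencode(params)
    return f"wss://{host}/subscribe" + (f"?{query}" if query else "")


def wanted(event, collections):
    """Client-side fallback filter for an already-decoded JSON event.

    Assumes commit events carry their collection NSID at event["commit"]["collection"].
    """
    commit = event.get("commit") or {}
    return commit.get("collection") in collections


# "jetstream.example.com" is a placeholder, not a real public instance.
url = jetstream_url(
    "jetstream.example.com",
    collections=["app.bsky.feed.post", "app.bsky.feed.like"],
)
print(url)
```

The resulting URL can be handed to any WebSocket client; events then arrive as JSON, so no CBOR or CAR decoding is needed on the consumer side.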
Kudos 8 days ago

I only know of this blog implementation https://github.com/whtwnd/whitewind-blog
- str4d 8 days ago
  
  For non-Bluesky apps built in ATProto, in addition to White Wind (blogging), there is also Smoke Signal (events, only Lexicons are open source currently AFAICT) [0], and Frontpage (link aggregation) [1].
  [0]: https://github.com/SmokeSignal-Events/lexicon
  [1]: https://github.com/likeandscribe/unravel/tree/main/packages/...
  - danabramov 8 days ago
    
    Also, our new little example app:
    - https://atproto.com/guides/applications (guide)
    - https://github.com/bluesky-social/statusphere-example-app (GitHub)
viksit 8 days ago

yes we built a 10k user social network for artists and musicians on it and it’s excellent. very sophisticated and very extensible.
- danabramov 8 days ago
  
  Mind sharing a link?
  - Diti 8 days ago
    
    Judging by this user’s comment history, the website seems to be solarplex.xyz (be advised, it takes between 30 seconds and 1 minute to fully load the website’s 75 MB).
    
    apitman 7 days ago
    
    OT but out of 572 requests, half of them are OPTIONS. CORS is an abomination.
    
    viksit 7 days ago
    
    yes :) we also stopped maintaining the old code starting may/june after running it for about 9 months. a new version of the front end is in the works!
__loam 8 days ago

There are a lot fewer resources for AT than for ActivityPub. Last time I checked, which was a few months ago, the official documentation for AT was pretty sparse if you're interested in building to a spec. You'll find a lot more in the ActivityPub specs, plus a lot of open implementations and helpful guides.
- m_eiman 8 days ago
  
  > You'll find a lot more in the ActivityPub specs, plus a lot of open implementations and helpful guides.
  I've read that there's a problem with interacting with Mastodon if you only rely on the protocol specs, that they do things their own way and have different requirements than the official specs.
  Is this still a problem? If it is, are Mastodon moving to be more closely aligned with the spec, or to doing more of their own thing?
  - zimpenfish 8 days ago
    
    From what I've seen, Mastodon sticks to the spec but a lot of clients and servers then stick to Mastodon's interpretation of the spec rather than the spec. e.g. for status IDs, the spec says "String (cast from an integer but not guaranteed to be a number)", Mastodon uses numerical IDs, some clients[1] see this as "Ah, IDs are numbers!" and break horribly when they're not numerically parseable (Akkoma, Pleroma, GotoSocial...)
    (IIRC there was another thing where `created_at` is described as "The date when this status was created" but the type is given as "String (ISO 8601 Datetime)" which led some code to crash when Mastodon started outputting just dates instead of datetimes.)
    [1] Including some from people who Really Should Know Better.
    
    vidarh 8 days ago
    
    I like ActivityPub overall, but there are a lot of places where the spec is just too complex, and I suspect that contributed to a lot of the choices to implement whatever currently works with Mastodon instead of the spec.
    I'm currently implementing parts of the spec, and there are parts (like fully handling context correctly) that feel like far more pain than they are worth vs. just handling occasional breakage.
    It feels like a very ivory tower spec of the kind you wouldn't be likely to write if you built a complete reference implementation first.
    But it's very on-brand as a W3C spec.
    I'd love to see a revision that deprecates and simplifies a whole lot of things.
    
    rapnie 8 days ago
    
    > I'd love to see a revision that deprecates and simplifies a whole lot of things.
    The hidden complexities in AP have led to several efforts. In the past there has been LitePub [0]. A recent project is Versia [1]. And who knows there may be a FeatherPub [2] one day. If anyone knows of other attempts I'd like to hear.
    [0] https://litepub.social/
    [1] https://versia.pub/
    [2] https://docs.google.com/document/d/13LuB6Z-C_drCLCEuCtNApX98...
    
    vidarh 8 days ago
    
    Thanks. I remember looking at Litepub. Not aware of the other two. The FeatherPub document feels like by far the most useful.
    But I also think just going through the spec with a red marker would be a useful exercise and maybe I will one day.
    In the sense that there are a whole lot of features nobody does anything useful with.
    E.g. "@context" in theory provides a whole lot of ways to type the rest of the data. I'd be willing to bet that you'd break a whole lot of software if you served up a "@context" for an actor that mapped common field-names in use by Mastodon to a different namespace and mapped the Mastodon features to different names...
    In theory it's great. In practice, I suspect we have XML namespaces and people stupidly hardcoding prefixes all over again...
    
    apitman 7 days ago
    
    Ah, FeatherPub is in a google doc. That explains why I was having trouble googling it last night.
    Also, there's a conversation happening about Versia today: https://social.coop/@smallcircles/113105954469059880
    
    rapnie 7 days ago
    
    Yes, there's a related google doc delving into the data model:
    https://docs.google.com/document/d/13mtl9gFmcuL-0MS-Boaeh3i6...
- brianolson 8 days ago
  
  OP is a link to the atproto site because it got a major new revision within the last week
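
The two client bugs described earlier in this thread (assuming status IDs are numeric, and assuming `created_at` always carries a time) can both be avoided with defensive parsing. A minimal sketch follows; the field names `id` and `created_at` come from the comment above, while `parse_status` and the choice to treat timezone-less values as UTC are assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_status(raw: dict) -> dict:
    """Read a status defensively: opaque string ID, date-or-datetime timestamp."""
    status_id = str(raw["id"])  # never int(): IDs are not guaranteed to be numeric

    text = raw["created_at"]
    if text.endswith("Z"):
        # Older Python versions of fromisoformat() reject a trailing "Z".
        text = text[:-1] + "+00:00"
    created = datetime.fromisoformat(text)  # accepts "YYYY-MM-DD" and full datetimes
    if created.tzinfo is None:
        # Assumption: treat values without an offset as UTC.
        created = created.replace(tzinfo=timezone.utc)

    return {"id": status_id, "created_at": created}


# A non-numeric ID with a date-only timestamp, and a numeric ID with a datetime.
print(parse_status({"id": "9xBmK2aQ", "created_at": "2019-12-08"}))
print(parse_status({"id": "103270115826048975", "created_at": "2019-12-08T03:48:33.901Z"}))
```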

swyx 8 days ago

always enjoy your writing, Dan.

at:// seems like it's close enough to DNS to warrant just using DNS. why not? (I'm sure there's a good reason, so just asking)

danabramov 8 days ago

Oh this is not mine actually — Paul wrote this one :)
atproto does use DNS under the hood for domain verification but atproto itself is a bit higher-level. It builds on top of DNS, HTTP, JSON, web sockets, and a few other specs.
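
For readers wondering what "uses DNS under the hood" looks like, here is a minimal offline sketch of the two handle-verification lookups as I understand the atproto handle spec: a TXT record at `_atproto.<handle>` containing `did=<did>`, or a plain-text DID served at `/.well-known/atproto-did`. It also splits an at:// URI into its parts. No network calls are made, and the handle, DID, and record key are hypothetical.

```python
def dns_txt_name(handle: str) -> str:
    """DNS name whose TXT record is expected to carry 'did=<did>'."""
    return f"_atproto.{handle}"


def well_known_url(handle: str) -> str:
    """HTTPS fallback location expected to return the DID as plain text."""
    return f"https://{handle}/.well-known/atproto-did"


def did_from_txt(values):
    """Pick the DID out of TXT strings; zero or conflicting answers yield None."""
    dids = {v[len("did="):] for v in values if v.startswith("did=")}
    return dids.pop() if len(dids) == 1 else None


def parse_at_uri(uri: str):
    """Split at://<authority>/<collection>/<rkey> into its three parts."""
    if not uri.startswith("at://"):
        raise ValueError("not an at:// URI")
    authority, _, path = uri[len("at://"):].partition("/")
    parts = path.split("/") if path else []
    collection = parts[0] if len(parts) > 0 else None
    rkey = parts[1] if len(parts) > 1 else None
    return authority, collection, rkey


handle, collection, rkey = parse_at_uri("at://alice.example.com/app.bsky.feed.post/abc123")
print(dns_txt_name(handle), well_known_url(handle))
print(did_from_txt(["v=spf1 -all", "did=did:plc:hypothetical123"]))
```

The authority part of an at:// URI can be a handle or a DID; only the handle form needs these lookups.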
- danabramov 8 days ago
  
  If you’re specifically asking why the identity system is not “rooted” in DNS (i.e. why at://danabra.mov resolves to another host than my website) — it’s because we want users to be able to change their hosting over time without breaking links between records.
  The actual identity system is “rooted” in a stable identifier (which is a hash of the first version of your identity record). That’s your global immutable ID in the entire network. The identity record for your ID specifies your current public key, your current domain name (which acts as a human-readable handle), and your current host (which actually contains your data).
  This extra level of indirection ensures you’re always able to change your user-readable handle (eg if you get a new domain or your domain expires etc), and that you’re always able to change your host (eg if your host goes down or you don’t like its services or you want to host data yourself).
  The key piece allowing this is the identity registry of course. Think of it similar to npm registry. We run a centralized one, but all records are signed so you can always recursively verify that we haven’t tampered with any of the records. This layer is already very thin but in longer term we’d like to move this layer outside the company to be governed independently, similar to ICANN.
  - str4d 8 days ago
    
    Additionally, a user _can_ root their identity in DNS if they want, by using did:web instead of did:plc [0]. The main Bluesky client doesn't expose this (presumably because did:web cannot provide a mechanism for automatic migration between PDSs (due to the PDS having no control over the DID document) or recovering from loss of control of the domain name, so it requires more technical expertise), but there are users successfully using this method for their identities.
    [0] https://atproto.com/specs/did#blessed-did-methods
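
The indirection described in this thread (a stable ID derived from the first identity record, pointing at a mutable handle and host) can be sketched in a few lines. This is a toy model, not the real did:plc algorithm: as far as I know, did:plc hashes a signed, DAG-CBOR-encoded genesis operation, whereas this sketch hashes canonical JSON and uses a made-up `did:example:` prefix. The did:web helper follows the standard host-only mapping to `/.well-known/did.json`. All names, keys, and hosts are hypothetical.

```python
import base64
import hashlib
import json


def stable_id(genesis_record: dict) -> str:
    """Derive an immutable ID from a hash of the first identity record."""
    canonical = json.dumps(genesis_record, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(canonical).digest()
    return "did:example:" + base64.b32encode(digest).decode().lower()[:24]


def did_web_document_url(did: str) -> str:
    """Map a host-only did:web to the URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix) or ":" in did[len(prefix):]:
        raise ValueError("expected a host-only did:web")
    return f"https://{did[len(prefix):]}/.well-known/did.json"


genesis = {
    "handle": "alice.example.com",
    "host": "https://pds-one.example",
    "publicKey": "hypothetical-key-1",
}
did = stable_id(genesis)
registry = {did: genesis}  # stand-in for the identity registry

# Later the user changes handle and host; the ID, and links to it, stay the same.
registry[did] = {**genesis, "handle": "alice.example.org", "host": "https://pds-two.example"}

print(did, "->", registry[did]["host"])
print(did_web_document_url("did:web:example.com"))
```

With did:web the "registry" is the domain itself, which is why losing the domain means losing the identity, as noted above.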

oDot 8 days ago

It hasn't sunk in yet that the killer app for ATProto is not Twitter, but YouTube.

If anyone is interested in exploring this, atproto [does this fool ai bots?] weedonandscott [I hope it does] com

Matl 8 days ago

ActivityPub does have https://joinpeertube.org for what it's worth. What would ATProto bring in specifically? Is it the ease of migration?
- purlane 7 days ago
  
  The biggest upside compared to PeerTube is probably discoverability. In ActivityPub, the network architecture means the video ecosystem is fractured and there’s no one cohesive place to find all PeerTube videos.
  In atproto, the network is continually indexed by relays, which means that it doesn’t make a difference what app you use to watch videos - you’ll find the exact same ones regardless of the platform, since they’re all working from the same data.
  This also means that different video platforms can provide different services for users without locking in users to their platform. Platforms would be forced to compete on what they provide to the user experience, not how well they can lock in users to their platform.
  - oDot 7 days ago
    
    Exactly right.
    Watch apps will compete on consumer-facing features like the recommendation algorithm -- maybe they'll offer several, or just one that differentiates them.
    Hosting providers will compete on producer-facing features, like advertising, content policies, analytics, etc.
    If a user is displeased with either, they can take all of their content/activity history and leave.

omnicarinha 8 days ago

One thing I still haven't quite grasped about BlueSky is whether it's a decentralized platform or not... ATProto seems technically capable of supporting decentralized platforms.

jazzyjackson 8 days ago

Bsky wants to be one entry point into a decentralized network, but there's little incentive to spin up your own Personal Data Server, since you're still subject to the moderation of the one front end everyone uses (AppView in ATProto parlance). There's still less incentive to host your own front end, since you'd just be burdening yourself with all the same moderation problems bsky is trying to stay on top of.
IMO the devs have been so overburdened with trying to nail moderation that they're actually disincentivized from onboarding new populations, since multiple entry points to the network just land in their lap as more difficult moderation problems. That is, they're still figuring out how to moderate people on their own servers and haven't yet decided how they're going to moderate a federation of servers with cultures different from their own.
I don't think they're avoiding the big problems, but it does seem like they're taking the slow, careful route. Maybe this is for the best.
shafyy 8 days ago

In theory it is decentralized. But compared to Mastodon, for example, it's pretty centralized in practice. I haven't come across people running their own larger servers, like I do on Mastodon.
- danabramov 8 days ago
  
  Note the shape of decentralization is very different from Mastodon: there's no concept of "running a Bluesky instance". What you can run is a personal server to host your data (which would work for any atproto apps, not just Bluesky). The Bluesky web app (which is run by Bluesky) would aggregate data from your server (and all other servers on the network).
  Unlike Mastodon, you don't have people running copies of the Bluesky app because it is simply unnecessary — each copy would "see" the same network. If you wanted to fork the Bluesky product (e.g. different branding, different moderation decisions, different product decisions) then yes, you'd run your own product on your own backend and it would be able to ingest Bluesky app data (and vice versa, the Bluesky app would be able to ingest the data from your product).
  - shafyy 7 days ago
    
    Sure, but I still haven't seen people really doing that.
    
    danabramov 7 days ago
    
    Which part are you referring to by “that”? There are definitely people self-hosting their data (not a lot, because the process is pretty technical and manual atm). Note that you can always move hosts (without asking permission from the previous host), so you can start on Bluesky's hosting and switch later.
    
    shafyy 7 days ago
    
    I mean it does not seem like there's a bunch of personal data servers or web app instances for people to choose from, like there is on Mastodon (I know it's not the same technically). Therefore, it looks like Bluesky is much more centralized than Mastodon. I wonder if that's because it's technically harder to run your own software for Bluesky or what?
wmf 8 days ago

It's pretty decentralized. You can run your own PDS, relay, and appview (some of these are more expensive than others). I'm not sure if you can configure the official clients to use an alternate server.
- nunobrito 8 days ago
  
  Decentralization in this social network context means that users can access data from other users even when third parties don't want them to.
  That platform is (today) a centralized walled garden. As others detailed, it is difficult for anyone to add new servers and even more difficult to convince the official client to support them.
  It is a complete contrast to NOSTR that has zero official servers and zero official clients to access the data. It has hundreds of relays from different people, along with several clients from different developers that compete for your preference.
  - GaryNumanVevo 8 days ago
    
    There are already alternative clients that support custom AppViews and relays
    
    nunobrito 8 days ago
    
    It's described as a "3rd party client", and one of its goals is to connect with NOSTR: https://docs.bsky.app/blog/feature-skyfeed
    Twitter also had 3rd party clients until one day they turned off the switch.
    In NOSTR there is no "3rd party client" label, because nobody can lock down your data with the flip of a switch or block your use of the platform.
    
    GaryNumanVevo 7 days ago
    
    What's not to understand here? The PDS isn't a relay, it's a repository. The data layer is decoupled from the message passing layer, that's it.
    
    nunobrito 7 days ago
    
    That's decoupling, not decentralization.
  - pfraze 8 days ago
    
    What are you talking about? It’s not difficult to add servers or get the official client to use them.
brianolson 8 days ago

atproto PDSes are like blog servers with RSS (but better) and bsky.app is the prevailing RSS reader. It's an open protocol because anyone can host a source and anyone can run a different reader.
GaryNumanVevo 8 days ago

BlueSky is just a reference implementation using AT Proto. They namespace anything bsky related in the lexicon as such.

sebstefan 8 days ago

So the user's repo is decentralized but the event-log services and view services are centrally managed by bluesky?

danabramov 8 days ago

There's a few layers to the system.
- Identity layer: This is where your identity information (public key, current domain handle, current user repo host) is stored, essentially as a piece of JSON. You can think of it as similar to npm registry where each record is self-verifiable (you can verify we haven’t tampered with it). This layer is very thin. It is currently centrally managed by Bluesky but in the longer term we intend to upstream it into neutral governance outside of the company — potentially similar to ICANN.
- User repo hosting: We provide user hosting as a service for people who sign up to Bluesky (and choose the default option) but you can run your own too. The server itself is open source (we publish both TypeScript source code and a Docker container to run it). We also publish a spec so you can implement it from scratch if you'd like. Essentially, it needs to be able to enumerate records and to provide a WebSocket to listen to their updates. I'd say this layer is already decentralized because anyone can participate in it and run their own server.
- Relay: As an optimization (you don't want your app backend to listen to websockets for every single user repo in the system), we run a node that aggregates and caches the entire known network. That node's called a Relay. It's an optimization and not strictly necessary to the protocol. It's open source. We run the only actively used relay at the moment, but there's nothing stopping you from running your own (at the current usage rate, ingesting all content on the network into your own relay would cost you ~$150/mo). If atproto gains adoption, we expect some major stakeholders to run their own relays for different purposes: big tech companies might want to run them to ensure infrastructure independence, governments might want to run them if they have significantly different opinions on what type of content is acceptable on the entire network, and so on.
- Application backends (view services): These are just normal web app backends so they're decentralized in the same way the web is decentralized. Bluesky's backend is managed by Bluesky, but your own app's backend will be managed by you. You can also create a backend that ingests Bluesky's atproto data (which is kind of the point of atproto). That would let you create complementary or competing products using the same identity system and information already on the network.
Hope this helps!
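
The layering above can be sketched as a toy in-memory model: repo hosts emit record updates, a relay merges the streams of many hosts, and each app backend indexes only the collections it cares about. This is an illustration under simplifying assumptions, not how the real services are implemented: callbacks stand in for WebSockets, and `RepoHost`, `Relay`, `AppBackend`, the DIDs, and the `com.example.status` collection are all hypothetical names.

```python
from collections import defaultdict


class RepoHost:
    """User repo hosting: stores records and notifies subscribers of writes."""

    def __init__(self):
        self.records = {}      # (did, collection, rkey) -> record
        self.subscribers = []  # callables standing in for WebSocket listeners

    def put(self, did, collection, rkey, record):
        self.records[(did, collection, rkey)] = record
        for notify in self.subscribers:
            notify(did, collection, rkey, record)


class Relay:
    """Optional optimization: merges the event streams of many hosts into one."""

    def __init__(self, hosts):
        self.subscribers = []
        for host in hosts:
            host.subscribers.append(self._forward)

    def _forward(self, *event):
        for notify in self.subscribers:
            notify(*event)


class AppBackend:
    """An ordinary app backend that indexes a single collection."""

    def __init__(self, relay, collection):
        self.collection = collection
        self.index = defaultdict(list)  # did -> list of records
        relay.subscribers.append(self._ingest)

    def _ingest(self, did, collection, rkey, record):
        if collection == self.collection:
            self.index[did].append(record)


host_a, host_b = RepoHost(), RepoHost()
relay = Relay([host_a, host_b])
posts = AppBackend(relay, "app.bsky.feed.post")
statuses = AppBackend(relay, "com.example.status")

host_a.put("did:example:alice", "app.bsky.feed.post", "1", {"text": "hello"})
host_b.put("did:example:bob", "com.example.status", "1", {"status": "online"})
host_b.put("did:example:bob", "app.bsky.feed.post", "1", {"text": "hi"})

print(dict(posts.index))
print(dict(statuses.index))
```

Two different backends see the same underlying data regardless of which host stored it, which is the property the comment above relies on for competing or complementary products.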
GaryNumanVevo 8 days ago

Yes and no. The relay is run by Bluesky, but only as a matter of practicality because it requires a large footprint to subscribe to all the PDS events. Others have written custom AppViews and clients already. I run a "one man relay" that only scrapes my PDS, puts it into an appview (which doesn't do much) and I can see that on a basic client that I wrote.
The whitepaper clarifies a lot of this: https://arxiv.org/abs/2402.03239

badgersnake 8 days ago

I was expecting something about modems.

ATDT (555)-COOL-BBS

(Totally decentralised btw)

kragen 8 days ago

the nanpa is not decentralized at all, though it does delegate phone number assignment to local telecom companies, and nowadays even to sip providers
but yeah it seems pretty suboptimal that they decided to reuse the name of the protocol you use to talk to most cellular modems
- lifthrasiir 8 days ago
  
  In fact, there had been some complaints [1] about the `at` URI scheme itself as well, even though its registration itself is valid as per RFC 7595 (First Come First Served for provisional entries).
  [1] https://www.iana.org/assignments/uri-schemes/expert-notes/at...
- badgersnake 8 days ago
  
  > the nanpa is not decentralized at all
  Sure but BBSs tend to be. And then you’ve got systems like fidonet to connect them up.
  - kragen 8 days ago
    
    i don't know if you've ever been a regional coordinator but fidonet is not that decentralized either, though the pstn (and nowadays the internet) do put limits on how much power such offices can wield
    each bbs is usually very centralized