Microservices Should Form a Polytree

(bytesauna.com)

54 points | by mapehe 4 days ago

21 comments

  • muvlon 1 hour ago
    Avoiding cyclic dependencies is good, sure. And they do name specific problems that can happen in counterexample #1.

    However, the reasoning as to why it can't be a general DAG and has to be restricted to a polytree is really tenuous. They basically just say counterexample #2 has the same issues with no real explanation. I don't think it does, it seems fine to me.

    • henryfjordan 1 hour ago
      An AuthN/Z system would probably end up looking like counterexample #2, which immediately raised a red flag for me about the article.
      • waterproof 3 minutes ago
        Yeah if services can't be used by multiple other services, then what's the point?
  • closeparen 6 minutes ago
    Here's a really simple way to get a cycle.

    Service A: publish a notification indicating that some new data is available.

    Service B: consume these notifications and call back to service A with queries for the changed data and perhaps surrounding context.

    What would you recommend when something like this is desired?

  • didibus 1 hour ago
    I might have a different take. I think microservices should each be independent such that it really doesn't matter how they end up being connected.

    Think more actors/processes in a distributed actor/csp concurrent setup.

    Their interface should therefore be hardened and not break constantly, and they shouldn't each need deep knowledge of the intricate details of each other.

    Also for many system designs, you would explicitly want a different topology, so you really shouldn't restrict yourself mentally with this advice.

    • rcxdude 1 hour ago
      Well, in practice you're likely to have hard dependencies between services in some respect, in that the service won't be able to do useful work without some other service. But I agree that in general it's a good idea to have a graceful degradation of functionality as other services become unavailable.
      • nyrikki 1 hour ago
        As we are talking about micro services, K8s has two patterns that are useful.

        A global namespace root with sub-namespaces that hold just the desired config and the current config, with the complexity hidden in the controller.

        The second is closer to your issue above, but it is just dependency inversion: the kubelet has zero info on how to launch a container, make a network, or provision storage, and hands that off to CRI, CNI, or CSI.

        Those are hard dependencies that can follow a simple wants/provides model, and depending on context often is simpler when failures happen and allows for replacement.

        E.g., you probably wouldn’t notice if crun or runc are being used, nor would you notice that it is often systemd that is actually launching the container.

        But finding those separations of concerns can be challenging. And K8s only moved to that model after suffering from the pain of having them in tree.

        I think a DAG is a better aspirational default though.

      • didibus 1 hour ago
        Right, I don't mean that no service depends on each other, but that they can treat each other like a black box.
    • throwaway894345 1 hour ago
      I agree with this, and also I’m confused by the article’s argument—wouldn’t this apply equally to components within a monolith? Or is the idea that—within a monolith—all failures in any component can bring down the entire system anyway?
      • marcosdumay 1 hour ago
        > wouldn’t this apply equally to components within a monolith?

        It's a nearly universal rule you'll want on every kind of infrastructure and data organization.

        You can get away for some time with making things linked by offline or pre-stored resources, but it's a recipe for an eventual disaster.

  • Lucasoato 1 hour ago
    > Even without a directed cycle this kind of structure can still cause trouble. Although the architecture may appear clean when examined only through the direction of service calls the deeper dependency network reveals a loop that reduces fault tolerance increases brittleness and makes both debugging and scaling significantly more difficult.

    While I understand the first counterexample, this one seems a bit blurry. Can anybody clarify why a directed acyclic graph whose underlying undirected graph is cyclic is bad in the context of microservice design?

    • isotropy 43 minutes ago
      Without necessarily endorsing the article's ideas....I took this to be like the diamond-inheritance problem.

      If service A feeds both B and C, and they both feed service D, then D can receive an incoherent view of what A did, because nothing forces B and C to keep their stories straight. But B and C can still both be following their own spec perfectly, so there's no bug in any single service. Now it's not clear whose job it is to fix things.
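      To make the diamond concrete, here is a minimal Python sketch. The service classes, the price field, and the idea that C caches A's answer are all hypothetical, made up for illustration; none of it comes from the article.

```python
# Hypothetical diamond: A feeds B and C, and both feed D.
class ServiceA:
    def __init__(self):
        self.price = 100

    def get_price(self):
        return self.price


class ServiceB:
    """Reads A on every request, so it is always fresh."""

    def __init__(self, a):
        self.a = a

    def quote(self):
        return self.a.get_price()


class ServiceC:
    """Caches A's answer at startup, which its own (assumed) spec allows."""

    def __init__(self, a):
        self.cached_price = a.get_price()

    def invoice(self):
        return self.cached_price


class ServiceD:
    """Combines B and C and silently assumes they agree about A."""

    def __init__(self, b, c):
        self.b = b
        self.c = c

    def is_consistent(self):
        return self.b.quote() == self.c.invoice()


a = ServiceA()
b, c = ServiceB(a), ServiceC(a)
d = ServiceD(b, c)

consistent_before = d.is_consistent()
a.price = 120  # A changes; B sees it immediately, C still serves its cache
consistent_after = d.is_consistent()
```

      B and C each behave as designed, yet D ends up with two different stories about A once the price changes.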

  • scuff3d 1 hour ago
    This is a fair enough point, but you should also try to keep that tree as small as possible. You should have a damn good reason to make a new service, or break an existing one in two.

    People treat the edges on the graph like they're free. Like managing all those external interfaces between services is trivial. It absolutely is not. Each one of those connections represents a contract between services that has to be maintained, and that's orders of magnitude more effort than passing data internally.

    You have to pull in some kind of new dependency to pass messages between them. Each service's interface has to be documented somewhere. If the interface starts to get complicated you'll probably want a way to generate code to handle serialization/deserialization (which also adds overhead).

    In addition, to share code, instead of just having a local module (or whatever your language uses) you now have to manage a new package. It either has to be built and published to some repo somewhere, or it has to be a git submodule, or you just end up copying and pasting the code everywhere.

    Even if it's well architected, each new service adds a significant amount of development overhead.

  • adamwong246 1 hour ago
    the problem with "microservices" is the "micro". Why we thought we need so many tiny services is beyond me. How about just a few regular sized services?
    • dragonwriter 1 hour ago
      At the time “microservices” was coined, “service oriented architecture” had drifted from being an architectural style to being associated with implementation of the WS-* technical standards, and was frequently used to describe what were essentially monoliths with web services interfaces.

      “Microservices” was, IIRC, more about rejecting that and returning to the foundations of SOA than anything else. The original description was that each would support a single business domain (sometimes described as “business function”, and this may be part of the problem, because in some later descriptions, perhaps through a version of the telephone game, this got shortened to “function” and without understanding the original context...)

    • andix 1 hour ago
      They were never meant to be tiny, in the sense of just a few hundred lines of code.

      The name was probably chosen poorly and led to a lot of confusion.

    • edude03 1 hour ago
      Kind of - AFAIK "micro" was never actually thoroughly defined. In my mind I think of it as mapping to one table (i.e., users = user service, balances = balances service) but that might still be a "full service" worth of code if you need anything more than basic CRUD
      • dragonwriter 1 hour ago
        The original sense was one business domain or business function (which often would include more than one table in a normalized relational db); the broader context was that, given the observation that software architecture tends to reflect software development organization team structure, software development organizations should parallel business organizations and that software serving different business functions should be loosely coupled, so that business needs in any area could be addressed with software change with only the unavoidable level of friction from software serving different business functions, which would be directly tied to the business impacts of the change on those connected functions, rather than having unrelated constraints from coupling between unrelated (in business function) software components inhibiting change driven by business needs in a particular area.
    • 9rx 47 minutes ago
      "Micro" refers to the economy, not the technology. A service in the macro economy is provided by another company. Think of a SaaS you use. Microservices takes the same model and moves it under the umbrella of a micro economy (i.e. a single company). Like traditional SaaS, each team is responsible for their own product, with communication between teams limited to sharing of documentation. You don't get to call up a developer when you need help.

      It's a (human) scaling technique for large organizations. When you have thousands of developers they can't possibly keep in communication with each other. You have to draw a line between them. So, we draw the line the same way we do at the global scale.

      Conway's Law, as usual.

  • vedhant 4 days ago
    This actually makes a lot of sense. I have one question though. Why is having 2 microservices depend on a single service a problem?
    • Neywiny 2 hours ago
      The explanation given makes sense. If they're operating on the same data, especially if the result goes to the same consumer, are they really different services? On the other hand, if the shared service provides different data to each, is it really one microservice or has it started to become a tad monolithic in that it's one service performing multiple functions?

      I like that the author provides both solutions: join (my preferred) or split the share.

      • nightpool 1 hour ago
        I don't understand this. Can you help explain it with a more practical example? Say that N1 (the root service) is a GraphQL API layer or something. And then N2 and N3 are different services feeding different parts of that API—using Linear as my example, say we have a different service for ticket management and one for AI agent management (e.g. Copilot integration). These are clearly different services with different responsibilities / scaling needs / etc.

        And then N4 is a shared utility service that's responsible for e.g. performance tracing or logging or something similar. To make the dependency "harder", we could consider that it's a shared service responsible for authentication and authorization. So it's clear why many root services are dependent on it—they need to make individual authorization decisions.

        How would you refactor this to remove an undirected dependency loop?

        • whstl 53 minutes ago
          Yeah, a lot of cross-cutting concerns fall into this pattern: logging, authorization, metrics, audit trails, feature-flags, configuration distribution, etc

          The only way I can see to avoid this is to have all those cross-cutting concerns handled in the N1 root service before they go into N2/N3, but it requires having N1 handle some things by itself (eg: you can do authorization early), or it requires a lot of additional context to be passed down (eg: passing flags/configuration downstream), or it massively overcomplicates others (eg: having logging be part of N1 forces N2/N3 to respond synchronously).

          So yeah, I'm not a fan of the constraint from TFA. It being a DAG is enough.
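          A rough Python sketch of the "handle it in N1 and pass context down" option, to show what it costs. Every class name, the token format, and the scope names are hypothetical; this is only one way the idea could look, not something from TFA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    """Everything N2/N3 need so they never have to call N4 themselves."""
    user_id: str
    scopes: frozenset


class AuthService:
    """Stands in for N4, the shared auth service."""

    def __init__(self):
        self.calls = 0

    def authorize(self, token):
        self.calls += 1
        # Hypothetical token format: "<user>:<scope>,<scope>"
        user_id, _, scopes = token.partition(":")
        return RequestContext(user_id, frozenset(s for s in scopes.split(",") if s))


class TicketService:
    """Stands in for N2. It trusts the context handed down by N1."""

    def list_tickets(self, ctx):
        if "tickets.read" not in ctx.scopes:
            raise PermissionError("missing scope tickets.read")
        return [f"ticket for {ctx.user_id}"]


class AgentService:
    """Stands in for N3. Same deal: no direct call into N4."""

    def run_agent(self, ctx):
        if "agents.run" not in ctx.scopes:
            raise PermissionError("missing scope agents.run")
        return f"agent started for {ctx.user_id}"


class Gateway:
    """Stands in for N1, the only caller of the auth service."""

    def __init__(self, auth, tickets, agents):
        self.auth, self.tickets, self.agents = auth, tickets, agents

    def handle(self, token):
        ctx = self.auth.authorize(token)  # the single call into N4
        return self.tickets.list_tickets(ctx), self.agents.run_agent(ctx)


auth = AuthService()
gateway = Gateway(auth, TicketService(), AgentService())
result = gateway.handle("alice:tickets.read,agents.run")
```

          The undirected loop is gone, but only because N2 and N3 now trust whatever context N1 hands them, and every new cross-cutting concern widens that context object.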

      • suspended_state 1 hour ago
        I think it does indeed make a lot of sense in the particular example given.

        But what if we add 2 extra nodes: n5 dependent on n2 alone, and n6 dependent on n3 alone? Should we keep n2 and n3 separate and split n4, or should we merge n2 and n3 and keep n4, or should we keep the topology as it is?

        The same sort of problem arises in a class inheritance graph: it would make sense to merge classes n2 and n3 if n4 is the only class inheriting from them, but if you add more nodes, then the simplification might not be possible anymore.

      • throwaway894345 1 hour ago
        Most components need to depend on an auth service, right? I don’t think that means it’s all necessarily one service (does all of Google Cloud Platform or AWS need to be a single service)?
        • Spivak 1 hour ago
          That's immediately what I thought of. You'll never be able to satisfy this rule when every service has lines pointing to auth.

          You'll probably also have lines pointing to your storage service or database even if the data is isolated between them. You could have them all be separate but that's a waste when you can leverage say a big ceph cluster.

  • advisedwang 51 minutes ago
    Requiring that no service is depended on by two services is nonsense.

    You absolutely want the same identity service behind all of your services that rely on an identity concept (and no, you can't just say a gateway should be the only thing talking to an identity service - there are real downstream use cases such as when identity gets managed).

    Similarly there's no reason to have multiple image hosting services. It's fine for two different frontends to use the same one. (And don't just say image hosting should be done in the cloud --- that's just a microservice running elsewhere)

    Same for audit logging, outbound email or webhooks, acl systems (can you imagine if google docs, sheets, etc all had distinct permissions systems)

    • liampulles 44 minutes ago
      I agree with you. It's interesting when I look at the examples you provide, that they are all non-domain services, so perhaps that is what codifies a potential rule.
  • pavlov 42 minutes ago
    If a service n4 can't be called by separate services n2 and n3 in different parts of the tree (as shown in counterexample #2), then n4 isn't really a service but just a module of either n2 or n3 that happens to be behind a network interface.
  • jayd16 2 hours ago
    It's about the same for most code all the way down to single threaded function flow.
    • sethammons 53 minutes ago
      Yes! This is not unique to microservices.

      If you look at this proposal and reject it, I question your experience. My experience is not doing this leads to codebases so intertwined that organizations grind to a halt.

      My experience is in the SaaS world, working with orgs from a few dozen to several thousand contributors. When there are a couple dozen teams, a system not designed to separate out concerns will require too much coordinated effort to develop against.

  • Perz1val 1 hour ago
    Rule #2 sounds dumb. If there can't be a single source of truth, for let's say permission checking, that multiple other services rely on, how would you solve that? Replicate it everywhere? Or do you allow for a new business requirement to cause massive refactors to just create a new root in your fancy graph?
    • kaashif 1 hour ago
      This is exactly the example I thought of and came here to post.

      The rule is obviously wrong.

      I think just having no cycles is good enough as a rule.

  • rco8786 2 hours ago
    Is there any way to actually enforce this in reality? Eventually some leaf service is going to need to hit an API on an upstream node or even just 2 leaf nodes that need to talk to each other.
    • jayd16 2 hours ago
      IAM roles.

      Said less snarky, it should be trivial to define and restrict the dependencies of services (although there are many ways to do that). If it's not trivial, that's a different problem.
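
      One of those ways could be a declared, deny-by-default allowlist that is checked in CI or at a proxy. The sketch below is plain Python, not IAM policy syntax; the service names, the table, and the "observed calls" list are all hypothetical.

```python
# Hypothetical deny-by-default allowlist: caller -> services it may call.
ALLOWED_CALLS = {
    "n1": {"n2", "n3"},
    "n2": {"n4"},
    "n3": set(),
    "n4": set(),
}


def may_call(caller, callee, allowed=ALLOWED_CALLS):
    """Return True only if the edge caller -> callee was declared up front."""
    return callee in allowed.get(caller, set())


def undeclared_calls(observed, allowed=ALLOWED_CALLS):
    """Given observed (caller, callee) pairs, list the ones that were never declared."""
    return [(a, b) for a, b in observed if not may_call(a, b, allowed)]


# Assumed sample of calls seen in traces; the last two break the declared graph.
observed = [("n1", "n2"), ("n2", "n4"), ("n4", "n1"), ("n3", "n2")]
violations = undeclared_calls(observed)
```

      This only enforces whatever graph was declared; it says nothing about whether the business will later need an edge the rule forbids.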

      • rco8786 1 hour ago
        I don't mean that. I mean that eventually the business is going to need some feature that requires breaking the acyclic rule.
      • otterley 1 hour ago
        Since you called the problem “trivial,” we can now all depend on you to resolve these problems for us at little cost, correct?
        • nineteen999 58 minutes ago
          The solution requires AWS since the gp thinks that's the only access control mechanism that matters. So I doubt there is going to be little cost about it.
  • andix 1 hour ago
    In reality their structure is much more like the Box with Christmas lights I just got from the basement. It would take a knot theory expert half a day to analyze what’s happening inside the box.
  • nicodjimenez 1 hour ago
    This seems completely wrong. In an RPC call you have a trivial loop, for example.

    It would make more sense to say that the event tree should not have any cycles, but anyway this seems like a silly point to make.

    • nicodjimenez 1 hour ago
      My main take on microservices at this point is that you only want microservices to isolate failure modes and for independent scaling. Most IO bound logic can live in a single monolith.
  • anomaloustho 59 minutes ago
    Why do we use polytree in this context instead of DAG? Because nodes can’t ever come back together?
    • duped 25 minutes ago
      The author is not saying you should use a polytree but rather that the ideal graph of microservices should also be a polytree.

      A polytree has the property that there is at most one path between any two nodes. If you think of this as a dependency graph, for each node in the graph you know that none of its dependencies have shared transitive dependencies.

      I'll give it one though: if there are no shared transitive dependencies then there cannot be version conflicts between services, where two otherwise functioning services need disparate versions of the same transitive dependency.
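
      For anyone who wants to test their own service graph, here is a minimal Python sketch of the distinction: Kahn's algorithm for "no directed cycle" and union-find for "no cycle in the underlying undirected graph". The node names and the three sample graphs are assumptions based on how this thread describes the article's examples. It does not require the graph to be connected, so strictly it accepts polyforests.

```python
from collections import defaultdict, deque


def is_dag(nodes, edges):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    indegree = {n: 0 for n in nodes}
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        for nxt in out[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return seen == len(nodes)


def is_forest(nodes, edges):
    """Union-find: no undirected cycle iff no edge joins two already-connected nodes."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a == root_b:
            return False
        parent[root_a] = root_b
    return True


def classify(nodes, edges):
    if not is_dag(nodes, edges):
        return "cyclic"
    if not is_forest(nodes, edges):
        return "dag-only"
    return "polyforest"


nodes = ["n1", "n2", "n3", "n4"]
diamond = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]
tree = [("n1", "n2"), ("n1", "n3"), ("n2", "n4")]
loop = [("n1", "n2"), ("n2", "n1")]
```

      Here `classify(nodes, diamond)` gives "dag-only", which is the case most of this thread is arguing about: n4 is reachable from n1 through both n2 and n3.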

  • cientifico 1 hour ago
    Services (or a set of Microservices) should mimic teams at the company. If we have a polytree, that should represent departments.
    • mkarrmann 1 hour ago
      Microservices should have clear owners reflected in the org chart, but the topology of dependencies should definitely not be isomorphic to your org chart.
  • webstrand 1 hour ago
    Oh that's weird, in the hacker news search index, this link was posted 4 days ago.
  • jamesbelchamber 2 hours ago
    Good practical explanation of something I felt but couldn't put a name to.
  • buster 2 hours ago
    Isn't it the same wisdom as to avoid cyclic dependencies?
    • rhelz 1 hour ago
      It is not only that. An acyclic graph can be non-planar, which means that as you add more nodes, the number of edges can grow as O(n^2).

      A polytree is a planar graph, and the number of edges must grow linearly with the number of nodes.
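
      A quick Python sketch of the edge counts, assuming the densest possible DAG (an edge from every earlier node to every later node) against a simple chain as one example of a polytree. The function names are made up for illustration.

```python
from itertools import combinations


def densest_dag_edges(n):
    """Edges i -> j for every i < j: acyclic, because edges only point 'forward'."""
    return list(combinations(range(n), 2))


def chain_polytree_edges(n):
    """A chain 0 -> 1 -> ... -> n-1, one simple example of a polytree."""
    return [(i, i + 1) for i in range(n - 1)]


# Map n -> (edges in densest DAG, edges in a chain polytree)
counts = {
    n: (len(densest_dag_edges(n)), len(chain_polytree_edges(n)))
    for n in (4, 10, 100)
}
```

      The DAG count is n(n-1)/2 while any connected polytree has exactly n-1 edges, so at 100 services that is 4950 possible contracts versus 99.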

  • mapehe 4 days ago
    Hi, this is my company blog. Hope you like this week's post.
  • itsthecourier 1 hour ago
    just imagine how many clients services like auth, notifications and so on have.

    Polytrees look good, they don't work on orthogonal services