OpenFreeMap survived 100k requests per second

(blog.hyperknot.com)

140 points | by hyperknot 3 hours ago

11 comments

  • colinbartlett 2 hours ago
    Thank you for this breakdown and for this level of transparency. We have been thinking of moving from MapTiler to OpenFreeMap for StatusGator's outage maps.
    • hyperknot 1 hour ago
      Feel free to migrate. If you ever worry about High Availability, self-hosting is always an option. But I'm working hard on making the public instance as reliable as possible.
  • ch33zer 22 minutes ago
    Since the limit you ran into was the number of open files, could you just raise that limit? I get blocking the spammy traffic, but theoretically, could you have handled more if that limit had been raised?
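A minimal sketch of what "raising that limit" means at the process level, using Python's standard `resource` module on Linux or macOS. The 65536 target is an arbitrary assumption, not OpenFreeMap's actual setting. An unprivileged process can only raise its soft limit up to the hard limit. For nginx itself the equivalent knob is the `worker_rlimit_nofile` directive, and systemd units have `LimitNOFILE=`.

```python
import resource


def raise_nofile_limit(target: int = 65536) -> int:
    """Raise this process's soft RLIMIT_NOFILE toward `target` and return the new soft limit.

    An unprivileged process may raise its soft limit up to, but not past, the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit unless the hard limit is unlimited.
    desired = target if hard == resource.RLIM_INFINITY else min(target, hard)
    # Only raise, never lower, and leave an already-unlimited soft limit alone.
    if soft != resource.RLIM_INFINITY and soft < desired:
        resource.setrlimit(resource.RLIMIT_NOFILE, (desired, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    # 65536 is an arbitrary example target (assumption), not OpenFreeMap's real setting.
    print("soft RLIMIT_NOFILE:", raise_nofile_limit())
```

Whether a higher limit alone would have let the origin absorb more traffic is the open question in the comment; the limit only removes one ceiling.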
  • rtaylorgarlock 58 minutes ago
    Is it always/only 'laziness' (derogatory, I know) when caching isn't implemented by a site like wplace.live? Why wouldn't they spare OpenFreeMap all that traffic, when a caching server on their side could presumably serve tiles almost as fast as OpenFreeMap, or faster?
    • VladVladikoff 47 minutes ago
      I actually have a direct answer for this: priorities. I run a fairly popular auction website, and we serve map tiles via Stadia Maps. We spend about $80/month on this service for our volume. We could definitely get this cost down to a lower tier by caching the tiles and serving them from our proxy. However, we simply haven't had the time to work on this yet, as there is always some other task with higher priority.
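Roughly what "caching the tiles and serving them from our proxy" could look like, as a minimal in-memory LRU sketch. `TileCache` and the `fetch_upstream` callable are hypothetical names, the sketch assumes the provider's terms allow caching, and a real deployment would more likely put an HTTP cache such as nginx's `proxy_cache` in front of the provider.

```python
from collections import OrderedDict
from typing import Callable, Tuple

TileKey = Tuple[int, int, int]  # (zoom, x, y)


class TileCache:
    """In-memory LRU cache in front of a tile provider (hypothetical sketch)."""

    def __init__(self, fetch_upstream: Callable[[int, int, int], bytes], max_tiles: int = 10_000):
        self._fetch = fetch_upstream  # hypothetical callable that hits the paid provider
        self._max = max_tiles
        self._tiles: "OrderedDict[TileKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        if key in self._tiles:
            self._tiles.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._tiles[key]
        self.misses += 1
        data = self._fetch(z, x, y)  # only cache misses reach the provider
        self._tiles[key] = data
        if len(self._tiles) > self._max:
            self._tiles.popitem(last=False)  # evict the least recently used tile
        return data


# Usage with a stand-in upstream:
cache = TileCache(lambda z, x, y: f"tile {z}/{x}/{y}".encode(), max_tiles=2)
cache.get(3, 4, 2)
cache.get(3, 4, 2)
print(cache.hits, cache.misses)  # 1 1
```

Only misses generate upstream requests, so the fraction of traffic the provider sees is roughly one minus the hit ratio.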
    • hyperknot 39 minutes ago
      We are talking about an insane amount of data here. It was 56 Gbit/s (or 56 x 1 Gbit servers 100% saturated!). This is not something a "caching server" could handle. We are talking on the order of CDN networks, like Cloudflare, to be able to handle this.
      • ndriscoll 9 minutes ago
        I'd be somewhat surprised if nginx on an Intel N150 couldn't saturate a 10 Gbit link serving static files, so I'd expect six $200 mini PCs to handle it. I'd think the expensive part would be the hosting and connectivity.
      • wyager 25 minutes ago
        > or 56 x 1 Gbit servers 100% saturated

        Presumably a caching server would be on 10GbE, 40GbE, or 100GbE.

        56 Gbit/s of pre-generated data is definitely something you can handle from 1 or 2 decent servers, assuming each request doesn't generate a huge number of random disk reads or something.
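A back-of-the-envelope check of both estimates: how many servers it takes to carry the 56 Gbit/s peak at different link speeds, assuming (as both comments do) that each server can push its link at line rate, and ignoring headroom and redundancy.

```python
import math

PEAK_GBIT_S = 56  # peak egress quoted in the thread


def servers_needed(peak_gbit_s: float, link_gbit_s: float) -> int:
    """Smallest number of servers whose links, fully saturated, carry the peak."""
    return math.ceil(peak_gbit_s / link_gbit_s)


for link in (1, 10, 40, 100):
    print(f"{link:>3} GbE: {servers_needed(PEAK_GBIT_S, link)} server(s)")
```

This gives 56, 6, 2 and 1 servers respectively: the "56 x 1 Gbit" figure, the six mini PCs, and the "1 or 2 decent servers".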

    • markerz 46 minutes ago
      It looks like a fun website, not a for-profit one. The expectations and focus of fun websites are more about just getting it working than handling scale. It sounds like their user base exploded overnight, doubling every 14 hours or so. It also sounds like it's either a solo dev or a small group, based on the maintainers' wording.
  • jspiner 1 hour ago
    The cache hit rate is amazing. Is there something you implemented specifically for this?
  • eggbrain 1 hour ago
    Limiting by referrer seems strange — if you know a normal user makes 10-20 requests (let’s assume per minute), can’t you just rate limit requests to 100 requests per minute per IP (5x the average load) and still block the majority of these cases?

    Or, if it’s just a few bad actors, block based on JA4/JA3 fingerprint?
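A minimal sketch of the per-IP limit suggested above (100 requests per minute), as an in-process sliding-window log. `SlidingWindowLimiter` is a hypothetical name; it keeps state in memory on a single server and never forgets idle keys, whereas in practice this would more likely be a CDN rate-limiting rule at the edge.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the caller would answer HTTP 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=100, window=60.0)
# 100 requests from one (documentation-range) IP pass; the 101st is rejected.
print(all(limiter.allow("203.0.113.7") for _ in range(100)), limiter.allow("203.0.113.7"))
```

A heavy but legitimate explorer and a script look the same to this limiter, which is the trade-off raised in the reply.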

    • hyperknot 1 hour ago
      What if one user really wants to browse around the world and explore the map? I remember spending half an hour in Google Earth desktop, just exploring interesting places.

      I think referer-based limits are better: this way I can ask heavy users to please choose self-hosting instead of the public instance.
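A rough sketch of referer-based accounting: reduce each `Referer` header to its hostname and count requests per site, so heavy sites can be identified and contacted. `RefererQuota` and the quota value are hypothetical and not necessarily how OpenFreeMap implements this. The header is trivially spoofable, but by default browsers still send the origin on cross-origin HTTPS requests, which is all this needs.

```python
from collections import Counter
from typing import List, Optional
from urllib.parse import urlsplit


def referer_host(referer: Optional[str]) -> str:
    """Reduce a Referer header to its hostname; missing or unparsable values share one bucket."""
    if not referer:
        return "(none)"
    try:
        host = urlsplit(referer).hostname  # lowercased by urlsplit
    except ValueError:  # e.g. a malformed IPv6 literal
        return "(none)"
    return host or "(none)"


class RefererQuota:
    """Count requests per referring site and flag sites above a daily quota (hypothetical)."""

    def __init__(self, daily_quota: int):
        self.daily_quota = daily_quota
        self.counts: "Counter[str]" = Counter()

    def record(self, referer: Optional[str]) -> bool:
        """Record one request; return False once the site is over its quota."""
        host = referer_host(referer)
        self.counts[host] += 1
        return self.counts[host] <= self.daily_quota

    def heavy_sites(self) -> List[str]:
        return [h for h, n in self.counts.most_common() if n > self.daily_quota]

    def reset(self) -> None:
        """Call once per day (e.g. from a scheduler) to start a new window."""
        self.counts.clear()


quota = RefererQuota(daily_quota=2)  # tiny quota just for the demo
for ref in ["https://a.example/map", "https://a.example/x", "https://a.example/y", None]:
    quota.record(ref)
print(quota.heavy_sites())  # ['a.example']
```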

  • LoganDark 1 hour ago
    > I believe what is happening is that those images are being drawn by some script-kiddies.

    Oh, absolutely not. I've seen so many autistic people literally just no-lifing and collaborating on huge artworks on wplace. It is absolutely not just script kiddies.

    > 3 billion requests / 2 million users is an average of 1,500 req/user. A normal user might make 10-20 requests when loading a map, so these are extremely high, scripted use cases.

    I don't know about that either. Users don't just load a map, they look all around the place to search for and see a bunch of the art others have made. I don't know how many requests is typical for "exploring a map for hours on end" but I imagine a lot of people are doing just that.

    I wouldn't completely discount automation, but these usage patterns are far from impossible, especially since wplace didn't expect the sudden popularity, so they may not have optimized their traffic patterns as much as they could have.

    • Karliss 30 minutes ago
      I just scrolled around for 2-3 minutes with the network monitor open. That alone resulted in 500 requests and 5 MB transferred (after filtering for vector tile data). I'm not sure how many of those were served from the browser cache with no request at all, cached by the browser but revalidated by exchanging only headers, or cached by Cloudflare. I'm guessing the typical 10-20 requests/user case is for an embedded map fragment, like the ones commonly found on contact pages, where most users don't scroll at all or at most zoom out slightly to see the rest of the city.
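The numbers in this subthread can be lined up directly: 3 billion requests over 2 million users is 1,500 requests per user, and Karliss' sample of about 500 requests in 2-3 minutes puts that at only 6-9 minutes of browsing. This ignores the browser and Cloudflare caching that couldn't be separated out, so it is a rough estimate only.

```python
TOTAL_REQUESTS = 3_000_000_000  # from the article quote above
USERS = 2_000_000

avg_per_user = TOTAL_REQUESTS / USERS  # 1500.0 requests per user


def browsing_minutes(target_requests: float, sample_requests: int, sample_minutes: float) -> float:
    """Minutes of browsing needed to reach target_requests at a sampled request rate."""
    rate_per_minute = sample_requests / sample_minutes
    return target_requests / rate_per_minute


# Karliss' sample: about 500 tile requests in 2-3 minutes of scrolling.
for minutes in (2, 3):
    print(f"{minutes} min sample -> {browsing_minutes(avg_per_user, 500, minutes):.0f} min to reach 1,500 requests")
```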
    • nemomarx 1 hour ago
      There are some user scripts to overlay templates on the map and coordinate working together, but I can't imagine that increases the load much. What might is that wplace has been struggling under the load, so you have to refresh to see your placed pixels or any changes, and that could be causing more calls per hour?
  • willsmith72 26 minutes ago
    so 96% availability = "survived" now?

    but interesting write-up. If I were a consumer of OpenFreeMap, I would be concerned that such an availability drop was only detected by user reports

    • timmg 10 minutes ago
      96% during a unique event. I think you would typically look at the long term for a stat like that.

      Assuming it was close to 100% the rest of the year, that works out to 99.97% over 12 months.
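How a 96% event turns into a yearly figure depends on how long the event window was, which the thread doesn't state. A minimal sketch, assuming all other time was at 100%: a window of about three days (72 hours, an assumption) reproduces the 99.97% figure.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def yearly_availability(event_availability: float, event_hours: float) -> float:
    """Annual availability if the only downtime happened during one event window."""
    downtime_hours = (1 - event_availability) * event_hours
    return 1 - downtime_hours / HOURS_PER_YEAR


# Assumption: the 96% figure covers roughly a 72-hour window (not stated in the thread).
print(f"{yearly_availability(0.96, 72):.4%}")  # 99.9671%
```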

    • ndriscoll 20 minutes ago
      If I were a consumer of a free service from someone who will not take your money to offer support or an SLA (i.e. is not trying to run a business), I would assume there's little to no monitoring at all.
  • v5v3 1 hour ago
    The article mentions Cloudflare, so how much of this was cached by them?
    • alessandroberna 1 hour ago
      99.38%
    • do_anh_tu 1 hour ago
      Do you even read the article?
      • jwilk 52 minutes ago
        From the HN Guidelines <https://news.ycombinator.com/newsguidelines.html>:

        > Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".

      • keketi 1 hour ago
        Are you new? Nobody actually reads the articles.
        • LorenDB 1 hour ago
          False. I almost never upvote an article without reading it, and half of those upvotes are because I already read something similar recently that gave me the same information.
  • fnord77 1 hour ago
    sounds like they survived 1,000 req/s and the Cloudflare CDN survived 99,000 req/s
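Applying the 99.38% cache hit ratio quoted above to the 100k req/s peak gives the origin load, assuming that figure is a per-request hit ratio and that it held at peak (the article may report an average over the whole event).

```python
def origin_requests_per_s(edge_rps: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the CDN cache and reach the origin."""
    return edge_rps * (1 - cache_hit_ratio)


print(round(origin_requests_per_s(100_000, 0.9938)))  # 620
```

At a flat 99% the same function gives the 1,000 req/s estimate.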
  • feverzsj 1 hour ago
    So, OFM was hit by another Million Dollar Homepage for kids.
  • charcircuit 1 hour ago
    >Nice idea, interesting project, next time please contact me before.

    It's impossible to predict that one's project may go viral.

    >As a single user, you broke the service for everyone.

    Or you did, by not having a high enough fd limit. Blaming sites for using it too much when you advertise that there is no limit is not cool. It's not like wplace themselves were maliciously hammering the API.

    • 010101010101 1 hour ago
      Do you expect him to just let the service remain broken, or to scale up at infinite cost to himself on this volunteer project? He worked with the project author to find a solution that works for both and does not degrade service for every other user, under literally no obligation to do anything at all. This isn't Anthropic deciding to throttle users paying hundreds of dollars a month for a subscription. Constructive criticism is one thing, but feeling entitled to something run by an individual volunteer for free is absurd.
      • charcircuit 1 hour ago
        We are talking about hosting a fixed set of static files. This should be a solved problem. This is nothing like running large AI models for people.
        • 010101010101 1 hour ago
          The nature of the service is completely irrelevant.
          • charcircuit 59 minutes ago
            Running a no limit service for free definitely depends on the marginal cost of serving a single request.
    • columb 1 hour ago
      You are so entitled... Because of you, most nice things have "no limits, but...". Stress testing someone's infrastructure is not cool. Not cool. The author of this post is more than understanding: he tried to fix it and offered a solution even after blocking them. On a free service.

      Show us what you have done.

      • charcircuit 1 hour ago
        >You are so entitled

        That's how agreements work. If someone says they will sell a hamburger for $5, and another person pays $5 for a hamburger, then they are entitled to a hamburger.

        >On a free service.

        It's up to the owner to price the service. Being overwhelmed by traffic when there are no limits is not a problem limited only to free services.

        • perching_aix 15 minutes ago
          > Do you offer support and SLA guarantees?

          >

          > At the moment, I don’t offer SLA guarantees or personalized support.

          From the website.

    • rikafurude21 1 hour ago
      The funny part is that his service didn't break: Cloudflare's cache caught 99% of the requests. Just wanted to feel powerful and break the latest viral trend.