3 comments

  • zX41ZdbW 51 minutes ago
    The test does not look realistic: https://github.com/Cranot/grouped-simd-hashtable/blob/master...

    Better to use a few distributions of keys from production-like datasets, e.g., from ClickBench. Most of them will be Zipfian and also have different temporal locality.

  • jeffbee 1 hour ago
    Static size, no deleting. Everyone already knew that you can make faster hash tables when they never need to be resized, but nobody bothers doing that because it is pretty useless or at best niche.
    • dragontamer 1 hour ago
      Well, not to be completely dismissive here... It's clearly a prototype project to try and make quadratic probing a thing.

      I'm not convinced this methodology is better than linear probing (which can then be easily optimized into Robin Hood hashing).

      The only line I see about linear hashes is:

      > Linear jumps (h, h+16, h+32...) caused 42% insert failure rate due to probe sequence overlap. Quadratic jumps spread groups across the table, ensuring all slots are reachable.

      Which just seems entirely erroneous to me. How can linear probing fail? Just keep jumping until you find an open spot. As long as there is at least one open spot, you'll find it in O(n) time because you're just scanning the whole table.

      Linear probing has a clustering problem. But IIRC modern CPUs have these things called L1 Cache/locality, meaning scanning all those clusters is stupidly fast in practice.
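      To make the disagreement concrete: the quoted claim is about fixed-stride group jumps (h, h+16, h+32...), not slot-by-slot linear probing. A fixed stride that shares a factor with the table size only reaches a fraction of the slots, while quadratic (triangular-number) probing on a power-of-two table provably reaches all of them. A minimal sketch (not the project's actual code; table size and stride are assumed for illustration):

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      #define TABLE 64  /* hypothetical power-of-two table: 4 groups of 16 slots */

      /* Count how many distinct slots a probe sequence reaches in TABLE steps. */
      static int coverage(unsigned (*probe)(unsigned h, unsigned i)) {
          bool seen[TABLE] = {false};
          int count = 0;
          for (unsigned i = 0; i < TABLE; i++) {
              unsigned s = probe(0, i) % TABLE;
              if (!seen[s]) { seen[s] = true; count++; }
          }
          return count;
      }

      /* Fixed-stride "linear jump": h, h+16, h+32, ...
       * Reaches only TABLE / gcd(16, TABLE) slots. */
      static unsigned linear16(unsigned h, unsigned i) { return h + 16u * i; }

      /* Quadratic (triangular-number) probing: h + i(i+1)/2.
       * On a power-of-two table the first TABLE triangular numbers
       * form a complete residue system, so every slot is reachable. */
      static unsigned quadratic(unsigned h, unsigned i) { return h + i * (i + 1) / 2; }

      int main(void) {
          int lin = coverage(linear16);
          int quad = coverage(quadratic);
          printf("stride-16 reaches %d/%d slots, quadratic reaches %d/%d\n",
                 lin, TABLE, quad, TABLE);
          assert(lin == 4);       /* 64 / gcd(16, 64) = 4 reachable slots */
          assert(quad == TABLE);  /* triangular numbers cover all residues mod 2^k */
          return 0;
      }
      ```

      Note this only explains insert failures if the probe jumps by whole groups without scanning the 16 slots inside each group first; a plain slot-by-slot linear scan, as the comment above says, always finds an open slot if one exists.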

      • hinkley 31 minutes ago
        Linear probing can hit pretty nasty corner cases in a concurrent system. Particularly one where the table is “warmed up” at start, so that 80% of the eventual size shows up in the first minute of use. If that table is big enough, the pressure to increase the load factor will be high, leading to more probing.

        If you have ten threads all probing at the same time, you can get priority inversion, where the first writer takes the longest to insert: once it hits more than a couple of collisions, later writers that would collide with it end up taking the open slots before it can scan them.

      • jeffbee 1 hour ago
        The comments don't make sense to you because you know what you are talking about, claude does not, and this code was all written by claude.
        • dragontamer 1 hour ago
          Hmmm. That makes me sad, but it does explain the uneasy feeling I got when reading the GitHub page.
    • hinkley 36 minutes ago
      Cliff Click designed a hash table that does concurrent draining of the old table when resizing to a new one. I don’t think he did rate limiting on puts but there are other real time systems that amortize cleanup across all write allocations, which then spreads the cost in a way compatible with deadlines.
  • dana321 2 hours ago
    Would this be possible in Rust?