Donkey Kong Country 2 and Open Bus

(jsgroth.dev)

231 points | by colejohnson66 18 hours ago

11 comments

  • deater 18 hours ago
    I have to say as a 6502 assembly programmer I have wasted many hours of my life tracking down the same issue in my code (forgetting to put an # in front of an immediate value and thus accidentally doing a memory access instead). Often it's like this case too where things might accidentally work some of the time.
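
    For example, the entire difference is one character (operand value made up):

        LDA #$02   ; assembles to A9 02 - immediate: A = the constant $02
        LDA $02    ; assembles to A5 02 - zero page: A = the byte stored at address $0002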

    Worse than the floating bus in this example is when the code depends on uninitialized RAM, whose power-on contents are often consistent for a given machine's DRAM chips. The code will always work on your machine/emulator but won't on someone else's machine with different DRAM (invariably you catch this at a demoparty, when it won't run on the party machine and you have only 15 minutes to fix it before your demo is about to be presented)

    • anonymousiam 17 hours ago
      Was there ever an architecture that used dynamic memory with a 6502 CPU? In my (limited?) experience, that platform always had static RAM.
      • retrac 17 hours ago
        Most of them. Static RAM was (and still is) more expensive, since it needs more transistors and chip area per bit stored. It is, however, much easier to interface, since it doesn't need refresh circuitry. This is why you see it in the earliest designs, and also why you see it in so many hobbyist designs. It's also why you tend to see it in the video systems even if the rest of the machine uses DRAM: dealing with DRAM refresh while reading out the whole memory chip sequentially (while also having a second port to read/write from the CPU!) starts making things very complicated.

        Still, DRAM is what you would use for a "real" system. Wozniak's design for the Apple II used a clever hack: the memory bus effectively runs at 2 MHz, with the CPU getting 1 MHz of it. Any read from a DRAM row refreshes the entire row, and on every other bus cycle the video system steps incrementally through memory, refreshing as it goes.

        • rzzzt 16 hours ago
          Same with the VIC-II and the 6510 in the Commodore 64. The video chip is given the main character role for the bus, stopping the CPU from moving forward if it needs cycles for video generation or DRAM refresh.
          • phire 4 hours ago
            The clever thing about the Apple II is that there are no refresh cycles. Woz laid out the screen buffer in memory in such a way that simply scanning out the screen will touch every single row of DRAM.

            This is more about saving chips than saving cycles, since the Apple II was implemented entirely with 74 series logic. A more traditional approach that used spare cycles during horizontal blanking would have required several more chips.

            It does mean that the layout of the Apple II's screen memory is somewhat insane. Those DRAM chips needed to be refreshed every 2 ms, but it takes 16 ms to scan out a whole screen, so each eighth of the screen scan has to touch all 128 DRAM rows.

      • tom_ 10 hours ago
        The mid-1980s Acorn 8-bit range all used dynamic RAM for the onboard memory.

        The BBC Micro range all had 250 ns DRAM, with the CPU getting two million accesses per second and the video getting the other two million (taking advantage of the 6502's predictable RAM access timing). The display memory fetches served to refresh the RAM.

        I don't know much about the Acorn Electron, which was very different internally, but it had dynamic RAM as well. I expect the video refresh was used to refresh the DRAM in this case too, as the display memory layout was the same, so every 640 microseconds it would touch every possible address LSB.

        The 6502 second processor had DRAM as well, refreshed by a circuit that ran on a timer and stole the occasional cycle from the CPU at some rate.

        Though static RAM was quite common for RAM upgrade boards (of one kind or another), presumably cheaper for this case than the alternative.

      • wk_end 17 hours ago
        Well, the SNES - if that counts, it's a 65816 - uses DRAM. This is especially noteworthy because the DRAM refresh is actually visible on-screen on some units:

        https://www.retrorgb.com/snesverticalline.html

      • adrian_b 16 hours ago
        There must have been computers with 6502 and DRAM.

        For higher memory capacities, e.g. 32 kB, 48 kB, or 64 kB, static RAM would have been too expensive and too big, even though the 6502, unlike the Zilog Z80, had no integrated DRAM refresh support.

        Using SRAM instead of DRAM meant using 4 times more IC packages, e.g. 32 packages instead of 8, while the DRAM controller required by DRAM added only 1 to 4 IC packages. Frequently the display controller could also be used to ensure DRAM refresh.

      • deater 17 hours ago
        I think you'll find more systems used DRAM than SRAM.

        The Apple II was one of the first 6502 systems to use DRAM (in 1977), and Woz was incredibly clever in getting the refresh for free as a side effect of the video generation.

      • Braxton1980 14 hours ago
        Are you thinking of SDRAM (a type of DRAM)?
        • anonymousiam 14 hours ago
          I appreciate all of the responses. I did development on a KIM-1 and I owned a SYM-1. Both of these used static RAM. I expanded the RAM in my SYM-1 from 4K to 8K (with eight 2114 static RAM chips). I never owned any other 6502 based computers.
    • RiverCrochet 15 hours ago
      6502 was my first assembly language, and I always thought of instructions like "LDA #2" as "load A with the number 2" versus LDA 2 (load A with what's in memory location 2).
    • bartread 15 hours ago
      This is the kind of situation where feeding your code through an LLM can actually be helpful: they're really good at spotting the kind of errors/typos like this that have a profound impact but which our eyes all too easily scan over.
      • bogantech 9 hours ago
        Yeah I've been using Claude to review a bunch of m68k asm that I've been working on and it's been helpful at catching silly mistakes like using a direct address instead of an immediate value, clobbering registers, incorrect branches etc.

        Of course, if you just blindly ask it to write asm it will occasionally invent new instructions or addressing modes, but it's very good at reviewing and making adjustments

      • nancyminusone 15 hours ago
        The last time I tried an LLM on assembly, it made up instructions that didn't exist.
        • cdelsolar 14 hours ago
          cool; nowadays LLMs are better
          • iforgotpassword 13 hours ago
            Today I used chatgpt for winapi stuff - it made up structs and enums regarding display config. So not too convinced it'll be any good with 6502 asm.
            • nextaccountic 1 hour ago
              It's funny because some time ago (months? years?) people would say that you just didn't prompt the LLM well enough. But now LLMs are better and prompting isn't as arcane as before, so the next frontier is giving them the proper context. See this HN thread currently on the front page:

              https://news.ycombinator.com/item?id=44427757

              • voidUpdate 15 minutes ago
                You also have to be using the exact right model to get reasonable results, which is always the one you have to pay for, not the free one, and also not the one you were using
          • recursive 11 hours ago
            cool; but not better enough
  • anonymousiam 17 hours ago
    I started reading this to understand Open Bus, which was capitalized in the title, so I assumed it was a proper name for some old bus protocol/standard that I'd never heard of.

    After reading, I realized that he just meant that the bus was "open" as in not connected to anything, because the address line decoders had no memory devices enabled at the specified address ($2000).

    It's pretty funny that the omission of the immediate mode (#) went unnoticed until the obsolete emulator didn't behave in the same way as the real hardware when reading <nothing> from memory.

    His solution of changing the instruction to use immediate addressing mode (instead of absolute) would have the consequence of faster execution time, because the code is no longer executing a read from memory. It's probably now faster by about 2us through that blob of code, but maybe this only matters on bare metal and not the emulator, which is probably not time-perfect anyway.
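
    Back-of-the-envelope, for a single 8-bit load on the 65816 (the cycle counts are fixed; the wall-clock time per cycle depends on which memory region the bus is touching):

        LDA $2000   ; absolute: 4 CPU cycles, one of them the (open-bus) data read
        LDA #$00    ; immediate: 2 CPU cycles, no data-bus read at all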

    • wk_end 16 hours ago
      > It's probably now faster by about 2us through that blob of code, but maybe this only matters on bare metal and not the emulator, which is probably not time-perfect anyway.

      (Some) SNES emulators really are basically time-perfect, at this point [0]. But 2us isn't going to make an appreciable difference in anything but exceptional cases.

      [0] https://arstechnica.com/gaming/2021/06/how-snes-emulators-go...

      • BearOso 16 hours ago
        There are actually some issues with clock drift, and speculation about whether original units had an accurate crystal or varied significantly in timing. The only way to figure that out is to go back and ask the designers what the original spec was, and who knows if they remember. So they're not really time-perfect, because the clock speeds can vary by as much as half a percent.
        • NobodyNada 15 hours ago
          It's mostly the audio clock that is susceptible to drift. Everything except the audio subsystem is derived from a single master clock, so even if the master clock varies slightly in frequency, all the non-audio components will remain in sync with each other.

          That means the 2 clock cycles could theoretically make an observable difference if they cause the CPU to miss a frame deadline and cause the game to take a lag frame. But this is rather unlikely.

          • BearOso 14 hours ago
            The CPU has shown some variation, but yes, it's the APU that has a ceramic clock source that isn't even close to the same among units. Apparently those ceramic resonators have a pretty high variation, even when new.

            When byuu/near tried to find a middle ground for the APU clock, the average turned out to be about 1025296 Hz (32040.5 * 32). Some people have tested units recently and gotten an even higher average. They speculate that aging is causing the frequency to increase, but I don't really know if this is the case or if there really was that much of a discrepancy originally.

            It does cause some significant compatibility issues, too, like attract-mode desyncs and random freezes.

      • shadowgovt 16 hours ago
        In general, even SNES games are still doing frame-locking, right? i.e. if you save 2us you're just lengthening the amount of time the code is going to wait for a blanking signal by 2us.
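
        The classic spin being something like this on the SNES, if I have the register right ($4212, HVBJOY; 8-bit accumulator assumed):

            wait:  LDA $4212   ; HVBJOY: bit 7 is set during vertical blanking
                   BPL wait    ; N flag mirrors bit 7, so loop until vblank begins
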
        • wk_end 15 hours ago
          Yeah, exactly. It'd have to be really exceptional cases. For example, exactly one game (Air Strike Patrol) has timed writes to certain video registers to create a shadow effect, but 2us is so minor I don't think it'd appreciably affect even that. Or, like, the SNES has an asynchronous multiplier/divider that returns invalid results while the computation is ongoing, so if you optimized some code you might end up reading back garbage.

          IIRC ZSNES actually had basically no timing; all instructions ran for effectively one cycle. ZSNES wasn't an accurate emulator, but it mostly worked for most games most of the time.

    • shadowgovt 16 hours ago
      Rare has a history of video games that work in testing and have bugs buried in them for years until some novel architecture surfaces them. Not to imply other companies don't; just that Rare is an easy-to-reference name on the topic.

      Donkey Kong 64 has a memory leak that will kill the game after a (for that era) unlikely amount of continuous play time (8-9 hours, if I understand correctly). That was not caught in development, but it is a trivial amount of time to rack up if someone plays the game and saves progress via emulator save states instead of the in-game save feature.

      (Note: there is some ambiguous history here. Some sources claim the game shipping with the Expansion Pak was a last-ditch effort to hide the bug by pushing the crash window out to 13-20 hours instead of 8-9. I think recent research suggests that was a coincidence, and that neither Rare nor Nintendo was aware of the bug when the game shipped.)

  • Dwedit 13 hours ago
    I once encountered SNES Puyo Puyo reading PPU open bus. This was when I was working on the RunAhead feature for RetroArch and checking where savestates failed to match: CPU execution trace logs diverged because a value read from PPU open bus didn't match after loading state.
  • nicetryguy 17 hours ago
    I don't always make 6502(ish) errors, but when I do, it's usually the memory address instead of the immediate! It's a very common and easy mistake to make, and I believe Chuck Peddle himself deeply regretted the (number symbol, pound sign, hashtag) #$1234 syntax for immediate values. I made # appear bright red in my IDE; it helps, a bit... Even the ASM gods at Rare fell victim to the same issue!
    • JoshTriplett 17 hours ago
      I ran into a similar issue a long time ago with the GNU assembler in "intel_syntax noprefix" mode. It has a syntactic ambiguity that makes it possible to interpret a forward-referenced named constant, intended as an immediate, as a reference to an unknown symbol, in any instruction that could accept either an immediate or a memory address. The net result is the instruction being assembled with a placeholder memory address (expected to be filled in by the relocated address of the symbol when linked) rather than the expected immediate. Painful to debug.
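
      If I'm remembering the failure mode correctly, it looks roughly like this (BUFSIZE is a made-up name):

          .intel_syntax noprefix
          mov eax, BUFSIZE     # BUFSIZE is a forward reference here, so instead
                               # of an immediate this can assemble as a load from
                               # the symbol's address, relocated at link time
          .equ BUFSIZE, 1024   # the constant, defined too late
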
      • userbinator 4 hours ago
        TASM IDEAL mode resolves that ambiguity and should've been the standard syntax for x86 Asm in contrast to MASM, and RosAsm syntax is pretty nice too, but GNU as (and its default syntax) is in a wholly different category of insanity that's nearly comparable to HLA.
    • Dwedit 6 hours ago
      Instruction sets like ARM basically made it impossible to make that mistake: you need a different instruction when memory is involved.
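
      For example, in ARM syntax (register choices made up):

          MOV r0, #2      @ immediate: the constant 2 goes into r0
          LDR r0, [r1]    @ memory: every load is an explicit LDR through an address
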
  • userbinator 7 hours ago
    AFAIK "open bus" is found only on these early systems with simple synchronous buses; I believe most others will give a constant value of all zeros or all ones when attempting to access nonexistent addresses, since the bus protocol has handshaking that lets the master know if there's no response (a "master abort" in PCI terminology).
  • NobodyNada 15 hours ago
    Open bus quite literally means that the data bus lines are an open circuit -- the CPU has placed an unmapped or write-only address on the address bus, and no hardware on the bus has responded, so the bus lines aren't being driven and are just floating. Thus, nominally, this is a case of undefined behavior at the hardware level.

    In order to understand what actually happens, we need to look a little closer at the physical structure of a data bus -- you have long conductors carrying the signals around the motherboard and to the cartridge, separated from the ground plane by a thin layer of insulating substrate. This looks a lot like a capacitor, and in fact it is described and modeled as "parasitic capacitance" by engineers who try to minimize it, since this effect limits the maximum speed of data transmission over the bus. But it also means that, whenever the bus is not being driven, it tends to stay at whatever voltage it was last driven to -- just like a little DRAM cell, producing the "open-bus reads return the last value transferred across the bus" effect described in the article.

    It's not uncommon for games to accidentally rely on open-bus effects, like DKC2. On the NES, the serial port registers for connecting to a controller only drive the low-order bits, and the high bits are open bus; there are a few games that read the controller input with the instruction LDA $4016 and expect to see the value $40 or $41 (with the 4 in the high nibble sticking around because of open bus).
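
    Sketched out (label name made up), the pattern those games rely on is:

        LDA $4016    ; bits 0-4 driven by the controller port; bits 5-7 float,
                     ; still holding the $40 from the operand fetch
        CMP #$41     ; so the byte compares equal to $41 when the data bit is set,
        BEQ pressed  ; a check that only works because of open bus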

    There's also speedrun strategies that rely on open-bus behavior as part of memory-corruption or arbitrary-code-execution exploits, such as the Super Mario World credits warp, which sends the program counter on a trip through unmapped memory before eventually landing in RAM and executing a payload crafted by carefully manipulating enemy positions [1].

    But there are some exceptions to the usual predictable open-bus behavior. Nonstandard cartridges could return a default value for unmapped memory, or include pull-up or pull-down resistors that change the behavior of open bus. There's also an interesting interaction with DMA; the SNES supports a feature called HDMA which allows games to schedule DMA transfers from the CPU to the graphics hardware with precise timing, in order to upload data or change settings mid-frame [2]. Such a transfer temporarily pauses the CPU in order to use the bus, which can change the result of an open-bus read if the transfer happens to occur in the middle of an instruction (between reading the target address and performing the actual open-bus read).

    This very niche edge case has a significant impact on a Super Metroid speedrun exploit [3] which causes an out-of-bounds memcpy that attempts to transfer a large block of data from open bus to RAM. The open-bus read almost always returns zero (because the last byte of the relevant load instruction is zero), but when performed in certain rooms with HDMA-heavy graphical effects, there's a good chance that a DMA transfer will affect one of the reads, letting a non-zero byte sneak in somewhere important and causing the exploit to crash instead of working normally. This has created a mild controversy in the community, where some routes and strategies are only reliable on emulators and nonstandard firmwares; a player using original hardware or a very accurate emulator has a high chance of experiencing a crash, whereas most emulators (including all of Nintendo's official re-releases of the game) do not emulate this niche edge case of a mid-instruction HDMA transfer changing the value of an open-bus read.

    Also, the current fastest TAS completion of Super Metroid [4] relies on this HDMA interaction. We found a crash that attempted to execute open bus, but wasn't normally controllable in a useful way; by manipulating enemies in the room to influence CPU timing, we were able to use HDMA to put useful instructions on the bus at the right timing, eventually getting the console to execute controller inputs as code and achieve full arbitrary code execution.

    [1]: https://youtu.be/vAHXK2wut_I

    [2]: https://youtu.be/K7gWmdgXPgk

    [3]: https://youtu.be/CnThmKhtfOs

    [4]: https://tasvideos.org/8214S

    • russellbeattie 12 hours ago
      > ... we need to look a little closer at the physical structure of a data bus

      Once again, I have to give a shout out to Ben Eater, whose video series on making a breadboard computer with the 6502 is why I actually understand what the article is about and what you're referring to when describing the hardware issues. (Obviously, extrapolating from his basic bus example to a commercial machine.) I'd be pretty clueless otherwise.

      https://eater.net

  • mock-possum 18 hours ago
    Love stuff like this, I feel like I’m only ever 60% following the assembly code, so the prose explanation alongside really helps - and it’s fun to hear these ‘bugs that nobody understood or possibly even noticed until now in a classic piece of software’ stories!
    • shadowgovt 16 hours ago
      One of the things I love about this era of systems is that there were none of the modern checks that we consider table-stakes in nearly everything, including most embedded systems (necessary in anything that can be hooked up to a network, and still so cheap that it's included as a nice-to-have in completely isolated embedded architectures).

      Lots of reads and writes on the original NES just toggled voltages on a line somewhere, and then what happened, happened. You got the effect you wanted by toggling those voltages in a very controlled manner, lock-stepped with the signal indicating the CRT's blanking intervals. Some animations in Super Mario Bros 3 worked by toggling a mapper register to select among multiple banks of tile data, so that when the graphics hardware went to fetch sprites, it pulled them from a different bank with slight variations in their look. And since the TV timing mattered, they had to release different software for regions with NTSC and PAL TVs, since those TVs operate at different refresh rates and the refresh rate was the clock that drove the render logic.

      It was a wild time.

  • pipes 17 hours ago
    Whenever I'm playing a game via emulation and I get stuck, I do end up wondering if it's a bug in the emulator. With this particular issue, I would have assumed the game was designed this way and is just difficult.

    Not quite related, but I get a similar feeling if the game seems really tough: "is this due to emulation latency?" I went down a rabbit hole on this one and built myself a MiSTer FPGA!

    • bnjms 17 hours ago
      Chronic Trigger had one like this. I recall there is a section where you catch a rat and have to input four simultaneous key presses after catching it. But USB inputs only forwarded 3 at a time, so to get past this you'd mash all four and eventually get them registered inside the very short timeframe. Took many tries and was very frustrating.
      • mewse-hn 7 hours ago
        Yeah, I hit that bug a long time ago; my keyboard at the time only had "2-key rollover", which I think was standard. Current gaming keyboards have NKRO ("n-key rollover"), which would allow the four "buttons" to be pressed at the same time.
      • Y_Y 17 hours ago
        > Chronic Trigger

        Sometimes it does feel that way...

    • bigstrat2003 14 hours ago
      I only ever played DKC on ZSNES, and I had no idea that this was an emulator bug until reading the article. Like you said, I just assumed that it was the intended game design to time your launch from the barrel so that it was the correct angle. It blew my mind to learn that it was a bug!
    • aidenn0 17 hours ago
      I played a lot of Bionic Commando as a kid. When I loaded it up in an emulator in the early 2000s, it was way harder than I remembered. Then I realized there was an emulation bug where the enemies didn't disappear when you blew up the base, but Ladd still froze; that meant I needed roughly 2 extra life points when clearing a level. Just to see if I could, I did beat it that way once, but never again.
    • pipes 15 hours ago
      Why did I get downvoted for this?
  • jgalt212 11 hours ago
    DKC 1 with the SGI-prerendered 3D graphics was cutting-edge stuff. VectorMan on the Genesis did something similar to less acclaim.
    • frou_dh 38 minutes ago
      As a kid, that look felt somehow off to me because I could sense that it was at some level "fake". The various magazines at the time could be kinda wooly on the subject and imply that the power of the SNES was rendering the characters etc in realtime (rather than it essentially being flipbook animation).
    • xp84 6 hours ago
      I was a kid right in the target demo for DKC circa 95 or so — 11. That game absolutely blew our socks off. We couldn’t believe what we were seeing. I remember I had a videotape that they sent out (probably a send-away promo from a box of cereal or something) that teased the game around the time of its release and showed some behind the scenes stuff from the game’s development. I definitely watched that tape many times. I didn’t even get to own DKC, but I did get to play it at a friend’s house.
  • jihadjihad 18 hours ago
    I know it's OT, but I have to say, for a 30-year-old video game, it's remarkable how well DK Country 2 holds up today. I've been playing it with an emulator and the graphics, sound, level design, and controls are all masterful. The kids can keep Fortnite, I'll take DKC and Chrono Trigger any day!
    • christophilus 17 hours ago
      Chrono Trigger holds up. That game is a masterpiece.
    • pezezin 8 hours ago
      I played the original DKC trilogy for the first time three years ago, and I have to say that I really disliked parts 1 and 2. The controls felt very "floaty" and the difficulty was not well adjusted. The bird stage in DKC 2 was especially rage-inducing; you know which one.

      I really enjoyed DKC 3 though, which apparently is not that popular among hardcore DKC fans, so there is that.

      • ninjin 5 hours ago
        Played a tiny bit of the first one in the 90s, but like you I only played through the trilogy just recently (in my case, last year), and on original hardware. I would not go so far as to say I disliked them, but you are absolutely correct that the controls are floaty (although not as bad as Sonic, which is truly awful), and it is often a challenge to tell where objects and platforms end. The third game has the weakest music, but I would consider a lot of its level and boss design substantially better. Overall, I think there is a heavy amount of nostalgia going on when it comes to the fans, but that is okay as long as we speak openly about it.

        As for difficulty, we will have to agree to disagree. Only point that had me somewhat frustrated was the waterfall boss in the last game and that one had me stumped for some time (should have read the manual I guess?). Overall, I would still recommend the games as good for their time and among the better action platformers for their console, but they really are nowhere close to a masterpiece like Super Mario World that has pretty much perfected controls and you can tell exactly where platforms and objects start and end.

    • aidenn0 17 hours ago
      Most Rare-developed games from that era were really well done.
  • helf 17 hours ago
    I love this sort of content! My favorite things I find on HN :D