80386 Memory Pipeline

(nand2mario.github.io)

110 points | by wicket 4 days ago

4 comments

rep_lodsb 19 hours ago
Again a very interesting look at how this chip works internally!
I've decoded the entry point PLA of the 80286 (not the actual microcode though). It also has separate entries for real and protected mode, but only for segment loads from a general purpose register, HLT, and for those opcodes that aren't allowed in real mode like ARPL.
Loading a segment register from memory on the 286 uses the same microcode in both modes, as does everything else that would certainly have to act differently, like jump/call far. That was a bit surprising, since it would have to decide at run time which mode it's in. Is this the same on the 386?
Tested on my 286 machine what happens when opcodes are decoded while in real mode but executed after PE is set: Segment load from memory works (using protected mode semantics), whereas the load from register only changes the visible selector and nothing else. The base in the descriptor cache keeps whatever was set there before -- I assume on the 386, SBRM would update the base the same way it does in real mode in that situation, because it's also used for V86 mode there. Illegal-in-real-mode instructions trap, but do so correctly using the protected mode IDT.
Also seems like executing three pre-decoded instructions without a jump after setting PE causes a triple fault for some reason.
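Here is a minimal Python sketch of the kind of lookup rep_lodsb describes. It assumes a simple table model. The opcode classes and entry labels (`MODE_SPECIFIC`, `SHARED`, `SEG_LOAD_REAL`, and so on) are hypothetical names for illustration, not the real 80286 PLA contents or microcode addresses. Only a few classes get separate real/protected entries chosen by the PE bit at decode time. Everything else shares one entry. As nand2mario describes for the 386, that shared microcode checks PE itself when it executes.

```python
# Hypothetical model of an entry-point PLA that takes the PE bit as an input.
# Labels are illustrative only; they are not real 80286 microcode addresses.

MODE_SPECIFIC = {
    # opcode class: (entry if decoded in real mode, entry if decoded in protected mode)
    "mov_sreg_reg": ("SEG_LOAD_REAL", "SEG_LOAD_PROT"),
    "hlt": ("HLT_REAL", "HLT_PROT"),
    "arpl": ("INVALID_OPCODE", "ARPL"),  # not allowed in real mode
    "str": ("INVALID_OPCODE", "STR"),    # not allowed in real mode
}

SHARED = {
    # same entry regardless of the mode at decode time
    "mov_sreg_mem": "SEG_LOAD_MEM",
    "jmp_far": "JMP_FAR",
    "call_far": "CALL_FAR",
    "nop": "NOP",
}


def entry_point(opcode_class: str, pe_at_decode: bool) -> str:
    """Pick a microcode entry point from the opcode class and the PE bit seen at decode."""
    if opcode_class in MODE_SPECIFIC:
        real_entry, prot_entry = MODE_SPECIFIC[opcode_class]
        return prot_entry if pe_at_decode else real_entry
    return SHARED[opcode_class]


def run_seg_load_mem(pe_at_execute: bool) -> str:
    """Shared entry: the microcode branches on PE when it actually runs."""
    return "descriptor lookup" if pe_at_execute else "base = selector << 4"


print(entry_point("str", pe_at_decode=False))  # decoded in real mode -> invalid opcode entry
```

In this model, a memory-source segment load decoded in real mode but executed after PE is set still takes the protected-mode path, because the decision happens in `run_seg_load_mem`. That matches the experiment described above.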
- nand2mario 10 hours ago
  Nice findings. For segment loads from memory, the entry point is actually shared between real and protected mode on the 386. The microcode branches later based on PE and does the extra descriptor work only in protected mode. So maybe it's done similarly on the 286.
  The decode vs. execution behavior is more interesting. From both Intel docs and my own core, PE is effectively checked in both stages independently, but decode happens ahead of execution (prefetch queue). So if an instruction is decoded in real mode, it’ll still follow the real-mode path even if PE is set before it executes.
  That’s exactly why Intel requires a jump right after setting PE — it flushes the prefetch queue and forces re-decode in protected mode. As the 80386 System Software Writer’s Guide (Ch. 6.1) puts it: "Instructions in the queue were fetched and decoded while the processor was in real mode; executing them after switching to protected mode can be erroneous."
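The decode/execute split can be sketched as a toy Python model. It assumes a fixed decode lookahead of a few instructions. The `lookahead` value, the class names and the instruction names (`set_pe`, `jmp_next`, `mov_sreg_reg`) are made up for illustration. Real queue depth and timing differ. Each queued instruction keeps the PE value it saw at decode. A jump to the next instruction, like `jmp short $+2`, flushes the queue so that what follows is re-decoded with the new PE. The model does not attempt to reproduce the triple fault mentioned above, whose cause is unknown.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Decoded:
    index: int          # position in the program (used to compute the jump target)
    name: str
    pe_at_decode: bool  # PE value latched when the decoder handled this instruction


class ToyCPU:
    """Toy decode-ahead model: the decoder runs up to `lookahead` instructions
    ahead of execution, and each queued instruction keeps the PE bit it saw."""

    def __init__(self, program, lookahead=3):
        self.program = program
        self.lookahead = lookahead  # assumed queue depth, not the real 286/386 value
        self.next_decode = 0
        self.pe = False
        self.queue = deque()
        self.trace = []

    def _fill(self):
        while len(self.queue) < self.lookahead and self.next_decode < len(self.program):
            i = self.next_decode
            self.queue.append(Decoded(i, self.program[i], self.pe))
            self.next_decode += 1

    def run(self):
        self._fill()
        while self.queue:
            insn = self.queue.popleft()
            self.trace.append((insn.name, "prot" if insn.pe_at_decode else "real"))
            if insn.name == "set_pe":
                self.pe = True                     # like LMSW / MOV CR0 setting PE
            elif insn.name == "jmp_next":
                self.queue.clear()                 # control transfer flushes the queue
                self.next_decode = insn.index + 1  # like jmp short $+2
            self._fill()
        return self.trace


print(ToyCPU(["set_pe", "mov_sreg_reg", "jmp_next", "mov_sreg_reg"]).run())
# [('set_pe', 'real'), ('mov_sreg_reg', 'real'), ('jmp_next', 'real'), ('mov_sreg_reg', 'prot')]
```

The segment load right after `set_pe` still carries the real-mode decode. Only the one after the jump is re-decoded as protected mode.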
- bell-cot 15 hours ago
  > Also seems like executing three pre-decoded instructions without a jump after setting PE causes a triple fault for some reason.
  It's been a while, but I recall Intel documenting that a jump was required almost immediately after setting PE. Probably because documenting "you must soon jump" was easy, whereas handling the complexities of decoded-real/executed-PE, and documenting how that worked, would have been a giant PITA.
  The two-instruction grace period was to let you load a couple segment or descriptor table registers or something, which were kinda needed for the jump. And that triple fault - if you failed to jump in time - sounds right in line with Intel's "when in doubt, fault or halt" philosophy for the 286.
  - rep_lodsb 12 hours ago
    Well, Intel documented that the very first instruction after enabling protected mode had to be an "intra-segment" (not inter-segment) jump, to flush the prefetch queue. At least that was what it said in the 286 and 386 documents I read. You were supposed to set up everything else needed before that, do this near jump, and then jump to the new protected mode code segment.
    Some later documentation contradicted this, saying that instead this first jump had to be to the protected mode segment.
    From the patent (US4442484), it is apparent that the processor decodes opcodes into a microcode entry point before they are executed, and the PE bit is one of the inputs for the entry point PLA. So that would be the obvious reason for flushing the prefetch queue - but it turns out that at least on the 80286, most instructions go to the same entry point regardless of the mode they are decoded in. So they should work the same without flushing the queue.
    And yet for some reason, what I've seen in my experiments is that the system would reset if there were three instructions following the "LMSW" without a jump. Even something harmless like "NOP" or "MOV AX,AX", that couldn't be different between real and protected mode. Maybe there is some clock phase where the PE bit changing during the decoding of an instruction leads to an invalid entry point, that either causes a triple fault or resets the processor?
    - rasz 6 hours ago
      I disassemble and read a lot of vintage bioses for fun. Recently I looked at something more ~recent, an Atom N270 945GSE Mini-ITX industrial board from 2010. Phoenix bios:
        seg000:FD56 Unreal_FFD56    proc near       ; CODE XREF: CPU_MicrocodeUpdate+A↑j
        seg000:FD56                                 ; VGA_BIOS_Shadow+20↑p ...
        seg000:FD56                 lgdt    fword ptr cs:[bx]
        seg000:FD5A                 mov     eax, cr0
        seg000:FD5D                 or      al, 1
        seg000:FD5F                 mov     cr0, eax
        seg000:FD62                 jmp     short $+2
        seg000:FD64 ; ---------------------------------------------------------------------------
        seg000:FD64
        seg000:FD64 loc_FFD64:                      ; CODE XREF: Unreal_FFD56+C↑j
        seg000:FD64                 mov     ax, 8
        seg000:FD67                 mov     ds, ax
        seg000:FD69                 assume ds:nothing
        seg000:FD69                 mov     es, ax
        seg000:FD6B                 assume es:nothing
        seg000:FD6B                 mov     eax, cr0
        seg000:FD6E                 and     al, 0FEh
        seg000:FD70                 mov     cr0, eax
        seg000:FD73                 jmp     short $+2
        seg000:FD75 ; ---------------------------------------------------------------------------
        seg000:FD75
        seg000:FD75 loc_FFD75:                      ; CODE XREF: Unreal_FFD56+1D↑j
        seg000:FD75                 xor     ax, ax
        seg000:FD77                 mov     ds, ax
        seg000:FD79                 assume ds:nothing
        seg000:FD79                 mov     es, ax
        seg000:FD7B                 assume es:nothing
        seg000:FD7B                 retn
        seg000:FD7B Unreal_FFD56    endp
      two short jumps, no far jumps in sight. Apparently works just fine on Pentium 4, Core 2s and Atoms.
      - rep_lodsb 2 hours ago
        Yes, the far jump was never necessary on any processor, only a convention. You can stay in the same segment as in real mode and it will continue to work. But some kind of control transfer to flush the queue must be done shortly after the LMSW / MOV CR0, or things may break in ways that I'm not entirely clear on.
        My test code looked like this:
                  mov  ax,1           ;new MSW
                  mov  bx,TestSel     ;pointer to selector value into BX
                  mov  dx,[bx]        ;and load into DX
                  mov  cl,31          ;shift count for delay
                  cli                 ;disable interrupts
                  lgdt [Gdtr]
                  lidt [Idtr]
                  jmp  enter_pm       ;flush queue now
                  align 2
          enter_pm:                   ;go!
                  rol  cl,cl          ;delay while following instructions decode
                  lmsw ax             ;set PE bit
                  mov  es,[bx]        ;should load selector 0x0010 into ES
                  mov  ds,dx          ;should set DS base to 0x00100 [NOPE]
                  str  ax             ;should trap because not allowed in real mode
                  ud2                 ;trap anyway in case it didn't
        On the 286, this always caused the processor to reset. Replacing one of the two segment load instructions with a same-length "mov ax,ax" didn't change that, but removing one of them did.
        In that case the "str ax" acted as the control transfer that flushed the queue (it was still decoded in real mode, so it went to the "invalid opcode" entry point). I have no clue what exactly causes the reset when three instructions are run from the queue. Maybe some timing issue related to when the PE bit actually changes vs. what the decoder is doing at that point?
mrlonglong 21 hours ago
Voodoo mode is the ultimate test. Imagine having access to 4GB of memory from real mode.
- jmmv 21 hours ago
  Shameless plug for my https://blogsystem5.substack.com/p/beyond-the-1-mb-barrier-i... article from a couple of years ago. You'll find a deep dive on unreal mode (I just learned it's also known as "voodoo mode") and some hands-on code to play with it ;-P
  - mrlonglong 18 hours ago
    There are things like DOS/4GW that you can use as loaders.
  - bombcar 19 hours ago
    I want vooDOS 5.0 which is 32-bit clean :)
- vardump 18 hours ago
  Most people called it unreal mode.
  https://en.wikipedia.org/wiki/Unreal_mode
- nand2mario 8 hours ago
  Yep. The real-mode segment-load microcode (as shown in the post) does not reset the limit to 64KB. That is why returning to real mode with a large limit such as 4GB still in the descriptor cache gives you "unreal mode".
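  That effect can be sketched in a few lines of Python. The sketch assumes only the behavior described in the comment: a real-mode segment load writes the selector and sets base = selector << 4 but leaves the cached limit alone, while a protected-mode load copies base and limit from the descriptor. The GDT contents and all class and function names are hypothetical. Access rights and other descriptor fields are ignored. The load sequence mirrors the Phoenix routine quoted above: load selector 8 with PE set, clear PE, then reload 0.

```python
from dataclasses import dataclass


@dataclass
class SegReg:
    selector: int = 0
    base: int = 0        # hidden descriptor-cache base
    limit: int = 0xFFFF  # hidden descriptor-cache limit (64 KB after reset)


# Hypothetical GDT: selector 0x08 -> flat data segment, base 0, limit 4 GB - 1.
GDT = {0x08: (0x0000_0000, 0xFFFF_FFFF)}


def load_sreg(seg: SegReg, selector: int, pe: bool) -> None:
    """Load a segment register; how the descriptor cache is updated depends on PE."""
    seg.selector = selector
    if pe:
        seg.base, seg.limit = GDT[selector]  # protected mode: copy from descriptor
    else:
        seg.base = selector << 4             # real mode: base only, limit untouched


def linear_address(seg: SegReg, offset: int) -> int:
    """Add the cached base after checking the offset against the cached limit."""
    if offset > seg.limit:
        raise ValueError("offset exceeds cached limit; real hardware raises an exception")
    return seg.base + offset


# Sequence modeled on the Phoenix routine: selector 8 with PE set, then 0 with PE clear.
ds = SegReg()
load_sreg(ds, 0x08, pe=True)
load_sreg(ds, 0x0000, pe=False)
print(hex(ds.base), hex(ds.limit), hex(linear_address(ds, 0x10_0000)))  # 0x0 0xffffffff 0x100000
```

  Back in real mode, DS has the usual base of 0 but keeps the 4 GB limit, so offsets above 64 KB pass the limit check. A fresh real-mode load of selector 0x0010 gives base 0x00100, the value rep_lodsb's test expected for DS.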
andyjohnson0 21 hours ago
Interesting article. I learned some things.
How hard would it be for Mr Github to add RSS/Atom feeds, I wonder?
blueybingo 20 hours ago
wasn't this basically the consensus among numerical analysts like 20 years ago? i remember reading similar arguments in goldberg's paper and various game dev forums circa 2005, so genuinely curious what keeps making this idea feel "new" to each generation of programmers who rediscovers it
- curiousObject 19 hours ago
  I think you’re missing the point. This person is implementing various CPU cores on FPGA. The insights they can share from that complex process are sometimes interesting, because they are looking at the system from a new angle.
  https://nand2mario.github.io/projects/
  - rep_lodsb 18 hours ago
    OP probably meant to post in another thread: https://news.ycombinator.com/item?id=47767398