Tracing the roots of the 8086 instruction set to the Datapoint 2200 minicomputer (righto.com)
135 points by matt_d on Aug 12, 2023 | hide | past | favorite | 49 comments


Great article, especially the opcode charts! Almost all you find online are unfortunately in hex, which obscures the encoding.

The strangest feature inherited from the 8008 is probably the parity flag. There are some uses for it other than 7-bit ASCII comms, but this kind of code is probably heavily 'pessimized' on modern CPUs:

    TEST AL,0xc0
    JZ   quadrant0    ;both bits clear
    JPE  quadrant3    ;both bits set
    JS   quadrant2    ;only bit 7 set
    JMP  quadrant1    ;only bit 6 set
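A quick Python model of that dispatch, assuming x86 flag semantics for TEST (ZF set on a zero result, PF set on even parity of the low byte of the result, SF copied from bit 7):

```python
def quadrant(al):
    # Model of TEST AL,0xc0 followed by JZ / JPE / JS on an x86:
    # TEST computes AL & 0xc0 and sets flags without storing the result.
    r = al & 0xC0
    if r == 0:                        # ZF: both bits clear
        return 0
    if bin(r).count("1") % 2 == 0:    # PF: even number of set bits, so both set
        return 3
    if r & 0x80:                      # SF: bit 7 set (bit 6 clear, or JPE would have fired)
        return 2
    return 1                          # fall through: only bit 6 set

assert [quadrant(v << 6) for v in range(4)] == [0, 1, 2, 3]
```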
And to this day, even the reserved lower flag bits have the same values as they did on the 8080 ('S Z 0 A 0 P 1 C'). I wonder if there is actually code out there that depends on bit 1 being set.

One nitpick:

>The ModR/M byte has been changed in 64-bit mode so the BX (originally HL) register is no longer special when accessing memory

It was 32-bit mode that made EAX, ECX and EDX usable for addressing, though not the 16-bit lower half registers. And nothing with that changed in the transition to 64 bits.


you might be interested in my opcode map for the 8080 in Dercuano (temporary online copy at https://dercuano.github.io/notes/8080-opcode-map.html ); it not only uses octal but also uses the digit order 0, 2, 4, 6, 1, 3, 5, 7 to bring out the structure of the instruction set better

datapoint's manuals for the 2200 instruction set use octal, but intel's 8008 manual uses hex, maybe due to some kind of misguided admiration for ibm


That digit order is an interesting choice to bring out more structure. The downside is the registers and RST instructions get weirdly permuted. (By the way, your link is broken because the closing parenthesis gets appended; I recommend a space before it.)

My suspicion is that Intel's manuals used hex because octal was kind of obsolete by that point.


thanks for the note; i've made the suggested correction

i agree about the registers and rst instructions

i don't see how octal could be obsolete; computers still use binary, and octal is just as useful for doing binary arithmetic as it ever was. as you point out, it's considerably more useful for instruction sets with 3-bit fields. moreover, the great advantage of hexadecimal was that bcd was easy to read, but that stopped mattering around the time the 8008 was introduced

i'm interested to hear if your experience differs, but i think it's a lot easier to do mental arithmetic in octal than in hex; it's a lot easier to remember 5 + 6 = 13, or figure it out if you forget, than d + c = 19, and both the addition and multiplication tables are one fourth the size

as far as i can tell, octal just went out of style. if anything, hexadecimal became obsolete, because its killer advantage went away, while its drawbacks remained

arm leaned into the hexadecimal wave and designed their instruction set to be easily readable in hex. we probably don't have to worry that a possible wide adoption of risc-v will initiate a transition to base 32, though; its instruction formats, while much simpler than thumb's, resist any such easy numerical readability

btw, i don't remember if i ever told you this, but thank you for xfractint


Octal is nice on a 12-bit machine but absolutely awful when using bytes on a 16-bit machine. The problem is that the two bytes of a word are encoded differently in octal. For example, the string "AB" is 4142 in hex while "BA" is 4241; the two bytes are easy to decode. In octal, "AB" is 40502 while "BA" is 41101; you can't see what's going on.
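A quick Python illustration (packing the two character codes into a 16-bit word, high byte first, as in the example above):

```python
# 'A' = 0x41, 'B' = 0x42; pack two bytes into a 16-bit word, high byte first.
def word(s):
    hi, lo = (ord(c) for c in s)
    return (hi << 8) | lo

print(f"{word('AB'):04x} {word('BA'):04x}")  # hex: each byte maps to its own digit pair
print(f"{word('AB'):06o} {word('BA'):06o}")  # octal: digits straddle the byte boundary
```

The middle octal digit mixes bits from both bytes, which is why the byte values disappear.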


yes, that's true; multiplying a byte by 400 to form a word is kind of a pain

i had a heathkit h8 which had an unusual approach to this problem

it displayed memory addresses on its front panel in octal, from 000000 up to 377377


In the context of a CPU's memory model, I wouldn't call it style because a byte--as an abstract term for the smallest addressable unit--is subdivided into nibbles (digits), where the size of a nibble dictates what base makes sense.

A 9-bit byte is subdivided into three 3-bit nibbles/digits (2^3=base-8) [000, 777].

An 8-bit byte (octet) is evenly subdivided into two 4-bit nibbles/digits (2^4=base-16) [00, FF].

In documentation--given that the majority of computers use 8-bit bytes--the base we choose to represent 3-bit fields is a separate topic, and could be called style.


there's something to that

as a side note, i have a quibble with your definition of 'byte'. lots of computers have been word-oriented, with words being generally the smallest addressable unit

word sizes in word-addressed computers have included 12 bits (pdp-8), 16 bits (nova and tandem), 18 bits (pdp-7 and ga144), 20 bits (mup21), 32 bits (zuse z4, chifir, and maybe some ti dsps, not sure), 36 bits (pdp-10, ibm 704, ibm 709), 60 bits (cdc 6600), 64 bits (cray unless you count the parity bits), and so on

it was not normal practice to describe any of these words as a 'byte'; more commonly, you would divide the word into (possibly variable-size) 'bytes', each representing one character, even though you couldn't load those bytes individually from memory


> I wonder if there is actually code out there that depends on bit 1 being set.

Almost certain there is. Demoscene and emulator detection code come to mind.


Thanks, I've fixed that.


I wouldn't consider little-endian "quirky" --- it's the most logical way to do it, since the significance increases with increasing address. AFAIK all arbitrary-precision arithmetic routines use little-endian ordering even on big-endian machines for this reason, as big-endian requires lots of length-dependent terms in calculation.
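As a sketch of why (not any particular library's code), multi-precision addition over little-endian limb arrays is a simple forward loop, with the carry flowing in the direction of increasing index:

```python
def add_bignum(a, b, base=1 << 32):
    """Add two little-endian limb arrays: a[0] is the least significant limb."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % base)
        carry = s // base
    if carry:
        out.append(carry)          # result may be one limb longer than the inputs
    return out

# 0xFFFFFFFF + 1 overflows the first limb into a new one: limbs [0, 1]
assert add_bignum([0xFFFFFFFF], [1]) == [0, 1]
```

With big-endian limbs the loop would have to index from the end of each (possibly different-length) array, which is the length-dependent bookkeeping mentioned above.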

> When the 8086 was released, these registers were renamed to AX, CX, DX, and BX respectively.

It's worth noting that NEC produced the V25/V35, which were basically binary compatible (with some extensions) with the 8086, and presumably to avoid legal issues (which happened anyway), they renamed the instructions and registers; instead of AX CX DX BX SP BP SI DI, they used AW CW DW BW SP BP IX IY --- the latter two reminiscent of the Z80.


You're not going to lure me into a big-endian vs little-endian argument :-)


Without starting the argument, I'd like to know your thoughts - for me as someone who operates several layers of abstraction away from the hardware, the choice seems mostly arbitrary (like Fahrenheit vs Celsius) with each option having both benefits and drawbacks.


Fahrenheit and Celsius are just two units of measure on the same temperature scale, running in the same direction.

Big and little endian are more fundamentally different.

Ken's article mentions that the Datapoint 2200 used shift-register memory that read out one bit at a time, along with a one-bit arithmetic unit:

> Another consequence of shift-register memory was that the Datapoint 2200 was a serial computer, operating on one bit at a time as the shift-register memory provided it, using a 1-bit ALU. To handle arithmetic operations, the ALU needed to start with the lowest bit so it could process carries. Likewise, a 16-bit value (such as a jump target) needed to start with the lowest bit. This resulted in a little-endian architecture, with the low byte first. The little-endian architecture has remained in Intel processors to the present.

Now consider how you add or subtract, or multiply or divide two numbers by hand, where you're working with one decimal place at a time.

Do you start from the big end of the numbers, or the little end?

And if the numbers aren't the same length, where do you align them? At the little end.


Little Endian is not arbitrary.

If you were going to write a routine to add two numbers that were written in ASCII, you'd necessarily need to start in the ones position, moving to the tens and the hundreds. The computer in the article originally had serial memory. Those values would also need to cycle from the smallest value to the largest. The earliest computers were also serial computers because they were the cheapest to produce.


The choice of big-endian is driven by one thing only - humans can read memory dumps in the order they are familiar with. Which is no kind of hardware decision at all.


Lessons learned:

1. Incremental, backward compatible progress wins in the market.

2. Incremental, backward compatible progress leads to nasty, ugly, hard to use hacks that create waste, inefficiency, and user pain.

Explains a lot, really.


As Intel engineer Tom Forsyth said, "The REAL cost of x86 legacy is not gates, it’s lots and lots of meetings."


You have to measure "waste, inefficiency, and user pain" not in a vacuum but by comparison to the alternative. Breaking backwards compatibility also causes waste, inefficiency, and user pain.


Welcome to biological evolution, by the way.


> In 64-bit mode, the 8086's general-purpose registers are extended to sixteen 64-bit registers (and [soon](https://www.intel.com/content/www/us/en/developer/articles/t...) to be 32 registers).

Wait, what?

I thought that Intel AVX10 was limited to vector instructions, but they are re-arranging the instruction set and doubling the number of general-purpose CPU registers.

And adding lots of conditional instructions, so that common loop patterns can be mapped onto a conditional instruction, rather than written out as a branch to another code path.

This architecture extension is called Intel APX, referenced by the AVX10 extension, but I don't recall hearing about it before.

APX will not need any additional hardware space, as they are carving these 64-bit registers out of the deprecated MPX registers.

Memory Protection Extension https://en.m.wikipedia.org/wiki/Intel_MPX


There was some discussion of APX on HN a couple of weeks ago: https://news.ycombinator.com/item?id=36853166#


> Remarkably, Intel has managed to move from 8-bit computers to 16, 32, and 64 bits, while keeping systems mostly compatible.

Well, the move to 64 bit didn't quite turn out as Intel wanted it, and they had to copy AMD's version of it. This is acknowledged in footnote 20, but the article itself sounds like Intel was infallible and never mentions AMD at all...


I remember reading somewhere that Intel also had a backward-compatible 64-bit extension in development, but was beaten to it by AMD when the Itanic turned out to be a flop.


Author here if you have questions about the Datapoint 2200 :-)


Ken, you have this statement in a footnote:

> I haven't been able to find any instruction set before the Datapoint 2200 that required memory addresses to be loaded into a register.

For a direction to look, check the CDC mainframes (the Cybers, the 6600 and 7600 models, and possibly earlier models). A great many years ago now, while I was in college for my EE degree, the college had a pair of CDCs (I think a 6600 and a 7600; I only ever had access to the older one, for a mandatory "assembly programming" class). One of the 'quirks' of the CDC CPU design was that the CPU had (if memory serves) eight data registers and eight address registers, and to perform a memory read, one loaded an address into one of the first six address registers, which would cause a memory read from that address, storing the result in the corresponding data register (so A0 caused a read into D0, etc.).

For a memory write, one loaded an address into either A6 or A7, which caused the CPU to perform a memory write using data from the corresponding data register (A7 caused D7 to be written to memory).

The "production dates" for these CDC systems is relatively contemporary (for a reasonable fudge factor around "contemporary") with the Datapoint terminal, so these might be an example of another architecture from a similar time range (or quite possibly before) that loaded an address into a register in order to access memory.


The CDC 6600 did that for a completely different reason. It's an early superscalar machine. It overlaps memory operations, and even some compute. This is visible to the programmer. The desired programming style is load, load, load, operate, operate, operate, store, store, store. Then the operations can overlap. There's something called the "scoreboard" to stall the pipeline if there's a conflict, but there's no automatic re-ordering.

The tiny machines at the Datapoint 2200 and 80xx level didn't do anything like that.

At the other extreme, there were low-end machines where the registers really were in main memory. The compute/memory speed ratio has changed over time. Today, arithmetic is much faster than memory, but in the late 1960s/early 1970s, arithmetic was often slower than memory on low-end machines.


The 6600 was not a superscalar machine but simply a pipelined processor. Superscalar machines first appeared in the floating point processor of the IBM 360/91 and may well be due to John Cocke (IBM) who generalized the notion. Yale Patt (UC Berkeley, U Michigan, U Texas at Austin) refined the ideas. Most processors designed today have superscalar features.


The 6600 had multiple functional units - 2 floating point multipliers, a divider, two adders, etc.,[1] and if the instruction stream allowed it, many of them could be running at the same time. So it was a superscalar machine.

[1] https://en.wikipedia.org/wiki/CDC_6600#Central_Processor_(CP...


To be superscalar, a processor must initiate multiple instructions per clock cycle. Having multiple functional units isn't sufficient to be superscalar if one instruction is dispatched at a time. You get higher performance from the multiple functional units since the next instruction isn't blocked while the previous one is executing.

According to "Modern Processor Design: Fundamentals of Superscalar Processors", the CDC 6600 was not superscalar because it had scalar instruction issue. This book says the IBM Advanced Computer system was the first superscalar design, but the project was canceled.


This is just a definitional issue. Older thinking was that having several instructions in progress at once was enough to be superscalar. Modern thinking seems to be that you have to initiate multiple instructions on the same clock cycle. Sources differ. Here's a good overview of the CDC 6600.[1]

Multiple execution units yes, multiple operations in progress yes, scoreboard yes, retirement unit no, branch prediction no, reordering no.

[1] https://people.eecs.berkeley.edu/~randy/Courses/CS252.S96/Le...


Well, I can't stop you from using a nonstandard definition :-) The original definition of superscalar from Agerwala and Cocke of IBM was dispatching multiple instructions to the execution units every cycle. This is the same definition used by the other sources I've checked.

There are processors such as the Motorola 88100 and the Intel 80960KA that had multiple functional units and scoreboards, but were not considered superscalar. The follow-on 88110 and 80960CA processors could issue multiple instructions per clock, and were called superscalar by their creators. https://techmonitor.ai/technology/motorola_lifts_the_veil_on... https://ieeexplore.ieee.org/document/63681


Thanks, that's very interesting and I've updated the footnote. I looked at the CDC 6600 manual and it works pretty much like you say. If you change an address register A1-A5, the system automatically reads from memory into the corresponding operand register X1-X5. Similarly, if you change address register A6 or A7, the word in X6 or X7 is stored to memory. It is interesting to look at old systems that do things wildly differently from modern computers.

http://www.bitsavers.org/pdf/cdc/cyber/cyber_70/60045000_660...
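The A/X register coupling described above can be sketched as a toy model (not cycle-accurate, and following the A1-A5 load / A6-A7 store split in the manual rather than the parent's recollection):

```python
class CDC6600Regs:
    """Toy model of the CDC 6600's coupled address (A) and operand (X) registers."""
    def __init__(self, memory):
        self.mem = memory          # word-addressed memory, modeled as a list
        self.A = [0] * 8
        self.X = [0] * 8

    def set_a(self, n, addr):
        self.A[n] = addr
        if 1 <= n <= 5:            # writing A1-A5 triggers a load: Xn <- mem[An]
            self.X[n] = self.mem[addr]
        elif n in (6, 7):          # writing A6/A7 triggers a store: mem[An] <- Xn
            self.mem[addr] = self.X[n]
        # A0/X0 are not coupled to memory

mem = [111, 222, 333]
r = CDC6600Regs(mem)
r.set_a(1, 2)                      # load: X1 gets mem[2]
r.X[6] = 999
r.set_a(6, 0)                      # store: mem[0] gets X6
assert (r.X[1], mem[0]) == (333, 999)
```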


> I haven't come across an instruction set other than Datapoint that treated register and memory accesses identically. (If you know of exceptions, please let me know.)

The DEC PDP-11 was launched in 1970, slightly earlier than Datapoint 2200.

While DEC PDP-11 also had additional addressing modes, its two most simple addressing modes, register direct and register indirect correspond exactly with the addressing modes of Datapoint 2200.

The only difference is how they were encoded. Because PDP-11 had longer instructions, they could afford separate bits for encoding the addressing mode.

So 3 bits encoded the 8 registers and a separate bit encoded whether the register is used as data (register direct) or as the address of data from memory (register indirect).

The Datapoint 2200 needed shorter instructions, so it dispensed with the bit for encoding the addressing mode by using only 7 registers: the unused register number encodes the register indirect addressing mode, with the register holding the address implicit, as no other bits were available to specify it.

So what is special about the addressing modes of the Datapoint 2200 is this encoding trick, which saves 1 bit of instruction encoding at the price of allowing only one of the general-purpose registers to be used for memory addressing; the addressing modes themselves are not new.

This encoding trick, avoiding additional bits for addressing modes by reserving some register numbers to encode them, has been reused repeatedly by Intel when defining the 8086 and 80386 ISAs, and by AMD when defining the 64-bit extension.

For instance, there are no instruction bits to encode addressing modes where the address is computed by adding a base register and an index register. Instead, the SP register cannot be used in the basic encoding for memory operands, so the SP register number denotes that the address is computed by adding a base register and an index register, which are specified by an extra SIB byte. There are a few other cases where some registers cannot be used with certain addressing modes; when those register numbers appear in the instruction, they mean the operand must be accessed with a different addressing mode, such as memory relative to the instruction pointer or absolute memory.
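That escape can be sketched as a toy 32-bit-mode ModR/M decoder (a hypothetical helper, simplified: no prefixes, no 16-bit addressing):

```python
def decode_modrm(modrm):
    """Split a 32-bit-mode ModR/M byte and report the register-number escapes."""
    mod = modrm >> 6
    reg = (modrm >> 3) & 7
    rm = modrm & 7
    if mod == 3:
        return mod, reg, rm, "register operand"
    if rm == 4:                  # r/m=100 (ESP's number): no [ESP] here, a SIB byte follows
        return mod, reg, rm, "SIB byte follows"
    if mod == 0 and rm == 5:     # r/m=101 with mod=00: absolute disp32, not [EBP]
        return mod, reg, rm, "disp32 absolute"
    return mod, reg, rm, "memory via register"

assert decode_modrm(0b00_000_100)[3] == "SIB byte follows"
assert decode_modrm(0b00_000_101)[3] == "disp32 absolute"
```

(In 64-bit mode the mod=00, r/m=101 slot was reassigned to RIP-relative addressing, which is the instruction-pointer-relative case mentioned above.)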


In PDP-11 terms, the Datapoint 2200 only had these "address modes" ( https://en.wikipedia.org/wiki/PDP-11_architecture#General_re... ):

    - register
    - register deferred via HL
    - autoincrement via PC (immediate operands)
No registers other than HL and PC could be output to the address bus at all. The 8080 was much more flexible, but the instructions it added weren't orthogonal, so I would say it's not at all comparable to the PDP-11 addressing modes.


No questions, just thanks for another interesting article.

What I have found especially interesting is the story about the undocumented instructions of 8085, because I was not aware of them.

Those instructions would have been actually quite useful, and if they had been documented they would have made the Intel 8085 significantly more competitive with the Zilog Z80, especially considering that in the early years the 8085 usually had a higher clock frequency (3 MHz or 5 MHz for the 8085 versus 2.5 MHz or 4 MHz for the Z80).

When I was young I worked on speeding up the functions that implemented the floating-point arithmetic operations in the run-time library used by the Microsoft CP/M Fortran compiler, because they were too slow for my needs (obviously after disassembling them, as Microsoft did not document them). On an Intel 8080 CPU, more than 100 FP64 multiply-add operations per second was considered high speed, while now a CPU that does 100 billion FP64 multiply-add operations per second is considered very slow (the best desktop CPUs are more than 6 times faster).

I am sure that with the extra 16-bit operations provided by 8085, a decent speed-up of those FP arithmetic functions would have been possible and I would have found that useful at that time, because I was able to use IBM PC clones only some years later.


The undocumented instructions in the 8085 were described in detail in Dr Dobb's Journal shortly after the 8085's announcement (which made no mention of the "new" instructions).


Was the datapoint 2200 a brand new machine or was it influenced by others esp. wrt. the assembly language instruction set?

Looking up Datapoint on Wikipedia shows a predecessor box, the 3300. But did the designers of the 2200 totally ignore the 3300 instruction set? The 3300 seems to have used shift-registers as well.

Other data terminals at that time or radio equipment these designers might have used before?

Thx

https://en.m.wikipedia.org/wiki/Datapoint https://en.m.wikipedia.org/wiki/Datapoint_3300


The Datapoint 3300 was strictly a terminal. It was not programmable and did not have an instruction set. It basically had counters for the cursor position and put characters into memory at that position, and had a ROM to generate characters. Manual: http://bitsavers.org/pdf/datapoint/3300/70116_3300termMaint_...

I don't think the Datapoint 2200 had any specific influences, at least none that I could find. Keep in mind that at the time, it was fairly common to make a processor out of TTL chips. People would make a custom processor for random things like a CNC controller or a video game. These processors were usually ad hoc, with an instruction set that had the features that were needed. If people were going to copy something, it was often a PDP-8 or 11 or a Data General NOVA.


hi Ken!

I wrote about DataPoint in The Big Bucks. It turns out I knew Gordon Peterson, architect of ARCNet, so he was a very good source for me.

Naturally, there's a book about them. I read these things so you don't have to.


why did they collapse

it seems like it was only two years from being the darling of wall street to bankruptcy

what happened


since I just claimed I read them so you don't have to:

they had a major scandal of salespeople misreporting revenue.

and then the usual head-in-the-sand attitude of companies who get imprisoned by their success.

Being in San Antonio probably didn't help, either.


The accounting scandal was the proximate cause, but I think the underlying issue was Moore's Law. I've examined the performance of Datapoint's TTL-based computers vs microprocessors. In 1972, the 2200 V2 was over 6 times as fast as the 8008. The Datapoint 5500 (1973) was significantly faster than the 8080 (1974). The Z80 (1976) was slightly faster than the Datapoint 6600 (1977). The 8086 (1978) was about twice as fast and Datapoint never managed to catch up. Datapoint tried to pivot to Intel processors, but then they were competing against commodity PCs. Datapoint suffered a hostile takeover in 1984 but didn't go bankrupt until years later in 2000.

The point is that Datapoint's TTL-based approach was great in the early 1970s but couldn't keep up with the exponential improvements in microprocessors.


thanks for the correction about the bankruptcy; i'd seen that in fact it was a corporate raider skinning them alive rather than an actual bankruptcy but i'd forgotten

but presumably to be an appealing target for such asset stripping you already need to have cratered in the stock market, which presumably is a result of failing to compete or other terrible news

why didn't they just switch to cmos? every computer company was using ttl at the beginning of the 70s and lots of them survived the switch to cmos in the late 80s or early 90s. some mainframe and super companies waited until much later because they were using ecl rather than ttl

my own uninformed guess previously had been that their sales force structure couldn't survive the necessary price reductions to compete against commodity pcs, because the equipment pricing had business-class plane fares and three-martini lunches for the sales force figured in, but i haven't been able to find any information about datapoint pricing in the early 80s to confirm this. apparently avoiding this problem was one of the main reasons the ibm pc was developed in such a non-ibm manner: to avoid being dependent on the ibm sales force


I will email Gordon. Maybe he can contribute.


i'd be delighted to hear his point of view


hmm, interesting, the first part sounds like garden-variety corruption

what do you mean about the attitude tho


Ken addressed that: a generational change underway, which is always difficult for a company that's been successful.


> The "stopgap" 8086 processor, however, started the x86 architecture that changed the history of Intel.

Nothing lasts longer than a temporary fix.



