https://computerhistory.org/blog/transplanting-the-macs-cent...
I do wonder what the exact reasons were. Maybe the PPC (complete systems) could be made cheaper? Maybe Apple was worried about relying on a single vendor? I am kind of skeptical of the “corporate dealmaking” angle, because it seems like there are valid technical reasons to NOT choose the 88K. Namely, that it requires companion chips, and the whole system (board + chips) ends up being complicated and expensive.
I remember reading that the successor 88110, with the support chips integrated, was announced mainly to woo Apple, but I don't know how true that is.
[1] The high-end RISC machine project that went nowhere, which AFAIK became known as Tesseract when it switched to PPC, before it fizzled out.
But Apple didn't want to drop Motorola fully. So Motorola, Apple and IBM figured out that with some tweaks to the 88000 they could turn it into something POWER-like. And that thing was the PowerPC that Motorola supplied to Apple. That's my understanding.
The sad thing is Intel showed there was still life left in CISC, and Motorola themselves ended up circling back to 68k in the form of ColdFire, which proved you could do for 68k what Intel did w/ the Pentium. But by then all their 68k customers had moved on from the 68k ISA.
68k, like VAX, was seen as a dead avenue, and not only in comparison with RISC.
But Intel had made many more design mistakes in the x86 ISA.
The truth is that the success of Intel and the failure of Motorola had absolutely nothing to do with the technical advantages or disadvantages of their CPU architectures.
Intel won and Motorola failed simply because IBM had chosen the Intel 8088 for the IBM PC.
Being chosen by IBM was partly due to luck and partly due to a bad commercial strategy on Motorola's part, which had chosen to develop two incompatible CPU architectures in parallel: the MC68000 for the high end of the market and the MC6809 for the low end.
Perhaps more through luck than wise planning, Intel chose not to split its efforts between two distinct architectures (it was already working in parallel on the 432 architecture for its future CPUs, which was a flop), so after developing the 8086 for the high end of the market it simply crippled it a little into the 8088 for the low end.
Both the 8086 and the MC68000 were considered too expensive by IBM, but the 8088 seemed a better choice than the Z80 or the MC6809, mainly because it allowed more than 64 kB of memory, which was already rather little in 1980.
In the following years, until the 80486, Motorola maintained a consistent lead in performance over Intel and always introduced various innovations a few years before Intel, but it never managed to match Intel again in price and manufacturing reliability, because Intel had the advantage of producing an order of magnitude more CPUs, which helped in solving all problems.
Eventually Intel matched and then exceeded the performance of the Motorola CPUs, despite the disadvantages of their architecture, due to having access to superior manufacturing, so Motorola had to restrict the use of their proprietary ISAs to the embedded markets, switching to IBM POWER for general-purpose computers.
N.b. the 68000 was supposed to be a 16-bit extension of the 6800, which among other things resulted in a hilarious two layers of microcoding.
As for the IBM PC, the 68000 had the major flaw of being newer, while the 8086 had been available longer and with second sources: the 68000 was released at the same time as the reduced-capability 8088, while the equivalent reduced-capability model for the 68k only arrived in 1982.
Both the MC68020 in 1984 and the 80386 in 1985 added to their base architectures various features taken from the VAX, e.g. scaled indexed addressing. The MC68020 added slightly more features from the VAX, e.g. bit-field operations, while the 80386 added only single-bit operations. However, none of the few features taken from the VAX made 68k more difficult to implement or less suitable for high-speed implementations.
The one wrong feature added in the MC68020, which eventually had to be removed, the memory indirect addressing modes, was not taken from the VAX. The VAX did not have such addressing modes; only some much earlier computers did. Those addressing modes were added by someone at Motorola without being inspired by the VAX in any way.
The VAX ISA was more difficult to decode at high speed because it used byte encodings, like x86, but it was still much easier to decode at high speed than x86. The 68k ISA, which used 16-bit encodings, was much easier to decode than x86, being intermediate in ease of decoding between a RISC ISA and the VAX. The x86 ISA is probably the most difficult-to-decode ISA ever used in a successful product, although with the huge number of logic gates that can be used in a CPU nowadays that is no longer a problem.
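To make the decode comparison concrete, here is a deliberately oversimplified toy model in C. Every opcode value below is made up; nothing matches real 68k or x86 encodings. The point is only the shape of the problem: with 16-bit granularity, one lookup on one aligned halfword yields the instruction length, while with byte granularity you have to scan serially before you know where the next instruction starts.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy "68k-style" length decode: instructions are whole 16-bit words,
       and (in this invented encoding) the top 2 bits of the first word give
       the count of extension words. One aligned lookup gives the length, so
       several decoders can work on halfword slots in parallel. */
    static int toy_len_16bit(const uint16_t *code)
    {
        return 2 * (1 + (code[0] >> 14));   /* 2, 4, 6 or 8 bytes */
    }

    /* Toy "x86-style" length decode: a byte-granular format with an optional
       prefix run, then an opcode byte whose low bits give the operand size.
       The length emerges one byte at a time, so instruction N+1's start is
       unknown until instruction N is mostly decoded. */
    static int toy_len_bytes(const uint8_t *code)
    {
        int i = 0;
        while (code[i] == 0x66 || code[i] == 0x67)  /* invented prefixes */
            i++;
        i++;                                        /* opcode byte */
        return i + (code[i - 1] & 0x03);            /* 0-3 operand bytes */
    }

    int main(void)
    {
        const uint16_t prog16[] = { 0x4000, 0x1234 };       /* 1 ext word */
        const uint8_t  prog8[]  = { 0x66, 0x02, 0xAA, 0xBB };
        printf("16-bit-granular length: %d bytes\n", toy_len_16bit(prog16));
        printf("byte-granular length:   %d bytes\n", toy_len_bytes(prog8));
        return 0;
    }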
The reduced-capability variant of the MC68000, i.e. the MC68008, was launched too late to be useful for IBM, because Motorola did not realize it was a good idea until after the success of the Intel 8088.
Simultaneously with the MC68000, Motorola launched the MC6809, which it believed would be sufficient for cheaper products. That was Motorola's mistake. The MC6809 had a much more beautiful ISA than any other 8-bit CPU, but by the time it was launched, 8-bit CPUs were becoming obsolete for general-purpose computers: the 64-kilobit DRAM packages launched in 1980 made more than 64 kilobytes of memory economical in a PC, for which 8-bit CPUs like the Zilog Z80 and the Motorola MC6809 were no longer suitable.
Harder to optimize, or easier to write code for because of its orthogonal instruction set?
x86 is comparatively simple, with indirect addressing support limited to the point that it can be inlined in the execution pipeline, and with many instructions either being genuinely "simple" to implement or acceptable to relegate to a slow path. M68k (and VAX even more so) is comparatively harder to build a modern superscalar chip for.
The 68k family had only one bad feature, which was introduced in the MC68020: a set of memory indirect addressing modes.
Except for this feature, all instructions were as simple as or simpler than the x86 instructions to implement.
The MC68020, like the 80386, was a microprogrammed CPU with multi-cycle instructions, so the memory indirect addressing modes did not matter yet.
Those addressing modes became a problem later, in CPUs with pipelined execution and hardwired control, because a single instruction with such an addressing mode could generate multiple exceptions in the paged MMU, and because any such instruction had to be decoded into multiple micro-operations in all cases.
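To make that concrete, here is a rough C analogue of a single memory-indirect operand, something like the 68020's ([bd,An],Xn*scale,od) mode. The names and the flat toy memory are mine, not Motorola's; the point is that one operand needs two dependent loads, and in a paged MMU either load can fault.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* One memory-indirect operand = two dependent memory accesses. */
    static uint32_t mem_indirect_load(const uint8_t *mem,
                                      uint32_t an, int32_t bd,
                                      uint32_t xn, int scale, int32_t od)
    {
        uint32_t ptr, value;
        /* access #1: fetch the intermediate pointer (possible fault #1) */
        memcpy(&ptr, mem + an + bd, sizeof ptr);
        /* access #2: fetch the operand through it (possible fault #2) */
        memcpy(&value, mem + ptr + xn * (uint32_t)scale + od, sizeof value);
        return value;
    }

    int main(void)
    {
        uint8_t mem[64] = { 0 };
        uint32_t ptr = 16, datum = 0xDEADBEEF;
        memcpy(mem + 8, &ptr, sizeof ptr);       /* pointer lives at 8 */
        memcpy(mem + 24, &datum, sizeof datum);  /* operand at 16 + 2*4 */
        printf("%#x\n", mem_indirect_load(mem, 8, 0, 2, 4, 0));
        return 0;
    }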
For embedded computers, backwards compatibility is not important, so Motorola could correct this mistake in the ColdFire CPUs, but for applications like the Apple PCs they could not remove the legacy addressing modes, because that would have broken the existing programs.
Besides the bad memory indirect addressing modes, 68k had the same addressing modes as 80386, except that they could be used in a much more orthogonal way, which made the implementation of a CPU simpler, not more complex.
For a corrected 68k ISA, e.g. ColdFire, it is far easier to make a superscalar implementation with out-of-order execution than for x86.
As I have said, 68k does not resemble the VAX at all. The base 68k architecture resembles a port of the DEC PDP-11 architecture to 32 bits. On top of the base architecture, the MC68020 added a few features taken from the VAX, mainly scaled indexed addressing and bit-field operations, and a few features taken from the IBM 370, e.g. compare-and-swap.
The Intel 80386 also took scaled indexed addressing from the VAX, but instead of implementing bit-field operations it added only single-bit operations. That is a negligible simplification of the implementation, which Intel chose only because their instruction format did not have any bits left for specifying the length of a bit field.
None of these features taken from VAX has caused any problems in either the Intel or the Motorola CPUs in high-speed pipelined implementations.
This is literally my point: the people involved in the shift to RISC had figured out it was a problem, and one aspect that made x86 easier to optimize long term (outside of Intel's huge market share) was that x86 had at most one memory operand per instruction (with certain exceptions). m68k's orthogonality meant both decode and execution were harder in the long term, especially since you have to keep supporting software that already uses those features; x86 has less legacy baggage there by virtue of not being as nice early on.
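A minimal sketch of what that one-memory-operand limit means in practice; the assembly in the comments is typical of what compilers emit, not a quote from any particular compiler run.

    /* Plain C; compile for either target to compare the generated code. */
    void copy_word(long *dst, const long *src)
    {
        *dst = *src;
        /* m68k can express this as one memory-to-memory instruction:
               move.l (a0),(a1)
           x86 has to split it into a load and a store via a register:
               movq (%rsi), %rax
               movq %rax, (%rdi)
        */
    }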
A clean break toward a simpler internal design, backed by compiled-code statistics, led most vendors, including Intel, toward the RISC style. Intel just happened to have a constantly growing market share for its legacy design and never fully committed to abandoning it, while lucking out in that its simplistic design was easier to support long term.
«More CISC-y» does not by itself mean «harder to optimise for». For compilers, what matters far more is how regular the ISA is: how uniform the register file is, how consistent the condition codes are, how predictable the addressing modes are, and how many nasty special cases the backend has to tiptoe around.
The m68k family was certainly CISC, but it was also notably regular and fairly orthogonal (the legacy of the PDP-11 ISA, which was a major influence on m68k). Motorola’s own programming model gives 16 programmer-visible 32-bit registers, with data and address registers used systematically, and consistent condition-code behaviour across instructions.
Contrast that with old x86, which was full of irregularities and quirks that compilers hate: segmented addressing, fewer truly general registers (5 general-purpose registers), multiple implicit operands, and addressing rules tied to specific registers and modes. Even modern GCC documentation still has to mention x86 cases where a specific register role reduces register-allocation freedom, which is exactly the sort of target quirk that makes optimisation more awkward.
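One concrete instance of the register-role problem, as a hedged sketch assuming GCC or Clang on x86 (the function itself is just an illustration): a variable shift count must live in CL, so the "c" constraint pins it to ECX and takes that register away from the allocator.

    /* x86 ties the variable shift count to CL; "c" pins it to ECX. */
    unsigned shift_left(unsigned value, unsigned count)
    {
        unsigned result;
        __asm__("shll %%cl, %0"
                : "=r"(result)            /* result in any GP register */
                : "0"(value), "c"(count)  /* count forced into ECX/CL */
                : "cc");                  /* condition codes clobbered */
        return result;
    }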
So…
68k: complex, but tidy
x86: complex, and grubby
What worked for x86, though, was the sheer size of the x86 market, which resulted in better compiler support, more tuning effort, and vastly more commercial optimisation work than m68k. But that is not the same claim as «68k was harder to optimise because it was more CISC-y».
Turns out m68k's orthogonality results in an explosion of complexity in the physical implementation and is way harder to optimize, especially since compilers did use those features. The far more limited x86 was harder to write code generation for, but that meant simpler execution in silicon and less need to pander to slow-path-only instructions. And on top of that you get the part where Intel's scale meant they could have two or three teams working on separate x86 CPUs at the same time.
Even at the microarchitecture level, the hard part is not raw CISC-ness but irregularity and compatibility baggage. In that respect x86 was usually the uglier customer.
High-end x86 implementations ultimately scaled further because Motorola had less market pressure and fewer resources than Intel to keep throwing silicon at the problem, not because m68k was somehow harder to optimise.
Later high-performance m68k cores did what later x86 cores also did: translate the architected variable-length instruction stream into a more regular internal form. Motorola’s own MC68060 manual says the variable-length M68000 instruction stream is internally decoded into a fixed-length representation and then dispatched to dual pipelined RISC execution engines. That is not evidence of an ISA that was uniquely resistant to microarchitectural optimisation. It is evidence that Motorola used the same broad trick that became standard elsewhere: hide ISA ugliness behind a cleaner internal machine.
There is also a deeper point. The m68k ISA was rich, but it was comparatively regular and systematic at the architectural level. The m68k manuals show a clean register model and – notably – consistent condition-code behaviour across instruction forms. That kind of regularity is exactly what tends to help both compiler backends and hardware decode/control design. By contrast, x86’s biggest hardware pain historically came not from being «less CISC» than m68k, but from being more irregular and more burdened by backward compatibility.
Lastly, but not least importantly, CPUs were not the core business of Motorola – it was a large communications-and-semiconductors company, with CPUs being just one product family within a much larger semiconductor business.
There was no clear understanding within the company of the rising importance of CPUs (and of computing in general), hence the chronic underinvestment in the CPU product line – m68k never saw truly advanced, performant designs purely because of that.
Mitch Alsup has extensive experience in ISA design and participated (tangentially) in informing the RISC-V design process.
Recently, he has designed my66000, an interesting, fresh take on a new ISA that I recommend exploring.
Like the 88000's, its register file is shared between the integer and floating-point units. One interesting detail is that it supports CRAY-style vector operations using the same architectural registers and downgrades to scalar operation automatically on interrupts. This means that the register state to load/store on context switches is small.