https://computerhistory.org/blog/transplanting-the-macs-cent...
I do wonder what the exact reasons were. Maybe the PPC (complete systems) could be made cheaper? Maybe Apple was worried about relying on a single vendor? I am kind of skeptical of the “corporate dealmaking” angle, because it seems like there are valid technical reasons to NOT choose the 88K. Namely, that it requires companion chips, and the whole system (board + chips) ends up being complicated and expensive.
I remember reading that the successor 88110, with the support chips integrated, was announced mainly to woo Apple, but I don't know how true that is.
[1] The high-end RISC machine project that went nowhere, which AFAIK became known as Tesseract when it switched to PPC, before it fizzled out.
But Apple didn't want to drop Motorola fully. So Motorola, Apple and IBM figured out that with some tweaks to the 88000 they could turn it into something POWER-like. And that thing was the PowerPC that Motorola supplied to Apple. That's my understanding.
The sad thing is Intel showed there was still life left in CISC, and Motorola themselves ended up circling back to 68k in the form of ColdFire, which proved you could do for 68k what Intel did w/ the Pentium. But by then all their 68k customers had moved on from the 68k ISA.
68k, like VAX, was seen as a dead avenue, and not only in comparison with RISC.
But Intel had made many more design mistakes in the x86 ISA.
The truth is that the success of Intel and the failure of Motorola had absolutely nothing to do with the technical advantages or disadvantages of their CPU architectures.
Intel won and Motorola failed simply because IBM had chosen the Intel 8088 for the IBM PC.
Being chosen by IBM was partly due to luck and partly due to a bad commercial strategy on Motorola's part, which had chosen to develop two incompatible CPU architectures in parallel: the MC68000 for the high end of the market and the MC6809 for the low end.
Perhaps more through luck than wise planning, Intel chose not to split its efforts between two distinct architectures (it was already working in parallel on the 432 architecture for its future CPUs, which was a flop), so after developing the 8086 for the high end of the market it simply crippled it a little into the 8088 for the low end.
Both the 8086 and the MC68000 were considered too expensive by IBM, but the 8088 seemed a better choice than the Z80 or the MC6809, mainly because it allowed more than 64 kB of memory, which was already rather little in 1980.
In the following years, until the 80486, Motorola maintained a consistent lead in performance over Intel and always introduced various innovations a few years before Intel, but it never managed to match Intel again in price and manufacturing reliability, because Intel had the advantage of producing an order of magnitude more CPUs, which helped in solving all problems.
Eventually Intel matched and then exceeded the performance of the Motorola CPUs, despite the disadvantages of their architecture, due to having access to superior manufacturing, so Motorola had to restrict the use of their proprietary ISAs to the embedded markets, switching to IBM POWER for general-purpose computers.
N.b. the 68000 was supposed to be a 16-bit extension of the 6800, which among other things resulted in a hilarious two layers of microcoding.
As for the IBM PC, the 68000 had the major flaw of being newer, while the 8086 had been available longer and with second sources: the 68000 was released at the same time as the reduced-capability 8088, while the equivalent reduced-capability model for the 68k only arrived in 1982.
Both the MC68020 in 1984 and the 80386 in 1985 added to their base architectures various features taken from the VAX, e.g. scaled indexed addressing. The MC68020 added slightly more features from the VAX, e.g. bit-field operations, while the 80386 added only single-bit operations. However, none of the few features taken from the VAX made 68k more difficult to implement or less suitable for high-speed implementations.
The one wrong feature added in the MC68020, which eventually had to be removed, the memory indirect addressing modes, was not taken from the VAX. The VAX did not have such addressing modes; only some much earlier computers did. Those addressing modes were added by someone at Motorola without being inspired by the VAX in any way.
The VAX ISA was more difficult to decode at high speed because it used byte encodings, like x86, but it was still much easier to decode at high speed than x86. The 68k ISA, which used 16-bit encodings, was much easier to decode than x86, being intermediate in ease of decoding between a RISC ISA and the VAX. The x86 ISA is probably the most difficult-to-decode ISA ever used in a successful product, although with the huge number of logic gates that can be used in a CPU nowadays that is no longer a problem.
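To make the decode comparison concrete, here is a deliberately oversimplified toy model in C. Every opcode value below is made up; nothing matches real 68k or x86 encodings. The point is only the shape of the problem: with 16-bit granularity, one lookup on one aligned halfword yields the instruction length, while with byte granularity you have to scan serially before you know where the next instruction starts.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy "68k-style" length decode: instructions are whole 16-bit words,
       and (in this invented encoding) the top 2 bits of the first word give
       the count of extension words. One aligned lookup gives the length, so
       several decoders can work on halfword slots in parallel. */
    static int toy_len_16bit(const uint16_t *code)
    {
        return 2 * (1 + (code[0] >> 14));   /* 2, 4, 6 or 8 bytes */
    }

    /* Toy "x86-style" length decode: a byte-granular format with an optional
       prefix run, then an opcode byte whose low bits give the operand size.
       The length emerges one byte at a time, so instruction N+1's start is
       unknown until instruction N is mostly decoded. */
    static int toy_len_bytes(const uint8_t *code)
    {
        int i = 0;
        while (code[i] == 0x66 || code[i] == 0x67)  /* invented prefixes */
            i++;
        i++;                                        /* opcode byte */
        return i + (code[i - 1] & 0x03);            /* 0-3 operand bytes */
    }

    int main(void)
    {
        const uint16_t prog16[] = { 0x4000, 0x1234 };       /* 1 ext word */
        const uint8_t  prog8[]  = { 0x66, 0x02, 0xAA, 0xBB };
        printf("16-bit-granular length: %d bytes\n", toy_len_16bit(prog16));
        printf("byte-granular length:   %d bytes\n", toy_len_bytes(prog8));
        return 0;
    }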
The reduced-capability variant of the MC68000, i.e. the MC68008, was launched too late to be useful for IBM, because Motorola did not realize it was a good idea until after the success of the Intel 8088.
Simultaneously with the MC68000, Motorola launched the MC6809, which it believed would be sufficient for cheaper products. That was Motorola's mistake. The MC6809 had a much more beautiful ISA than any other 8-bit CPU, but by the time it was launched, 8-bit CPUs were becoming obsolete for general-purpose computers: the 64-kilobit DRAM packages launched in 1980 made more than 64 kilobytes of memory economical in a PC, for which 8-bit CPUs like the Zilog Z80 and the Motorola MC6809 were no longer suitable.
Harder to optimize, or easier to write code for because of its orthogonal instruction set?
x86 is comparatively simple, with indirect addressing support limited to the point that it can be inlined in the execution pipeline, and with many instructions either being genuinely "simple" to implement or acceptable to relegate to a slow path. M68k (and VAX even more so) is comparatively harder to build a modern superscalar chip for.
The 68k family had only one bad feature, which was introduced in the MC68020: a set of memory indirect addressing modes.
Except for this feature, all instructions were as simple as or simpler than the x86 instructions to implement.
The MC68020, like the 80386, was a microprogrammed CPU with multi-cycle instructions, so the memory indirect addressing modes did not matter yet.
Those addressing modes became a problem later, in CPUs with pipelined execution and hardwired control, because a single instruction with such an addressing mode could generate multiple exceptions in the paged MMU, and because any such instruction had to be decoded into multiple micro-operations in all cases.
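To make that concrete, here is a rough C analogue of a single memory-indirect operand, something like the 68020's ([bd,An],Xn*scale,od) mode. The names and the flat toy memory are mine, not Motorola's; the point is that one operand needs two dependent loads, and in a paged MMU either load can fault.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* One memory-indirect operand = two dependent memory accesses. */
    static uint32_t mem_indirect_load(const uint8_t *mem,
                                      uint32_t an, int32_t bd,
                                      uint32_t xn, int scale, int32_t od)
    {
        uint32_t ptr, value;
        /* access #1: fetch the intermediate pointer (possible fault #1) */
        memcpy(&ptr, mem + an + bd, sizeof ptr);
        /* access #2: fetch the operand through it (possible fault #2) */
        memcpy(&value, mem + ptr + xn * (uint32_t)scale + od, sizeof value);
        return value;
    }

    int main(void)
    {
        uint8_t mem[64] = { 0 };
        uint32_t ptr = 16, datum = 0xDEADBEEF;
        memcpy(mem + 8, &ptr, sizeof ptr);       /* pointer lives at 8 */
        memcpy(mem + 24, &datum, sizeof datum);  /* operand at 16 + 2*4 */
        printf("%#x\n", mem_indirect_load(mem, 8, 0, 2, 4, 0));
        return 0;
    }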
For embedded computers, backwards compatibility is not important, so Motorola could correct this mistake in the ColdFire CPUs, but for applications like the Apple PCs they could not remove the legacy addressing modes, because that would have broken the existing programs.
Besides the bad memory indirect addressing modes, 68k had the same addressing modes as 80386, except that they could be used in a much more orthogonal way, which made the implementation of a CPU simpler, not more complex.
For a corrected 68k ISA, e.g. ColdFire, it is far easier to make a superscalar implementation with out-of-order execution than for x86.
As I have said, 68k does not resemble the VAX at all. The base 68k architecture resembles a port of the DEC PDP-11 architecture to 32 bits. On top of the base architecture, the MC68020 added a few features taken from the VAX, mainly scaled indexed addressing and bit-field operations, and a few features taken from the IBM 370, e.g. compare-and-swap.
The Intel 80386 also took scaled indexed addressing from the VAX, but instead of implementing bit-field operations it added only single-bit operations. That is a negligible simplification of the implementation, which Intel chose only because their instruction format did not have any bits left for specifying the length of a bit field.
None of these features taken from VAX has caused any problems in either the Intel or the Motorola CPUs in high-speed pipelined implementations.
This is literally my point: the people involved in the shift to RISC had figured out it was a problem, and one aspect that made x86 easier to optimize long term (outside of Intel's huge market share) was that x86 had at most one memory operand per instruction (with certain exceptions). m68k's orthogonality meant both decode and execution were harder in the long term, especially since you have to keep supporting software that already uses those features; x86 has less legacy baggage there by virtue of not being as nice early on.
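A minimal sketch of what that one-memory-operand limit means in practice; the assembly in the comments is typical of what compilers emit, not a quote from any particular compiler run.

    /* Plain C; compile for either target to compare the generated code. */
    void copy_word(long *dst, const long *src)
    {
        *dst = *src;
        /* m68k can express this as one memory-to-memory instruction:
               move.l (a0),(a1)
           x86 has to split it into a load and a store via a register:
               movq (%rsi), %rax
               movq %rax, (%rdi)
        */
    }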
A clean break toward a simpler internal design, backed by compiled-code statistics, led most vendors, including Intel, toward the RISC style. Intel just happened to have a constantly growing market share for its legacy design and never fully committed to abandoning it, while lucking out in that its simplistic design was easier to support long term.
«More CISC-y» does not by itself mean «harder to optimise for». For compilers, what matters far more is how regular the ISA is: how uniform the register file is, how consistent the condition codes are, how predictable the addressing modes are, and how many nasty special cases the backend has to tiptoe around.
The m68k family was certainly CISC, but it was also notably regular and fairly orthogonal (the legacy of the PDP-11 ISA, which was a major influence on m68k). Motorola’s own programming model gives 16 programmer-visible 32-bit registers, with data and address registers used systematically, and consistent condition-code behaviour across instructions.
Contrast that with old x86, which was full of irregularities and quirks that compilers hate: segmented addressing, fewer truly general registers (5 general-purpose registers), multiple implicit operands, and addressing rules tied to specific registers and modes. Even modern GCC documentation still has to mention x86 cases where a specific register role reduces register-allocation freedom, which is exactly the sort of target quirk that makes optimisation more awkward.
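One concrete instance of the register-role problem, as a hedged sketch assuming GCC or Clang on x86 (the function itself is just an illustration): a variable shift count must live in CL, so the "c" constraint pins it to ECX and takes that register away from the allocator.

    /* x86 ties the variable shift count to CL; "c" pins it to ECX. */
    unsigned shift_left(unsigned value, unsigned count)
    {
        unsigned result;
        __asm__("shll %%cl, %0"
                : "=r"(result)            /* result in any GP register */
                : "0"(value), "c"(count)  /* count forced into ECX/CL */
                : "cc");                  /* condition codes clobbered */
        return result;
    }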
So…
68k: complex, but tidy
x86: complex, and grubby
What worked for x86, though, was the sheer size of the x86 market, which resulted in better compiler support, more tuning effort, and vastly more commercial optimisation work than m68k. But that is not the same claim as «68k was harder to optimise because it was more CISC-y».
Turns out m68k's orthogonality results in an explosion of complexity in the physical implementation and is way harder to optimize, especially since compilers did use those features. The far more limited x86 was harder to write code generation for, but that meant simpler execution in silicon and less need to pander to slow-path-only instructions. And on top of that you get the part where Intel's scale meant they could have two or three teams working on separate x86 CPUs at the same time.
Even at the microarchitecture level, the hard part is not raw CISC-ness but irregularity and compatibility baggage. In that respect x86 was usually the uglier customer.
High-end x86 implementations ultimately scaled further because Motorola had less market pressure and fewer resources than Intel to keep throwing silicon at the problem, not because m68k was somehow harder to optimise.
Later high-performance m68k cores did what later x86 cores also did: translate the architected variable-length instruction stream into a more regular internal form. Motorola’s own MC68060 manual says the variable-length M68000 instruction stream is internally decoded into a fixed-length representation and then dispatched to dual pipelined RISC execution engines. That is not evidence of an ISA that was uniquely resistant to microarchitectural optimisation. It is evidence that Motorola used the same broad trick that became standard elsewhere: hide ISA ugliness behind a cleaner internal machine.
There is also a deeper point. The m68k ISA was rich, but it was comparatively regular and systematic at the architectural level. The m68k manuals show a clean register model and – notably – consistent condition-code behaviour across instruction forms. That kind of regularity is exactly what tends to help both compiler backends and hardware decode/control design. By contrast, x86’s biggest hardware pain historically came not from being «less CISC» than m68k, but from being more irregular and more burdened by backward compatibility.
Lastly, but not least importantly, CPUs were not the core business of Motorola – it was a large communications-and-semiconductors company, with CPUs being just one product family within a much larger semiconductor business.
There was no clear understanding within the company of the rising importance of CPUs (and of computing in general), hence the chronic underinvestment in the CPU product line – m68k never saw truly advanced, performant designs purely because of that.
Mitch Alsup has extensive experience in ISA design and participated (tangentially) in informing the RISC-V design process.
Recently, he has designed my66000, an interesting, fresh take on a new ISA that I recommend exploring.
Like the 88000's, its register file is shared between the integer and floating-point units. One interesting detail is that it supports CRAY-style vector operations using the same architectural registers and downgrades to scalar operation automatically on interrupts. This means that the register state to load/store on context switches is small.