the past
The computing industry as we know it was born out of the government and academic projects that sought to improve on the ENIAC machine by enabling stored-program computing. The EDSAC project showed that a stored-program computer was possible, and its complex control logic motivated the microprogrammable control unit. Many subsequent developments in computer architecture can be traced back to the philosophy and goals of microprogramming:
- Reduce the complexity of control. Regular hardware structures, interconnects, and single-cycle microinstructions are paralleled in the development of RISC architectures and modern CMP (multicore) designs.
- Improve programmer productivity. The microprogrammed control store translates assembly (the dominant language of the 1950s and 1960s) into sequences of machine operations. This translation mechanism led to increased use of complex instructions and, ultimately, to CISC architectures.
- Compatibility across machine generations. By implementing a language interpreter in microcode, an assembly language can stay fixed while the underlying machine changes. Binary compatibility and bitslicing result from interpreted assembly and lead to ISA families. This use of microprogramming can still be seen in the Intel x86 architecture.
- Extract hardware parallelism. New hardware resources can be used in parallel by packing more operations into (wider) microinstructions, a technique known as horizontal microprogramming. Issuing multiple parallel operations in one wide instruction leads to VLIW architectures. VLIW research drove compiler technology, because the compiler is responsible for determining which operations to pack into each VLIW instruction word (a conceptual sketch follows this list). Intel brought VLIW to market with the Itanium and its IA-64 ISA, but it has not seen much success.
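To make the last point concrete, here is a small, hypothetical C sketch of what a VLIW compiler looks for. The three-slot bundle and the mnemonics in the comments are invented purely for illustration and do not correspond to any real ISA; the point is that the compiler, not the hardware, finds independent operations and packs them into one wide instruction word.

```c
/* Hypothetical illustration of VLIW packing (not a real ISA).
 * The three statements in the loop body are independent of one another,
 * so a VLIW compiler could schedule them into a single wide instruction. */
void packed_loop(int n, float a, const float *x, const float *y,
                 float *s, float *p, float *d)
{
    for (int i = 0; i < n; i++) {
        s[i] = a * x[i] + y[i];   /* candidate for slot 0: multiply-add */
        p[i] = x[i] * y[i];       /* candidate for slot 1: multiply     */
        d[i] = x[i] - y[i];       /* candidate for slot 2: subtract     */
        /* Conceptual 3-slot bundle for one iteration:
         *   { fmadd s,a,x,y | fmul p,x,y | fsub d,x,y }
         * An empty slot would simply be filled with a no-op. */
    }
}
```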
the present
In 1995, the memory wall was identified as the performance bottleneck caused by processing speed improving faster than off-chip components, especially memory. It seemed that the "free lunch" of faster software due to better hardware would go away. However, computer architects brought to market techniques that mitigate memory access latencies: out-of-order (OoO) execution, caching, prefetching, and simultaneous multithreading (SMT). These are reasonably well explained on Wikipedia, although that is no replacement for an authoritative source such as Hennessy and Patterson.
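To make one of those techniques concrete, here is a minimal sketch of software prefetching in C. `__builtin_prefetch` is a GCC/Clang builtin; the prefetch distance of 8 elements is an arbitrary assumption that would be tuned to a real machine's memory latency and cache line size.

```c
#include <stddef.h>

/* Minimal software-prefetching sketch: ask for a cache line several
 * iterations before it is needed, so the memory access overlaps with
 * useful work instead of stalling the pipeline. */
long sum_with_prefetch(const long *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&data[i + 8]); /* request the line early */
        sum += data[i];                       /* use the (hopefully cached) value */
    }
    return sum;
}
```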
In the 2000s, the power wall (see Hennessy and Patterson, 2009) emerged as a direct result of packing more transistors into chips. Enter the multicore. By using simpler pipelines in parallel, computer architects can keep clock frequencies, and hence power density, low while still utilizing the available transistors. The main approach to exploiting parallelism is the chip multiprocessor (CMP), also called the multicore microprocessor or simply multicore. Intel, AMD, Sun Microsystems, IBM, and others have brought multicore computing to the public, and the paradigm seems poised to continue for at least the next generation of processor families.
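As a minimal illustration of the thread-level parallelism a CMP exploits, here is a hedged OpenMP sketch in C. The array size and the loop itself are arbitrary; the point is only that independent iterations can be spread across cores (compile with, e.g., `gcc -fopenmp`).

```c
#include <stdio.h>

#define N 1000000

int main(void)
{
    /* Static arrays keep the example self-contained and off the stack. */
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* The iterations are independent, so the OpenMP runtime can split
     * them across the cores of a multicore processor. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```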
the future
My opinion is that, barring unforeseen technology, the multicore era may herald the end of the "general purpose processor." If (when) the cloud computing paradigm catches on, the need for powerful desktops will shrink and, with it, the market forces that drive powerful general purpose processors. Special-purpose processors will appear to serve niche markets. This is already visible in GPGPU-CPU hybrids, which serve the substantial videogaming and streaming video markets. Because not all application spaces can benefit from SIMD parallelism, other special-purpose hardware will appear.
My belief is that special-purpose hardware will take some form of (re)configurable logic, such as FPGA, CGRA, or an as-yet-unknown technology. The performance gap between current reconfigurable hardware and application-specific integrated circuits (ASICs) will matter less for future applications as the shift toward cloud computing continues. Furthermore, integrating reconfigurable logic with hard-wired functionality such as processor cores (e.g., http://www.eetimes.com/electronics-news/4210937/Intel-rolls-six-merged-Atom-FPGA-chips) further reduces the power and performance gaps. I'm not convinced FPGAs are the solution, but I think something similar will become the "general purpose processor" of the future, or at least will have the dominant market share for (non-embedded) end use.
I think the industry will continue to move forward in four broad directions: power-conscious servers, performance-hungry desktops, embedded devices, and thin clients. Servers are likely to continue using the multicore model, since they benefit greatly from thread-level parallelism. Compute-heavy desktops are likely to adopt the multicore and GPGPU models, while also relying on the ability to compose cores for better single-processor performance on applications that benefit from neither thread-level nor SIMD parallelism. Ever smaller, power-conscious platforms will enable processor-based computation in more everyday items, driving the development of concepts such as smart homes. Finally, thin clients will continue to evolve, from today's mobile platforms such as smartphones and tablet computers to the next generation of "cloud clients." Such clients will benefit from simple low-power processing cores and flexible hardware that can adapt to the specific workload of the client.
I believe the processor architectures of the future will be divergent, and there will be a variety of computer architecture families. Advances in multicore computing will improve servers and compute-heavy desktops. Single processor performance will continue to be important, especially in embedded and thin client markets, but also in compute-heavy desktops. Power will remain an important factor for all processors, but the programming tools needed to support this variety of processing models will be just as important. Uniprocessor programming is pretty well understood, but programming parallel and reconfigurable computers remains hard.
The challenge for computer architects will continue to be designing platforms for servers, desktops, embedded systems, and thin clients that provide (1) appropriate performance, (2) power efficiency, and (3) support for programmer abstractions.