Preface xv
CHAPTERS
1 Computer Abstractions and Technology 2
1.1 Introduction 3
1.2 Eight Great Ideas in Computer Architecture 11
1.3 Below Your Program 13
1.4 Under the Covers 16
1.5 Technologies for Building Processors and Memory 24
1.6 Performance 28
1.7 The Power Wall 40
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43
1.9 Real Stuff: Benchmarking the Intel Core i7 46
1.10 Fallacies and Pitfalls 49
1.11 Concluding Remarks 52
1.12 Historical Perspective and Further Reading 54
1.13 Exercises 54
2 Instructions: Language of the Computer 60
2.1 Introduction 62
2.2 Operations of the Computer Hardware 63
2.3 Operands of the Computer Hardware 67
2.4 Signed and Unsigned Numbers 75
2.5 Representing Instructions in the Computer 82
2.6 Logical Operations 90
2.7 Instructions for Making Decisions 93
2.8 Supporting Procedures in Computer Hardware 100
2.9 Communicating with People 110
2.10 LEGv8 Addressing for Wide Immediates and Addresses 115
2.11 Parallelism and Instructions: Synchronization 125
2.12 Translating and Starting a Program 128
2.13 A C Sort Example to Put It All Together 137
2.14 Arrays versus Pointers 146
2.15 Advanced Material: Compiling C and Interpreting Java 150
2.16 Real Stuff: MIPS Instructions 150
2.17 Real Stuff: ARMv7 32-bit Instructions 152
2.18 Real Stuff: x86 Instructions 154
2.19 Real Stuff: The Rest of the ARMv8 Instruction Set 163
2.20 Fallacies and Pitfalls 169
2.21 Concluding Remarks 171
2.22 Historical Perspective and Further Reading 173
2.23 Exercises 174
3 Arithmetic for Computers 186
3.1 Introduction 188
3.2 Addition and Subtraction 188
3.3 Multiplication 191
3.4 Division 197
3.5 Floating Point 205
3.6 Parallelism and Computer Arithmetic: Subword Parallelism 230
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 232
3.8 Real Stuff: The Rest of the ARMv8 Arithmetic Instructions 234
3.9 Going Faster: Subword Parallelism and Matrix Multiply 238
3.10 Fallacies and Pitfalls 242
3.11 Concluding Remarks 245
3.12 Historical Perspective and Further Reading 248
3.13 Exercises 249
4 The Processor 254
4.1 Introduction 256
4.2 Logic Design Conventions 260
4.3 Building a Datapath 263
4.4 A Simple Implementation Scheme 271
4.5 An Overview of Pipelining 283
4.6 Pipelined Datapath and Control 297
4.7 Data Hazards: Forwarding versus Stalling 316
4.8 Control Hazards 328
4.9 Exceptions 336
4.10 Parallelism via Instructions 342
4.11 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Pipelines 355
4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply 363
4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 366
4.14 Fallacies and Pitfalls 366
4.15 Concluding Remarks 367
4.16 Historical Perspective and Further Reading 368
4.17 Exercises 368
5 Large and Fast: Exploiting Memory Hierarchy 386
5.1 Introduction 388
5.2 Memory Technologies 392
5.3 The Basics of Caches 397
5.4 Measuring and Improving Cache Performance 412
5.5 Dependable Memory Hierarchy 432
5.6 Virtual Machines 438
5.7 Virtual Memory 441
5.8 A Common Framework for Memory Hierarchy 465
5.9 Using a Finite-State Machine to Control a Simple Cache 472
5.10 Parallelism and Memory Hierarchy: Cache Coherence 477
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 481
5.12 Advanced Material: Implementing Cache Controllers 482
5.13 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Memory Hierarchies 482
5.14 Real Stuff: The Rest of the ARMv8 System and Special Instructions 487
5.15 Going Faster: Cache Blocking and Matrix Multiply 488
5.16 Fallacies and Pitfalls 491
5.17 Concluding Remarks 496
5.18 Historical Perspective and Further Reading 497
5.19 Exercises 497
6 Parallel Processors from Client to Cloud 514
6.1 Introduction 516
6.2 The Difficulty of Creating Parallel Processing Programs 51