An Introduction to Parallel Programming (English Edition)
Peter S. Pacheco holds a Ph.D. in mathematics from Florida State University. He has served as chair of the Computer Science department at the University of San Francisco and is currently chair of the university's Mathematics department. For nearly 20 years he has taught parallel computing to both undergraduate and graduate students.
The book takes a tutorial approach, starting with short programming examples and working up to progressively more challenging programs. It focuses on designing, debugging, and evaluating the performance of distributed-memory and shared-memory programs, using the MPI, Pthreads, and OpenMP programming models and emphasizing hands-on development of parallel programs. Parallel programming is no longer a discipline reserved for specialists: to exploit the full computing power of clusters and multicore processors, learning both distributed-memory and shared-memory parallel programming techniques is essential. An Introduction to Parallel Programming (English Edition), by Peter S. Pacheco, shows step by step how to write efficient parallel programs with MPI, Pthreads, and OpenMP, and teaches readers how to develop and debug distributed-memory and shared-memory programs and evaluate their performance.
CHAPTER 1 Why Parallel Computing?
1.1 Why We Need Ever-Increasing Performance
1.2 Why We're Building Parallel Systems
1.3 Why We Need to Write Parallel Programs
1.4 How Do We Write Parallel Programs?
1.5 What We'll Be Doing
1.6 Concurrent, Parallel, Distributed
1.7 The Rest of the Book
1.8 A Word of Warning
1.9 Typographical Conventions
1.10 Summary
1.11 Exercises
CHAPTER 2 Parallel Hardware and Parallel Software
2.1 Some Background
2.1.1 The von Neumann architecture
2.1.2 Processes, multitasking, and threads
2.2 Modifications to the von Neumann Model
2.2.1 The basics of caching
2.2.2 Cache mappings
2.2.3 Caches and programs: an example
2.2.4 Virtual memory
2.2.5 Instruction-level parallelism
2.2.6 Hardware multithreading
2.3 Parallel Hardware
2.3.1 SIMD systems
2.3.2 MIMD systems
2.3.3 Interconnection networks
2.3.4 Cache coherence
2.3.5 Shared-memory versus distributed-memory
2.4 Parallel Software
2.4.1 Caveats
2.4.2 Coordinating the processes/threads
2.4.3 Shared-memory
2.4.4 Distributed-memory
2.4.5 Programming hybrid systems
2.5 Input and Output
2.6 Performance
2.6.1 Speedup and efficiency
2.6.2 Amdahl's law
2.6.3 Scalability
2.6.4 Taking timings
2.7 Parallel Program Design
2.7.1 An example
2.8 Writing and Running Parallel Programs
2.9 Assumptions
2.10 Summary
2.10.1 Serial systems
2.10.2 Parallel hardware
2.10.3 Parallel software
2.10.4 Input and output
2.10.5 Performance
2.10.6 Parallel program design
2.10.7 Assumptions
2.11 Exercises
CHAPTER 3 Distributed-Memory Programming with MPI
3.1 Getting Started
3.1.1 Compilation and execution
3.1.2 MPI programs
3.1.3 MPI_Init and MPI_Finalize
3.1.4 Communicators, MPI_Comm_size, and MPI_Comm_rank
3.1.5 SPMD programs
3.1.6 Communication
3.1.7 MPI_Send
3.1.8 MPI_Recv
3.1.9 Message matching
3.1.10 The status_p argument
3.1.11 Semantics of MPI_Send and MPI_Recv
3.1.12 Some potential pitfalls
3.2 The Trapezoidal Rule in MPI
3.2.1 The trapezoidal rule
3.2.2 Parallelizing the trapezoidal rule
3.3 Dealing with I/O
3.3.1 Output
3.3.2 Input
3.4 Collective Communication
3.4.1 Tree-structured communication
3.4.2 MPI_Reduce
3.4.3 Collective vs. point-to-point communications
3.4.4 MPI_Allreduce
3.4.5 Broadcast
3.4.6 Data distributions
3.4.7 Scatter
3.4.8 Gather
3.4.9 Allgather
3.5 MPI Derived Datatypes
3.6 Performance Evaluation of MPI Programs
3.6.1 Taking timings
3.6.2 Results
3.6.3 Speedup and efficiency
3.6.4 Scalability
3.7 A Parallel Sorting Algorithm
3.7.1 Some simple serial sorting algorithms
3.7.2 Parallel odd-even transposition sort
3.7.3 Safety in MPI programs
3.7.4 Final details of parallel odd-even sort
3.8 Summary
3.9 Exercises
3.10 Programming Assignments
CHAPTER 4 Shared-Memory Programming with Pthreads
4.1 Processes, Threads, and Pthreads
4.2 Hello, World
4.2.1 Execution
4.2.2 Preliminaries
4.2.3 Starting the threads
4.2.4 Running the threads
4.2.5 Stopping the threads
4.2.6 Error checking
4.2.7 Other approaches to thread startup
4.3 Matrix-Vector Multiplication
4.4 Critical Sections
4.5 Busy-Waiting
4.6 Mutexes
4.7 Producer-Consumer Synchronization and Semaphores
4.8 Barriers and Condition Variables
4.8.1 Busy-waiting and a mutex
4.8.2 Semaphores
4.8.3 Condition variables
4.8.4 Pthreads barriers
4.9 Read-Write Locks
4.9.1 Linked list functions
4.9.2 A multi-threaded linked list
4.9.3 Pthreads read-write locks
4.9.4 Performance of the various implementations
4.9.5 Implementing read-write locks
4.10 Caches, Cache Coherence, and False Sharing
4.11 Thread-Safety
4.11.1 Incorrect programs can produce correct output
4.12 Summary
4.13 Exercises
4.14 Programming Assignments
CHAPTER 5 Shared-Memory Programming with OpenMP
5.1 Getting Started
5.1.1 Compiling and running OpenMP programs
5.1.2 The program
5.1.3 Error checking
5.2 The Trapezoidal Rule
5.2.1 A first OpenMP version
5.3 Scope of Variables
5.4 The Reduction Clause
5.5 The parallel for Directive
5.5.1 Caveats
5.5.2 Data dependences
5.5.3 Finding loop-carried dependences
5.5.4 Estimating π
5.5.5 More on scope
5.6 More About Loops in OpenMP: Sorting
5.6.1 Bubble sort
5.6.2 Odd-even transposition sort
5.7 Scheduling Loops
5.7.1 The schedule clause
5.7.2 The static schedule type
5.7.3 The dynamic and guided schedule types
5.7.4 The runtime schedule type
5.7.5 Which schedule?
5.8 Producers and Consumers
5.8.1 Queues
5.8.2 Message-passing
5.8.3 Sending messages
5.8.4 Receiving messages
5.8.5 Termination detection
5.8.6 Startup
5.8.7 The atomic directive
5.8.8 Critical sections and locks
5.8.9 Using locks in the message-passing program
5.8.10 critical directives, atomic directives, or locks?
5.8.11 Some caveats
5.9 Caches, Cache Coherence, and False Sharing
5.10 Thread-Safety
5.10.1 Incorrect programs can produce correct output
5.11 Summary
5.12 Exercises
5.13 Programming Assignments
CHAPTER 6 Parallel Program Development
6.1 Two n-Body Solvers
6.1.1 The problem
6.1.2 Two serial programs
6.1.3 Parallelizing the n-body solvers
6.1.4 A word about I/O
6.1.5 Parallelizing the basic solver using OpenMP
6.1.6 Parallelizing the reduced solver using OpenMP
6.1.7 Evaluating the OpenMP codes
6.1.8 Parallelizing the solvers using pthreads
6.1.9 Parallelizing the basic solver using MPI
6.1.10 Parallelizing the reduced solver using MPI
6.1.11 Performance of the MPI solvers
6.2 Tree Search
6.2.1 Recursive depth-first search
6.2.2 Nonrecursive depth-first search
6.2.3 Data structures for the serial implementations
6.2.6 A static parallelization of tree search using pthreads
6.2.7 A dynamic parallelization of tree search using pthreads
6.2.8 Evaluating the pthreads tree-search programs
6.2.9 Parallelizing the tree-search programs using OpenMP
6.2.10 Performance of the OpenMP implementations
6.2.11 Implementation of tree search using MPI and static partitioning
6.2.12 Implementation of tree search using MPI and dynamic partitioning
6.3 A Word of Caution
6.4 Which API?
6.5 Summary
6.5.1 Pthreads and OpenMP
6.5.2 MPI
6.6 Exercises
6.7 Programming Assignments
CHAPTER 7 Where to Go from Here
References
Index