Renowned Foreign Textbooks in University Computer Education Series: Programming Massively Parallel Processors (English Reprint Edition)
About This Book
This book introduces the fundamental concepts of parallel programming and GPU architecture, explores in detail the techniques used to build parallel programs, and uses case studies to demonstrate the full development process: from the initial idea of a parallel computation through to a practical, high-performance parallel program.
Key Features
Introduces computational thinking for parallelism, so that readers can carry this way of reasoning about problems into high-performance parallel computing.
Covers the use of CUDA, a software development platform created by NVIDIA specifically for massively parallel environments (see the sketch after this list).
Shows how to use the CUDA programming model and OpenCL to obtain high performance and high reliability.
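To give a concrete flavor of the CUDA programming model that Chapters 3 and 4 develop (kernel functions, blockIdx and threadIdx, host-device data transfer, kernel launch), here is a minimal vector-addition sketch. It is illustrative only, not code from the book; every name in it (vecAddKernel, the array names, the launch configuration) is the editor's own choice.

    // Minimal CUDA sketch (illustrative, not from the book): C = A + B.
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    // Kernel: each thread computes one element of the result.
    __global__ void vecAddKernel(const float *A, const float *B, float *C, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)                                      // guard: grid may overshoot n
            C[i] = A[i] + B[i];
    }

    int main(void) {
        const int n = 1 << 20;
        const size_t size = n * sizeof(float);

        // Allocate and initialize host arrays.
        float *hA = (float *)malloc(size);
        float *hB = (float *)malloc(size);
        float *hC = (float *)malloc(size);
        for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

        // Allocate device arrays and copy inputs host-to-device (cf. Section 3.4).
        float *dA, *dB, *dC;
        cudaMalloc((void **)&dA, size);
        cudaMalloc((void **)&dB, size);
        cudaMalloc((void **)&dC, size);
        cudaMemcpy(dA, hA, size, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, size, cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover n elements (cf. Section 3.6.2).
        int threadsPerBlock = 256;
        int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAddKernel<<<blocksPerGrid, threadsPerBlock>>>(dA, dB, dC, n);

        // Copy the result back to the host and spot-check it.
        cudaMemcpy(hC, dC, size, cudaMemcpyDeviceToHost);
        printf("C[0] = %f (expected 3.0)\n", hC[0]);

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }

The if (i < n) guard reflects an idiom the book emphasizes: the grid is rounded up to a whole number of blocks, so threads past the end of the data must do nothing.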
Contents
Preface
Acknowledgments
Dedication
CHAPTER 1 INTRODUCTION
1.1 GPUs as Parallel Computers
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Parallel Programming Languages and Models
1.5 Overarching Goals
1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
2.1 Evolution of Graphics Pipelines
2.1.1 The Era of Fixed-Function Graphics Pipelines
2.1.2 Evolution of Programmable Real-Time Graphics
2.1.3 Unified Graphics and Computing Processors
2.1.4 GPGPU: An Intermediate Step
2.2 GPU Computing
2.2.1 Scalable GPUs
2.2.2 Recent Developments
2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
3.1 Data Parallelism
3.2 CUDA Program Structure
3.3 A Matrix-Matrix Multiplication Example
3.4 Device Memories and Data Transfer
3.5 Kernel Functions and Threading
3.6 Summary
3.6.1 Function declarations
3.6.2 Kernel launch
3.6.3 Predefined variables
3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
4.1 CUDA Thread Organization
4.2 blockIdx and threadIdx
4.3 Synchronization and Transparent Scalability
4.4 Thread Assignment
4.5 Thread Scheduling and Latency Tolerance
4.6 Summary
4.7 Exercises
CHAPTER 5 CUDA™ MEMORIES
5.1 Importance of Memory Access Efficiency
5.2 CUDA Device Memory Types
5.3 A Strategy for Reducing Global Memory Traffic
5.4 Memory as a Limiting Factor to Parallelism
5.5 Summary
5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
6.1 More on Thread Execution
6.2 Global Memory Bandwidth
6.3 Dynamic Partitioning of SM Resources
6.4 Data Prefetching
6.5 Instruction Mix
6.6 Thread Granularity
6.7 Measured Performance and Summary
6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
7.1 Floating-Point Format
7.1.1 Normalized Representation of M
7.1.2 Excess Encoding of E
7.2 Representable Numbers
7.3 Special Bit Patterns and Precision
7.4 Arithmetic Accuracy and Rounding
7.5 Algorithm Considerations
7.6 Summary
7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
8.1 Application Background
8.2 Iterative Reconstruction
8.3 Computing FHd
Step 1. Determine the Kernel Parallelism Structure
Step 2. Getting Around the Memory Bandwidth Limitation
Step 3. Using Hardware Trigonometry Functions
Step 4. Experimental Performance Tuning
8.4 Final Evaluation
8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
9.1 Application Background
9.2 A Simple Kernel Implementation
9.3 Instruction Execution Efficiency
9.4 Memory Coalescing
9.5 Additional Performance Comparisons
9.6 Using Multiple GPUs
9.7 Exercises
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
10.1 Goals of Parallel Programming
10.2 Problem Decomposition
10.3 Algorithm Selection
10.4 Computational Thinking
10.5 Exercises
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™
11.1 Background
11.2 Data Parallelism Model
11.3 Device Architecture
11.4 Kernel Functions
11.5 Device Management and Kernel Launch
11.6 Electrostatic Potential Map in OpenCL
11.7 Summary
11.8 Exercises
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
12.1 Goals Revisited
12.2 Memory Architecture Evolution
12.2.1 Large Virtual and Physical Address Spaces
12.2.2 Unified Device Memory Space
12.2.3 Configurable Caching and Scratch Pad
12.2.4 Enhanced Atomic Operations
12.2.5 Enhanced Global Memory Access
12.3 Kernel Execution Control Evolution
12.3.1 Function Calls within Kernel Functions
12.3.2 Exception Handling in Kernel Functions
12.3.3 Simultaneous Execution of Multiple Kernels
12.3.4 Interruptible Kernels
12.4 Core Performance
12.4.1 Double-Precision Speed
12.4.2 Better Control Flow Efficiency
12.5 Programming Environment
12.6 A Bright Outlook
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
A.1 matrixmul.cu
A.2 matrixmul_gold.cpp
A.3 matrixmul.h
A.4 assist.h
A.5 Expected Output
APPENDIX B GPU COMPUTE CAPABILITIES
B.1 GPU Compute Capability Tables
B.2 Memory Coalescing Variations
Index