
OpenCL Programming Guide
by Munshi, Aaftab; Gaster, Benedict; Mattson, Timothy G.; Fung, James; Ginsburg, Dan
Author Biography
Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors, in particular looking at high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. Benedict has contributed extensively to OpenCL’s design and has represented AMD at the Khronos Group open standard consortium. Benedict has a Ph.D. in computer science for his work on type systems for extensible records and variants. He has been working at AMD since 2008.
Timothy G. Mattson is an old-fashioned parallel programmer, having started in the mid-eighties with the Caltech Cosmic Cube and continuing to the present. Along the way, he has worked with most classes of parallel computers (vector supercomputers, SMP, VLIW, NUMA, MPP, clusters, and many-core processors). Tim has published extensively, including the books Patterns for Parallel Programming (with Beverly Sanders and Berna Massingill, published by Addison-Wesley, 2004) and An Introduction to Concurrency in Programming Languages (with Matthew J. Sottile and Craig E. Rasmussen, published by CRC Press, 2009). Tim has a Ph.D. in chemistry for his work on molecular scattering theory. He has been working at Intel since 1993.
James Fung has been developing computer vision on the GPU as it progressed from graphics to general-purpose computation. James has a Ph.D. in electrical and computer engineering from the University of Toronto and numerous IEEE and ACM publications in the areas of parallel GPU Computer Vision and Mediated Reality. He is currently a Developer Technology Engineer at NVIDIA, where he examines computer vision and image processing on graphics hardware.
Dan Ginsburg currently works at Children’s Hospital Boston as a Principal Software Architect in the Fetal-Neonatal Neuroimaging and Development Science Center, where he uses OpenCL for accelerating neuroimaging algorithms. Previously, he worked for Still River Systems developing GPU-accelerated image registration software for the Monarch 250 proton beam radiotherapy system. Dan was also a Senior Member of Technical Staff at AMD, where he worked for over eight years in a variety of roles, including developing OpenGL drivers, creating desktop and handheld 3D demos, and leading the development of handheld GPU developer tools. Dan holds a B.S. in computer science from Worcester Polytechnic Institute and an M.B.A. from Bentley University.
Table of Contents
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors
Part I: The OpenCL 1.1 Language and API
Chapter 1: An Introduction to OpenCL
What Is OpenCL, or . . . Why You Need This Book
Our Many-Core Future: Heterogeneous Platforms
Software in a Many-Core World
Conceptual Foundations of OpenCL
OpenCL and Graphics
The Contents of OpenCL
The Embedded Profile
Learning OpenCL
Chapter 2: HelloWorld: An OpenCL Example
Building the Examples
HelloWorld Example
Checking for Errors in OpenCL
Chapter 3: Platforms, Contexts, and Devices
OpenCL Platforms
OpenCL Devices
OpenCL Contexts
Chapter 4: Programming with OpenCL C
Writing a Data-Parallel Kernel Using OpenCL C
Scalar Data Types
Vector Data Types
Other Data Types
Derived Types
Implicit Type Conversions
Explicit Casts
Explicit Conversions
Reinterpreting Data as Another Type
Vector Operators
Qualifiers
Keywords
Preprocessor Directives and Macros
Restrictions
Chapter 5: OpenCL C Built-In Functions
Work-Item Functions
Math Functions
Integer Functions
Common Functions
Geometric Functions
Relational Functions
Vector Data Load and Store Functions
Synchronization Functions
Async Copy and Prefetch Functions
Atomic Functions
Miscellaneous Vector Functions
Image Read and Write Functions
Chapter 6: Programs and Kernels
Program and Kernel Object Overview
Program Objects
Kernel Objects
Chapter 7: Buffers and Sub-Buffers
Memory Objects, Buffers, and Sub-Buffers Overview
Creating Buffers and Sub-Buffers
Querying Buffers and Sub-Buffers
Reading, Writing, and Copying Buffers and Sub-Buffers
Mapping Buffers and Sub-Buffers
Chapter 8: Images and Samplers
Image and Sampler Object Overview
Creating Image Objects
Creating Sampler Objects
OpenCL C Functions for Working with Images
Transferring Image Objects
Chapter 9: Events
Commands, Queues, and Events Overview
Events and Command-Queues
Event Objects
Generating Events on the Host
Events Impacting Execution on the Host
Using Events for Profiling
Events Inside Kernels
Events from Outside OpenCL
Chapter 10: Interoperability with OpenGL
OpenCL/OpenGL Sharing Overview
Querying for the OpenGL Sharing Extension
Initializing an OpenCL Context for OpenGL Interoperability
Creating OpenCL Buffers from OpenGL Buffers
Creating OpenCL Image Objects from OpenGL Textures
Querying Information about OpenGL Objects
Synchronization between OpenGL and OpenCL
Chapter 11: Interoperability with Direct3D
Direct3D/OpenCL Sharing Overview
Initializing an OpenCL Context for Direct3D Interoperability
Creating OpenCL Memory Objects from Direct3D Buffers and Textures
Acquiring and Releasing Direct3D Objects in OpenCL
Processing a Direct3D Texture in OpenCL
Processing D3D Vertex Data in OpenCL
Chapter 12: C++ Wrapper API
C++ Wrapper API Overview
C++ Wrapper API Exceptions
Vector Add Example Using the C++ Wrapper API
Chapter 13: OpenCL Embedded Profile
OpenCL Profile Overview
64-Bit Integers
Images
Built-In Atomic Functions
Mandated Minimum Single-Precision Floating-Point Capabilities
Determining the Profile Supported by a Device in an OpenCL C Program
Part II: OpenCL 1.1 Case Studies
Chapter 14: Image Histogram
Computing an Image Histogram
Parallelizing the Image Histogram
Additional Optimizations to the Parallel Image Histogram
Computing Histograms with Half-Float or Float Values for Each Channel
Chapter 15: Sobel Edge Detection Filter
What Is a Sobel Edge Detection Filter?
Implementing the Sobel Filter as an OpenCL Kernel
Chapter 16: Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm
Graph Data Structures
Kernels
Leveraging Multiple Compute Devices
Chapter 17: Cloth Simulation in the Bullet Physics SDK
An Introduction to Cloth Simulation
Simulating the Soft Body
Executing the Simulation on the CPU
Changes Necessary for Basic GPU Execution
Two-Layered Batching
Optimizing for SIMD Computation and Local Memory
Adding OpenGL Interoperation
Chapter 18: Simulating the Ocean with Fast Fourier Transform
An Overview of the Ocean Application
Phillips Spectrum Generation
An OpenCL Discrete Fourier Transform
A Closer Look at the FFT Kernel
A Closer Look at the Transpose Kernel
Chapter 19: Optical Flow
Optical Flow Problem Overview
Sub-Pixel Accuracy with Hardware Linear Interpolation
Application of the Texture Cache
Using Local Memory
Early Exit and Hardware Scheduling
Efficient Visualization with OpenGL Interop
Performance
Chapter 20: Using OpenCL with PyOpenCL
Introducing PyOpenCL
Running the PyImageFilter2D Example
PyImageFilter2D Code
Context and Command-Queue Creation
Loading to an Image Object
Creating and Building a Program
Setting Kernel Arguments and Executing a Kernel
Reading the Results
Chapter 21: Matrix Multiplication with OpenCL
The Basic Matrix Multiplication Algorithm
A Direct Translation into OpenCL
Increasing the Amount of Work per Kernel
Optimizing Memory Movement: Local Memory
Performance Results and Optimizing the Original CPU Code
Chapter 22: Sparse Matrix-Vector Multiplication
Sparse Matrix-Vector Multiplication (SpMV) Algorithm
Description of This Implementation
Tiled and Packetized Sparse Matrix Representation
Header Structure
Tiled and Packetized Sparse Matrix Design Considerations
Optional Team Information
Tested Hardware Devices and Results
Additional Areas of Optimization
Appendix: Summary of OpenCL 1.1
The OpenCL Platform Layer
The OpenCL Runtime
Buffer Objects
Program Objects
Kernel and Event Objects
Supported Data Types
Vector Component Addressing
Preprocessor Directives and Macros
Specify Type Attributes
Math Constants
Work-Item Built-In Functions
Integer Built-In Functions
Common Built-In Functions
Math Built-In Functions
Geometric Built-In Functions
Relational Built-In Functions
Vector Data Load/Store Functions
Atomic Functions
Async Copies and Prefetch Functions
Synchronization, Explicit Memory Fence
Miscellaneous Vector Built-In Functions
Image Read and Write Built-In Functions
Image Objects
Image Formats
Access Qualifiers
Sampler Objects
Sampler Declaration Fields
OpenCL Device Architecture Diagram
OpenCL/OpenGL Sharing APIs
OpenCL/Direct3D 10 Sharing APIs
Index