Heterogeneous Computing with OpenCL 2.0 by David R. Kaeli

By David R. Kaeli

Heterogeneous Computing with OpenCL 2.0 teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs). This fully-revised edition includes the latest enhancements in OpenCL 2.0, including:

• Shared virtual memory to increase programming flexibility and reduce data transfers that consume resources
• Dynamic parallelism, which reduces processor load and avoids bottlenecks
• Improved imaging support and integration with OpenGL

Designed to work on multiple platforms, OpenCL will help you program more effectively for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book explores memory spaces, optimization techniques, extensions, debugging, and profiling. Multiple case studies and examples illustrate high-performance algorithms, distributing work across heterogeneous systems, and embedded domain-specific languages, and will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms.

  • Updated content to cover the latest developments in OpenCL 2.0, including improvements in memory handling, parallelism, and imaging support
  • Explanations of principles and strategies to learn parallel programming with OpenCL, from understanding the abstraction models to thoroughly testing and debugging complete applications
  • Example code covering image analytics, web plugins, particle simulations, video editing, performance optimization, and more


Best design & architecture books

Operational Amplifiers: Theory and Design

Operational Amplifiers – Theory and Design, Second Edition presents a systematic circuit design of operational amplifiers. Containing state-of-the-art material as well as the essentials, the book is written to appeal to both the circuit designer and the system designer. It is shown that the topology of all operational amplifiers can be divided into nine main overall configurations.

Computer and Information Security Handbook

The second edition of this comprehensive handbook of computer and information security provides the most complete view of computer security and privacy available. It offers in-depth coverage of security theory, technology, and practice as they relate to established technologies as well as recent advances.

Languages, Design Methods, and Tools for Electronic System Design: Selected Contributions from FDL 2015

This book brings together a selection of the best papers from the eighteenth edition of the Forum on specification and Design Languages Conference (FDL), which took place on September 14-16, 2015, in Barcelona, Spain. FDL is a well-established international forum devoted to the dissemination of research results, practical experiences, and new ideas in the application of specification, design, and verification languages to the design, modeling, and verification of integrated circuits, complex hardware/software embedded systems, and mixed-technology systems.

Extra resources for Heterogeneous Computing with OpenCL 2.0

Example text

Each CPU core supports out-of-order execution and can switch to a single-thread mode in which one thread can use all of the resources that previously had to be dedicated to multiple threads. In this sense, these SPARC architectures are becoming closer to other modern SMT designs such as those from Intel. Server chips, in general, try to maximize parallelism at the cost of some single-threaded performance. As opposed to desktop chips, more area is devoted to supporting quick transitions between thread contexts.

6 MULTICORE ARCHITECTURES

Conceptually at least, the obvious approach to increasing the amount of work performed per clock cycle is simply to clone a single CPU core multiple times on the chip. In the simplest case, each of these cores executes largely independently, sharing data through the memory system, usually through a cache coherency protocol. This design is a scaled-down version of traditional multisocket server symmetric multiprocessing systems that have been used to increase performance for decades, in some cases to extreme degrees.
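The idea of largely independent cores that share data only through memory can be sketched in a few lines of C++. This is a minimal illustration, not code from the book: the function name `parallel_sum` and the chunking scheme are assumptions made for the example. Each worker thread sums its own slice privately and touches the shared total exactly once, which keeps traffic on the coherent memory system low.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not from the book): sums `data` with `num_workers`
// threads, each of which may be scheduled on a different core.
std::uint64_t parallel_sum(const std::vector<std::uint32_t>& data,
                           unsigned num_workers) {
    assert(num_workers > 0);

    // The only shared, writable state. The hardware's coherency protocol
    // makes each atomic update visible to the other cores.
    std::atomic<std::uint64_t> total{0};

    // Ceiling division so every element falls in exactly one chunk.
    const std::size_t chunk = (data.size() + num_workers - 1) / num_workers;

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &total, begin, end] {
            // Independent work: accumulate privately, no sharing here.
            std::uint64_t local = 0;
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            // One shared update per worker instead of one per element.
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }

    // join() synchronizes with each worker, so the final load sees all adds.
    for (auto& t : workers) {
        t.join();
    }
    return total.load();
}
```

Calling `parallel_sum(values, 4)` on a vector holding 1 through 1000 returns 500500, the same result as a sequential loop. Accumulating locally first is a deliberate choice: updating the shared total on every element would force the cache line that holds it to bounce between cores.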

The Cortex-A8, Cortex-A9, and Cortex-A15 cores, based on the ARMv7 ISA, are superscalar and multicore with up to four symmetric cores. The ARMv7-based cores optionally support the NEON SIMD instructions, giving 64- and 128-bit SIMD operations in each core. ARMv8-A cores add a 64-bit instruction set and updated NEON extensions with more 128-bit registers, double-precision support, and cryptography instructions. The high-end Cortex-A57, based on the ARMv8-A architecture, targets mid-range performance, has eight-wide instruction issue, and, trading performance for power, uses an out-of-order pipeline.
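To make the 128-bit SIMD width concrete: a 128-bit register holds four 32-bit floats, so one SIMD add produces four results at once. The sketch below models that in portable C++ rather than with real NEON code. The names `Vec4`, `add4`, and `add_arrays` are hypothetical and chosen for this example; on an ARM target the body of `add4` would correspond to a single `vaddq_f32` intrinsic from `<arm_neon.h>`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Four 32-bit float lanes fill one 128-bit register.
constexpr std::size_t kLanes = 4;

// Hypothetical stand-in for a 128-bit SIMD register.
using Vec4 = std::array<float, kLanes>;

// Models one SIMD add: all four lanes are added in a single step.
Vec4 add4(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        r[lane] = a[lane] + b[lane];
    }
    return r;
}

// Adds two equal-length arrays four elements at a time, then finishes
// any leftover elements with a scalar loop.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());

    std::size_t i = 0;
    for (; i + kLanes <= a.size(); i += kLanes) {
        Vec4 va{}, vb{};
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            va[lane] = a[i + lane];  // "load" four floats
            vb[lane] = b[i + lane];
        }
        const Vec4 vr = add4(va, vb);
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            out[i + lane] = vr[lane];  // "store" four results
        }
    }

    // Scalar tail for lengths that are not a multiple of four.
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

For six-element inputs, the main loop handles the first four elements in one vector step and the tail loop handles the remaining two. An optimizing compiler may turn loops like these into actual SIMD instructions, but that depends on the target and flags and is not guaranteed.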
