GPU Talk


GC Nano User Interface (UI) Acceleration


Background and Overview

A crisp, clear, and responsive HMI (human-machine interface) has become as important to the user experience as the content or the device form factor. A beautifully crafted smartphone that combines brushed titanium and smudge-proof glass may look great in the hand, but the user will quickly opt for another product if the user interface stutters or the screen is hard to read because of aliased, inconsistent fonts. The same applies to HMI in wearables and IoT devices, which is the focus of this white paper.

The goal of a well-designed wearable/IoT HMI is to make reading or glancing at the screen intuitive and natural, yet engaging. In other words, it is about a consistent, seamless interaction between user and device. Since device screens are smaller, information needs to be displayed in a simplified, uncluttered way with only relevant data (text, images, icons, video, etc.) rendered and composed onscreen. Smaller device screens do not directly translate into devices with less processing capability. The opposite can be true, since upcoming devices need to perform real-time processing (UI display composition, communications, sensor processing, analytics, etc.) as part of a single IoT node or a network of them. Some wearable/IoT designs take technologies found in low/mid-range smartphone application processors and customize parts of the IP specifically for wearables. One important IP block that device OEMs need is the graphics processing unit (GPU), which accelerates HMI screen composition at ultra-low power.

In addition, a couple of hot new trends in these emerging markets are personalized screen UIs and a unified UI that spans all devices, from cars and 4K TVs to smartphones, wearables and embedded IoT screens, to give users a consistent, ubiquitous screen experience across a given operating system (OS) platform, regardless of the underlying hardware (i.e. SoC/MCU). This will enable a cross-vendor solution where vendor A’s smartwatch works correctly with vendor B’s TV and vendor C’s smartphone. Google and Microsoft have recently announced support for these features in their Android Material Design and Windows 9 releases, respectively. Support for this requires an OpenGL ES 2.0 capable GPU at a minimum, with optional/advanced features using OpenGL ES 3.x. Google also has a lightweight wearables OS called Android Wear that requires a GPU to give the UI a look-and-feel similar to the standard smartphone/tablet/TV Android OS.

 

Evolution of Wearables/IoT

Figure 1: Evolving Wearable and IoT Devices Requiring GPUs

 

Vivante GPU Product Overview

The underlying technology that accelerates HMI user experience is the graphics processing unit (GPU). GPUs natively do screen/UI composition including multi-layer blending from multiple sources (ISP/Camera, Video, etc.), image filtering, font rendering/acceleration, 3D effects (transition, perspective view, etc.) and lots more. Vivante has a complete top-to-bottom product line of GPU technologies that include the GC Vega and GC Nano Series:

  • GC Vega Series targets SoCs that need the latest and greatest GPU hardware and features like OpenGL ES 3.1, Full Android Extension Pack (AEP) Support including hardware tessellation / geometry shaders (TS/GS), DirectX 12, close to the metal GPU programming, hybrid ray tracing, zero driver overhead, sensor fusion, and GPU-Compute for vision processing using OpenVX, OpenCV or OpenCL, bundled in the most aggressive PPA and feature-complete design. Target markets range from high end wearables and low/mid-range mobile devices up to 4K TVs and GPUs for server virtualization.
  • GC Nano Series falls on the other side of the spectrum and targets devices that are making a revolutionary push into consumer products like wearables and IoT (smart homes / appliances, information gadgets, etc.) with GPU rendered HMI / UI. This core is specifically designed to work in resource constrained environments where CPU, memory (both on-chip and DDR), battery, and bandwidth are very limited. GC Nano is also optimized to work with MCU platforms for smaller form factors that require UI composition acceleration at 30/60+ FPS.

 

Vivante GPU Product Line and Markets

Figure 2: Vivante GPU Product Line and Target Markets

 

GC Nano Overview

GC Nano Series consists of the following products starting with the GC Nano Lite (entry), GC Nano (mainstream) and GC Nano Ultra (mid/high).

 


Figure 3: GC Nano Product Line

 

GC Nano Series benefits include:

  • Silicon Area and Power Optimized: Tiny silicon footprint that maximizes performance-per-area for silicon constrained SoCs means vendors can add enhanced graphics functionality to their designs without exceeding silicon/power budgets and still maintain responsive and smooth UI performance. GC Nano maximizes battery life with ultra-low power consumption and thermals with minimal dynamic power and near zero leakage power.
  • Smart Composition: Vivante’s Immediate Mode Rendering (IMR) architecture reduces composition bandwidth, latency, overhead and power by intelligently composing and updating only screen regions that change. Composition works either with GC Nano composing all screen layers (graphics, background, images, videos, text, etc.) or through a tightly coupled design where the GC Nano and display controller/processor (3rd party or Vivante DC core) work in tandem for UI composition. Data can also be compressed / decompressed through Vivante’s DEC compression IP core to further reduce bandwidth.
  • Wearables and IoT Ready: Ultra-lightweight vector graphics (GC Nano Lite) and OpenGL ES 2.0 (GC Nano, GC Nano Ultra) drivers, SDK and tools to easily transition wearables and IoT screens to consumer level graphical interfaces. The GC Nano package also includes tutorials, sample code, and documentation to help developers optimize or port their code.
  • Designed for MCU/MPU Platforms: Efficient design to offload and significantly reduce system resources including complete UI / composition and display controller integration, minimal CPU overhead, DDR-less and flash memory only configurations, bandwidth modulation, close-to-the-metal GPU drivers, and wearables / IoT-specific GPU features to shrink silicon size. The tiny software code size puts less constraints on memory size, speeds up GPU initialization/boot-up times and allows instant-on UI composition for screens that need to display information at the push of a button.
  • Ecosystem and Software Support: Developers can take advantage of the lightweight NanoUI or OpenGL ES API to further enhance or customize their solutions. Large industry support on existing Vivante products include the GC Nano / GC Nano Ultra product line on Android, Android Wear and embedded UI solutions from key partners covering tools for font, artwork and Qt development environments.
  • Compute Ready: As the number of wearable / IoT (processing) nodes grows by several tens of billions of units in the next few years, bandwidth on data networks could be an issue with an always-on, always-connected, always-processing node. GC Nano helps with this by performing ultra-low power processing (GFLOP / GINT ops) at the node and transmitting only useful compressed data as needed. Examples include sensor fusion calculations and image/video bandwidth reduction.
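The Smart Composition idea in the list above, recomposing only the screen regions that change, can be sketched as a dirty-region tracker. This is a hypothetical Python illustration: the region size, the hash-based change test and the function names are our own assumptions, not Vivante's design.

```python
# Hypothetical sketch of dirty-region composition: only regions whose
# contents changed between frames are recomposed, so composition bandwidth
# scales with the updated area rather than the full screen.

TILE = 64  # region edge in pixels; an illustrative choice, not a hardware value

def dirty_regions(prev, curr, width, height):
    """Compare two frames region-by-region and return the changed regions.

    prev and curr map (col, row) -> a content hash; real hardware compares
    pixel data, but hashes keep the sketch simple."""
    changed = []
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            key = (tx // TILE, ty // TILE)
            if prev.get(key) != curr.get(key):
                changed.append(key)
    return changed

def update_bytes(regions, bytes_per_pixel=4):
    """Bytes recomposed when only dirty regions are updated (32-bit ARGB)."""
    return len(regions) * TILE * TILE * bytes_per_pixel

# One changed region on a WVGA screen: 16 KB recomposed instead of a
# 1.5 MB full-frame pass.
changed = dirty_regions({(0, 0): 1}, {(0, 0): 2}, 800, 480)
```

The payoff is the ratio between the two numbers: a single 64×64 region touches roughly 1% of the bytes a full WVGA recomposition would.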

Vivante’s software driver stack, SDK and toolkit will support its NanoUI API that brings close-to-the-metal GPU acceleration for no-OS / no-DDR options on GC Nano Lite and the OpenGL ES 2.0 API (optional 3.x) for more advanced solutions that include proprietary or high-level operating systems like embedded Linux, Tizen™, Android™, Android™ Wear and other RTOS that require OpenGL ES 2.0+ in the smallest memory footprint. These various OS / non-OS platforms will form the base of next generation wearables and IoT that bring personalized, unique and optimized experiences to each person. The GC Nano drivers include aggressive power savings, intelligent composition and rendering, and bandwidth modulation that allow OEMs and developers to build rich visual experiences on wearables and IoT using an ultralight UI / composition or 3D graphics driver.

Many of the GC Nano innovations create a complete “visual” wearables MCU/SoC platform that optimizes PPA and software efficiency to improve overall device performance and BOM cost, with the most compact UI graphics hardware and software footprint that does not diminish or restrict the onscreen user experience. These new GPUs are making their way into some exciting products that will appear all around you as wearables and IoT get integrated into our lives.

 


Figure 4: GC Nano Series Features and Specifications


Figure 5: Example GC Nano Series SoC/MCU Implementation

 

Trends and Importance of 3D User Interface Rendering

As the smart home UI sample in Figure 6 suggests, next generation products will take some of the well-thought-out UI design elements from smartphones, tablets and TVs and incorporate them into IoT devices (and wearables) to keep a consistent interface between products. The similar UI look-and-feel will reduce the usage learning curve and accelerate device adoption. As a side note, since different devices have different levels of processing/performance capability, a minimum level will be used for smaller screens (baseline performance), with additional features and higher performance added as device capabilities move up into a higher tier segmented by the OS vendor.

 


Figure 6: Sample HMI user interface on a smart home device

 

A few examples of updated UIs include the following:

  • Animated icons – easily show the user which menu item is selected or where the input cursor points, so the user does not need to spend time searching for the cursor position onscreen. Icons can rotate, wiggle, pop out, flash, etc. before being selected.
  • Live animations – dynamic content can turn a simple background (wallpaper) into a moving scene that adds a personal touch to your device. Background images and designs can also be personalized to match décor, lighting, theme and mood. Some white-goods appliance makers are testing these concept designs, hoping to put one (or two) inside your kitchen in the near future.
  • 3D effects – text, icons and images that go beyond simple shadows: the GPU can render with powerful shader instructions to give a three-dimensional character to parts of the UI (ex. carousel, parallax, depth blur, widget/icon rendering to 3D/2D shapes, procedural/template animations for icon movements, physical simulations for particle systems, perspective view, etc.). These effects can be implemented using GC Nano’s ultra-low power OpenGL ES 2.0/3.x pipeline.
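To illustrate the perspective-view effect listed above, the following Python sketch applies the rotate-then-project math a vertex shader would run per vertex for a carousel tilt. The function names and numbers are our own illustrative choices, not GC Nano code.

```python
import math

def rotate_y(p, angle):
    """Rotate a 3D point about the vertical (Y) axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project(p, d=4.0):
    """Simple perspective divide with the viewer a distance d from the plane."""
    x, y, z = p
    w = d / (d + z)  # vertices rotated toward the viewer (z < 0) grow larger
    return (x * w, y * w)

# A unit icon quad tilted 30 degrees for a carousel: after projection the
# near edge is taller than the far edge, giving the expected foreshortening.
quad = [(-1, -1, 0), (1, -1, 0), (1, 1, 0), (-1, 1, 0)]
screen = [project(rotate_y(v, math.radians(30))) for v in quad]
```

In a real pipeline this transform is a single matrix multiply in the vertex shader, applied per icon, which is why these effects come nearly for free on a GPU.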

GC Nano’s architecture excels at HMI UI composition by bringing out 3D UI effects, bandwidth reduction and reduced latency, which will be discussed below.

 

GC Nano Bandwidth Calculation

In this section we will step through examples of various user interface scenarios and calculate system bandwidth for both 30 and 60 FPS UI HMI rendering through the GC Nano GPU. All calculation assumptions are stated in section 5.2.

Methods of Composition

There are two options for screen display composition that we will evaluate. In the first, the GPU does the entire screen composition of all layers (or surfaces), including video, and the display controller simply outputs the already-composited HMI UI onscreen. In the second, the display controller takes composited layers from both the GPU and the video decoder (VPU) and does the final UI composition blend and merge before displaying. The top level diagrams below do not show DDR memory transactions, but these will be shown in section 5.2 when describing the UI steps.

 


Figure 7: GC Nano Full Composition: All UI layers are processed by GC Nano before sending the final output frame to the display controller


Figure 8: Display Controller Composition: Final output frame is composited by the display controller using input layers from GC Nano and the video processor

 

UI Bandwidth Calculations

Calculation assumptions:

  • GC Nano UI processing is in ARGB8 (32-bits per pixel) format. When GC Nano performs full composition, the GPU automatically converts 16-bit YUV video format into 32-bit ARGB.
  • Video frame is in YUV422 (16-bits per pixel) and has the same resolution as the screen size (GC Nano treats incoming video as video textures)
  • Final composited frame is in ARGB8 format (32-bits per pixel)
  • Reading video has a request burst size of 32 bytes
  • GC Nano UI request burst size is 64 bytes
  • Write sizes for writing out the UI rendering and the final frame are 64 bytes
  • For these cases we assume 32-bit UI rendering. If the display format is 16-bits (applicable to smaller screens) then the bandwidth calculations listed below will be much lower.
  • Bandwidth calculation examples will be given for WVGA (800×480) and 720p (1280×720)
  • The amount of UI pixels per frame that need to be refreshed/updated (in our example) will include the following percentages:
    • 15% (standard UI)
    • 25%
    • 50% (worst case UI)

 

GC Nano Full UI Composition

The following images describe the flow of data to/from DDR memory when GC Nano performs the entire UI composition. Major benefits of this method include using the GPU to perform pre-/post-processing on images or videos, filtering, adding standard 3D effects to images/videos (video carousel, warping/dewarping, etc.) and augmented reality, where GC Nano overlays rendered 3D content on top of a video stream. This method is the most flexible, since GC Nano can be programmed to perform image/UI related tasks.


Figure 9: GC Nano Full Composition memory access and UI rendering steps (steps 1 – 4)

 

Bandwidth calculation is as follows:

Table 1: Full composition bandwidth for WVGA and 720p at 15% / 25% / 50% UI updates (rows a–e are referenced in the notes below)

Notes:

  • Total screen pixels = resolution WxH
  • UI pixels updated per frame = [Total screen pixels] * [UI% updated per frame]
  • Total UI pixels updated per frame in bytes = [UI pixels updated per frame] * [4 Bytes]; 4 Bytes since the UI format is 32bpp ARGB8888
  • Assumes video is in the background (worst case). Total composition bandwidth (bytes) = video part [(a – b) * (2 bytes for 16-bit YUV)] + UI part [b * 4 bytes for ARGB8] + frame write [a * 4 bytes], where b is the number of UI pixels updated
  • Total bandwidth per frame (MB) = [(c + d) / 10⁶]
  • Total bandwidth = [e*30] for 30 FPS and [e*60] for 60 FPS
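The notes above can be turned into a small calculator. This Python sketch follows the stated formulas, interpreting the video term as the non-updated pixel area; the exact table values are therefore reconstructions from the notes, not figures copied from Table 1.

```python
def full_composition_bw(width, height, ui_pct, fps):
    """Full-composition bandwidth per the notes: GC Nano writes the updated
    UI region, then composes 16-bit YUV video under 32-bit ARGB UI and
    writes out the full frame."""
    a = width * height               # total screen pixels
    b = int(a * ui_pct)              # UI pixels updated per frame
    c = b * 4                        # updated UI bytes written (ARGB8888)
    d = (a - b) * 2 + b * 4 + a * 4  # video read + UI read + final frame write
    e = (c + d) / 1e6                # MB per frame
    return e * fps                   # MB/s at the given refresh rate

wvga_60 = full_composition_bw(800, 480, 0.15, 60)    # ~159 MB/s
hd720_60 = full_composition_bw(1280, 720, 0.15, 60)  # ~381.5 MB/s
```

Swapping in 0.25 or 0.50 for the update fraction, or 30 for the frame rate, reproduces the other table rows under the same assumptions.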

 

Display Controller UI Composition

This section describes the flow of data to/from DDR memory using the display controller to do the final merging/composition of layers from the GC Nano and video processor. This method partially reduces bandwidth consumption since the GPU does not need to read in the video surface since it does not perform final frame composition. The GPU only works on composing the UI part of the frame minus any additional layers from other IP blocks inside the SoC/MCU. A benefit from this method is lower overall system bandwidth, but at the cost of less flexibility in the UI. If the video (or image) stream only needs be merged with the rest of the UI then this is a good solution. If the incoming video (or image) stream needs to be processed in any way – adding 3D effects, filtering, augmented reality, etc. – then this method has limitations and it is better to use the GPU for full frame UI composition.

 


Figure 10: Display controller performing final frame composition from two incoming layers from GC Nano and the video processor (VPU)

 

The display controller has a DMA engine that can read data from system memory directly. Data formats supported are flexible and include various ARGB, RGB, YUV 444/422/420, and their swizzle formats.

Bandwidth calculation for UI composition only is straightforward and is only based on the screen resolution size, as follows:

 

Table 2: Display controller composition bandwidth for WVGA and 720p (rows a–c are referenced in the notes below)

Notes:

  • Total screen pixels = resolution WxH
  • Total UI bytes per frame = [Total screen pixels] * 4 bytes; 32-bit ARGB8 format
  • Total bandwidth per frame (MB) = [b / 10⁶]; since GC Nano needs to perform full screen UI composition minus additional layers from other sources
  • Total bandwidth = [c*30] for 30 FPS and [c*60] for 60 FPS
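The display controller case reduces to a one-line formula. The following Python sketch mirrors the notes above; as before, the values are reconstructions from the stated formulas rather than figures copied from Table 2.

```python
def dc_composition_bw(width, height, fps):
    """GPU-side bandwidth when the display controller does the final blend:
    GC Nano only writes its full-screen UI layer in 32-bit ARGB8."""
    a = width * height  # total screen pixels
    b = a * 4           # full-screen UI layer bytes (ARGB8)
    c = b / 1e6         # MB per frame
    return c * fps      # MB/s at the given refresh rate

wvga_60 = dc_composition_bw(800, 480, 60)    # ~92 MB/s
hd720_60 = dc_composition_bw(1280, 720, 60)  # ~221 MB/s
```

Comparing this with the full-composition numbers shows the bandwidth the GPU saves by not reading the video surface back for the final blend.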

 

Summary of Bandwidth Calculations

The table below summarizes the calculations above:

 

Table 3: Bandwidth summary for GC Nano full composition and display controller composition (WVGA and 720p, 30/60 FPS)

Adding Vivante’s DEC compression technology will also reduce bandwidth by about 2x – 3x from the numbers above.

 

 

GC Nano Architecture Advantage for UIs

There are two main architectures for GPU rendering: tile-based rendering (TBR) and immediate mode rendering (IMR). TBR breaks a screen image into tiles and renders once all the relevant information is available for a full frame. In IMR, graphics commands are issued directly to the GPU and executed immediately. Techniques inside Vivante’s architecture allow culling of hidden or unseen parts of the frame, so execution, bandwidth, power, etc. are not wasted rendering parts of the scene that will eventually be removed. Vivante’s IMR also has significant advantages when rendering photorealistic 3D images for the latest AAA games that take advantage of full hardware acceleration for fine geometries and PC-level graphics quality, including support for advanced geometry/tessellation (GS/TS) shaders in its high end GC Vega cores (DirectX 11.x, OpenGL ES 3.1 and the Android Extension Pack – AEP). Note: some of the more advanced features like GS/TS are not applicable to the GC Nano Series.

Tile Based Rendering (TBR) Architecture for UIs

The following images explain the process TBR architectures use for rendering UIs.

Breaking a scene into tiles…


But…before rendering a frame, all UI surfaces need to go through a pre-tile pass before proceeding…


Combining the pre-processing step and tiling step give us the following…


If the UI is dynamic then parts of the frame need to be re-processed…


Here are the “dirty” blocks inside the UI


TBR UI Rendering Summary

From the steps shown above, TBR based GPUs have additional overhead that increases UI rendering latency since the pre-processed UI triangles need to be stored in memory first and then read back when used. This affects overall frame rate. TBR GPUs also require large amounts of on-chip L2$ memory to store the entire frame (tile) database, but as UI complexity grows, either the on-chip L2$ cache size (die area) has to grow in conjunction or the TBR core has to continuously overflow to DDR memory which increases their latency, bandwidth and power.

TBRs have mechanisms to identify and track which parts of the UI (tiles) and which surfaces have changed to minimize pre-processing, but for newer UIs with many moving parts, this remains a limitation. In addition, as screen sizes/resolutions and content complexity increase, the latency becomes even more apparent, especially on Google, Microsoft, and other operating system platforms that will use unified UIs across all screens.
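The pre-pass overhead described above can be made concrete with a toy traffic model. The terms below (one extra write plus one read-back of the binned geometry) are our simplifying assumptions for illustration, not measured vendor data.

```python
# Toy per-frame memory-traffic model of TBR vs IMR for UI rendering.
# Assumption: the TBR pre-pass writes the binned triangle database to
# memory and reads it back before rendering; IMR consumes geometry
# in-flight, so that traffic never occurs.

def tbr_frame_traffic(geometry_bytes, framebuffer_bytes):
    """Read geometry, write the tile bins, read them back, write the frame."""
    return geometry_bytes + 2 * geometry_bytes + framebuffer_bytes

def imr_frame_traffic(geometry_bytes, framebuffer_bytes):
    """Read geometry once, render immediately, write the frame."""
    return geometry_bytes + framebuffer_bytes

# Example: 2 MB of UI geometry on a 720p ARGB8 frame (3,686,400 bytes).
# In this model the pre-pass adds 4 MB of extra traffic every frame.
extra = (tbr_frame_traffic(2_000_000, 3_686_400)
         - imr_frame_traffic(2_000_000, 3_686_400))
```

The model also shows why the gap widens with UI complexity: the extra term grows with geometry size, while simple UIs with little geometry make the two architectures nearly equivalent, matching the text above.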

Immediate Mode Rendering (IMR) Architecture for UIs

The most advanced GPUs use IMR technology, which is object-based rendering found in PC (desktop/notebook) graphics cards all the way to Vivante’s GC Series product lines. IMR allows the GPU to render photorealistic images and draw the latest complex, dynamic and interactive content onscreen. In this architecture, graphics API calls are sent directly to the GPU and object rendering happens as soon as commands and data are received. This significantly speeds up 3D rendering performance.

In the case of UIs, the pre-pass processing is not required and this eliminates the TBR-related latency seen in section 6.1. In addition, there are many intelligent mechanisms that perform transaction elimination so hidden (unseen) parts of the frame are not even sent through the GPU pipeline, or if the hidden portions are already in-flight (ex. change in UI surface), those can be discarded immediately so the pipeline can continue executing useful work.

Composition processing is performed in the shaders for flexibility and the Vivante GPU can automatically add a rectangle primitive that takes the whole screen into account to achieve 100% efficiency (versus 50% efficiency using two triangles). Memory bandwidth is equivalent to TBR architectures for simple UIs and 3D frames, but for more advanced UIs and 3D scenes, TBR designs need to access external memory much more than IMR since TBRs cannot hold large amounts of complex scene data in their on-chip caches.

The following images explain the process Vivante’s IMR architecture uses for rendering dynamic UIs. The process is significantly simpler compared to TBRs, and dynamic changes in UI or graphics are straightforward.

IMR Object Based UI Rendering


Additional UI Content are Considered New Objects


IMR GPUs are Ideal for Dynamic and Next Gen UIs


IMR UI Rendering Summary

For dynamic 3D UIs, complex 3D graphics, mapping applications, and the like, IMRs are more efficient in terms of latency, bandwidth and power. Memory consumption and memory I/O are other areas where IMR has advantages: for upcoming dynamic real-time 3D UIs, IMR is the best choice, while for standard UIs, IMRs and TBRs are equivalent, but IMRs give the SoC/MCU flexibility and future-proofing. Note: historically, TBRs were better for simpler UIs and simple 3D games (low triangle/polygon count, low complexity) since TBRs could keep the full frame tile database on chip (L2$ cache), but advances in UI technologies brought about by leading smartphones, tablets and TVs have tipped things in favor of IMR technology.

Summary

GC Nano provides flexibility and advanced graphics and UI composition capabilities to SoCs/MCUs targeting IoT and wearables. With demand for high quality UIs that mirror other consumer devices from mobile to home entertainment and cars, a consistent, configurable interface is possible across all screens as the trend towards a unified platform is mandated by Google, Microsoft and others. GC Nano is also architected for OEMs and developers to take advantage of IMR technologies to create clean, amazing UIs that help product differentiation. The tiny core packs enough horsepower to take on the most demanding UIs at 60+ FPS in the smallest die area and power/thermal consumption. The GC Nano also reduces system bandwidth, latency, memory footprint and system/CPU overhead so resource constrained wearables and IoT SoCs/MCUs can use GPUs for next generation designs.

GCNano GPUs – Supercharging your Wearables and IoT

Vivante recently announced the GCNano GPU Series, the latest product line that complements its shipping Vega GC7000 Series to complete the world’s first full line-up of top to bottom GPU offerings, from the smallest wearables and IoT devices, to ultra HD 4K / 8K TVs, smartphones and tablets. GC7000 targets SoCs that need the latest and greatest GPU hardware and features like OpenGL ES 3.1, Full Android Extension Pack Support, DirectX 12, tessellation / geometry shaders, ray tracing, zero driver overhead and GPGPU for vision / image / physics processing, with the most aggressive PPA and feature-complete design. GCNano falls on the other side of the spectrum without sacrificing features or performance, and targets devices that are making a revolutionary (visual user interfaces / UI, network connected, intelligence) push into consumer products like wearables and IoT (smart homes / appliances, information gadgets, etc.). Many of these new products will be powered by microcontroller (MCU) and microprocessor (MPU) systems that will complement the general purpose applications processors (AP) found in mobile and home entertainment products. (I’ll use the term MCU to collectively reference MCUs and MPUs).

About MCUs

MCUs are task specific “embedded” processing devices that are found in the billions inside everything from the washing machine control / interface and thermostats with displays, to remote controls, network printers with displays, smart meters, and other devices in the home, car, infrastructure or on the body. Most of them are invisible to us, but behind the scenes they keep our world moving. Traditional MCUs only supported basic visual interfaces, since their focus was to display relevant information and keep things simple. Over the last several years the industry took a sharp turn, evolving to the point where MCUs were not just data processors but the HMI (human-machine interface) of some consumer devices. Nest thermostats, washing machines with color displays and next generation health wearables are examples of this shift. As screens become pervasive, demand for fancier UIs with an intuitive, consumer-friendly look-and-feel will grow. GCNano specifically targets these types of systems, since an innovative and specialized GPU is needed to overcome system and resource constraints without negatively impacting user experience.

MCUs have limited CPU resources, limited memory, limited bandwidth, limited I/O, and require ultra-low power (ex. long battery life). Previous products could get by using a simple display controller and CPU (software) or a simple 2D engine to create a basic GUI. But as graphics UI complexity increases (layers, content, effects) and resolutions/PPI go up, this method will not suffice, since it will overwhelm system resources. To overcome these limitations you cannot take a standard off-the-shelf GPU IP block and plug it in. You need to examine the constraints and optimize the design for this type of configuration through a holistic approach that includes hardware (GPU, UI-display controller integration), software (drivers, tools, compilers, etc.) and ecosystem enablement.

On the hardware side, this means examining the feature list and eliminating unused features (ex. 3D game features), optimizing PPA, fine-tuning datapaths and memory, enabling compression, reducing bandwidth and creating a tightly coupled interface between the UI / composition GPU and display controller. In addition, removing or significantly reducing external DDR memory cuts system cost dramatically, since DDR is a major portion of BOM cost. On the software side, it means drastically cutting down driver size, driver overhead, batching calls, and compiler work (for example, pre-compiled shaders), and creating a standard wearables / IoT GPU software package that developers can tap into. A tiny driver size is critical, since you need instantaneous screen response at the push of a button (wearable) or when you start your car and the dashboard information needs to appear in less than a second (IoT). GPUs are complex, powerful and programmable, yet the GCNano takes a simpler approach and takes the guesswork out to keep things relevant and functional.

GCNano Product Overview

The GCNano line can be split into two types of products. On one side is the GCNano Lite, a vector graphics engine that can render with no OS and no DDR memory and is shipping now (production proven). The other category is products that require 3D rendering using OpenGL ES 2.0 (at a minimum) but still need a tiny memory footprint (minimal DDR) and customized / limited / high-level operating systems (GCNano and GCNano Ultra). The table below shows the various products.


GCNano Series benefits include:

  • Wearables and IoT Ready: Ultra-lightweight vector graphics (GCNano Lite) and OpenGL ES 2.0 (GCNano, GCNano Ultra) drivers, SDK and tools to easily transition wearables and IoT screens to consumer level graphical interfaces. The GCNano package also includes tutorials, sample code, and documentation to help developers optimize or port their code.
  • Designed for MCU/MPU Platforms: Efficient design to offload and significantly reduce system resources including complete UI / composition and display controller integration, minimal CPU overhead, DDR-less and flash memory only configurations, bandwidth modulation, close-to-the-metal GPU drivers, and wearables / IoT-specific GPU features to shrink silicon size. The tiny software code size puts less constraints on memory size, speeds up GPU initialization and boot-up times.
  • Ecosystem and Software Support: Developers can take advantage of the lightweight NanoUI or NanoGL API to further enhance or customize their solutions. Large industry support on existing Vivante products include the GCNano / GCNano Ultra product line on Android, Android Wear and embedded UI solutions from key partners covering tools for font, artwork and Qt development environments.
  • Compute Ready: As the number of wearable / IoT (processing) nodes grows by several tens of billions of units in the next few years, bandwidth on data networks could be an issue with an always-on, always-connected, always-processing node. GCNano products help with this by performing ultra low-power processing (GFLOP / GINT ops) at the node and transmitting only useful compressed data as needed. Examples include sensor fusion calculations and image/video bandwidth reduction.

Vivante’s software driver stack, SDK and toolkit will support its NanoUI API that brings close-to-the-metal GPU acceleration for no-OS / no-DDR options on GCNano Lite and the NanoGL API for more advanced solutions that include proprietary or high-level operating systems like embedded Linux, Tizen™, Android™, Android™ Wear and other RTOS that require OpenGL ES 2.0+ in the smallest memory footprint. These various OS / non-OS platforms will form the base of next generation wearables and IoT that bring personalized, unique and optimized experiences to each person. The GCNano drivers include aggressive power savings, intelligent composition and rendering, and bandwidth modulation that allow OEMs and developers to build rich visual experiences on wearables and IoT using an ultralight UI / composition or 3D graphics driver.

Many of the GCNano innovations create a complete “visual” MCU platform that optimizes PPA and software efficiency to improve overall device performance and BOM cost, with the most compact UI graphics hardware and software footprint that does not diminish or restrict the onscreen user experience. These new GPUs are making their way into some exciting products that will appear all around you as wearables and IoT get integrated into, and eventually “disappear” into, our lives.

The Importance of Graphics and GPGPU in ADAS and Other Automotive Applications

By Benson Tao (Vivante Corporation)

People spend a significant amount of time in their cars, whether commuting to work, going to the mall with the kids, or taking a road trip with loved ones. The car has evolved into an extension of our lives outside the home, blending driving fun with full-featured electronics that give people a consumer device interface. In-car electronics have also moved beyond entertainment and fancy HMI (human-machine interface) displays to include intelligent safety monitoring and occupant protection systems. The automotive OEMs that build compelling, consumer-centric HMI / entertainment IVI (in-vehicle infotainment) and advanced safety features will be the ones driving higher automobile sales, and the best sellers will be the ones that create the most immersive in-car living room experience in a safe environment where the vehicle is “aware” of its surroundings. New GPU technologies enable automotive OEMs to realize both, with 3D graphics for visual eye candy and GPGPU (general-purpose computing on the GPU) using OpenCL for safety applications. Upcoming vehicles will ship with a predefined set of functions available on the HMI or IVI, but the car owner will be able to install apps to customize the car interface. GENIVI (www.genivi.org) is an example of an open-platform industry consortium taking IVI to the next level.

Automotive Example

3D Graphics

3D graphics have been heavily used in the mobile market with the rapid expansion of Android and iOS. Each successive release of a mobile operating system or hardware technology pushes the visual envelope in terms of UI, 3D game play and captivating visual content. Since consumers are familiar with the look and feel of their existing mobile devices, the automotive market has taken note and started looking at in-vehicle platforms that display information in a similar manner. The first-generation automobiles with embedded GPUs had basic graphics functionality, limited in performance and capabilities, since graphics was not an important requirement in the years before the first iPhone shipped. Once the iPhone took off, adoption of GPU IP into system-on-chips (SoCs) accelerated rapidly and brought graphics into the spotlight, where it could make or break a product. With this paradigm shift, graphics proliferated into many important markets, including the automotive industry, where some SoC vendor designs are now awarded based on the GPU inside.

Leveraging the mass-market use of 3D graphics in mobile devices and building on the existing ecosystem of 3D graphics, dynamic UIs, and apps, automotive OEMs are using these building blocks to transform themselves from car makers into a new breed of consumer-focused automotive manufacturers that have the HW (car) and SW (apps, app store) to turn the driving experience upside down. One step in this transition is to bring the familiar graphical interfaces and user experiences found on tablets and TVs into the car, turning it into an entertainment hub powered through the IVI system and driver HMIs that add eye candy to console data displays. These displays need to scale to higher resolutions with higher DPI on HD screens that are crisp, clear, vivid, and responsive. The migration toward a visual-centric automobile console shows how the GPU has changed from a nice-to-have feature to a must-have requirement that sways technology decisions in the automotive ecosystem. The technology is already available in terms of hardware, software, middleware, and operating systems; it now comes down to assembling those pieces into a final product and bringing next-generation, graphics-centric solutions, beyond what is available today, to a dealer near you.

Infotainment 1

Safety Features

Safety is another major feature that influences purchase decisions. The term ADAS, or Advanced Driver Assistance Systems, describes the latest electronic technologies found in vehicles that focus on increasing safety for occupants, pedestrians, and surrounding vehicles. ADAS features that monitor, predict, and try to prevent accidents include active safety monitoring, collision avoidance systems (CAS), object/pedestrian recognition, lane departure warning, adaptive cruise control, and more. Current solutions use a combination of DSPs, CPUs, and in some cases FPGAs with built-in computational units to perform safety monitoring. These solutions use hand-written code for specific products, making them hard to port to new platforms or when changing components like DSPs or CPUs. With a GPU, these limitations can be overcome by writing algorithms in OpenCL (described in more detail later) together with GPU-based OpenCV libraries; because OpenCL is cross-platform, the code can be re-used across various platforms. In the near future, the compiler will be able to partition code to execute on the most efficient compute element (GPU, CPU, DSP) in a platform to give the best overall performance: parallel workloads will go to the GPU, and serial workloads to the DSP or CPU.

Some automakers are looking at harnessing the massively parallel processing power of GPUs to reduce parallel algorithm execution times and speed up real-time response in ADAS. Since the GPU is inherently fast at image and pixel processing, incoming pixel data from camera sensors and other inherently parallel sensor streams can be sent through the GPU for processing. In addition, GPGPU APIs like OpenCL can help process parallel data streams (sensor fusion) from cameras, GPS/WiFi positioning data, accelerometers, radar, and LIDAR to guide vehicles safely. Current solutions focus on computer vision (CV) as a first step, but moving forward, data from other sensors can be sent to the GPU to offload other computational resources in a system. Autonomous cars like Google’s driverless car and those in DARPA competitions have already demonstrated what the future of ADAS will evolve into.

Entertainment and safety requirements can both be met with the latest semiconductor technologies like those found in the Freescale i.MX 6 automotive-grade applications processors, which provide 3D/2D/VG graphics (HMI rendering, games, and user interface composition) and OpenCL (ADAS, computer vision, and gesture recognition). So far, the i.MX 6 is the only product targeting automotive with advanced graphics (OpenGL ES 3.0) and GPU compute (OpenCL 1.1).

MBZ ADAS_From Web

Source: Mercedes Benz

The Evolution of GPUs from Graphics to General Purpose Computation Cores (GPGPU)

The GPU was originally designed for 3D graphics applications and image rendering during the rasterization process. Over time the computational resources of modern graphics processing units became suitable for certain general parallel computations due to the massively parallel processing capabilities native to GPU architecture. Graphics is one of the best cases of parallel processing where the GPU needs to execute on billions of pixels or hundreds of millions of triangle vertices in parallel.

GPU architectures process independent vertices, primitives and fragments in great numbers using a large number of graphics shaders, which are also known as arithmetic logic units (ALUs) in the CPU world. Each primitive is processed the same way, using the same program or kernel. Many computational problems like image processing, analytics, mathematical calculations and others map well to this single-instruction-multiple-data (SIMD) architecture. The calculation speed-up shown and proven on SIMD processors was quickly recognized by researchers and developers and another area of high performance computing built on the vast processing power of GPUs was born. Today and in the near future, the fastest supercomputers and processing units use or will use GPU technology for the highest compute performance, calculation density, time savings, and overall system speed-up. The GPU has morphed from a graphics processor into a general purpose co-processor that sits alongside the CPU in today’s platforms.

The Penalties That Come With Less Than Optimal Graphical Processing

When selecting a GPU, there are certain requirements that need to be met for performance, power, and capabilities. Performance includes not only graphics benchmark results and 3D games, but also testing different applications that mirror real-world use cases so the applications processor and GPU give the best overall user experience. As screen resolutions increase in both mobile devices and in-vehicle screens, the pixel count and triangle count (3D complexity) go up, placing higher demand on the GPU as more objects need to be rendered onscreen. An underpowered GPU leads to low performance (dropped frames, low FPS, image artifacts, incorrect rendering) and an essentially unusable device, as evidenced by some first-generation tablets that shipped but were never used extensively. To reach the level of the latest consumer electronics products, the GPUs in cars need to be upgraded from OpenGL ES 1.1 graphics to ES 2.0 and ES 3.0 capable cores with added shader performance to create eye-catching visuals. The i.MX 6 was one of the first SoCs where graphics was specifically defined at the product-planning stage, as Freescale had a vision of the car as a node in the Internet of Things (IoT) and graphics as the interface that couples man and machine. Content for cars (streamed media, social, games, apps) is also increasing as they become digitally connected with the rest of the consumer ecosystem. The i.MX 6 is currently the only automotive SoC to support the latest APIs, including OpenGL ES 3.0 and OpenCL 1.1. Other SoCs from Texas Instruments, Renesas, and FPGA-based solutions have graphics capabilities, but rely on other solutions for OpenCL.

The evolution of in-vehicle graphics went from an afterthought to a must-have feature, migrating from simple onscreen text (using the CPU or a simple 2D engine), to 2D graphics, and then basic 3D. Today, there is another transition to advanced GPU rendering as seen on consumer devices: detailed 3D models of your car in the console that highlight parts of the car in an easier-to-see format, 3D maps with street and building details, customizable HMI consoles similar to a personalized Android smartphone, and much more. The initial solutions were underpowered, but over time consumer expectations have grown to match their mobile devices, going from a UI that was scaled down or limited by the hardware (fewer icons, fewer menu layers, basic 3D graphics) to products that blur the line between consumer and auto.

According to Richard Robinson, principal analyst for automotive infotainment at iSuppli, “Infotainment hardware has undergone a rapid evolution during the last 13 years, moving from the traditional approach of dedicated hardware blocks, to the advent of bus-connected distributed architecture systems in the 2000 time frame, to the highly-integrated navigation-centric systems of 2006, to the new user-defined systems of today.”1

“The traditional boundaries between home, mobile and automotive infotainment systems are quickly going away. Consumers are now expecting the same features and equal access to their data across all these platforms,” said Jim Trent, VP and GM at NEC Electronics America2.

An Overview of OpenCL and Its Benefits

OpenCL (Open Computing Language) is an open industry standard application programming interface (API) used to program multiple devices including GPUs, CPUs, as well as other devices organized as part of a single computational platform. The standard targets a wide range of devices from consumer electronics (smartphone, tablets, TVs) to embedded applications like automotive ADAS and computer vision (CV). Applications that already take advantage of the OpenCL performance speedup include medical imaging, video/image processing, high performance computing (HPC), robotics, surveillance, “Big Data” analytics, augmented reality, and gesture (motion, NUI). We will focus on the GPU aspect of OpenCL below.

The evolution of GPU computing has gone through a few major milestones. Pre-OpenCL, a program would be specifically written for and executed on a target device. This limited the features, performance, and calculation throughput to the device characteristics and there was not much flexibility beyond the hardware’s capabilities. The next step forward was the introduction of OpenCL where a hardware abstraction layer is created that separates the application from what is “under-the-hood” for ease of use and cross-platform portability. The abstraction layer queries all computational resources in a platform and uses them in the best way as a single cohesive unit to leverage as much computing horsepower as possible. Moving forward as we progress from OpenCL 1.1/1.2 to 2.0, advanced API features will be added along with making the solution even easier for general purpose programming.

At a high level, OpenCL provides both a programming language and a framework to enable parallel programming. The programming language is based on ISO C99, with math accuracy based on the IEEE 754 standard. OpenCL also includes libraries and a runtime system to assist and support software development. A developer can write general-purpose OpenCL programs that execute directly on a GPU without needing to know 3D graphics or 3D APIs like OpenGL or DirectX. OpenCL also provides a low-level hardware abstraction layer and a framework that exposes many details of the underlying hardware, allowing the programmer to take full advantage of it.

OpenCL uses the parallel execution SIMD (single instruction, multiple data) engines to enhance data computation density by performing massively parallel data processing on multiple data items, across multiple compute engines. Each compute unit has its own ALUs, including pipelined floating point (FP), integer (INT) units, and a special function unit (SFU) that can perform computations as well as transcendental operations. The parallel computations and associated series of operations is called a kernel, and the Vivante cores can execute millions of parallel kernels at any given time.
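To make the kernel and work-item model concrete, the sketch below shows a simple OpenCL C kernel as it might be written (the kernel source is illustrative, not from any shipping driver), together with a serial pure-Python emulation of what the SIMD hardware does across the global work size: the same kernel body runs once per data item, with no dependency between items.

```python
# Illustrative sketch only: an OpenCL C kernel that scales each pixel,
# plus a serial Python loop emulating the data-parallel execution a GPU
# would perform across all work-items at once.

# OpenCL C kernel source (C99-based), as it might be passed to a driver.
KERNEL_SOURCE = """
__kernel void scale_pixels(__global const float *src,
                           __global float *dst,
                           const float gain)
{
    int i = get_global_id(0);   // one work-item per pixel
    dst[i] = src[i] * gain;     // same program runs on every data item
}
"""

def scale_pixels_emulated(src, gain):
    """Serial emulation: apply the same 'kernel' body at every index."""
    dst = [0.0] * len(src)
    for i in range(len(src)):   # on a GPU these iterations run in parallel
        dst[i] = src[i] * gain
    return dst

pixels = [0.0, 0.5, 1.0, 2.0]
print(scale_pixels_emulated(pixels, 2.0))  # → [0.0, 1.0, 2.0, 4.0]
```

Because no iteration depends on another, the loop maps directly onto the SIMD engines described above; each compute unit simply executes the kernel on its own slice of the data.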

A Deeper Discussion of Graphics and OpenCL Benefits using Freescale’s i.MX 6 As An Example

Freescale uses GPU technology from Vivante, a leading GPU IP provider based in Sunnyvale, California, to provide the 3D graphics, OpenVG, and OpenCL compute capability in their automotive-grade i.MX 6 product line3. The i.MX 6 applications processor is the industry’s first scalable, multicore ARM Cortex-A9 product line that spans single, dual, and quad core CPU architectures that are pin and software compatible. Integrated into the i.MX 6 are the GC2000 3D and OpenCL GPU, the GC355 for fast hardware OpenVG acceleration, and the GC320 composition processing core (CPC) to compose the screen content the user sees. The applications processor also integrates the image processing unit (IPU) that accepts multiple camera input streams into the i.MX 6 for processing (ex. 360-degree view, rear-view camera, and blind-spot detection).

The 3D graphics core provides 200 million triangles per second rendering horsepower which rivals performance of some of the latest tablets and smartphones, enabling the i.MX 6 to render ultra-realistic graphics and connect to app stores to play the latest games and display 3D UIs4. With this built-in capability and performance, ecosystem partners like QNX, Green Hills, Adeneo, Mentor Graphics, Rightware, Electrobit, and others are optimizing their operating systems, middleware, and applications to efficiently run the full feature set of the i.MX 6 GPU. The Freescale development platforms also have BSPs (board support packages) for Android and Linux to aid in the development of platforms in similar markets.

The OpenCL support currently focuses on accelerating Embedded (Computer) Vision applications that rely on camera inputs for ADAS. Some example applications where OEMs are analyzing GPU OpenCL performance are:

  • Feature Extraction – this is vital to many vision algorithms since image “interest points” and descriptors need to be created so the GPU knows what to process. SURF (Speeded Up Robust Features) and SIFT are examples of algorithms that can be parallelized effectively on the GPU. Object recognition and sign recognition are forms of this application.
  • Image filtering with different kernel sizes to enhance images.
  • Integral image for image acquisition can be spread across multiple i.MX 6 GPU shaders to cut down calculation time and parallelize execution.
  • Resampling – the GPU can use texture sampling to perform bilinear or bicubic filtering.
  • Point Cloud Processing – includes feature extraction to create 3D images to detect shapes and segment objects in a cluttered image. Uses could include adding augmented reality to street view maps.
  • Line detection – uses the Hough Transform to detect lines in the input image (creates edge maps), with Sobel or Canny algorithms to further enhance edge detection. This can be used for lane detection.
  • Pedestrian Detection – uses Histogram of Oriented Gradients (HOGS) to detect a person and automatically brake the car if the driver does not react in time.
  • Face recognition – goes through face landmark localization (ex. Haar feature classifiers), face feature extraction, and face feature classification. Another use could be eye recognition to detect drowsiness and keep the vehicle within its lane.
  • Hand gesture recognition – separates hand from background (ex. color space conversion to the HSV color space) and then performs structural analysis on the hand to process motion.
  • Camera image de-warping – the GPU performs matrix multiplications to map wide-angle camera inputs onto a flat screen so images are corrected. OEMs can use cameras from different vendors; de-warping a new camera’s images only requires loading that camera’s coefficients, making the GPU easy to reprogram.
  • Blind-spot detection – cameras can be used for blind-spot detection, with OpenCL processing the stereo images. In this case, two cameras are needed per blind spot to detect depth, so the GPU knows how far away the other car is.
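As a concrete example from the list above, the integral image (summed-area table) lets the sum over any rectangular region be computed with just four lookups, which is why box filtering parallelizes so well across GPU shaders. Below is a minimal pure-Python sketch for illustration only; a GPU version would compute the row and column prefix sums in parallel passes rather than in nested loops.

```python
# Illustrative sketch: integral image (summed-area table) in pure Python.
# ii[y][x] holds the sum of all pixels in the rectangle from (0, 0) to
# (x-1, y-1), so any box sum afterwards needs only four table lookups.

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the inclusive rectangle (x0, y0)-(x1, y1)."""
    return (ii[y1 + 1][x1 + 1] - ii[y0][x1 + 1]
            - ii[y1 + 1][x0] + ii[y0][x0])

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 2, 2))  # → 45 (sum of the whole image)
print(box_sum(ii, 1, 1, 2, 2))  # → 28 (5 + 6 + 8 + 9)
```

Feature detectors such as SURF rely on exactly this constant-time box sum, evaluating many box filters at different scales over the same integral image.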

The applications listed above are examples of where OEMs are looking to use OpenCL on the GPU to speed up ADAS, and many more exciting use cases are on the horizon.

Background information: GPU vs. CPU for processing OpenCL

The best approach is a hybrid, heterogeneous platform (ex. HSA) that accelerates applications by using the CPU for task parallelism and serial computations, and the GPU for data-parallel processing.

CPU vs GPU

Notes:

  1. http://www.isuppli.com/Automotive-Infotainment-and-Telematics/News/Pages/Automotive-Infotainment-Hardware-Enters-Fourth-Generation-Stage.aspx
  2. http://www.genivi.org/sites/default/files/press-releases/english/2009_11_10_GENIVI_Expands.pdf
  3. http://www.eetimes.com/electronics-news/4375956/Freescale-Vivante-Rightware-automotive-display
  4. http://www.cnx-software.com/2012/01/19/freescale-i-mx6-automotive-aerospace-infotainment-applications/