GPU Talk

Home » Posts tagged 'OpenCL'

Tag Archives: OpenCL

Introducing Vega…the latest, most advanced GPUs from Vivante

By Benson Tao

Breaking News…

One of the latest headlines coming out of IDF 2013 in San Francisco today is the unveiling of a next generation GPU product line from Vivante. This technology continues to break through the the limits of size, performance, and power to help customers deliver unique products quickly and cost-effectively. The first generation solutions were introduced in 2007 (Generation 1) and upgraded again in 2010 (Generation 2) with new enhancements that were shipped in tens of millions of products. Gen 2 solutions already exceeded PC and console quality graphics rendering, which is the standard other GPU IP vendors strive to reach today. The next version (Gen 3) successfully hit key industry milestones by becoming the first GPU IP product line to pass OpenCL™ 1.1 conformance (CTS) and the first IP to be successfully designed into real time mission critical Compute applications for automotive (ADAS), computer vision, and security/surveillance. The early Gen 3 cores, designed and completed before the OpenGL ES 3.0 standard was fully ratified, were forward looking designs that have already passed OpenGL ES 3.0 conformance (CTS) and application testing. Many of the latest visually stunning games can be unleashed on the latest Gen 3 hardware found in leading devices like the Samsung Galaxy Tab 3 (7″), Huawei Ascend P6, Google Chromecast, GoogleTV 2.0/3/0, and other 4K TVs.

With the unveiling of Vivante’s fourth generation (codenamed “Vega”) ScalarMorphic architecture, the latest designs provide a foundation for Vivante’s newest series of low-power, high-performance, silicon-efficient GPU cores. Vivante engineering continues to respond quickly to industry developments and needs, and continuously refines and enhances its hardware specifications in order to remain at the top of the industry through partnerships with ecosystem vendors.

ProductsSample of Vivante Powered Products

What is Vega?

Vega is the latest, most advanced mobile GPU architecture from Vivante. Leveraging over seven years of architectural refinements and more than 100 successful mass market SOC designs, Vega is the cumulation of knowledge that blends high performance, full featured API support, ultra low power and programmability into a single, well defined product that changes the industry dynamics. SOC vendors can now double graphics performance and support the latest API standards like OpenGL ES 3.0 in the same silicon footprint as the previous generation OpenGL ES 2.0 products. Silicon vendors can also leverage the Vega design to achieve equivalent leading edge silicon process performance in a cost effective mainstream process. This effectively means that given the same SOC characteristics, a TSMC 40nm LP device can compete with a TSMC 28nm HPM version, at a more affordable cost that opens up the market to mainstream silicon vendors that were initially shut out of leading edge process fabrication due to their high initial costs.

Vega is also optimized for Google™ Android and Chrome products (but also supports Windows, BB OS and others), and fast forwards innovation by bringing tomorrow’s 3D and GPU Compute standards into today’s mass market products. Silicon proven to have the smallest die area footprint, graphics performance boost, and scalability across the entire product line, Vega cores extend Vivante’s current leadership in bringing all the latest standards to consumer electronics in the smallest silicon area. Vega 3D cores are adaptable to a wide variety of platforms from IoT (Internet-of-Things) and wearables, to smartphones, tablets, TV dongles, and 4K/8K TVs.

Whether you are looking for a tiny single shader stand-along 3D core or a powerhouse multi-core multi-shader GPU that can deliver high performance 3D and GPGPU functionality, Vivante has a market-proven solution ready to use. There are several options available when it comes to 3D GPU selection: 3D only cores, 3D cores designed with an integrated Composition Processing engine, and 3D cores with full GPGPU functionality that blend real-life graphics with GPU Compute. Vivante already is noted in the industry as the IP provider with the smallest, full-featured licensable cores in every GPU class.

Now let’s dive into some of the Vega listed features to see what they mean…

Hardware Features

  • ScalarMorphic™ architecture
    • Optimized for multi-GPU scalability and multi-threaded, multi-core heterogeneous platforms. This makes the GPU and GPU Compute cores as independent or cohesive as needed, flexible and developer friendly as new applications built on graphics + compute come online.
    • The same premium core architecture as previous generations is still intact, but it has been improved over time to remove inefficiencies. This also allows the same unified driver architecture to work with Vega cores and previous GC cores, so there is no waste of previous developer resources to re-code or overhaul apps for each successive Vivante GPU core.
    • Advanced scheduler and command dispatch unit for optimized shader load balancing and resource allocation.
    • Dynamic branching and non-constant varying indexing.
  • Ultra-threaded, unified shaders
    • Maximize graphics throughput, process millions of threads in parallel, and minimize latency.
    • The GPU scheduler and cores can process other threads while waiting for data to return from system memory, hiding latency and ensuring the cores are being used efficiently with minimal downtime. Context switching between threads is done automatically in hardware which costs zero cycles.
    • These shaders are more than just single way pipelines with added features that make the GPU more general purpose with multi-way pipelines to benefit various processing required for graphics and compute.
  • Patented math units that work in the Logarithmic space
    • In graphics there are different methods to calculate math and get the correct results.  With this method Vega cores can reduce area, power, and bandwidth that speeds up the overall system performance.
  • Fast, immediate hidden surface removal (HSR)
    • Eliminates render processing time by an average of 30% since a more advanced method to remove back-facing or obscure surfaces is implemented on the fly so minimal or no pre-processing time is wasted. This also goes beyond past versions where the GPU was automatically removing individual pixels (ex. early Z, HZ, etc.).
  • Power savings
    • Saves power up to 65% over previous GC Cores using intelligent DVFS and incremental low power architectural enhancements.
  • Proprietary Vega lossless compression
    • Reduces on-chip bandwidth by an average of 3.2:1 and streamlines the graphics subsystem including the GPU, composition co-processsor (CPC), interconnect, and memory and display subsystems. This is important to make sure the entire visual pipeline from when an app makes an API call to the output on the screen is smooth and crisp at optimal frame rates, with no artifacts or tearing regardless of the GPU loading.
  • Built-In Visual Intelligence
    • ClearView image quality – Life-like rendering with high definition detail, MSAA, and high dynamic range (HDR) color processing. This improves image quality, clarity, and matches real life colors that are not oversaturated.
    • Large display rendering – Up to 4K/8K screen resolution including multi-screen support that makes sure the GPU pipelines are balanced.
    • New additions using color correction can be implemented to correct color, increase color space using shaders (or OpenCL/RS-FS) or FRC.
    • NUIs can also take advantage of visual processing for motion and gesture.
  • Industry’s smallest graphics driver memory footprint
    • For the first time, smaller embedded or low end consumer devices and DDR-cost constrained systems can now support the latest graphics and various compute applications that fit those segments. With a smaller footprint you don’t need to increase system BOM cost by adding another memory chip, which is crucial in the cost sensitive markets.
    • There are also Vivante options that support DDR-less MCU/MPUs in the Vega series where no external DDR system memory exists.

More About the Shaders

  • Dynamic, reconfigurable shaders
    • Pipelined FP/INT double (64-bit), single/high (32-bit) and half precision/medium (16-bit) precision IEEE formats for GPU Compute and HDR graphics.
    • Multi-format support for flexibility when running compute in a heterogeneous architecture where coherency exists between CPU-GPU, high precision graphics, medium precision graphics, computational photography, and fast approximate calculations needed for fast, approximate calculations (for example, some image processing algorithms only need to approximate calculations for speed instead of accuracy). With these options, the GPU has full flexibility to target multiple applications.
    • High precision pipeline with support for long instructions.
  • Gigahertz Shaders
    • Updated pipeline enables shaders to run over 1 GHz, while lowering overall power consumption.
    • The high speed along with intelligent power management allows tasks to finish sooner and keep the GPU in a power savings state longer, so average power is reduced.
    • Cores scalable from tens of GFLOPS to over 1 TFLOP in various multi-core GPU versions.
  • Stream-Out Geometry Shaders
    • Increases on-chip GPU processing for realistic, HDR rendering with stream-out and multi-way pipelines.
    • The GPU is more independent when using GS since it can process, create and destroy vertices (and perform state changes) without taking CPU cycles. Previous versions required the CPU to pre-process and load states when creating vertices.

Application Programming Interface (API) Overview

Some of the APIs supported by Vega are listed below. This is not an exhaustive list but includes the key APIs in the industry and show the flexibility of the product line.

  • Full featured, native graphics API support includes:
    • Khronos OpenGL ES 3.0/2.0, OpenGL 3.x2.x, OpenVG 1.1, WebGL
    • Microsoft DirectX 11 (SM 3.0, Profile 9_3)
  • Full Featured, native Compute APIs and support:
    • Khronos OpenCL 1.2/1.1 Full Profile
    • Google Renderscript/Filterscript
    • Heterogeneous System Architecture (HSA)

Product Line Overview

Please visit the Vivante homepage to find more information on the Vega product line.

  • GC400L – Smallest OpenGL ES 2.0 Core – 0.8 mm2 in 28nm
  • GC880 – Smallest OpenGL ES 3.0 Core – 2.0 mm2 in 28nm
GC400 Series GC800 Series GC1000 Series GC2000 Series GC3000 Series GC4000 Series GC5000 Series GC6000 Series GC7000 Series
Vega-Lite Vega 1X Vega 2X Vega 4X Vega 8X
Core Clock in 28HPM (WC-125) MHz 400 400 800 800 800 800 800 800 800
Shader Clock in 28HPM (WC-125) MHz 400 800 1000 1000 1000 1000 1000 1000 1000
Pixel Rate
(GPixel/sec, no overdraw)
200 400 800 1600 1600 1600 1600 3200 6400
Triangle Rate
(M tri/sec)
40 80 123 267 267 267 267 533 1067
Vertex Rate
(M vtx/sec)
100 200 500 1000 1000 2000 2000 4000 8000
Shader Cores (Vec 4)
High/Medium Precision
1 1 2 4 4/8 8 8/16 16/32 32/64
High/Medium Precision
3.2 6.4 16 32 32/64 64 64/128 128/256 256/512
API Support
OpenGL ES 1.1/2.0
OpenGL ES 3.0 Optional Optional
OpenGL 2.x Desktop
OpenVG 1.1
OpenCL 1.2 Optional Optional
DirectX11 (9_3) SM3.0 Optional Optional
Key: ✓  (Supported)   – (Not supported)

Join us @ Khronos BoF Day (SIGGRAPH 2013)

Exciting new things being presented at the Khronos BoF today. You will learn about the new camera WG and Camera API initiatives happening that take advantage of OpenCL, OpenGL ES, OpenVX, and StreamImput. The trend went from traditional photography to computational photography, and now, to perceptual photography (this is a new term that came up today). Improving HDR images (quality, range, bandwidth, memory footprint, computational loading, etc.) is an exciting application where the Khronos group is heading. Multi-sensor applications (ex. Pelican Imaging), stereo, dynamic effects, etc., even though in its infancy as a mainstream technology, is now being pushed hard by the industry. With the inclusion of GPUs in the computational (imaging) pipeline new image dynamics are constantly being enabled to enhance the user experience.

New updates in OpenGL/GL ES will be given throughout the day as we take a look at the GLES 3 and the next generation graphics API. This is also an exciting time as it marks the 10th anniversary of OpenGL ES.

Please join us at the Khronos BoF event so we can update you on the latest information.


Event: Khronos BoF
Date: July 24th
Time: All Day
Location: Anaheim Hiliton (California Ballroom, 2F)

Exciting Updates @ SIGGRAPH 2013 HSA BoF

Please join us at the Heterogeneous System Architecture (HSA) Foundation’s BoF (Bird of Feather) talk at SIGGRAPH 2013. Phil Rogers, President of HSA and AMD Fellow will give the keynote speech and update us on the exciting progress they  have made to push the standard and technology forward. The BoF session will also have a Q&A section where you can get answers to some of your toughest questions.

Please look for us when you are there to ask us how we are innovating in this area, or you can just say “Hello” to us.

Event: HSA Foundation BoF
Date: July 24th
Time: 1 pm
Location: Anaheim Convention Center ( Room 202 B)

Images (and Pixels)…The Next Generation Hyperlink

By Benson Tao (Vivante Corporation)


The world is very visual and thanks to the popularity of media processors (GPUs and VPUs), multimedia has become a ubiquitous part of life. Most of the things we interact with have some sort of screen, ranging from the usual consumer products to washing machines, thermostats, and any other product or gadget that needs to be upsold. Stick a screen on something, maybe add a camera, install Android, and you have a device that could be part of your Internet of Things (IoT) which you can probably sell for a premium…before the next product comes out the next day.

With the rise of multimedia that is standard across all screened devices, so has the explosive growth of media exchange that includes pictures, images, and videos. YouTube has 100 hours of video content uploaded every minute! Facebook has 250 million photos uploaded daily and over 100 billion (as of 2011) photos on its servers! These huge numbers have so many zeroes and something never seen before, and each day those numbers get larger and larger. The rise of visual media content is great, you can find clips of cats playing pianos, live, courtside highlights from the NBA playoffs, and pretty much anything that catches your attention. The flipside of the story is – How do you search for content without getting lost in the noise? Just going through links to find what you want can take a few seconds for simple, straightforward queries, to several minutes or hours for more involved searches. But this number is rising due to the massive amounts of data, and getting relevant search results fast is what counts. As more searches involve multimedia, text hyperlinks will not work so a new generation of visual search (as the new hyperlink) will become standard. QR codes tried to be different, but those have not been successful.

From a high level, VS/MVS is a way of interacting with the world around you. Those interactions can take the form of video, graphics or audio. Bits of these technologies are available, it’s just a matter of packaging them together in the most efficient way. As an example, there are apps that can “listen” to a song and tell you what it is (Shazam is one example). Google Googles is also capable of MVS. In general, a device takes a picture of an image and the image (ex. JPEG) is sent over the network to a server that compares the image and tries to return an answer. Different methods to reduce the transmission and response time have been used where pre-processing is done on the smartphone which reduces the search window to a smaller subset and possibly eliminates server interaction for some use cases.


This is where visual search or mobile visual search comes into play and where the GPU inside a smartphone/tablet or TV applications processor plays in the next generation of search. We have seen PC GPUs play an important role in identifying faces in photos (visual search) using OpenCL with software from Cyberlink or Corel that can identify and sort photos based on different queries. You can search for all photos with John Doe and the GPU searches through the image database and returns all photos with John Doe (assuming he doesn’t have a had and sunglasses on). The results are pretty good, not perfect, but nothing that has this much complexity is perfect in the real world.

Fast forward to today…mobile GPUs have lots of horsepower packed into such die area constrained designs that if not used, it just sits there idle and you remove the efficiency (and potential) of what the technology can do. Many groups are researching how to add mobile visual search optimizations to GPUs like those at Vivante, and the push forward will improve implementations in augmented reality (AR), GPU accelerated MVS, vision processing (can include automotive, consumer electronics, etc.) and many other applications.

The idea of using the GPU has a few areas of consideration:

  • Bandwidth efficiency – Compression or pre-computations on the device to minimize server-client transmissions. Part of this also involves network latency and possibly lower quality image if there are transmission limitations.
  • Lower Power / Thermals on Servers Side – Servers need to compare an image vs. a database, which takes a lot of computational workload.
  • Lower Power / Thermals on Client Side – GPUs are fast and efficient when dealing with parallel workloads. Many search algorithms use parallel computations which are best suited for the GPU.
  • Higher Quality VS/MVS Results – Search results can scale with the GPU with increased accuracy (since GPUs are good at image/pixel processing).

MPEG Group Visual Search Development

I recently saw a new specification developed by the folks at MPEG or Motion Picture Experts Group related to Visual Search (Compact Descriptors for Visual Search) and this is an exciting topic where GPU technology can make it shine. If you have not heard of MPEG, they are responsible for creating the video/audio transmission standards used pretty much by whatever video content you watch. Things like MP3 (audio), MPEG-2, MPEG-4, H.264, and HEVC have all come from that consortium. Since they are the 800 lbs gorilla when it comes to video, they want to create a standard VS/MVS solution that will be interoperable across any device and platform and efficient (bandwidth, algorithm computational overhead, etc.). This is great news since a fragmented solution can break a superb idea, and Khronos is looking at how they can cooperate with MPEG to see how it can play a role in future APIs. This should open up the doors for exciting new applications on any device.

HSA Releases Ver. 0.95 of PRM Specification

By Benson Tao (Vivante Corporation)



The Heterogeneous System Architecture (HSA) Foundation is a not-for-profit consortium that brings together some of the best minds (and companies) across the mobile, PC, consumer, HPC, Compute/Vision industries, along with leading academic institutions and anyone that wants to join in on the fun. The goal of HSA is to create a single architecture specification and standard programming interface (API) that developers can easily adopt to optimize distributed workloads across the GPU, CPU, DSP, and any other compute fabric element on the platform. From a high level view, the platform or system (with all the different components) can be viewed as one large, unified processor that executes a given workload. The main goal is to get the biggest bang for the buck or operational efficiency that includes the highest computational throughput (performance) at the lowest power and thermal envelope. Industry participants in HSA include SoC vendors, IP providers, OEMs, OSVs, and a full range of ISVs and application developers that want to make the best use of platform capabilities.

Vivante Contributes to Platform Innovation

Vivante joined HSA Foundation with the intention of pushing forward a defined specification that advances GPU Compute technologies in mobile, embedded, and consumer platforms. Many of our new and existing customers look to us for guidance on ways to improve their existing platforms and problems they are “stuck” on. Improvements can be as minor as performance gains, reduced BOM (or silicon) costs, and power savings, to re-architecting their designs (through GPU programmability) to fit new use cases and applications so they can extend product lifecycles without incurring major financial costs to replace/upgrade the existing infrastructure. These are some of the ways Vivante looks at defining solutions and future-proofing GPU/GPGPU IP cores to help its customers.

Vivante has multiple products targeting hybrid platforms from mass market cores that have the smallest silicon footprint with OpenGL ES 3.0 and OpenCL 1.1/RS-FS, to mid range and high performance multi-cluster configurable cores. The GPUs work directly with the CPU through a unified memory system, ACE-Lite™ cache coherency, or a native stream interface that connects directly to various compute fabrics. The Vivante HSA design, like the OpenGL ES graphics stack, supports a unified software and hardware package that provides a single architecture spanning multiple operating systems, platforms, and GPU cores. Vivante HSA software will also be backwards compatibility with all existing compute-enabled products and built around HSA APIs and tools that complement our current OpenCL™ and Google Renderscript™/Filterscript support. By simplifying the lives of application developers targeting heterogeneous architectures, programmers can create breakthrough use cases that take advantage of the new paradigm shift to hybrid computing. Real world applications that are already being accelerated by Vivante cores include computer vision, image processing, augmented reality, sensor fusion, and motion processing, with some examples being in the automotive ADAS sector (Advanced Driver Assistance Systems).

HSA Releases Ver. O.95 of the Highly Anticipated Programmers Reference Manual (PRM)

The fruits of hard labor of many technical discussions and architecture meetings over the last year since the consortium’s founding in June 2012 has finally come full circle with the release of version 0.95 of the PRM. This manual is a major milestone and lays the foundation for HSA to successfully move forward as it continues defining the platform of the future. The PRM also gives developers an early start as ecosystem partners create amazing applications, tools, libraries, and middleware programs that work best on HSA certified products.

Some features highlighted in the specification include:

1) Shared Coherent (Virtual) Memory Models

2) Atomics

3) User Mode and GPU Queuing

4) Zero Copy

5) Low Latency Dispatch

The specification also includes HSAIL (HSA Intermediate Language), which abstracts away from the native instruction set of the hardware and can be compiled automatically, in real-time, to the native ISA of the underlying hardware without any developer involvement. The same OpenCL and Renderscript/Filterscript programs can be abstracted and run on HSA platforms also.

Link to HSA Foundation website:

Link to HSA Foundation press release: