<?xml version='1.0' encoding='UTF-8'?>

<?xml-stylesheet href="./_c74_tut.xsl" type="text/xsl"?>
<chapter name="Tutorial 42: Slab: Data Processing on the GPU">
<setdocpatch name="42jSlabDataProcessing" patch="42jSlabDataProcessing.maxpat"/>

<previous name="jitterchapter41">Shaders</previous>
<next name="jitterchapter43">A Slab of Your Very Own</next>
<parent name="jitindex">Jitter Tutorials</parent>



<h1>Tutorial 42: Slab: Data Processing on the GPU</h1>

<techdetail>
	<i>Note:</i> Some techniques described in this tutorial are outdated. To upload movie frames to the GPU efficiently, we recommend using <o>jit.movie</o> with <b>output_texture</b> enabled instead of the uyvy colormode. See the <link type="vignette" module="core" name="jitter_gl_texture_output">GL Texture Output</link> article for more information.
</techdetail>

<p>We saw in the previous tutorial how custom shaders can be executed on the Graphics Processing Unit (GPU) to apply additional shading models to 3D objects. While the vertex processor and fragment processor are inherently designed to render 3D geometry, with a little creativity we can get these powerful execution units to operate on arbitrary matrix data sets to do things like image processing and other tasks. You might ask, "If we can already process arbitrary matrix data sets on the CPU with Jitter Matrix Operators (MOPs), why would we do something silly like use the graphics card to do this job?" The answer is speed. </p>

<div>
<techdetail><b>Hardware Requirement:</b> To fully experience this tutorial, you will need a graphics card that supports programmable shaders (e.g. ATI Radeon 9200, NVIDIA GeForce 5000 series, or later graphics cards). It is also recommended that you update your OpenGL driver to the latest version available for your graphics card. On Macintosh, this is provided with the latest OS update. On PC, it can be obtained from either your graphics card manufacturer or your computer manufacturer.</techdetail>
</div>

<h2>Recent Trends in Performance</h2>
<p>CPU performance over the past half century has more or less followed <i>Moore&#x2019;s Law</i>, which is commonly paraphrased as predicting a doubling of CPU performance every 18 months. However, because the GPU&#x2019;s architecture does not need to be as flexible as the CPU&#x2019;s and is inherently <i>parallelizable</i> (i.e. multiple pixels can be calculated independently of one another), GPUs have been streamlined to the point where performance is advancing at a much faster rate, doubling as often as every 6 months. This trend has been referred to as <i>Moore&#x2019;s Law Cubed</i>. At the time of writing, high-end consumer graphics cards have up to 128 vertex pipelines and 176 fragment pipelines, each of which can operate in parallel, enabling dozens of image processing effects at HD resolution at full frame rate. Given recent history, it would seem that GPUs will continue to increase in performance at a faster rate than CPUs. </p>
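<p>The name follows from simple arithmetic: a quantity that doubles every 6 months grows as the cube of one that doubles every 18 months. The short Python sketch below illustrates this. The function name and the 36-month window are chosen here purely for illustration, and the doubling periods are the rough figures quoted above rather than measurements.</p>

```python
def growth_factor(months, doubling_period_months):
    """Return the multiplicative growth after `months`, given a doubling period."""
    return 2.0 ** (months / doubling_period_months)


months = 36  # illustrative window of three years (an assumption)
cpu = growth_factor(months, 18)  # doubling every 18 months
gpu = growth_factor(months, 6)   # doubling every 6 months

# Doubling three times as often cubes the overall growth factor.
print(cpu, gpu, cpu ** 3)
```

<p>Over any window, the second figure equals the first raised to the third power, which is where the "cubed" comes from.</p>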

<h2>Getting Started</h2>
<bullet>Open the Tutorial patch and double-click on the <m>p slab-comparison-CPU</m> subpatch to open it. Click on the <o>toggle</o> box connected to the <o>qmetro</o> object. Note the performance of the patch as displayed by the <o>jit.fpsgui</o> object at the bottom.</bullet>
<bullet>Turn off the toggle, close the subpatch and double-click on the <m>p slab-comparison-GPU</m> subpatch to open it. Click the toggle connected to the <m>sync</m> message twice to turn sync off. Click on the <o>toggle</o> box connected to the <o>qmetro</o> object. Note the performance in the <o>jit.fpsgui</o> and compare it to what you got with the CPU version.</bullet><br/>
<p>The patches don&#x2019;t do anything particularly exciting; they are simply a cascaded set of
additions and multiplications operating on a <m>640x480</m> matrix of noise (random values of type
<m>char</m>). One patch performs these calculations on the CPU, the other on the GPU. In both examples the noise is generated on the CPU (which doesn&#x2019;t come without cost). The visible results of these two patches should be similar; however, as you will probably notice if you have a recent graphics card, the patch runs much faster on the graphics card (GPU). Note that we are just performing some simple math operations on a dataset, and this same technique could be used to process arbitrary matrix datasets on the graphics card.</p>
<div>
<techdetail>
	What is the <m>sync</m> message? There is usually no point in rendering images faster than the computer can display them. In fact, if the software gets ahead of the hardware, it could attempt to display two frames at the same time: you'd get part of one frame at the top of the window and part of another at the bottom, an effect called "tearing". We also don't need to waste cycles on images that will never be seen, since the system has other things to do. The <m>sync</m> attribute of the <o>jit.window</o> object synchronizes Jitter calculations with the window display rate (typically 60 fps). This doesn't mean the GPU is no longer blazing fast; it just gets to take a breather between frames.
</techdetail>
</div>
<br/>
<br/>
<illustration><img src="images/jitterchapter42a.png"/>CPU (left) and GPU (right) processed noise.</illustration>

<h2>What About the Shading Models?</h2>
<p>Unlike the last tutorial, we are not rendering anything that appears to be 3D geometry based on lighting or material properties. As a result, this doesn&#x2019;t really seem to be the same thing as the shaders we&#x2019;ve already covered, does it? Actually, we are still using the same vertex processor and fragment processor, but with extremely simple geometry whose texture coordinates map directly to the pixel coordinates of our output buffer. Instead of lighting and material calculations, we can perform arbitrary calculations per pixel in the fragment processor. This way we can use shader programs in much the same way as the Jitter objects that process matrices on the CPU (Jitter MOPs).</p>
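<p>As a rough CPU-side analogy (not Jitter or shader code), the idea of running one small function per output pixel can be sketched in Python. Everything named here (<m>run_per_pixel</m>, <m>absdiff</m>, <m>scale_bias</m>, and the tiny test images) is hypothetical and chosen only for illustration. Values are assumed to be normalized to the range 0 to 1.</p>

```python
def run_per_pixel(kernel, width, height, *inputs):
    """Evaluate `kernel` once per output pixel, as a fragment program would.

    Each input is a list of rows. The output pixel at (x, y) reads the
    input pixels at the same (x, y), mirroring the one-to-one mapping of
    texture coordinates to output pixel coordinates.
    """
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            # Every pixel is computed independently of its neighbours,
            # which is what lets a GPU process many pixels in parallel.
            samples = [image[y][x] for image in inputs]
            row.append(kernel(*samples))
        output.append(row)
    return output


def absdiff(a, b):
    """Hypothetical per-pixel operator: absolute difference of two values."""
    return abs(a - b)


def scale_bias(value, scale=2.0, bias=-0.25):
    """Hypothetical per-pixel operator: scale, offset, then clamp to 0..1."""
    return min(1.0, max(0.0, value * scale + bias))


# Two tiny 2x2 single-plane images with values normalized to 0..1.
left = [[0.0, 0.5], [1.0, 0.25]]
right = [[0.5, 0.5], [0.25, 0.75]]

mixed = run_per_pixel(absdiff, 2, 2, left, right)
# Feeding the output of one pass into another cascades per-pixel effects.
result = run_per_pixel(scale_bias, 2, 2, mixed)
print(mixed)
print(result)
```

<p>On the GPU the two nested loops disappear: the hardware invokes the fragment program for each pixel, many at a time, and only the body of the kernel has to be written.</p>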

<bullet>Open the Tutorial patch and double-click on the <m>p slab-composite-DV</m> subpatch to open it. Click on the <o>toggle</o> connected to the leftmost <o>qmetro</o> object.</bullet>
<bullet>Click the <o>message</o> boxes containing <m>dvducks.mov</m> and <m>dvkite.mov</m> to load two DV movies, and turn on the corresponding <o>metro</o> objects to enable playback.</bullet>
<bullet>Load a desired compositing operator from the <o>umenu</o> object connected to the topmost instance of <o>jit.gl.slab</o>.</bullet>
<illustration><img src="images/jitterchapter42b.png"/> UYVY DV footage composited on GPU using "difference" op.</illustration>
<p>Provided that our hardware can keep up, we are now mixing two DV sources in real time on the GPU. You will notice that the <o>jit.movie</o> objects and the topmost <o>jit.gl.slab</o> object each have their <m>colormode</m> attribute set to <m>uyvy</m>. As covered in <i>Tutorial 49: Colorspaces</i>, this instructs the <o>jit.movie</o> objects to render the DV footage to chroma-reduced YUV 4:2:2 data, and the <o>jit.gl.slab</o> object to interpret incoming matrices as such. Decompressing the DV footage to <m>uyvy</m> data is more efficient because DV is natively a chroma-reduced YUV format. And since <m>uyvy</m> data takes up half the memory of ARGB data, transferring it to the graphics card is more efficient as well. </p>
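<p>The factor of two in memory can be checked with a little arithmetic. The Python sketch below assumes a 720x480 frame (the NTSC DV frame size) and 8-bit components. The function name is made up for this illustration.</p>

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one uncompressed frame of 8-bit (char) data."""
    return width * height * bytes_per_pixel


# Assumed frame size: 720x480, the NTSC DV resolution.
width, height = 720, 480

# ARGB stores four 8-bit components for every pixel.
argb = frame_bytes(width, height, 4)
# UYVY packs two pixels into four bytes (U, Y0, V, Y1),
# which averages out to two bytes per pixel.
uyvy = frame_bytes(width, height, 2)

print(argb, uyvy, uyvy / argb)
```

<p>Under these assumptions each frame shrinks from 1,382,400 bytes to 691,200 bytes, so every upload to the graphics card moves half as much data.</p>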

<p>Let&#x2019;s add some further processing to this chain. </p>

<bullet>Click the <o>message</o> boxes containing <m>read cf.emboss.jxs</m> and <m>read cc.scalebias.jxs</m> connected to the lower two instances of <o>jit.gl.slab</o>.</bullet>
<bullet>Adjust the parameters of these two effects using the <o>number</o> boxes to the right.</bullet>
<illustration><img src="images/jitterchapter42c.png"/> Additional processing on GPU.</illustration>

<h2>How Does It Work?</h2>
<p>The <o>jit.gl.slab</o> object manages this magic, but how does it work? The <o>jit.gl.slab</o> object receives either <m>jit_matrix</m> or <m>jit_gl_texture</m> messages as input and uses them as input textures to render this simple geometry with a shader applied. It captures the result in another texture, which it sends downstream via the <m>jit_gl_texture &lt;texturename&gt;</m> message. The <m>jit_gl_texture</m> message works similarly to the <m>jit_matrix</m> message, but rather than representing a matrix residing in main system memory, it represents a texture image residing in memory on the graphics hardware. </p>

<p>The final display of our composited video is accomplished using a <o>jit.gl.videoplane</o> object that can accept either a <m>jit_matrix</m> or <m>jit_gl_texture</m> message, using the received input as a texture for planar geometry. This could optionally be connected to some other object like <o>jit.gl.gridshape</o> for texturing onto a sphere, for example.</p>

<h2>Moving from VRAM to RAM</h2>
<p>The instances of <o>jit.gl.texture</o> that are being passed between <o>jit.gl.slab</o> objects by name refer to resources that exist on the graphics card. This is fine when the texture is ultimately applied to 3D geometry such as <o>jit.gl.videoplane</o> or <o>jit.gl.gridshape</o>, but what if we want to make use of this image in some CPU-based processing chain, or save it to disk as an image or movie file? We need some way to transfer it back to system memory. The <o>jit.matrix</o> object accepts the <m>jit_gl_texture</m> message and can perform what is called <i>texture readback</i>, which transfers texture data from the graphics card (VRAM) to main system memory (RAM). </p>

<bullet>Open the Tutorial patch and double-click on the <m>p slab-readback</m> subpatch to open it. Click on the <o>toggle</o> boxes connected to the leftmost <o>qmetro</o> object. As in the last patch we looked at, read in the movies by clicking on the <o>message</o> boxes and start the <o>metro</o> object on the right side of the patch.</bullet>
<illustration><img src="images/jitterchapter42d.png"/>Matrix readback from GPU.</illustration>
<p>Here we see that the image is being processed on the GPU with the <o>jit.gl.slab</o> object and then copied back to RAM by sending the <m>jit_gl_texture &lt;texturename&gt;</m> message to the <o>jit.matrix</o> object. Readback is typically not as fast as sending data to the graphics card, and it does not support reading back in a chroma-reduced UYVY format. However, if the GPU is performing a fair amount of processing, this technique can be faster than performing the equivalent processing on the CPU, even with the transfer from the CPU to the GPU and back. It is worth noting that readback performance is improving in recent-generation GPUs.</p>

<h2>Summary</h2>
<p>In this tutorial we discussed how to use the <o>jit.gl.slab</o> object to perform general-purpose data processing on the GPU. While the focus was on processing images, the same techniques could be applied to arbitrary matrix datasets. We also covered how using chroma-reduced <m>uyvy</m> data can improve performance, and how to read an image back from the GPU to the CPU.</p>

	<seealsolist>
		<seealso display="Video and Graphics Tutorial 9: Building live video effects" module="Video and Graphics" name="jitterchapter00k_Building live video effects" type="tutorial" />		
		<seealso display="GL Texture Output" module="core" name="jitter_gl_texture_output" type="vignette" />
		<seealso display="GL Contexts" module="core" name="jitter_gl_contexts" type="vignette" />		
		<seealso name="jit.fpsgui">Display fps, ms, and matrix attributes</seealso>
		<seealso name="jit.gl.slab">Performs a GL accelerated grid-based evaluation</seealso>
		<seealso name="jit.gl.texture">Manages a GL texture</seealso>
		<seealso name="jit.gl.videoplane">GL accelerated video plane</seealso>
		<seealso name="jit.matrix">The Jitter Matrix!</seealso>
		<seealso name="jit.movie">Play or edit a movie</seealso>
		<seealso name="qmetro">Queue-based metronome</seealso>
	</seealsolist>
	</chapter>
