[investigation] buffer sharing between GPU and ML accelerator #33

Open · 3 of 4 tasks
huningxin opened this issue Nov 6, 2019 · 3 comments

huningxin commented Nov 6, 2019

As part of the WebNN interoperability investigation for custom op support, we have so far completed the investigation and report-out for WebNN-WASM interop and WebNN-WebGPU interop.

In the discussion of next steps for the WebNN interop investigation at the WebML CG call on 3 Oct, participants expressed interest in buffer sharing between the GPU and an ML accelerator. I am opening this issue to capture the requirement and to share status and data.

The idea is that WebNN runs expensive ops (e.g. conv2d) on the ML accelerator and shares buffers with a WebGPU compute shader that runs custom ops (e.g. add/relu). This can be illustrated by the following code sample.

// Create a WebNN model that contains conv2d
const model = await createWebNNConv(filterValue, noBias, noRelu);
const compilation = await model.createCompilation();
// Compile the WebNN model for the ML accelerator (low-power preference)
compilation.setPreference(nn.LOW_POWER);
await compilation.finish();
const execution = await compilation.createExecution();
// input, output, bias are tf.tensor
// Get underlying WebGPUBuffer
const inputBuffer = tf.backend().getBuffer(input.dataId);
const outputBuffer = tf.backend().getBuffer(output.dataId);
// Set WebGPUBuffer as input and output to WebNN execution
execution.setInputGPUBuffer(0, inputBuffer);
execution.setOutputGPUBuffer(0, outputBuffer);
// Execute the WebNN ops on ML accelerator
await execution.startCompute();
// Execute the WebGPU ops on GPU
let addOutput = tf.add(output, bias);
let reluOutput = tf.relu(addOutput);
// Read back result from GPU
let result = await reluOutput.data();

Per a recommendation from @walrusmcd (thanks!), the investigation will initially target the AI on the PC Devkit. This device has both a GPU and a VPU (as an example of an ML accelerator), both of which are supported by the D3D12 and DirectML APIs. The Chromium WebNN POC will be enhanced to support the above scenario.

There are some dependencies that need to be worked on:

  • Rebase the WebNN POC to the version where WebGPU compute shaders work on D3D12
  • Get the WebNN/DirectML backend working on the VPU
  • Get WebGPU-WebNN interop working on D3D12/DML for the GPU
  • Get WebGPU/D3D12/GPU and WebNN/DML/VPU interop working

So far, we have completed the rebase and gotten the basic VPU path working in the WebNN/DML backend. We'll post updates here once we make progress on the WebGPU-WebNN interop on D3D12/DML.

Please let me know if I have missed anything.

@huningxin

Some updates:

  • Rebase the WebNN POC to the version where WebGPU compute shaders work on D3D12

Rebased the WebNN POC to 80.0.3960.0 for WebGPU D3D12 support. There is an issue where TF.js WebGPU crashes due to the lack of read-only storage buffer support; we worked around it by removing the read-only declaration in the TF.js shader preprocessor.
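For reference, the workaround amounts to dropping the read-only qualifier from the storage buffer declarations that the TF.js WebGPU shader preprocessor emits. A minimal sketch, not the actual TF.js source; the helper name and declaration layout are illustrative assumptions:

// Emit a GLSL storage buffer declaration for an input tensor.
// Before the workaround the declaration used `readonly buffer`, which the
// rebased WebGPU/D3D12 implementation did not yet support; emitting a plain
// `buffer` avoids the crash at the cost of losing the read-only hint.
function getStorageBufferDeclaration(binding, dataType, name) {
  return `layout(std430, set = 0, binding = ${binding}) buffer ssb${name} {
    ${dataType} ${name}[];
  };`;
}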

  • Get WebGPU-WebNN interop working on D3D12/DML for the GPU

Implemented WebGPU-WebNN interop on Windows with the same API as the macOS prototype. The D3D12 WebGPU backend and the DirectML WebNN backend share buffers via D3D12Resource. The test results (with the above workaround for the TF.js WebGPU backend) are:

WebNN-WebGPU Interop Test
Start
TF.js sets backend as WebGPU
conv2d input dims: [1,100,100,100] and filter dims: [3,3,100,100]

Test1 - conv2d/add/relu (WebGPU): 37.93 ms
Test2 - conv2d (WebNN) -> ArrayBufferView -> add/relu (WebGPU): 27.04 ms
Test3 - conv2d (WebNN) -> WebGPUBuffer -> add/relu (WebGPU): 9.18 ms
Test4 - conv2d/add/relu (WebNN): 7.58 ms
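
For context on why Test2 and Test3 differ: Test2 stages the WebNN output through an ArrayBufferView, so the conv2d result is read back to the CPU and re-uploaded before TF.js can run add/relu on the GPU. A rough sketch using the POC API from the samples above (outputSize, outputShape, inputData and bias are illustrative placeholders):

// Test2 path: conv2d (WebNN) -> ArrayBufferView -> add/relu (WebGPU)
const outputData = new Float32Array(outputSize);
execution.setInput(0, inputData);   // inputData is a TypedArray
execution.setOutput(0, outputData); // WebNN writes the conv2d result here
await execution.startCompute();     // conv2d on the ML accelerator
// Re-upload the result so the TF.js WebGPU backend can consume it
const convOutput = tf.tensor(outputData, outputShape);
const reluOutput = tf.relu(tf.add(convOutput, bias));
const result = await reluOutput.data();

Test3 avoids this readback/upload round trip by handing WebNN the WebGPUBuffer directly, which accounts for the gap between the 27.04 ms and 9.18 ms results.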

The test platform configuration is:

  • CPU: Intel(R) Core(TM) i7-8559U CPU @2.7 GHz
  • GPU: Intel(R) Iris(R) Plus Graphics 655, driver version 26.20.100.7463
  • OS: Windows 10 Pro 1903

  • Get the WebNN/DirectML backend working on the VPU

Leveraged the DXCore API to enumerate adapters, including compute-only devices such as ML accelerators. When a web app compiles a WebNN graph with the low-power preference, the WebNN POC DML backend selects the low-power ML accelerator and creates a D3D12/DML device and command queue for it. In particular, in our experiment on the AI on the PC Devkit, the following sample code compiles and executes a WebNN graph on the VPU.

// Create a WebNN graph that contains conv2d
const graph = await createWebNNConv(filterValue, noBias, noRelu);
const compilation = await graph.createCompilation();
// Compile the WebNN graph for the VPU (low-power preference)
compilation.setPreference(nn.LOW_POWER);
await compilation.finish();
const execution = await compilation.createExecution();
// input and output are TypedArrays
execution.setInput(0, input);
execution.setOutput(0, output);
// Execute the WebNN graph on the VPU
await execution.startCompute();

If the compilation preference is sustained-speed, the WebNN DML backend still selects the GPU.
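
For comparison, compiling the same graph with the sustained-speed preference keeps execution on the GPU; the only change is the preference constant (this assumes the POC exposes nn.SUSTAINED_SPEED alongside nn.LOW_POWER, which is an assumption on my part):

// Compile the same WebNN graph for the GPU instead of the VPU
const gpuCompilation = await graph.createCompilation();
gpuCompilation.setPreference(nn.SUSTAINED_SPEED); // assumed constant name
await gpuCompilation.finish();
const gpuExecution = await gpuCompilation.createExecution();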

@huningxin

Per the discussion in the Dec 5 CG call, the next step of the investigation is to run Test3 on a programmable ML accelerator, e.g. the VPU. That means running custom ops in a WebGPU compute shader and sharing buffers with WebNN built-in ops on the ML accelerator. This depends on the answers to the following open questions:

  • Can the VPU run HLSL?
  • Can WebGPU support a compute-only device (e.g. the VPU) and run compute shaders on it?
  • Can WebGPU/D3D12 share buffers with WebNN/DML on the VPU? (i.e. run Test3 on the VPU)

For buffer sharing across the GPU and VPU:

  • Get WebGPU/D3D12/GPU and WebNN/DML/VPU interop working

As @RafaelCintron mentioned in the meeting, this usage is not recommended as it could be very slow. If the ML accelerator cannot run custom ops, web apps can still fall back to ArrayBuffer.


a-sully commented Jan 25, 2024

A solution to the problem of buffer-sharing is proposed in #482. Can we close this issue?
