
[op compatibility] conv2d #28

Closed
nsthorat opened this issue Sep 16, 2019 · 17 comments

@nsthorat

nsthorat commented Sep 16, 2019

This issue will track op compatibility resolution for conv2d.

Signature:
conv2d(x, filter, padding, strides, dilations)

Arguments:
x: 4-dim tensor with logical shape [N, H, W, in_channels]
filter: 4-dim tensor with logical shape [H, W, in_channels, out_channels]
padding: array of ints. Padding for the beginning and ending along each spatial axis.
strides: array of ints that have length 1, 2 or 4. The stride of the sliding window for each dimension of input.
dilations: array of ints of length 2: [dil_height, dil_width]. The dilation factor for each spatial dimension of the input.

Docstring:
Computes a 2-D convolution given 4-D input and filter tensors.

Notes:

  • There should be a single logical tensor layout, leaving transpose graph optimizations up to the browser (this is much more portable because the model author does not have to think about the hardware of the device).
  • The tensor layout should be NHWC because HTMLMediaElements are already in this format, and we want to spare the user from having to insert transposes.
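
For illustration only, here is a minimal sketch of what a call with this signature and NHWC layout might look like; the `nn` builder object and the `input`/`constant` helpers are assumptions made for the sketch, not part of the proposal:

```js
// Hypothetical usage sketch; only conv2d comes from the proposed signature,
// every other name is a placeholder.
// x: NHWC input, e.g. one 224x224 RGB frame.
const x = nn.input('x', {type: 'float32', dimensions: [1, 224, 224, 3]});
// filter: [H, W, in_channels, out_channels] = 3x3 kernels mapping 3 -> 16 channels.
const filterData = new Float32Array(3 * 3 * 3 * 16);
const filter = nn.constant({type: 'float32', dimensions: [3, 3, 3, 16]}, filterData);
// Padding of 1 on both sides of each spatial axis, unit strides and dilations.
const output = nn.conv2d(x, filter, [1, 1, 1, 1], [1, 1], [1, 1]);
// Expected logical output shape: [1, 224, 224, 16].
```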

To be discussed:

  • Compatibility with underlying APIs / hardware
@huningxin
Contributor

conv2d(x, filter, padding, strides)

Is the dilations parameter missing from the signature?

@nsthorat
Author

Fixed, thanks :)

@huningxin
Contributor

huningxin commented Oct 2, 2019

  • Compatibility with underlying APIs / hardware

The following table is based on the WebNN POC native mapping. It captures the mapping to NNAPI, MPS and DirectML and can be used as the basis for a native API compatibility study.

| WebNN | NNAPI | MPS | DML | BNNS | DNNL (MKL-DNN) |
| --- | --- | --- | --- | --- | --- |
| conv2d | ANEURALNETWORKS_CONV_2D | MPSCNNConvolution | DML_OPERATOR_CONVOLUTION | BNNSFilterCreateConvolutionLayer | convolution_forward |
| x: 4-dim tensor with logical shape [N, H, W, in_channels] | input[0]: 4-dim tensor of shape [N, H, W, in_channels] | sourceImage of MPSCNNKernel.encodeToCommandBuffer: 4-dim tensor in MPSImage described by MPSImageDescriptor {numberOfImages, featureChannels, width, height} | InputTensor of DML_CONVOLUTION_OPERATOR_DESC: 4-dim tensor in buffer resource of shape [N, in_channels, H, W] | in_desc: 4-dim tensor of shape [N, in_channels, H, W] | src_desc of convolution_forward::desc: 4-dim tensor of shape [N, in_channels, H, W] |
| filter: 4-dim tensor with logical shape [H, W, in_channels, out_channels] | inputs[1]: 4-dim tensor of shape [out_channels, H, W, in_channels] | weights of MPSCNNConvolutionDataSource: 4-dim tensor of shape [out_channels, H, W, in_channels] | FilterTensor of DML_CONVOLUTION_OPERATOR_DESC: 4-dim tensor of shape [out_channels, in_channels, H, W] | weights of BNNSConvolutionLayerParameters: 4-dim tensor of shape [out_channels, in_channels, H, W] | weights_desc of convolution_forward::desc: 4-dim tensor of shape [out_channels, in_channels, H, W] |
| padding: array of ints. Padding for the beginning and ending along each spatial axis. (explicit padding) | input[3:6]: int, padding to beginning_W, ending_W, beginning_H, ending_H | offset of MPSCNNKernel: MPSOffset {x, y, z}, int, offset (-padding) for beginning_W and beginning_H.<br>Open: padding for ending? | StartPadding of DML_CONVOLUTION_OPERATOR_DESC: unsigned int, padding to the start of the corresponding axis [beginning_H, beginning_W].<br>EndPadding of DML_CONVOLUTION_OPERATOR_DESC: unsigned int, padding to the end of the corresponding axis [ending_H, ending_W] | x_padding of BNNSConvolutionLayerParameters: unsigned int, beginning and ending padding of the width axis.<br>y_padding of BNNSConvolutionLayerParameters: unsigned int, beginning and ending padding of the height axis.<br>Open: only supports the same beginning and ending padding along each spatial axis. | padding_l of convolution_forward::desc: int64, [beginning_H, beginning_W].<br>padding_r of convolution_forward::desc: int64, [ending_H, ending_W] |
| strides: array of ints that have length 1, 2 or 4. The stride of the sliding window for each dimension of input. | input[7:8]: int, stride_W and stride_H.<br>Open: support strides of N and C? | strideInPixelsX and strideInPixelsY of MPSCNNConvolutionDescriptor: unsigned int, stride_W and stride_H.<br>Open: support strides of N and C? | Strides of DML_CONVOLUTION_OPERATOR_DESC: unsigned int, [stride_H, stride_W].<br>Open: support strides of N and C? | x_stride and y_stride of BNNSConvolutionLayerParameters: unsigned int, stride_W and stride_H.<br>Open: support strides of N and C? | strides of convolution_forward::desc: int64, stride_W, stride_H.<br>Open: support strides of N and C? |
| dilations: array of ints of length 2: [dil_height, dil_width]. The dilation factor for each spatial dimension of the input. | input[11:12]: int, dilation_width and dilation_height | dilationRateX and dilationRateY of MPSCNNConvolutionDescriptor: unsigned int, dilation_width and dilation_height | Dilations of DML_CONVOLUTION_OPERATOR_DESC: unsigned int, [dilation_height, dilation_width] | Open: not supported | dilates of convolution_forward::desc: int64, [dilation_width, dilation_height] |

According to the table, most features of the proposed conv2d op can be supported by NNAPI, MPS, DirectML, BNNS and DNNL, with some opens:

  1. dilation is not supported by BNNS.
  2. MPS and BNNS only support the same beginning and ending padding along each spatial axis.
  3. strides for the N and C dimensions are not supported by any of the APIs.
  4. int to unsigned int conversion for strides and dilations is required by MPS, BNNS and DML.
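
On open 4, one possible sketch of how an implementation could guard the signed-to-unsigned conversion (purely illustrative; the helper name is made up):

```js
// Sketch: reject non-positive strides/dilations before handing them to
// MPS/BNNS/DML, which take unsigned values. Not part of any proposal.
function requirePositiveInts(values, name) {
  for (const v of values) {
    if (!Number.isInteger(v) || v < 1) {
      throw new TypeError(`${name} must contain positive integers, got ${v}`);
    }
  }
  return Uint32Array.from(values);  // safe unsigned representation
}

const strides = requirePositiveInts([1, 1], 'strides');
const dilations = requirePositiveInts([1, 1], 'dilations');
```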

Another open question: NNAPI, MPS and DML all support fused bias and activation for conv2d. It looks like there are two options to support them:

  1. add 'bias' and 'activation' into the conv2d op
  2. add separate add and activation ops and leverage the underlying graph optimizer to generate the fused native op
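
As a rough illustration of option 2, the graph could be expressed with separate ops and left to the backend to fuse; the add/relu builder methods here are assumptions for the sketch, not existing API:

```js
// Option 2 sketch: keep conv2d minimal and express bias + activation as
// separate graph ops, relying on the underlying graph optimizer to emit a
// fused native convolution where the backend supports it.
const conv = nn.conv2d(x, filter, [1, 1, 1, 1], [1, 1], [1, 1]);
const bias = nn.constant({type: 'float32', dimensions: [16]}, new Float32Array(16));
const biased = nn.add(conv, bias);   // bias: 1-dim tensor of shape [out_channels]
const out = nn.relu(biased);         // fusion candidate: conv2d + add + relu
```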

Any thoughts?

@BenjaminPoulain

Thanks Ningxin.

Would it be useful to add BNNS and MKL to survey all the possible shapes?

I believe fusing bias and activation should be done by the optimizer. The fused ops are arbitrary. It will make the API cleaner.

@huningxin
Contributor

Would it be useful to add BNNS and MKL to survey all the possible shapes?

Good point, thanks Benjamin. BNNS and MKL-DNN (renamed to DNNL) can support CPU acceleration. It would be useful to survey them. I updated the table with BNNS and DNNL columns. The opens have been updated accordingly. Please take a look.

I believe fusing bias and activation should be done by the optimizer. The fused ops are arbitrary. It will make the API cleaner.

+1

@BenjaminPoulain

dilation is not supported by BNNS.

This should not be a problem.

@anssiko
Member

anssiko commented Oct 15, 2019

I think we're ready to formalize this op and add it into the spec.

Thanks @BenjaminPoulain for Apple's perspective. @nsthorat and @RafaelCintron please chime in if you have further feedback or design considerations from Google and Microsoft.

Unless we hear concerns, I'd ask @huningxin to craft a PR to add conv2d to the spec.

@anssiko
Member

anssiko commented Oct 31, 2019

I'll retract my previous comment on spec readiness. There are still opens that need to be addressed before this op can be formalized in the spec; quoting @huningxin's opens:

  1. dilation is not supported by BNNS.
  2. MPS and BNNS only support the same beginning and ending padding along each spatial axis.
  3. strides for the N and C dimensions are not supported by any of the APIs.
  4. int to unsigned int conversion for strides and dilations is required by MPS, BNNS and DML.

Suggestions welcome on how to overcome these four issues!

@BenjaminPoulain

I'd like to add a few open questions:

  1. Precision: Convolution has the same concerns as [op compatibility] matMul #27 (comment) - this could matter for compatibility.
  2. Input shape: The operation does not need a batch dimension to be valid ([N, H, W, in_channels] -> [H, W, in_channels]); I suspect most usages do not need N. When there are leading dimensions, I don't think they need to be limited to one (e.g. [A, B, ..., H, W, in_channels]); see the sketch after this list. The definition of matMul in [op compatibility] matMul #27 has this kind of flexibility.
  3. Should the dimensions be int or a Number -> ToLength()? (OperandDescriptor's dimension should match ECMAScript's TypedArray dimension #29)
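
A small sketch of the flexibility point 2 describes, i.e. treating the trailing [H, W, in_channels] dims as the convolution input and folding any leading dims into a batch (illustrative only, not a proposed behavior):

```js
// Accept rank >= 3 inputs ending in [..., H, W, in_channels] and fold the
// leading dimensions into N, so [H, W, C] -> [1, H, W, C] and
// [A, B, H, W, C] -> [A*B, H, W, C]. Illustration only.
function normalizeConv2dInputShape(dims) {
  if (dims.length < 3) {
    throw new TypeError('conv2d input needs at least [H, W, in_channels]');
  }
  const batch = dims.slice(0, -3).reduce((a, b) => a * b, 1);
  return [batch, ...dims.slice(-3)];
}
```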

huningxin added a commit to huningxin/webnn that referenced this issue Dec 9, 2019
Per 5 Dec CG meeting [1], move the conv2d compatibility table [2] to markdown.

[1] https://www.w3.org/2019/12/05-webmachinelearning-minutes.html#x02
[2] webmachinelearning#28 (comment)
@huningxin
Contributor

huningxin commented Dec 10, 2019

Per 5 Dec CG meeting, moved the conv2d native API compatibility table to op_compatibility/conv2d.md. Feel free to review and contribute. @RafaelCintron @walrusmcd

@wchao1115
Collaborator

@nsthorat Is supporting grouped convolution out of scope?

Grouped convolution is used in AlexNet; the filter tensor shape for DirectML will be [out_channels / group_count, group_count, in_channels / group_count, H, W] when group_count > 1.
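For example, applying that formula to an AlexNet-style grouped convolution (group_count = 2, 96 input channels, 256 output channels, 5×5 kernels) would give a filter shape of [128, 2, 48, 5, 5].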

And since the API is explicitly called 'conv2d', I'm assuming supporting 3-D volumetric convolution (with 'depth' input dimension) is out of scope.

@anssiko
Member

anssiko commented Jan 24, 2020

@wchao1115 asked:

@nsthorat Is supporting grouped convolution out of scope?

On the 23 Jan 2020 call we agreed to keep grouped convolution a separate op, compatibility to be assessed in its own issue.

@smilejames

Hi Ningxin @huningxin, I really appreciate the work WebNN has done.
Can you add Conv2d of Paddle-Lite to the table?
Conv2d: https://www.paddlepaddle.org.cn/documentation/docs/en/api/layers/conv2d.html

@huningxin
Contributor

Thanks for the info @smilejames

Can you add Conv2d of Paddle-Lite to the table?

I'd like to help. Let me take a look at the Paddle docs and make a PR. I'll ping you for review.

@anssiko
Member

anssiko commented Mar 26, 2020

@huningxin @wchao1115 I believe we can close this issue now. Please check if there are any opens remaining.

@wchao1115
Collaborator

@anssiko Ideally I would prefer that all the '?' entries in the table were definitively answered, e.g. striding/padding support in some APIs, but I would be OK closing this issue as-is. I assume people do look at the current spec and relate it to their respective API advocacy. Perfection is the enemy of progress!

@dontcallmedom
Contributor

closing issue as per discussion above
