
[ONNX] Fix numerical errors in softmax when dim is not last dimension #37326

Closed

Conversation

p12tic
Contributor

@p12tic p12tic commented Apr 26, 2020

Fixes #34585.

This PR improves the workaround for the problem of differing semantics between ONNX softmax and PyTorch softmax.

In PyTorch, the `dim` parameter specifies over which dimension to normalize the values. ONNX, on the other hand, always coerces the input into a 2D tensor, and the `axis` parameter specifies which dimensions become the rows and columns of that tensor. As a result, the semantics match only when we are normalizing the last dimension (`dim == ndim - 1`).
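To make the difference concrete, here is a hedged sketch (illustrative only, not exporter code) that contrasts PyTorch's `softmax(dim)` with an emulation of ONNX opset-9 `Softmax(axis)` based on the operator's coerce-to-2D description:

```python
import torch

x = torch.randn(2, 3, 4)

# PyTorch: normalize along dimension 1 only.
pt = torch.softmax(x, dim=1)

# ONNX opset-9 Softmax with axis=1: flatten the trailing dims into rows of
# length 3 * 4, normalize each row, then reshape back to the original shape.
flat = x.reshape(x.shape[0], -1)
onnx_like = torch.softmax(flat, dim=1).reshape(x.shape)

# The two agree only when dim is the last dimension; here they differ.
print(torch.allclose(pt, onnx_like))  # False in general
```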

Previously this was handled by recognizing the `dim == ndim - 1` case and using ONNX `softmax` only there. All other cases fell back to explicit invocations of the `exp`, `reducesum`, and `div` operators to compute the result. Unfortunately, this fallback produces numeric errors when the input values are large: `exp` overflows to infinity in both the numerator and the denominator, and dividing the two yields NaN.
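The following illustration (not code from this PR) shows the failure mode: the naive exp / reducesum / div decomposition overflows for large inputs, while the built-in softmax stays finite:

```python
import torch

x = torch.full((2, 3), 1000.0)  # large inputs

# Naive decomposition: exp overflows to inf, and inf / inf gives NaN.
naive = torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)

# Built-in softmax stays finite.
stable = torch.softmax(x, dim=1)

print(naive)   # all NaN
print(stable)  # all 0.3333...
```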

This can be improved by transposing the input tensor so that we can reuse ONNX softmax.

A similar approach was applied to the `logsoftmax` function in #30433.
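As a rough sketch of the idea (written in plain PyTorch rather than as the actual torch/onnx/symbolic_opset9.py symbolic; the function name and structure are illustrative assumptions, not the exporter's API):

```python
import torch

def softmax_via_last_dim(x: torch.Tensor, dim: int) -> torch.Tensor:
    ndim = x.dim()
    dim = dim % ndim  # normalize negative dims
    if dim == ndim - 1:
        # Semantics already match ONNX Softmax on the last axis.
        return torch.softmax(x, dim=dim)
    # Swap `dim` with the last dimension, normalize there, then swap back.
    y = x.transpose(dim, ndim - 1)
    y = torch.softmax(y, dim=ndim - 1)
    return y.transpose(dim, ndim - 1)

x = torch.randn(2, 3, 4)
print(torch.allclose(softmax_via_last_dim(x, 1), torch.softmax(x, dim=1)))  # True
```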

@dr-ci

dr-ci bot commented Apr 27, 2020

💊 Build failures summary and remediations

As of commit 2a9c810 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@mrshenli mrshenli added the module: onnx Related to torch.onnx label Apr 27, 2020
@mrshenli mrshenli requested a review from houseroad April 27, 2020 20:25
@p12tic p12tic force-pushed the onnx-softmax-fix-numerical-errors branch from 63c4108 to ea8f934 on April 28, 2020 14:51
Collaborator

@BowenBao BowenBao left a comment


Thanks @p12tic for the fix! Overall looks good, please see inline comments for some additional updates.

test/onnx/test_pytorch_onnx_onnxruntime.py (review thread resolved)
torch/onnx/symbolic_opset9.py (review thread resolved)
@BowenBao
Collaborator

BowenBao commented May 1, 2020

@p12tic could you take a look at the comments and update your PR? Thanks!

@p12tic
Contributor Author

p12tic commented May 2, 2020

@BowenBao Yes, I will do so. This week my priorities were elsewhere, thus the delay.

@p12tic p12tic force-pushed the onnx-softmax-fix-numerical-errors branch from ea8f934 to 2a9c810 on May 3, 2020 18:58
@p12tic
Contributor Author

p12tic commented May 3, 2020

@BowenBao I've applied your suggestions to the PR. Thanks for suggesting the changes instead of making them yourself; I got to learn what max normalization in softmax is :-)
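For readers unfamiliar with the term, a minimal sketch of the max-normalization trick mentioned above (illustrative, not the PR's code): subtracting the per-row maximum before exponentiating leaves the softmax result unchanged but keeps `exp` from overflowing:

```python
import torch

def stable_softmax(x: torch.Tensor, dim: int) -> torch.Tensor:
    # Shift so the maximum along `dim` is 0; softmax is invariant to this shift.
    x = x - x.max(dim=dim, keepdim=True).values
    e = torch.exp(x)
    return e / e.sum(dim=dim, keepdim=True)

x = torch.full((2, 3), 1000.0)
print(stable_softmax(x, dim=1))  # finite: 0.3333 everywhere
```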

Collaborator

@BowenBao BowenBao left a comment


@p12tic Thanks for the fix! LGTM

Contributor

@facebook-github-bot facebook-github-bot left a comment


@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@houseroad merged this pull request in d16c823.

ShawnZhong pushed a commit to ShawnZhong/pytorch that referenced this pull request May 5, 2020
…pytorch#37326)

Summary: same as the pull request description above; fixes pytorch#34585.

Pull Request resolved: pytorch#37326

Reviewed By: hl475

Differential Revision: D21389712

Pulled By: houseroad

fbshipit-source-id: 554fd1b98231a28984c30c7e7abd3c0643386ff7
bharatr21 pushed a commit to bharatr21/pytorch that referenced this pull request May 5, 2020
…pytorch#37326) (same commit message and metadata as above)
@p12tic p12tic deleted the onnx-softmax-fix-numerical-errors branch October 11, 2022 13:38

Successfully merging this pull request may close these issues.

Unstable softmax trace when exporting to onnx