Bug 102232 - Missed arithmetic fold
Summary: Missed arithmetic fold
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 12.0
: P3 enhancement
Target Milestone: 12.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, TREE
Depends on:
Blocks:
 
Reported: 2021-09-07 15:22 UTC by Jeremy R.
Modified: 2022-02-23 21:26 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-09-07 00:00:00


Attachments
[PATCH] PR tree-optimization/102232 (1.06 KB, text/x-diff)
2021-11-09 04:30 UTC, Navid Rahimi
Details
[PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd (1.10 KB, patch)
2021-11-10 16:19 UTC, Navid Rahimi
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy R. 2021-09-07 15:22:54 UTC
LLVM optimizes bar into tgt here but not foo.

https://godbolt.org/z/nhEjaoanx

int foo(int a, int b) {
    return b * (1 + a / b) - a;
}
int bar(int a, int b) {
    return b * (a / b) + b - a;
}
int tgt(int a, int b) {
    return b - a % b;
}

LLVM appears to miss this too.
Comment 1 Jeremy R. 2021-09-07 15:24:41 UTC
Correction on first line: *GCC optimizes bar into tgt here but not foo.
Pardon my sloppy copy-paste from my bug report over on LLVM's bugzilla!
Comment 2 Andrew Pinski 2021-09-07 18:43:48 UTC
So on aarch64, the code for all three functions are almost the same:

foo(int, int):
        sdiv    w2, w0, w1
        madd    w1, w2, w1, w1
        sub     w0, w1, w0
        ret

bar(int, int):
        sdiv    w2, w0, w1
        madd    w1, w2, w1, w1
        sub     w0, w1, w0
        ret

tgt(int, int):
        sdiv    w2, w0, w1
        msub    w0, w2, w1, w0
        sub     w0, w1, w0
        ret

MSVC can do the transformation.
Comment 3 Navid Rahimi 2021-11-09 04:30:27 UTC
Created attachment 51752 [details]
[PATCH] PR tree-optimization/102232
Comment 4 Navid Rahimi 2021-11-09 04:34:03 UTC
This patch I attached will fix this problem and does include the test [1]. You can follow the discussion in GCC-Patches here [1]. Although it seems I still have problem to fix with MIME type of the patch in mailing list. 

1) https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583737.html
Comment 5 Navid Rahimi 2021-11-10 16:17:37 UTC
Comment on attachment 51752 [details]
[PATCH] PR tree-optimization/102232

>From 7c2abb0eab05766ab879066b000c13de827e3b3d Mon Sep 17 00:00:00 2001
>From: Navid Rahimi <navidrahimi@microsoft.com>
>Date: Mon, 8 Nov 2021 13:57:19 -0800
>Subject: [PATCH] PR tree-optimization/102232
>
>	* match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization.
>	* gcc.dg/tree-ssa/pr102232.c: testcase for this optimization.
>---
> gcc/match.pd                             |  7 ++++
> gcc/testsuite/gcc.dg/tree-ssa/pr102232.c | 52 ++++++++++++++++++++++++
> 2 files changed, 59 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102232.c
>
>diff --git a/gcc/match.pd b/gcc/match.pd
>index 71cf6f9df0a..37c01e79d97 100644
>--- a/gcc/match.pd
>+++ b/gcc/match.pd
>@@ -1295,6 +1295,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (bit_xor (bit_ior:c (bit_not @0) @1) (bit_ior:c @0 (bit_not @1)))
>  (bit_xor @0 @1))
> 
>+/* x * (1 + y / x) - y -> x - y % x */
>+(simplify
>+ (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1)
>+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>+      && types_match (@0, @1))
>+  (minus @0 (trunc_mod @1 @0))))
>+
> /* ((x & y) - (x | y)) - 1 -> ~(x ^ y) */
> (simplify
>  (plus (nop_convert1? (minus@2 (nop_convert2? (bit_and:c @0 @1))
>diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c
>new file mode 100644
>index 00000000000..e7485cf24e9
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102232.c
>@@ -0,0 +1,52 @@
>+/* PR tree-optimization/102232 */
>+/* { dg-do run } */
>+/* { dg-options "-O3 -fdump-tree-optimized" } */
>+
>+int __attribute__ ((noipa)) foo (int a, int b)
>+{
>+  return b * (1 + a / b) - a;
>+}
>+
>+int
>+main (void)
>+{
>+  // few randomly generated test cases
>+  if (foo (71856034, 238) != 212)
>+    {
>+      return 1;
>+    }
>+  if (foo (71856034, 10909) != 1549)
>+    {
>+      return 1;
>+    }
>+  if (foo (20350, 1744) != 578)
>+    {
>+      return 1;
>+    }
>+  if (foo (444813, 88563) != 86565)
>+    {
>+      return 1;
>+    }
>+  if (foo (112237, 63004) != 13771)
>+    {
>+      return 1;
>+    }
>+  if (foo (68268386, 787116) != 210706)
>+    {
>+      return 1;
>+    }
>+  if (foo (-444813, 88563) != 90561)
>+    {
>+      return 1;
>+    }
>+  if (foo (-68268386, 787116) != 1363526)
>+    {
>+      return 1;
>+    }
>+
>+  return 0;
>+}
>+
>+/* Verify that multiplication and division has been removed.  */
>+/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */
>+/* { dg-final { scan-tree-dump-not " / " "optimized" } } */
>\ No newline at end of file
>-- 
>2.25.1
>
Comment 6 Navid Rahimi 2021-11-10 16:19:05 UTC
Created attachment 51760 [details]
[PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd
Comment 7 Navid Rahimi 2021-11-10 16:22:27 UTC
The new version of the patch I attached to this bug has been approved by Richard Biener in this thread [1]. 



1) https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583935.html
Comment 8 GCC Commits 2021-11-23 03:08:44 UTC
The master branch has been updated by Jeff Law <law@gcc.gnu.org>:

https://gcc.gnu.org/g:df1a0d526e2e4c75311345c0b73ce8483e243899

commit r12-5460-gdf1a0d526e2e4c75311345c0b73ce8483e243899
Author: Navid Rahimi <navidrahimi@microsoft.com>
Date:   Mon Nov 22 22:07:35 2021 -0500

    Re: [PATCH] PR tree-optimization/102232 Adding a missing pattern to match.pd
    
            PR tree-optimization/102232
    
    gcc/
            * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization.
    
    gcc/testsuite/
    
            * gcc.dg/tree-ssa/pr102232.c: Testcase for this optimization.
Comment 9 Roger Sayle 2022-02-23 21:26:35 UTC
This is fixed on mainline; the godbolt link in comment #1 shows that GCC now generates the same code for all three functions.