LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 50485 - Exponential code explosion during inlining
Summary: Exponential code explosion during inlining
Status: NEW
Alias: None
Product: new-bugs
Classification: Unclassified
Component: new bugs
Version: trunk
Hardware: All
OS: All
Importance: P normal
Assignee: Hongtao Yu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-26 08:06 PDT by Joseph Tremoulet
Modified: 2021-06-11 08:48 PDT
CC: 5 users

See Also:
Fixed By Commit(s):


Attachments
Repro showing pattern for exponential explosion, grows 1000x at this size. (12.64 KB, text/x-matlab)
2021-05-26 08:06 PDT, Joseph Tremoulet

Description Joseph Tremoulet 2021-05-26 08:06:50 PDT
Created attachment 24893 [details]
Repro showing pattern for exponential explosion, grows 1000x at this size.

Some of our code started failing to compile when commit ed9df5bd2f50b changed the optimization pipeline.  The attached repro case demonstrates the issue (seen via `opt -passes 'default<O3>'` or at https://godbolt.org/z/3vccKbxb5).

The cleanup optimizations run on inlined code can leave a function smaller when it is later considered as a transitive inline from a subsequent SCC than it was when considered within its own SCC.  This makes the same calls look more favorable after their SCC has already converged, defeating the intent of the bottom-up ordering: that prior SCCs are already flattened out to the point that inlining deeply through them will be undesirable.

Commit ed9df5bd2f50b triggered this in our code because the function sizes before and after inlining cleanup now happen to yield inline costs that straddle the inline threshold.
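To illustrate the mechanism, here is a toy model of the cost comparison (the threshold and sizes are made-up numbers, not LLVM's actual cost code): the callee's raw cost is just over the threshold when its own SCC is visited, but post-inline cleanup shrinks it enough that the same call is accepted when revisited from a later SCC.

// Minimal toy model (hypothetical sizes/threshold, not LLVM's actual cost
// computation): a callee rejected in its own SCC becomes acceptable after
// cleanup shrinks it, when revisited from a subsequent SCC.
#include <iostream>

int main() {
  const int Threshold = 225;       // hypothetical inline threshold
  const int RawCost = 240;         // callee cost when its own SCC is visited
  const int CleanupSavings = 30;   // size removed by post-inline cleanup
  const int CostAfterCleanup = RawCost - CleanupSavings;

  std::cout << "own SCC:   cost " << RawCost
            << (RawCost <= Threshold ? " -> inline\n" : " -> reject\n");
  std::cout << "later SCC: cost " << CostAfterCleanup
            << (CostAfterCleanup <= Threshold ? " -> inline\n" : " -> reject\n");
}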

The attached repro case is structured as follows:
 - It contains an SCC comprising a series of tiers.  Each tier is densely connected (one edge short of a clique), and has one edge to the next tier, with the last having an edge back to the first, completing the cycle.
 - Each individual function in each tier has just the right size and redundant code (a handful of noop stores) so that calls to it are not inlined when the large SCC is visited, but are inlined when it is revisited by transitive inlining of the next SCC.
 - The only other code is a singleton SCC (function @outside_scc) that calls into the large SCC.

Inlining in @outside_scc proceeds until the InlineHistory kicks in and cuts it off, which permits one inline of each acyclic path through the SCC, and the number of such paths is exponential in the number of tiers, hence the code explosion.
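To see the growth concretely, here is a small brute-force path counter over a graph shaped like the repro (the tier size is illustrative, each tier is modeled as a full clique rather than one edge short of one, and the cycle-closing back edge is omitted):

// Brute-force count of acyclic call paths through a tiered graph shaped
// like the repro (illustrative shape, not taken from the attachment).
// The count multiplies per tier, roughly (paths per tier)^tiers.
#include <cstdio>
#include <vector>

static long countPaths(const std::vector<std::vector<int>> &Adj, int Node,
                       int Target, std::vector<bool> &Visited) {
  if (Node == Target) return 1;
  Visited[Node] = true;
  long N = 0;
  for (int Next : Adj[Node])
    if (!Visited[Next])
      N += countPaths(Adj, Next, Target, Visited);
  Visited[Node] = false;
  return N;
}

int main() {
  const int TierSize = 4;
  for (int Tiers = 1; Tiers <= 5; ++Tiers) {
    int N = Tiers * TierSize;
    std::vector<std::vector<int>> Adj(N);
    for (int T = 0; T < Tiers; ++T) {
      int Base = T * TierSize;
      // Dense intra-tier edges (a full clique here, for simplicity).
      for (int I = 0; I < TierSize; ++I)
        for (int J = 0; J < TierSize; ++J)
          if (I != J) Adj[Base + I].push_back(Base + J);
      // One edge from this tier's exit to the next tier's entry.
      if (T + 1 < Tiers) Adj[Base + TierSize - 1].push_back(Base + TierSize);
    }
    std::vector<bool> Visited(N, false);
    std::printf("tiers=%d paths=%ld\n", Tiers,
                countPaths(Adj, 0, N - 1, Visited));
  }
}

With four functions per tier this prints 5, 25, 125, 625, 3125 for one through five tiers: exponential in the number of tiers, as described above.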

The original cases in which we've seen this are actually two of the specint2k benchmarks, though of course built with our custom toolchain, so I'd say it occurs in code that's real to us (we'll need a fix downstream regardless of what happens upstream).
Comment 1 Joseph Tremoulet 2021-05-26 08:18:26 PDT
Some ideas for potential approaches to fixing the issue:

1 - During inlining, rather than adding every inlinee's callsites to the worklist of inline candidates, only do so when the inlinee is in the current SCC.  This would seem to be in line (no pun intended) with the intent of the inliner, that calls left in already-converged SCCs are unprofitable inlines.  But I'm not sure I'm interpreting that intent correctly.
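A minimal sketch of what that filtering could look like, as a self-contained toy rather than the actual Inliner.cpp logic (CurrentSCC, ExposedBy, and the call representation are all hypothetical):

// Toy model of idea 1 (all names hypothetical): after inlining a callee,
// re-queue the calls it exposed only if the callee belonged to the SCC
// currently being visited.
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Call { std::string Callee; };

int main() {
  std::set<std::string> CurrentSCC = {"a", "b"};       // hypothetical SCC
  std::vector<Call> ExposedBy = {{"a"}, {"outside"}};  // calls an inline exposed
  std::vector<Call> Worklist;
  for (const Call &C : ExposedBy) {
    // Calls inherited from an already-converged SCC were judged
    // unprofitable there; don't reconsider them here.
    if (CurrentSCC.count(C.Callee))
      Worklist.push_back(C);
    else
      std::cout << "skip call to " << C.Callee << " (prior SCC)\n";
  }
  std::cout << "re-queued " << Worklist.size() << " call(s)\n";
}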

2 - Arrange the optimization passes so that cleanup can't happen between one SCC and the next, so that the inline costs won't change between the time when a call is processed in its own SCC vs the next.  I'm not sure how feasible that is.

3 - Adjust the InlineHistory logic to also have a cap on the length of CG paths it inlines.  That would make the upper bound on code growth sensitive to the fan-out of the call graph, rather than its depth, which would help because fan-out, unlike depth, can't be arbitrarily increased without making the individual functions too big to inline.
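A toy model of that cap (the cap value and data layout are hypothetical, though the shape mirrors the idea of the inline history as a chain of (callee, parent-entry) links): walking the chain gives the length of the call-graph path already inlined through, and inlining is refused past a fixed depth.

// Toy model of idea 3 (hypothetical cap and layout). Each history entry
// is {callee id, index of the entry it was inlined under}; walking the
// parent links measures the inlined call-graph path length.
#include <cstdio>
#include <utility>
#include <vector>

static const int MaxInlinePathLength = 10; // hypothetical cap

static int pathLength(const std::vector<std::pair<int, int>> &History, int Idx) {
  int Len = 0;
  for (; Idx != -1; Idx = History[Idx].second)
    ++Len;
  return Len;
}

int main() {
  // A chain of three nested inlines: 1 at top level, 2 under 1, 3 under 2.
  std::vector<std::pair<int, int>> History = {{1, -1}, {2, 0}, {3, 1}};
  int Idx = 2; // candidate call appeared under history entry 2
  int Len = pathLength(History, Idx);
  std::printf("path length %d -> %s\n", Len,
              Len < MaxInlinePathLength ? "inline allowed" : "refuse (cap reached)");
}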
Comment 2 Joseph Tremoulet 2021-05-26 08:26:46 PDT
cc @chandlerc, not sure how closely this may be related to the issue in PR33072.

cc @fhahn, since ed9df5bd2f50b ostensibly triggered this and I don't know if maybe there's a phase-ordering fix that would make sense.
Comment 3 Joseph Tremoulet 2021-06-11 08:48:07 PDT
Hongtao, I'm assigning this to you per your comment "I will work on a separate per-function capping mechanism" at https://lists.llvm.org/pipermail/llvm-dev/2021-June/151006.html, since we expect such a cap to resolve this issue.

Please reassign if I've misunderstood.  Thanks.