Delimited continuation primops #313
Conversation
This simplification was suggested by Tom Ellis.
Could you add a link to the source code for the benchmarks?

I would also like to add that having the patch set and the code that uses the proposal concretely available at the same time is fantastic.

This looks really nice! As Carter mentions, it is very nice to see the implementation at the same time as the proposal, especially along with some decent evidence that the thing does what it intends to do. :) I'd also like to see the link to the benchmarks, but yeah, this looks like it has practically no downsides and gets a fair amount of bang for its buck. It might be nice to hear from some other people involved in the development of the RTS, but the changes there don't look onerous in any way.
```haskell
control0 :: ((IO b -> IO a) -> IO a) -> IO b
control0 :: (forall a. (IO b -> IO a) -> IO a) -> IO b
```
Oh, I think I see. It has to match the …

If I …

Some of this is a bit over my head, but I'm wondering if it could be used to improve the performance of the default thread scheduler of …
The benchmark code is available here: https://github.com/ocharles/effect-zoo/tree/eff-and-polysemy
Yes, that’s right.
If you do that, the behavior is undefined. This is the relevant sentence from the proposal:
Here, “the same state thread” means “a state thread originating from the same call to …”
What exactly does that mean? Well, consider

```haskell
prompt $ do
  let x = unsafePerformIO $ control0 f
  g x
```

When will the call to `control0` actually happen? Another piece of complexity in this scenario comes from thunks like

```haskell
let y = 1 + error "bang"
```

We can’t do that with continuation capture. So this proposal doesn’t attempt to make any guarantees about this situation. In fact, in the current implementation, doing this will leave a dangling blackhole, which means that if another thread is blocking on the blackhole and the continuation is never invoked, that thread will be blocked forever. What’s more, if you invoke the continuation multiple times, you’ll end up trying to update the same thunk more than once, and I don’t even know what would happen in that case. Conclusion: don’t do this.
This is an interesting question, and it’s challenging to say anything for certain. Something to consider is that this proposal is really designed to optimize for situations where continuation capture is infrequent, which is usually true for effect systems. As the Motivation section stresses, the big win gained by using native delimited continuations is that programs that don’t need to use continuations don’t have to pay for them. An advantage of the CPS approach is sharing: the CPS approach is basically building a heap-allocated tree to represent the continuation. If a continuation is restored, then quickly captured again, the tree doesn’t have to be rebuilt on the second capture, since the closures have already been allocated. In comparison, restoring a captured continuation here copies the frames back onto the stack, and capturing it again copies them off again, so nothing is shared between captures. There are several ways to ameliorate this cost:
This is unfortunately one of the tricky things about higher-order control. Different usage patterns benefit from radically different implementations, even if the user-facing interface is largely the same. This proposal tries to strike a compromise between expressiveness, simplicity, and performance: performance ought to be “good enough” for most use cases, without overfitting to a particular problem, while still providing a small set of operators that can express everything you might reasonably want.
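The sharing point above can be made concrete with a minimal sketch of the CPS encoding, using `reset`/`shift` from the `transformers` package (my illustration, not code from the proposal): the captured continuation `k` is an ordinary heap-allocated closure, so invoking it repeatedly allocates nothing new, whereas a stack-based continuation must be copied back onto the stack each time it is restored.

```haskell
import Control.Monad.Trans.Cont (evalCont, reset, shift)

-- In the CPS encoding, `reset` delimits the continuation and `shift`
-- captures it as a plain closure. Using `k` twice is cheap: the closure
-- already exists on the heap, and nothing is copied to or from a stack.
ex :: Int
ex = evalCont $ reset $ do
  x <- shift $ \k -> return (k (k 1))  -- k is "the rest of the reset body"
  return (x + 10)                      -- i.e. k = \x -> x + 10

main :: IO ()
main = print ex  -- (1 + 10) + 10 = 21
```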
```haskell
-- EDIT: This is not unsafeCoerce
unsafeCoerce :: IO a -> IO b
unsafeCoerce a = prompt (undefined (control0 (\k -> k a)))
```

EDIT: Actually my example isn't …
I'm very much in favour of this proposal. It took me a while to understand what … If during our evaluation we are about to evaluate an expression of type … Suppose then we are evaluating a …

We evaluate …

Instead, …

During evaluation of …

When …

The evaluation that was originally intended to happen part way through the evaluation of …
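The step sequence sketched above can be played out in today's Haskell with `resetT`/`shiftT` from `Control.Monad.Trans.Cont`, a CPS analogue of the proposed primops (this is my illustration, not the primops themselves):

```haskell
import Control.Monad.Trans.Cont (evalContT, resetT, shiftT)
import Control.Monad.Trans.Class (lift)

-- resetT plays the role of prompt, delimiting the continuation; shiftT
-- plays the role of control0, capturing the rest of the resetT body as k.
demo :: IO Int
demo = evalContT $ resetT $ do
  x <- shiftT $ \k -> do
    r1 <- lift (k 1)   -- resume the captured continuation with 1
    r2 <- lift (k 10)  -- resume it again: it can run more than once
    return (r1 + r2)
  return (x + 100)     -- this is the computation that k denotes

main :: IO ()
main = demo >>= print  -- prints 211
```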
(Note that this doesn't include all the benchmarks and timings, but I think Lexi's proposal gets algebraic effects …) Hrmm, @lexi-lambda, what are the perf figures for your approach at n == 1000...0? I'd love to have some clarity on the asymptotics, which we can't really infer …
For these sorts of benchmark figures, log scale can make a huge difference in clarity. It’s been a long-standing misfeature of criterion that it doesn’t do log scale by default. That should be fixed at some point.

And if I’m reading those figures correctly, eff with delimited continuations is at least ~10x slower as n gets larger versus mtl with specialization or the reference program, at least in this microbenchmark with two measurement points. To be clear, it does seem to be orders of magnitude less slow than the other extant algebraic effect libraries in this example, per your figures.
It would also be helpful to include the figures for the other comparisons. I can help you tweak the R scripts in Ollie's repo if you need it.

I’ll see about running the full suite with Lexi's patches on my larger computer later today and generating the full measurement set.

And let me emphasize: I genuinely value and appreciate that a proposal patch set and example programs are part of the proposal. It makes it much easier to understand and evaluate for anyone who wants to help with the GHC proposal process.
@cartazio The lack of … Almost all existing benchmarks for effect systems are highly misleading. They are written in such a way that GHC either inlines or specializes everything, but that is extremely unlikely in real programs. When people use … In those situations, GHC will not specialize the code. The only practical way to get GHC to specialize it at that point is to write … The benchmarks I’ve provided are intentionally structured to give numbers that are more representative of effect system performance when used in real programs. In those situations, some dispatch is inevitable, and whole-program specialization is unrealistic. If you accept those constraints, the important thing to optimize for is dispatch cost, which …
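For readers unfamiliar with the distinction being drawn here, the following is a small, hypothetical illustration (the class and function names are invented, not taken from the benchmarks): class-polymorphic code dispatches through a dictionary at runtime unless GHC specializes it, which a SPECIALIZE pragma explicitly requests for one concrete monad.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Monad.Trans.State (State, get, put, runState)

-- A hypothetical mtl-style class, standing in for an effect interface.
class Monad m => MonadCount m where
  tick :: m Int

instance MonadCount (State Int) where
  tick = do { n <- get; put (n + 1); pure n }

-- Polymorphic code like this dispatches through the MonadCount dictionary
-- at every call unless GHC specializes it; the pragma below requests a
-- specialized copy at the concrete monad State Int.
steps :: MonadCount m => Int -> m Int
steps 0 = tick
steps k = tick >> steps (k - 1)
{-# SPECIALIZE steps :: Int -> State Int Int #-}

main :: IO ()
main = print (runState (steps 3) 0)  -- (3,4): last tick result, final count
```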
Absolutely true points. I'm just trying to understand how they fare versus traditional techniques. I do personally think using mtl as an algebraic-effects-style library is problematic because of the general noncommutativity of monad transformers; comparing libraries should ideally be done in terms of their respective norms and practices. (And there’s definitely room for more humane tool support for tactical/strategic specialization-based optimization in GHC.) Your figures definitely communicate that the constant factors hands down favor your work over the alternative algebraic effect libraries. Which benchmarks would help us understand the relative asymptotics? I realize, of course, this is mostly me trying to understand the applications of your GHC proposal rather than its core. One question I have, and maybe you can help me understand this best: what invariants or assumptions does this changeset put on the GHC RTS that other operations or machinery don’t already impose?
Yes, understood and generally agreed. I think, unfortunately, the ship has sailed on noncommutativity of effect handlers within Haskell effect systems, as both …
It depends on what you mean by “asymptotics.” One benchmark that may answer your question is the “BigStack” benchmark included in the same benchmark suite, which tests the performance of programs with a large stack of effect handlers. Here are the results: […] As you can see, approaches based on free monad-like constructions (such as …) … Here you can see that the constant factor of pushing a handler is a little higher for … More generally, I think we lack good, real-world benchmarks for effect systems. All of the benchmarks we currently have are synthetic microbenchmarks. The microbenchmarks are not useless (though, as I stated above, you have to be very careful to make sure you’re measuring what you think you are), but they don’t tell the full story. It would be wonderful to have some larger, less synthetic benchmark programs, but building those is a lot of work, and I don’t think it’s something I can realistically do as part of this proposal.
Good question. As mentioned in the Effects and Interactions section of this proposal, the added assumptions are actually very small, because GHC already implements something similar for asynchronous interrupt handling. For more information about how that works, take a look at *An Implementation of Resumable Black-Holes* by Alastair Reid (which SPJ pointed me to when I first started looking into this).
FWIW, I and the company I work for, as well as many other companies whose code I've had the opportunity to look at, make use of concrete MTL stacks, or ones with SPECIALIZE pragmas for strategic parts of the program. In fact, this was blogged about recently as part of an "Optimizing Haskell in Production" series of posts by Joachim Breitner, to that effect. I've seen some people make use of MTL as an effect system, but not often in my area of the hask-o-sphere, precisely because it doesn't win you much for the performance drawback. Classy optics have been the preferred method of extending capabilities-like behavior. That said, I think this proposal is pretty killer. It's not every day you see a really well-thought-out GHC proposal come with an implementation! And regardless of the effects debate (which I assume will rage on long after), this proposal should go in because of the clear benefits to the way GHC handles continuations in general, not just because of the potential benefits to the folks building new effect systems.

This is an unfortunate side effect of the non-commutativity of monads in general. Which, by all means, please solve for us 😄
Yes, I should definitely make this clear: concrete transformer stacks (whether you’re using …) … I think that is not a fundamental problem, and it may very well be possible to implement a GHC plugin that adds an effect specialization pass to the simplifier. However, I’m not worried about doing that in the short term, for the following reasons:
To add to @tomjaguarpaw’s comment about understanding delimited continuations, I happened to have a conversation with someone on the FP slack about a week ago, which some people requested I turn into a gist. It discusses continuations from a Haskell perspective, and it also includes a little information on reduction semantics, which I think is a useful framework for understanding delimited continuations. The conversation is available here, and people who want to understand this proposal better might find it helpful. Also, I want to address this comment directly:
You can absolutely write

```haskell
unsafeCoerce :: a -> b
unsafeCoerce a =
  case runRW# (prompt# (control0# $ \_ s -> (# s, a #))) of
    (# _, b #) -> b
```

To understand this using reduction semantics, recall the reduction rule given in the proposal:
Here we take …

We can reduce further using the beta rule to get …

which in the original program appears in a larger evaluation context:

```haskell
case runRW# (\s1 -> ●) of (# _, b #) -> b
```

The definition of …
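Spelling the reduction out end to end (this is my reconstruction of the steps, following the proposal's rules loosely; the exact rule is in the proposal text):

```
  case runRW# (\s1 -> prompt# (control0# (\_ s -> (# s, a #))) s1) of (# _, b #) -> b
    -- control0# aborts to the enclosing prompt#, passing the (empty)
    -- captured continuation k to its argument
  case runRW# (\s1 -> (\_ s -> (# s, a #)) k s1) of (# _, b #) -> b
    -- beta-reduce: k is simply discarded
  case runRW# (\s1 -> (# s1, a #)) of (# _, b #) -> b
    -- the pair built inside one state thread is scrutinized at a
    -- different result type, so a :: a escapes at type b
  a
```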
@emilypi took the words out of my mouth :). I do not care at all about effect systems and always use … I do have some thoughts on the general issues this proposal touches. It's a bit tangential, but I suppose I might write it anyway, lest it influence others' interest in the proposal. I'll try to someday expand on them more fully in a better, separate venue.
I think this is because much of this research is "stabbing with axioms" rather than fully characterizing the semantics of what they care about. In Scheme, or other untyped languages in which it's hard to make some sort of safe and ergonomic overloading system, I see why these might be easier to use than the "virtualized" lambda-based control. But in Haskell those virtualized methods work great: safe and expressive. Cost is the sole problem. If cost is the sole problem, then we can turn the problem around, looking not at the cost of control operators, but at the safe composition of cheap stack operations. In other words, we desire a calculus which exposes that data structure and has typing rules sufficient to encode the invariants. For example, for tagged prompts the tag can also be the capability, and then we don't need the real world or any strictness. Re-entrancy issues likewise beg for linearity. What might this calculus look like? It turns out we have been dancing around similar things for a while. Witness https://www.microsoft.com/en-us/research/wp-content/uploads/2016/04/sequent-calculus-icfp16.pdf, which became join points, or the discussion of polarity in https://ix.cs.uoregon.edu/~pdownen/publications/eta.pdf. The still earlier http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.631.1583&rep=rep1&type=pdf also relates: it is presented as a variation of A-normal form, but A-normal form is in fact very close to (exactly?) equivalent to the image of the call-by-value to call-by-push-value translation, and call-by-push-value is also intimately related to this larger line of polarity research, as explained in the ICFP '19 paper. In each case above, the change has yet to happen or the change to Core was too great. Each case in isolation I understand, but collectively I am a bit apprehensive. What can we do to make exploring these radical alternatives more feasible? The benefits I dream of down this road are huge.

Beyond explicit stack manipulation, there's replacing STG and/or C-- with languages that preserve Haskell types (today you might say we preserve kinds), and "typed inline assembly" mixing code between adjacent intermediate representations while upholding the invariants each requires. Ultimately, I'd like to see the RTS rewritten in one of the "safe" low-level languages this research yields, so we can have fully verified compilation and linking. So, back to this proposal: what I really hope is that new RTS primitives like this one will "tip the scales" so GHC researchers go back and reexamine those paths not yet taken.
I always find delimited continuations hard to grok. My role in the "Monadic framework for delimited continuations" paper, which you kindly cite, was very much the guinea pig saying "why does it have to be like that?". So as yet I only grasp a fragment of what this proposal is about. The Motivation section of the proposal is great, but it appears to be 100% disconnected from the Proposed Changes. One thing that might help would be a side-by-side example: say, here it is in MTL, and here is the same thing using the new primops. A minimum of abstractions, focused on how fewer closures are built, showing how the new stuff addresses the difficulties shown in Motivation. My main anxiety is about typing. Usually well-typed programs can't go wrong, but now they can. Yikes. Is that really, really, really necessary? The above paper has prompts as first-class values, and (SubCont a b) values to represent stack chunks. Can't you do something like that? I feel we'd need a very convincing story to explain why it's utterly impossible to make a statically typed version.
What is the status of this proposal? It looks like (some) agreements were reached and then the activity died down ;) |
If that is the case, then @lexi-lambda could officially submit it to the committee |
Yes, I apologize for dropping this—alas, I’ve been mostly just working on this in my spare time, and my schedule and energy are rather wildly variable. Nevertheless, I agree that it seems like it’s in a good spot. I did want to respond to some of @aspiwack’s comments, but I never got to it; let me do that now.
Yes, I agree with this. Unfortunately, I do not see any obvious wins to be gained from one-shot continuations here, especially given that the proposal now includes prompts. One-shot undelimited continuations most definitely allow significant shortcuts, but I don’t know of anything similar for delimited ones, though I am not an expert on continuation implementation strategies and could very well be unaware of some known trick. I think that, if we were to introduce one-shot continuations, an API like you describe would absolutely make sense, and indeed, it would be a significant benefit! But as it stands, I don’t think they quite fit into this proposal, so for now I think the current API is for the best. With that out of the way, I think I will take @nomeata up on his suggestion and request that this proposal be reviewed by the committee. 🙂
/remind @simonmar to check progress in 2 weeks |
@nomeata set a reminder for Sep 26th 2020 |
Alexis, I couldn't see a spot in the proposal where you discuss the resource re-entrancy problem. (I searched for "resource" and "re-entrancy".) Do you in fact discuss what safety means? Could you? To be clear, provided the implementation is not ghastly (I actually expect it shares quite a bit with exception handling), I'm in warm support of this proposal.

To be clear: I didn't mean any of this as a suggestion.
@simonpj, the re-entrancy problem is discussed in the Effects and Interactions section, though it uses the phrase “re-entrant” rather than “re-entrancy,” which is why your search failed! The discussion of that problem makes up the bulk of the section, but I’ll also reproduce it here for clarity:
Ah yes. Suggestion:
You could say that IO and async exceptions properly belong with "interactions", but to me both are really part of understanding the specification. Is it possible to say what makes the use of these primitives safe with respect to the above things? E.g., if you use the continuation exactly once, are things OK? Or what?
👋 @simonmar, check progress |
Is this awaiting @simonmar's action, or is this pending @lexi-lambda's response to @simonpj (or something else)? I fear unintentionally spamming this list, but I find that the next step isn't obvious to me, so I figured there might be some value for all involved parties in raising the question. While I'm here: I'm very excited about this work! 🙂
Not spam at all -- a useful nudge |
Question about this from the proposal:
How would that be enforced? There's no way that I'm aware of to detect an …
Currently the implementation just checks for an update frame, right? My understanding is that this catches some uses of … Alternatively, if any of these conditions are too difficult to detect, what about just leaving them declared undefined? The implementation could make its best effort to detect errors for debugging purposes, but these are low-level primitives, after all, and I don’t think it’s unreasonable not to mandate detecting every violation.
Back in July I asked Daan Leijen, who knows about this stuff, about this proposal. He had been in direct contact with Alexis, which is great. He said:

> I believe the current proposal is pretty good, just somewhat worried about …

I think that the current version of the proposal fixes both (1) and (2) by having explicit prompts and statically typed primitives. Alexis, is that right? For (3) I guess the answer is "no -- maybe future work". I'm not sure about (4). @lexi-lambda, any comments? I apologise for not transcribing this earlier; not sure how that happened. Anyway, it's another well-informed thumbs-up.
@lexi-lambda Hey Alexis. I watched your relatively recent stream on Twitch regarding problems with scoped effects. I believe this should be written down somewhere. Right now the proposal is accepted and is seemingly in the "waiting to be implemented" state. However, the problems you raised on the stream mean that it isn't quite so. Maybe if you document these problems in this proposal, it'd spark some new discussion, or at least help other people working in the same field in the future?
Yes, I should probably clarify my perspective on the current state of things, given it’s now been almost a year since this proposal was accepted, but I haven’t really publicly said much about it since then. (Oops.) Here are the salient points:
I’ve been wanting to find the energy to properly articulate my thoughts on that last point in writing for some time now, but it’s an extremely thorny and subtle set of issues, so I have unfortunately struggled to do so. But the relevant summary is that I think the bar for what constitutes an acceptable effect system is unusually high, since it’s such a foundational technology that requires pretty significant buy-in to use in a project, and I am completely unsatisfied with the way all existing approaches I am aware of handle scoping operations, such as … But all of that is entirely about … So, with that in mind, I don’t think I feel any need to retract this proposal, nor do I feel like implementing it would necessarily be a bad idea. But my personal motivation for working on it was, obviously, …
@lexi-lambda As far as I understand, this proposal has now been implemented and merged?
The proposal has been accepted; the following discussion is mostly of historic interest.
This is a proposal for adding primops for capturing slices of the RTS stack to improve the performance of algebraic effect systems, which require delimited continuations.
Rendered