
Delimited continuation primops #313

Merged

Conversation

lexi-lambda
Contributor

@lexi-lambda lexi-lambda commented Feb 22, 2020

The proposal has been accepted; the following discussion is mostly of historic interest.


This is a proposal for adding primops for capturing slices of the RTS stack to improve the performance of algebraic effect systems, which require delimited continuations.

Rendered

@cartazio

Could you add a link to the source code for the benchmarks?

@cartazio

I would also like to add that having the patch set and the code that uses the proposal concretely available at the same time is fantastic

@cgibbard
Contributor

This looks really nice! As Carter mentions, it is very nice to see the implementation at the same time as the proposal, especially along with some decent evidence that the thing does what it intends to do. :)

I'd also like to see the link to the benchmarks, but yeah, this looks like it has practically no downsides and gets a fair amount of bang for its buck. It might be nice to hear from some other people involved in the development of the RTS, but the changes there don't look onerous in any way.

@Ericson2314
Contributor

Ericson2314 commented Feb 22, 2020

Sorry if this is in one of the papers, but why are we (doing the explicit unboxing and RealWorld-token equivalent of)

control0 :: ((IO b -> IO a) -> IO a) -> IO b

rather than

control0 :: (forall a. (IO b -> IO a) -> IO a) -> IO b

?

Oh I think I see. It has to match the prompt elsewhere / calling the continuation is optional

@Ericson2314
Contributor

so the behavior of prompt# and control0# is only well-defined in strict State# threads

If I unsafePerformMyPrimMonad my prompt# and control0# calls separately, and there is a prompt# stack frame on the stack, does it work?

@treeowl
Contributor

treeowl commented Feb 22, 2020

Some of this is a bit over my head, but I'm wondering if it could be used to improve the performance of the default thread scheduler of monad-par.

@lexi-lambda
Contributor Author

Could you add a link to the source code for the benchmarks?

The benchmark code is available here: https://github.com/ocharles/effect-zoo/tree/eff-and-polysemy

Oh I think I see. It has to match the prompt elsewhere / calling the continuation is optional

Yes, that’s right.
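To illustrate why the inner type can’t be universally quantified: a handler that never calls the continuation returns its result straight to the prompt, so it must produce the prompt’s specific answer type. A minimal sketch, using the untagged prompt/control0 wrappers from the question above (the wrappers and the abortToPrompt name are assumptions for illustration, not the proposal’s primop API):

-- Discard the continuation and return x directly to the nearest prompt.
-- This only type-checks because `a` is free to unify with the prompt's
-- answer type; with `forall a.` in control0's argument it would not.
abortToPrompt :: a -> IO b
abortToPrompt x = control0 $ \_k -> pure x

-- e.g. prompt (abortToPrompt 42 >>= \n -> pure (n + 1)) returns 42 and
-- skips the (+ 1) frame. Nothing in the types forces this `a` to match the
-- enclosing prompt's answer type; that matching is only dynamic.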


If I unsafePerformMyPrimMonad my prompt# and control0# calls separately, and there is a prompt# stack frame on the stack, does it work?

If you do that, the behavior is undefined. This is the relevant sentence from the proposal:

the behavior of prompt# and control0# is only well-defined in strict State# threads (where each call to control0# has a corresponding call to prompt# in the same state thread)

Here, “the same state thread” means “a state thread originating from the same call to runRW#.” Another sentence a little later clarifies the reason for this restriction:

The presence of a thunk update frame in the captured continuation implies the nearest prompt# frame was pushed by a different state thread than the one that called control0#, so the captured continuation would be unpredictable.

What exactly does that mean? Well, consider that control0# captures the current evaluation context, which in the implementation is represented by a series of stack frames. In general, Haskell’s evaluation order is not well-defined, so it is difficult to predict what the evaluation context will be when an arbitrary expression is evaluated. Suppose, for example, a program were to do this:

prompt $ do
  let x = unsafePerformIO $ control0 f
  g x

When will the control0 f expression be evaluated? It’s hard to know from this program alone. If g is strict, x will likely be evaluated eagerly, and the captured continuation will be something like \x -> g x. But if g is lazy, the evaluation will be deferred, and it may be difficult to predict when x will be evaluated. It’s possible the prompt# frame won’t even be on the stack anymore once that happens!

Another piece of complexity to this scenario is that x is not a function, but a thunk. Normally, when a thunk is evaluated, it is overwritten by its result. But what is the result of evaluating something like control0 f? The situation is sort of similar to this one:

let y = 1 + error "bang"

y never reduces to a value; it raises an exception, which unwinds the stack! But the exception is synchronous, so we can be confident it will always raise the exception, and the thunk can be overwritten with a closure that always raises ErrorCall "bang".

We can’t do that with control0, because the “result” of x depends on what value is passed to the continuation. But the continuation can be called multiple times with different values, and evaluating x in different contexts would yield different continuations! There is no way to assign any semantics here that makes even a little sense; the problems just compound further the closer you look.

So this proposal doesn’t attempt to make any guarantees about this situation. In fact, in the current implementation, doing this will leave a dangling blackhole, which means if another thread is blocking on the blackhole and the continuation is never invoked, the thread will be blocked forever. What’s more, if you invoke the continuation multiple times, you’ll end up trying to update the same thunk more than once, and I don’t even know what would happen in that case. Conclusion: don’t do this.

@lexi-lambda
Contributor Author

lexi-lambda commented Feb 23, 2020

Some of this is a bit over my head, but I'm wondering if it could be used to improve the performance of the default thread scheduler of monad-par.

This is an interesting question. It’s challenging to say for certain. Something to consider about this proposal is that it’s really designed to optimize for situations where continuation capture is infrequent, which is usually true for effect systems. As the Motivation section stresses, the big win gained by using native delimited continuations is that programs that don’t need to use continuations don’t have to pay for them.

But monad-par does need to use continuations, and in that context, it isn’t necessarily clear that native continuations will be faster than CPS. CPS effectively amortizes the cost of continuation capture over the entire computation through a series of small closure allocations. In contrast, control0# pays for the continuation capture “just in time,” all in one go, with a single big heap allocation that copies the entire continuation into the heap.

An advantage of the CPS approach is sharing. The CPS approach is basically building a heap-allocated tree to represent the continuation. If a continuation is restored, then quickly captured again, the tree doesn’t have to be rebuilt on the second capture, since the closures have already been allocated. In comparison, restoring a control0# continuation copies the entire stack chunk back onto the current stack, so a subsequent continuation capture will have to re-copy the stack frames into the heap. CPS continuations are a persistent (Church-encoded) data structure, but control0# continuations are not.
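To make the comparison concrete, here is a minimal sketch of the CPS representation being described (my own illustration, not monad-par’s actual scheduler code): the continuation is a persistent tree of heap-allocated closures built incrementally by (>>=), so “capturing” it is just handing out a reference that already exists.

newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f (Cont m) = Cont $ \k -> m (k . f)

instance Applicative (Cont r) where
  pure a = Cont ($ a)
  Cont mf <*> Cont ma = Cont $ \k -> mf (\f -> ma (k . f))

instance Monad (Cont r) where
  -- Each (>>=) allocates one small closure; the cost of "capture" is
  -- amortized across these allocations.
  Cont m >>= f = Cont $ \k -> m (\a -> runCont (f a) k)

-- Capturing the current continuation is free: k is already a heap object,
-- and reusing it after a resume requires no re-copying.
callCC :: ((a -> Cont r b) -> Cont r a) -> Cont r a
callCC f = Cont $ \k -> runCont (f (\a -> Cont (\_ -> k a))) k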

There are several ways to ameliorate this cost:

  1. It would be possible to make control0# copy the continuation in chunks and lazily restore the stack on continuation application, so sharing could be recovered.

    I was originally going to do this in my implementation, but it turns out to be a little tricky due to the way return values are returned on the stack. If you want to split a stack into two chunks and lazily restore the bottom chunk, you need to ensure all the return values currently on the stack get copied into the right place when they’re returned to the bottom chunk. This is possible, but it seemed like more trouble than it was worth unless I had real-world benchmarks to prove its value.

  2. You can enable certain optimizations by providing one-shot continuations that cannot be restored multiple times, which monad-par would be able to use.

    The tricky thing about providing one-shot delimited continuations is that you don’t necessarily get much out of the restriction unless you frontload more of the cost during calls to prompt#. Doing that would completely negate the advantages of prompt#/control0# for effect systems, since the whole point of this proposal is that prompt# is essentially free. So something like monad-par probably wants a related but distinct set of operators with different performance tradeoffs.

This is unfortunately one of the tricky things about higher-order control. Different usage patterns benefit from radically different implementations, even if the user-facing interface is largely the same. This proposal tries to strike a compromise between expressiveness, simplicity, and performance: performance ought to be “good enough” for most use cases, without overfitting to a particular problem, while still providing a small set of operators that can express everything you might reasonably want to express.

@tomjaguarpaw
Contributor

tomjaguarpaw commented Feb 23, 2020

@Ericson2314: I wondered that too. I guess control0# is more general without a higher-rank argument. @lexi-lambda note that it is not type safe:

control0# is not type safe; it is the responsibility of the caller to ensure that the a type argument to control0# is representationally equivalent to the a type argument of the nearest enclosing prompt# call

In particular:

-- EDIT: This is not unsafeCoerce
unsafeCoerce :: IO a -> IO b
unsafeCoerce a = prompt (undefined (control0 (\k -> k a)))

EDIT: Actually my example isn't unsafeCoerce at all, because it relies on running undefined. In retrospect I think that @lexi-lambda meant that the types do not enforce prompt/control0 to be used in a matching way, rather than that they can be used to implement unsafeCoerce.

@tomjaguarpaw
Contributor

I'm very much in favour of this proposal. It took me a while to understand what prompt (put a tag on the stack) and control0 (slice off the part of the stack from the prompt to the top) were for. Now that I understand, I can see that these primops are useful to have. I'm adding the graphical representation of the operational semantics that I worked through in case that helps anyone else.

If, during our evaluation, we are about to evaluate an expression of type a, then on the top of the stack there must be a pointer to a function that expects to be called with an argument of type a. For want of better notation, I represent this type of thing by a->.

Suppose then we are evaluating a prompt expression with type a whose body contains a control0. The stack and evaluation context must look as follows.

...|a->|

prompt E[control0 e] :: a

prompt puts a marker onto the stack and proceeds to evaluate its body. Recall that the type of the body of the prompt is the same as the type of the prompt itself, in this case a.

...|a->|prompt|

E[control0 e] :: a

We evaluate E as normal until we hit control0. Call the type of control0 e b. That implies that during the evaluation of E we came across control0 in a place where we expected to start evaluating an expression of type b. Therefore there must be a b-> on top of the stack. Our evaluation expected to evaluate the expression and then jump to the next part of the evaluation of E, passing it a b.

...|a->|prompt|...produced by E...|b->|

control0 e :: b

Instead, control0 slices off the top of the stack all the way down to the prompt marker and puts it on the heap (in a value we'll refer to as k here). e can refer to k by means of a free variable of type b -> a. By the typing rules for prompt and control0, e[k] has type a. We proceed to evaluate e[k].

k = |...produced by E...|b->|

...|a->|

e[k]

During evaluation of e[k], k may be called with an argument of type b. When this happens, because k has type b -> a, there must be an a-> on top of our stack, that is, a pointer to the part of e that will be jumped to once k has produced its a.

k = |...produced by E...|b->|

...|a->|...current e stack...|a->|

k b

When k is called, the saved portion of stack is grafted on top of the current stack. The saved portion of stack was expecting to continue with a b so there is a b-> on top, exactly matching the type of the argument to k.

...|a->|...current e stack...|a->|...produced by E...|b->|

b

The evaluation that was originally intended to happen partway through the evaluation of E happens now. When it finishes, it returns a value of type a to the evaluation of e, which can do whatever it wants with it. The evaluation of e proceeds and may call k again as many times as it likes (or none). Eventually the evaluation of e should produce an a, which will be used as if it had been the result of the original prompt (E (control0 e)) call.
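A concrete example may help tie the diagrams together. It uses the same untagged, IO-level prompt/control0 wrappers as earlier in this thread (an assumed convenience layer, not the proposal's primops):

example :: IO Int
example = prompt $ do            -- a ~ Int: the prompt's answer type
  x <- control0 $ \k ->          -- b ~ Int, so k :: IO Int -> IO Int
         k (pure 1) >>= \n ->    -- graft the saved frames back on, feeding them 1
           pure (n + 10)         -- then continue in e's own context
  pure (x * 2)                   -- this is the E captured inside k

-- k (pure 1) runs the captured frame (\x -> pure (x * 2)) on 1, yielding 2,
-- which is returned to e; e then produces 12, and that becomes the result
-- of the original prompt, so example evaluates to 12.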

@cartazio

cartazio commented Feb 23, 2020

Hey, so I added a SPECIALIZE annotation to the mtl examples, and now they compete with or beat the reference/no-stack code. Which means mtl with SPECIALIZE pragmas also beats this example, at least in the presence of code specialization, which is one of the two main ways GHC winds up optimizing?

Here's the log-scale timings figure:
[Screenshot: log-scale benchmark timings, 2020-02-23]

Edit: the log scale is centered at 1, but as per usual, smaller numbers are faster/better.

@cartazio

(Note that this doesn't include all the benchmarks and timings, but I think Lexi's proposal gets algebraic effects.)

Hrmm, @lexi-lambda, what are the perf figures for your approach at n==1000...0? I'd love to have some clarity on the asymptotics, which we can't really infer here.

her figures

@cartazio

For these sorts of benchmark figures, log scale can make a huge difference in clarity. It's been a long-standing misfeature of criterion that it doesn't do log scale by default. That should be fixed at some point.

@cartazio

And if I'm reading those figures correctly, eff with delimited continuations is at least ~10x slower as n gets larger vs. mtl with SPECIALIZE or the reference program. At least in this microbenchmark with two points of measurement.

To be clear, it does seem to be orders of magnitude less slow than the other extant algebraic effect libs in this example, per your figure.

@cartazio

It would also be helpful to include the figures for the other comparisons. I can help you tweak the R scripts in Ollie's repo if you need it.

@cartazio

I'll see about running the full suite with Lexi's patches on my larger computer later today and generating the full measurement set.

@cartazio

And let me emphasize : I genuinely value and appreciate that a proposal patch set and example programs are part of the proposal. It makes it much easier to understand and evaluate for anyone who wants to help the ghc proposal process.

@lexi-lambda
Contributor Author

@cartazio The lack of INLINABLE or SPECIALIZE pragmas on the mtl example is 100% intentional. In fact, that’s one of the reasons all the examples include the polymorphic code defined in a separate module from where it is used.

Almost all existing benchmarks for effect systems are highly misleading. They are written in such a way that GHC either inlines or specializes everything, but that is extremely unlikely in real programs. When people use mtl as an effect system, they write entire programs against a polymorphic monad and only instantiate to a concrete monad transformer stack at the very top of their program.

In those situations, GHC will not specialize the code. The only practical way to get GHC to specialize it at that point is to write INLINABLE pragmas on everything (or use -fexpose-all-unfoldings and -fspecialise-aggressively), and in the tests I did on GitHub’s semantic library, forcing whole-program specialization used so much memory that I could not get compilation to terminate.

The benchmarks I’ve provided are intentionally structured to provide numbers that are more representative of effect system performance when used in real programs. In those situations, some dispatch is inevitable, and whole-program specialization is unrealistic. If you accept those constraints, the important thing to optimize for is dispatch cost, which eff does.
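As a hedged illustration of that usage pattern (my own minimal example, not the actual effect-zoo code): the polymorphic program lives in one module and the concrete stack is chosen in another, so without an INLINABLE or SPECIALIZE pragma GHC has no unfolding for the polymorphic code at the use site and must dispatch through the class dictionary.

{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State (MonadState, evalState, get, put)

-- "Module Program": written once, against a polymorphic monad, no pragmas.
countDown :: MonadState Int m => m Int
countDown = do
  n <- get
  if n <= 0 then pure n else put (n - 1) >> countDown

-- "Module Main": the concrete stack appears only here. Compiled separately,
-- GHC cannot specialize countDown to State Int across the module boundary,
-- so every get and put goes through the MonadState dictionary.
main :: IO ()
main = print (evalState countDown 1000000)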

@cartazio

Absolutely true points. Just trying to understand how they fare vs traditional techniques. I do personally think using mtl as an algebraic effects style library is problematic because of the general noncommutativity of monad transformers. Comparing libraries should ideally be in terms of respective norms and practices etc etc. (and there’s definitely room for more humane tool support for tactical/strategic specialization based optimization in ghc)

Your figures definitely communicate that the constant factors hands down favor your work over the alternative algebraic effect libs. Which benchmarks would help us understand the relative asymptotics?

I realize, of course, this is mostly me trying to understand the applications of your ghc proposal rather than its core.

One question I have, and maybe you can help me understand this best: what invariants or assumptions does this changeset put on the GHC RTS that other operations or machinery don’t already impose?

@lexi-lambda
Contributor Author

lexi-lambda commented Feb 24, 2020

Absolutely true points. Just trying to understand how they fare vs traditional techniques. I do personally think using mtl as an algebraic effects style library is problematic because of the general noncommutativity of monad transformers. Comparing libraries should ideally be in terms of respective norms and practices etc etc.

Yes, understood and generally agreed. I think, unfortunately, the ship has sailed on noncommutativity of effect handlers within Haskell effect systems, as both fused-effects and polysemy implement noncommutative handler behavior with the same semantics as mtl. eff is unique in this regard, as it implements commutative effect handlers, but the details of that are orthogonal to this proposal.

Your figures definitely communicate that the constant factors hands down favor your work over the alternative algebraic effect libs. Which benchmarks would help us understand the relative asymptotics?

It depends on what you mean by “asymptotics.” One benchmark that may answer your question is the “BigStack” benchmark included in the same benchmark suite, which tests the performance of programs with a large stack of effect handlers. Here are the results:

[Screenshot: BigStack benchmark results, 2020-02-24]

As you can see, approaches based on free monad-like constructions (such as freer-simple and polysemy) tend to get asymptotically worse the deeper the stack grows, while mtl, fused-effects, and eff have a constant dispatch cost. Zooming in a bit to focus on those libraries yields the following picture:

[Screenshot: BigStack benchmark results (zoomed in), 2020-02-24]

Here you can see that the constant factor of pushing a handler is a little higher for eff than mtl, but not by an enormous amount. (I’m not worried about that small increase here, since this is also a very synthetic microbenchmark, and really what matters is the asymptotics.)

More generally, I think we lack good, real-world benchmarks for effect systems. All of the benchmarks we currently have are synthetic microbenchmarks. The microbenchmarks are not useless (though as I stated above, you have to be very careful to make sure you’re measuring what you think you are), but they don’t tell the full story. It would be wonderful to have some larger, less synthetic benchmark programs, but building those is a lot of work, and I don’t think it’s something I can realistically do as part of this proposal.

One question I have, and maybe you can help me understand this best: what invariants or assumptions does this changeset put on the GHC RTS that other operations or machinery don’t already impose?

Good question. As mentioned in the Effects and Interactions section of this proposal, the added assumptions are actually very small because GHC already implements something similar for asynchronous interrupt handling. For more information about how that works, take a look at An Implementation of Resumable Black-Holes by Alastair Reid (which SPJ pointed me to when I first started looking into this).

@emilypi

emilypi commented Feb 24, 2020

Almost all existing benchmarks for effect systems are highly misleading. They are written in such a way that GHC either inlines or specializes everything, but that is extremely unlikely in real programs. When people use mtl as an effect system, they write entire programs against a polymorphic monad and only instantiate to a concrete monad transformer stack at the very top of their program.

FWIW, I and the company I work for, as well as many other companies whose code I've had the opportunity to look at, make use of concrete MTL stacks, or with SPECIALIZE pragmas for strategic parts of the program. In fact, this was blogged about recently to that effect, as part of an "Optimizing Haskell in Production" series of posts by Joachim Breitner here. I've seen some people make use of MTL as an effect system, but not often in my area of the hask-o-sphere, precisely because it doesn't win you much for the performance drawback. Classy Optics have been the preferred method of extending capabilities-like behavior.

That said, I think this proposal is pretty killer. It's not every day you see a really well-thought out GHC proposal come with an implementation! And regardless of the effects debate (which I assume will rage on long after), this proposal should go in because of the clear benefits to the way GHC handles continuations in general, not just because of the potential benefits to the folks building new effects systems.

I think, unfortunately, the ship has sailed on noncommutativity of effect handlers within Haskell effect systems, as both fused-effects and polysemy implement noncommutative handler behavior with the same semantics as mtl

This is an unfortunate side effect of the non-commutativity of Monads in general. Which, by all means, please solve this for us 😄

@lexi-lambda
Contributor Author

lexi-lambda commented Feb 24, 2020

FWIW, I and the company I work for, as well as many other companies whose code I've had the opportunity to look at, make use of concrete MTL stacks, or with SPECIALIZE pragmas for strategic parts of the program.

Yes, I should definitely make this clear: concrete transformer stacks (whether you’re using mtl to work with them or not) generally do better than eff, especially if continuations are not used at all. One downside to eff is that (like the effect libraries based on free-like constructions) it doesn’t really cooperate with the GHC specializer, so programming against concrete handlers doesn’t get you any performance benefit.

I think that is not a fundamental problem, and it may very well be possible to implement a GHC plugin to add an effect specialization pass to the simplifier. However, I’m not worried about doing that in the short term for the following reasons:

  • As I mentioned in the proposal, the specialized Reference implementation does unrealistically better than the effect system implementations because this microbenchmark is doing arithmetic on integers in a tight loop. Specialization allows both unboxing and the complete elimination of function call overhead.

    Real programs do get a performance advantage by eliminating the dispatch overhead through specialization, but it’s usually much less significant than this microbenchmark would imply. For many programs, the bottlenecks are elsewhere.

  • Although the constant factors are higher for eff, the asymptotics should generally not get any worse than for the equivalent specialized mtl programs. Since the constant factors of your effect system are usually not the bottleneck for your performance, most programs (but not necessarily all) will be okay with this.

  • I think the eff interface is dramatically better than the interface of all other effect systems available to Haskell programmers today, including mtl. For that reason, I think the “good enough” performance of eff is especially compelling for the majority of programs where the performance bottlenecks are not in effect dispatch.

@lexi-lambda
Contributor Author

To add to @tomjaguarpaw’s comment about understanding delimited continuations, I happened to have a conversation with someone on the FP slack about a week ago, which some people requested I turn into a gist. It discusses continuations from a Haskell perspective, and it also includes a little information on reduction semantics, which I think is a useful framework for understanding delimited continuations. The conversation is available here, and people who want to understand this proposal better might find it helpful.

Also, I want to address this comment directly:

Actually my example isn't unsafeCoerce at all, because it relies on running undefined.

You can absolutely write unsafeCoerce with prompt# and control0#, so yes, it is that unsafe. Here’s the simplest way to do it:

unsafeCoerce :: a -> b
unsafeCoerce a = case runRW# (prompt# (control0# $ \_ s -> (# s, a #))) of (# _, b #) -> b

To understand this using reduction semantics, recall the reduction rule given in the proposal:

prompt# E[control0# e] s1 ⟶ e (\m s2 -> E[m s2]) s1
    where E contains no intervening prompt#

Here we take E = □, the empty evaluation context, and e = \_ s -> (# s, a #). Substituting these into the reduction rule, we have:

prompt# (control0# (\_ s -> (# s, a #))) s1 ⟶ (\_ s -> (# s, a #)) (\m s2 -> m s2) s1

We can reduce further using the beta rule to get

(\_ s -> (# s, a #)) (\m s2 -> m s2) s1 ⟶ (# s1, a #)

which in the original program appears in a larger evaluation context:

case runRW# (\s1 -> □) of (# _, b #) -> b

The definition of runRW# is just runRW# f = f realWorld#, so runRW# (\s1 -> (# s1, a #)) reduces to (# realWorld#, a #), and we know that case (# realWorld#, a #) of (# _, b #) -> b trivially reduces to a. ∎

@Ericson2314
Contributor

Ericson2314 commented Feb 24, 2020

@emilypi took the words out of my mouth :). I do not care at all about effect systems and always use -fexpose-all-unfoldings, so a lack of inlining is not realistic for me. But that said, this is a really well-written proposal with clearly a lot of work behind it, and I think I value control operators for reasons other than effect systems, so this has my +1. Even if I didn't value control operators, I suppose this is fairly low impact, so I have no reason to mind letting this happen solely for the sake of those who do care about effect systems.


I do have some thoughts on the general issues this proposal touches. They're a bit tangential, but I suppose I might write them anyway, lest they influence others' interest in the proposal. I'll try to expand on them more fully someday in a better, separate venue.

The design space for higher-order control operators is enormous. While many approaches are equivalently expressive, different choices of primitives can influence implementation decisions and performance characteristics.

This is unfortunately one of the tricky things about higher-order control. Different usage patterns benefit from radically different implementations, even if the user-facing interface is largely the same.

I think this is because much of this research is "stabbing with axioms" rather than fully characterizing the semantics of what they care about. In Scheme, or other untyped languages where it's hard to build a safe and ergonomic overloading system, I can see why such primitives might be easier to use than the "virtualized" lambda-based control. But in Haskell those virtualized methods work great: safe and expressive. Cost is the sole problem.

If cost is the sole problem, then we can turn the problem around, looking not at the cost of control operators, but at the safe composition of cheap stack operations. In other words, we desire a calculus which exposes that data structure and has typing rules sufficient to encode the invariants. For example, for tagged prompts the tag can also be the capability, and then we don't need the real-world token or any strictness. Re-entrancy issues likewise beg for linearity.

What might this calculus look like? It turns out we have been dancing around similar things for a while. Witness https://www.microsoft.com/en-us/research/wp-content/uploads/2016/04/sequent-calculus-icfp16.pdf, which became join points, or the discussion of polarity in https://ix.cs.uoregon.edu/~pdownen/publications/eta.pdf. The still earlier http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.631.1583&rep=rep1&type=pdf also relates---it is presented as a variation of A-normal form, but A-normal form is in fact very close to (exactly?) equivalent to the image of the call-by-value to call-by-push-value translation, and call-by-push-value is also intimately related to this larger line of polarity research, as explained in the ICFP '19 paper.

In each case above, the change has yet to happen or the change to Core was too great. In each case in isolation, I understand, but collectively I am a bit apprehensive. What can we do to make exploring these radical alternatives more feasible? The benefits I dream of down this road are huge. Beyond explicit stack manipulations, there's replacing STG and/or C-- with languages that preserve Haskell types (today you might say we preserve kinds), and "typed inline assembly" mixing code between adjacent intermediate representations while upholding the invariants each requires. Ultimately, I'd like to see the RTS rewritten in one of the "safe" low-level languages this research yields, so we can have fully verified compilation and linking.

So, back to this proposal: what I really hope is that new RTS primitives like this one will "tip the scales" so GHC researchers go back and reexamine those paths not yet taken.

@simonpj
Contributor

simonpj commented Mar 4, 2020

I always find delimited continuations hard to grok. My role in the "Monadic framework for delimited continuations" paper, which you kindly cite, was very much the guinea pig saying "why does it have to be like that". So as yet I only grasp a fragment of what this proposal is about.

The Motivation section of the proposal is great -- but it appears to be 100% disconnected from the Proposed Changes. One thing that might help would be a side-by-side example: say, here it is in MTL, and here is the same thing using the new primops. Minimum of abstractions, focus on how fewer closures are built. Showing how the new stuff addresses the difficulties shown in Motivation.

My main anxiety is about typing. Usually well typed programs can't go wrong. But now they can. Yikes. Is that really really really necessary? The above paper has prompts as first class values, and (SubCont a b) values to represent stack chunks. Can't you do something like that? I feel we'd need a very convincing story to explain why it's utterly impossible to make a statically typed version.

@arybczak

arybczak commented Sep 3, 2020

What is the status of this proposal? It looks like (some) agreements were reached and then the activity died down ;)

@nomeata
Contributor

nomeata commented Sep 3, 2020

If that is the case, then @lexi-lambda could officially submit it to the committee

@lexi-lambda
Contributor Author

lexi-lambda commented Sep 12, 2020

Yes, I apologize for dropping this—alas, I’ve been mostly just working on this in my spare time, and my schedule and energy are rather wildly variable. Nevertheless, I agree that it seems like it’s in a good spot. I did want to respond to some of @aspiwack’s comments, but I never got to it; let me do that now.

So we could imagine that if we had some sort of very efficient one-shot continuation, it could be given linearly to the user. But a moveContinuation# :: Cont a b #-> Unrestricted (Cont a b) function could create a longer-lasting, reusable continuation.

Yes, I agree with this. Unfortunately, I do not see any obvious wins to be gained from one-shot continuations here, especially given that the proposal now includes prompts. One-shot undelimited continuations most definitely allow significant shortcuts, but I don’t know of anything similar for delimited ones, though I am not an expert on continuation implementation strategies and could very well be unaware of some known trick.

I think that, if we were to introduce one-shot continuations, an API like you describe would absolutely make sense, and indeed, it would be a significant benefit! But as it stands, I don’t think they quite fit into this proposal, so for now I think the current API is for the best.


With that out of the way, I think I will take @nomeata up on his suggestion and request that this proposal be reviewed by the committee. 🙂

@nomeata nomeata added the Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendation label Sep 12, 2020
@nomeata
Contributor

nomeata commented Sep 12, 2020

/remind @simonmar to check progress in 2 weeks

@reminders-prs

reminders-prs bot commented Sep 12, 2020

@nomeata set a reminder for Sep 26th 2020

@simonpj
Contributor

simonpj commented Sep 15, 2020

And in spite of the above changes, the primops are still unsafe to use in IO due to the resource re-entrancy problem.

Alexis, I couldn't see a spot in the proposal where you discuss the resource re-entrancy problem. (I searched for "resource" and "re-entrancy".) Do you in fact discuss what safety means? Could you.

To be clear, provided the implementation is not ghastly (I actually expect it shares quite a bit with exception handling) I'm in warm support of this proposal.

@aspiwack
Contributor

I think that, if we were to introduce one-shot continuations, an API like you describe would absolutely make sense, and indeed, it would be a significant benefit! But as it stands, I don’t think they quite fit into this proposal, so for now I think the current API is for the best.

To be clear: I didn't mean any of this a suggestion.

@lexi-lambda
Contributor Author

Alexis, I couldn't see a spot in the proposal where you discuss the resource re-entrancy problem. (I searched for "resource" and "re-entrancy".) Do you in fact discuss what safety means? Could you.

@simonpj, the re-entrancy problem is discussed in the Effects and Interactions section, though it uses the phrase “re-entrant” rather than “re-entrancy,” which is why your search failed! The discussion of that problem makes up the bulk of the section, but I’ll also reproduce it here for clarity:

A more important consideration is how control0# might interact with user programs. As alluded to above, control0# is fundamentally dangerous when used with arbitrary IO code, as existing programs do not account for the possibility of re-entrant IO computations. For example, consider code such as the following:

m = withFile path ReadMode $ \h -> hGetLine h >> control0 tag f >> hGetContents h

Internally, withFile uses bracket to manage the lifetime of a Handle, but it isn’t clear what the behavior ought to be if a captured continuation includes a withFile frame. Several possibilities come to mind:

  • The most lax approach is to do nothing when the continuation is captured and restored, but this may cause resource leaks or nonsensical behavior if a continuation is captured but never restored or restored multiple times.

  • Continuation capture unwinds the stack, so arguably the Handle should be closed upon the call to control0. This is reasonable, but it precludes legitimate use cases where a continuation may be captured and re-applied shortly thereafter.

  • A more satisfying approach would be to make bracket behave like Scheme’s dynamic-wind, where the “allocate” and “destroy” actions are executed upon every jump in or out of the delimited context. This might work better for some resources, like a mutex, but it doesn’t work for withFile, since any state of the Handle will be destroyed. (And even if it could somehow be recreated, the captured continuation already closes over the old Handle.)

Existing code just isn’t equipped to deal with continuations. Furthermore, adequately handling these situations in general is not a solved problem. Therefore, this proposal punts the problem to library authors, who may experiment with different solutions outside of GHC proper.
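As a minimal sketch of the first option's failure mode (my own example, written against the hypothetical untagged prompt/control0 wrappers used earlier in the thread rather than the tagged primops):

import System.IO

-- prompt   :: IO a -> IO a                        (assumed wrapper)
-- control0 :: ((IO b -> IO a) -> IO a) -> IO b    (assumed wrapper)

firstLineOnly :: FilePath -> IO String
firstLineOnly path =
  prompt $ withFile path ReadMode $ \h -> do
    l <- hGetLine h
    -- Unwind past withFile's release frame without running it. Under the
    -- "do nothing" approach, if the captured continuation is never restored,
    -- hClose is never run and the Handle leaks; if it is restored twice,
    -- the release action runs twice.
    control0 $ \_k -> pure l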

@simonpj
Contributor

simonpj commented Sep 16, 2020

Ah yes. Suggestion:

  • Make this text into a sub-section (with a sub-section heading) of Proposed Change Spec

  • Ditto the asynch exceptions subsection of Interactions

  • In this text "There is no safe way to use these operations in arbitrary IO code, for reasons given below. " don't say "below"; instead say "in Section xxx".

You could say that IO and asynch exceptions properly belong with "interactions" but to me both are really part of understanding the specification.


Is it possible to say what makes the use of these primitives safe wrt the above things? E.g., if you use the continuation exactly once, things are ok? Or what?

@reminders-prs reminders-prs bot removed the reminder label Sep 26, 2020
@reminders-prs

reminders-prs bot commented Sep 26, 2020

👋 @simonmar, check progress

@cstrahan

Is this awaiting @simonmar's action, or is this pending @lexi-lambda's response to @simonpj (or something else)?


I fear unintentionally spamming this list, but I find that the next step isn't obvious to me, so I figured there might be some value for all involved parties in raising the question.

While I'm here: I'm very excited about this work! 🙂

@simonpj
Contributor

simonpj commented Nov 13, 2020

Not spam at all -- a useful nudge

@simonmar

Question about this from the proposal:

it is illegal to capture a continuation that includes unsafePerformIO or unsafeInterleaveIO as a continuation frame.

How would that be enforced? There's no way that I'm aware of to detect an unsafePerformIO on the stack. In some cases there would be an update frame on the stack, but not necessarily.

@evincarofautumn

it is illegal to capture a continuation that includes unsafePerformIO or unsafeInterleaveIO as a continuation frame.

How would that be enforced? There's no way that I'm aware of to detect an unsafePerformIO on the stack. In some cases there would be an update frame on the stack, but not necessarily.

Currently the implementation just checks for an update frame, right? My understanding is that this catches some uses of unsafePerformIO, but mainly attempts to use control0# in pure code, which isn’t well defined. Is there another way to verify that the matching prompt# actually refers to the same thread? Could prompt# save something so that control0# / captureContinuationAndAbort could assert this?

Alternatively, if any of these conditions are too difficult to detect, what about just leaving them declared undefined? The implementation could make its best effort to detect errors for debugging purposes, but these are low-level primitives after all and I don’t think it’s unreasonable not to mandate detecting every violation.

@simonpj
Contributor

simonpj commented Dec 9, 2020

Back in July I asked Daan Leijen, who knows about this stuff, about this proposal. He had been in direct contact with Alexis, which is great. He said:

I believe the current proposal is pretty good, just somewhat worried about

  1. Unsafe primitives
  2. No direct support for multi-prompt
  3. Can we have mutable state associated with a Marker (= PromptTag)? That turns out to be important for many uses but it is not entirely clear how to do that nicely.
  4. A final concern is the use of IO in the proposed primitives, can we use some other Ctl monad, together with a runCtl :: Ctl a -> a ? Why is IO needed as such?

I think that the current version of the proposal fixes both (1) and (2) by having explicit prompts and statically typed primitives. Alexis, is that right?

For (3) I guess the answer is "no -- maybe future work".

I'm not sure about (4)

@lexi-lambda, any comments?

I apologise for not transcribing this earlier. Not sure how that happened.

Anyway, it's another well-informed thumbs-up

@nomeata nomeata added Pending committee review The committee needs to evaluate the proposal and make a decision and removed Pending shepherd recommendation The shepherd needs to evaluate the proposal and make a recommendation labels Jan 13, 2021
@nomeata nomeata merged commit 33a534a into ghc-proposals:master Jan 18, 2021
@nomeata nomeata added Accepted The committee has decided to accept the proposal and removed Pending committee review The committee needs to evaluate the proposal and make a decision labels Jan 18, 2021
@hasufell hasufell mentioned this pull request Jul 16, 2021
@arrowd

arrowd commented Nov 16, 2021

@lexi-lambda Hey Alexis. I watched your relatively recent stream on Twitch regarding problems with scoped effects. I believe this should be written down somewhere.

Right now the proposal is accepted and is seemingly in the "waiting to be implemented" state. However, the problems you raised on the stream mean that it isn't quite so. Maybe if you documented these problems in this proposal, it'd spark some new discussion or at least help other people working in the same field in the future?

@lexi-lambda
Contributor Author

Yes, I should probably clarify my perspective on the current state of things, given it’s now been almost a year since this proposal was accepted, but I haven’t really publicly said much about it since then. (Oops.) Here are the salient points:

  • I believe everything in the proposal is still entirely reasonable, and I think there is no obstacle to finishing the implementation besides doing the programming.

  • I still strongly believe in all of the advantages of implementing an effect system using the primops I proposed, and it’s quite possible they have other useful applications as well.

  • In spite of the above two points, my feelings have grown more mixed about effect systems generally over the past year, and that has left me feeling uncertain about the right way to proceed. As @arrowd mentioned above, I discussed my thoughts at some length recently during a live stream, and for those interested, you can watch the recording of that conversation here.

I’ve been wanting to find the energy to properly articulate my thoughts on that last point in writing for some time now, but it’s an extremely thorny and subtle set of issues, so I have unfortunately struggled to do so. But the relevant summary is that I think the bar for what constitutes an acceptable effect system is unusually high, since it’s such a foundational technology that requires pretty significant buy-in to use in a project, and I am completely unsatisfied with the way all existing approaches I am aware of handle scoping operations, such as catch and listen. I used to believe that the approach I was taking in eff would make it possible to eventually develop a more palatable way of handling those types of operations, but I realized about a year ago that there is a somewhat fundamental contention that makes my idea unworkable, which leaves eff in a spot that I personally find unsatisfactory.

But all of that is entirely about eff, and this proposal is neither about nor directly coupled to eff. Nothing about eff as it currently exists is actually broken by any of what I just described, so all of the performance advantages afforded by this proposal are as legitimate as they’ve always been. Any effect system implementation—eff or otherwise—that provides access to operations that require manipulating the continuation could be accelerated using the primops from this proposal. The problems I have with eff are expressivity concerns that are completely orthogonal to whether or not it uses delimited continuations.

So, with that in mind, I don’t think I feel any need to retract this proposal, nor do I feel like implementing it would necessarily be a bad idea. But my personal motivation for working on it was, obviously, eff and effect systems, and my disillusionment and uncertainty about how to proceed in that direction has, perhaps understandably, left me significantly less motivated to personally push this the rest of the way to the finish line. However, if anyone else feels like they still value existing effect systems in spite of their flaws and simply wish they were faster (or has some other use case for delimited continuations altogether), then I’d be happy to help explain what is left to be done to finish the implementation so that someone else can pick things up where I left off.

@miclill

miclill commented Oct 12, 2022

@lexi-lambda As far as I understand this proposal has now been implemented and merged?
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/7942

@nomeata nomeata added the Implemented The proposal has been implemented and has hit GHC master label Oct 12, 2022