Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meaning-preserving parsing rules for SCC annotations #176

Merged
merged 5 commits into from Apr 17, 2019

Conversation

int-index
Copy link
Contributor

@int-index int-index commented Oct 10, 2018

The proposal has been accepted; the following discussion is mostly of historic interest.


In today's Haskell adding an SCC annotation can change the semantics of an expression:

ghci> 1 / 2 / 2
0.25
ghci> 1 / {-# SCC ann #-} 2 / 2
1.0

This is a side-effect of the parsing rules for SCC annotations. We propose to change the parsing rules in a way that guarantees that adding an annotation does not affect the meaning of a program.

Rendered

@int-index int-index force-pushed the scc-parsing branch 3 times, most recently from 413d6e1 to af442f6 Compare October 10, 2018 17:45
@nomeata
Copy link
Contributor

nomeata commented Oct 10, 2018

Well argued!

@kirelagin
Copy link

kirelagin commented Oct 11, 2018

I am all for this change. The primary reason is that the syntax of annotations, I assume, intentionally mimics the syntax of comments by reusing {- and -} so it is very natural to expect that inserting an annotation will not affect the meaning of the expression. I actually learned that it was not the case from this proposal, and I have to say it was truly 💔.

Regarding assigning the meaning to an annotation, i.e. what its “scope” should be, I think the simplest to explain to someone behaviour would be to parse an annotation as an atom that is being “applied” to an expression by the means of function application, following all of its rules. I feel that this treatment would be very straightforward and uniform. So:

1 / {-# ANN #-} f x y ^ g z / 2  ≡  1 / ({-# ANN #-} f) x y ^ g z / 2

Nevertheless, if you imagine a parse tree with function applications represented as binary operators and annotations as “unary” nodes, it is safe to “pull” an annotation node up as long as it is in the left subtree (the important part here is that it is not allowed to change the structure of the tree in any other way). If one pulls it all the way up, the resulting behaviour will be similar to the current one (extending to the right as much as possible) with the difference that “as much as possible” here respects the restriction that the meaning of the expression should not change. This way:

1 / {-# ANN #-} f x y ^ g z / 2  ≡  1 / ({-# ANN #-} (f x y ^ g z)) / 2

One other option is to allow “pulling up” only through function application nodes, although, to be honest, I think this would be somewhat arbitrary. So that:

1 / {-# ANN #-} f x y ^ g z / 2  ≡  1 / ({-# ANN #-} (f x y)) ^ g z) / 2

My personal favourite is the first option, because, even though it will require sprinkling parens here and there, it is the most straightforward one and it allows me to quickly see what subexpression an annotation applies to without thinking too much about operator fixities and precedences.

@int-index
Copy link
Contributor Author

int-index commented Oct 11, 2018

The primary reason is that the syntax of annotations, I assume, intentionally mimics the syntax of comments by reusing {- and -}

@kirelagin That's a good observation, I added it to the proposal.

an annotation as an atom that is being “applied” to an expression by the means of function application, following all of its rules

If we follow all of its rules, it means that

f x               a b  ==  (f x) a b
f {-# SCC ann #-} a b  ==  (f {-# SCC ann #-}) a b

and that clearly won't fly.

Another complication is that if we treat the annotation as an atomic expression, unary (-) would turn into binary (-) when annotated:

x               -a  ==  (-) x a
{-# SCC ann #-} -a  ==  (-) {-# SCC ann #-} a

And since function application is left-associative, nested annotations would be applied to each other, but we'd rather have them right-associative and both apply to the subexpression:

{-# SCC ann #-} {-# GENERATED ... #-} a
  ==  ({-# SCC ann #-} {-# GENERATED ... #-}) a   -- left-associative
  ==  {-# SCC ann #-} ({-# GENERATED ... #-} a)   -- right-associative

it is safe to “pull” an annotation node up as long as it is in the left subtree

There's a technical complication here if we want to pull through operators, as we know what tree is the "left" subtree only after renaming, as it depends on operator fixity. Consider the following example:

x1 +++ {-# SCC ann } x2 +++ x3
  == x1 +++ ({-# SCC ann } x2) +++ x3   -- infixl +++
  == x1 +++ ({-# SCC ann } x2 +++ x3)   -- infixr +++

From the implementation standpoint, it would be preferable to resolve what annotations apply to in the parser, before fixity information is available.

@AntC2
Copy link
Contributor

AntC2 commented Oct 12, 2018

For those of us who didn't know what SCC stood for before coming across this proposal ...

Yes @kirelagin makes a good point: pragmas look like comments, and we expect they shouldn't have an effect on syntax. But ...

GHC is moving away from global/per-module LANGUAGE settings and more to fine-tuned/surgical pragmas whose scope is both smaller than a module and wider than a module.

For example, per-instance {-# OVERLAPPING, OVERLAPPABLE, OVERLAPS, INCOHERENT #-} apply to how this instance interacts with other instances; the setting gets exported with the instance, and applies even in a module without any such pragmas.

There's proposals in the pipeline for per-class pragmas.

So my intuition with other pragmas is you put them just inside the start of a syntactic construct, and they scope over the whole body (up to the closing ; actual or implied by layout):

class {-# DECIDABLE #-} Num a => C a b c  | b -> c  where ... ;
instance {-# OVERLAPPABLE #-} D b c => C Int b c where ... ;

If you put a pragma anywhere else in those decls, it's rejected/ignored. And the pragma is not binding tight as-if applying a function. It skips over the context and means the whole C Int b c is OVERLAPPABLE.

Following that intuition, it seems to me the first here should be rejected, in favour of one of the others

x = 1 / {-# SCC ann #-} 2 / 2         -- scope ambiguous
x = {-# SCC ann #-} 1 / 2 / 2         -- scope everything up to the invisible ';'
x = 1 / ( {-# SCC ann #-} 2 / 2)      -- scope everything inside ( ... )
x = 1 / ( {-# SCC ann #-} 2) / 2      -- scope everything inside ( ... )

I think that trying to give pragmas associativity or fixity is just too hard (for human readers, even if @int-index can persuade yacc to turn somersaults).

So please let's not restrict this proposal narrowly to SCC pragmas. Let's state a general rule that applies for all pragmas, then see how much SCCs already comply.

@AntC2
Copy link
Contributor

AntC2 commented Oct 12, 2018

heh heh. GHC seems to delight in being inconsistent with pragmas.

  • LANGUAGE comes just inside the start of the module, and scopes over the whole module [Good].

  • UNPACK comes just inside/after a data constructor, and scopes over the whole constructor [Good].

  • INLINE, SPECIALISE, COMPLETE, MINIMAL come after the decl. OTOH they explicitly state inside the {-# ... #-} what they apply to. So in general they don't scope over the whole of a decl, just (possibly disjoint) parts of it.

  • RULES can appear anywhere. But that makes sense: they apply to arbitrary combinations of functions/methods (again which they state inside the {-# ... #-}); there's no one in particular to attach them to or scope them over.

@int-index
Copy link
Contributor Author

Following that intuition, it seems to me the first here should be rejected, in favour of one of the others

x = 1 / {-# SCC ann #-} 2 / 2         -- scope ambiguous
x = {-# SCC ann #-} 1 / 2 / 2         -- scope everything up to the invisible ';'
x = 1 / ( {-# SCC ann #-} 2 / 2)      -- scope everything inside ( ... )
x = 1 / ( {-# SCC ann #-} 2) / 2      -- scope everything inside ( ... )

@AntC2 I agree. To be clear, that is what is being proposed currently. My plan is to move the productions for parsing these annotations into the exp non-terminal, and it will have this exact effect.

@AntC2
Copy link
Contributor

AntC2 commented Oct 13, 2018

@int-index: that is what is being proposed currently.

Good. I'm not sure that's clear from the page. "Disallow ... annotations in positions where they may affect the structure of an expression." seems to raise questions like: what structure? "affect" how?

I don't think there's much need to talk about "structure", "precedence", "placement". And

My plan is to move the productions ... non-terminal ...

Seems to be talking to yacc rather than yakking to humans. Can we say:

  1. The SCC annotation must appear immediately at the start of an expression; its scope is the whole expression.
  2. Specifically if the expression or sub-expression is in parens, the SCC must appear immediately after the (; its scope is up to the closing ).
  3. An SCC in any other position would be ambiguous in scope; and is to be rejected.

@simonmar
Copy link

I like this proposal, provided we can explain clearly, succinctly, and unambiguously to users where they can put an SCC pragma. @int-index can you propose some text for the user guide?

It would also be helpful to see exactly what changes to the grammar would be required.

@osa1
Copy link
Contributor

osa1 commented Oct 15, 2018

This is a breaking change which should be mentioned at least in the implementation plan. I don't know what the policy for introducing breaking changes in annotations is, but I guess we'll have to first generate a warning and only after a few releases disallow annotations that change parsing of expressions.

One other thing that would also be helpful to define "structure preservation" more formally. I think it means that if I have a parse tree (e1 (e2) (e3)) these are structure-preserving annotations:

  • (SCC (e1 (e2) (e3)))
  • (e1 (SCC e2) (e3))
  • (e1 (e2) (SCC e3))

this is not a structure-preserving annotation:

  • (e1 (SCC (e2 e3)))

but I don't know how to define this formally so that for any addition of SCC we can check whether the structure is preserved. Maybe simply saying "replace SCC x with x and the parse trees should be the same" is enough.

Finally as @simonmar also asked, I'm wondering what changes in the grammar/parser will be required. Do we only make some changes in the application grammar/parser to disallow annotations before args or do we need more changes in other contexts?

@int-index
Copy link
Contributor Author

int-index commented Oct 15, 2018

I guess we'll have to first generate a warning and only after a few releases disallow annotations that change parsing of expressions.

@osa1 We don't have to do it – all programs that compile with the proposed change also compile without it, so the users can update their code and keep support for older compiler versions without -XCPP. This means we are in compliance with the 3 Release Policy that is generally used for breaking changes in GHC and core libraries.

One other thing that would also be helpful to define "structure preservation" more formally. <...> Maybe simply saying "replace SCC x with x and the parse trees should be the same" is enough.

The issue with this definition is that the parse trees are not the same. They differ in a specific way – x is wrapped in HsSCC (or HsCoreAnn for {-# CORE ... #-}, HsTickPragma for {-# GENERATED #-}).

When I tried to put it into words, I ended up with the phrasing currently used in the proposal:

  • Adding an annotation does not affect the structure or meaning of an expression in ways other than adding an annotation to a subexpression

where "structure" refers to the syntactic structure (i.e. the parse tree), and "meaning" is a redundant clarification. I think it's formal enough.

It would also be helpful to see exactly what changes to the grammar would be required.

@simonmar We take these productions:

-        | hpc_annot exp        {% ams (sLL $1 $> $ HsTickPragma noExt (snd $ fst $ fst $ unLoc $1)                                                                                 
-                                                                (snd $ fst $ unLoc $1) (snd $ unLoc $1) $2)                                                                        
-                                      (fst $ fst $ fst $ unLoc $1) }
-
-        | '{-# CORE' STRING '#-}' exp  {% ams (sLL $1 $> $ HsCoreAnn noExt (getCORE_PRAGs $1) (getStringLiteral $2) $4)                                                            
-                                              [mo $1,mj AnnVal $2
-                                              ,mc $3] }
-                                          -- hdaume: core annotation
-        | scc_annot exp        {% ams (sLL $1 $> $ HsSCC noExt (snd $ fst $ unLoc $1) (snd $ unLoc $1) $2)                                                                         
-                                      (fst $ fst $ unLoc $1) }

and move them to the exp non-terminal.

To achieve good error messages, we also add hpc_annot, scc_annot, and core_annot to other non-terminals (infixexp comes to mind), but they will halt parsing and report an error.

provided we can explain clearly, succinctly, and unambiguously to users where they can put an SCC pragma

It's tempting to write the rules similar to @AntC2's interpretation in #176 (comment):

  • An SCC annotation may appear at the start of an expression and span to its end, or after an opening parenthesis and span to the matching closing parenthesis.

However, it does not cover

  1. operator sections, where the SCC pragma is disallowed because it's ambiguous whether it should be inside or outside SectionL
  2. top-level TemplateHaskell splices, because SCC works differently on top-level
  3. tuples, where the SCC pragma goes after an opening parenthesis but spans to the comma, not the matching closing parenthesis
  4. lists (situation similar to tuples, but with brackets instead of parentheses)
  5. arrow notation -< >- -<< >>- where no ambiguity arises: xxx -< {-# SCC ann #-} zzz

and that's probably incomplete.

I think the only true way to say what we want to say is to name two principles:

  1. An SCC annotation has the lowest precedence of all syntactic constructs.
  2. An SCC annotation may not appear in positions where its low precedence
    a) would change the parse tree in ways other than annotating a subexpression,
    b) could not be resolved during parsing without consulting the fixities of user-defined operators.

@simonmar
Copy link

It's tempting to write the rules similar to @AntC2's interpretation in #176 (comment):

* An `SCC` annotation may appear at the start of an expression and span to its end, or after an opening parenthesis and span to the matching closing parenthesis.

I think you meant "spans to its end", but that's not enough: e.g. this: f {-# SCC "x" #-} x y doesn't annotate x, it is disallowed.

Perhaps you want something like "An SCC annotation can occur anywhere that you could put id $ without changing the meaning of the program". Except that we want SCC to have lower precedence than $.

@int-index
Copy link
Contributor Author

int-index commented Oct 19, 2018

What about the following specification?

  • An SCC annotation has lower precedence than function application or any infix operator, and may not appear in positions where it would change the grouping of subexpressions.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Oct 19, 2018

All that reminds me of BlockArguments slightly. The "meta-rule" from the Haskell report on let lambda and if really is just another precedence rule.

@int-index your proposal in the previous comment should note that those three in fact do bind exactly as loosely as those annotations.

It should be noted somewhere that all of them are mutually right associative, even if that is implied by the meta rule (I'm not sure!).

@int-index
Copy link
Contributor Author

int-index commented Oct 19, 2018

@Ericson2314 We can probably allow SCC before a lambda or let, but I don't think we can allow it before do because the user may write x + {-# SCC ann #-} do { a; b } + z.

The rule I wrote takes this into account. Do you have an alternative rule that takes care of all corner cases?

@Ericson2314
Copy link
Contributor

So for reference the Haskell report says:

The grammar is ambiguous regarding the extent of lambda abstractions, let expressions, and conditionals. The ambiguity is resolved by the meta-rule that each of these constructs extends as far to the right as possible.

We can probably allow SCC before a lambda or let, but I don't think we can allow it before do...

Agreed, this relates to that meta rule not mentioning do, case, or others introducing layout syntax.

The rule I wrote takes this into account. Do you have an alternative rule that takes care of all corner cases?

I want to make clear that the following are improper parses:

\ x -> f {-# SCC ann #-} y ==> (\ x -> f) ({-# SCC ann #-} y)
{-# SCC ann #-} y \ x -> f ==> ({-# SCC ann #-} y) (\ x -> f)

I also view the "extends as far to the right as possible." language of the Haskell report as insufficiently formal. I like your phrasing better because I agree that "precedence" applies to production rules in general, not just infix operators. (I view preference as competition to be the root/parent parse tree node, the "looser" rule "wins".)

Given the precedence way of formalizing things, I think:

  1. "extends as far to the right as possible." means:
    • has right associativity
    • precedence as loose or looser than any other construction
  2. That interpretation and your rule together leave open the possibility that let, lambda, and if bind looser than {-# SCC ann #-} ....

I'm not quite, sure but it could be that that possibility would prefer the parse:

{-# SCC ann #-} y \ x -> f ==> ({-# SCC ann #-} y) (\ x -> f)

over

{-# SCC ann #-} y \ x -> f ==> {-# SCC ann #-} (y (\ x -> f))

because in the former the lambda node is closer to the root than the latter compared to the SCC node.


So given all that what I propose is

  1. We interpret the Haskell report rule as

    lambda abstractions, let expressions, and conditionals have as lower precedence than function application or any infix operator, and are right associative.

    (Maybe someday the report language would be changed, but I'm not proposing that escalation now :).)

  2. Your rule becomes

    SCC annotations have the same precedence as lambda abstractions, let expressions, and conditionals, and also are right associative. Additionally, none may not appear in a position where it would change the grouping of subexpressions.

@int-index
Copy link
Contributor Author

@Ericson2314 This sounds reasonable to me. In terms of grammar, it means that in addition to parsing SCC in exp we can also accept it before any production that already ends with exp, e.g.

  | '\\' apat apats '->' exp -- lambdas
  | 'let' binds 'in' exp     -- let-expressions

I will add your wording to the proposal.

@Ericson2314
Copy link
Contributor

Great, thanks!

@bravit bravit removed the Proposal label Dec 3, 2018
@int-index
Copy link
Contributor Author

@nomeata I'd like to submit this proposal to the committee.

@nomeata nomeata added the Pending committee review The committee needs to evaluate the proposal and make a decision label Feb 3, 2019
@nomeata
Copy link
Contributor

nomeata commented Apr 17, 2019

Sorry for the delay; this got accepted.

@nomeata nomeata added Accepted The committee has decided to accept the proposal and removed Pending committee review The committee needs to evaluate the proposal and make a decision labels Apr 17, 2019
@nomeata nomeata merged commit f298c9e into ghc-proposals:master Apr 17, 2019
nomeata added a commit that referenced this pull request Apr 17, 2019
@int-index int-index deleted the scc-parsing branch April 17, 2019 09:59
@int-index
Copy link
Contributor Author

There's a prototype implementation at https://gitlab.haskell.org/ghc/ghc/merge_requests/860 now.

@nomeata nomeata added the Implemented The proposal has been implemented and has hit GHC master label Jul 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted The committee has decided to accept the proposal Implemented The proposal has been implemented and has hit GHC master
Development

Successfully merging this pull request may close these issues.

None yet

8 participants