Meaning-preserving parsing rules for SCC annotations #176

int-index · 2018-10-10T17:11:22Z

The proposal has been accepted; the following discussion is mostly of historic interest.

In today's Haskell adding an SCC annotation can change the semantics of an expression:

ghci> 1 / 2 / 2
0.25
ghci> 1 / {-# SCC ann #-} 2 / 2
1.0

This is a side-effect of the parsing rules for SCC annotations. We propose to change the parsing rules in a way that guarantees that adding an annotation does not affect the meaning of a program.

Rendered

nomeata · 2018-10-10T20:47:30Z

Well argued!

kirelagin · 2018-10-11T13:36:36Z

I am all for this change. The primary reason is that the syntax of annotations, I assume, intentionally mimics the syntax of comments by reusing {- and -} so it is very natural to expect that inserting an annotation will not affect the meaning of the expression. I actually learned that it was not the case from this proposal, and I have to say it was truly 💔.

Regarding assigning the meaning to an annotation, i.e. what its “scope” should be, I think the simplest to explain to someone behaviour would be to parse an annotation as an atom that is being “applied” to an expression by the means of function application, following all of its rules. I feel that this treatment would be very straightforward and uniform. So:

1 / {-# ANN #-} f x y ^ g z / 2  ≡  1 / ({-# ANN #-} f) x y ^ g z / 2

Nevertheless, if you imagine a parse tree with function applications represented as binary operators and annotations as “unary” nodes, it is safe to “pull” an annotation node up as long as it is in the left subtree (the important part here is that it is not allowed to change the structure of the tree in any other way). If one pulls it all the way up, the resulting behaviour will be similar to the current one (extending to the right as much as possible) with the difference that “as much as possible” here respects the restriction that the meaning of the expression should not change. This way:

1 / {-# ANN #-} f x y ^ g z / 2  ≡  1 / ({-# ANN #-} (f x y ^ g z)) / 2

One other option is to allow “pulling up” only through function application nodes, although, to be honest, I think this would be somewhat arbitrary. So that:

1 / {-# ANN #-} f x y ^ g z / 2  ≡  1 / ({-# ANN #-} (f x y)) ^ g z) / 2

My personal favourite is the first option, because, even though it will require sprinkling parens here and there, it is the most straightforward one and it allows me to quickly see what subexpression an annotation applies to without thinking too much about operator fixities and precedences.

int-index · 2018-10-11T17:18:15Z

The primary reason is that the syntax of annotations, I assume, intentionally mimics the syntax of comments by reusing {- and -}

@kirelagin That's a good observation, I added it to the proposal.

an annotation as an atom that is being “applied” to an expression by the means of function application, following all of its rules

If we follow all of its rules, it means that

f x               a b  ==  (f x) a b
f {-# SCC ann #-} a b  ==  (f {-# SCC ann #-}) a b

and that clearly won't fly.

Another complication is that if we treat the annotation as an atomic expression, unary (-) would turn into binary (-) when annotated:

x               -a  ==  (-) x a
{-# SCC ann #-} -a  ==  (-) {-# SCC ann #-} a

And since function application is left-associative, nested annotations would be applied to each other, but we'd rather have them right-associative and both apply to the subexpression:

{-# SCC ann #-} {-# GENERATED ... #-} a
  ==  ({-# SCC ann #-} {-# GENERATED ... #-}) a   -- left-associative
  ==  {-# SCC ann #-} ({-# GENERATED ... #-} a)   -- right-associative

it is safe to “pull” an annotation node up as long as it is in the left subtree

There's a technical complication here if we want to pull through operators, as we know what tree is the "left" subtree only after renaming, as it depends on operator fixity. Consider the following example:

x1 +++ {-# SCC ann } x2 +++ x3
  == x1 +++ ({-# SCC ann } x2) +++ x3   -- infixl +++
  == x1 +++ ({-# SCC ann } x2 +++ x3)   -- infixr +++

From the implementation standpoint, it would be preferable to resolve what annotations apply to in the parser, before fixity information is available.

AntC2 · 2018-10-12T01:29:15Z

For those of us who didn't know what SCC stood for before coming across this proposal ...

Yes @kirelagin makes a good point: pragmas look like comments, and we expect they shouldn't have an effect on syntax. But ...

GHC is moving away from global/per-module LANGUAGE settings and more to fine-tuned/surgical pragmas whose scope is both smaller than a module and wider than a module.

For example, per-instance {-# OVERLAPPING, OVERLAPPABLE, OVERLAPS, INCOHERENT #-} apply to how this instance interacts with other instances; the setting gets exported with the instance, and applies even in a module without any such pragmas.

There's proposals in the pipeline for per-class pragmas.

So my intuition with other pragmas is you put them just inside the start of a syntactic construct, and they scope over the whole body (up to the closing ; actual or implied by layout):

class {-# DECIDABLE #-} Num a => C a b c  | b -> c  where ... ;
instance {-# OVERLAPPABLE #-} D b c => C Int b c where ... ;

If you put a pragma anywhere else in those decls, it's rejected/ignored. And the pragma is not binding tight as-if applying a function. It skips over the context and means the whole C Int b c is OVERLAPPABLE.

Following that intuition, it seems to me the first here should be rejected, in favour of one of the others

x = 1 / {-# SCC ann #-} 2 / 2         -- scope ambiguous
x = {-# SCC ann #-} 1 / 2 / 2         -- scope everything up to the invisible ';'
x = 1 / ( {-# SCC ann #-} 2 / 2)      -- scope everything inside ( ... )
x = 1 / ( {-# SCC ann #-} 2) / 2      -- scope everything inside ( ... )

I think that trying to give pragmas associativity or fixity is just too hard (for human readers, even if @int-index can persuade yacc to turn somersaults).

So please let's not restrict this proposal narrowly to SCC pragmas. Let's state a general rule that applies for all pragmas, then see how much SCCs already comply.

AntC2 · 2018-10-12T02:09:24Z

heh heh. GHC seems to delight in being inconsistent with pragmas.

LANGUAGE comes just inside the start of the module, and scopes over the whole module [Good].
UNPACK comes just inside/after a data constructor, and scopes over the whole constructor [Good].
INLINE, SPECIALISE, COMPLETE, MINIMAL come after the decl. OTOH they explicitly state inside the {-# ... #-} what they apply to. So in general they don't scope over the whole of a decl, just (possibly disjoint) parts of it.
RULES can appear anywhere. But that makes sense: they apply to arbitrary combinations of functions/methods (again which they state inside the {-# ... #-}); there's no one in particular to attach them to or scope them over.

int-index · 2018-10-12T06:27:50Z

Following that intuition, it seems to me the first here should be rejected, in favour of one of the others

x = 1 / {-# SCC ann #-} 2 / 2         -- scope ambiguous
x = {-# SCC ann #-} 1 / 2 / 2         -- scope everything up to the invisible ';'
x = 1 / ( {-# SCC ann #-} 2 / 2)      -- scope everything inside ( ... )
x = 1 / ( {-# SCC ann #-} 2) / 2      -- scope everything inside ( ... )

@AntC2 I agree. To be clear, that is what is being proposed currently. My plan is to move the productions for parsing these annotations into the exp non-terminal, and it will have this exact effect.

AntC2 · 2018-10-13T12:41:36Z

@int-index: that is what is being proposed currently.

Good. I'm not sure that's clear from the page. "Disallow ... annotations in positions where they may affect the structure of an expression." seems to raise questions like: what structure? "affect" how?

I don't think there's much need to talk about "structure", "precedence", "placement". And

My plan is to move the productions ... non-terminal ...

Seems to be talking to yacc rather than yakking to humans. Can we say:

The SCC annotation must appear immediately at the start of an expression; its scope is the whole expression.
Specifically if the expression or sub-expression is in parens, the SCC must appear immediately after the (; its scope is up to the closing ).
An SCC in any other position would be ambiguous in scope; and is to be rejected.

simonmar · 2018-10-15T07:11:38Z

I like this proposal, provided we can explain clearly, succinctly, and unambiguously to users where they can put an SCC pragma. @int-index can you propose some text for the user guide?

It would also be helpful to see exactly what changes to the grammar would be required.

osa1 · 2018-10-15T07:36:22Z

This is a breaking change which should be mentioned at least in the implementation plan. I don't know what the policy for introducing breaking changes in annotations is, but I guess we'll have to first generate a warning and only after a few releases disallow annotations that change parsing of expressions.

One other thing that would also be helpful to define "structure preservation" more formally. I think it means that if I have a parse tree (e1 (e2) (e3)) these are structure-preserving annotations:

(SCC (e1 (e2) (e3)))
(e1 (SCC e2) (e3))
(e1 (e2) (SCC e3))

this is not a structure-preserving annotation:

(e1 (SCC (e2 e3)))

but I don't know how to define this formally so that for any addition of SCC we can check whether the structure is preserved. Maybe simply saying "replace SCC x with x and the parse trees should be the same" is enough.

Finally as @simonmar also asked, I'm wondering what changes in the grammar/parser will be required. Do we only make some changes in the application grammar/parser to disallow annotations before args or do we need more changes in other contexts?

int-index · 2018-10-15T11:35:21Z

I guess we'll have to first generate a warning and only after a few releases disallow annotations that change parsing of expressions.

@osa1 We don't have to do it – all programs that compile with the proposed change also compile without it, so the users can update their code and keep support for older compiler versions without -XCPP. This means we are in compliance with the 3 Release Policy that is generally used for breaking changes in GHC and core libraries.

One other thing that would also be helpful to define "structure preservation" more formally. <...> Maybe simply saying "replace SCC x with x and the parse trees should be the same" is enough.

The issue with this definition is that the parse trees are not the same. They differ in a specific way – x is wrapped in HsSCC (or HsCoreAnn for {-# CORE ... #-}, HsTickPragma for {-# GENERATED #-}).

When I tried to put it into words, I ended up with the phrasing currently used in the proposal:

Adding an annotation does not affect the structure or meaning of an expression in ways other than adding an annotation to a subexpression

where "structure" refers to the syntactic structure (i.e. the parse tree), and "meaning" is a redundant clarification. I think it's formal enough.

It would also be helpful to see exactly what changes to the grammar would be required.

@simonmar We take these productions:

-        | hpc_annot exp        {% ams (sLL $1 $> $ HsTickPragma noExt (snd $ fst $ fst $ unLoc $1)                                                                                 
-                                                                (snd $ fst $ unLoc $1) (snd $ unLoc $1) $2)                                                                        
-                                      (fst $ fst $ fst $ unLoc $1) }
-
-        | '{-# CORE' STRING '#-}' exp  {% ams (sLL $1 $> $ HsCoreAnn noExt (getCORE_PRAGs $1) (getStringLiteral $2) $4)                                                            
-                                              [mo $1,mj AnnVal $2
-                                              ,mc $3] }
-                                          -- hdaume: core annotation
-        | scc_annot exp        {% ams (sLL $1 $> $ HsSCC noExt (snd $ fst $ unLoc $1) (snd $ unLoc $1) $2)                                                                         
-                                      (fst $ fst $ unLoc $1) }

and move them to the exp non-terminal.

To achieve good error messages, we also add hpc_annot, scc_annot, and core_annot to other non-terminals (infixexp comes to mind), but they will halt parsing and report an error.

provided we can explain clearly, succinctly, and unambiguously to users where they can put an SCC pragma

It's tempting to write the rules similar to @AntC2's interpretation in #176 (comment):

An SCC annotation may appear at the start of an expression and span to its end, or after an opening parenthesis and span to the matching closing parenthesis.

However, it does not cover

operator sections, where the SCC pragma is disallowed because it's ambiguous whether it should be inside or outside SectionL
top-level TemplateHaskell splices, because SCC works differently on top-level
tuples, where the SCC pragma goes after an opening parenthesis but spans to the comma, not the matching closing parenthesis
lists (situation similar to tuples, but with brackets instead of parentheses)
arrow notation -< >- -<< >>- where no ambiguity arises: xxx -< {-# SCC ann #-} zzz

and that's probably incomplete.

I think the only true way to say what we want to say is to name two principles:

An SCC annotation has the lowest precedence of all syntactic constructs.
An SCC annotation may not appear in positions where its low precedence
a) would change the parse tree in ways other than annotating a subexpression,
b) could not be resolved during parsing without consulting the fixities of user-defined operators.

simonmar · 2018-10-18T19:21:10Z

It's tempting to write the rules similar to @AntC2's interpretation in #176 (comment):
* An `SCC` annotation may appear at the start of an expression and span to its end, or after an opening parenthesis and span to the matching closing parenthesis.

I think you meant "spans to its end", but that's not enough: e.g. this: f {-# SCC "x" #-} x y doesn't annotate x, it is disallowed.

Perhaps you want something like "An SCC annotation can occur anywhere that you could put id $ without changing the meaning of the program". Except that we want SCC to have lower precedence than $.

int-index · 2018-10-19T11:39:40Z

What about the following specification?

An SCC annotation has lower precedence than function application or any infix operator, and may not appear in positions where it would change the grouping of subexpressions.

Ericson2314 · 2018-10-19T14:07:05Z

All that reminds me of BlockArguments slightly. The "meta-rule" from the Haskell report on let lambda and if really is just another precedence rule.

@int-index your proposal in the previous comment should note that those three in fact do bind exactly as loosely as those annotations.

It should be noted somewhere that all of them are mutually right associative, even if that is implied by the meta rule (I'm not sure!).

int-index · 2018-10-19T14:19:34Z

@Ericson2314 We can probably allow SCC before a lambda or let, but I don't think we can allow it before do because the user may write x + {-# SCC ann #-} do { a; b } + z.

The rule I wrote takes this into account. Do you have an alternative rule that takes care of all corner cases?

Ericson2314 · 2018-10-20T18:11:34Z

So for reference the Haskell report says:

The grammar is ambiguous regarding the extent of lambda abstractions, let expressions, and conditionals. The ambiguity is resolved by the meta-rule that each of these constructs extends as far to the right as possible.

We can probably allow SCC before a lambda or let, but I don't think we can allow it before do...

Agreed, this relates to that meta rule not mentioning do, case, or others introducing layout syntax.

The rule I wrote takes this into account. Do you have an alternative rule that takes care of all corner cases?

I want to make clear that the following are improper parses:

\ x -> f {-# SCC ann #-} y ==> (\ x -> f) ({-# SCC ann #-} y)
{-# SCC ann #-} y \ x -> f ==> ({-# SCC ann #-} y) (\ x -> f)

I also view the "extends as far to the right as possible." language of the Haskell report as insufficiently formal. I like your phrasing better because I agree that "precedence" applies to production rules in general, not just infix operators. (I view preference as competition to be the root/parent parse tree node, the "looser" rule "wins".)

Given the precedence way of formalizing things, I think:

"extends as far to the right as possible." means:
- has right associativity
- precedence as loose or looser than any other construction
That interpretation and your rule together leave open the possibility that let, lambda, and if bind looser than {-# SCC ann #-} ....

I'm not quite, sure but it could be that that possibility would prefer the parse:

{-# SCC ann #-} y \ x -> f ==> ({-# SCC ann #-} y) (\ x -> f)

over

{-# SCC ann #-} y \ x -> f ==> {-# SCC ann #-} (y (\ x -> f))

because in the former the lambda node is closer to the root than the latter compared to the SCC node.

So given all that what I propose is

We interpret the Haskell report rule as

lambda abstractions, let expressions, and conditionals have as lower precedence than function application or any infix operator, and are right associative.

(Maybe someday the report language would be changed, but I'm not proposing that escalation now :).)
Your rule becomes

SCC annotations have the same precedence as lambda abstractions, let expressions, and conditionals, and also are right associative. Additionally, none may not appear in a position where it would change the grouping of subexpressions.

int-index · 2018-10-20T21:20:59Z

@Ericson2314 This sounds reasonable to me. In terms of grammar, it means that in addition to parsing SCC in exp we can also accept it before any production that already ends with exp, e.g.

  | '\\' apat apats '->' exp -- lambdas
  | 'let' binds 'in' exp     -- let-expressions

I will add your wording to the proposal.

Ericson2314 · 2018-10-21T17:58:36Z

Great, thanks!

int-index · 2019-02-03T14:33:39Z

@nomeata I'd like to submit this proposal to the committee.

nomeata · 2019-04-17T07:06:07Z

Sorry for the delay; this got accepted.

int-index · 2019-05-06T19:33:09Z

There's a prototype implementation at https://gitlab.haskell.org/ghc/ghc/merge_requests/860 now.

Initial proposal

fdb463b

int-index force-pushed the scc-parsing branch 3 times, most recently from 413d6e1 to af442f6 Compare October 10, 2018 17:45

PR number, formatting

d297d42

int-index force-pushed the scc-parsing branch from af442f6 to d297d42 Compare October 10, 2018 21:14

Compare to {- comments -}

d63dcec

More precise specification

ed05127

Remove extra 'not'

f298c9e

bravit added the Proposal label Nov 11, 2018

bravit removed the Proposal label Dec 3, 2018

nomeata assigned simonmar Feb 3, 2019

nomeata added the Pending committee review The committee needs to evaluate the proposal and make a decision label Feb 3, 2019

nomeata added Accepted The committee has decided to accept the proposal and removed Pending committee review The committee needs to evaluate the proposal and make a decision labels Apr 17, 2019

nomeata merged commit f298c9e into ghc-proposals:master Apr 17, 2019

nomeata added a commit that referenced this pull request Apr 17, 2019

46 for #176

f7e96a5

int-index deleted the scc-parsing branch April 17, 2019 09:59

monoidal mentioned this pull request Jul 27, 2020

Negative literals and meaning-preserving SCC implemented #350

Merged

nomeata added the Implemented The proposal has been implemented and has hit GHC master label Jul 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Meaning-preserving parsing rules for SCC annotations #176

Meaning-preserving parsing rules for SCC annotations #176

int-index commented Oct 10, 2018 •

edited by nomeata

nomeata commented Oct 10, 2018

kirelagin commented Oct 11, 2018 •

edited

int-index commented Oct 11, 2018 •

edited

AntC2 commented Oct 12, 2018 •

edited

AntC2 commented Oct 12, 2018

int-index commented Oct 12, 2018

AntC2 commented Oct 13, 2018 •

edited

simonmar commented Oct 15, 2018

osa1 commented Oct 15, 2018

int-index commented Oct 15, 2018 •

edited

simonmar commented Oct 18, 2018

int-index commented Oct 19, 2018 •

edited

Ericson2314 commented Oct 19, 2018 •

edited

int-index commented Oct 19, 2018 •

edited

Ericson2314 commented Oct 20, 2018

int-index commented Oct 20, 2018

Ericson2314 commented Oct 21, 2018

int-index commented Feb 3, 2019

nomeata commented Apr 17, 2019

int-index commented May 6, 2019

Meaning-preserving parsing rules for SCC annotations #176

Meaning-preserving parsing rules for SCC annotations #176

Conversation

int-index commented Oct 10, 2018 • edited by nomeata

nomeata commented Oct 10, 2018

kirelagin commented Oct 11, 2018 • edited

int-index commented Oct 11, 2018 • edited

AntC2 commented Oct 12, 2018 • edited

AntC2 commented Oct 12, 2018

int-index commented Oct 12, 2018

AntC2 commented Oct 13, 2018 • edited

simonmar commented Oct 15, 2018

osa1 commented Oct 15, 2018

int-index commented Oct 15, 2018 • edited

simonmar commented Oct 18, 2018

int-index commented Oct 19, 2018 • edited

Ericson2314 commented Oct 19, 2018 • edited

int-index commented Oct 19, 2018 • edited

Ericson2314 commented Oct 20, 2018

int-index commented Oct 20, 2018

Ericson2314 commented Oct 21, 2018

int-index commented Feb 3, 2019

nomeata commented Apr 17, 2019

int-index commented May 6, 2019

int-index commented Oct 10, 2018 •

edited by nomeata

kirelagin commented Oct 11, 2018 •

edited

int-index commented Oct 11, 2018 •

edited

AntC2 commented Oct 12, 2018 •

edited

AntC2 commented Oct 13, 2018 •

edited

int-index commented Oct 15, 2018 •

edited

int-index commented Oct 19, 2018 •

edited

Ericson2314 commented Oct 19, 2018 •

edited

int-index commented Oct 19, 2018 •

edited