Whitespace-sensitive operator parsing (~, !, @, $, $$, -) #229

int-index · 2019-05-13T23:01:23Z

We propose to simplify GHC internals and the lexical syntax of Haskell by replacing ad-hoc parsing rules with whitespace-based disambiguation:

a ! b = <rhs>  -- Infix (!)
f !a = <rhs>   -- Bang pattern
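As a minimal sketch of the two readings, assuming GHC 9.0 or later (where this proposal is implemented) and using the hypothetical names `(!)` and `sumStrict`, both forms can live in one module:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Loose infix occurrence (whitespace on both sides): this defines an
-- operator named (!). Here it is a hypothetical safe-indexing operator.
(!) :: [a] -> Int -> Maybe a
xs ! n = case drop n xs of
  (y : _) | n >= 0 -> Just y
  _                -> Nothing

-- Prefix occurrence (whitespace before, none after): this is a bang
-- pattern, forcing the accumulator on every call.
sumStrict :: Int -> [Int] -> Int
sumStrict !acc []       = acc
sumStrict !acc (x : xs) = sumStrict (acc + x) xs
```

Loaded in GHCi, `[10, 20, 30] ! 1` uses the operator, while `sumStrict 0 [1, 2, 3]` exercises the bang pattern.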

Rendered

AntC2 · 2019-05-14T00:16:01Z

It's always fascinating to read these proposals, to discover there are extensions I've never heard of.

~ as a strictness annotation in type declarations

Is it? What extension is that? Oh, it's a laziness annotation when you've switched on Strict-by-default Datatypes.

I note that syntactically it behaves as a unary prefix, as it is for irrefutable patterns.

I don't really think of ~ in constraints as being an operator: it's special syntax; just as -> is special syntax. Does it make any sense to treat it as an operator: can it be partially-applied? can it be passed as an argument to a higher-order type function?

~ in the Report is a <reservedop>. Do we really want to make it available as an infix operator in terms? The only sensible use would be for something parallel to its role in Equality constraints. What would that be?

If ~ is to appear in terms, we might consider making it available as a unary prefix -- same syntax rules as prefix -/negate.

int-index · 2019-05-14T08:13:11Z

Does it make any sense to treat it as an operator: can it be partially-applied? can it be passed as an argument to a higher-order type function?

Yes, both of these are possible for (~). There's no reason to have it as special syntax; for example, its cousin ~~ is a regular class: https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Type-Equality.html#t:-126--126-

The only sensible use would be for something parallel to its role in Equality constraints. What would that be?

With Dependent Haskell, or even VDQ alone, any type could appear in terms, including equality constraints. Imagine I defined

data Dict c where
  MkDict :: forall c -> c => Dict c

Note the forall c -> syntax: this is VDQ (visible dependent quantification), which is only available in types at the moment but will come to terms.

Then one could write something like this:

string_is_list_of_char_proof = MkDict (String ~ [Char])

Now, (String ~ [Char]) is still a type, but we can't tell before typechecking, so the parser and the renamer will have to assume it's a term. That's why we will need to bring all type-level syntax to terms, eventually.
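For comparison, here is a sketch of the same proof written without VDQ in terms, using ordinary invisible quantification. The names `stringIsListOfChar` and `useProof` are hypothetical, and the constraint is supplied through a type signature rather than as a term-level argument, so this is not the syntax proposed above:

```haskell
{-# LANGUAGE ConstraintKinds, GADTs, KindSignatures, TypeOperators #-}

import Data.Kind (Constraint)

-- A dictionary that packages up evidence for the constraint c.
data Dict (c :: Constraint) where
  MkDict :: c => Dict c

-- The equality constraint appears only in the type; the type checker
-- solves String ~ [Char] when MkDict is used at this type.
stringIsListOfChar :: Dict (String ~ [Char])
stringIsListOfChar = MkDict

-- Matching on MkDict brings the equality into scope, so a value of
-- type a can be used as a list of characters.
useProof :: Dict (a ~ [Char]) -> a -> Int
useProof MkDict s = length s
```

The point of the comment above is that, once types can appear in terms, an expression such as `String ~ [Char]` has to be parseable before the compiler knows whether it is a type or a term.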

same syntax rules as prefix -/negate.

Frankly, I'm not a fan of these rules. I would even consider making them whitespace-sensitive in the same way this proposal treats ! and ~. Many a beginner has been bitten by this:

sqrt -x    -- oops, it's parsed as (sqrt - x)
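A small sketch of the pitfall under the standard (whitespace-insensitive) treatment of minus; the names `x` and `root` are illustrative only:

```haskell
x :: Double
x = -16

-- root = sqrt -x
--   Under the standard rules this is read as (sqrt) - (x), a subtraction
--   of a number from a function, so the type checker rejects it.

-- Parentheses make the negation explicit.
root :: Double
root = sqrt (-x)
```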

goldfirere · 2019-05-14T20:42:27Z

GHC currently supports at least two whitespace-sensitive operators: -XTypeApplications' @ and -XTemplateHaskell's $. But, perversely, the rules for each are different.

$ decides what it means based on whether or not it is followed by either an identifier or an open-paren, with no whitespace. Examples:

x$y   -- $y is a splice
x$ y  -- $ is an infix operator
x $ y -- $ is an infix operator
x $y  -- $y is a splice
x $(y)  -- $(y) is a splice
x $[y]  -- $ is an infix operator
x ${- hi -}y  -- $ is an infix operator
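To make the two readings of $ concrete, here is a minimal sketch assuming a GHC with Template Haskell support available; the names `three` and `four` are hypothetical:

```haskell
{-# LANGUAGE TemplateHaskell #-}

-- $ immediately followed by an open paren: an expression splice.
-- The quoted expression 1 + 2 is spliced in at compile time.
three :: Int
three = $([| 1 + 2 |])

-- $ surrounded by whitespace: the ordinary application operator.
four :: Int
four = succ $ three
```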

@ decides what it means based on whether or not the preceding character can plausibly end an identifier. Examples:

x@y    -- as-pattern
(x)@y   -- type application
x@ y   -- as-pattern
x @y   -- type application
x{- hi -}@y  -- type application
x@{- hi -}y  -- as-pattern
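The two common spellings of @ can be seen side by side in a short sketch; `firstAndAll` and `parsed` are hypothetical names:

```haskell
{-# LANGUAGE TypeApplications #-}

-- No whitespace on either side of @: an as-pattern binding the whole list.
firstAndAll :: [Int] -> (Int, [Int])
firstAndAll whole@(x : _) = (x, whole)
firstAndAll []            = (0, [])

-- Whitespace before @ and none after: a visible type application.
parsed :: Int
parsed = read @Int "42"
```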

This proposal suggests adding ! and ~ into the mix. This kind of disambiguation is really useful, and I generally support it. But let's bring more consistency to the whole affair. Can we unify the treatment of all of these operators?

Also, the proposal discusses whitespace. Should a comment be understood to be whitespace? Or is it ignored? (I prefer the former, but the proposal should specify.)

simonpj · 2019-05-14T21:17:16Z

This kind of disambiguation is really useful, and I generally support it.

I agree. White space is very significant to humans, and it wastes information bandwidth not to take advantage of it.

Richard has good points about $ and @.

AntC2 · 2019-05-15T02:03:17Z

White space is very significant to humans,

Heh, heh. So do we grasp the live wire of white space around the dot (.)? I remember the uproar around Simon's 'The Power of the dot'. And Haskell's inconsistencies have since become more ossified, with all the code using tight-binding . for nested lenses.

Prelude.length         -- module qualifier
double.length          -- function composition
Just .length           -- function composition
Just. length           -- function composition
Just . length          -- function composition
Just.length            -- module qualifier, but no such module `Just'

shape.centre.xcoord    -- function composition (a nested lens)
screenobj.shape.centre.xcoord   -- function composition, but probably ill-typed because screenobj is not a lens
screenobj.^shape.centre.xcoord  -- probably what was meant,
                                --  all the dots are function composition except .^

I always put space on both sides of . as function composition, and I hate seeing it inline when it's not a module qualifier. I plain don't get the Lensists' claim that they're trying to look like OOP: the precise place where you'd go object.method or struct.component is what Lenses don't do.

goldfirere · 2019-05-15T03:21:02Z

Ah yes, I had forgotten the poor old dot (.). It, too, is whitespace-sensitive, and with yet different rules.

int-index · 2019-05-15T07:23:32Z

Should a comment be understood to be whitespace?

Yes, I think so. I've added it to the proposal.

Richard has good points about $ and @.

Indeed. But there's a lot of existing whitespace-sensitive syntax to consider:

$ and $$: TH splices
@: as-patterns or type applications
.: module qualification
#: overloaded labels and magic hash
?: implicit parameters

Unified rules are possible, but they might cause too much breakage. I don't want to get too ambitious with the current proposal. Let's do one thing at a time.

int-index · 2019-05-15T07:35:43Z

For the sake of a future proposal, I think we should define the unified rules in the following terms:

a . b -- a loose infix occurrence
a.b   -- a tight infix occurrence
a .b  -- a prefix occurrence
a. b  -- a suffix occurrence

A loose infix occurrence should always be considered an operator. (This frees up @ as an operator). Other types of occurrences may have special per-operator meaning, or produce a warning when there isn't any special meaning assigned at the moment. This gives us a wealth of new syntax for future language extensions.
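The four kinds of occurrence can be modelled with a small classifier. This is only a sketch under a simplifying assumption: it looks at whether the neighbouring characters are whitespace, and it does not model any finer lexical rules a real lexer would need. The names `Occurrence` and `classify` are hypothetical.

```haskell
import Data.Char (isSpace)

data Occurrence = LooseInfix | TightInfix | Prefix | Suffix
  deriving (Eq, Show)

-- | Classify an operator occurrence from the characters immediately
-- before and after it. Nothing stands for the start or end of input
-- and is treated like whitespace in this simplified model.
classify :: Maybe Char -> Maybe Char -> Occurrence
classify before after =
  case (isGap before, isGap after) of
    (True,  True)  -> LooseInfix  -- a . b
    (False, False) -> TightInfix  -- a.b
    (True,  False) -> Prefix      -- a .b
    (False, True)  -> Suffix      -- a. b
  where
    isGap = maybe True isSpace
```

For instance, `classify (Just ' ') (Just 'b')` corresponds to the `a .b` case above and yields `Prefix`.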

AntC2 · 2019-05-15T08:31:22Z

Thanks, that's useful terminology. It doesn't fully explain what happens with the dot (.):

A.b      -- module qualifier, because upper-case A
a.b      -- function composition
A.B.c.d  -- both together
A.B.c.D  -- also both together

That's why I always write composition as "loose infix".
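A compilable sketch of these cases, assuming the record-dot extensions are not enabled; the function names are hypothetical:

```haskell
import qualified Data.Char

-- Upper-case prefix with a tight dot: a qualified name.
shout :: String -> String
shout = map Data.Char.toUpper

-- Loose infix: function composition.
countWords :: String -> Int
countWords = length . words

-- Tight infix with a lower-case left operand: still composition.
countWordsTight :: String -> Int
countWordsTight = length.words

-- Both together: the qualified name Data.Char.toUpper composed with head.
initial :: String -> Char
initial = Data.Char.toUpper.head
```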

Bj0rnen · 2019-05-15T13:00:36Z

@AntC2 So I think the idea is that the last three would raise a warning. I could see two approaches for GHC to reach that conclusion:

1. The dot (.) only has a special meaning in a tight infix occurrence when the preceding lexical term is a capitalized identifier. Otherwise the special "rule" doesn't trigger, and there is a warning because it wasn't a loose infix occurrence.
2. The special rule triggers for any tight infix occurrence of the dot, but the rule itself says to do different things depending on the capitalization of the preceding identifier. If that identifier isn't capitalized, the dot is treated as an infix term-level operator, and a warning is emitted.

It depends on the responsibility/power of these "rules" and what can be used as a criterion for whether a "rule" triggers. I don't know if this is a rabbit hole worth pursuing in this ticket. But I think that I generally like the idea of warning on/eventually freeing up those ambiguous-looking infix operator placements.

Although, it would mean that expressions like this would be discouraged/eventually even stop compiling:

i :: Integer
i = 1 + 2^3

So, you can't use whitespace to indicate tighter binding for readability. But you can still use parentheses for that.

goldfirere · 2019-05-15T16:28:22Z

Though I see its appeal, I'm against warnings for omitting whitespace for operators, in general. This will break gobs and gobs of code (knowing that many Haskellers consider a warning to be breakage), and it doesn't seem quite necessary.

One new realization here: this proposal means that GHC would reject (more) Haskell98 programs, given that x !y = x + y is valid Haskell98, but would be rejected under this proposal (presumably with advice to enable -XBangPatterns). Similarly, data T = MkT ! Int would be rejected, even though that, too, is Haskell98. This should definitely be listed as a drawback, along with the fact that existing programs may break.

int-index · 2019-05-15T17:19:06Z

This should definitely be listed as a drawback, along with the fact that existing programs may break.

Done, @goldfirere. In theory, we could implement -Wcompat warnings for this and wait for a couple of releases, but this sounds rather pointless given that the migration strategy is so simple and requires no conditional compilation.
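The migration for the two Haskell98 programs mentioned above amounts to adjusting whitespace. A sketch, reusing the definitions from that comment with added type signatures:

```haskell
-- Before (valid Haskell98, rejected under this proposal):
--   x !y = x + y
--   data T = MkT ! Int

-- After: the operator definition uses a loose infix occurrence ...
(!) :: Int -> Int -> Int
x ! y = x + y

-- ... and the strictness annotation uses a prefix occurrence.
data T = MkT !Int
```

The rewritten forms are also accepted by compilers that predate the proposal, which is why no conditional compilation is needed.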

AntC2 · 2019-05-16T06:04:55Z

BTW, operator (!) for factorial is the standard example with -XPostfixOperators.

For any syntax change you make, be careful it doesn't invalidate the example in the Users Guide. That shows (e !), which is a "loose infix" usage, I guess(?). Currently "tight infix/suffix" also works. The section parens are required, so I guess it's moot.

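{-# LANGUAGE PostfixOperators #-}

-- Factorial as a postfix operator; the section parentheses in ((n - 1)!) are required.
(!) :: (Ord a, Num a) => a -> a
(!) 0 = 1
(!) n | n > 0 = n * ((n - 1)!)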

int-index · 2019-05-16T08:02:04Z

Note that in the current proposal, we treat ! as a bang pattern only if it's both preceded by whitespace and not followed by whitespace, in other words, a prefix occurrence. A suffix or a tight/loose infix occurrence is considered an operator.

-XPostfixOperators continues to work.

int-index · 2019-05-16T08:06:30Z

Ah, I see now, (e !) is indeed a case when this rule fires, and we would have to consider it an invalid bang pattern. I will try to think of a better rule.

int-index · 2019-05-16T09:21:48Z

I glanced over Lexer.x and Parser.y and formulated more precise rules. Now (e !) is an infix ! as intended.

jvanbruegge · 2019-05-17T08:50:07Z

I support this

Many a beginner has been bitten by this: sqrt -x -- oops, it's parsed as (sqrt - x)

I wouldn't describe myself as a beginner, but I've been bitten by minus parsing too.

For the sake of a future proposal, I think we should define the unified rules in the following terms: loose/tight infix, prefix/suffix occurrence

THIS. So much THIS. I always put spaces before and after infix operators and rules/warnings would surely improve the situation here. It would also make a future record access syntax foo.label possible.

Although, it would mean that expressions like this would be discouraged/eventually even stop compiling: i = 1 + 2^3

One possibility would be to add a pragma that hints that this operator is intended to also be used as a tight infix.

goldfirere · 2019-05-17T17:49:53Z

One possibility would be to add a pragma, that basically hints that this operator is intended to also be used as a tight infix

But what operators should get that pragma? Just about every operator binds tighter than something.

Separately, I personally would have a hard time supporting this proposal unless it addresses at least $ and @ as well, with unified rules for all of them. (Or reasons that unified rules cannot work.) I would be thrilled if we could include . in the mix, but that may be harder.

Ericson2314 · 2019-05-18T22:36:41Z

@goldfirere I feel like this proposal at least moves things in the direction of unified rules for all of them: we can just exclude programs like p@ (..) until the rules are all the same. I don't mind taking baby steps as long as the trajectory is good. Are you worried about the good sapping the momentum of the better?

goldfirere · 2019-05-18T23:49:56Z

I feel like this proposal at least moves things in the direction of unified rules for all of them

How so? There is a framework, but I see no suggestions for how this framework should apply to $ and @. Maybe it's obvious -- to be honest, I haven't given the details much thought. But even if this proposal does not actually implement the changes, seeing what they would be would be helpful.

Ericson2314 · 2019-05-19T00:30:40Z

That's fair. It's not obvious, and perhaps the proposal should spell out the chain of restrictions, even if only one step is being taken, to make it clear.

To quickly throw together a straw-man convergence point:

1. When a token parses as both a user infix operator and something else (including multiple tokens), the user operator form must be loose infix. So that means only foo $ bar, foo . bar, foo ~ bar, and foo - bar. foo @ bar too, if @ gets allowed as an infix operator.
2. When a token parses as both a meta infix operator and something else that's not an infix operator (including multiple tokens), the meta operator form must be loose infix. This rule is for .., since [Module..bar] parses as [(Module.(.) bar]. For the sake of simple lexing, and so that the parser does not have to "re-lex", .. (the range meta operator) also must be loose infix. It also
3. When a token parses as a prefix operator, or a prefix "meta operator" (a special syntactic form, not user code), it must be a prefix occurrence: preceded by a space and followed by the character class from this proposal, namely $idchar (a digit, a letter, or an underscore _), an opening bracket (, [, {, or a quotation mark ", '. That would mean foo !bar, foo ~bar, foo $templateBar, foo $(templateBar templateBaz), @typeBar or -numberBar.
4. When a token parses as a meta infix operator, and the same lexeme as a single token (not a token string) parses as something else, it must be tight infix, so only ModuleFoo.bar or bindFoo@patternBar. The "same lexeme, single token" business is so this doesn't conflict with .. per rule 2.

I would like to ban non-loose-infix user-level infix operator application across the board, but that is the breaking change people were expressing worry about, and more than necessary to resolve these ambiguities.

AntC2 · 2019-05-19T01:19:38Z

@Ericson2314 the operator form must be loose infix. So that means only ... foo . bar, ...

This I love. This I have always wanted for the operator (.). This will cause a riot just as soon as the Lensists find out about it.

A couple of your other examples aren't quite right:

moduleFoo.bar or patternFoo@bar

Module names and pattern names must begin upper case. Specifically for modules, that's how to tell a module prefix vs "tight infix" function composition.

Classify the brackets `(|`, `|)`, `⟦`, `⟧`, `⦇`, and `⦈` as opening/closing tokens under the extensions that enable them. See also: GHC Issue #18225 (https://gitlab.haskell.org/ghc/ghc/issues/18225) and Merge Request !3339 (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3339).

Simplify parsing of (~) and (!)

7c19ef2

int-index force-pushed the whitespace-bang-patterns branch from 99e3d9d to 7c19ef2 May 13, 2019 23:01

Add an example

09f1923

int-index added 2 commits May 15, 2019 09:57

Comments are whitespace

92b5177

s/occurence/occurrence/

c249b9d

int-index added 2 commits May 15, 2019 19:57

Add examples of rejected programs

68d5a64

Clarify costs and drawbacks

aca8bf4

More precise rules

6533901

nomeata added the Accepted label (the committee has decided to accept the proposal) Sep 5, 2019

int-index mentioned this pull request Oct 12, 2019

Visible forall in types of terms, and types in terms #281

Merged

int-index added a commit to int-index/ghc-proposals that referenced this pull request Nov 14, 2019

Tweaks to ghc-proposals#229

c69a821

int-index mentioned this pull request Nov 14, 2019

Tweaks to #229 #293

Merged

goldfirere added a commit that referenced this pull request Nov 28, 2019

Tweaks to #229

506f61f

Accepted by Richard.

int-index mentioned this pull request Dec 11, 2019

RecordDotSyntax language extension proposal #282

Merged

nomeata mentioned this pull request May 26, 2020

Add banana and Unicode brackets to propsal #229 #336

Merged

goldfirere added a commit that referenced this pull request May 27, 2020

Merge pull request #336 from ElderEphemera/prop-229-patch

511b519

Add banana and Unicode brackets to propsal #229

dougalm mentioned this pull request Jun 22, 2020

Various small syntax tweaks google-research/dex-lang#103

Merged

goldfirere changed the title from "Simplify parsing of (~) and (!)" to "Whitespace-sensitive operator parsing (~, !, @, $, $$, -)" Jun 24, 2020

int-index mentioned this pull request Jun 29, 2020

Improve NegativeLiterals #344

Merged

monoidal mentioned this pull request Jul 27, 2020

Enable BangPatterns by default #343

Closed

int-index mentioned this pull request Aug 25, 2020

Change syntax for linear arrows #356

Merged

phadej mentioned this pull request Oct 17, 2020

Export ~ from Data.Type.Equality #371

Merged

phadej mentioned this pull request Jan 14, 2021

WIP: Revive compact representation for NP/NS well-typed/generics-sop#129

Draft

int-index added a commit to serokell/ghc-proposals that referenced this pull request Feb 14, 2021

Generalize ghc-proposals#229 to include consym

e1b9ed9

nomeata mentioned this pull request Feb 14, 2021

Generalize #229 to include consym #404

Merged

goldfirere added a commit that referenced this pull request Feb 14, 2021

Merge pull request #404 from serokell/whitespace-consym

bf3cb18

Generalize #229 to include consym

phadej added a commit to phadej/HTTP that referenced this pull request Feb 21, 2021

Support GHC-9.0

b2c00c4

See ghc-proposals/ghc-proposals#229 for the cause of breakage

phadej mentioned this pull request Feb 21, 2021

Support GHC-9.0 haskell/HTTP#135

Closed

phadej added a commit to phadej/HTTP that referenced this pull request Feb 21, 2021

Support GHC-9.0

9b002a4

See ghc-proposals/ghc-proposals#229 for the cause of breakage

phadej added a commit to phadej/HTTP that referenced this pull request Feb 21, 2021

Support GHC-9.0

cf307c0

See ghc-proposals/ghc-proposals#229 for the cause of breakage

phadej added a commit to phadej/HTTP that referenced this pull request Mar 15, 2021

Support GHC-9.0

3db9a80

See ghc-proposals/ghc-proposals#229 for the cause of breakage

phadej mentioned this pull request Oct 28, 2021

Sized literals #451

Merged

phadej mentioned this pull request Jul 5, 2022

Meta-proposal: Require implementors before proposal submission (under review) #517

Closed

nomeata added the Implemented label (the proposal has been implemented and has hit GHC master) Jul 5, 2022

phadej mentioned this pull request Jan 25, 2023

Warn when (or forbid using) # is a suffix of an identifier and MagicHash is not enabled #574

Open

phadej mentioned this pull request Oct 10, 2023

-Wsevere – erroring warnings #571

Open

kindaro mentioned this pull request May 11, 2024

This package does not build with GHC 9.4.8. pxqr/base32-bytestring#6

Open
