Whitespace-sensitive operator parsing (~, !, @, $, $$, -) #229

Merged
merged 13 commits into from Aug 23, 2019

Conversation

int-index
Contributor

@int-index int-index commented May 13, 2019

We propose to simplify GHC internals and the lexical syntax of Haskell by replacing ad-hoc parsing rules with whitespace-based disambiguation:

a ! b = <rhs>  -- Infix (!)
f !a = <rhs>   -- Bang pattern
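As a minimal sketch of the distinction, assuming a GHC version that implements this proposal (9.0 or later, to my knowledge) with -XBangPatterns enabled, the module below uses both forms side by side. The names `sumStrict` and `go`, and the meaning given to (!), are made up for illustration:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Loose infix occurrence: this defines an operator named (!).
(!) :: Int -> Int -> Int
a ! b = a * 10 + b

-- Prefix occurrence: !acc is a bang pattern that forces the accumulator.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs
```

The only thing telling the two apart is the whitespace around the `!`.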

Rendered

@AntC2
Contributor

AntC2 commented May 14, 2019

It's always fascinating to read these proposals, to discover there are extensions I've never heard of.

  • ~ as a strictness annotation in type declarations

Is it? What extension is that? Oh, it's a laziness annotation when you've switched on Strict-by-default Datatypes.

I note that syntactically it behaves as a unary prefix, as it is for irrefutable patterns.
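For anyone else who had not met it: a small sketch of both prefix uses of ~, assuming -XStrictData is switched on. The type `Pair` and both functions are hypothetical names chosen for illustration:

```haskell
{-# LANGUAGE StrictData #-}

-- Under StrictData every field is strict unless marked lazy with a prefix ~.
data Pair = Pair Int ~Int

-- The second field is never forced here, so it may even be undefined.
firstOf :: Pair -> Int
firstOf (Pair a _) = a

-- Prefix ~ in a pattern: an irrefutable pattern, matched lazily.
ignorePair :: (Int, Int) -> Int
ignorePair ~(_, _) = 42
```

In both places the ~ sits directly in front of what it annotates, with whitespace before it.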

I don't really think of ~ in constraints as being an operator: it's special syntax; just as -> is special syntax. Does it make any sense to treat it as an operator: can it be partially-applied? can it be passed as an argument to a higher-order type function?

~ in the Report is a <reservedop>. Do we really want to make it available as an infix operator in terms? The only sensible use would be for something parallel to its role in Equality constraints. What would that be?

If ~ is to appear in terms, we might consider making it available as a unary prefix -- same syntax rules as prefix -/negate.

@int-index
Contributor Author

Does it make any sense to treat it as an operator: can it be partially-applied? can it be passed as an argument to a higher-order type function?

Yes, both of these are possible for (~). There's no reason to have it as special syntax; for example, its cousin ~~ is a regular class: https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Type-Equality.html#t:-126--126-

The only sensible use would be for something parallel to its role in Equality constraints. What would that be?

With Dependent Haskell, or even VDQ alone, any type could appear in terms, including equality constraints. Imagine I defined

data Dict c where
  MkDict :: forall c -> c => Dict c

Note the forall c -> syntax, this is VDQ which is only available in types at the moment but will come to terms.

Then one could write something like this:

string_is_list_of_char_proof = MkDict (String ~ [Char])

Now, (String ~ [Char]) is still a type, but we can't tell before typechecking, so the parser and the renamer will have to assume it's a term. That's why we will need to bring all type-level syntax to terms, eventually.
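The term-level `forall c ->` form above cannot be run as written. As a sketch of the closest thing I believe is expressible without it, assuming the GADTs, ConstraintKinds, KindSignatures and TypeOperators extensions, the constraint can be named in the type instead of being passed as a term argument. `useProof` is a hypothetical helper added only to show the evidence being used:

```haskell
{-# LANGUAGE ConstraintKinds, GADTs, KindSignatures, TypeOperators #-}

import Data.Kind (Constraint)

-- A value of type Dict c is evidence that the constraint c holds.
data Dict (c :: Constraint) where
  MkDict :: c => Dict c

-- Here (String ~ [Char]) is written in a type, not in a term.
stringIsListOfCharProof :: Dict (String ~ [Char])
stringIsListOfCharProof = MkDict

-- Matching on MkDict brings (a ~ b) into scope, so x can be returned at type b.
useProof :: Dict (a ~ b) -> a -> b
useProof MkDict x = x
```

In this version the parser only ever sees ~ inside a type signature, which is exactly the restriction that moving types into terms would remove.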

same syntax rules as prefix -/negate.

Frankly, I'm not a fan of these rules. I would even consider making them whitespace-sensitive in the same way this proposal treats ! and ~. Many a beginner has been bitten by this:

sqrt -x    -- oops, it's parsed as (sqrt - x)
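For completeness, a sketch of how the intended meaning has to be written under the current rules, with explicit parentheses or `negate`. The function names are hypothetical:

```haskell
-- Parenthesised negation: the argument of sqrt is (-x).
rootOfNegated :: Double -> Double
rootOfNegated x = sqrt (-x)

-- The same thing using negate instead of the prefix minus.
rootOfNegated' :: Double -> Double
rootOfNegated' x = sqrt (negate x)
```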

@goldfirere
Contributor

GHC currently supports at least two whitespace-sensitive operators: -XTypeApplications' @ and -XTemplateHaskell's $. But, perversely, the rules for each are different.

$ decides what it means based on whether or not it is followed by either an identifier or an open-paren, with no whitespace. Examples:

x$y   -- $y is a splice
x$ y  -- $ is an infix operator
x $ y -- $ is an infix operator
x $y  -- $y is a splice
x $(y)  -- $(y) is a splice
x $[y]  -- $ is an infix operator
x ${- hi -}y  -- $ is an infix operator
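The $ rule as stated above can be sketched as a tiny classifier over the character that immediately follows the $. This is an illustration of the rule, not GHC's lexer: `DollarUse` and `classifyDollar` are hypothetical names, and "identifier" is approximated as starting with a letter or an underscore.

```haskell
import Data.Char (isAlpha)

data DollarUse = Splice | InfixOperator
  deriving (Eq, Show)

-- Decide from the character right after '$' (Nothing means end of input).
-- What precedes the '$' plays no role in this rule.
classifyDollar :: Maybe Char -> DollarUse
classifyDollar (Just c)
  | isAlpha c || c == '_' || c == '(' = Splice
classifyDollar _ = InfixOperator
```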

@ decides what it means based on whether or not the preceding character can plausibly end an identifier. Examples:

x@y    -- as-pattern
(x)@y   -- type application
x@ y   -- as-pattern
x @y   -- type application
x{- hi -}@y  -- type application
x@{- hi -}y  -- as-pattern
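The @ rule can be sketched the same way, except that it inspects the character before the @ instead of the one after it. Again this is only an illustration: `AtUse` and `classifyAt` are hypothetical names, and "can plausibly end an identifier" is approximated as a letter, a digit, an underscore or a prime.

```haskell
import Data.Char (isAlphaNum)

data AtUse = AsPattern | TypeApplication
  deriving (Eq, Show)

-- Decide from the character right before '@' (Nothing means start of input).
-- What follows the '@' plays no role in this rule.
classifyAt :: Maybe Char -> AtUse
classifyAt (Just c)
  | isAlphaNum c || c == '_' || c == '\'' = AsPattern
classifyAt _ = TypeApplication
```

Put next to the $ rule, the inconsistency is plain: one looks only forwards, the other only backwards.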

This proposal suggests adding ! and ~ into the mix. This kind of disambiguation is really useful, and I generally support it. But let's bring more consistency to the whole affair. Can we unify the treatment of all of these operators?

Also, the proposal discusses whitespace. Should a comment be understood to be whitespace? Or is it ignored? (I prefer the former, but the proposal should specify.)

@simonpj
Contributor

simonpj commented May 14, 2019

This kind of disambiguation is really useful, and I generally support it.

I agree. White space is very significant to humans, and it wastes information bandwidth not to take advantage of it.

Richard has good points about $ and @.

@AntC2
Contributor

AntC2 commented May 15, 2019

White space is very significant to humans,

Heh, heh. So do we grasp the livewire of white space around .? I remember the uproar around Simon's 'The Power of the dot'. And Haskell's inconsistencies have now gotten more ossified with all the code using tight-binding . for nested lenses.

Prelude.length         -- module qualifier
double.length          -- function composition
Just .length           -- function composition
Just. length           -- function composition
Just . length          -- function composition
Just.length            -- module qualifier, but no such module `Just'

shape.centre.xcoord    -- function composition (a nested lens)
screenobj.shape.centre.xcoord   -- function composition, but probably ill-typed because screenobj is not a lens
screenobj.^shape.centre.xcoord  -- probably what was meant,
                                --  all the dots are function composition except .^ 

I always put space both sides of . as function composition, and I hate seeing it inline when it's not module qual. I plain don't get the Lensists' claim that they're trying to look like OOP: the precise place where you'd go object.method or struct.component is what Lenses don't do.

@goldfirere
Contributor

Ah yes, I had forgotten poor old (.). It, too, is whitespace-sensitive, and with yet different rules.

@int-index
Contributor Author

int-index commented May 15, 2019

Should a comment be understood to be whitespace?

Yes, I think so; I have added it to the proposal.

Richard has good points about $ and @.

Indeed. But there's a lot of existing whitespace-sensitive syntax to consider:

  • $ and $$: TH splices
  • @: as-patterns or type applications
  • .: module qualification
  • #: overloaded labels and magic hash
  • ?: implicit parameters

Unified rules are possible, but they might cause too much breakage. I don't want to get too ambitious with the current proposal. Let's do one thing at a time.

@int-index
Contributor Author

For the sake of a future proposal, I think we should define the unified rules in the following terms:

a . b -- a loose infix occurrence
a.b   -- a tight infix occurrence
a .b  -- a prefix occurrence
a. b  -- a suffix occurrence
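To make the four terms concrete, here is a rough sketch that classifies the first occurrence of an operator character in a string purely by the whitespace on either side. It is a simplification under stated assumptions: only whitespace is considered (no brackets, comments or other token classes), the start and end of the input count as whitespace, and `Occurrence`, `classify` and `occurrenceOf` are hypothetical names.

```haskell
import Data.Char (isSpace)

data Occurrence = LooseInfix | TightInfix | Prefix | Suffix
  deriving (Eq, Show)

-- Classify from "is there whitespace before?" and "is there whitespace after?".
classify :: Bool -> Bool -> Occurrence
classify True  True  = LooseInfix
classify False False = TightInfix
classify True  False = Prefix
classify False True  = Suffix

-- Find the first occurrence of the operator character and classify it.
occurrenceOf :: Char -> String -> Maybe Occurrence
occurrenceOf op = go ' '
  where
    go _ [] = Nothing
    go prev (c : rest)
      | c == op   = Just (classify (isSpace prev) (spaceNext rest))
      | otherwise = go c rest
    spaceNext []      = True
    spaceNext (n : _) = isSpace n
```

For instance, `occurrenceOf '.' "a .b"` evaluates to `Just Prefix`.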

A loose infix occurrence should always be considered an operator. (This frees up @ as an operator). Other types of occurrences may have special per-operator meaning, or produce a warning when there isn't any special meaning assigned at the moment. This gives us a wealth of new syntax for future language extensions.

@AntC2
Contributor

AntC2 commented May 15, 2019

Thanks, that's useful terminology. It doesn't fully explain what happens with (.), though:

A.b      -- module qualifier, because upper-case A
a.b      -- function composition
A.B.c.d  -- both together
A.B.c.D  -- also both together
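A compilable instance of these cases, as a sketch with hypothetical function names. It assumes nothing beyond `base` and no record-dot extensions:

```haskell
import qualified Data.Char

-- A.B.c.d: the qualified name Data.Char.toUpper, then composition with fst.
initial :: (Char, b) -> Char
initial = Data.Char.toUpper.fst

-- a.b: plain function composition written as tight infix.
lengthPlusOne :: [a] -> Int
lengthPlusOne = succ.length

-- The same composition written as loose infix.
lengthPlusOne' :: [a] -> Int
lengthPlusOne' = succ . length
```

In `initial` the first two dots are part of the qualified name and only the last one is the composition operator; nothing but capitalisation tells them apart.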

That's why I always write composition as "loose infix".

@Bj0rnen

Bj0rnen commented May 15, 2019

@AntC2 So I think the idea is that the last three would raise a warning. I could see two approaches for GHC to reach that conclusion:

  1. (.) only has a special meaning in a tight infix occurrence when the preceding lexical term is a capitalized identifier. So the special "rule" doesn't trigger and there is a warning because it wasn't a loose infix occurrence.
  2. The special rule does trigger in any case of a tight infix occurrence of (.). But the rule itself says to do different things depending on the capitalization of the preceding identifier. So if that isn't capitalized, it's treated as an infix term-level operator, plus a warning is emitted.

It depends on the responsibility/power of these "rules" and what can be used as a criterion for whether a "rule" triggers. I don't know if this is a rabbit hole worth pursuing in this ticket. But I think that I generally like the idea of warning on/eventually freeing up those ambiguous-looking infix operator placements.

Although, it would mean that expressions like this would be discouraged/eventually even stop compiling:

i :: Integer
i = 1 + 2^3

So, you can't use whitespace to indicate tighter binding for readability. But you can still use parentheses for that.

@goldfirere
Contributor

Though I see its appeal, I'm against warnings for omitting whitespace for operators, in general. This will break gobs and gobs of code (knowing that many Haskellers consider a warning to be breakage), and it doesn't seem quite necessary.

One new realization here: this proposal means that GHC would reject (more) Haskell98 programs, given that x !y = x + y is valid Haskell98, but would be rejected under this proposal (presumably with advice to enable -XBangPatterns). Similarly, data T = MkT ! Int would be rejected, even though that, too, is Haskell98. This should definitely be listed as a drawback, along with the fact that existing programs may break.

@int-index
Contributor Author

int-index commented May 15, 2019

This should definitely be listed as a drawback, along with the fact that existing programs may break.

Done, @goldfirere. In theory, we could implement -Wcompat warnings for this and wait for a couple of releases, but this sounds rather pointless given that the migration strategy is so simple and requires no conditional compilation.
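A sketch of that migration for the two Haskell98 examples mentioned above: only the whitespace changes. The rewritten forms are themselves plain Haskell98, so I would expect them to be accepted both before and after the change, which is why no conditional compilation should be needed. The `deriving` clause is added only so the result can be inspected:

```haskell
-- Before (valid Haskell98, but affected by this proposal):
--   x !y = x + y
--   data T = MkT ! Int

-- After: a loose infix occurrence for the operator definition.
(!) :: Int -> Int -> Int
x ! y = x + y

-- After: a prefix occurrence for the strictness annotation.
data T = MkT !Int
  deriving (Eq, Show)
```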

@AntC2
Contributor

AntC2 commented May 16, 2019

BTW, operator (!) for factorial is the standard example with -XPostfixOperators.

For any syntax change you make, be careful it doesn't invalidate the example in the Users Guide. That shows (e !), which is a "loose infix" usage, I guess(?). Currently "tight infix/suffix" also works. The section parens are required, so I guess it's moot.

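{-# LANGUAGE PostfixOperators #-}

-- Factorial as a postfix operator; the section parentheses are required.
(!) :: (Ord a, Num a) => a -> a
(!) 0 = 1
(!) n | n > 0 = n * ((n - 1)!)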

@int-index
Contributor Author

Note that in the current proposal, we treat ! as a bang pattern only if it's both preceded by whitespace and not followed by whitespace, in other words, a prefix occurrence. A suffix or a tight/loose infix occurrence is considered an operator.

-XPostfixOperators continues to work.

@int-index
Contributor Author

Ah, I see now, (e !) is indeed a case when this rule fires, and we would have to consider it an invalid bang pattern. I will try to think of a better rule.

@int-index
Contributor Author

I glanced over Lexer.x and Parser.y and formulated more precise rules. Now (e !) is an infix ! as intended.

@jvanbruegge

I support this

Many a beginner has been bitten by this: sqrt -x -- oops, it's parsed as (sqrt - x)

I wouldn't describe myself as beginner, but I've been bitten by minus parsing too

For the sake of a future proposal, I think we should define the unified rules in the following terms: loose/tight infix, prefix/suffix occurrence

THIS. So much THIS. I always put spaces before and after infix operators and rules/warnings would surely improve the situation here. It would also make a future record access syntax foo.label possible.

Although, it would mean that expressions like this would be discouraged/eventually even stop compiling: i = 1 + 2^3

One possibility would be to add a pragma, that basically hints that this operator is intended to also be used as a tight infix

@goldfirere
Contributor

One possibility would be to add a pragma, that basically hints that this operator is intended to also be used as a tight infix

But what operators should get that pragma? Just about every operator binds tighter than something.

Separately, I personally would have a hard time supporting this proposal unless it addresses at least $ and @ as well, with unified rules for all of them. (Or reasons that unified rules cannot work.) I would be thrilled if we could include . in the mix, but that may be harder.

@Ericson2314
Contributor

Ericson2314 commented May 18, 2019

@goldfirere I feel like this proposal at least moves things in the direction of unified rules for all of them—we can just exclude programs like p@ (..) until the rules are all the same. I don't mind taking baby steps as long as the trajectory is good. Are you worried about the good sapping the momentum of the better?

@goldfirere
Contributor

I feel like this proposal at least moves things in the direction of unified rules for all of them

How so? There is a framework, but I see no suggestions for how this framework should apply to $ and @. Maybe it's obvious -- to be honest, I haven't given the details much thought. But even if this proposal does not actually implement the changes, seeing what they would be would be helpful.

@Ericson2314
Contributor

Ericson2314 commented May 19, 2019

That's fair. It's not obvious, and perhaps the proposal should spell out the whole chain of restrictions, even if only one step is being taken, to make it clear.

To quickly throw together a straw-man convergence point:

  1. When a token parses as both a user infix operator and something else (including multiple tokens), the user operator form must be loose infix. So that means only foo $ bar, foo . bar, foo ~ bar, and foo - bar. foo @ bar too if @ gets allowed as an infix operator.

  2. When a token parses as both a meta infix operator and something else that's not an infix operator (including multiple tokens), the meta operator form must be loose infix. This rule is for .., since [Module..bar] parses as [(Module.(.) bar]. For the sake of simple lexing, and so that the parser does not have to "re-lex", .. (the range meta operator) also must be loose infix. It also

  3. When a token parses as a prefix operator, or a prefix "meta operator" (a special syntactic form, not user code), it must be prefix: preceded by a space, and followed by the character class from this proposal, namely $idchar (a digit, a letter, or an underscore _), an opening bracket (, [, {, or a quotation mark ", '. That would mean foo !bar, foo ~bar, foo $templateBar, foo $(templateBar templateBaz), @typeBar or -numberBar.

  4. When a token parses as a meta infix operator, and the same lexeme as a single token (not token string) parses as something else, it must be tight infix, so only ModuleFoo.bar or bindFoo@patternBar. The "same lexeme, single token" business is so this doesn't conflict with .. per rule 2.

I would like to ban non-loose-infix user-level infix operator application across the board, but that is the breaking change people were expressing worry about, and more than necessary to resolve these ambiguities.

@AntC2
Contributor

AntC2 commented May 19, 2019

@Ericson2314 the operator form must be loose infix. So that means only ... foo . bar, ...

This I love. This I have always wanted for operator (.). This will cause a riot just as soon as the Lensists find out about it.

A couple of your other examples aren't quite right:

moduleFoo.bar or patternFoo@bar

Module names and pattern names must begin upper case. Specifically for modules, that's how to tell a module prefix vs "tight infix" function composition.

@nomeata nomeata added the Accepted The committee has decided to accept the proposal label Sep 5, 2019
int-index added a commit to int-index/ghc-proposals that referenced this pull request Nov 14, 2019
@int-index int-index mentioned this pull request Nov 14, 2019
goldfirere added a commit that referenced this pull request Nov 28, 2019
Accepted by Richard.
ElderEphemera added a commit to ElderEphemera/ghc-proposals that referenced this pull request May 26, 2020
Classify the brackets `(|`, `|)`, `⟦`, `⟧`, `⦇`, and `⦈` as opening/closing
tokens under the extensions that enable them.

See also: GHC Issue #18225 (https://gitlab.haskell.org/ghc/ghc/issues/18225) and
Merge Request !3339 (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3339).
goldfirere added a commit that referenced this pull request May 27, 2020
Add banana and Unicode brackets to propsal #229
@goldfirere goldfirere changed the title Simplify parsing of (~) and (!) Whitespace-sensitive operator parsing (~, !, @, $, $$, -) Jun 24, 2020
int-index added a commit to serokell/ghc-proposals that referenced this pull request Feb 14, 2021
goldfirere added a commit that referenced this pull request Feb 14, 2021
phadej added a commit to phadej/HTTP that referenced this pull request Feb 21, 2021
See ghc-proposals/ghc-proposals#229
for the cause of breakage
phadej added a commit to phadej/HTTP that referenced this pull request Mar 15, 2021
See ghc-proposals/ghc-proposals#229
for the cause of breakage
@phadej phadej mentioned this pull request Oct 28, 2021
@nomeata nomeata added the Implemented The proposal has been implemented and has hit GHC master label Jul 5, 2022