Devanagari Candrabindu as a Consonant Modifier #2392

Open
Richard57 opened this issue May 8, 2020 · 4 comments
@Richard57

In Devanagari (and other Indian scripts), candrabindu can nasalise either the first consonant of a cluster or the vowel. In the first role it is written at the left of the cluster, while in the second role it tends to be written centrally or at the right. On Windows 10, the font Nirmala UI distinguishes the two roles when rendered with Uniscribe, displaying the sequence as a single akshara (i.e. with no internal halant) in both cases. It does not do so when rendered with HarfBuzz, which displays the sequence as two aksharas. When the arrangement of the consonants in the cluster is to be left to the renderer, nasalisation of the consonant is encoded on Windows by the sequence <consonant, virama, candrabindu, consonant>.

Contrasting displays on Windows 10 (generated on the same machine and cut down from the same screen dump) are:

Internet Explorer 11 Version 11.1158.17763.0 Update Versions 11.0.185, which uses Uniscribe:

[Screenshot: ie11]

Chrome: Version 81.0.4044.129 (Official Build) (64-bit), which uses HarfBuzz:

[Screenshot: chrome]

The encoding of the various sequences, row by row and left to right, is:

  0932 094d 0901 0932
  0932 094d 0932 0901

  0932 094d 0901 0932 093e
  0932 094d 0932 093e 0901

  0932 094d 0901 0932 0947
  0932 094d 0932 0947 0901

  0932 094d 0901 0932 093f
  0932 094d 0932 093f 0901
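
For anyone wanting to reproduce the test strings, the code point lists above can be turned into text with a few lines of Python. This is only a convenience sketch: the names `ROWS`, `decode` and `describe` are made up for illustration, and it uses nothing beyond the standard `unicodedata` module.

```python
import unicodedata

# Code point sequences copied from the rows above (left column, right column).
ROWS = [
    ("0932 094d 0901 0932", "0932 094d 0932 0901"),
    ("0932 094d 0901 0932 093e", "0932 094d 0932 093e 0901"),
    ("0932 094d 0901 0932 0947", "0932 094d 0932 0947 0901"),
    ("0932 094d 0901 0932 093f", "0932 094d 0932 093f 0901"),
]


def decode(hex_seq: str) -> str:
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in hex_seq.split())


def describe(text: str) -> list[str]:
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


for left, right in ROWS:
    # Left column: candrabindu after the virama; right column: candrabindu last.
    print(decode(left), decode(right), sep="\t")
```

Pasting the printed strings into a page and viewing it with each shaper should reproduce the comparison in the screenshots.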

The rendering issue was confirmed to be present in the latest development from HarfBuzz Version 2.6.5. The Uniscribe rendering is linguistically correct and matches readers' expectations.

There was discussion on the Unicode list of how to encode the difference in the thread containing the post
https://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0144.html . There is some more recent discussion in the thread entitled "Devanagari ल्लाँ ambiguous?", starting 5 May 2020. (No URL, as the Unicode Consortium's web server/site is still being repaired.)

The Uniscribe behaviour conflicts with the published specification, which agrees with TUS 12.1 R10 that U+0901 DEVANAGARI SIGN CANDRABINDU must follow the consonants and vowels of an akshara; this rule leads to the 2-akshara rendering produced by HarfBuzz.

This issue also impacts the porting of Devanagari to the USE.

There remains a case for treating U+0901 immediately before a vowel as a string encoding error.
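
A check for that last case could look roughly like the following. It is a sketch under a stated assumption: "vowel" is taken to mean a dependent vowel sign in the range U+093E..U+094C, which is a simplification of my own, and the function name is hypothetical.

```python
CANDRABINDU = "\u0901"


def is_vowel_sign(ch: str) -> bool:
    """Assumption: only the dependent vowel signs U+093E..U+094C count."""
    return "\u093e" <= ch <= "\u094c"


def candrabindu_before_vowel_sign(text: str) -> list[int]:
    """Return the indices of each U+0901 that is immediately followed by a vowel sign."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] == CANDRABINDU and is_vowel_sign(text[i + 1])
    ]
```

With this, <0932 094d 0932 0901 093e> would be flagged at index 3, while <0932 094d 0932 093e 0901> and the <consonant, virama, candrabindu, consonant> sequences above would not be flagged.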

@behdad
Member

behdad commented May 19, 2020

cc @jfkthame

@Richard57
Author

Andrew Glass has undertaken to update the Microsoft Devanagari specification (MicrosoftDocs/typography-issues#416) to add the current Uniscribe/DirectWrite behaviour as correct.

@behdad
Member

behdad commented Jul 15, 2022

@dscorbett can you triage this please?

@dscorbett
Collaborator

We should make this change.

I checked all the Indic-shaper scripts in Notepad with strings analogous to ⟨ल्ँल⟩. Only Devanagari supports it, and only for certain marks:

  • U+0900 DEVANAGARI SIGN INVERTED CANDRABINDU
  • U+0901 DEVANAGARI SIGN CANDRABINDU
  • U+0902 DEVANAGARI SIGN ANUSVARA
  • U+0953 DEVANAGARI GRAVE ACCENT
  • U+0954 DEVANAGARI ACUTE ACCENT

These belong to a new class which I’ll call CB. CB behaves like SM but can also appear between H and C. A dotted circle is inserted between two adjacent CBs. CB blocks 'rphf' in a context like <Ra, H, CB, C>.
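
To make the proposed CB behaviour concrete, here is a toy Python model of the three statements above. It is not HarfBuzz code and does not reflect how the Indic shaper is actually implemented. The function names are hypothetical, the consonant test only covers U+0915..U+0939, and the dotted circle rule is read literally as "any two adjacent CBs".

```python
DOTTED_CIRCLE = "\u25cc"
VIRAMA = "\u094d"  # H
RA = "\u0930"
# The five marks listed above as members of the proposed CB class.
CB = {"\u0900", "\u0901", "\u0902", "\u0953", "\u0954"}


def is_consonant(ch: str) -> bool:
    """Simplification: treat only KA..HA (U+0915..U+0939) as C."""
    return "\u0915" <= ch <= "\u0939"


def insert_dotted_circles(text: str) -> str:
    """Insert a dotted circle between any two adjacent CB marks."""
    out: list[str] = []
    for ch in text:
        if ch in CB and out and out[-1] in CB:
            out.append(DOTTED_CIRCLE)
        out.append(ch)
    return "".join(out)


def cb_between_h_and_c(text: str, i: int) -> bool:
    """True if text[i] is a CB in the context <C, H, CB, C>."""
    return (
        text[i] in CB
        and i >= 2
        and i + 1 < len(text)
        and is_consonant(text[i - 2])
        and text[i - 1] == VIRAMA
        and is_consonant(text[i + 1])
    )


def reph_blocked(text: str) -> bool:
    """True if text starts with <Ra, H, CB, C>, where CB would block 'rphf'."""
    return text.startswith(RA + VIRAMA) and len(text) > 3 and cb_between_h_and_c(text, 2)
```

In this model ⟨ल्ँल⟩ has its candrabindu recognised as sitting between H and C, so the sequence would stay one cluster, and <Ra, H, CB, C> reports that the reph is blocked.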
