False positive for stray end tags #841

Closed
LiLoDavis opened this issue Jul 18, 2019 · 10 comments

Comments

@LiLoDavis

LiLoDavis commented Jul 18, 2019

Page: https://www.w3.org/WAI/demos/bad/before/home.html

The Nu HTML Checker reports two stray end tags on this page, but both have corresponding start tags.

Error: Stray end tag noscript.
From line 142, column 70; to line 142, column 80
/FONT></B></noscript>↩ <
LL comment: The start tag is on the same line as the end tag (142).

Error: Stray end tag head.
From line 144, column 3; to line 144, column 9
script>↩ </head>↩ <bo
LL comment: The start tag is on line 1.

@LiLoDavis
Author

Perhaps related, I'm also getting false positives for unclosed elements on this page: https://www.uwb.edu/brand/website/accessibility/examples/inaccessible-page

@cvrebert
Contributor

For https://www.w3.org/WAI/demos/bad/before/home.html , I suspect it's related to the <noscript> containing illegal children due to it being within the <head>. Probably that causes the parser to implicitly close the <noscript> tag early, thus making the explicit close </noscript> tag extraneous.

https://html.spec.whatwg.org/multipage/scripting.html#the-noscript-element

  • In a head element, if scripting is disabled for the noscript element
    • The noscript element must contain only link, style, and meta elements.
  • In a head element, if scripting is enabled for the noscript element
    • The noscript element must contain only text, except that invoking the HTML fragment parsing algorithm with the noscript element as the context element and the text contents as the input must result in a list of nodes that consists only of link, style, and meta elements that would be conforming if they were children of the noscript element, and no parse errors.

<b> and <font> aren't among the permitted children.
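
As a rough illustration of the content-model rule quoted above, here is a minimal Python sketch that lists elements found inside a `<noscript>` that are not `link`, `style`, or `meta`. It relies on the standard-library `html.parser` tokenizer, which is not a conforming HTML tree builder. The class and function names are hypothetical, and the sketch assumes the `<noscript>` is in the `<head>` without verifying it.

```python
from html.parser import HTMLParser

# Elements the spec allows inside a head-level noscript (scripting disabled).
ALLOWED_IN_HEAD_NOSCRIPT = {"link", "style", "meta"}


class NoscriptChildCollector(HTMLParser):
    """Collects start tags inside <noscript> that the content model forbids.

    Assumption: every <noscript> seen is inside <head>; this is not checked.
    """

    def __init__(self):
        super().__init__()
        self.in_noscript = False
        self.disallowed = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag == "noscript":
            self.in_noscript = True
        elif self.in_noscript and tag not in ALLOWED_IN_HEAD_NOSCRIPT:
            self.disallowed.append(tag)

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.in_noscript = False


def disallowed_noscript_children(markup):
    """Return the forbidden element names found inside <noscript>, in order."""
    collector = NoscriptChildCollector()
    collector.feed(markup)
    collector.close()
    return collector.disallowed


snippet = "<noscript><B><FONT COLOR=RED>This page uses scripts!!!</FONT></B></noscript>"
print(disallowed_noscript_children(snippet))  # ['b', 'font']
```

The sketch reports every forbidden descendant rather than only direct children, which is enough to flag the `<b>` and `<font>` case discussed here.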

@LiLoDavis
Author

Ok, so not exactly a false positive, but rather a false attribution.

@LiLoDavis
Author

LiLoDavis commented Jul 19, 2019

Hm. If the problem is that a <noscript> in a <head> can't contain <b> or <font> elements, shouldn't removing the <b> and <font> elements from the <noscript> prevent the "stray end tag" error? It doesn't.

@sideshowbarker
Contributor

> Hm. If the problem is that a <noscript> in a <head> can't contain <b> or <font> elements, shouldn't removing the <b> and <font> elements from the <noscript> prevent the "stray end tag" error? It doesn't.

Can you double-check that?

Here is a minimal test case:

<!doctype html>
<HTML lang="">
<title>test</title>
<noscript><b></b></noscript>

That one has <b> in <noscript>, which makes the checker report Stray end tag noscript:

https://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%253C%2521doctype%2520html%253E%250D%250A%253CHTML%2520lang%253D%2522%2522%253E%250D%250A%253Ctitle%253Etest%253C%252Ftitle%253E%250D%250A%253Cnoscript%253E%253Cb%253E%253C%252Fb%253E%253C%252Fnoscript%253E#textarea

Here is another minimal test case:

<!doctype html>
<HTML lang="">
<title>test</title>
<noscript></noscript>

That one has no <b> in <noscript>, and the checker reports no errors:

https://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%253C%2521doctype%2520html%253E%250D%250A%253CHTML%2520lang%253D%2522%2522%253E%250D%250A%253Ctitle%253Etest%253C%252Ftitle%253E%250D%250A%253Cnoscript%253E%253C%252Fnoscript%253E#textarea

@sideshowbarker
Contributor

> For https://www.w3.org/WAI/demos/bad/before/home.html , I suspect it's related to the <noscript> containing illegal children due to it being within the <head>. Probably that causes the parser to implicitly close the <noscript> tag early, thus making the explicit close </noscript> tag extraneous.

That is exactly the case. But it’s important to note that you’ll only see that behavior when scripting is disabled (as mentioned in the spec section you cited).

When scripting is not disabled (which of course is the normal case in web browsers), the parser actually won’t implicitly close that noscript element. More specifically, when scripting is not disabled, the <B><FONT COLOR=RED>This page uses scripts!!!</FONT></B> inside the NOSCRIPT element in the source of https://www.w3.org/WAI/demos/bad/before/home.html just goes into the DOM as a text node.

But the checker is not capable of checking documents with scripting enabled. The checker doesn’t have a JavaScript engine to execute script with. So the checker uses the HTML parser in “scripting disabled” mode. And in “scripting disabled” mode, the parser doesn’t evaluate the NOSCRIPT content as text — instead the parser tries to parse any markup it finds inside the NOSCRIPT.

So exactly what happens here is that when the parser hits that <b> start tag, it inserts an implicit </noscript> end tag before the <b> start tag. But the parser doesn’t stop there: because the b element cannot appear in head, the parser also inserts an implicit </head> end tag and then an implicit <body> start tag, both before the <b> start tag. By the time the parser reaches the explicit </noscript> and </head> end tags in the source, those elements are already closed, so both end tags are reported as stray.

So with scripting disabled, this is what the parser ends up putting into the DOM:

[Image: Screen Shot 2019-07-19 at 11 16 10]

…and this is what ends up getting rendered:

[Image: Screen Shot 2019-07-19 at 11 27 22]
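
The sequence described in this comment can be sketched as a toy state machine. This is a heavily simplified model of the "in head", "in head noscript", "after head", and "in body" insertion modes with scripting disabled. It is not the checker's parser and not a conforming implementation: text, attributes, void elements, and most element-specific rules are ignored. The function name `simulate`, the token format, and the log messages are assumptions made for illustration.

```python
# Simplified sets of element names handled directly in each head-level mode.
ALLOWED_IN_HEAD = {"base", "link", "meta", "script", "style", "title"}
ALLOWED_IN_HEAD_NOSCRIPT = {"basefont", "bgsound", "link", "meta", "noframes", "style"}


def simulate(tokens):
    """Process ("start" | "end", name) tokens, starting just inside <head>.

    Returns a log of implied tags and stray end tags.
    """
    stack = ["html", "head"]
    mode = "in head"
    log = []
    for kind, name in tokens:
        while True:  # loop so a token can be reprocessed in a new mode
            if mode == "in head noscript":
                if kind == "end" and name == "noscript":
                    stack.pop()
                    mode = "in head"
                elif name in ALLOWED_IN_HEAD_NOSCRIPT:
                    pass  # permitted element; its contents are not modeled
                elif kind == "end":
                    log.append(f"stray end tag {name}")
                elif name in ("head", "noscript"):
                    log.append(f"ignored start tag {name}")
                else:
                    # Any other start tag closes noscript and is reprocessed.
                    log.append("implied </noscript>")
                    stack.pop()
                    mode = "in head"
                    continue
            elif mode == "in head":
                if kind == "start" and name == "noscript":
                    stack.append("noscript")
                    mode = "in head noscript"
                elif name in ALLOWED_IN_HEAD:
                    pass  # head-level element; its contents are not modeled
                elif kind == "end" and name == "head":
                    stack.pop()
                    mode = "after head"
                elif kind == "end":
                    log.append(f"stray end tag {name}")
                else:
                    # A start tag that cannot live in head closes head.
                    log.append("implied </head>")
                    stack.pop()
                    mode = "after head"
                    continue
            elif mode == "after head":
                if kind == "start" and name == "body":
                    stack.append("body")
                    mode = "in body"
                elif kind == "end":
                    log.append(f"stray end tag {name}")
                else:
                    log.append("implied <body>")
                    stack.append("body")
                    mode = "in body"
                    continue
            else:  # "in body"
                if kind == "start":
                    stack.append(name)
                elif name in stack[stack.index("body") + 1:]:
                    while stack.pop() != name:
                        pass  # close any elements nested inside the match
                else:
                    log.append(f"stray end tag {name}")
            break
    return log


# Tokens for: <noscript><b></b></noscript></head>
tokens = [("start", "noscript"), ("start", "b"), ("end", "b"),
          ("end", "noscript"), ("end", "head")]
for entry in simulate(tokens):
    print(entry)
```

Under these assumptions the log for the tokens above contains the three implied tags followed by the two stray end tags, which matches the two errors the checker reports for the page.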

@sideshowbarker
Contributor

> Ok, so not exactly a false positive, but rather a false attribution.

Yeah, basically. This is just one of the many parts of the HTML parsing algorithm that is almost absurdly arcane and non-intuitive. So for this case, it’s very difficult to have the checker emit a user-friendly error message that clearly and succinctly explains what the root problem actually is.

@sideshowbarker
Contributor

> Perhaps related, I'm also getting false positives for unclosed elements on this page: https://www.uwb.edu/brand/website/accessibility/examples/inaccessible-page

I don’t get any messages about unclosed elements on that page —

https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.uwb.edu%2Fbrand%2Fwebsite%2Faccessibility%2Fexamples%2Finaccessible-page

Maybe something changed? It got updated in the meantime?

@LiLoDavis
Author

> > Perhaps related, I'm also getting false positives for unclosed elements on this page: https://www.uwb.edu/brand/website/accessibility/examples/inaccessible-page
>
> I don’t get any messages about unclosed elements on that page —
>
> https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.uwb.edu%2Fbrand%2Fwebsite%2Faccessibility%2Fexamples%2Finaccessible-page
>
> Maybe something changed? It got updated in the meantime?

I've been using the "Check serialized DOM of current page" bookmarklet rather than typing the URL into the Nu HTML Checker page. I did it both ways just now and got different results. Wasn't expecting that. :-(

For that page:

  • Using the Nu HTML Checker directly, I get 12 warnings and 2 errors.
  • Using the bookmarklet, I get 15 warnings, 14 errors, and 1 fatal error.

@sideshowbarker
Contributor

I think this is resolved per the comments above. If not, let me know and we can reopen it.
