Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc: Bad value for 'hidepid' #16896

Closed
mbiebl opened this issue Aug 28, 2020 · 33 comments
Closed

proc: Bad value for 'hidepid' #16896

mbiebl opened this issue Aug 28, 2020 · 33 comments

Comments

@mbiebl
Copy link
Contributor

mbiebl commented Aug 28, 2020

With latest git master, I see

root@debian:~# journalctl -b | grep hidepid
Aug 28 20:47:16 debian kernel: proc: Bad value for 'hidepid'
Aug 28 20:47:16 debian kernel: proc: Bad value for 'hidepid'
Aug 28 20:47:17 debian kernel: proc: Bad value for 'hidepid'
Aug 28 20:47:16 debian kernel: proc: Bad value for 'hidepid'
root@debian:~# uname -a
Linux debian 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux

I think those error messages showed up once ProtectProc was merged.

@poettering
Copy link
Member

You can ignore those warnings. Story goes like this: since 4e39995 we'll make use of the hidepid option, but this is only safe on kernel >= 5.8 where hidepid is a true per-mount option (previously it was per-userns, but weirdly exposed as mount option). The kernel developer behind this work recommended way to detect whether hidpid is per mount option or per userns is to use the new textual argument to hidepid, i.e. hidepid=invisible instead of hidepid=2. The same kernels that understand the former are also the ones that support the new-style per-mount hidepid. And that's what we do here. If the mount fails, we'll gracefully fall back, since the feature cannot be supported on older kernels.

Yes, it's unfortunate to see the log spew, but you'll only see it when using new systemd on old kernels. And it doesn't mean much, except that you have an older kernel.

i don't think we can fix or improve anything here. Hence closing. Hope tht makes sense.

@mbiebl
Copy link
Contributor Author

mbiebl commented Aug 28, 2020

Please don't enable it then on kernels < 5.8

@mbiebl mbiebl reopened this Aug 28, 2020
@poettering
Copy link
Member

We don't "enable" kernel log messages.

We generally check for features, not kernel versions. Because people backport like there was no tomorrow. Kernel version checks are only last resort.

And upstream kernel peeps recommended to check for "hidepid=invisible" support, and that's what we are doing hence.

Again, this is cosmetic only. And only when new systemd is combined with old kernel.

@mbiebl
Copy link
Contributor Author

mbiebl commented Aug 29, 2020

What I meant is, that hidepid=invisible should not be used on kernels < 5.8.

@poettering
Copy link
Member

As mentioned, we generally try hard to avoid kernel version checks because people backport. Like in this case: we do a feature check with a graceful fallback instead of a version check. Its more robust and prettier, except that on old kernels you sometimes see a kernel log message.

@mbiebl
Copy link
Contributor Author

mbiebl commented Sep 3, 2020

hi @poettering, fwiw I completely agree with you that it's better to check for features instead for version numbers.
In this case though, this is not what happens: The feature is used unconditionally and simply generates an error on older kernels. If there was a way to probe for the availability without triggering this error, then this would be fine.
The next best is to use a kernel version check and I think we should use this here.

We should't bee too nonchalant with generating (unnecessary) error messages when we can avoid them. It's just noise that confuses and worries users. And trust me, users will see those messages.
I've seen this a lot. A current example is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=968116

@poettering
Copy link
Member

In this case though, this is not what happens: The feature is used unconditionally and simply generates an error on older

It is attempted to be used. And if it doesn't work there's a graceful fallback in place.

And it's not an error, it's a log message you can safely ignore.

kernels. If there was a way to probe for the availability without triggering this error, then this would be fine.
The next best is to use a kernel version check and I think we should use this here.

I am sorry, but this is purely cosmetic, and doesn't matter on newer kernels anyway. Let's not make our code worse just to for superficial cosmetics on old kernels people are unlikely to use with brand-new systemd anyway...

@mbiebl
Copy link
Contributor Author

mbiebl commented Sep 3, 2020

And it's not an error, it's a log message you can safely ignore.

Unfortunately this is a response I hear too often. How are users supposed to know which error messages are safe to ignore or not? If we are not more careful, log messages become useless because of all the noise.

@mbiebl
Copy link
Contributor Author

mbiebl commented Sep 3, 2020

Needless to say I don't agree with your take on this issue.

@DemiMarie
Copy link

And it's not an error, it's a log message you can safely ignore.

Unfortunately this is a response I hear too often. How are users supposed to know which error messages are safe to ignore or not? If we are not more careful, log messages become useless because of all the noise.

PID 1 could (and, IMO, should) log a message stating to the effect of “Checking if kernel supports hidepid=invisible. If the kernel does not support this, it will log warnings of the form “proc: Bad value for hidepid”. These are harmless and should be ignored.”

@mbiebl
Copy link
Contributor Author

mbiebl commented Sep 8, 2020

I think we should do two things:

  • Log a proper error message as @DemiMarie suggested
  • Cache the result, ie. do not repeatedly try to attempt using hidepid=invisible. It's just pointless trying to do that during the runtime of the system.

@poettering
Copy link
Member

Caching is nasty, since this code is run in short-lived children forked off PID 1, i.e. the processes that eventually become the daemons after one more execve(): i.e. it's not as easy as maintaining a cache in a global variable, since they already are in forked children with their own copied memory.

@poettering
Copy link
Member

PID 1 could (and, IMO, should) log a message stating to the effect of “Checking if kernel supports hidepid=invisible. If the kernel does not support this, it will log warnings of the form “proc: Bad value for hidepid”. These are harmless and should be ignored.”

Logging before the mounting is ugly, since this would mean we'd also log on current kernels. and that's just noisy. We could however log a message when we notice the hidepid= thing fails, i.e. only after the fact, on old kernels.

@boucman
Copy link
Contributor

boucman commented Sep 9, 2020

we really need a way to filter messages in journald....

@poettering I know you don't like the idea, but we should be able to reach a stage where there is no log at all in normal operations... Logs fill up disk space, cause wear of flash devices and embedded devices can be isolated for decades. Disk filling up or wearing is a deathclock ticking on the device.

We need to log errors, but we must be able to not log any "normal log" they are the slow killers of embedded devices

(kinda offtopic, sorry)

@poettering
Copy link
Member

@boucman the offending log msg this issues is about is done by the kernel, not userspace.

@boucman
Copy link
Contributor

boucman commented Sep 9, 2020

yes I know, but it will still end up in the journal and there is very little we can do about it.

Anyway, that's off topic and I don't want to highjack the original thread...

@DemiMarie
Copy link

@boucman can journald be told to just log only high-severity messages? Also if your device is going for decades without firmware updates you have a major security problem, unless it has no persistent mutable state and no network access.

@poettering we could have the child report its result to PID 1 over a socketpair.

@poettering
Copy link
Member

@poettering we could have the child report its result to PID 1 over a socketpair.

we could also use a touch file or so, but i'd rather not have such complexity in place for a cosmetic issue. i think adding a log message is fine, but adding non-trivial caching infra that requires IPC or otherwise leaks out of process memory is a bit over the top for the fact that this is a very temporary issue only, i.e. when people mix new userspace with old kernels.

@tomty89
Copy link
Contributor

tomty89 commented Dec 3, 2020

How about an option in some conf (e.g. /etc/systemd/system.conf) that allows user to disable the cause/feature manually?

@MichaIng
Copy link

There is still dmesg which is, I guess, more often used to check kernel logs, compared to journalctl. I totally agree, that if anyhow possible, triggering a kernel error log should be avoided. I agree to @mbiebl that such "expected" error messages are dangerous, as they decrease the value of the whole log, either make it more difficult to see and debug actual errors or lead to admins start ignoring kernel errors where quick action is required. An "error" message should mean an "error" and require attention and action to get it fixed, until there is no error message anymore.

And it's not an error, it's a log message you can safely ignore.

It is an error message (level 3), appears red in dmesg and journalctl, appears when running dmesg -l err. If it was a higher log level (warning, notice, ...) as result of a probe, that would be okay.

I do not agree with @boucman that no logs are a goal in any way. There are many reasons why one needs or wants to go though informational or even debug logs. And journalctl can easily be configured to filter logs in every thinkable way via command line options or /etc/systemd/journald.conf(.d/*.conf):

  • journalctl -p 3 to show only error logs (+crit,alert,emerg)
  • You can remove kernel logs completely from journald by setting ReadKMsg=no in it's config file(s).
  • There is a MaxLevelKMsg setting, but that controls the messages sent BY journald TO kernel log.
  • A setting to control the kernel log level to read indeed seems to be missing currently and would be actually a nice thing.
  • dmesg -n [1-8] can be used to control the kernel log level sent to console, but AFAIK journald pulls kernel messages from /proc/kmsg directly, which always contains all log levels.

I know it's not always easily possible, probably we have to live with this "expected" error message, but it should be taken a reasonable and serious request, worth to spend some time, to avoid it.

@poettering
Copy link
Member

How about an option in some conf (e.g. /etc/systemd/system.conf) that allows user to disable the cause/feature manually?

No way. Every option comes at a cost. And it's totally not worth adding one this is a purely cosmetic issue, visible only if you use old kernels with new systemd versions, which very soon is a thing of the past anyway.

Quite frankly, I think we should close this, it's just not important. It's cosmetic and will fix itself as people upgrade kernels.

@gbdlin
Copy link

gbdlin commented Dec 19, 2020

Can you ensure this log message will never occur in the situation, in which it requires attention? Including systems not using systemd? If no, how can one quickly determine if this occurence can be ignored or not? If there is no simple way to make sure for every single occurence of this error message that it can be ignored, there should be for sure something done about it.

@poettering
Copy link
Member

Can you ensure this log message will never occur in the situation, in which it requires attention? Including systems not using systemd? If no, how can one quickly determine if this occurence can be ignored or not? If there is no simple way to make sure for every single occurence of this error message that it can be ignored, there should be for sure something done about it.

We issue a mount() call with hidpid=, which works fine on recent kernels, and fails + logs on old kernels. The failure we see and then retry without hidepid=, as a fallback compat to old kernels. The log you still see though, there's no way to suppress that in the Linux kernel.

My recommendation, if this cosmetic issue is too much for you to handle: please update systemd and kernel reasonably in sync: run new systemd on new kernels and old systemd on old kernels. It's the mix and match where this happens: new systemd on old kernels. where we try to make use of new kernel features, but because you run legacy kernels we need to fall back.

@gbdlin
Copy link

gbdlin commented Dec 19, 2020

It's not answering my question. Let me explain it a bit more: Let's assume that something else, that is mission critical, is triggering the same log message from kernel in my system - is this even possible when using systemd? Can something in my system trigger mount directly? If yes, how can I easily determine the origin of the specific message in my kernel log? Is there a way to quickly check if this specific message comes from systemd trying to probe my kernel or if this comes from some other software that may fail to operate properly because of that error?

I assume that for other systems that do not run systemd I should treat this exact message as an error.

@poettering
Copy link
Member

Let's hope that whatever that "mission critical" thing is you are talking about that it checks errors from system calls. Because exactly like in systemd, if they try the mount option and it's not supported the syscall fails + the log is generated.

If your "mission critical" thing ignores errors like this it probably shouldn't be something you deploy as "mission critical".

You know, tons of syscalls can fail, and they do all the time. 99.999% don#t log about it though. And things are fine, because apps handle the error and do something smart, either handle things in some other way, or log about it or so. It's how this is done: fail on error so that the caller can do something about it. Now, in this very specific case some operation fails, and this is correctly returned as failure code to the app, but then also logged about. And that last part is ugly and unnecessary, but can't be addressed, since it's in old software.

Anyway, if you really think that this cosmetic issue is worth discussing in so much detail, I really don#t want to see the email threads you open about all the rest of the log messages you see in dmesg...

anyway, please upgrade your kernel or downgrade your systemd if this log message is really so terrible to endure...

@rpurdie
Copy link
Contributor

rpurdie commented Jan 15, 2021

Just for the record, Yocto Project is looking like it will have to carry a patch adding a version check to systemd as the kernel warnings are creating too many problems for our developers getting reports from users about "new errors" in logs.

@poettering
Copy link
Member

OK, I don't think we should change anything here. It's purely cosmetic, and sooner or later will go away anyway once people upgrade their kernels (or downgrade systemd).

@rpurdie
Copy link
Contributor

rpurdie commented Feb 13, 2021

It's purely cosmetic

The thing is it is not purely cosmetic. The reports of these errors takes up serious amounts of time for developers having to handle the reports from concerned users who don't know better so we're carrying yet more systemd patches which can never be upstream.

@mbiebl
Copy link
Contributor Author

mbiebl commented Feb 13, 2021

OK, I don't think we should change anything here. It's purely cosmetic, and sooner or later will go away anyway once people upgrade their kernels (or downgrade systemd).

I don't agree. It's just lazy ignoring issues like this.

@keszybz
Copy link
Member

keszybz commented Feb 13, 2021

I don't agree. It's just lazy ignoring issues like this.

Just for the record, Yocto Project is looking like it will have to carry a patch adding a version check to systemd

Once the patch is ready, please submit it as PR here. If it's not too ugly, we can merge it. And even if not, it'll make it easy to cherry-pick into distros that want it.

@poettering
Copy link
Member

Please no version checks for cosmetic stuff. Feature checks are ok. Version checks suck and are last resort stuff. People backport all the time. Not worth doing this for cosmetic issues.

My recommendation: just turn of the sandboxing option on your distro, i.e. return to the status quo ante. You dont need the sandboxing, the stuff will work without. If cosmetic issues are that much of a priority to you then maybe just drop the sandboxing.

@floppym
Copy link
Contributor

floppym commented Feb 13, 2021

The distros using newer systemd with old kernels could patch the old kernel to not emit this error message.

rpurdie added a commit to rpurdie/systemd that referenced this issue Feb 16, 2021
As of commit 4e39995 ("core: introduce
ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs
mount options") kernels older than v5.8 generate multple warnings at
boot, as seen in this Yocto build from today:

     qemux86-64 login: root
     [   65.829009] proc: Bad value for 'hidepid'
     root@qemux86-64:~# dmesg|grep proc:
     [   16.990706] proc: Bad value for 'hidepid'
     [   28.060178] proc: Bad value for 'hidepid'
     [   28.874229] proc: Bad value for 'hidepid'
     [   32.685107] proc: Bad value for 'hidepid'
     [   65.829009] proc: Bad value for 'hidepid'
     root@qemux86-64:~#

We see reports of the issue as in general its hard to someone to tell
the difference between an error in dmesg which they should worry about and
one that is harmless. This adds support burden to developers so Yocto
Project has added this patch.

The commit that triggers this is systemd v247-rc1~378^2~3 -- so any
systemd 247 and above plus kernel v5.7 or older will need this.

As noted in systemd#16896
it is possible changes could be backported to different kernel versions
so the test isn't 100% foolproof but does give better results than a
continual stream of bug reports.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
@rpurdie
Copy link
Contributor

rpurdie commented Feb 16, 2021

I have opened a pull request for it as mentioned above since it would seem to make sense to at least share the patch.

rpurdie added a commit to rpurdie/systemd that referenced this issue Feb 16, 2021
As of commit 4e39995 ("core: introduce
ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs
mount options") kernels older than v5.8 generate multple warnings at
boot, as seen in this Yocto build from today:

     qemux86-64 login: root
     [   65.829009] proc: Bad value for 'hidepid'
     root@qemux86-64:~# dmesg|grep proc:
     [   16.990706] proc: Bad value for 'hidepid'
     [   28.060178] proc: Bad value for 'hidepid'
     [   28.874229] proc: Bad value for 'hidepid'
     [   32.685107] proc: Bad value for 'hidepid'
     [   65.829009] proc: Bad value for 'hidepid'
     root@qemux86-64:~#

We see reports of the issue as in general its hard to someone to tell
the difference between an error in dmesg which they should worry about and
one that is harmless. This adds support burden to developers so Yocto
Project has added this patch.

The commit that triggers this is systemd v247-rc1~378^2~3 -- so any
systemd 247 and above plus kernel v5.7 or older will need this.

As noted in systemd#16896
it is possible changes could be backported to different kernel versions
so the test isn't 100% foolproof but does give better results than a
continual stream of bug reports.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

10 participants