
Should Apple's "mic mode" be reflected in the API somehow? #47

Open
alvestrand opened this issue Jan 3, 2022 · 12 comments

@alvestrand
Contributor

A new Apple feature called "mic mode" apparently permits specifying some kinds of audio processing for microphone devices; this was called to Chrome's attention in https://bugs.chromium.org/p/chromium/issues/detail?id=1282442

Is this something that could be useful to expose in the WebRTC API? Are there adaptations that could be made to accommodate this without changing the API?

Assigning to @youennf for comment.

@fippo

fippo commented Jan 3, 2022

There is a noise suppression constraint, which defaults to true. IIRC the implementation is internal to libwebrtc, so it is probably not as good as the fancy new APIs.

Similar issue for video + background blur: https://bugs.chromium.org/p/chromium/issues/detail?id=1281408&
Note that this isn't specific to Apple's new APIs but also applies to applications like Krisp or Maxine.

The API issue here is discoverability -- the application might not want to try enhancing the audio if it is already enhanced, or blur the background if it is already blurred (or it might recommend that the user turn off the built-in behavior).
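
As a rough sketch of what that detection could look like from the application side -- `noiseSuppression` is an existing `MediaTrackSettings` member, while `backgroundBlur` and the skip decisions here are purely illustrative assumptions:

```ts
// Sketch: check what processing is already active before adding the app's own.
async function detectPreProcessing() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });

  const [audioTrack] = stream.getAudioTracks();
  // noiseSuppression is a standard MediaTrackSettings member.
  if (audioTrack.getSettings().noiseSuppression) {
    // Already denoised below the app; stacking another denoiser
    // may waste CPU and degrade quality.
    console.log("skip app-side noise suppression");
  }

  const [videoTrack] = stream.getVideoTracks();
  // Hypothetical: a read-only `backgroundBlur` setting reflecting an OS
  // effect such as Apple's camera modes; no standard equivalent exists yet.
  const videoSettings = videoTrack.getSettings() as MediaTrackSettings & {
    backgroundBlur?: boolean;
  };
  if (videoSettings.backgroundBlur) {
    console.log("skip app-side background blur");
  }
}
```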

@bradisbell

Other systems have these settings as well. Windows has an option for enabling/disabling "effects", which I believe is what is used to control the various drivers' microphone arrays in the same or a similar way Apple does.

https://docs.microsoft.com/en-us/windows-hardware/drivers/audio/windows-11-apis-for-audio-processing-objects

For example, Lenovo laptops ship with software where you can choose whether to optimize the microphone array for one person in front, multiple, or the whole room. I think this software just configures the Realtek driver's audio "effects".

@henrikand

Note that the new support only enables an application to make the new settings visible in Control Center; it is then up to the user to change the actual settings manually. Hence, this new feature does not match the existing WebRTC APIs well IMHO, since all the web application can do is enable the user to manually change a setting. Also, it would require different native implementations for devices released before 2018 and those released after. The difference is not trivial and can't be handled with a simple flag, since use of a new audio unit is required.

Has it been shown that the new settings add any value?

@youennf
Contributor

youennf commented Jan 3, 2022

Thanks for filing this issue @alvestrand.

In addition to mic mode, there is a corresponding camera mode, which might be useful to discuss jointly.

I think it is worth exposing this value (there is no need to do background blur if it is already done by the OS, and an app could ask the user to change their microphone setting if the goal is high-fidelity audio recording).

Depending on the OS, the value can change through user interaction with the OS, but not through applications.
Exposing such a value would probably require adding a way to notify the web application that the value has changed.
I am not a fan of constraints, and that applies here as well, given this is a not-controllable/not-settable property.
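
A minimal sketch of what such a change notification could look like; the "configurationchange" event name is borrowed from a mediacapture-extensions proposal and `appDenoiser` is a hypothetical app-side component, so read this as illustrative rather than settled API:

```ts
// Sketch: react when an OS-controlled setting changes underneath the app.
// "configurationchange" is a proposed event, not shipped API, and
// appDenoiser is a hypothetical app-side component.
declare const appDenoiser: { enabled: boolean };

function watchOsProcessing(track: MediaStreamTrack) {
  track.addEventListener("configurationchange", () => {
    const { noiseSuppression } = track.getSettings();
    // e.g. the user toggled Voice Isolation in Control Center:
    // adapt the app's pipeline instead of fighting the OS.
    appDenoiser.enabled = !noiseSuppression;
  });
}
```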

@alvestrand
Contributor Author

I know that at least on Windows, quite a bit of engineering effort has been spent on turning off this type of functionality: the functions increased CPU usage, and because the same kind of processing was being done inside WebRTC, doing it twice worsened the sound quality; the WebRTC algorithms, perhaps because they are more frequently updated, had superior performance characteristics.

So there are really multiple dimensions here:

  • Controlling whether or not the OS should allow the user to toggle random settings that might or might not improve things
  • Detecting whether or not such effects have been applied (and, if true, what effects have been applied)
  • Possibly controlling the effects from the application

It seems to me that WebRTC ought to be able to get a "clean path" in a reliable way, so that the effects applied are only the ones that WebRTC introduces; it's less clear to me that there's a good way to manipulate platform effects - it's hard to standardize when they vary so much by platform.
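
For the parts WebRTC itself controls, the closest thing to a "clean path" today is turning the standard processing constraints off at capture time; a minimal sketch, with the caveat that platform effects such as mic modes or driver APOs may still apply and cannot be disabled from the web:

```ts
// Sketch: request a capture path with WebRTC's own processing disabled.
// Platform-level effects may still apply on top; no web API currently
// guarantees they are off.
const cleanStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: false,
    noiseSuppression: false,
    autoGainControl: false,
  },
});
```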

@henrikand

To maintain as clean and stable a path as possible, I suggest that we don't expose the new mic modes but stick with the existing implementation. Allowing the user to make changes in this area may affect the existing AEC, NS, etc. in a negative way. In any case, if changes in this area are made, a signal-processing team should study the implications carefully first.

@jan-ivar
Member

jan-ivar commented Jan 27, 2022

It seems to me users should be able to pick from both OS-provided features and application-provided ones. This suggests we should focus on solving any interference or double-processing problems without limiting choice (i.e., let apps detect pre-processing choices the user has made, but not let apps deny them).

@huibk

huibk commented Apr 7, 2022

If the browser wants to leave users with a choice, it should also provide the means to make that choice in a convenient way; otherwise it is a false choice. Basically, there are two approaches:

  1. We let the app decide, because it knows the use case best. It can request a raw stream with no processing, or a default stream with OS/user-configured processing.
  2. The browser offers users the means to resolve interference: an app can detect pre-processing and request that it be disabled. The user can accept or reject on a session/site basis (see the sketch after this list).
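
A sketch of what approach 2 could look like; `voiceIsolation` is a hypothetical constraint name here, and user/OS rejection is modeled as applyConstraints() rejecting:

```ts
// Sketch of approach 2: detect OS pre-processing and ask to disable it.
// voiceIsolation is hypothetical; acceptance or rejection by the user/OS
// is modeled as applyConstraints() rejecting (e.g. OverconstrainedError).
type ExtendedConstraints = MediaTrackConstraints & { voiceIsolation?: boolean };
type ExtendedCapabilities = MediaTrackCapabilities & { voiceIsolation?: boolean[] };

async function requestRawAudio(track: MediaStreamTrack): Promise<boolean> {
  const caps = track.getCapabilities() as ExtendedCapabilities;
  if (!caps.voiceIsolation?.includes(false)) {
    return false; // not overridable on this platform
  }
  try {
    const constraints: ExtendedConstraints = { voiceIsolation: false };
    await track.applyConstraints(constraints);
    return true; // user/UA accepted the change
  } catch {
    return false; // user or OS kept the effect on
  }
}
```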

@youennf
Contributor

youennf commented Apr 7, 2022

> If the browser wants to leave users with a choice it should also provide the means to make that choice in a convenient way for it not to be a false choice

Capabilities offer a way to support both types of OSes.
When the OS changes the value from false to true, there are two cases (sketched after this list):

  • OS allows the UA to override the value: capabilities remain [true, false], and the setting changes to true.
  • OS does not allow the UA to override the value: capabilities change to [true], and the setting changes to true.
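
A sketch of how an app could tell the two cases apart, using noiseSuppression as a stand-in for whichever processing property ends up exposed:

```ts
// Sketch: classify OS behavior from the capability range.
// [true, false] => the app may override; [true] => pinned by OS/user.
function describeOsControl(track: MediaStreamTrack): string {
  const range = track.getCapabilities().noiseSuppression ?? [];
  const setting = track.getSettings().noiseSuppression;
  if (range.length === 2) return `setting=${setting}, app-overridable`;
  if (range.length === 1) return `setting=${setting}, pinned by OS/user`;
  return "processing state not exposed";
}
```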

@huibk

huibk commented Apr 7, 2022

In which cases would the OS not allow the UA to override the value?
It doesn't seem like a good solution for web applications to have to give OS-specific user instructions on how to manually turn processing on and off before and after usage. For instance, users may want voice isolation always on for their Jitsi meetings, but for their JamKaZam sessions it must be turned off.

@youennf
Contributor

youennf commented Apr 7, 2022

iOS and macOS do not allow applications to override background blur, AFAIK.
Users who turn on background blur will know how to disable it.
The important thing is that web apps know about it, so they can either update their pipeline or provide adequate information should the setting not be optimal (ideally, the user+OS should be able to get the setting right after some limited learning).

@huibk

huibk commented Apr 8, 2022

The challenge for applications will be to explain to end users why similar effects are incompatible and that they may have to toggle them manually on a per-session basis. Ideally there would be a more convenient way to preserve both privacy/control and utility.
What is the situation for audio processing on Mac? Can that be altered by the application?
