Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AudioEncoderConfig.latencyMode (or similar) #371

Closed
chcunningham opened this issue Sep 29, 2021 · 6 comments · Fixed by #551
Closed

AudioEncoderConfig.latencyMode (or similar) #371

chcunningham opened this issue Sep 29, 2021 · 6 comments · Fixed by #551
Labels
extension Interface changes that extend without breaking. TPAC 2022 Issues to discuss in upcoming TPAC meeting

Comments

@chcunningham
Copy link
Collaborator

VideoEncoderConfig has latencyMode (#269).

AudioEncoder has similar concerns around latency (realtime) vs quality. For example, Chrome's opus encoder currently uses 60ms buffers (framesize) to maximize quality. This is in keeping with our default philosophy of "don't constrain features/quality unless the user asks for it". But we should offer realtime users a way to choose a lower-latency value.

An alternative would be to specify a more granular knob for just framesize. Or do both.

@stefanholmer
Copy link

60 ms is too high as a default for RTC. There's a quite noticeable difference to use 20 ms frames.

I think it could be useful with a frame size knob instead of a latencyMode that implicitly controls the frame size. It can be useful to increase the frame size to 60-120 ms when the bitrate is low (10-16 kbps) in order to reduce packet overhead. It's possible to also do that in the transport layer though, by aggregating frames into a single packet, so the question is if there are compression gains of using larger frame sizes.

@chcunningham chcunningham added the TPAC 2022 Issues to discuss in upcoming TPAC meeting label Jul 27, 2022
@tguilbert-google
Copy link
Member

tguilbert-google commented Sep 1, 2022

Here are my proposals with some context.

Do not add AudioEncoderConfig.frameSize

frameSize is a well defined Opus concept, but it isn't generic enough to fit all codec use cases. For example, FLAC uses block size instead of frame size, and we would need to convert from milliseconds to a number of samples. AAC has a fixed frame size, but has some configurable amount of silent priming frames that affect encoder delay.

Add OpusEncoderConfig.frameSize

  • From @stefanholmer's comment, having a fine grained tuning knob to experiment with is desirable.
  • OpusEncoderConfig is already defined, and will likely be expanded when Fixed audio chunk size support #405 is resolved (EDIT: opus configuration (e.g. in-band fec) #244 was the issue I had in mind). Adding an extra field now shouldn't be an issue.
  • Developers that are already interested in advanced use cases will be able to handle the extra complexity.
  • Limited risk of breaking changes once this knob is introduced; we wouldn't remove it, and it directly exposes an parameter with no nuance.
  • The range of valid values will be defined in the codec registry; suggesting frame size for various scenarios might not be as immediately discoverable to developers as a latencyMode flag.
  • Adding this knob does not prevent us from adding an AudioEncoderConfig.latencyMode in the future.

Do not add AudioEncoderConfig.latencyMode (yet)

  • A latencyMode flag that mirrors the one we have for video decoders would offer some nice symmetry with VideoEncoderConfig.
  • latencyMode would choose between two sets of sensible defaults parameters for each codec (frameSize for OPUS, blockSize for FLAC, priming frames for AAC, etc).
  • The current language behind latencyMode for video is flexible, but we might need to explicitly choose default per-codec parameters to ensure cross-platform and cross-browser consistency. Choosing those defaults could be problematic.
  • This simple flag might abstract away a lot of details and nuance, which could ultimately surprise developers.
  • We can always add this flag later, if any usability complaints are surfaced.

I'd love to hear if there are objections to adding OpusEncoderConfig.frameSize right now, and whether anyone thinks we should also (or instead) prioritize adding AudioEncoderConfig.latencyMode.

@bdrtc
Copy link
Contributor

bdrtc commented Sep 2, 2022

Here are my proposals with some context.

Do not add AudioEncoderConfig.frameSize

frameSize is a well defined Opus concept, but it isn't generic enough to fit all codec use cases. For example, FLAC uses block size instead of frame size, and we would need to convert from milliseconds to a number of samples. AAC has a fixed frame size, but has some configurable amount of silent priming frames that affect encoder delay.

Add OpusEncoderConfig.frameSize

  • From @stefanholmer's comment, having a fine grained tuning knob to experiment with is desirable.
  • OpusEncoderConfig is already defined, and will likely be expanded when Fixed audio chunk size support #405 is resolved. Adding an extra field now shouldn't be an issue.
  • Developers that are already interested in advanced use cases will be able to handle the extra complexity.
  • Limited risk of breaking changes once this knob is introduced; we wouldn't remove it, and it directly exposes an parameter with no nuance.
  • The range of valid values will be defined in the codec registry; suggesting frame size for various scenarios might not be as immediately discoverable to developers as a latencyMode flag.
  • Adding this knob does not prevent us from adding an AudioEncoderConfig.latencyMode in the future.

Do not add AudioEncoderConfig.latencyMode (yet)

  • A latencyMode flag that mirrors the one we have for video decoders would offer some nice symmetry with VideoEncoderConfig.
  • latencyMode would choose between two sets of sensible defaults parameters for each codec (frameSize for OPUS, blockSize for FLAC, priming frames for AAC, etc).
  • The current language behind latencyMode for video is flexible, but we might need to explicitly choose default per-codec parameters to ensure cross-platform and cross-browser consistency. Choosing those defaults could be problematic.
  • This simple flag might abstract away a lot of details and nuance, which could ultimately surprise developers.
  • We can always add this flag later, if any usability complaints are surfaced.

I'd love to hear if there are objections to adding OpusEncoderConfig.frameSize right now, and whether anyone thinks we should also (or instead) prioritize adding AudioEncoderConfig.latencyMode.

we recommend use frameDuration/ptime in miiliseconds is more suitable, The parameter we need actually is duration per frame, this called ptime(packet time)in sdp rfc8866 6.4 ,
ptime is widely used in sip base voip system include webrtc(opus use ptime=10 as sdp attribute by default), and frameDuration is correspond to duration attribute in audioData struct,we think either one is ok and more understandable.

@tguilbert-google
Copy link
Member

tguilbert-google commented Sep 2, 2022

Thanks! I did intend frameSize to be in milliseconds, but never mentioned it. The Opus spec does mention frame duration.

Naming the parameter OpusEncoderConfig.frameDuration also seems good to me. It also clearly conveys a unit of time.

I am not an Opus expert though. If others have a strong preference for one or the other, both seem fine to me.

@bdrtc
Copy link
Contributor

bdrtc commented Sep 5, 2022

Thanks! I did intend frameSize to be in milliseconds, but never mentioned it. The Opus spec does mention frame duration.

Naming the parameter OpusEncoderConfig.frameDuration also seems good to me. It also clearly conveys a unit of time.

I am not an Opus expert though. If others have a strong preference for one or the other, both seem fine to me.

for opus encoder, the frameSize actually is encoder buffer size , it's determined by sampleRate channel numbers and frameDuration.
frameSize = numberOfChannels*sampleRate * frameDuration / 1000
here sampleRate for opus maybe: 8000, 12000, 16000, 24000, 48000, and this parameter exist in spec already.
and numberOfChannels is also exist in spec audioEncoderConfig.
frameDuation is in milliseconds here , maybe 2.5, 5, 10, 20, 40, 60 or 120 ms etc.
we can make a PR for this :>

@tguilbert-google
Copy link
Member

From TPAC: No pushback on this issue. Adding frameDuration seems good. I would let the spec editors adopt this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension Interface changes that extend without breaking. TPAC 2022 Issues to discuss in upcoming TPAC meeting
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants