New developments in the Opus audio codec (xiph.org)
108 points by nullc on July 12, 2013 | 22 comments


This is an awesome page. The discussion about tonality really resonated with me (no pun intended) because it irritates me when switching audio sources at how "squashed" they can sound. For a while I had some macros on my G15 keyboard to change settings in the audio mixer based on music, games, or general computer sounds. If Opus can do a reasonable job on a wide variety of inputs automatically it will be a wonderful contribution.


> And we haven't even started using NEON yet.

That is almost concerning, because even though GPU-accelerating audio decoding seems like overkill (the copy and latency overhead isn't worth it), I imagine it will require quite a bit of refactoring to make a SIMD decoder (or encoder). But that will be absolutely necessary to see Opus adopted as the web audio standard, where mobile devices need every watt of energy savings possible.


NEON isn't a GPU feature; it's just the ARM equivalent of SSE/MMX (though a fair bit nicer to work with).

It doesn't require extensive refactoring. There are patches to use SIMD that aren't merged yet— though pure C improvements have tended to be more important since NEON isn't available on all ARM parts.


Given how contentious the environment surrounding audio codecs has been over the years, Opus (and Silk, too) just seems too good to be true. Am I missing a devil in the details?

Otherwise, I'm excited at the possibilities this provides when used in open source projects that utilize open protocols to deliver audio end-to-end (peer-to-peer) over the network.


the following is not true, see below: warning: don't visit the linked site on opera mobile. It seems to download something bigger and also froze the browser for me.

edit: sorry, false alarm! that was something else. the site loads perfectly fine (and I am honestly happy about every site that does not differentiate between desktop and mobile clients but serves a readable, clearly structured, "static" page like that!).

edit 2: even the audio samples worked flawlessly. the accompanying displays not though.


This is awesome. Who would have guessed that ultra-low-latency low-bandwidth perceptual coding of audio would involve neural networks and hidden Markov models? It seems obvious when you put it that way, but I certainly wouldn't have predicted it.


how does the cpu usage compare to vorbis? (and then to mp3?) i assume the variable bitrate uses less cpu, or?

(b/c i just have to have 40 tabs open, steam, and a mp3 player at all times)


> i assume the variable bitrate uses less cpu, or?

At bitrates higher than about 64 kbps, CBR actually uses less CPU to encode than VBR. This is because the CELT layer was designed to be natively CBR (since for real-time communication you want strict rate control all the time), and VBR is achieved by varying the target CBR rate on a packet-by-packet basis (using additional analysis not required by CBR).

This was a major departure from codecs like Vorbis, which are natively VBR, and implement CBR by encoding at up to 16 different rates and picking the one closest to the actual target, or other codecs which use a "bit reservoir" to shift bits back and forth between frames (increasing delay due to increased buffering requirements).
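The "VBR as a per-packet CBR target" idea above can be illustrated with a toy sketch. This is not the libopus rate control: the per-frame `complexities` scores and the proportional allocation rule are assumptions made up for illustration, and a real-time encoder would not see the whole signal in advance the way this function does.

```python
def per_packet_budgets(complexities, avg_bitrate_bps, frame_ms=20):
    """Split an average bitrate into per-packet byte budgets.

    Each packet would then be coded as if it were CBR at its own budget.
    `complexities` is a hypothetical per-frame difficulty score (> 0).
    """
    if not complexities:
        return []
    # Bytes a single frame would get under plain CBR.
    cbr_bytes = avg_bitrate_bps * frame_ms / 1000 / 8
    mean_c = sum(complexities) / len(complexities)
    # Harder frames get proportionally more bytes; the average stays at cbr_bytes.
    return [round(cbr_bytes * c / mean_c) for c in complexities]


if __name__ == "__main__":
    # Two 20 ms frames at an average of 64 kbps, the second three times "harder".
    print(per_packet_budgets([1, 3], 64000))
```

With equal scores every packet gets the plain CBR size, which matches the point that CBR needs none of the extra analysis.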


CPU usage in the decoder is currently higher than Vorbis, though most of the difference is due to having fewer optimizations. As for the encoder, it's already faster than Vorbis and should become even faster in the future. I haven't checked, but I suspect the same is true when compared to MP3. That being said, the complexity is pretty much negligible on a desktop machine.


Interesting. I tested a while back and found opusenc consistently slower than oggenc. I don't remember the specifics, but I'm surprised a 27% improvement was enough to make up the difference.

It's not negligible to me, as I'm running a music server on a Raspberry Pi that needs to transcode audio, and the Pi can only transcode FLAC to Vorbis at 1.8x realtime. This makes it terribly sensitive to background tasks — anything else going on tends to prevent it from keeping up. So I'm quite glad you all are optimizing for ARM :)
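To put the 1.8x figure in perspective, here is a back-of-the-envelope sketch. It assumes a single core and that encoding speed scales linearly with available CPU time, neither of which is measured here.

```python
def cpu_share(realtime_factor):
    """Fraction of one core needed to keep up with playback."""
    return 1.0 / realtime_factor


def max_background_share(realtime_factor):
    """Largest CPU share other tasks can take before the transcoder falls behind."""
    return 1.0 - cpu_share(realtime_factor)


if __name__ == "__main__":
    # At 1.8x realtime the transcoder alone needs a bit over half the core.
    print(round(cpu_share(1.8), 3), round(max_background_share(1.8), 3))
```

Under those assumptions, any background load above roughly 44% of the core would cause the stream to fall behind.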


Uh. On a Raspberry Pi you should be using the Opus encoder compiled as fixed point. It will be a _ton_ faster than Vorbis, as the floating point on the rpi is very slow and there is no fixed point vorbis encoder. (Unfortunately, if your input requires resampling, there isn't currently an option to compile the opus-tools front end to use the fixed point resampler, so that will add some slowdown— I guess that should go up in priority)
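For readers unfamiliar with the term, "fixed point" means doing the signal math in scaled integers instead of floats. A minimal sketch of the idea using the common Q15 format follows; the helper names are hypothetical and this is not how libopus implements its fixed-point build.

```python
Q = 15  # Q15: 15 fractional bits, values roughly in [-1.0, 1.0)


def to_q15(x):
    """Convert a float to a saturated 16-bit Q15 integer."""
    return max(-32768, min(32767, int(round(x * (1 << Q)))))


def q15_mul(a, b):
    """Multiply two Q15 integers using only integer operations, with rounding."""
    return (a * b + (1 << (Q - 1))) >> Q


def from_q15(v):
    """Convert a Q15 integer back to a float for inspection."""
    return v / (1 << Q)


if __name__ == "__main__":
    product = q15_mul(to_q15(0.5), to_q15(0.25))
    print(product, from_q15(product))  # 4096 0.125
```

The multiply is one integer multiplication, one addition, and one shift, which is why this style of arithmetic suits CPUs with slow or missing floating-point units.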

Although ... rpi ugh. I often feel that device was an evil scheme to turn people off of arm. It is _remarkably_ slow for its cost and power consumption. For $100 you can have an arm device which is easily 32x faster for most DSP-ish stuff, and which draws similar power.


I have an odd affliction - I collect ARM devices. Right from the OpenMoko Freerunner, through the N900 and now the Raspberry Pi.

It's always a tradeoff between price, community size and compute power with these little things. Hell, even the ZipIt Z2 running a tiny little Freescale processor had a good community when it was new.

I can go and get a HardKernel O2 and have really good performance but there won't be that many people in the IRC channel to help me out when I have a device specific question.

The Raspberry Pi has captured such an audience that there are people everywhere who can help, as well as a huge amount of development going on.

The other advantage of the Raspberry Pi is that replacing the whole board is cheaper than buying the JTAG breakout for other devices (I'm looking at you, GlobalScale).

I'd love to talk with you further about the different devices around - I really do have a lot of them and I have more on the way. Support and communities for ARM devices seem to be really fragmented, and it's a shame.

I think my contact details are in my profile. Sorry for the rambling reply, it's late here.


Tremor is an integer-only Vorbis decoder, correct? Shame they never ported the encoder as well!


Good to know, thanks :) What's this $100 ARM device you speak of?



Now I'm not sure how much floating point arithmetic the Ogg Vorbis encoder would use, but ARM can be sensitive to this. If you have the choice between "hardfp" and "softfp" for your distro, you want hardfp, as this will use the floating point hardware (which is present in the Pi). Some ARM systems require softfp, since they lack FP hardware.

Also, what optimisations have you got turned on in the compiler? For a realtime system, it might make sense to turn them down, trading stream size for processing time.


I wish support for Opus was better in popular players. I tried building an RTP + RTSP streaming server using libopus, but even VLC couldn't play it.


There is Opus support in VLC, but I don't believe that RTP is working for it there yet. (Obviously simple HTTP streaming works)


I think it's because the Opus decoder in VLC expects to find an Ogg container, which is a no-go for streaming.


No go for RTP, as that's not the standard and an additional container doesn't provide anything useful, but it's fantastic for HTTP streaming.


I guess I should clarify streaming as in low-latency live streaming, where RTP is necessary - and Opus would be a perfect fit with its very low algorithmic delay.


Chrome and Firefox should both support Opus over RTP now as part of WebRTC.



