About a year ago I wrote a post about how to stream audio from a radio server using NodeJS, and since then I've made some upgrades to the code that made the app much more scalable and performant, which I would like to share in this post. Unlike that post, however, I'll try to cover everything you might need regarding streaming audio in general: encoding and decoding, keeping a global stream and multiple clients, how to deal with real time, and an updated take on streaming from a radio server.

First of all: why use Node? Well, one easy answer is: because it's simple. Node is very well known for making streaming extremely straightforward, and now, thanks to a few modules, audio is no different. Although subject to discussion, Node also scales well with streams, or at least well enough to get you started, which you will do fast enough, considering Node is just JavaScript and you do know JavaScript, right?

Before we start I'd like to give a big shout out to @TooTallNate, who, as well as owning most of the modules I'll talk about here (and being a Node legend), helped me a lot (and continues to help) whenever I get stuck with all of this audio mess. Thanks man, without you I could not have gotten this far.

Okay, so let's start off by tackling the biggest problem regarding streaming audio on the web: you can't just stream any type of binary data and expect the HTML5 audio tag to figure everything out by itself (at least not yet). We need to supply it with compressed audio data, and to do that we need a way to decode the binary data into a raw PCM stream (which is the most uncompressed form of audio) and then compress it into whatever format we want, such as MP3 or OGG (for this post I'll stick to MP3 only).

Decoding/Encoding MP3 with node-lame

Lame is an open-source MP3 encoder/decoder; it is the de facto tool for doing any sort of manipulation with MP3 files. Until some time ago, the only way to decode/encode MP3 in Node was to manually spawn a child process for Lame, feed data into its standard input (stdin) and read from its standard output (stdout), which sorta works…sorta. But we don't have to worry about that anymore, because we now have node-lame, which provides native bindings to the Lame codec that simply get the job done: you feed it raw PCM data (which could come from, say, a radio server, or from a file you decoded using another codec) and it gives you back a stream with valid MP3 data that you can do whatever you want with, be it stream directly to clients, write to a file, etc. And if you're worried about OGG, Mr. TooTallNate took care of it as well with node-ogg.

Even though the lib repos contain a lot of code examples, I’ll paste bits of how I use node-lame in my radio website (which I’ll talk about later):

  var icecast = require("icecast"), // I'll talk about this module later
      lame = require("lame");

  var encoder = lame.Encoder({channels: 2, bitDepth: 16, sampleRate: 44100});
  encoder.on("data", function(data) {
    sendData(data);
  });
  var decoder = lame.Decoder();
  decoder.on('format', function(format) {
    decoder.pipe(encoder);
  });

  var url = 'http://stream.pedromtavares.com:10000';
  icecast.get(url, function(res) {
    res.on('data', function(data) {
      decoder.write(data);
    });
  });

  var clients = []; // consider that clients are pushed to this array when they connect

  function sendData(data){
    clients.forEach(function(client) {
      client.write(data);
    });
  }

Considering you have an array of client response objects to write to (which you can manage by using something like express), those lines of code are all you need to successfully stream audio data from a radio server directly to an HTML5 audio tag, and node-lame makes it that much simpler.

Keeping a Global Stream and Multiple Clients

Chances are that when you’re dealing with streaming audio on the web, it’s gonna involve serving multiple clients. So far I have dealt with two options regarding this:

  1. You create a personalized child stream for each user connection all coming from a parent stream, and close that stream once the user is gone.
  2. You create a global stream and feed that directly to all clients by adding and removing them from a collection where the data gets streamed to.

As you might have guessed based on the previous code example, I prefer number 2. The reason is that creating a personalized stream with encoder/decoder instances for each user, although guaranteeing a perfect audio experience (since it's being served to that user only), is way too expensive in terms of both RAM and CPU. When I chose option 1, my VPS box would easily cap all processors at 100% once my stream got to about 50 listeners, so imagine what would happen if there were more.

With a global stream you only have to worry about one instance of the decoder and encoder, and leave it to Node to serve 128kbps to all of your listeners without losing audio integrity (so far I have managed 70 listeners with no issues). There are still problems with this approach though, such as having to keep the decoder/encoder write streams open all the time, which allocates a lot of memory due to instances of Buffer objects, and since these are allocated outside of the V8 memory heap, it's up to the operating system to clear them.

The Challenge of Real Time Streaming

Keeping things in real time for the users is hard when you’re not in control of what’s being streamed, like the case where you want to stream audio files. Depending on your machine, Node will finish reading an audio file in about 5 seconds or so, and if you’re piping that directly to the client response objects, a couple of problems occur:

  • If you’re streaming a sequence of files, such as a playlist, Node will read the entire playlist much faster than your users will hear it, meaning that they’ll be fed ‘old’ buffered data which can be garbage collected at any time. If you’re also informing what song is currently playing on the playlist, what users hear and what the playlist says is playing will get desynced very fast.
  • Two users will be hearing completely different parts of the song if one happens to connect a few seconds after the other, since the reading speed is much faster than ‘hearing speed’.

To solve this problem, we need to constantly pause and resume the file reading so it never gets too far ahead nor too far behind (which would cause pauses in the audio). In terms of code, this is as easy as creating a custom buffer which will only accept a certain amount of data, then pausing the stream for a determined period of time (like a second). Not surprisingly, TooTallNate also built a small module to encapsulate this logic, called node-throttle. But we still need to know how much data to stream per second, also known as the audio file's bitrate. Most files have a bitrate of 128kbps, but bit rates vary way too much and we can't count on them being the same unless we have 100% certainty about the files we're gonna serve, and remember, if we throttle the wrong number of bytes per second, the live stream will get desynced.
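To make the idea concrete, here's a rough sketch of the pause/resume logic (node-throttle does this for you, and more carefully; the 128kbps figure is just the common case mentioned above):

```javascript
// 128kbps expressed in bytes: roughly one second's worth of audio
var BYTES_PER_SECOND = 128000 / 8;

function throttleStream(stream, bytesPerSecond) {
  var sentThisSecond = 0;
  stream.on('data', function(chunk) {
    sentThisSecond += chunk.length;
    if (sentThisSecond >= bytesPerSecond) {
      stream.pause(); // reading got ahead of real time, wait
      setTimeout(function() {
        sentThisSecond = 0;
        stream.resume(); // a second has passed, keep reading
      }, 1000);
    }
  });
}
```

You would call it as `throttleStream(fs.createReadStream('track.mp3'), BYTES_PER_SECOND)`, assuming a 128kbps file.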

Bundled with FFmpeg (an open-source multimedia package containing various libraries and programs) comes a tool called ffprobe, which does exactly what we need: it analyzes media files and returns all sorts of information about them, including the bit rate. The gotcha of using ffprobe is that you need an entire, valid MP3 file for it to work, meaning that if you don't have the whole file on your server for it to analyze, you're out of luck. Since ffprobe is a command line tool, we can simply spawn it as a child process and parse its stdout for the information we need, which is exactly what node-ffprobe does.

With node-throttle and node-ffprobe combined, you can simulate a real time stream pretty decently. Of course it will never be 100% accurate, but, at least in my experience, it gets to about 90% accuracy (the stream desyncs by about 10-20 seconds), which is already good enough for me :)

  var throttle = require('throttle'),
      fs = require('fs'),
      probe = require('node-ffprobe');

  var track = 'track.mp3';
  probe(track, function(err, probeData) {
    var bit_rate = probeData.format.bit_rate;
    var currentStream = fs.createReadStream(track);
    var unthrottle = throttle(currentStream, (bit_rate/10) * 1.4); // this multiplier may vary depending on your machine
    currentStream.on('data', function(data){
      decoder.write(data); // consider the decoder instance from the previous example
    });
  });

Streaming From a Radio Server (SHOUTcast/Icecast)

Something that most people aren't aware of is how useful a radio server can be for streaming audio. For starters, radio servers like SHOUTcast and Icecast have been around for a long time, so there are a lot of tools that connect to and support them, such as DJing software like VirtualDJ and Traktor, as well as broadcasting software such as OS X's Nicecast, which can broadcast the output of any application running on your computer (like Skype or iTunes), any audio device (like your built-in microphone) or even system audio. This means that with the help of a radio server, you can easily stream live (in real time) from your microphone, a Skype conference (podcasts, anyone?), your iTunes songs and even your DJ mixes, if you happen to be a DJ (like me o/).

For my particular case, I chose SHOUTcast because they provide a server package for you to install on your machine. The installation was extremely easy (they even have a wiki for it) and I got it up and running in no time on my VPS at http://stream.pedromtavares.com.

As you might have noticed from the previous examples, the code for proxying a radio server is really simple with the help of node-icecast, which streams everything coming from the radio server in the form of MP3 data (which you can decode with node-lame) and, as its README states, you even get to handle 'metadata' events, which usually contain the current track information, along with other properties of your broadcast. With the help of web-sockets, you can set up a complete radio website with current track updates in no time:

  var icecast = require("icecast");

  var url = 'http://stream.pedromtavares.com:10000';
  icecast.get(url, function(res) {
    res.on('data', function(data) {
      decoder.write(data); // consider the decoder instance from the first example
    });
    res.on('metadata', function(metadata) {
      var track = icecast.parse(metadata).StreamTitle;
      publishToClients(track); // use your pub/sub lib of choice (I use Juggernaut) to publish tracks to all connected clients
    });
  });
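In case you're wondering what icecast.parse is doing up there: a raw metadata chunk is just a string of key='value'; pairs, so a simplified version of the parsing could look like this (the real module is more robust than this sketch):

```javascript
// Simplified sketch of Icecast metadata parsing. A raw metadata chunk
// looks like: "StreamTitle='Artist - Song';StreamUrl='';"
function parseMetadata(metadata) {
  var result = {};
  var regex = /(\w+)='([^']*)'/g;
  var match;
  while ((match = regex.exec(metadata.toString())) !== null) {
    result[match[1]] = match[2]; // e.g. result.StreamTitle = 'Artist - Song'
  }
  return result;
}
```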

With a radio stream you don’t need to worry about throttling requests or analyzing files since you will be manually feeding the radio server with data in real time anyway, so things just work.

How I Applied All of This

I started messing with this audio stuff with Node about a year and a half ago, and since then I have been maintaining my own radio website. It has two modes:

  1. When there is a DJ connected to the radio server, it proxies the radio server.
  2. When there is no DJ connected, it streams tracks (in real time) from ex.fm, and since they provide such a killer API, I also give users the ability to build their own playlists to be played.

I am keeping all the code open-source in a GitHub project if you want to check it out, and although there is much more code, the core of it has already been explained here.

That’s it! I’ll update this post with any other awesome tools related to audio streaming using Node that I manage to find and apply, but I hope what I gathered so far has been of some help. Cheers!
