So, You Want To Stream Lots Of USB Cameras At Once

June 21, 2023

Let's say you're building a live camera feed for something. Maybe it's a 3D printer, or an RC car, or a Mars Rover analogue (website hasn't been updated in years). Here's a pretty typical requirements list:

  • Video feed sent over an IP network
  • <500ms latency
  • Reasonably efficient encoding (e.g. H.264)
  • 640x480 resolution, 30 fps
  • Recover from network failures

With one stream, you're in pretty good shape! Just plug in any off-the-shelf USB camera, run a quick GStreamer command on both ends, and boom, you're good to go. But as you add more cameras into the mix, you'll run into some hiccups at seemingly every point in the pipeline.

This blog post is the culmination of 2 years of work into camera streaming pipelines on NU ROVER, Northeastern University's team for the University Rover Challenge. Over that time, we've evolved from an unreliable six-camera setup requiring four separate devices to an efficient, easy-to-use, reliable system encompassing 14 cameras on a single computer. This post will summarize the experimentation we did, the decisions we made, and the lessons we learned, starting with the individual camera modules themselves and working up through the pipeline all the way to how the feed is displayed.

Cameras

A camera module connects an image sensor to some interface. The choice of image sensor doesn't matter much for our purposes; NU ROVER chooses camera modules based purely on FOV and resolution. The interface, however, greatly affects what hardware you can use for streaming.

CSI

Camera Serial Interface (CSI), sometimes called "MIPI," is the most barebones way to connect a camera: the protocol is designed specifically for cameras and connects directly to the CPU on boards like the Raspberry Pi. A good example of this type is the Raspberry Pi Camera, but other manufacturers supply cameras with various FOVs and sensors.

Pros:

  • Direct connection with processor, very low overhead

Cons:

  • Limited to the number of CSI ports provided by the platform (e.g. 1 for the Raspberry Pi, 2 for the Jetson Nano, 6 for the Jetson Orin)
  • Spec limits cable length to 30cm (although it usually works up to 2m)
  • Requires software tailored to the platform (e.g. raspivid), which may complicate integration with streaming software
  • More fragile cables than USB

In the past, we used multiple devices (2 Raspberry Pis, a Jetson Nano, and a Jetson TX2) to connect many CSI cameras into our system. This dramatically increased complexity and required installation of a network switch, taking up precious space inside the rover. Our new system does not make use of CSI cameras.

IP

IP cameras are often used as part of home security systems, making them readily available. They directly connect using Ethernet, removing the need for a separate device for encoding. Our team purchased a few, and I spent some time testing them, but I ruled them out pretty quickly. They're physically much harder to mount and integrate into a system, and since they're typically designed for recording and not live viewing, they often have very high latency.

Pros:

  • Minimal configuration needed
  • Designed to scale up to >10 camera systems
  • Durable and weatherproof

Cons:

  • Cameras are quite bulky and thus harder to find mounting points for
  • Cables are much larger and difficult to route
  • Requires a (potentially very large) network switch to connect multiple cameras
  • Provide limited or no encoder configuration (e.g. no way to adjust stream bitrate)
  • High latency (~1000-2000ms)

If you're working on a physically large, stationary, or spread-out system, IP cameras might be worth considering. Otherwise, the added bulk is a substantial drawback.

Analog Video

This year, our team partnered up with NUAV, Northeastern's unmanned aerial vehicles organization, to utilize a drone in part of the autonomous challenge. Their camera feed (used only during emergency teleoperation) was a 5.8 GHz off-the-shelf system common in FPV (first-person view) drone operation. These systems use an analog video signal to send video to the operator's goggles. While analog video is tempting as a self-contained system, it doesn't scale effectively to a multi-camera setup.

Pros:

  • Standalone system
  • Low latency
  • Graceful signal degradation if interference exists

Cons:

  • Requires a dedicated antenna, receiver, and display to view each feed
  • Each video feed requires its own hardware
  • Almost always fixed to 5.8 GHz (shorter range compared to 2.4 GHz WiFi or 900 MHz IP radios)
  • Limited number of channels/cameras (8)

We considered installing an analog video camera to use if our primary network link dropped out, but given the characteristics of the 5.8 GHz band, there are few situations in which this would be helpful. (If we can get a 5.8 GHz signal to the rover, we can almost definitely get a 2.4 GHz signal to it as well.)

USB

This brings us to USB cameras, arguably the most common type in consumer use today. These seem like an obvious choice, but there are a few complications when you scale beyond one or two cameras (which we'll get into).

Pros:

  • Cable and connector are durable but not too bulky
  • Standard software support across platforms
  • Small modules available with standardized mounting holes from Arducam, ELP, etc.
  • Unlimited number of cameras on each computer (we'll get to this...)

Cons:

  • Some overhead compared with CSI

It's no surprise USB was our first choice when we began developing the system. Even if you rely on CSI cameras for some systems, you're unlikely to be able to build a low-cost and practical system that incorporates 10 or more of them, meaning you'd have to mix in some USB cameras as well.

USB Bandwidth

We're all used to just plugging in USB devices arbitrarily and having everything work. Cameras, however, tend to push the USB 2.0 connection to its limit, and using several of them on one computer can be surprisingly difficult.

A bit of bandwidth math

Consider the bandwidth requirement of a typical video stream: 640x480 pixels, 30 fps.

Quick note: If you've worked with images, you might assume we need 8 bits per channel (24 bits total) to represent each pixel in the frame. This isn't quite true -- we can save a little space with some smart encoding. Most raw video from USB cameras is in "YUYV" format, which contains three values: Y (luminance, or "brightness"), U (color), and V (also color). The eye is more sensitive to small variations in luminance than to similar variations in color, so we can save space by sharing each pair of U and V values between two adjacent pixels (chroma subsampling). That works out to 8 bits of luminance plus an average of 4 bits per color signal for each pixel, giving a total of 16 bits per pixel per frame.

The requirement for a single frame is thus:

640 \times 480 \times 16 = 4915200 \text{ bits}

And the requirement for a single second of video is:

4915200 \times 30 = 147456000 \text{ bit/s} \approx 147 \text{ Mbit/s} \approx 18.4 \text{ MByte/s}

The theoretical maximum speed of USB 2.0 is 480 Mbit/s (60 MByte/s), and after subtracting protocol overhead we're left with about 53.2 MByte/s of usable bandwidth. For a single camera, this is fine! We've got plenty of bandwidth left over.

[Figure: a camera connected to a USB controller, using only a small portion of the total link capacity]

What happens when we add multiple cameras? It depends. Let's first consider the case where the cameras are each on a different USB controller -- we'll go over what this means in a second. This case is also fine.

[Figure: three parallel copies of the prior diagram]

Now what if we try to put multiple cameras on a single USB hub? Here's where we start to run into problems.

[Figure: three cameras connected to one USB controller through a hub, with the link between the hub and the controller oversaturated]

In a setup like this, two of the cameras will work fine, but the third will not: three streams would need 3 × 18.4 ≈ 55.2 MByte/s, just over the ~53.2 MByte/s the shared link can deliver. Specifically, you'll be able to open all three device files simultaneously, but you'll only receive video frames from two of them. The third camera won't raise an error, which can make troubleshooting annoying.

If it seems weird that a reasonable USB setup fails like this, that's because it is. Most devices don't come anywhere near the USB bandwidth limit. And if they do, they typically handle it gracefully: your SSD might copy files a little slower, or your printer might take a bit longer to load the document you sent it.

With a USB camera, though, there's no good way to handle a lack of bandwidth: it doesn't have the memory to delay frames and send them later, it can't send a smaller image since the software expects a specific resolution, and there's no way for it to coordinate with other cameras to share the bandwidth by lowering the framerate. Thus, you get nothing.

Let's take a step back and look at what this means in practice. A USB controller is the "root" of the tree of USB devices. It's a chip that connects the USB bus to the CPU, either directly or via a protocol like PCIe. In practice, most computers only have one or two controllers, then use internal USB hubs to provide more physical USB ports.

For instance, a Raspberry Pi 3 has four physical USB ports but only a single internal USB controller, meaning all of the ports share bandwidth. As a result, you're unlikely to get more than two USB cameras working in YUYV mode at this resolution simultaneously on the Pi.
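
To see how your cameras map onto controllers, two commands are usually enough (v4l2-ctl comes from the v4l-utils package; device names and paths will differ on your system):

lsusb -t
v4l2-ctl --list-devices

The first prints the USB topology so you can see which hub and controller each camera hangs off of, and the second groups the /dev/video* nodes by physical camera along with their USB paths.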

USB 3.0

However, most modern devices have USB 3.0 ports. Can those help us out? USB 3.0 has a theoretical maximum rate of 500 MByte/s, so we're good, right?

[Figure: an incorrect diagram in which the oversaturated link is "upgraded" to USB 3.0 by swapping the hub and controller for USB 3.0 variants]

Unfortunately, that's not how it works.

USB 2.0 uses a single differential pair of wires for signaling. USB 3.0 adds extra wires to the cable and connector, for a total of 3 differential pairs. The two additional pairs are used for "SuperSpeed" signaling (the fast data rates), but the original pair is still just a regular USB 2.0 connection. USB 3.0 traffic is handled completely separately from USB 2.0 traffic.

In essence, a "USB 3.0 cable" is a USB 3.0 cable and a USB 2.0 cable in the same housing. Consequently, a "USB 3.0 hub" is a USB 3.0 hub and a USB 2.0 hub in the same box, wired up to different pins in the same connectors. And you guessed it, a "USB 3.0 controller" is just a USB 3.0 controller and a USB 2.0 controller in the same chip.

Here's the output of lsusb -t on a Raspberry Pi 4:

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M

You can see that despite the Pi 4 having a single USB controller chip, it appears in Linux as two entirely separate USB buses -- one for USB 2.0 (the "480M" is the bus speed in Mbit/s) and the other for USB 3.0 (listed as "5000M").

Here's a diagram that more accurately captures what's going on:

[Figure: the oversaturated-link diagram again, but with a separate USB 3.0 link off to the side sitting at zero utilization]

Since the cameras are all USB 2.0, all of the data they send stays on the USB 2.0 pair all the way to the controller. (USB 3.0 cameras exist, but are expensive and rare outside of industrial settings.)

MJPG encoding

Your webcam probably runs at a much higher resolution than 640x480. The popular Logitech C920, for instance, supports 1080p video at 30 fps over USB 2.0, which seems like it'd use more bandwidth than we have, right? Something doesn't add up.

1920 \text{ pixels} \times 1080 \text{ pixels} \times 16 \text{ bits} \times 30 \text{ fps} \approx 995.3 \text{ Mbit/s} \approx 124.4 \text{ MByte/s} > 53.2 \text{ MByte/s}

These cameras support higher resolutions and framerates by compressing frames before sending them to the computer, using a technique known as MJPG (motion JPEG). Frames are captured by the camera, compressed using the JPEG algorithm, and then sent along the wire to the computer, at which point they can be decompressed.

Different cameras will use different compression ratios depending on the resolution and framerate, meaning we don't have hard numbers to go off of. The improvement from motion JPEG in a multi-camera setup is unfortunately not that great: compression ratios are set assuming a given camera is the only device on the bus, so the bandwidth reduction at smaller resolutions is much less significant. In practice, I've found that up to 3 cameras can coexist on a single bus at this resolution and framerate, meaning it's a pretty minor improvement.
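
To see which formats, resolutions, and framerates a particular camera offers, and to request its MJPG mode explicitly, something like the following works (again using v4l2-ctl from v4l-utils; the device path is an example):

v4l2-ctl -d /dev/video0 --list-formats-ext
gst-launch-1.0 v4l2src device=/dev/video0 ! image/jpeg,width=640,height=480,framerate=30/1 ! jpegdec ! autovideosink

The image/jpeg caps ask the camera for compressed frames instead of raw YUYV, and jpegdec decompresses them on the host.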

Your Toolbox

Here are the tools you have to try to solve the USB bandwidth problem:

  • Add more devices, and split up the cameras between them. NU ROVER did this initially, but it complicates your entire system quite a bit -- you now have to deal with a network switch, plus however many extra devices you brought in.
  • Find a way to add more USB controllers to a particular device. When we upgraded our systems to the Jetson Orin, we added a USB expansion card into the PCIe slot. In particular, we chose a "Quad Chip" model which contained four separate USB controllers. These are popular among VR enthusiasts -- it turns out connecting many stationary VR tracking towers leads to similar bandwidth issues as USB cameras.
  • Switch to the MJPG mode instead of the YUYV one. This adds a bit of latency since the device now has to decode the MJPG image, but the bandwidth improvements are often significant enough to make it worthwhile. Plus, the MJPG modes are typically more similar across camera models -- some cameras can be picky about what framerates they accept for YUYV streaming, for instance.
  • Plan which camera connects to which controller carefully. If you plan on dynamically starting and stopping streams, and you know you'll never want two specific cameras running simultaneously, you can put them on the same controller and they'll never have to fight over bandwidth. Conversely, if there's a set of cameras you plan on using all together, make sure not too many of them are on the same controller.
  • Apply some driver tweaks to squeeze out a little extra bandwidth. The Linux UVC driver provides a quirk flag (UVC_QUIRK_FIX_BANDWIDTH) that makes the driver calculate the bandwidth requirement for each device itself, which can sometimes be more accurate than what the camera reports (see the example after this list).
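
A sketch of how to enable that quirk, assuming your cameras use the uvcvideo driver and that FIX_BANDWIDTH is still quirk bit 0x80 on your kernel (check drivers/media/usb/uvc/uvcvideo.h if unsure):

sudo rmmod uvcvideo
sudo modprobe uvcvideo quirks=0x80

To make it persistent, the same option can go in a file under /etc/modprobe.d/.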

Encoding

You've gotten an image from the camera. Now what?

You'll probably want to use GStreamer to link together most of your pipeline. GStreamer is a framework that allows you to chain together different video elements, such as sources, encoders, payloaders, and decoders, into a pipeline that is executed all at once. The concepts in this post aren't GStreamer-exclusive, and I won't be providing many "ready-to-go" pipelines since they vary based on hardware encoding support and system, but I will at times assume that you're linking together your source, encoder, and payloader using GStreamer.

The most common codec for video compression is H.264, which is generally good enough for most purposes. Other codecs like H.265, AV1, and VP9 can offer lower bandwidth, but the benefits are minor at smaller video resolutions. The most important factor is hardware encoding and decoding support; if the encoding is handled by the CPU, it may not keep up with many video streams simultaneously, especially on lower-end devices.

The most important factors to tune are bandwidth and keyframe interval. Bandwidth, in this case, is the amount of video data sent over the network per second. If your bandwidth is set too high, you'll start to see artifacts as packets are dropped whenever your network link slows down (for instance, if you have a wireless link and your robot drives behind a wall). If it's set too low, your video quality will suffer and your feed will be full of compression artifacts. If you're operating over a wired Ethernet connection, you can probably divide the bandwidth of the Ethernet link by however many streams you plan to have simultaneously, but take into account any other network traffic on the machine, too. NU ROVER uses a bandwidth of 1 Mbps for each 480p stream.

Keyframe interval, also known as "I-frame interval," refers to the time in between I-frames in the video signal. H.264 encoded video consists of I-frames (frames which contain a complete image), P-frames (which reference previous frames), and B-frames (which can reference future frames and are only relevant in the context of a pre-recorded video). The purpose of P-frames is to improve the video compression by not sending the same data in every frame -- if your camera feed is mostly a still image, it is wasteful to send that entire image on every frame when you can instead only send the specific areas of the image which changed.

If you have too many I-frames, you will devote too much of your bandwidth to them and have worse picture quality as a result. However, if you have too few, you may have periods where you can't see the entire image if an I-frame gets dropped due to network conditions. This usually manifests itself as a gray screen in which moving areas gradually return to color while motionless areas remain gray until the next I-frame. NU ROVER uses 1 keyframe every 30 seconds.
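
As a concrete illustration with the software encoder used in the examples below: x264enc takes its bitrate in kbit/s and its keyframe interval in frames, so 1 Mbps with one keyframe every 30 seconds at 30 fps comes out to roughly:

... ! x264enc tune=zerolatency bitrate=1000 key-int-max=900 ! ...

Hardware encoder elements expose the same knobs under different element and property names, so treat this only as a reference point for the values.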

Protocols

Now that you have your encoded video feed, you'll need to find a way to send it over the network. There are a few existing protocols that can help you here. Of course, you'll need to use the same protocol on both sides of the connection.

Motion JPEG

Motion JPEG -- the same MJPG we saw as a USB transfer format -- also works for sending video over a network, frame by frame, usually over HTTP. Each frame is compressed individually, making it one of the simplest protocols. However, because there's no inter-frame compression, it uses much more bandwidth than other protocols to stream video at a given quality.

HTTP Live Streaming

HTTP Live Streaming (HLS) is a protocol used to stream H.264 videos over the internet. It works by dividing the video into chunks, then serving each chunk in turn using HTTP. This chunking process introduces a high amount of latency, making HLS unsuitable for a low-latency feed.

Real-time Transport Protocol

Real-time Transport Protocol (RTP) is one of the simplest ways to send video over a network. It sends data over UDP, meaning network errors and dropped packets are not corrected like they would be with a TCP-based protocol. This is ideal for low-latency applications -- if one frame isn't received correctly, you don't want to spend time retransmitting it when you could instead be transmitting the next frame. However, RTP could be a poor choice for streams where quality is more important than latency.

RTP works by payloading data into packets and sending them to a predefined UDP host and port. This payloading process will look different depending on the codec you choose -- for now, let's assume we're using H.264. GStreamer provides the rtph264pay plugin to payload H.264 data, as well as the rtph264depay plugin to depayload it on the other end.

For example, let's assume the video server is on 192.168.1.10, the client is on 192.168.1.11, and the video stream should be on port 9090. We can set up an RTP stream by running this GStreamer pipeline on the server:

gst-launch-1.0 videotestsrc ! video/x-raw,framerate=30/1,width=640,height=480 ! x264enc tune=zerolatency ! rtph264pay ! udpsink host=192.168.1.11 port=9090
  • videotestsrc is just a generic test video source (color bars)
  • The video/x-raw part forces the output of videotestsrc to have the specified resolution and framerate
  • x264enc is the encoder (which runs on the CPU -- something to avoid)
  • rtph264pay payloads the stream for RTP
  • udpsink sends those payloaded packets over the network to the specified host and port (our client)

Then, on our client, we can run this to decode the stream:

gst-launch-1.0 udpsrc port=9090 ! application/x-rtp,payload=96 ! rtph264depay ! avdec_h264 ! autovideosink
  • udpsrc reads incoming packets on a particular UDP port
  • The application/x-rtp part says that those packets should be interpreted as an RTP stream
  • rtph264depay turns the packets into a raw H.264 stream
  • avdec_h264 decodes the H.264 stream (on the CPU)
  • autovideosink displays the stream

To scale this to multiple camera feeds, we just need to choose different UDP ports for each feed. There are plenty of available port numbers for this.

Real Time Streaming Protocol

The solution above using RTP leaves a few things to be desired:

  • Having to manually assign port numbers is annoying
  • Starting or stopping streams must be done by manually launching or killing processes on the server
  • A raw RTP stream (with the dynamic payload type 96 used above) can't simply be opened in VLC or other standard media players
  • The server must know the IP address of the client, not the other way around

Real Time Streaming Protocol, or RTSP, is a protocol built on top of RTP which can solve these shortcomings. An RTSP server at first behaves much like an HTTP server, and RTSP URLs look similar to HTTP URLs.

Let's look at what happens when a client tries to load rtsp://192.168.1.10/video:

  • The client opens a connection to 192.168.1.10 on port 554 (the default RTSP port)
  • The client sends an OPTIONS request to the server for /video
    • The server responds with a list of request types accepted (usually OPTIONS, DESCRIBE, PLAY, TEARDOWN...)
  • The client sends a DESCRIBE request for /video
    • The server responds with a list of available video formats
  • The client sends a SETUP request for /video, and provides a pair of open ports: an even-numbered port for receiving video data, and an odd-numbered port for receiving UDP control signals using RTCP (RTP Control Protocol).
    • The server responds with its own pair of ports
  • The client sends a PLAY request for /video
    • The server begins sending video data to the client on the specified ports
  • The client sends a TEARDOWN request for /video
    • The server stops sending video data to the client

RTSP provides a few other methods, including PAUSE, which can be useful for interruptible streams.
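
For a sense of what this looks like on the wire, the exchange is plain text, much like HTTP. A rough sketch of the first request and response (real responses carry a few more headers):

OPTIONS rtsp://192.168.1.10/video RTSP/1.0
CSeq: 1

RTSP/1.0 200 OK
CSeq: 1
Public: OPTIONS, DESCRIBE, SETUP, PLAY, TEARDOWN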

To play around with RTSP, you can use the gst-rtsp-launch project:

gst-rtsp-launch "( videotestsrc ! video/x-raw,framerate=30/1,width=640,height=480 ! x264enc tune=zerolatency ! rtph264pay pt=96 name=pay0 )"

Then, connect with GStreamer's rtspsrc plugin:

gst-launch-1.0 rtspsrc location=rtsp://127.0.0.1:8554/video latency=0 ! rtph264depay ! avdec_h264 ! autovideosink

Or, connect with VLC by entering the RTSP URL into the "Open Network Stream" window.

If you're building out a system with multiple video streams, you'll want to use the GStreamer bindings to define your own RTSP server's request handling logic. Here's an example of how to use the Python bindings, but bindings for C++ and other languages also exist. NU ROVER has our own in-house implementation which we may decide to open-source once it stabilizes, assuming we have time to do so.

Extra Signaling Logic

One limitation of RTSP is that by default there is no way for a client to list the streams available on the server. If you don't want to maintain a list of streams on both sides of the connection, you may want to add some extra logic to handle this.

One approach would be to try to send this data over the RTSP connection somehow, say, by returning it in response to a DESCRIBE request for a certain URL. Doing this would require the ability to embed this handling logic into whatever RTSP client and server you use, which could be difficult.

The approach NU ROVER took instead was to use another communication channel which we already had (ROS messages) to send this data. However, you could rely on an HTTP API, a database, or some other method. You might want such a protocol to handle:

  • Listing available stream paths on the server
  • Sending metadata for browsing (stream name, description, etc)
  • Sending technical metadata (resolution, framerate, camera orientation, bandwidth requirements, etc)
  • Handling "exclusivity" of streams (e.g. if two streams are from the same camera but at different resolutions, and only one can be used at a time)
  • Sending calibration data and exact positions for relevant cameras, to enable computer vision uses on the other end of the RTSP stream
  • Enabling or disabling baked-in overlays in the video signals (timestamp, camera name, etc)
  • Pausing the video stream to take high-resolution "snapshots" of the video feed

Clients

What client are you going to use to decode and display the video signal? You'll likely want your video feeds to be integrated into whatever control UI you use for other operations, so this will depend on your situation.

VLC

VLC Media Player is a common program for playing various media files, and it can play video from an RTSP source. Go to "File" → "Open Network Stream" and enter your RTSP URL.

VLC is typically meant for watching movies or other prerecorded content, meaning it buffers video to smooth out playback, dramatically increasing latency. While there are ways to reduce the latency (I couldn't find a good comprehensive source, so just search for "VLC RTSP low latency"), I haven't had much success. I'd suggest keeping VLC around to help with troubleshooting streams, but not relying on it in your "production" system.
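
For reference, the tweak that usually gets suggested is shrinking VLC's network cache, which is specified in milliseconds (the URL here is a placeholder); as noted above, don't expect miracles:

vlc --network-caching=100 rtsp://192.168.1.10:8554/video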

GStreamer

GStreamer's rtspsrc can also load video data over RTSP, and its glimagesink plugin can display video data in a window, making it functional as a standalone RTSP viewer. (Remember to specify latency=0 for rtspsrc.) However, if you want your video feed to be displayed in the same window as the rest of your UI, you'll likely need something more complex.

Qt, etc.

Bindings often exist to display media from a GStreamer pipeline in applications built with frameworks like Qt (QtGStreamer), React Native (react-native-gstreamer), and others. You can work out a pipeline using the gst-launch-1.0 command, then copy it into your application.

Browsers

Web-based user interfaces are becoming much more popular, and NU ROVER has our own based on React. Of course, you can't run arbitrary C++ code inside of a browser sandbox, nor can you access UDP/TCP ports directly, so embedding a GStreamer widget directly in the browser isn't feasible.

Browser Specifics

Sending low-latency video to a web browser is more difficult than it would seem. Browsers are built for media consumption, meaning they often buffer video to smooth out playback at the cost of higher latency. There are quite a few ways to include videos in a web browser, none optimal.

HTTP Live Streaming

HLS seems like it's exactly right for this use case, but the required "chunking" of video data adds a high amount of latency, making it unusable for a low-latency video feed.

Raw H.264

It is possible to directly send an H.264 stream to the browser. However, pre-existing solutions for streaming H.264 are difficult to come by. One example is the ROS web_video_server package, which implements its own HTTP server to handle this.

The benefit of this approach is that video can be embedded in a website with a <video> tag, and the video is compressed when entering the browser, reducing bandwidth use and allowing the browser to choose the most efficient decoder for the platform. The important drawback is that there is no way to control the amount of buffering that the browser does to the video feed, and browsers like to buffer a significant chunk of frames to ensure smooth playback. NU ROVER experimented with using JavaScript to automatically seek the video to the most recent frame in the buffer, but found this difficult and unreliable.

Motion JPEG

One potential approach is to run a GStreamer pipeline on the client which decodes the H.264 data, then compresses each frame as a JPEG, then sends those into the browser.

Sending a series of images to the browser can be done using the multipart/x-mixed-replace mimetype: the server will send an HTTP response with Content-Type: multipart/x-mixed-replace;boundary=--<boundary_name>, then send a series of MJPG frames, separated by --<boundary_name> lines.
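
On the wire, a stream in this style looks roughly like the following, here using "frame" as the boundary name (each part carries its own JPEG; the lengths are placeholders):

HTTP/1.1 200 OK
Content-Type: multipart/x-mixed-replace;boundary=--frame

--frame
Content-Type: image/jpeg
Content-Length: <size of frame 1>

<JPEG bytes for frame 1>
--frame
Content-Type: image/jpeg
Content-Length: <size of frame 2>

<JPEG bytes for frame 2>
...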

This is the approach NU ROVER currently uses. We've found that encoding and decoding JPEG images adds a small but acceptable amount of latency. We use an in-house project to serve these images, starting and stopping the underlying GStreamer pipelines and RTSP streams on demand.

A tip for using GStreamer with an application like this: to send buffers from a GStreamer pipeline into your application, you'll want to use appsink. Here's an example of code that uses this.
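
As a rough sketch, a client-side pipeline for this usually has the following shape; the application pulls each JPEG buffer out of the appsink (via the GStreamer bindings) and writes it into the multipart HTTP response. The URL and quality value here are placeholders:

rtspsrc location=rtsp://192.168.1.10:8554/video latency=0 ! rtph264depay ! avdec_h264 ! videoconvert ! jpegenc quality=70 ! appsink name=frames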

Motion BMP?

In theory, you could send a multipart/x-mixed-replace payload with any image format, not just JPEG. Sending an uncompressed bitmap image (.bmp), for instance, would remove the need to encode and decode the JPEG but would increase the data sent to the browser. In our case, we're running the client on the same machine as the browser, meaning this added data transfer is largely irrelevant.

This approach is a bit harder to implement than Motion JPEG: effectively, you just need to add the proper headers to make each frame a BMP file, but we haven't found a good GStreamer plugin that does this. In theory, this could be done in the HTTP server when the GStreamer buffers are read.

Stream Limit

If you try to run more than six concurrent Motion JPEG streams into a browser, you'll notice that beyond the first six, none of them load. To prevent creating too many HTTP connections when a site loads, browsers will restrict the number of concurrent HTTP connections to a single host to six.

In Firefox, it's possible to change this setting by going to about:config and changing the network.http.max-persistent-connections-per-server setting from 6 to a higher value, say, 20. However, there is no way to change this setting in Chrome.

One workaround, if you have a DNS server handy or can modify /etc/hosts, is to configure multiple subdomains for the same server, as described in this StackOverflow answer.

An example /etc/hosts file could look like this:

192.168.1.10 cameras1.local
192.168.1.10 cameras2.local
192.168.1.10 cameras3.local
192.168.1.10 cameras4.local

Alternatively, a DNS server could be used with a wildcard record to allow any subdomain to work.
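
With dnsmasq, for example, a single wildcard entry maps every subdomain of a chosen name to the camera server (the hostname here is a placeholder):

address=/cameras.local/192.168.1.10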

Then, ensure that no more than six streams are loaded using the same subdomain, and all streams should work fine.

WebRTC

One potential approach is using WebRTC, which is built for low-latency applications like video conferencing. We haven't investigated this approach yet for a few reasons. WebRTC is primarily aimed at browser-based applications exchanging data with each other, not receiving data from a non-browser application, so example code for non-browser environments streaming into browsers is hard to find. WebRTC is also built around the assumption that it needs to traverse NAT using a STUN server -- not only is that not necessary for us, it's impossible as our network runs completely offline. That said, the use of WebRTC for low-latency streaming holds potential. I'd love to see this investigated, and hope I'll get the time to play around with it myself at some point.

Conclusion

We've walked through the steps in building a camera streaming pipeline, from the choice of individual camera modules to the framework used to display them to the user. This guide is the resource I wish our team had when developing our streaming pipeline, and covers many of the pitfalls and decisions that will need to be made. It's far from complete; there are many techniques and topics in streaming to explore that simply fall outside of what we've worked with, but I hope it's useful in whatever system you're building nonetheless.