If anyone in the industry accidentally slept through the past 13 years, it might have been easy to miss the major detour we took away from at-scale, low-latency streaming video. After all, in 2009, Roger Pantos introduced the first draft specification for HTTP Live Streaming (HLS). With the advent of HLS came the easily foreseen jump to massive latencies for live streaming, often north of 40 seconds of delay per "live" stream, because at least three HLS segments (typically 4 to 10 seconds each) had to be recorded and then pushed out to end users before playback could begin.
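The arithmetic behind that delay is simple enough to sketch. Here's a back-of-the-envelope model; the three-segment buffer reflects typical HLS player behavior, while the pipeline overhead figure is purely illustrative:

```typescript
// Back-of-the-envelope model of classic HLS "glass-to-glass" latency.
// Players have traditionally buffered about three full segments before
// starting playback.
function estimateHlsLatencySeconds(
  segmentSeconds: number,      // typical HLS segments run 4-10 seconds
  bufferedSegments = 3,        // segments held before playback begins
  pipelineOverheadSeconds = 10 // illustrative encode/package/CDN overhead
): number {
  return segmentSeconds * bufferedSegments + pipelineOverheadSeconds;
}

console.log(estimateHlsLatencySeconds(10)); // 40 -- the "north of 40 seconds" case
console.log(estimateHlsLatencySeconds(4));  // 22 -- shorter segments still lag far behind real time
```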
Gone were the days of the 3–4 second latencies that had been around since Windows Media and early QuickTime videoconferencing, as the "modern" approach replaced the dedicated streaming media server with an ordinary HTTP server. And while the MPEG-backed Dynamic Adaptive Streaming over HTTP (DASH), ratified in 2011, did much to create an industry-standard HTTP segmented-streaming solution that replaced a whole slew of streaming options, it did nothing to lower latencies back to reasonable levels.
"If DASH could help reduce [all of the choices] down to HLS and DASH," wrote Will Law, chief architect for Akamai's Edge Technology Group, back in 2011 just after the ratification of DASH, "then the convergence that that offers, while not 'total,' is none-the-less desirable and beneficial at all levels of the media chain—encoders, distributors, and playback clients."
When Latency Impedes Communication
So much of that tolerance for delayed live streaming vanished in early 2020, as the onset of the pandemic drove almost all of us to some form of real-time delivery for virtual family gatherings or business meetings, and we scrambled for ways to get through conversations without the interminable 30–45 seconds between sentences. Getting back to lower latencies meant jettisoning the HTTP streaming solutions of the 2009–2011 timeframe in favor of something more akin to the streaming latencies we had in the late 1990s and early 2000s.
The return to real-time video in 2020, while necessary to be able to communicate across geographic distances during the pandemic, didn't really address the other major issues surrounding streaming that HTTP solutions were able to solve, namely scale and video quality.
The combination of these issues is not just a concern faced by consumers who are trying to create virtual extended-family gatherings or by houses of worship that are trying to hold interactive services. The need for lower latencies has hit every segment of the streaming industry, including enterprise, media and entertainment, sports, and education.
Low Latency Thrived in 2021
During a Streaming Media Connect 2021 event, the team at the Help Me Stream Research Foundation revealed data from the "State of Streaming Spring 2021" survey, including the key challenges facing personnel in the video service delivery segment. The data showed just how badly we need to solve the balance of video quality, scale, and latency. While video quality was still the number-one concern for one-third of respondents, latency was a close second (26%), with buffering and scalability tying for third place (16% each). We posit that those last three (latency, buffering, and scalability, which together made up the top choice of 58% of the survey's respondents) are actually a single issue when it comes to live streaming.
Rob Gambino, director of solutions at Harmonic, shared the stage during the Streaming Media Connect event and pointed out that even video quality has a huge effect on video latencies. "Video quality, when it comes to streaming, is a function of a couple of different things," he said. "It's how much time do you want to spend actually looking at the video to encode it, which can increase your latency from source to screen."
Buffering can be reduced and scalability improved by spending extra time compressing video, essentially eking out better quality at a lower bitrate. This solves the quality portion of the equation but still increases latency. So the question becomes how much visual artifacting is acceptable to reduce overall bandwidth while still delivering video at scale at anywhere close to real time.
Early in 2022, it's easy to assume that the balance among scale, latency, and video quality might be insurmountable and that HTTP streaming will continue to dominate "live" event streaming. But that's not necessarily the case, thanks to a number of initiatives that occurred in the past year. Let's look at a few of those.
Low Latency Comes to HTTP Streaming
Pantos is still around and is still working on that draft specification. Version 23 of the "Pantos spec" became RFC 8216 in 2017, when the request-for-comments stage, often the last step in turning a specification into a more formal Internet Engineering Task Force (IETF) document, was met with no objections to the use of HLS for on-demand content delivery at scale.
Based on presentations that Pantos has given over the past 3 years, including at two Streaming Media West events, it's clear that he's heard plenty of requests to modify the specification again, this time to lower latencies across the board. Some of those requests likely came from MLB Advanced Media, home to the only non-Apple author who signed on to the initial RFC 8216 specification and a company that delivers a number of live sporting events, such as professional baseball and pro hockey, as a streaming service. The next specification, sometimes billed as HLS 2.0 or HTTP Live Streaming 2nd Edition, is now in its 10th iteration, meaning that the move to low-latency HLS (LL-HLS) is almost complete.
Part of the impetus to move HLS to low latency came from the fact that the DASH Industry Forum (DASH-IF) published a series of low-latency interoperability specifications in 2017, now referred to as Low-Latency DASH (LL-DASH). As the team at THEO, a provider of video playback technologies, reminds us all, however, LL-DASH achieves lower latencies only by trimming traditional DASH segments into "smaller, non-overlapping chunks (often containing a handful of frames), which are all independent of one another."
The upside of both LL-HLS and LL-DASH is that the initial long wait—at the outset of a live HTTP streaming event—for multiple segments to be encoded no longer matters, as the origin server can send these smaller trimmed segments out to a player before the entire segment is encoded. In other words, we're still being forced to send segments, just smaller ones that require a higher frequency of delivery confirmation.
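The mechanics are visible at the HTTP layer: because chunked transfer encoding doesn't require a Content-Length up front, an origin can begin answering a segment request while the encoder is still producing that segment. Here is a minimal Node.js sketch of the idea; the chunk sizes, timing, and payloads are hypothetical stand-ins for real CMAF chunks:

```typescript
import { createServer } from "node:http";

// Minimal origin sketch: push a segment's chunks to the player as they are
// "encoded," instead of waiting for the whole segment to finish.
const CHUNK_INTERVAL_MS = 333; // hypothetical: one chunk per ~10 frames at 30 fps
const CHUNKS_PER_SEGMENT = 12; // 12 chunks of ~333 ms each ≈ one 4-second segment

createServer((req, res) => {
  // With no Content-Length set, Node responds with chunked transfer encoding.
  res.writeHead(200, { "Content-Type": "video/mp4" });

  let sent = 0;
  const timer = setInterval(() => {
    // A real origin would write moof/mdat boxes handed over by the encoder.
    res.write(Buffer.from(`chunk ${sent}\n`));
    if (++sent === CHUNKS_PER_SEGMENT) {
      clearInterval(timer);
      res.end(); // the segment is now complete
    }
  }, CHUNK_INTERVAL_MS);
}).listen(8080);
```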
In addition, a possible extension of the Common Media Application Format (CMAF) to include byte-range addressing means that the work started in 2021 to look at delivering both LL-HLS and LL-DASH in a common media format will continue into 2022.
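To make byte-range addressing concrete, here is an illustrative fragment of an LL-HLS media playlist in which each partial segment is advertised as a byte range of one growing segment file rather than as a separate file. The URIs, durations, and offsets are hypothetical:

```typescript
// Illustrative LL-HLS media playlist excerpt (not a complete playlist).
// Each EXT-X-PART advertises a byte range of the same growing segment file,
// so players can fetch partial segments before the full segment exists.
const llHlsByteRangeExcerpt = `#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:4
#EXT-X-PART-INF:PART-TARGET=1.0
#EXT-X-PART:DURATION=1.0,URI="seg273.mp4",BYTERANGE="125000@0",INDEPENDENT=YES
#EXT-X-PART:DURATION=1.0,URI="seg273.mp4",BYTERANGE="130500@125000"
#EXT-X-PART:DURATION=1.0,URI="seg273.mp4",BYTERANGE="118200@255500"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="seg273.mp4",BYTERANGE-START=373700`;

console.log(llHlsByteRangeExcerpt);
```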
Real Time Beyond HLS and DASH
Returning us to actual real-time delivery, though, requires going way back to something called Real-Time Streaming Protocol (RTSP). This standards-based protocol has been around since the late 1990s and continues to work very well. The downside of RTSP, and even of the newer Real-Time Messaging Protocol (RTMP), which is still widely in use years after its projected demise, is that these real-time solutions deliver highly time-sensitive packets of video and are, therefore, very susceptible to dropping that video when packets don't arrive in a timely manner.
WebRTC
That's where the WebRTC standard comes into play in modern real-time video delivery solutions. The Help Me Stream Research Foundation, as a 501(c)(3) not-for-profit research firm, offers services to test claims about a number of streaming solutions. One test we recently conducted focused on Wowza's real-time video streaming offering that scales out on its Wowza Streaming Cloud service. Our basic finding in October 2021 was that, with a few edge-case exceptions, the solution works as advertised, as we were able to successfully test a single published stream at average latencies of about 400 milliseconds across a wide variety of devices.
Beyond our initial browser-based tests, in which we introduced intermittent connectivity and simultaneously tested across a variety of networks, we were also able to broadcast via modified vision-mixing software using both WebRTC and RTMP, with latencies averaging around 600 milliseconds. In addition, latencies were low enough, even across LAN and cellular networks, that we found the audio "feedback loop" sweet spot that otherwise only comes with analog sound systems or low-latency digital audio broadcasts.
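WebRTC exposes enough plumbing in the browser to sanity-check latency figures like these. Here is a simplified sketch of the kind of probe such tests rely on, using the standard RTCPeerConnection.getStats() API; the connection setup is omitted, and the stats fields come from the W3C webrtc-stats spec:

```typescript
// Sample the two main observable latency contributors on a receiving
// RTCPeerConnection: network round-trip time and jitter-buffer delay.
async function sampleLatencyStats(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  report.forEach((stats: any) => {
    if (stats.type === "candidate-pair" && stats.nominated) {
      console.log(`network RTT: ${(stats.currentRoundTripTime * 1000).toFixed(0)} ms`);
    }
    if (stats.type === "inbound-rtp" && stats.kind === "video") {
      // Average time each emitted frame spent waiting in the jitter buffer.
      const bufferMs =
        (stats.jitterBufferDelay / stats.jitterBufferEmittedCount) * 1000;
      console.log(`avg jitter-buffer delay: ${bufferMs.toFixed(1)} ms`);
    }
  });
}

// Usage, assuming `pc` is an established, receiving peer connection:
// setInterval(() => sampleLatencyStats(pc), 1000);
```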
One of the newer companies in the real-time streaming space is also one of the most ambitious. Millicast, an outgrowth of Influxis, leverages WebRTC for interactive streaming experiences, and just before this issue went to press, it was acquired by Dolby Laboratories.
HESP
THEO has its own HTTP-based solution, High Efficiency Streaming Protocol (HESP), which it says delivers sub-second latencies. Following in the footsteps of the Pantos spec from all those years ago, THEO's Pieter-Jan Speelmans has submitted a spec to IETF for HESP. The aptly named draft-theo-hesp-01, dated Nov. 20, 2021, details an approach that Speelmans and THEO believe is superior to the current LL-HLS and LL-DASH approaches.
Unlike some solutions that shorten GOP segments to achieve lower latency, HESP avoids the pitfalls of too-short segments by utilizing two streams. The first is a keyframe-only initialization stream, which allows a player to begin (or rejoin) playback at virtually any frame; the second, known as the continuation stream, carries the ongoing video once playback has started.
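A conceptual sketch of that startup sequence follows; the URLs and helper functions here are hypothetical illustrations of the idea, not the actual HESP wire format (draft-theo-hesp-01 defines the real one):

```typescript
// Conceptual HESP-style startup: render one frame from the keyframe-only
// initialization stream immediately, then play on from the continuation
// stream. All URLs and helpers are hypothetical.
async function startPlayback(channel: string, nowMs: number): Promise<void> {
  // 1. Every frame in the initialization stream is a keyframe, so playback
  //    can begin at virtually any point in time with a single fetch.
  const keyframe = await fetch(
    `https://origin.example.com/${channel}/init?at=${nowMs}`
  ).then((r) => r.arrayBuffer());
  renderFrame(keyframe); // first image on screen almost immediately

  // 2. Join the long-GOP continuation stream at the same timestamp.
  const continuation = await fetch(
    `https://origin.example.com/${channel}/cont?from=${nowMs}`
  );
  await pipeToDecoder(continuation.body!);
}

// Stand-ins for a real renderer and decoder pipeline.
function renderFrame(frame: ArrayBuffer): void {
  console.log(`rendered keyframe (${frame.byteLength} bytes)`);
}
async function pipeToDecoder(body: ReadableStream<Uint8Array>): Promise<void> {
  // A real player would feed these bytes to a video decoder as they arrive.
  for await (const chunk of body as any) console.log(`chunk: ${chunk.length} bytes`);
}
```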
In addition, by optimizing the container format, the HESP Alliance claims approximately 10% bandwidth savings against current streaming scenarios, while enabling channel-change times of around 100 milliseconds.
The CDN's Role in Low-Latency Streaming
Given that most low-latency streaming, especially bidirectional streams such as those used in voice or video chat, doesn't scale well on its own, one of the components required to deliver real-time streaming at scale is some form of CDN. We're all familiar with the two basic types of CDN approaches. One focuses on a very large number of points of presence (PoPs), either staging on-demand content or pushing live-origin content out as far as possible, while still remaining on an optimized network, until it's delivered from a PoP that's physically close to the end user. The other approach focuses on beefing up the core network, with origin servers residing mainly at this core, so that as many streaming requests as possible take the shortest path between the origin server and the end user.
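A toy model shows why the topology matters for delivery time. Every number below is illustrative, not a measurement:

```typescript
// Toy comparison of one-way delivery time (ms) under the two CDN topologies.
// All figures are illustrative, chosen only to show the trade-off.

// Edge-heavy: content is staged at a PoP near the viewer, so the long haul
// from origin is off the critical path whenever the PoP already has the bytes.
const edgeHeavy = {
  originToPopMs: 80, // long haul, amortized across all viewers behind the PoP
  popToViewerMs: 15, // short last-mile hop
  popHitRate: 0.95,  // share of requests served directly from the PoP
};

// Core-heavy: fewer PoPs and a beefier core; most requests travel farther,
// but always over a short, well-optimized path from the origin.
const coreHeavy = { originToViewerMs: 70 };

const edgeExpectedMs =
  edgeHeavy.popHitRate * edgeHeavy.popToViewerMs +
  (1 - edgeHeavy.popHitRate) * (edgeHeavy.originToPopMs + edgeHeavy.popToViewerMs);

console.log(`edge-heavy expected: ${edgeExpectedMs.toFixed(1)} ms`); // ~19 ms
console.log(`core-heavy expected: ${coreHeavy.originToViewerMs} ms`); // 70 ms
```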
More recent hybrid approaches balance between the two, using what the industry calls "edge delivery" (and sometimes "edge computing" if there's a computational component, such as transcoding or authentication, being handled at an edge device in the PoP) as a way to speed up delivery of a base level of streams. This hybrid approach is used by a number of companies, including nanocosmos and Phenix Real Time Solutions.
Oliver Lietz founded nanocosmos about 25 years ago, initially as a coding company offering software development kits and coding to the broadcast industry. "We've now evolved to a service company providing an ultra-low-latency CDN for live streaming," he says, "and we have many customers worldwide who are using that for specific verticals like betting, bidding, [and] i-gaming."
Lietz notes that low-latency streaming allows levels of interactivity, including real-time polling, bidding, and audio or video chat. "Interactivity is key with that solution," he says, "so the vertical solutions go beyond the standard broadcast operations."
Phenix sees interactivity as also including synchronized watch parties, which it calls SyncWatch and for which it has two U.S. patents (US 10,601,914 and US 10,855,763). Phenix claims to provide "high quality synchronous video with less than 500 milliseconds of end-to-end latency at broadcast scale, anywhere in the world," and the company goes to great lengths to remind potential clients that its "real-time streaming latency is always less than a ½ second and with Phenix all viewers see the content synchronously."
Unlike Wowza's claims for its Real-Time Streaming at Scale, which, as previously noted, the Help Me Stream Research Foundation independently confirmed, we have yet to validate Phenix's claims of sub-500-millisecond delivery and synchronized viewing. However, Phenix's 2020 partnership with Verizon Media seems to hold promise, at least for delivery times of less than a second, as the two companies intend to use Verizon Media's global network "to enable sub-second latency for live sports at scale" and also provide a platform for real-time auction events.
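Phenix doesn't publish how SyncWatch works under the hood, but the generic technique behind synchronized viewing is well understood: all players agree on a shared wall-clock target position and nudge their playback toward it. Here is a minimal, generic sketch of that drift-correction loop, emphatically not Phenix's implementation:

```typescript
// Generic drift-correction loop for synchronized playback.
// `streamEpochMs` is the shared wall-clock time (e.g., from NTP) at which
// the stream's position 0 began; every viewer uses the same value.
function keepInSync(video: HTMLVideoElement, streamEpochMs: number): void {
  setInterval(() => {
    const target = (Date.now() - streamEpochMs) / 1000; // where we should be
    const drift = target - video.currentTime;           // + means we're behind
    if (Math.abs(drift) > 2) {
      video.currentTime = target; // way off: hard seek
    } else if (Math.abs(drift) > 0.1) {
      video.playbackRate = drift > 0 ? 1.05 : 0.95; // gently speed up or slow down
    } else {
      video.playbackRate = 1.0;   // close enough: play normally
    }
  }, 500);
}
```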
How Does Low Latency Impact Video Quality?
As we wrap up this year's "The State of Real-Time Streaming Delivery," let's look at a quote about video quality from Streaming Media Connect 2021. Harmonic's Gambino pointed out the benefits of balancing video quality (normally increased computational time) and latency (normally lower processing time) by using some of the emerging research into human visual system (HVS) perceptual quality. "The really great advent here is Content Aware Encoding," said Gambino, "using things like machine learning or artificial intelligence to be aware of what it is that's happening in a scene, where the important bits are, and being able to determine, on-the-fly, where to spend the processing budget."
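In practice, content-aware encoding amounts to a per-scene search: try candidate encodes, score each with a perceptual metric, and keep the cheapest one that clears a quality bar. A schematic sketch follows; the encodeScene and perceptualScore helpers are hypothetical stand-ins for a real encoder and a metric such as VMAF:

```typescript
// Hypothetical stand-ins for a real encoder and a perceptual metric (e.g., VMAF).
function encodeScene(scene: string, kbps: number): string {
  return `${scene}@${kbps}kbps.mp4`; // a real pipeline would invoke an encoder here
}
function perceptualScore(scene: string, encoded: string): number {
  return 90 + Math.random() * 10;    // a real pipeline would compute VMAF/SSIM
}

const LADDER_KBPS = [800, 1500, 2500, 4500, 8000]; // candidate bitrates, low to high
const QUALITY_FLOOR = 93; // e.g., a target VMAF-like score

// Schematic per-scene bitrate selection, the core idea behind
// content-aware encoding: spend bits only where the content needs them.
function pickSceneBitrate(scene: string): number {
  for (const kbps of LADDER_KBPS) {
    const encoded = encodeScene(scene, kbps);
    // Simple scenes (talking heads) clear the bar at low bitrates;
    // complex scenes (sports, confetti) need more bits.
    if (perceptualScore(scene, encoded) >= QUALITY_FLOOR) return kbps;
  }
  return LADDER_KBPS[LADDER_KBPS.length - 1]; // cap at the top rung
}

console.log(pickSceneBitrate("scene-042"));
```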
We're optimistic that the combination of all of these enhancements, such as better video quality using content-aware encoding, hybrid CDN approaches with edge delivery, and lower overall segment sizes for HTTP-based streaming, will result in a step change in the industry's ability to deliver real-time streaming at scale.