Working with VP8 Simulcast

Overview

This guide introduces the Simulcast technique and explains how you can use it to enhance the video quality of your Group Room applications.

What’s Simulcast

An SFU (Selective Forwarding Unit) is a media infrastructure component used for scaling videoconferences. Twilio’s Group Rooms are based on an SFU that enables developers to add a large number of Participants to a video room by forwarding audio, video, and data information from each publisher to each of its subscribers. Because this forwarding takes place in Twilio’s cloud, there’s no additional client-side CPU or memory consumption as the number of Room Participants increases. However, SFUs only forward media: they can neither transcode nor modify the video. Hence, when some subscribers have limited downlink bandwidth, publishers need to reduce quality to suit the worst of them so that no subscriber is congested. As shown in the following figure, this is suboptimal because it constrains the quality for participants that could otherwise communicate at much higher quality.

In Group Rooms, by default the video quality is constrained to the worst of the available bandwidths of the participants.

Simulcast is a standardized technique designed for solving this problem. Simulcast involves the simultaneous sending of different versions of the same video track encoded independently at different resolutions and framerates. With Simulcast, the SFU has several versions of the track with different qualities, so that it can forward higher qualities to higher bandwidth subscribers and lower qualities to lower bandwidth ones. In more technical jargon, we say that Simulcast is a mechanism for providing scalability to non-scalable video codecs such as VP8.

Simulcast involves the simultaneous sending of several versions of the same video track so that the SFU infrastructure can forward different qualities to different subscribers depending on their network status and capabilities.

Note that Simulcast involves the track publisher (which needs to send the different track qualities) and the SFU (which selects the best quality for each subscriber). Participants that act only as subscribers are not aware of Simulcast, as they just receive a standard VP8-encoded video. Hence, they can neither enable nor disable Simulcast.
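The per-subscriber selection described above can be sketched as follows. This is only an illustration of the idea, not Twilio’s actual SFU logic: the layer names, the bitrates, and the helpers `selectLayer` and `singleEncodingBudget` are hypothetical and chosen for the example.

```javascript
// Hypothetical simulcast layers sent by one publisher, lowest quality first.
// The bitrates are illustrative assumptions, not values used by Twilio.
const layers = [
  { name: 'low', width: 320, height: 180, kbps: 150 },
  { name: 'mid', width: 640, height: 360, kbps: 500 },
  { name: 'high', width: 1280, height: 720, kbps: 1500 },
];

// With Simulcast: pick the highest layer that fits the subscriber's downlink.
// Falls back to the lowest layer when no layer fits.
function selectLayer(availableLayers, downlinkKbps) {
  const fitting = availableLayers.filter((layer) => layer.kbps <= downlinkKbps);
  return fitting.length > 0 ? fitting[fitting.length - 1] : availableLayers[0];
}

// Without Simulcast: the single encoding has to fit the slowest subscriber.
function singleEncodingBudget(downlinksKbps) {
  return Math.min(...downlinksKbps);
}

// Hypothetical subscribers and their downlink bandwidth in kbps.
const subscribers = { alice: 3000, bob: 600, carol: 200 };

for (const [name, kbps] of Object.entries(subscribers)) {
  console.log(`${name} receives layer: ${selectLayer(layers, kbps).name}`);
}
console.log(`Budget without Simulcast: ${singleEncodingBudget(Object.values(subscribers))} kbps`);
```

In this sketch each subscriber receives the best layer its own downlink can carry, whereas without Simulcast every subscriber would be limited by the slowest one.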

Enabling Simulcast in your Twilio application

Simulcast can be enabled in Group Room clients sending media to Twilio’s SFU. The following table illustrates Twilio’s current support for Simulcast:

Twilio Video SDK   Browser (or N/A)   VP8 Simulcast Support (only Group Rooms)
JavaScript         Chrome             Yes (SDK v1.7.0+)
JavaScript         Firefox            No
JavaScript         Safari             Yes (Safari 12.1+ with SDK 1.17.0+)
Android            N/A                Yes (SDK v2.1.0+)
iOS                N/A                Yes (SDK v2.1.0+)

Enabling Simulcast using the JavaScript SDK

By default, Simulcast is disabled. You can enable Simulcast on a per-Participant basis at Room connect-time. This is done using the ConnectOptions as shown in the following code snippet:

// Web JavaScript
// Remember that Simulcast only needs to be enabled in media publishers
// See the compatibility table above for supported browsers and required SDK versions
import { connect } from 'twilio-video';

const room = await connect(token, {
    preferredVideoCodecs: [
      { codec: 'VP8', simulcast: true }
    ]
});

Any Group Room Participant with VP8 Simulcast enabled publishes all its video tracks using VP8 Simulcast. Once this is done, Twilio’s video infrastructure leverages Simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.

Enabling Simulcast using the iOS SDK

By default, Simulcast is disabled. You can enable Simulcast on a per-Participant basis at Room connect-time. This is done using the ConnectOptions as shown in the following code snippet:

// Swift code
// Remember that Simulcast only needs to be enabled in media publishers
// See the compatibility table above for required SDK versions

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}

Any Group Room Participant with VP8 Simulcast enabled publishes all its video tracks using VP8 Simulcast. Once this is done, Twilio’s video infrastructure leverages Simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.

Enabling Simulcast using the Android SDK

By default, Simulcast is disabled. You can enable Simulcast on a per-Participant basis at Room connect-time. This is done using the ConnectOptions as shown in the following code snippet:

// Java code
// Remember that Simulcast only needs to be enabled in media publishers
// See the compatibility table above for required SDK versions

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
        .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
        .build();

Any Group Room Participant with VP8 Simulcast enabled publishes all its video tracks using VP8 Simulcast. Once this is done, Twilio’s video infrastructure leverages Simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.

Resolution and Simulcast layers

Twilio SDKs encode up to three spatial layers when Simulcast is enabled. The following table illustrates which layers are typically generated for a given capture resolution. Note that this is only an approximation and the real behavior may differ slightly. In the table, disabled means that the layer is not sent under those conditions (i.e. that quality is not generated by the publisher and hence is not available at the SFU to be forwarded to subscribers).

Capture resolution      Layer 1    Layer 2    Layer 3
352x288                 352x288    Disabled   Disabled
480x360                 240x180    480x360    Disabled
640x480                 320x240    640x480    Disabled
640x480 (with crop)     240x240    480x480    Disabled
960x540                 240x135    480x270    960x540
1024x768                256x192    512x384    1024x768
1024x768 (with crop)    240x192    480x384    960x768
1280x720                320x180    640x360    1280x720
1280x720 (with crop)    225x180    450x360    900x720
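For quick experiments, the table can be transcribed into a lookup structure. The sketch below is not part of any Twilio SDK: the names `SIMULCAST_LAYERS` and `activeLayers` are hypothetical, and the values inherit the approximate nature of the table.

```javascript
// Transcription of the table above. null marks a layer that is not sent.
// Keys are the capture resolutions exactly as written in the table.
const SIMULCAST_LAYERS = {
  '352x288': ['352x288', null, null],
  '480x360': ['240x180', '480x360', null],
  '640x480': ['320x240', '640x480', null],
  '640x480 (with crop)': ['240x240', '480x480', null],
  '960x540': ['240x135', '480x270', '960x540'],
  '1024x768': ['256x192', '512x384', '1024x768'],
  '1024x768 (with crop)': ['240x192', '480x384', '960x768'],
  '1280x720': ['320x180', '640x360', '1280x720'],
  '1280x720 (with crop)': ['225x180', '450x360', '900x720'],
};

// Returns the layers that are actually sent for a capture resolution,
// or undefined when the resolution is not listed in the table.
function activeLayers(captureResolution) {
  const row = SIMULCAST_LAYERS[captureResolution];
  return row ? row.filter((layer) => layer !== null) : undefined;
}

console.log(activeLayers('640x480').length);   // number of layers for VGA capture
console.log(activeLayers('1280x720').join(', '));
```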

Simulcast and Capture Settings on Mobile SDKs

To optimize video quality while minimizing CPU usage and bandwidth, it is recommended to use VP8 simulcast with the capture settings suggested below on each mobile platform.

iOS

Capture Frame Rate

24 FPS. When simulcasting, this will result in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames / second instead of the default of 30 reduces the CPU load on the VP8 software encoder.
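The temporal layer frame rates follow a halving pattern (f, f/2, f/4). A minimal sketch of that arithmetic, assuming each layer runs at half the rate of the previous one; `temporalLayerFrameRates` is a hypothetical helper, not an SDK function:

```javascript
// Computes the frame rate of each temporal layer, assuming every layer
// halves the rate of the previous one (f, f/2, f/4, ...).
function temporalLayerFrameRates(captureFps, layerCount = 3) {
  const rates = [];
  for (let i = 0; i < layerCount; i++) {
    rates.push(captureFps / 2 ** i);
  }
  return rates;
}

console.log(temporalLayerFrameRates(24).join(', ')); // prints "24, 12, 6"
console.log(temporalLayerFrameRates(30).join(', ')); // prints "30, 15, 7.5"
```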

Capture Dimensions

  1. 1024x768 on most iPhones
  2. 1280x720 on iPhone X and models that do not have support for 1024x768
  3. 640x480 on iPhone 6s and earlier models

iOS devices support high resolution capture formats with ratios of 1.33:1 and 1.77:1. When simulcasting, it is often desirable to produce a squarish ratio (1.25:1) that can be viewed by subscribers in landscape or portrait, and as smaller thumbnails. Cropping is performed at the source by using a format request. Besides changing the ratio of the captured video, cropping also reduces the number of pixels that need to be processed by the software encoder. Using 1280x720 or 1024x768 for video capture will result in 3-layer simulcast with the layer structure as shown in the table above. Using 640x480 is recommended on older iPhones and will result in 2-layer simulcast.
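The effect of cropping to a 1.25:1 ratio can be checked with a little arithmetic. The sketch below assumes the crop keeps the full height and trims only the width; `cropToRatio` and `pixelSavings` are hypothetical helpers used for illustration, not SDK functions.

```javascript
// Crops a landscape frame to a target width:height ratio by trimming the
// width and keeping the full height. The width is never enlarged.
function cropToRatio(width, height, targetRatio) {
  const croppedWidth = Math.min(width, Math.round(height * targetRatio));
  return { width: croppedWidth, height };
}

// Fraction of pixels the encoder no longer has to process after cropping.
function pixelSavings(original, cropped) {
  return 1 - (cropped.width * cropped.height) / (original.width * original.height);
}

const hd = { width: 1280, height: 720 };
const hdCropped = cropToRatio(hd.width, hd.height, 1.25);
console.log(hdCropped, pixelSavings(hd, hdCropped)); // 900x720, about 30% fewer pixels

const xga = { width: 1024, height: 768 };
const xgaCropped = cropToRatio(xga.width, xga.height, 1.25);
console.log(xgaCropped, pixelSavings(xga, xgaCropped)); // 960x768, about 6% fewer pixels
```

These results match the cropped top layers listed in the table above (900x720 and 960x768).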

Other Considerations

If a Group Room is being used, it is recommended to remove the rotation tags using hardware acceleration (the rotationTags option of CameraSourceOptions, as in the sample code below). It is also recommended to reduce the audio bitrate to a value tuned for speech content.

Sample code

The above recommendations are implemented in this code snippet:

import AVFoundation
import TwilioVideo

struct CaptureDeviceUtils {

    // Produce 3 spatial layers ~ {960x768, 480x384, 240x192}. 1024x768 is captured on most phones
    // Produce 3 spatial layers ~ {900x720, 450x360, 225x180}. 1280x720 is captured on iPhone X
    static let kSimulcastVideoDimensions = CMVideoDimensions(width: 900, height: 720)
    static let kSimulcastVideoFrameRate = UInt(24)
    static let kSimulcastVideoBitrate = UInt(1800)

    /*
     * @brief Finds the smallest format that is at least as large as the size requested.
     *
     * @param device The AVCaptureDevice to query.
     * @param targetSize The dimensions that are preferred.
     *
     * @return A format that satisfies the request, or the smallest format if none is large enough.
     */
    static func selectFormatBySize(device: AVCaptureDevice,
                                   targetSize: CMVideoDimensions) -> VideoFormat {
        // Arranged from smallest to largest.
        let formats = CameraSource.supportedFormats(captureDevice: device)
        var selectedFormat = formats.firstObject as? VideoFormat
        for format in formats {
            guard let videoFormat = format as? VideoFormat else {
                continue
            }
            if videoFormat.pixelFormat != PixelFormat.formatYUV420BiPlanarFullRange {
                continue
            }
            let dimensions = videoFormat.dimensions
            // Cropping might be used if there is not an exact match.
            if (dimensions.width >= targetSize.width && dimensions.height >= targetSize.height) {
                selectedFormat = videoFormat
                break
            }
        }
        return selectedFormat!
    }
}

let options = CameraSourceOptions { (builder) in
    // Stripping rotation tags using hardware acceleration
    builder.rotationTags = .remove
}
camera = CameraSource(options: options, delegate: self)

// Assume front camera is available
let frontCamera = CameraSource.captureDevice(position: .front)
if let camera = camera {
    localVideoTrack = LocalVideoTrack(source: camera, enabled: true, name: "Camera")

    // Discover a simulcast format for the front camera
    let format = CaptureDeviceUtils.selectFormatBySize(device: frontCamera!,
                                                       targetSize: CaptureDeviceUtils.kSimulcastVideoDimensions)

    // Lower the frame rate to reduce CPU load, but still produce 3 temporal layers (f, f/2, f/4)
    format.frameRate = CaptureDeviceUtils.kSimulcastVideoFrameRate

    // Apply slight cropping to reduce CPU load, and provide square-ish video
    let croppedFormat = VideoFormat.init()
    croppedFormat.dimensions = CaptureDeviceUtils.kSimulcastVideoDimensions
    camera.requestOutputFormat(croppedFormat)

    camera.startCapture(device: frontCamera!, format: format) { (captureDevice, videoFormat, error) in
        if let error = error {
            self.logMessage(messageText: "Capture failed with error.\ncode = \((error as NSError).code) error = \(error.localizedDescription)")
        }
    }
}

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    if let localVideoTrack = localVideoTrack {
        builder.videoTracks = [localVideoTrack]
    }
    builder.isNetworkQualityEnabled = true
    builder.networkQualityConfiguration =
        NetworkQualityConfiguration(localVerbosity: .minimal, remoteVerbosity: .minimal)
    // Enable Vp8 simulcast, and cap the bitrate at 1.8 Mbps to reduce strain on the sender. Reduce audio bitrate for speech content.
    builder.encodingParameters = EncodingParameters(audioBitrate:16, videoBitrate:1800)
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}

Android

Capture Frame Rate

24 FPS. When simulcasting, this will result in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames / second instead of the default of 30 reduces the CPU load on the VP8 encoder.

Capture Dimensions

  1. 1280x720 on Android devices that support VP8 hardware acceleration
  2. 1024x768 on more recent Android devices that do not support VP8 hardware acceleration
  3. 640x480 on older Android devices

Using 1280x720 or 1024x768 for video capture will result in 3-layer simulcast with the layer structure as shown in the table above. Using 640x480 for video capture will result in a 2-layer simulcast.

Other Considerations

It is recommended to reduce the audio bitrate to a value tuned for speech content.

Sample code

The above settings are specified as part of the Video Constraints API as shown in the code snippet below:

import tvi.webrtc.MediaCodecVideoEncoder;

VideoDimensions videoDimensions = VideoDimensions.VGA_VIDEO_DIMENSIONS;
if (MediaCodecVideoEncoder.isVp8HwSupported()) {
    videoDimensions = VideoDimensions.HD_720P_VIDEO_DIMENSIONS;
}
VideoConstraints videoConstraints = new VideoConstraints.Builder()
                                                        .maxFps(VideoConstraints.FPS_24)
                                                        .maxVideoDimensions(videoDimensions)
                                                        .build();

LocalVideoTrack localVideoTrack = LocalVideoTrack.create(context, true, videoCapturer, videoConstraints);

// Enable network quality information for local and remote participants
NetworkQualityConfiguration configuration =
            new NetworkQualityConfiguration(
                        NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL,
                        NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL);

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
            .enableNetworkQuality(true)
            .networkQualityConfiguration(configuration)
            .videoTracks(Collections.singletonList(localVideoTrack))
            // Cap the video bitrate at 1.8 Mbps to reduce strain on the sender. Reduce audio bitrate for speech content.
            .encodingParameters(new EncodingParameters(16, 1800))
            // Enable VP8 simulcast
            .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
            .build();

Pros and cons of Simulcast

When enabling Simulcast in your Group Rooms application you enjoy the following advantages:

  • VP8 subscribers enjoy differentiated quality adapted to their available bandwidth. This significantly improves the quality on Group Rooms with many heterogeneous Participants.
  • VP8 subscribers are isolated from each other so that a subscriber with a degraded network link does not affect the reception quality of other subscribers.

On the other hand, Simulcast also has some drawbacks:

  • Simulcast only improves video quality in Group Rooms with 3 or more Participants.
  • Publisher battery consumption is higher because multiple versions of the same video track must be encoded.
  • Publisher bandwidth consumption is higher (up to double in some cases) because multiple versions of the same video track are sent. Note that this increase does not impact your Programmable Video costs, as Twilio does not charge for upstream (i.e. from sender to Twilio’s cloud) bandwidth.

Limitations and known issues

  • Simulcast should only be used in Group Rooms. Using it in P2P Rooms does not improve quality and only degrades application performance.
  • Simulcast is only supported for the VP8 video codec.
  • The combination of Simulcast and oscillating bandwidth conditions at the publisher might generate suboptimal recording qualities. If the primary objective of your application is to have optimal recording video quality you might prefer not to enable Simulcast on it.
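
One way to respect these limitations in a JavaScript client is to decide at connect time whether to request Simulcast. The sketch below is only a suggestion: `buildConnectOptions`, the `prioritizeRecordingQuality` flag, and the room type strings in `SFU_ROOM_TYPES` are assumptions made for this example, so adapt them to how your application tracks its Room type. Only the `preferredVideoCodecs` entry mirrors the ConnectOptions shown earlier in this guide.

```javascript
// Room types assumed to be served by the SFU. These strings are an
// assumption for this sketch; use whatever your application stores.
const SFU_ROOM_TYPES = new Set(['group', 'group-small']);

// Builds ConnectOptions that request VP8 Simulcast only when it is useful:
// in Group Rooms, and when recording quality is not the main goal.
function buildConnectOptions({ roomType, prioritizeRecordingQuality = false }) {
  const simulcast = SFU_ROOM_TYPES.has(roomType) && !prioritizeRecordingQuality;
  return {
    preferredVideoCodecs: [{ codec: 'VP8', simulcast }],
  };
}

console.log(JSON.stringify(buildConnectOptions({ roomType: 'group' })));
console.log(JSON.stringify(buildConnectOptions({ roomType: 'peer-to-peer' })));
```

The resulting object can then be passed as the second argument to `connect(token, options)`.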
Luis Lopez