Working with VP8 Simulcast

Warning

This documentation is for reference only. We are no longer onboarding new customers to Programmable Video. Existing customers can continue to use the product until December 5, 2026.

We recommend migrating your application to the API provided by our preferred video partner, Zoom. We've prepared this migration guide to assist you in minimizing any service disruption.

What is simulcast

Simulcast is a standardized technique used to retain media quality when some subscribers have limited bandwidth. It is a mechanism for providing scalability to non-scalable video codecs such as VP8.

With simulcast, a client sends multiple versions of the same video simultaneously. Each version is encoded independently at a different resolution and frame rate; this way, a subscriber with limited bandwidth can receive a lower quality version of the media, but subscribers with more bandwidth can receive a higher quality video and their media quality is not degraded.

How Twilio uses simulcast

Twilio Video provides the option to use VP8 simulcast in Group Rooms via Twilio's Selective Forwarding Unit (SFU). To learn more about SFUs and media exchange in Peer-to-Peer vs. Group Rooms, see Understanding Video Rooms.

SFUs simply forward media and can neither transcode nor modify the video. When sending media with unicast (only sending one version of the video track), publishers need to reduce quality to adapt to the worst of their subscribers' bandwidths so that no subscriber is congested.

With VP8 simulcast, the SFU can forward higher quality video to higher bandwidth subscribers and lower quality video to lower bandwidth ones. The track publisher sends the track at several qualities and the SFU selects the most suitable one for each subscriber. Each subscriber then receives a single VP8 encoded video suited to their network conditions.
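
The per-subscriber selection described above can be illustrated with a minimal sketch. This is not Twilio's SFU logic: the `selectLayer` function, the layer bitrates, and the subscriber bandwidth figures are all hypothetical values chosen for illustration. Only the layer resolutions are taken from the 1280x720 row of the resolution table on this page.

```javascript
// Hypothetical simulcast layers published by one sender.
// Bitrates (kbps) are illustrative assumptions, not Twilio values.
const layers = [
  { name: 'low', width: 320, height: 180, kbps: 150 },
  { name: 'medium', width: 640, height: 360, kbps: 500 },
  { name: 'high', width: 1280, height: 720, kbps: 1700 },
];

// Pick the highest-bitrate layer that fits the subscriber's estimated
// downlink. Falls back to the lowest layer when nothing fits.
function selectLayer(availableLayers, availableKbps) {
  const sorted = [...availableLayers].sort((a, b) => a.kbps - b.kbps);
  let chosen = sorted[0];
  for (const layer of sorted) {
    if (layer.kbps <= availableKbps) {
      chosen = layer;
    }
  }
  return chosen;
}

// Hypothetical downlink estimates (kbps) for three subscribers.
const subscribers = { alice: 3000, bob: 600, carol: 100 };
for (const [name, kbps] of Object.entries(subscribers)) {
  console.log(name, '->', selectLayer(layers, kbps).name);
}
```

In this sketch the constrained subscriber receives the low layer without affecting what the other two receive, which is the isolation property simulcast is meant to provide.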

The following illustration shows the difference between unicast and simulcast in a Group Room.

How video is sent in Group Rooms with unicast and simulcast.

The SFU is critical to enabling simulcast. Simulcast is not recommended in Peer-to-Peer rooms, because there is no SFU mediating the data being sent. Simulcast in a Peer-to-Peer room means that each client is sending multiple differently encoded versions to every other participant in the room, which will consume more resources for both senders and receivers.

Pros and cons of simulcast

Simulcast offers the following benefits for Group Room participants:

VP8 subscribers can receive video adapted to their available bandwidth. This significantly improves the quality in Group Rooms with many heterogeneous Participants.
VP8 subscribers are isolated from each other so that a subscriber with a degraded network link does not affect the reception quality of other subscribers.

On the other hand, simulcast also has some drawbacks:

Simulcast only improves video quality in Group Rooms with three or more Participants.
On mobile devices, publishers' battery consumption is higher because the publisher encodes multiple versions of the same video track.
Publishers' bandwidth consumption is higher (up to double in some cases) because the publisher sends multiple versions of the same video track. This increase does not impact your Programmable Video costs as Twilio does not charge for upstream bandwidth (i.e. from the sender to Twilio's cloud).

Adaptive simulcast

Twilio's standard VP8 simulcast sends up to three layers of video at different resolutions. See the approximate resolutions of each layer in the Resolution and simulcast layers section. In some video conferencing contexts, the higher resolution layers that consume the most resources to encode may not be needed.

Twilio Video offers adaptive simulcast, which enables and disables simulcast layers dynamically to improve bandwidth and CPU usage. This helps save device resources in cases such as presentation and grid modes, when the application does not need a Participant's highest resolution video. Adaptive simulcast ensures that publishers are only encoding the spatial layers needed at a given moment.

For example, when someone is presenting in a video conference, you will frequently only display the presenter's video in a large format, and will display only thumbnails of other participants' video. The participants who are not presenting do not need to encode and send higher resolution video layers because their video is not highlighted. The same might also be true in a video conference in grid mode, where each participant's video is the same size and no one's video needs to be the highest quality.

In these situations, adaptive simulcast can detect which layers are being used by subscribers and automatically turn off encoding on the publisher's side for higher spatial layers that are not being used. As speakers change, adaptive simulcast will dynamically turn on or turn off the appropriate spatial layers, based on what subscribers in the room are using.

Note that adaptive simulcast will not disable any video layers when the Room is being recorded, to help produce high quality recordings.
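
As a rough illustration of the behavior described above, the sketch below decides which spatial layers a publisher would keep encoding. It is a simplified model, not the SDK's or the SFU's implementation: the function name `layersToEncode`, the layer indexing, and the choice to keep the lowest layer when no one is subscribed are assumptions.

```javascript
// Layers are indexed from 0 (lowest resolution) to 2 (highest resolution).
const LAYER_COUNT = 3;

// subscriberLayers: the layer index each subscriber currently consumes
//                   (hypothetical input for this sketch).
// isRecording:      when true, no layer is disabled.
function layersToEncode(subscriberLayers, isRecording) {
  const all = Array.from({ length: LAYER_COUNT }, (_, i) => i);
  if (isRecording) {
    return all;
  }
  // Assumption: with no subscribers, keep only the lowest layer.
  const highestNeeded =
    subscriberLayers.length > 0 ? Math.max(...subscriberLayers) : 0;
  // Keep every layer up to the highest one in use; higher layers are turned off.
  return all.filter((index) => index <= highestNeeded);
}

// Everyone shows this publisher as a thumbnail: only the lowest layer is encoded.
console.log(layersToEncode([0, 0, 0], false));
// One subscriber pins the publisher in a large view: all layers are encoded.
console.log(layersToEncode([0, 2, 0], false));
// The Room is being recorded: all layers stay on regardless of demand.
console.log(layersToEncode([0, 0, 0], true));
```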

Adaptive simulcast is currently available in the Twilio Video JavaScript SDK, Android SDK, and iOS SDK. Please review the documentation for each SDK for more information. If your application is currently using VP8 simulcast, we recommend that you switch to adaptive simulcast.

Limitations

Simulcast should only be used in Group Rooms. Using it in Peer-to-Peer Rooms does not improve quality and degrades application performance.
Simulcast is only supported for the VP8 video codec.

Resolution and simulcast layers

Twilio SDKs encode up to three spatial layers when simulcast is enabled. The following table shows which layers are typically generated for a given capture resolution. Note that this is only an approximation and the actual behavior may differ slightly. In the table, disabled means that the layer is not sent under those conditions: video at that resolution is not generated by the publisher and is not available at the SFU to be forwarded to subscribers.

Capture resolution	Layer 1	Layer 2	Layer 3
352x288	352x288	disabled	disabled
480x360	240x180	480x360	disabled
640x480	320x240	640x480	disabled
640x480 (with crop)	240x240	480x480	disabled
960x540	240x135	480x270	960x540
1024x768	256x192	512x384	1024x768
1024x768 (with crop)	240x192	480x384	960x768
1280x720	320x180	640x360	1280x720
1280x720 (with crop)	225x180	450x360	900x720
1920x1080	480x270	960x540	1920x1080
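
The rows above follow a simple pattern: each lower layer halves the width and height of the one above it, and smaller inputs produce fewer layers. The sketch below reproduces the table under that reading. The `simulcastLayers` function and its pixel-count thresholds are assumptions chosen only to match the table rows; they are not documented SDK behavior. For cropped rows, pass the cropped dimensions (for example 900x720 rather than 1280x720).

```javascript
// Approximate the simulcast layers for a given (post-crop) input resolution.
// The thresholds are assumptions picked to reproduce the table above.
function simulcastLayers(width, height) {
  const pixels = width * height;
  let layerCount;
  if (pixels < 480 * 360) {
    layerCount = 1; // e.g. 352x288
  } else if (pixels < 960 * 540) {
    layerCount = 2; // e.g. 480x360, 640x480
  } else {
    layerCount = 3; // e.g. 960x540 and above
  }

  // Build layers from lowest to highest resolution; each step doubles the size.
  const result = [];
  for (let i = layerCount - 1; i >= 0; i--) {
    const divisor = 2 ** i;
    result.push({
      width: Math.round(width / divisor),
      height: Math.round(height / divisor),
    });
  }
  return result;
}

console.log(simulcastLayers(1280, 720)); // three layers, lowest first
console.log(simulcastLayers(640, 480)); // two layers
console.log(simulcastLayers(352, 288)); // one layer
```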

Enable simulcast in your Twilio application

Simulcast can be enabled in Group Rooms. The following table illustrates Twilio's current support for simulcast:

Twilio Video SDK	Browser (or N/A)	VP8 Simulcast Support (only Group Rooms)
JavaScript	Chrome	Yes (SDK v1.7.0+)
JavaScript	Firefox	No
JavaScript	Safari	Yes (Safari 12.1+ with SDK 1.17.0+)
Android	N/A	Yes (SDK v2.1.0+)
iOS	N/A	Yes (SDK v2.1.0+)

Enable adaptive simulcast using the JavaScript SDK

Simulcast is disabled by default. You can enable simulcast on a per-Participant basis when connecting to a Room.

To enable adaptive simulcast, set preferredVideoCodecs="auto" in ConnectOptions when connecting to a video Room. The SDK will use VP8 simulcast, and will enable/disable simulcast layers dynamically, thus improving bandwidth and CPU usage.

Adaptive simulcast works best when used along with Client Track Switch Off Control and Video Content Preferences. These two flags allow the SFU to determine which simulcast layers are needed, thus allowing it to disable the layers not needed on publisher side.


const { connect } = require('twilio-video');

const room = await connect(token, {
  preferredVideoCodecs: 'auto',
  bandwidthProfile: {
    video: {
      contentPreferencesMode: 'auto',
      clientTrackSwitchOffControl: 'auto'
    }
  }
});

Please note the following limitations with adaptive simulcast in the JavaScript SDK:

Specifying preferredVideoCodecs="auto" will revert to unicast in the following cases:
- The publisher is using Firefox
- The publisher has preferred the H264 codec
- The Room is configured to support only the H264 codec
- Peer-to-Peer Rooms
When the Room is being recorded, the SFU will not disable any simulcast layers of the publisher's VideoTrack.
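
The fallback cases listed above can be summarized as a single predicate. The sketch below is only a restatement of that list for illustration: the `usesAdaptiveSimulcast` function and the shape of its argument (`browser`, `preferredCodec`, `roomCodecs`, `roomType`) are hypothetical and are not part of the twilio-video API.

```javascript
// Returns true when preferredVideoCodecs: 'auto' would be expected to use
// adaptive simulcast, and false when it would revert to unicast.
// All field names here are hypothetical, for illustration only.
function usesAdaptiveSimulcast({ browser, preferredCodec, roomCodecs, roomType }) {
  if (browser === 'firefox') return false; // publisher is using Firefox
  if (preferredCodec === 'H264') return false; // publisher preferred H264
  if (roomCodecs.length === 1 && roomCodecs[0] === 'H264') return false; // H264-only Room
  if (roomType === 'peer-to-peer') return false; // no SFU in Peer-to-Peer Rooms
  return true;
}

console.log(usesAdaptiveSimulcast({
  browser: 'chrome',
  preferredCodec: null,
  roomCodecs: ['VP8', 'H264'],
  roomType: 'group',
}));
```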

Enable standard simulcast using the JavaScript SDK

You can enable standard simulcast by setting simulcast: true in ConnectOptions when connecting to a video Room.


// Web JavaScript
// Remember that simulcast only needs to be enabled in media publishers
// See compatibility table above with supported browsers and required SDK versions

const room = await connect(token, {
    preferredVideoCodecs: [
      { codec: 'VP8', simulcast: true }
    ]
});

Any Group Room Participant with VP8 simulcast enabled publishes all their video tracks using VP8 simulcast. Once this is done, Twilio's video infrastructure leverages simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.

Enable simulcast using the iOS SDK

By default, simulcast is disabled. You can enable simulcast on a per-Participant basis when connecting to a Room. This is done using the ConnectOptions as shown in the following code snippet:


// Swift code
// Remember that simulcast only needs to be enabled in media publishers
// See the compatibility table above for required SDK versions

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}

Any Group Room Participant with VP8 simulcast enabled publishes all its video tracks using VP8 simulcast. Once this is done, Twilio's video infrastructure leverages simulcast tracks to provide the best possible quality to any subscriber without requiring any additional action from you.

Enable simulcast using the Android SDK

By default, simulcast is disabled. You can enable simulcast on a per-Participant basis when connecting to a Room. This is done using the ConnectOptions as shown in the following code snippet:


// Java code
// Remember that simulcast only needs to be enabled in media publishers
// See the compatibility table above for required SDK versions

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
        .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
        .build();

Simulcast and capture settings on mobile SDKs

To optimize video quality while minimizing CPU usage and bandwidth, it is recommended to use VP8 simulcast with the capture settings suggested below on each mobile platform.

iOS

Capture Frame Rate

24 FPS. When simulcasting, this will result in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames / second instead of the default of 30 reduces the CPU load on the VP8 software encoder.
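
The temporal layers mentioned above are each half the frame rate of the previous one. A minimal sketch of that arithmetic follows; the `temporalLayers` helper is a hypothetical name used only for illustration.

```javascript
// Each temporal layer runs at half the frame rate of the one before it.
function temporalLayers(captureFps, count = 3) {
  return Array.from({ length: count }, (_, i) => captureFps / 2 ** i);
}

console.log(temporalLayers(24)); // [ 24, 12, 6 ]
console.log(temporalLayers(30)); // [ 30, 15, 7.5 ]
```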

Capture Dimensions

1024x768 on most iPhones
1280x720 on iPhone X and models that do not have support for 1024x768
640x480 on iPhone 6s and earlier models

iOS devices support high resolution capture formats with ratios of 1.33:1 and 1.77:1. When simulcasting, it is often desirable to produce a squarish ratio (1.25:1) that can be viewed by subscribers in landscape or portrait, and as smaller thumbnails. Cropping is performed at the source by using a format request. Besides changing the ratio of the captured video, cropping also reduces the number of pixels that need to be processed by the software encoder. Using 1280x720 or 1024x768 for video capture will result in 3-layer simulcast with the layer structure as shown in the table above. Using 640x480 is recommended on older iPhones and will result in 2-layer simulcast.
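
To make the cropping arithmetic concrete, the sketch below narrows a landscape capture format to a 1.25:1 ratio and reports how many pixels the encoder no longer has to process. The `cropToRatio` helper is hypothetical and only models the width reduction; on iOS the actual crop is requested through the camera source's output format.

```javascript
// Crop a landscape format to the target width:height ratio by narrowing
// its width. The height is left unchanged.
function cropToRatio(width, height, targetRatio) {
  const croppedWidth = Math.min(width, Math.round(height * targetRatio));
  return { width: croppedWidth, height };
}

// Fraction of pixels removed by the crop.
function pixelSavings(width, height, cropped) {
  return 1 - (cropped.width * cropped.height) / (width * height);
}

const hd = cropToRatio(1280, 720, 1.25);
console.log(hd); // { width: 900, height: 720 }
console.log((pixelSavings(1280, 720, hd) * 100).toFixed(1) + '% fewer pixels');

const xga = cropToRatio(1024, 768, 1.25);
console.log(xga); // { width: 960, height: 768 }
```

The two results match the "with crop" rows for 1280x720 and 1024x768 in the resolution table.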

Other Considerations

If a Group Room is being used, it is recommended to remove the rotation tags with hardware acceleration using this API. It is also recommended to reduce the audio bitrate to a value tuned for speech content.

Sample Code

The above recommendations are implemented in this code snippet:


import AVFoundation
import TwilioVideo

struct CaptureDeviceUtils {

    // Produce 3 spatial layers ~ {960x768, 480x384, 240x192}. 1024x768 is captured on most phones
    // Produce 3 spatial layers ~ {900x720, 450x360, 225x180}. 1280x720 is captured on iPhone X
    static let kSimulcastVideoDimensions = CMVideoDimensions(width: 900, height: 720)
    static let kSimulcastVideoFrameRate = UInt(24)
    static let kSimulcastVideoBitrate = UInt(1800)

    /*
     * @brief Finds the smallest format that is at least as large as the size requested.
     *
     * @param device The AVCaptureDevice to query.
     * @param targetSize The minimum dimensions that are preferred.
     *
     * @return A format that satisfies the request.
     */
    static func selectFormatBySize(device: AVCaptureDevice,
                                   targetSize: CMVideoDimensions) -> VideoFormat {
        // Arranged from smallest to largest.
        let formats = CameraSource.supportedFormats(captureDevice: device)
        var selectedFormat = formats.firstObject as? VideoFormat
        for format in formats {
            guard let videoFormat = format as? VideoFormat else {
                continue
            }
            if videoFormat.pixelFormat != PixelFormat.formatYUV420BiPlanarFullRange {
                continue
            }
            let dimensions = videoFormat.dimensions
            // Cropping might be used if there is not an exact match.
            if (dimensions.width >= targetSize.width && dimensions.height >= targetSize.height) {
                selectedFormat = videoFormat
                break
            }
        }
        return selectedFormat!
    }
}

let options = CameraSourceOptions { (builder) in
    // Stripping rotation tags using hardware acceleration
    builder.rotationTags = .remove
}
camera = CameraSource(options: options, delegate: self)

// Assume front camera is available
let frontCamera = CameraSource.captureDevice(position: .front)
if let camera = camera {
    localVideoTrack = LocalVideoTrack(source: camera, enabled: true, name: "Camera")

    // Discover a simulcast format for the front camera
    let format = CaptureDeviceUtils.selectFormatBySize(device: frontCamera!,
                                                       targetSize: CaptureDeviceUtils.kSimulcastVideoDimensions)

    // Lower the frame rate to reduce CPU load, but still produce 3 temporal layers (f, f/2, f/4)
    format.frameRate = CaptureDeviceUtils.kSimulcastVideoFrameRate

    // Apply slight cropping to reduce CPU load, and provide square-ish video
    let croppedFormat = VideoFormat.init()
    croppedFormat.dimensions = CaptureDeviceUtils.kSimulcastVideoDimensions
    camera.requestOutputFormat(croppedFormat)

    // Start capturing from the front camera with the selected format
    camera.startCapture(device: frontCamera!, format: format) { (captureDevice, videoFormat, error) in
        if let error = error {
            self.logMessage(messageText: "Capture failed with error.\ncode = \((error as NSError).code) error = \(error.localizedDescription)")
        }
    }
}

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    if let localVideoTrack = localVideoTrack {
        builder.videoTracks = [localVideoTrack]
    }
    builder.isNetworkQualityEnabled = true
    builder.networkQualityConfiguration =
        NetworkQualityConfiguration(localVerbosity: .minimal, remoteVerbosity: .minimal)
    // Enable Vp8 simulcast, and cap the bitrate at 1.8 Mbps to reduce strain on the sender. Reduce audio bitrate for speech content.
    builder.encodingParameters = EncodingParameters(audioBitrate: 16,
                                                    videoBitrate: CaptureDeviceUtils.kSimulcastVideoBitrate)
    builder.preferredVideoCodecs = [Vp8Codec(simulcast: true)]
}

Android

Capture Frame Rate

24 FPS. When simulcasting, this will result in 3 temporal layers of 24 FPS, 12 FPS, and 6 FPS. Selecting 24 frames / second instead of the default of 30 reduces the CPU load on the VP8 encoder.

Capture Dimensions

1280x720 on Android devices that support VP8 hardware acceleration
1024x768 on more recent Android devices that do not support VP8 hardware acceleration
640x480 on older Android devices

Using 1280x720 or 1024x768 for video capture will result in 3-layer simulcast with the layer structure as shown in the table above. Using 640x480 for video capture will result in a 2-layer simulcast.

Other Considerations

It is recommended to reduce the audio bitrate to a value tuned for speech content.

Sample Code

The above settings are specified as part of the Video Format API as shown in the code snippet below:


import java.util.Collections;
import tvi.webrtc.MediaCodecVideoEncoder;

// Default to 640x480; use 1280x720 only when VP8 hardware encoding is available
VideoDimensions videoDimensions = VideoDimensions.VGA_VIDEO_DIMENSIONS;
if (MediaCodecVideoEncoder.isVp8HwSupported()) {
    videoDimensions = VideoDimensions.HD_720P_VIDEO_DIMENSIONS;
}
VideoFormat videoFormat = new VideoFormat(videoDimensions, 24);

LocalVideoTrack localVideoTrack = LocalVideoTrack.create(context, true, videoCapturer, videoFormat);

// Enable network quality information for local and remote participants
NetworkQualityConfiguration configuration =
            new NetworkQualityConfiguration(
                        NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL,
                        NetworkQualityVerbosity.NETWORK_QUALITY_VERBOSITY_MINIMAL);

ConnectOptions connectOptions = new ConnectOptions.Builder(accessToken)
            .enableNetworkQuality(true)
            .networkQualityConfiguration(configuration)
            .videoTracks(Collections.singletonList(localVideoTrack))
             // Cap the bitrate at 1.8 Mbps to reduce strain on the sender. Reduce audio bitrate for speech content.
            .encodingParameters(new EncodingParameters(16, 1800))
             // Enable Vp8 simulcast
            .preferVideoCodecs(Collections.singletonList(new Vp8Codec(true)))
            .build();