Developing High Quality Video Applications

Overview

This guide provides advice for developing high-quality Twilio Video applications. For an optimal end-user experience, we highly recommend that you read the complete Twilio Programmable Video documentation and tailor our general recommendations provided here to your specific use-case.

Table of Contents

Guide for the Impatient

Use the links below to jump directly to the recommended settings for each use-case.

Use-case                      Desktop Clients         Mobile Clients
P2P Room                      Recommended Settings    Recommended Settings
Group Room (grid)             Recommended Settings    Recommended Settings
Group Room (collaboration)    Recommended Settings    Recommended Settings
Group Room (presentation)     Recommended Settings    Recommended Settings

What does Quality Mean?

Quality is an elusive concept that may have different meanings in different contexts. In Twilio Programmable Video, quality is a synonym of Quality of Experience, understood as how well a video application meets end users’ needs and addresses their expectations.

Videoconferencing is the most typical use-case of real-time video applications. Such applications let end users communicate “as they do face-to-face,” so end users expect high fidelity (e.g. high resolution and high frame-rate) and low latency (i.e. real-time conversational interaction). However, other aspects, such as battery consumption and the availability of computing and networking resources, also shape the quality of experience. Some of these variables trade off against one another: for example, increasing the video resolution also increases battery consumption and networking costs.

Hence, before you start developing a high-quality video application, ask yourself: what do end users need and expect? A precise answer to that question will help you make the most appropriate decisions for quality optimization.

Concepts and Terminology

The following concepts and definitions may be useful:

Resolution

Video tracks can be understood as sequences of still images each of which is encoded as a matrix of pixels. The resolution refers to the dimensions of such a matrix expressed as width x height. The following resolutions are common:

Resolution                                   Dimensions (pixels)
FullHD (Full High Definition), aka 1080p     1920x1080
HD (High Definition), aka 720p               1280x720
qHD (Quarter High Definition), aka 540p      960x540
VGA (Video Graphics Array)                   640x480
QCIF (Quarter Common Interface Format)       176x144
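As a small illustration, the presets in the table can be encoded as data and compared by pixel count. This TypeScript sketch is self-contained; `RESOLUTIONS`, `pixelCount`, and `pixelRatio` are hypothetical helper names, not part of any Twilio SDK.

```typescript
// Resolution presets from the table above, encoded as data.
// RESOLUTIONS, pixelCount and pixelRatio are illustrative helpers,
// not part of any Twilio SDK.
type ResolutionName = "FullHD" | "HD" | "qHD" | "VGA" | "QCIF";

interface Dimensions {
  width: number;
  height: number;
}

const RESOLUTIONS: Record<ResolutionName, Dimensions> = {
  FullHD: { width: 1920, height: 1080 }, // aka 1080p
  HD: { width: 1280, height: 720 },      // aka 720p
  qHD: { width: 960, height: 540 },      // aka 540p
  VGA: { width: 640, height: 480 },
  QCIF: { width: 176, height: 144 },
};

// Number of pixels in one frame (the size of the pixel matrix).
function pixelCount(d: Dimensions): number {
  return d.width * d.height;
}

// How many times more pixels per frame resolution `a` has than `b`.
function pixelRatio(a: Dimensions, b: Dimensions): number {
  return pixelCount(a) / pixelCount(b);
}

console.log(pixelRatio(RESOLUTIONS.HD, RESOLUTIONS.VGA));    // 3
console.log(pixelRatio(RESOLUTIONS.FullHD, RESOLUTIONS.HD)); // 2.25
```

HD carries three times as many pixels per frame as VGA, and FullHD 2.25 times as many as HD, which helps explain why higher capture resolutions cost more CPU and bandwidth.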

Frame-rate

The frame-rate refers to the number of still images that the video stream includes per time unit. It is typically expressed in terms of fps (frames per second). Hence, an HD@30fps video will comprise a sequence of 30 HD still images per second.

Bitrate

The bitrate refers to the number of bits per unit of time that a given video or audio stream consumes when being transported through a digital network. It is typically measured in bps (bits per second), often with a decimal (power-of-10) prefix (e.g. Kbps, Mbps).
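A back-of-the-envelope calculation ties resolution, frame-rate, and bitrate together and shows why compression matters. The sketch below assumes raw 8-bit YUV 4:2:0 frames (12 bits per pixel on average); that sampling format is an assumption for illustration, and `rawBitrateBps` and `formatBps` are hypothetical helpers.

```typescript
// Uncompressed video bitrate, assuming 8-bit YUV 4:2:0 sampling
// (12 bits per pixel on average). Codecs such as VP8 or H.264
// reduce this by orders of magnitude.
const BITS_PER_PIXEL_YUV420 = 12;

function rawBitrateBps(
  width: number,
  height: number,
  fps: number,
  bitsPerPixel: number = BITS_PER_PIXEL_YUV420,
): number {
  return width * height * fps * bitsPerPixel;
}

// Format a bitrate using decimal (power-of-10) prefixes.
function formatBps(bps: number): string {
  if (bps >= 1e6) return `${(bps / 1e6).toFixed(1)} Mbps`;
  if (bps >= 1e3) return `${(bps / 1e3).toFixed(1)} Kbps`;
  return `${bps} bps`;
}

// HD@30fps: 1280 x 720 pixels, 30 frames per second.
console.log(formatBps(rawBitrateBps(1280, 720, 30))); // 331.8 Mbps
```

An uncompressed HD@30fps stream would need roughly 332 Mbps, whereas the maxVideoBitrate range discussed later in this guide tops out at 2000000 bps, so the codec has to compress by a factor of well over 100.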

P2P and Group Rooms

P2P and Group Rooms are the two main building blocks of the Twilio Programmable Video APIs. Read our Understanding Video Rooms guide for further guidance.

Codecs: VP8, H.264, and VP8 Simulcast

A codec is an algorithm that encodes a video signal, typically compressing it in the process. VP8 and H.264 are the two main codecs used for videoconferencing. VP8 Simulcast is a scalable variant of VP8 in which the publisher sends several encodings of the same track at different qualities. For further information, read our Managing Codecs and Working with VP8 Simulcast developer guides.

Network Bandwidth Profile API

The Network Bandwidth Profile API (aka BW Profile API) is a Twilio Video API specifically designed for optimizing bandwidth utilization in Group Rooms. This is a critical API for creating high-quality Group Room applications.

Track Priority API

The Track Priority API allows developers to set the relative priority of Tracks in a video application. The Network Bandwidth Profile API uses Track priorities to assign bandwidth to tracks.

Track Subscription API

The Track Subscription API allows developers to determine which tracks should be received by which participants. This API is essential in use-cases where the number of participants is large and not every participant needs to receive the audio and/or video of everyone else.

Dominant Speaker Detection API

The Dominant Speaker is the participant with the highest audio activity at a given time. Many videoconferencing applications enhance the Dominant Speaker (e.g. by rendering their video larger in the central area of the UI). Twilio’s Dominant Speaker Detection API notifies developers when the Dominant Speaker changes in a Group Room. Refer to our Detecting the Dominant Speaker developer guide for further guidance.

Network Quality API

The Network Quality API is a Video API specifically designed for monitoring network quality in Group Rooms. Refer to the Using the Network Quality API developer guide for further information.

General Recommendations

Before going deep into the technical details, it is worth reviewing some general, common-sense recommendations that may be useful in your design process.

Subscribe only to what end-users need

Encoding, transmitting, and rendering video tracks is expensive, and the cost becomes very noticeable in multiparty applications with many participants. For example, in a room with 20 participants, it is generally a bad idea to have every participant subscribe to and render 20 video tracks: that is likely to cause network congestion and overload client CPUs, making the quality of experience unacceptable. Instead, well-designed videoconferencing services limit subscriptions to the tracks that are really required. In an e-learning application, for instance, there is little value in having every student render the video of all the other students all the time; it is more reasonable to do so only in special situations, such as when a particular student is asking a question. Achieving this requires intelligent use of the Twilio Track Subscription API, which selects the tracks each participant subscribes to, and of the Network Bandwidth Profile API, which dynamically selects the video tracks of the participants with the most speaking activity and switches off the rest.
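As a rough sketch of the e-learning example, the rule list below is shaped after the subscribe rules of Twilio's Track Subscription REST API (fields `type`, `kind`, and `publisher`). Treat the field names as assumptions to verify against the API reference; `studentRules` and the identities used are hypothetical.

```typescript
// Subscribe-rule objects modeled on the Twilio Track Subscription API.
// Field names (type, all, kind, publisher) are assumed from that API's
// rule format; verify them against the API reference before use.
type RuleType = "include" | "exclude";
type TrackKind = "audio" | "video" | "data";

interface SubscribeRule {
  type: RuleType;
  all?: boolean;
  kind?: TrackKind;
  publisher?: string; // participant identity or SID
}

// Hypothetical helper for an e-learning room: a student receives all
// audio, the teacher's tracks and, optionally, the tracks of the student
// currently asking a question. Other students' video is not subscribed.
function studentRules(teacher: string, questioner?: string): SubscribeRule[] {
  const rules: SubscribeRule[] = [
    { type: "include", kind: "audio" },
    { type: "include", publisher: teacher },
  ];
  if (questioner !== undefined) {
    rules.push({ type: "include", publisher: questioner });
  }
  return rules;
}

// The JSON array would be sent as the rules for each student Participant.
console.log(JSON.stringify(studentRules("teacher-1", "student-7")));
```

When the question ends, the application would send the shorter rule list again so the questioner's video is unsubscribed.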

Make it simple for end-users to mute

Your application should provide mute capabilities to end-users so that they can disable the video or audio communication as they wish. This will avoid unnecessary traffic and background noise.
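A mute button usually boils down to toggling the local track's enabled state. In twilio-video.js, local audio and video tracks expose `enable()`, `disable()`, and `isEnabled`; the sketch below declares a local `MutableTrack` interface with that shape (an assumption standing in for the SDK type) and exercises it with a fake track so it runs without the SDK.

```typescript
// Minimal shape of a local track that can be muted. This interface is a
// local stand-in for the SDK's LocalAudioTrack / LocalVideoTrack.
interface MutableTrack {
  readonly isEnabled: boolean;
  enable(): void;
  disable(): void;
}

// Toggle a track and return the new state so the UI can update its button.
function toggleMute(track: MutableTrack): boolean {
  if (track.isEnabled) {
    track.disable();
  } else {
    track.enable();
  }
  return track.isEnabled;
}

// Fake track used to exercise the helper without the SDK.
class FakeTrack implements MutableTrack {
  isEnabled = true;
  enable(): void {
    this.isEnabled = true;
  }
  disable(): void {
    this.isEnabled = false;
  }
}

const mic = new FakeTrack();
console.log(toggleMute(mic)); // false (muted)
console.log(toggleMute(mic)); // true (unmuted)
```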

Use VP8 Simulcast in multiparty Group Rooms

Participants in multiparty Group Rooms should prefer VP8 Simulcast over other video codecs. The larger the number of participants in a room, the more important Simulcast is for providing the best possible quality of experience.

Use a reasonable resolution and frame-rate

Frame-rate and resolution are the two main capture constraints that affect video fidelity. When the video source is a camera showing people or moving objects, perceived quality typically benefits more from a higher frame-rate; for screen-sharing, resolution is typically more relevant. Set resolution and frame-rate to the minimum values your use-case requires: oversizing them increases CPU and network consumption and may increase latency. Also remember that the resolution and frame-rate you specify as capture constraints are only hints for the client video engine. The actual resolution and frame-rate may drop if CPU overuse is detected or if network capacity is insufficient for the required traffic.

Consider the render dimensions

When setting video capture constraints for publishers, also consider the render size on the subscribers' side. If you know that a given video track will only be rendered at thumbnail size for all subscribers, it does not make sense to capture it at high resolution at the publisher.
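One way to apply this advice is to derive the capture preset from the largest size at which any subscriber renders the track. The sketch below assumes your application knows those render sizes (how it learns them is out of scope); `captureForRenderSizes` and `CAPTURE_PRESETS` are hypothetical names.

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Capture presets from this guide, ordered from smallest to largest.
const CAPTURE_PRESETS: Array<{ name: string } & Dimensions> = [
  { name: "QCIF", width: 176, height: 144 },
  { name: "VGA", width: 640, height: 480 },
  { name: "qHD", width: 960, height: 540 },
  { name: "HD", width: 1280, height: 720 },
  { name: "FullHD", width: 1920, height: 1080 },
];

// Returns the smallest preset that covers the largest render size used by
// any subscriber, falling back to the largest preset.
function captureForRenderSizes(renderSizes: Dimensions[]): string {
  const maxWidth = Math.max(0, ...renderSizes.map((d) => d.width));
  const maxHeight = Math.max(0, ...renderSizes.map((d) => d.height));
  const fit = CAPTURE_PRESETS.find(
    (p) => p.width >= maxWidth && p.height >= maxHeight,
  );
  return (fit ?? CAPTURE_PRESETS[CAPTURE_PRESETS.length - 1]).name;
}

// A track only ever shown as 160x120 thumbnails does not need HD capture.
console.log(captureForRenderSizes([{ width: 160, height: 120 }])); // QCIF
// One subscriber renders it large, so HD is needed.
console.log(
  captureForRenderSizes([{ width: 320, height: 180 }, { width: 1280, height: 720 }]),
); // HD
```

The result is still only a hint: the client video engine may lower the actual resolution under CPU or network pressure.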

Do not share resources

High-resolution video and audio consume significant CPU and bandwidth resources. If those resources are shared with other applications, the quality of experience will decrease. For the best possible experience, recommend that your end users close any applications that may compete with your video service for CPU or bandwidth while it is running.

Use the best connectivity you can find

Network connectivity is the most critical aspect affecting communication quality. Restricted bandwidth, high latency, and packet loss can severely degrade your end users’ experience. Hence, recommend that end users use the best network access available: wired connectivity is commonly better than a wireless connection, and among wireless connections, corporate or cellular connectivity is typically better than public, open, shared WiFi networks.

Think twice before using maxVideoBitrate or maxAudioBitrate

Both parameters cap a Participant’s upstream bandwidth.

  • maxVideoBitrate specifies the maximum video bitrate a Participant can publish to the Room. By default, no value is set and maxVideoBitrate is unlimited. In that case, the bitrate is limited only by the Twilio client SDK, using an algorithm that considers the available bandwidth and CPU resources. In general, we recommend trusting that algorithm and not setting maxVideoBitrate. The only exception is when developers prefer to sacrifice quality in exchange for battery life. In that situation, we recommend setting maxVideoBitrate to a value between 500000 and 2000000 bps per track. Note that if a Participant publishes N video tracks, each video track is limited to maxVideoBitrate/N.
  • maxAudioBitrate specifies the maximum audio bitrate published by a Participant. It only takes effect with variable-bitrate codecs such as Opus; it has no effect on PCM codecs. By default it is unset, and Opus runs with its default settings, consuming between 20 Kbps and 40 Kbps. Twilio’s recommendation is to keep the default and use maxAudioBitrate only in exceptionally low-bandwidth scenarios where a lower audio bitrate is required.
Parameter          Recommendation          When to deviate
maxVideoBitrate    Keep default (unset)    Exceptionally (to extend battery life): keep it between 500000 and 2000000 bps per video track
maxAudioBitrate    Keep default (unset)    Exceptionally (low-bandwidth networks): keep it above 16000 bps per audio track
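The sketch below checks a pair of bitrate limits against the ranges in this table and computes the per-track video budget when a Participant publishes N video tracks. The `EncodingLimits` interface mirrors the `maxAudioBitrate` and `maxVideoBitrate` connect options (in bps) of the JavaScript SDK, but it is a local stand-in; `checkEncodingLimits` and `perTrackVideoBudget` are hypothetical helpers.

```typescript
// Local stand-in for the bitrate connect options (values in bps).
// Leaving a field undefined keeps the SDK's adaptive default.
interface EncodingLimits {
  maxAudioBitrate?: number;
  maxVideoBitrate?: number;
}

// Returns warnings for values outside the ranges in the table above.
function checkEncodingLimits(limits: EncodingLimits): string[] {
  const warnings: string[] = [];
  const video = limits.maxVideoBitrate;
  if (video !== undefined && (video < 500000 || video > 2000000)) {
    warnings.push("maxVideoBitrate outside 500000-2000000 bps");
  }
  const audio = limits.maxAudioBitrate;
  if (audio !== undefined && audio <= 16000) {
    warnings.push("maxAudioBitrate should be above 16000 bps");
  }
  return warnings;
}

// Per-track video budget when a Participant publishes several video
// tracks: each track gets maxVideoBitrate / N.
function perTrackVideoBudget(maxVideoBitrate: number, videoTracks: number): number {
  return maxVideoBitrate / Math.max(1, videoTracks);
}

console.log(checkEncodingLimits({ maxVideoBitrate: 1200000 })); // []
console.log(perTrackVideoBudget(1200000, 2)); // 600000
```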

Use GLL

On the Internet, latency and packet loss depend on geolocation. When the connection between a sender and a receiver spans the globe, latency and jitter increase with the distance between the parties, and packet loss becomes more likely because of the larger number of routers in the connection path. For this reason, the Twilio infrastructure that serves your Rooms should be as close as possible to your clients. Otherwise, quality may suffer:

  • In both P2P and Group Rooms, connection setup time may increase.
  • In Group Rooms, media latency and packet loss may increase, degrading fidelity.

To minimize these problems, Twilio lets you specify the signaling and media regions for your Rooms. However, determining the closest region for a participant is not always trivial. For this reason, we recommend using GLL (Global Low Latency): when GLL is specified, Twilio automatically chooses the region that minimizes latency. See our Video Regions and Global Low Latency documentation for further details.

Measure

Quality should be understood as a process. Try to measure both your end users’ perception and the many factors that may affect it, including CPU consumption and network connectivity metrics; Twilio’s Network Quality API can help with the latter. With that information, identify your end users’ pain points and design a strategy to minimize them. Periodically repeating this measure-analyze-implement cycle is the best way to ensure you are offering the best possible quality of experience to your users.

P2P or Group Rooms: Which Room Should I Use?

Selecting the most appropriate Room type for your use-case is a critical step. For that, we strongly recommend following our Understanding Twilio Video Rooms developer guide. From the quality perspective, and without considering features or compliance, the differences between P2P and Group Rooms can be summarized in the following table:

Property             P2P Rooms                                     Group Rooms
Media connections    Client-to-client communication                Server-routed communication
Upstream bandwidth   Proportional to the number of participants    Constant with the number of participants

You may find the following rules of thumb useful to assess the suitability of P2P Rooms for your use-case:

  • If you require high-quality video, then P2P Rooms are only recommended for 1-to-1 communications.
  • If you can tolerate low-quality video, then P2P Rooms can be used for rooms with up to 4 participants.
  • If your room has only audio, then P2P Rooms can go up to 10 participants.
  • In the rest of the cases, Group Rooms will probably offer you better quality.
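The rules of thumb above can be encoded directly as a small decision function. This is a sketch: the thresholds come from the list, while the `RoomNeeds` shape and `recommendRoomType` are hypothetical, and the returned strings are just labels.

```typescript
type RoomType = "peer-to-peer" | "group";

interface RoomNeeds {
  participants: number;      // maximum number of participants in the room
  hasVideo: boolean;         // false for audio-only rooms
  highQualityVideo: boolean; // true if high-quality video is required
}

// Encodes the rules of thumb above; thresholds come straight from the list.
function recommendRoomType(n: RoomNeeds): RoomType {
  if (!n.hasVideo) {
    // Audio-only: P2P can go up to 10 participants.
    return n.participants <= 10 ? "peer-to-peer" : "group";
  }
  if (n.highQualityVideo) {
    // High-quality video: P2P only for 1-to-1.
    return n.participants <= 2 ? "peer-to-peer" : "group";
  }
  // Low-quality video is tolerable: P2P up to 4 participants.
  return n.participants <= 4 ? "peer-to-peer" : "group";
}

console.log(recommendRoomType({ participants: 3, hasVideo: true, highQualityVideo: true })); // group
```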

Enhancing Quality in P2P Rooms

Using the appropriate client-side settings is essential for optimizing P2P Room quality. The following recommendations may be useful for that purpose.

Desktop Clients in P2P Rooms: Recommended Settings

Codec Settings

Setting Recommended value
Video codec
  • Use VP8 (default)
  • H.264: only if needed for interoperability reasons
  • Never use Simulcast
Audio codec
  • Use Opus (default)

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if CPU resources make it possible
For screen
  • Use FullHD@15fps
  • Consider HD@15fps if you detect CPU overuse
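The desktop capture recommendations can be expressed as a function that returns width, height, and frameRate hints (a subset of the W3C MediaTrackConstraints). The `cpu` input is a hypothetical signal your application would derive from its own monitoring; it is not an SDK property.

```typescript
// Capture hints: a subset of the W3C MediaTrackConstraints fields.
interface CaptureConstraints {
  width: number;
  height: number;
  frameRate: number;
}

type CaptureSource = "webcam" | "screen";
// Hypothetical CPU signal derived from your own monitoring.
type CpuState = "headroom" | "normal" | "overuse";

// Desktop recommendations from the table above.
function desktopCapture(source: CaptureSource, cpu: CpuState): CaptureConstraints {
  if (source === "webcam") {
    return cpu === "headroom"
      ? { width: 1280, height: 720, frameRate: 30 } // HD@30fps
      : { width: 640, height: 480, frameRate: 30 }; // VGA@30fps (default)
  }
  return cpu === "overuse"
    ? { width: 1280, height: 720, frameRate: 15 }   // HD@15fps
    : { width: 1920, height: 1080, frameRate: 15 }; // FullHD@15fps (default)
}

console.log(desktopCapture("webcam", "normal")); // { width: 640, height: 480, frameRate: 30 }
```

For example, the webcam object could be passed as capture options when creating a local video track with twilio-video.js, and the screen object as the video constraints of getDisplayMedia(); the browser may still deliver less than requested.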

Mobile Clients in P2P Rooms: Recommended Settings

Codec Settings

Setting Recommended value
Video codec
  • Use VP8 (default)
  • H.264: only if needed for interoperability reasons
  • Never use Simulcast
Audio codec
  • Use Opus (default)

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if codec has HW support.
For screen
  • Use HD@15fps
  • Consider FullHD@15fps if codec has HW support.
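On mobile, the step up depends on hardware codec support rather than CPU headroom. The TypeScript below only illustrates the decision (the native mobile SDKs are used from their own platform languages); `codecHasHwSupport` is a hypothetical flag your app would have to determine per device.

```typescript
interface CaptureConstraints {
  width: number;
  height: number;
  frameRate: number;
}

// Mobile recommendations from the table above: step up only when the
// codec in use is hardware-accelerated. `codecHasHwSupport` is a
// hypothetical per-device flag, not an SDK property.
function mobileCapture(source: "webcam" | "screen", codecHasHwSupport: boolean): CaptureConstraints {
  if (source === "webcam") {
    return codecHasHwSupport
      ? { width: 1280, height: 720, frameRate: 30 } // HD@30fps
      : { width: 640, height: 480, frameRate: 30 }; // VGA@30fps
  }
  return codecHasHwSupport
    ? { width: 1920, height: 1080, frameRate: 15 }  // FullHD@15fps
    : { width: 1280, height: 720, frameRate: 15 };  // HD@15fps
}

console.log(mobileCapture("screen", false)); // { width: 1280, height: 720, frameRate: 15 }
```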

Enhancing Quality in Group Rooms

Group Room quality strongly depends on how the bandwidth is managed. To optimize quality, you must make sure that your video tracks are appropriately prioritized and that bandwidth is allocated in alignment with your use-case needs. This is done using the Track Priority API and the Network Bandwidth Profile API.

Track Priority API: Recommendations

Track priorities are used to determine the importance of tracks. They are used to allocate bandwidth and to decide which tracks should be switched off in case of congestion. Track priorities are use-case dependent and setting them correctly is essential for having optimal quality. The following general guidelines may be helpful for that objective:

Audio track priorities

  • From the perspective of the Network Bandwidth Profile API, audio tracks always take precedence over video tracks, so you can think of audio as a separate, more important category.
  • In general, setting the priority of an audio track has no effect on your application.

Video track priorities

  • In most use-cases, the following rule of thumb holds: video track priority should map to render size, where higher priority video tracks will be rendered with larger dimensions than lower priority video tracks.
  • Typically there should be only one video track with priority high. When screen-sharing, the screen should be the high-priority track. If there is no screen-share and Dominant Speaker Detection is activated, the dominant speaker’s video may be the high-priority track.
  • Video tracks rendered as thumbnail should have priority low.
  • You may need to dynamically adapt video track priorities. For example, dominantSpeakerPriority may need to go from high to low when a screen-share is activated.
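The guidelines above can be sketched as a function that picks a single high-priority video track (the screen-share if there is one, otherwise the dominant speaker's video) and marks everything else low. How the resulting priorities are applied (for example through publish options or a publication's priority setter) depends on your SDK version; `assignPriorities` and `VideoTrackInfo` are hypothetical.

```typescript
type Priority = "low" | "standard" | "high";

interface VideoTrackInfo {
  name: string;        // track name
  participant: string; // publisher identity
  isScreen: boolean;   // true for screen-share tracks
}

// Picks one high-priority video track: the screen-share if present,
// otherwise the dominant speaker's video. Every other track is treated
// as a thumbnail and gets low priority.
function assignPriorities(tracks: VideoTrackInfo[], dominantSpeaker?: string): Map<string, Priority> {
  const main =
    tracks.find((t) => t.isScreen) ??
    tracks.find((t) => t.participant === dominantSpeaker);
  const priorities = new Map<string, Priority>();
  for (const t of tracks) {
    priorities.set(t.name, t === main ? "high" : "low");
  }
  return priorities;
}

const tracks: VideoTrackInfo[] = [
  { name: "alice-cam", participant: "alice", isScreen: false },
  { name: "bob-cam", participant: "bob", isScreen: false },
];
console.log(assignPriorities(tracks, "alice").get("alice-cam")); // high
// When Bob starts screen-sharing, Alice's video drops to low.
tracks.push({ name: "bob-screen", participant: "bob", isScreen: true });
console.log(assignPriorities(tracks, "alice").get("alice-cam")); // low
```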

Network Bandwidth Profile API: Selecting the mode

Bandwidth Profiles have three modes: collaboration, presentation, and grid. You can determine the mode that best fits your use-case by answering the questions of the following decision diagram:

Figure: Decision diagram for Network Bandwidth Profile mode selection

Do I use Group Rooms?

  • If your application uses Twilio P2P Rooms, answer NO. Otherwise, answer YES.

Is it a multiparty service?

  • If your application is only used for 1-to-1 communications (i.e. there are never more than 2 Participants in the Room) answer NO. Otherwise, answer YES.

Is there a main video track?

  • If your application UI renders all video tracks with the same display size, answer NO. If your application has one (or several) video tracks that are enhanced in the UI (e.g. dominant speaker, screen-share, etc.), taking up more display area, answer YES.

Can I use VP8 Simulcast?

  • If a relevant fraction of your application end-users cannot use VP8 simulcast (e.g. because you have decided to use H.264, or because it’s not supported, etc.) answer NO. Otherwise, answer YES.

Is the main track quality critical?

  • If you prefer the main video track quality to be preserved by all means, even at the cost of completely switching off other less relevant tracks when bandwidth is low (e.g. the screen-share in a presentation), answer YES. Otherwise, answer NO.
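Because the diagram itself is not reproduced here, the sketch below walks the same questions in order. The outcomes are inferred from the mode descriptions in the following sections (grid for 1-to-1 rooms, equal-size layouts, or no Simulcast; presentation when the main track is critical; collaboration otherwise) and should be treated as an interpretation, not the official diagram. `selectMode` is a hypothetical helper.

```typescript
type BandwidthProfileMode = "grid" | "collaboration" | "presentation";

// Answers to the questions above (true = YES).
interface ModeAnswers {
  usesGroupRooms: boolean;
  multiparty: boolean;
  hasMainVideoTrack: boolean;
  canUseVp8Simulcast: boolean;
  mainTrackQualityCritical: boolean;
}

// Returns null for P2P Rooms, since the Network Bandwidth Profile API
// targets Group Rooms.
function selectMode(a: ModeAnswers): BandwidthProfileMode | null {
  if (!a.usesGroupRooms) return null;
  if (!a.multiparty || !a.hasMainVideoTrack || !a.canUseVp8Simulcast) {
    return "grid";
  }
  return a.mainTrackQualityCritical ? "presentation" : "collaboration";
}

console.log(
  selectMode({
    usesGroupRooms: true,
    multiparty: true,
    hasMainVideoTrack: true,
    canUseVp8Simulcast: true,
    mainTrackQualityCritical: false,
  }),
); // collaboration
```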

Developing Applications with grid mode

Applications use grid mode for one of the following reasons:

  • The application is 1-to-1.
  • The application is for multiparty communications but the UI layout does not enhance any video tracks over others (i.e. all tracks are rendered with the same size).
  • It’s not possible to use Simulcast. Note that for large rooms (i.e. rooms with 5 or more participants), not using Simulcast will typically degrade video quality significantly, even in grid mode.

Typical GUI layout used for grid mode. Videos are displayed in a matrix where all video tracks have equal relevance.

Desktop Clients in Group Rooms: Recommended Settings for grid mode

Codec Settings

Setting Recommended value
Video codec (1-to-1 Rooms)
  • Use VP8 (default)
  • H.264: only if needed for interoperability reasons
  • Never use Simulcast
Video codec (multiparty Rooms)
  • Use VP8 Simulcast
  • Use VP8 if simulcast is not supported
  • H.264: only if needed for interoperability reasons
Audio codec
  • Use Opus (default)
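Expressed as connect options, the codec table might look like the sketch below. The field names `preferredAudioCodecs` and `preferredVideoCodecs` (with `{ codec: "VP8", simulcast: true }` for Simulcast) follow twilio-video.js ConnectOptions, but the types here are local stand-ins and the `simulcastSupported` and `needsH264` inputs are hypothetical.

```typescript
type VideoCodecName = "VP8" | "H264";

interface VideoCodecSettings {
  codec: VideoCodecName;
  simulcast?: boolean;
}

// Local stand-in for the codec-related connect options.
interface CodecOptions {
  preferredAudioCodecs: string[];
  preferredVideoCodecs: Array<VideoCodecName | VideoCodecSettings>;
}

// Codec preferences following the grid-mode table above.
function codecOptions(participants: number, simulcastSupported: boolean, needsH264: boolean): CodecOptions {
  const audio = ["opus"]; // Opus (default)
  if (needsH264) {
    // H.264 only when needed for interoperability.
    return { preferredAudioCodecs: audio, preferredVideoCodecs: ["H264"] };
  }
  if (participants <= 2) {
    // 1-to-1 Rooms: plain VP8, never Simulcast.
    return { preferredAudioCodecs: audio, preferredVideoCodecs: ["VP8"] };
  }
  // Multiparty Rooms: VP8 Simulcast, or plain VP8 if Simulcast is unsupported.
  return {
    preferredAudioCodecs: audio,
    preferredVideoCodecs: simulcastSupported ? [{ codec: "VP8", simulcast: true }] : ["VP8"],
  };
}

console.log(JSON.stringify(codecOptions(6, true, false)));
// {"preferredAudioCodecs":["opus"],"preferredVideoCodecs":[{"codec":"VP8","simulcast":true}]}
```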

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if CPU resources make it possible
For screen
  • Use FullHD@15fps
  • Consider HD@15fps if you detect CPU overuse

Network Bandwidth Profile Settings

Setting Recommended value
mode
  • Keep default (grid)
maxSubscriptionBitrate
  • Keep default (unlimited)
dominantSpeakerPriority
  • Keep default (standard)
maxTracks
  • Keep it under 10
renderDimensions
  • Keep defaults (VGA is the standard)
  • You may use HD resolution for the standard if capture is HD and if CPU consumption makes it possible
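A grid-mode bandwidth profile following this table could be built as below. The object is shaped like the `bandwidthProfile.video` connect option of twilio-video.js in the SDK versions this guide targets (newer releases may deprecate fields such as `maxTracks` and `renderDimensions`; check your version's reference). `maxTracks: 9` is one illustrative value for "keep it under 10", and the two boolean inputs are hypothetical.

```typescript
type Mode = "grid" | "collaboration" | "presentation";
type TrackPriority = "low" | "standard" | "high";

interface RenderSize {
  width: number;
  height: number;
}

// Shaped like bandwidthProfile.video in twilio-video.js ConnectOptions;
// a local stand-in, not the SDK's own type.
interface VideoBandwidthProfile {
  mode: Mode;
  maxSubscriptionBitrate?: number;
  dominantSpeakerPriority?: TrackPriority;
  maxTracks?: number;
  renderDimensions?: {
    low?: RenderSize;
    standard?: RenderSize;
    high?: RenderSize;
  };
}

// Grid-mode settings from the table above. Unset fields keep their
// defaults (unlimited maxSubscriptionBitrate, standard
// dominantSpeakerPriority, VGA as the standard render size).
function gridProfile(captureIsHd: boolean, cpuAllowsHd: boolean): { video: VideoBandwidthProfile } {
  const video: VideoBandwidthProfile = {
    mode: "grid",
    maxTracks: 9, // illustrative value for "keep it under 10"
  };
  if (captureIsHd && cpuAllowsHd) {
    // Optional: HD as the standard render size.
    video.renderDimensions = { standard: { width: 1280, height: 720 } };
  }
  return { video };
}

console.log(JSON.stringify(gridProfile(false, false))); // {"video":{"mode":"grid","maxTracks":9}}
```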

Track Priority Settings

Setting Recommended value
webcam
  • Keep default (standard)
screen
  • Keep default (standard)

Other API settings

Setting Recommended value
Network Quality API Active
Dominant Speaker Detection Inactive

Mobile Clients in Group Rooms: Recommended Settings for grid mode

Codec Settings

Setting Recommended value
Video codec (1-to-1 Rooms)
  • Use VP8 (default)
  • H.264: only if needed for interoperability reasons
  • Never use Simulcast
Video codec (multiparty Rooms)
  • Use VP8 Simulcast
  • Use VP8 if simulcast is not supported
  • H.264: only if needed for interoperability reasons
Audio codec
  • Use Opus (default)

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if codec has HW support
For screen
  • Use HD@15fps
  • Consider FullHD@15fps if codec has HW support

Network Bandwidth Profile Settings

The Network Bandwidth Profile API is not yet supported on mobile platforms.

Track Priority Settings

Setting Recommended value
webcam
  • Keep default (standard)
screen
  • Keep default (standard)

Other API settings

Setting Recommended value
Network Quality API Active
Dominant Speaker Detection Inactive

Developing Applications with collaboration mode

Applications using collaboration mode typically share the following properties:

  • Interactions are multiparty (i.e. a large number of participants communicate)
  • The UI layout is designed to enhance one main video track (e.g. dominant speaker).
  • The rest of the video tracks are displayed in thumbnail size.
  • Keeping all tracks visible is more important than having higher quality in the main track.

Applications using collaboration mode typically enhance the dominant speaker and render the rest of the participants in thumbnail size.

Desktop Clients in Group Rooms: Recommended settings for collaboration mode

Codec Settings

Setting Recommended value
Video codec (1-to-1 Rooms)
  • Use grid mode instead
Video codec (multiparty Rooms)
  • Use VP8 Simulcast
  • If most participants use VP8 without Simulcast, or H.264, use grid mode instead
Audio codec
  • Use Opus (default)

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if CPU resources make it possible
For screen
  • Use FullHD@15fps
  • Consider HD@15fps if you detect CPU overuse

Network Bandwidth Profile Settings

Setting Recommended value
mode
  • Use collaboration
maxSubscriptionBitrate
  • Keep default (unlimited)
dominantSpeakerPriority
  • Keep default (standard)
maxTracks
  • Keep it under 10
renderDimensions
  • For high use FullHD
  • For standard use HD
  • For low use QCIF
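Putting the collaboration-mode tables together, a desktop client's connect options might resemble the sketch below. Option names follow twilio-video.js (`bandwidthProfile`, `dominantSpeaker`, `networkQuality`, `preferredVideoCodecs`), but this is a plain object for illustration, not a verified configuration; `maxTracks: 9` is an illustrative pick for "keep it under 10". Presentation mode uses the same renderDimensions, with `mode` set to presentation.

```typescript
interface Size {
  width: number;
  height: number;
}

// Collaboration-mode settings from the table above, shaped like
// bandwidthProfile in twilio-video.js ConnectOptions.
function collaborationProfile() {
  const FULL_HD: Size = { width: 1920, height: 1080 };
  const HD: Size = { width: 1280, height: 720 };
  const QCIF: Size = { width: 176, height: 144 };
  return {
    video: {
      mode: "collaboration" as const,
      maxTracks: 9, // illustrative value for "keep it under 10"
      // maxSubscriptionBitrate and dominantSpeakerPriority keep their defaults.
      renderDimensions: { high: FULL_HD, standard: HD, low: QCIF },
    },
  };
}

// Combined with the codec and other API settings recommended for this mode.
const connectOptions = {
  bandwidthProfile: collaborationProfile(),
  dominantSpeaker: true,                   // Dominant Speaker Detection: active
  networkQuality: { local: 1, remote: 1 }, // Network Quality API: active
  preferredVideoCodecs: [{ codec: "VP8", simulcast: true }],
};

console.log(JSON.stringify(connectOptions.bandwidthProfile.video.renderDimensions.low)); // {"width":176,"height":144}
```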

Track Priority Settings

Setting Recommended value
webcam
  • Use low
screen
  • Use high
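Publishing tracks with the priorities in this table can be sketched as follows. twilio-video.js lets a LocalParticipant pass a `priority` option when publishing a track; the `Publisher` interface here is a local assumption with that shape, exercised with a recording stub so the snippet runs on its own.

```typescript
type TrackPriority = "low" | "standard" | "high";
type SourceKind = "webcam" | "screen";

// Recommended publish priorities for collaboration and presentation modes.
const PUBLISH_PRIORITY: Record<SourceKind, TrackPriority> = {
  webcam: "low",
  screen: "high",
};

// Local stand-in for the publish call; the real SDK method returns a
// Promise and takes SDK track objects.
interface Publisher {
  publishTrack(track: unknown, options: { priority: TrackPriority }): void;
}

function publishWithPriority(p: Publisher, track: unknown, kind: SourceKind): void {
  p.publishTrack(track, { priority: PUBLISH_PRIORITY[kind] });
}

// Recording stub used to exercise the helper without the SDK.
const calls: TrackPriority[] = [];
const recorder: Publisher = {
  publishTrack: (_track, options) => {
    calls.push(options.priority);
  },
};

publishWithPriority(recorder, {}, "screen");
publishWithPriority(recorder, {}, "webcam");
console.log(calls); // [ 'high', 'low' ]
```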

Other API settings

Setting Recommended value
Network Quality API Active
Dominant Speaker Detection Active

Mobile Clients in Group Rooms: Recommended settings for collaboration mode

Codec Settings

Setting Recommended value
Video codec (1-to-1 Rooms)
  • Use grid mode instead
Video codec (multiparty Rooms)
  • Use VP8 Simulcast
  • If most participants use VP8 without Simulcast, or H.264, use grid mode instead
Audio codec
  • Use Opus (default)

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if codec has HW support
For screen
  • Use HD@15fps
  • Consider FullHD@15fps if codec has HW support

Network Bandwidth Profile Settings

The Network Bandwidth Profile API is not yet supported on mobile platforms.

Track Priority Settings

Setting Recommended value
webcam
  • Use low
screen
  • Use high

Other API settings

Setting Recommended value
Network Quality API Active
Dominant Speaker Detection Active

Developing Applications with presentation mode

Applications using presentation mode typically share the following properties:

  • Interactions are one-to-many (i.e. one participant presents to a large audience).
  • The UI layout is designed to enhance one main video track (e.g. the presenter screen-share).
  • The rest of the video tracks may or may not be displayed, as they are less relevant.
  • Presenter quality is critical and more relevant than keeping viewers' tracks on.

Applications using presentation mode typically have a screen-share track whose quality must be maximized by all means. They may additionally display the presenter’s webcam or other participants’ webcams, but with lower priority.

Desktop Clients in Group Rooms: Recommended settings for presentation mode

Codec Settings

Setting Recommended value
Video codec (1-to-1 Rooms)
  • Use grid mode instead
Video codec (multiparty Rooms)
  • Use VP8 Simulcast (critical for the participant publishing the main track)
  • If most participants use VP8 without Simulcast, or H.264, use grid mode instead
Audio codec
  • Use Opus (default)

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if CPU resources make it possible
For screen
  • Use FullHD@15fps
  • Consider HD@15fps if you detect CPU overuse

Network Bandwidth Profile Settings

Setting Recommended value
mode
  • Use presentation
maxSubscriptionBitrate
  • Keep default (unlimited)
dominantSpeakerPriority
  • Keep default (standard)
maxTracks
  • Keep it under 10
renderDimensions
  • For high use FullHD
  • For standard use HD
  • For low use QCIF

Track Priority Settings

Setting Recommended value
webcam
  • Use low
screen
  • Use high

Other API settings

Setting Recommended value
Network Quality API Active
Dominant Speaker Detection Active

Mobile Clients in Group Rooms: Recommended settings for presentation mode

Codec Settings

Setting Recommended value
Video codec (1-to-1 Rooms)
  • Use grid mode instead
Video codec (multiparty Rooms)
  • Use VP8 Simulcast
  • If most participants use VP8 without Simulcast, or H.264, use grid mode instead
Audio codec
  • Use Opus (default)

Video Capture Settings

Setting Recommended value
For webcam
  • Use VGA@30fps
  • Consider HD@30fps if codec has HW support
For screen
  • Use HD@15fps
  • Consider FullHD@15fps if codec has HW support

Network Bandwidth Profile Settings

The Network Bandwidth Profile API is not yet supported on mobile platforms.

Track Priority Settings

Setting Recommended value
webcam
  • Use low
screen
  • Use high

Other API settings

Setting Recommended value
Network Quality API Active
Dominant Speaker Detection Active
Luis Lopez