Concept: "Zoom meets Twitch"

Summary

This concept is about using Mist Server in novel ways to solve some “real” problems in the world of global media infrastructure.

Concept

The concept is dubbed “Zoom meets Twitch”, and is about demonstrating the use of a single mistserver node to act as a hub for both of the following use cases:

  1. video/audio chat

and

  2. broadcasting

Running both mechanisms via the same software can enable the creation of unprecedented means of communicating in a distributed world. This is further discussed below.

Video/Audio chat

Communication mechanisms characterised as “one-to-one”, “one-to-few” and, latterly, “few-to-few”.

Information passes in all directions: each participant can both publish and consume.

Requires low-latency (near-real-time), for conversational real-time sharing.

Often implemented using low-latency webRTC, or PSTN / GSM (landline, mobile).

Examples include: Zoom, Hangouts, Clubhouse, WhatsApp, Jitsi, Discord, Twitter Spaces, Signal, Telegram, FaceTime, Slack, Facebook Messenger, Skype / MSN Messenger, Chatroulette, Microsoft NetMeeting, public switched telephone network, GSM.

Broadcasting

Communication mechanisms characterised as “one-to-many” (recently “few-to-many”).

Information passes in one direction, from broadcaster to consumer; the consumer can only consume.

Near-real-time is usually acceptable, i.e. with a few seconds delay for an improved user experience via buffering.

Often implemented as hls (http live streaming), or via electromagnetic waves (radio), satellite and cable.

e.g. Twitch, YouTube Live, Facebook Live, Periscope, Meerkat, livestream.com, Livepeer Broadcaster, Mist / RTMP, nginx / RTMP, private satellite broadcasting, public / national broadcasting (radio-waves, freeview), cable TV, OTT.

Convergence

The implication of having both use cases served by a single node is that new forms of real-time digital social media can be achieved more easily.

This comes from allowing easier combinations of the means of a) connecting us in real-time, with b) widely sharing our communications in real-time. These combinations can be made in software.

Example - “livestream a community call”

A simple example would be to configure a node to automatically broadcast the media being exchanged within each video/audio chat.

Let’s say that @Shann, @chrishobcroft, @Titan-Node and @Thulinma were having a four-way video/audio chat. Their conversation would be livestreamed, and hence publicly available in real-time for anyone to watch / listen to.

Alone, such a mechanism can provide the ultimate in transparency, as a communications platform for an open world.
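
As a rough illustration of the “automatically broadcast each chat” rule, here is a toy model in Python. It is only a sketch: `Hub`, `Room` and the `broadcast/<room>` naming are made up for this post and are not Mist configuration or API. It assumes a broadcast is created when the first participant joins a room and removed when the last one leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    """A video/audio chat room (hypothetical model, not a Mist object)."""
    name: str
    participants: set = field(default_factory=set)


class Hub:
    """Toy single-node hub: every active chat room also gets a broadcast."""

    def __init__(self, auto_broadcast: bool = True):
        self.auto_broadcast = auto_broadcast
        self.rooms = {}       # room name -> Room
        self.broadcasts = {}  # room name -> made-up broadcast stream name

    def join(self, room_name: str, participant: str) -> Room:
        room = self.rooms.setdefault(room_name, Room(room_name))
        room.participants.add(participant)
        # Start a broadcast for the room the first time anyone joins it.
        if self.auto_broadcast and room_name not in self.broadcasts:
            self.broadcasts[room_name] = f"broadcast/{room_name}"
        return room

    def leave(self, room_name: str, participant: str) -> None:
        room = self.rooms.get(room_name)
        if room is None:
            return
        room.participants.discard(participant)
        # Tear the room and its broadcast down once it is empty.
        if not room.participants:
            del self.rooms[room_name]
            self.broadcasts.pop(room_name, None)


hub = Hub()
for person in ("Shann", "chrishobcroft", "Titan-Node", "Thulinma"):
    hub.join("community-call", person)
print(hub.broadcasts)
```

In this model the four-way call above ends up with exactly one public broadcast, regardless of how many people are in the room.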

Wider Problems

Such a concept can act as a solution to current world problems, such as:

a) general over-reliance on fragile centrally-controlled communications platforms for sharing live media (Zoom, Clubhouse, Twitter Spaces, Discord, YouTube, Twitch).

b) secure recording of community calls / watercoolers / private conversations, for potential release to public record, via existing video-on-demand services. Because if it’s being streamed, it can also be recorded.

c) automated broadcasting (livestreaming) of conversations (per example above).

As well as potentially driving additional traffic to Livepeer’s public network of Orchestrators, this can also provide an interesting new building block as we continue to work out what infrastructure is going to be needed for this “metaverse” thing.

But ser, Jitsi already has rtmp streaming built in, and could just stream to a Mist endpoint

Yes it does. I didn’t try it, but mainly because I didn’t want to.

Their original implementation was aimed directly at supplying YouTube with yet more content, until the jitsi community worked out ways to generalise it to allow streaming to any rtmp endpoint (i.e. not only the YouTube one connected to your Google login).

So yes, it would be possible to “just deploy jitsi and make it stream to Mist”, but when Mist can handle webRTC, and jitsi is effectively just a nice front end onto a webRTC platform, then it seems like a good idea to simplify to just running Mist 🙂

In-home Usage

If such a media hub were to exist, it could even go some way to replacing home computers, smart TV logic, set-top boxes, as well as home telephones (one-to-one audio chat).

Devices such as Facebook Portal TV demonstrate how a camera can fit in to a home’s living room, to provide video/audio chat facilities.

If such a platform existed, it could then be interesting to experiment with running it on low-power devices such as a Raspberry Pi, with a connected camera and HDMI output.

Community Moderation

If such a platform existed, it could then be interesting to integrate the “access control” logic for such a system with a public permissionless blockchain.

Requiring users to prove their ownership of an account via use of a keypair (e.g. “Sign In With Ethereum”) can act as an analogue for their “identity” (e.g. Ethereum Name Service), which can then have permissions granted / revoked based on decentrally-governed datasets.

This would allow for decentrally-organised experiments in “community moderated” real-time social media.

Great concept @chrishobcroft!

The “broadcasting video conference call” use case is growing in popularity, and definitely an interesting area of experimentation for Livepeer. I especially like the idea of controlling streaming experiences with blockchain based identity / state.

The specific implementation detail can be somewhat of a challenge, especially around adding the additional SFU functionality to Mist. While Mist does allow WebRTC-based ingest and output, replicating the full functionality of Jitsi/MediaSoup/Pion is not a trivial amount of work.

Another real-world consideration is reliability. While p2p-based WebRTC works ok, it starts to break down quickly after 4-5 participants. In order to add more people, all peers need to connect to a hosted server. In fact, most of the time, these are hosted server clusters in different regions of the world. Maybe this is a problem to be solved at a later point when there is a need for scale, but it’s important to point this out.
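
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes a full mesh in which every participant sends one stream to every other participant, and compares it with a single hosted forwarding server. The function names are made up for illustration, and the model ignores bitrate, simulcast and CPU cost.

```python
def mesh_streams(participants: int) -> dict:
    """Full-mesh p2p: every peer sends its media to every other peer."""
    others = max(participants - 1, 0)
    return {
        "uploads_per_peer": others,
        "downloads_per_peer": others,
        "total_connections": participants * others // 2,
    }


def hosted_streams(participants: int) -> dict:
    """Hosted server: each peer uploads once and the server fans out."""
    others = max(participants - 1, 0)
    return {
        "uploads_per_peer": 1 if participants > 0 else 0,
        "downloads_per_peer": others,
        "total_connections": participants,
    }


for n in (2, 4, 5, 10):
    print(n, "mesh:", mesh_streams(n), "hosted:", hosted_streams(n))
```

Under these assumptions the per-peer upload count grows with the size of the call in a mesh but stays at one with a hosted server, which is one plausible reason small calls work p2p and larger ones do not.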

My suggestion for an easy next step is to export video from something like Jitsi, or look into software that allows you to automatically join a WebRTC session and export as a single stream. This way, we can focus on making Mist work really well for WebRTC ingest/output.

Thanks for your reply @ericxtang.

Decentralising control of live media

“I especially like the idea of controlling streaming experiences with blockchain based identity / state.”

Me too! It would provide the opportunity to create new ways for real-time content to be controlled, like “community-based moderation / censorship” and “democratic broadcast programme scheduling”. This can range from e.g. centrally-controlled “dictatorship” systems, through to e.g. fully permissionless “libre” systems… and everything in between, as we see fit, as the global distributed society evolves.

I am very excited to see what materialises in this field!

p2p vs. server-based

“p2p-based WebRTC works ok, it starts to break down quickly after 4-5 participants. In order to add more people, all peers need to connect to a hosted server.”

Acknowledged. But this is not such a problem as there will need to be a hosted server involved anyway, to broadcast a “source” stream. Or perhaps I’m missing something?

One server, not two 🙂

The big opportunity here is in having one server handling both a) the “video conference call” use case, and b) the “broadcasting” use case, instead of needing two.

The benefits of a one-server approach would be to simplify the end-to-end stack in terms of video transport, integrations, and authentication - allowing projects building such systems to further innovate.

For example, a “one server” approach would require there to only be one integration with the blockchain-based state for permissioning:
e.g. a user to be permitted to: join a call / share camera / share microphone / share screen etc.,
e.g. a “room” to broadcast itself (or record itself).
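
A minimal sketch of what that single permission check could look like, in Python. Everything here is hypothetical: `PermissionState` stands in for a local snapshot of whatever blockchain-based state is used, the action names simply mirror the list above, the address is a placeholder, and proving ownership of the address (e.g. via “Sign In With Ethereum”) is assumed to have happened already.

```python
from dataclasses import dataclass, field

# Action names mirror the examples above; they are illustrative only.
USER_ACTIONS = {"join", "share_camera", "share_microphone", "share_screen"}
ROOM_ACTIONS = {"broadcast", "record"}


@dataclass
class PermissionState:
    """Hypothetical local snapshot of permissions read from a blockchain."""
    user_grants: dict = field(default_factory=dict)  # address -> set of actions
    room_grants: dict = field(default_factory=dict)  # room name -> set of actions

    def grant_user(self, address: str, action: str) -> None:
        if action not in USER_ACTIONS:
            raise ValueError(f"unknown user action: {action}")
        self.user_grants.setdefault(address.lower(), set()).add(action)

    def revoke_user(self, address: str, action: str) -> None:
        self.user_grants.get(address.lower(), set()).discard(action)

    def grant_room(self, room: str, action: str) -> None:
        if action not in ROOM_ACTIONS:
            raise ValueError(f"unknown room action: {action}")
        self.room_grants.setdefault(room, set()).add(action)

    def user_may(self, address: str, action: str) -> bool:
        return action in self.user_grants.get(address.lower(), set())

    def room_may(self, room: str, action: str) -> bool:
        return action in self.room_grants.get(room, set())


state = PermissionState()
alice = "0xAbC0000000000000000000000000000000000001"  # placeholder address
state.grant_user(alice, "join")
state.grant_user(alice, "share_microphone")
state.grant_room("community-call", "broadcast")

print(state.user_may(alice, "share_camera"))         # not granted
print(state.room_may("community-call", "broadcast"))  # granted
```

With one server, both the chat side (`user_may`) and the broadcast side (`room_may`) can consult the same state, which is the point being made here.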

Such an approach would also only require one integration with Livepeer Public Network, for processing video:
e.g. transcoding hls streams (say, 1080p into 144p) for accessibility of the broadcast;
e.g. “multiplexing” >1 webRTC streams into a single hls broadcast stream;
e.g. object detection / audio in webRTC streams, and other future Orchestrator features.
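
To make the transcoding example concrete, here is a small Python sketch that writes the kind of hls master playlist a player would use to choose between renditions. The ladder, bitrates and relative paths are illustrative assumptions, not what Mist or the Livepeer network actually emit; only the `#EXTM3U` and `#EXT-X-STREAM-INF` tags come from the hls format itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bandwidth: int  # peak bits per second (illustrative values)


# Hypothetical ladder: the 1080p source plus transcoded lower renditions.
LADDER = [
    Rendition("1080p", 1920, 1080, 6_000_000),
    Rendition("720p", 1280, 720, 3_000_000),
    Rendition("360p", 640, 360, 800_000),
    Rendition("144p", 256, 144, 150_000),
]


def master_playlist(renditions) -> str:
    """Build an hls master playlist listing one variant per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")  # made-up relative path
    return "\n".join(lines) + "\n"


print(master_playlist(LADDER))
```

A viewer on a poor connection would pick the 144p variant from this list, which is the accessibility benefit mentioned above.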

Having this in a single server also removes the need to use rtmp/srt. The webRTC streams can be converted directly into hls, instead of needing to convert webRTC-to-rtmp/srt in one server, and then rtmp/srt-to-hls in another… feels like wasted effort.

Finally, it would remove the need for the Video Conference Call Server and the Broadcast Server to be “aware of each other”, e.g. for authenticating rooms to be able to broadcast.

“While Mist does allow WebRTC-based ingest and output, replicating the full functionality of Jitsi/MediaSoup/Pion is not a trivial amount of work.”

Understood. And I don’t suggest replicating the full functionality of these systems you mention.

There is value though, in being able to demonstrate even a “minimum-viable” implementation of such a powerful system in a single node. From my limited understanding of how Mist works, such functionality appears to be “within reach”.

Such a demonstration can serve to catalyse our collective thinking about how to more-effectively a) connect people (video conference), and b) share (broadcast).

Next steps

“My suggestion for an easy next step is to export video from something like Jitsi, or look into software that allows you to automatically join a WebRTC session and export as a single stream. This way, we can focus on making Mist work really well for WebRTC ingest/output.”

Yes! Such an approach using “separate servers” can be pursued immediately, and can help develop a coherent UX, and even allow for initial experimentation with blockchain-based permissioning.

For example, what Huddle01 is already enabling is very interesting (see below), especially in terms of integration with Ethereum via MetaMask. This validates the desire amongst the video developer community to build out the “broadcasting video conference call” use case.

As above, there is an opportunity to help projects like this, and others which may follow, by enabling a “one hosted server” architecture to handle both use cases. So, I am excited to see how well Mist works for webRTC ingest/output, and webRTC-to-hls conversion!

Huddle01

Here is a brief overview of Huddle01’s functionality, available at huddle01:

Two users in a video conference call, with the user’s wallet connected, and streaming to Livepeer from “inside” the call:

The conference call being broadcast, with chat function for viewers to discuss what they are watching:
