The Tech Stack
Overview
At Huddle, we used three main pieces in our technical stack. Our product was a React front-end for the web. The front-end communicated with Twilio Programmable Video to send audio and video streams between users, and with a Django/Redis backend, hosted on Heroku, to synchronize state information and handle messaging.
When a user joined a room, the front-end established a connection to the Twilio call, and that user was assigned a Twilio user ID. This ID was then published to our Django server to update the state of the room with the new user. For the duration of the call, the front-end continuously polled the backend, monitoring for state updates. If anything changed (for example, a new user joined, a user left, or someone moved between huddles), all front-ends received the new state and triggered a re-render. The rendering logic looked at which users were in which huddles and set the coordinates of the video frames to group huddle participants together. Whenever a user joined or moved between huddles, the front-end sent a POST request to the backend, updating the state and triggering a re-render for all other users.
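To make the flow concrete, here is a minimal sketch of that loop in TypeScript. The endpoint paths, polling rate, and state shape are hypothetical stand-ins for illustration, not our exact API:

```ts
// Hypothetical room state: Twilio participant IDs grouped by huddle
interface RoomState {
  huddles: Record<string, string[]>;
}

const POLL_INTERVAL_MS = 1000; // illustrative polling rate

function pollRoomState(roomId: string, onChange: (s: RoomState) => void) {
  let last = "";
  setInterval(async () => {
    const res = await fetch(`/api/rooms/${roomId}/state`); // hypothetical endpoint
    const state: RoomState = await res.json();
    const serialized = JSON.stringify(state);
    if (serialized !== last) {
      // Only trigger a re-render when the state actually changed
      last = serialized;
      onChange(state);
    }
  }, POLL_INTERVAL_MS);
}

// Joining or moving between huddles is a POST; every other client
// picks up the new state on its next poll.
async function moveToHuddle(roomId: string, userId: string, huddleId: string) {
  await fetch(`/api/rooms/${roomId}/move`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId, huddleId }),
  });
}
```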
React
For our front-end, we used TypeScript-based React, with functional components. TypeScript was important because we knew that with a UX-heavy product such as ours, we would be writing a ton of front-end logic. Since TypeScript, as the name suggests, adds static types to JavaScript, it allows for compile-time type-checking. This let us grow a large codebase and have multiple developers work in parallel with far fewer integration conflicts. TypeScript also let us leverage ES6 features while compiling down to code that remains backwards compatible with browsers on ES3 or ES5.
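As a small illustration (the types and names here are invented for this example, not our real ones), a shared interface let developers working on different parts of the app agree on a contract that the compiler enforced:

```ts
// Invented for illustration: a shared contract for user data
interface HuddleUser {
  twilioId: string;
  displayName: string;
  huddleId: string | null; // null when the user is not in a huddle
}

function usersInHuddle(users: HuddleUser[], huddleId: string): HuddleUser[] {
  return users.filter((u) => u.huddleId === huddleId);
}

// usersInHuddle(users, 42) would fail at compile time: number is not a string
```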
We chose React for its performance and flexibility, as well as its relatively low learning curve compared to other frameworks such as Angular or Vue. Furthermore, we used functional components rather than class-based ones: since the introduction of hooks, they have been regarded as the future of the framework, and they are easier to develop and maintain. Since our platform was so UX-intensive, a performant framework was critical; slight delays in video rendering or audio processing can make for an extremely disorienting experience. In terms of flexibility, React's components are reusable by nature and lead to a much more manageable component library, and composition made reusing and maintaining components much easier. A subtlety of React that made it slightly easier to learn was the JSX syntax, which let us leverage our understanding of HTML tags to quickly iterate on components. We also used the Material-UI library, enabling us to keep the markup, styles, and logic of an entire component in one TSX file.
React is also incredibly versatile in terms of its state management and information flow. It allows for local management through hooks, global management through context or Redux, and parent-to-child information flow through props. This allowed us to separate local state that didn't need to propagate through the entire app (say, the status of an individual button) from global state that components at all different levels of the tree needed access to.
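For example (with invented names and a simplified state shape), a mute toggle can live entirely in local state, while the room layout is read from a context that any component in the tree can reach:

```tsx
import { createContext, useContext, useState } from "react";

// Global room state lives in a context that any component can read
const RoomStateContext = createContext<{ huddles: Record<string, string[]> }>({
  huddles: {},
});

function MuteButton() {
  // Local state: nothing else in the tree cares about this flag
  const [muted, setMuted] = useState(false);
  return (
    <button onClick={() => setMuted(!muted)}>{muted ? "Unmute" : "Mute"}</button>
  );
}

function HuddleList() {
  // Global state: many components across the tree read the room layout
  const { huddles } = useContext(RoomStateContext);
  return <ul>{Object.keys(huddles).map((id) => <li key={id}>{id}</li>)}</ul>;
}
```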
Another component of our design was the ability for users to engage with state-synced applications built into the platform. For example, users could play the popular Zoom game Codenames right on Huddle, without needing to go to an external site. Users could interact with the app directly, and all other users would see the updated board as it changed. In order to develop this in a scalable fashion, we simply displayed these applications as iframe embeddings of external websites that handled state syncing on their own. We started by finding existing games and applications that allowed users to set custom URLs, but in the long term, our aim was to get small teams of developers to create custom experiences for the platform that, through this embedding model, could be developed independently of the actual Huddle codebase, leading to faster and easier development.
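The embedding itself was deliberately simple; a sketch (the prop names are hypothetical):

```tsx
// The app is just an external URL that manages its own state sync;
// Huddle only renders the iframe.
function EmbeddedApp({ appUrl, roomId }: { appUrl: string; roomId: string }) {
  // Passing the room ID lets the external app scope its state to this call
  const src = `${appUrl}?room=${encodeURIComponent(roomId)}`;
  return <iframe src={src} title="Embedded app" width={800} height={600} />;
}
```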
Twilio
Twilio was our video provider for Huddle. This means that all video and audio streams were handled by the service; we simply needed to integrate the API into our front-end. Twilio provides a variety of communication solutions, including messaging, voice, and email, but we used Twilio Programmable Video. There are other APIs that provide video solutions, such as Dolby.io and Agora Video Call.
One reason we chose Twilio was that it supported up to 50 participants in a video room, a high limit compared to the other APIs we investigated at the time. The true value of our spatial breakout solution is not seen at smaller room sizes, so it was important to allow these larger calls. Plus, our initial target market was campus clubs, which often have about 40–50 members, so we wouldn't have been able to provide a video solution unless we could support calls of this size.
Additionally, the API gave us a ton of flexibility to play with video and audio settings, which was important to our product. For one, we wanted to be able to treat audio streams individually, in order to selectively turn off audio for users that were not in a user's huddle or, long-term, allow for spatial audio mixing; Twilio enabled both. In addition, we could adjust audio and video stream settings to downsample streams that were unimportant to a particular user, in order to maximize performance.
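Here is a simplified sketch of that per-stream control, using the twilio-video JavaScript SDK; the bookkeeping around it (the element map, the huddle-membership set) is our own invention for illustration:

```ts
import { connect, RemoteAudioTrack, RemoteParticipant } from "twilio-video";

// One attached <audio> element per remote participant, so each
// stream can be silenced independently
const audioElements = new Map<string, HTMLMediaElement>();

async function joinRoom(token: string, roomName: string) {
  const room = await connect(token, { name: roomName });
  // (Participants already in the room would be handled the same way
  // via room.participants.)
  room.on("participantConnected", (participant: RemoteParticipant) => {
    participant.on("trackSubscribed", (track) => {
      if (track.kind === "audio") {
        const el = (track as RemoteAudioTrack).attach();
        document.body.appendChild(el);
        audioElements.set(participant.identity, el);
      }
    });
  });
  return room;
}

// Silence everyone who is not in the local user's huddle
function applyHuddleAudio(huddleMembers: Set<string>) {
  audioElements.forEach((el, identity) => {
    el.muted = !huddleMembers.has(identity);
  });
}
```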
Finally, in addition to audio and video tracks, Twilio allows users to stream ‘data tracks’ as well. The company advertises this feature as useful for enabling virtual “whiteboards and animations,” so we thought it could power the state-synced applications we were looking to build into Huddle, along with other ways to communicate, such as chats and reactions.
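A sketch of how a reaction might travel over a data track; the message format is our own invention, not part of the Twilio API:

```ts
import { connect, LocalDataTrack, RemoteDataTrack } from "twilio-video";

// Publish a local data track alongside the audio/video tracks
const dataTrack = new LocalDataTrack();

async function joinWithDataTrack(token: string, roomName: string) {
  // (In practice audio and video tracks would be passed here too)
  const room = await connect(token, { name: roomName, tracks: [dataTrack] });
  // The Room re-emits participant track events, including data tracks
  room.on("trackSubscribed", (track) => {
    if (track.kind === "data") {
      (track as RemoteDataTrack).on("message", (msg) => {
        console.log("received", JSON.parse(msg as string));
      });
    }
  });
  return room;
}

// Broadcast a reaction to everyone in the room
function sendReaction(emoji: string) {
  dataTrack.send(JSON.stringify({ type: "reaction", emoji }));
}
```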
It’s important to mention that a third-party video provider was only a stepping stone to our true vision for Huddle. Some of our concepts, such as smart topic labeling, emotion detection, and AI video cropping may have only been possible with our own video solution. However, in order to get the product in the hands of users and receive feedback as soon as possible, we chose to use an off-the-shelf solution for the time being.
Django
Critical to our visual breakout solution was a backend endpoint for state synchronization: we needed every user to see the same users in the same huddles.
We used Django because it is easy to develop in, meaning we could get off the ground and iterate quickly. It is also performant, which was important because we wanted the platform to feel responsive. Finally, since it is Python-based, it has a huge ecosystem of libraries, so it could grow to support any future needs.
Redis
As a database, we used Redis, which is a NoSQL, key-value, in-memory database. Because data lives in memory rather than on disk, as in traditional SQL databases, Redis offers much faster read-write times. This was very important because we had a very write-heavy workload: users could jump between huddles quickly, meaning the state could change many times per second, and we wanted the experience to feel snappy. Although Redis is not as well suited to complex, relational querying, that capability was not important for our use case, as a client only needs to access the data for a specific call and we had no user accounts.
In the long term, we did want to support user accounts and persistent Huddle rooms that maintained state when a user re-entered. For this, our plan was to convert the Redis database into a write-back cache in front of a SQL database. The SQL database would be the long-term store for user profile information and room state. When a user joined a Huddle room, the relevant information would be pulled from SQL into Redis. Then, for the duration of the call, our server would perform all reads and writes on Redis. Finally, once all users left the call, the SQL database would be updated with the new room settings and state.
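Our backend was Django, but the flow itself is language-agnostic; here is a TypeScript sketch of the planned write-back pattern, with entirely hypothetical store interfaces:

```ts
// Hypothetical store interfaces, for illustration only
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  del(key: string): Promise<void>;
}

interface SqlStore {
  loadRoomState(roomId: string): Promise<string>;
  saveRoomState(roomId: string, state: string): Promise<void>;
}

// On first join, warm the cache from SQL
async function openRoom(roomId: string, redis: KeyValueStore, sql: SqlStore) {
  if ((await redis.get(`room:${roomId}`)) === null) {
    const state = await sql.loadRoomState(roomId);
    await redis.set(`room:${roomId}`, state);
  }
}

// During the call, all reads and writes hit Redis only
async function writeState(roomId: string, state: string, redis: KeyValueStore) {
  await redis.set(`room:${roomId}`, state);
}

// When the last user leaves, flush the cached state back to SQL
async function closeRoom(roomId: string, redis: KeyValueStore, sql: SqlStore) {
  const state = await redis.get(`room:${roomId}`);
  if (state !== null) {
    await sql.saveRoomState(roomId, state);
    await redis.del(`room:${roomId}`);
  }
}
```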
Heroku
Finally, we used Heroku to host our backend service. We chose Heroku because we were able to get our Django and Redis project up and running in minutes, through a simple terminal command. This ease of use meant that we could focus on building the best video chat platform in the world, rather than worrying about managing infrastructure. Heroku is also scalable, so it could grow to support a larger user base in the future.