Google Summer of Code (henceforth GSoC) is a program offered by Google in which university students can propose a project to open-source software organisations. Mentors from the organisation’s community accept proposals and guide the students through the work, which is done over the summer break. Google sponsors the work by paying a stipend to students who successfully complete their proposed project.
In 2016 I proposed a project on improving WebRTC through Debian. Debian lent me out to Jitsi, an organisation that has several open-source software projects involving real-time audio and video communication. One of those projects is Jitsi Meet, a web (and now mobile: Android and iOS) application that can be used to communicate securely and anonymously with one person or a group of people over the internet. Jitsi Meet rivals the proprietary Skype and Google Hangouts, which currently dominate the video conferencing market.
This is how I ended up working on a transcription service for Jitsi Meet. The goal of the project was to give users a transcript of their meeting on Jitsi Meet. I was mentored and guided throughout the summer by Boris Grozev and Ingo Bauersachs. We ended up with an architecture where a single client (the browser of a user) locally collects all the audio packets coming into the conference and sends them to an HTTP server running the open-source speech-to-text library Sphinx4, which sends the transcription back once it is complete. After that, the client has to merge the transcriptions of all users into a single file and publish it to everyone in the conference. This architecture did not end up being usable in the current Jitsi Meet application.
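The merge step described above can be sketched in a few lines. This is only an illustration, not the code from the project: the `Segment` type, its fields, the `merge_transcripts` function and the output format are all hypothetical, and it assumes every transcribed piece of speech carries a start time measured from the beginning of the conference.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class Segment:
    """One transcribed piece of speech (hypothetical structure)."""
    start: float   # seconds since the conference started (assumed)
    speaker: str   # display name of the participant
    text: str      # what the speech-to-text engine returned


def merge_transcripts(per_speaker):
    """Merge the segments of all speakers into one chronological transcript.

    per_speaker maps a speaker name to that speaker's list of Segments.
    Ties on the start time are broken by speaker name so the output is stable.
    """
    all_segments = chain.from_iterable(per_speaker.values())
    ordered = sorted(all_segments, key=lambda s: (s.start, s.speaker))
    return [f"[{s.start:.1f}s] {s.speaker}: {s.text}" for s in ordered]
```

Sorting everything at the end keeps the sketch simple. A client that wants to publish the transcript while the meeting is still running would have to merge incrementally instead.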
This year I have again been accepted as a GSoC student. Google has directly accepted the Jitsi organisation into GSoC 2017, and Jitsi has accepted my proposal to continue last year's work on the transcription service. I will be mentored by Boris Grozev, Saúl Ibarra Corretgé and Damian Minkov.
The plan this year is to offload managing the transcription of a Jitsi Meet conference to a server-side application. We are planning to use Jigasi, a Java application currently used to call people into a Jitsi Meet conference over SIP. We want to use it to join a conference, collect all the audio and either transcribe it locally with Sphinx4 or send it to an external speech-to-text API, such as those provided by Google or IBM.
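To make the "local or external" choice concrete, here is a rough sketch of how the server-side component could hide the speech-to-text engine behind one interface. It is written in Python for brevity even though Jigasi itself is Java, and none of the names (`TranscriptionBackend`, `EchoBackend`, `ConferenceTranscriber`) come from Jigasi, Sphinx4 or any vendor API; they are made up for illustration. The `EchoBackend` is a stand-in that does no recognition at all.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class TranscriptionBackend(ABC):
    """Anything that turns raw audio into text (hypothetical interface)."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        ...


class EchoBackend(TranscriptionBackend):
    """Placeholder backend: reports the audio size instead of recognising speech."""

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes of audio>"


class ConferenceTranscriber:
    """Collects audio per participant and hands it to the chosen backend."""

    def __init__(self, backend: TranscriptionBackend):
        self._backend = backend
        self._buffers = defaultdict(bytearray)  # participant -> audio so far

    def on_audio(self, participant: str, chunk: bytes) -> None:
        # Called for every audio chunk received from the conference.
        self._buffers[participant].extend(chunk)

    def finish(self) -> dict:
        # Transcribe each participant's buffered audio with the same backend.
        return {
            name: self._backend.transcribe(bytes(buf))
            for name, buf in self._buffers.items()
        }
```

With this shape, a backend wrapping a local Sphinx4 instance and one calling an external service would both implement `transcribe`, and the rest of the application would not need to know which one is in use.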