Transcribe Podcasts Serverlessly Using Cloud Build and Cloud Speech-to-Text
Have you listened to a great podcast lately and wished that the transcript was available? Find out how you can generate a podcast transcript using GCP services serverlessly.
_____
My Podcast Story
I joined the podcast craze late and only started enjoying podcasts about a year ago. I used to find it difficult to stay focused, and even to stay awake, while listening to them. As someone who prefers to consume information by reading (and sometimes highlighting text), I found it hard to learn from a podcast. When I learned about James Clear’s habit stacking technique, I applied it to my podcast listening by combining it with my running/walking exercise habit. As a result, podcasts have become one of my favorite learning mediums. There are so many interesting podcasts available for free, covering a variety of topics that I can choose from depending on my mood and curiosity that day. However, I still face one challenge when listening to podcasts.
After listening to a good podcast, I sometimes want to review the content and make personal notes for my learning and future reference. Unfortunately, many good podcasts do not provide a transcript, which makes it difficult for me to continue learning without spending extra minutes or hours re-listening to the content. Apart from making the content easier to review, podcast transcripts have other benefits, such as making the podcast accessible to hearing-impaired and non-native English listeners, and making its content searchable.
Cloud Speech-to-Text Experiment
Faced with this challenge, I spent some time one weekend experimenting with the GCP Cloud Speech-to-Text machine learning API to generate a podcast transcript automatically. I submitted an audio file through a gcloud command and waited a few minutes for the generated transcript. The resulting transcript was not 100% accurate, but it was good enough for my purpose. In my experience, accuracy varies a lot with the quality of the podcast audio, including the clarity of the speakers’ accents and the presence of background noise.
I generated the transcript by setting up a Google Compute Engine (GCE) instance manually and then executing the following steps:
- Download the podcast episode from a URL link.
- Convert the audio encoding from MP3 to FLAC using FFmpeg.
- Copy the FLAC file to a GCS bucket.
- Submit the FLAC file to the Speech-to-Text recognition API.
- Wait until the Speech recognition API completes.
- Once it completes, retrieve the generated transcription result in JSON format.
- Use jq to merge the transcript texts from the JSON file into a TXT file.
- Upload the transcript TXT file to the GCS bucket.
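The merge step in the list above can also be sketched in Python. This is a minimal sketch, assuming the JSON file has the shape of a Speech-to-Text recognition response (a top-level results list whose entries each hold an alternatives list with a transcript field). The function name merge_transcript and the sample data are hypothetical and are not taken from the repository.

```python
import json


def merge_transcript(response: dict) -> str:
    """Join the first alternative of every result into one text block.

    Assumes the Speech-to-Text response layout:
    {"results": [{"alternatives": [{"transcript": "..."}]}]}
    """
    lines = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results that carry no alternatives
        text = alternatives[0].get("transcript", "").strip()
        if text:
            lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample standing in for the downloaded JSON result.
    sample = json.loads(
        '{"results": ['
        '{"alternatives": [{"transcript": "Welcome to the show.", "confidence": 0.93}]},'
        '{"alternatives": [{"transcript": " Today we talk about habits."}]}'
        ']}'
    )
    print(merge_transcript(sample))
```

Taking only the first alternative of each result mirrors what a jq filter such as .results[].alternatives[0].transcript would select.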
Moving to Cloud Build
Not satisfied with doing it manually, I spent the next few iterations trying to make the whole end-to-end process as simple as possible to run. Eventually, I settled on using Cloud Build to run the entire process serverlessly. Below, I outline how I use this approach to transcribe a podcast. You can find the source code in my “audio-transcriber-cloud-build” GitHub repository.
We can take the step-by-step process outlined above and translate the steps into a cloudbuild.yaml build configuration file.
After the cloudbuild.yaml is ready, we can then submit it to Cloud Build by running a single gcloud command.
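As a rough sketch of what such a single-command submission may look like, the Python helper below assembles a gcloud builds submit invocation using the --config, --no-source, and --substitutions flags. The helper name and the substitution keys (_AUDIO_URL, _BUCKET) are hypothetical, and --no-source assumes the build does not need any local files. Please check the repository for the actual command and variable names.

```python
import shlex


def build_submit_command(config_path: str, substitutions: dict) -> list:
    """Return the argv for a `gcloud builds submit` call.

    User-defined Cloud Build substitutions must start with an underscore,
    so any other key is rejected early.
    """
    for key in substitutions:
        if not key.startswith("_"):
            raise ValueError(f"substitution key must start with '_': {key}")
    command = ["gcloud", "builds", "submit", "--no-source", f"--config={config_path}"]
    if substitutions:
        # Sort the keys so the generated command is deterministic.
        pairs = ",".join(f"{k}={v}" for k, v in sorted(substitutions.items()))
        command.append(f"--substitutions={pairs}")
    return command


if __name__ == "__main__":
    argv = build_submit_command(
        "cloudbuild.yaml",
        {
            "_AUDIO_URL": "https://example.com/episode.mp3",  # hypothetical key and value
            "_BUCKET": "my-transcripts-bucket",  # hypothetical key and value
        },
    )
    # Print a copy-pasteable shell line; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Values passed this way override the defaults declared in the build configuration file. This sketch does not handle substitution values that contain commas.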
You may need to wait for some time until the build completes, depending on the duration of your audio file. Once the build completes, you can find the resulting transcript in the same GCS bucket folder as the source audio 🎉
To understand the pricing details of the core resources used, please refer to the links below:
All of them are eligible for the Google Cloud Free Tier, which allows you to use resources for free up to specific limits.
Try it!
You can try running the same transcription process on your GCP project by clicking the Open in Google Cloud Shell button (available from the GitHub repository) and then following the walkthrough tutorial to guide you along the way. Do not forget to customize the substitution variables at the bottom of the cloudbuild.yaml based on the podcast/audio that you want to transcribe.
Please let me know how it goes for you, and share what you learn and any interesting experiences!
_____