The Journey of Speech Technology

I will walk through my experience with speech technology. Everybody thinks speech is just sound. No, it's not! In the software world, speech is our data. How?


Right now, the world is moving fast thanks to artificial intelligence and machine learning, and speech technology is becoming a big part of that world. How? The key is the data. So how do we get the data? We can get it in many ways (social media, e-commerce), but here we are getting it from speech!

There are many open-source projects for speech recognition, but only a few of them are really good, such as DeepSpeech, TensorFlow, and Leon. The catch is that we have to collect the data and train the model ourselves. Building it this way can get complicated, and the implementation takes time.

Train data to create the model

Now, if we don't want to cook, we can go straight to a restaurant and eat. The same idea applies here: technical restaurants like Google, AWS, and Microsoft have their own speech technology services, so we don't have to worry about the deep levels of machine learning. Now we are going to choose a good technical restaurant.

Who understands you better? Google or AWS???

Google has a lot of features, such as Noise Robustness, Multichannel Recognition, Phrase Hints, and Speaker Diarization. However, Speaker Diarization, Auto-Detect Language, and Automatic Punctuation are still in beta, so they may not give exact results.

Amazon offers all of its features as production-ready services.

Google does not support a custom vocabulary dataset. For instance, if an organisation uses special words, such as medical terms, Google may not transcribe those words exactly.

In the Amazon Transcribe service, we can create our own custom vocabulary list for a specific domain.
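To make the custom vocabulary idea concrete, here is a minimal sketch that prepares the request for such a list. The helper name `build_vocabulary_request`, the vocabulary name `medical-terms`, and the sample phrases are my own assumptions for illustration. The dictionary keys follow the parameters of the `create_vocabulary` call on the boto3 Transcribe client.

```python
def build_vocabulary_request(name, language_code, phrases):
    """Build keyword arguments for a custom-vocabulary request.

    The keys mirror the parameters of boto3's Transcribe
    `create_vocabulary` call. The helper itself is hypothetical.
    """
    cleaned = []
    for phrase in phrases:
        term = phrase.strip()
        # Skip blanks and duplicates while keeping the original order.
        if term and term not in cleaned:
            cleaned.append(term)
    if not cleaned:
        raise ValueError("the vocabulary needs at least one phrase")
    return {
        "VocabularyName": name,
        "LanguageCode": language_code,
        "Phrases": cleaned,
    }


# Hypothetical vocabulary name and sample medical terms.
request = build_vocabulary_request(
    "medical-terms",
    "en-US",
    ["angioplasty", "tachycardia", " angioplasty ", ""],
)
print(request["Phrases"])
```

With boto3 installed and AWS credentials configured, the dictionary could be passed on as `boto3.client("transcribe").create_vocabulary(**request)`. The sketch stops before that step so it runs without an AWS account.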

Google requires the audio sampling rate, so we have to find the sampling rate of the audio file before converting speech to text.
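For WAV files, the sampling rate can be read from the file header with Python's standard `wave` module. The sketch below is only one way to do it, and it covers WAV only; FLAC or AMR files would need another tool. The helper names and the generated demo file are assumptions for illustration.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def write_silent_wav(path, sample_rate, seconds=1):
    """Write a mono, 16-bit silent WAV file so the demo has an input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


# Create a throwaway 16 kHz file and read its rate back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.wav")
    write_silent_wav(demo_path, 16000)
    detected_rate = wav_sample_rate(demo_path)

print(detected_rate)
```

The detected value is what would go into the sampling rate field of the recognition request (`sample_rate_hertz` in Google's Python client).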

Amazon takes the sampling rate as an optional parameter.

Google supports multiple languages.

Amazon Transcribe supports only a few languages: US English, US Spanish, British English, Australian English, French, Canadian French, Italian, and Brazilian Portuguese.

In the Google service, the transcript comes back in the response after the audio file is converted, and we then have to upload the transcript file to storage ourselves.

By default, the Amazon service uses its own S3 bucket to store the transcript and returns the transcript file URL. We can also give our own S3 bucket name to store the transcript file.
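Here is a small sketch of how a transcription job request could be put together, showing that both the sampling rate and our own bucket are optional. The helper name `build_transcription_job`, the job names, and the bucket names are hypothetical. The dictionary keys follow the parameters of the `start_transcription_job` call on the boto3 Transcribe client.

```python
def build_transcription_job(job_name, media_uri, media_format,
                            language_code="en-US",
                            sample_rate_hz=None,
                            output_bucket=None):
    """Build keyword arguments for a transcription job request.

    The keys mirror boto3's Transcribe `start_transcription_job`
    parameters. The helper itself is hypothetical.
    """
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }
    # The sampling rate is optional, so add it only when we know it.
    if sample_rate_hz is not None:
        request["MediaSampleRateHertz"] = sample_rate_hz
    # Without a bucket name, the service stores the transcript itself.
    if output_bucket is not None:
        request["OutputBucketName"] = output_bucket
    return request


# Hypothetical job names, bucket names, and file locations.
default_job = build_transcription_job(
    "demo-job-1", "s3://my-audio-bucket/call.mp3", "mp3")
custom_job = build_transcription_job(
    "demo-job-2", "s3://my-audio-bucket/call.wav", "wav",
    sample_rate_hz=16000, output_bucket="my-transcript-bucket")

print(sorted(default_job))
print(custom_job["OutputBucketName"])
```

With boto3 and AWS credentials in place, either dictionary could be passed on as `boto3.client("transcribe").start_transcription_job(**custom_job)`.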

Google works with specific encoding formats, so it accepts only FLAC, WAV, and AMR files. Before proceeding, we have to convert the audio file to one of these formats.

Amazon supports the MP3, MP4, WAV, and FLAC formats.
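A quick way to see whether a file needs converting first is to check its extension against the two format lists above. This is only a rough sketch: the helper name `services_accepting` is my own, and a file extension is just a hint about the real encoding inside the file.

```python
import os

# Format lists as described in this article.
GOOGLE_FORMATS = {"flac", "wav", "amr"}
AMAZON_FORMATS = {"mp3", "mp4", "wav", "flac"}


def services_accepting(filename):
    """Return which services accept the file, judged by its extension."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    accepted = []
    if extension in GOOGLE_FORMATS:
        accepted.append("google")
    if extension in AMAZON_FORMATS:
        accepted.append("amazon")
    return accepted


for name in ("call.mp3", "call.WAV", "note.amr"):
    print(name, services_accepting(name))
```

An empty list, or a list missing the service we want, means the audio has to be converted before it is sent.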

Finally, everybody wonders: what is the price of both?

Pricing details are available on the Google Speech-to-Text and Amazon Transcribe pricing pages.

Based on my experience working with both, I would suggest using the Amazon Transcribe service because it offers good features and is developer-friendly to configure.

In the next blog, I will write about how to configure and use the Amazon Transcribe service with sample Python code.

$……………….…………… Happy Reading……………………………….$


I have a passion for understanding technology at a fundamental level and for sharing ideas and code. *Aspire to inspire before I expire.*

