Bookkeeping Service Providers

  • Accounting
  • Bookkeeping
  • US Taxation
  • Financial Planning
  • Accounting Software
  • Small Business Finance
You are here: Home / CLOUD / Multi-language identification and transcription in Video Indexer

Multi-language identification and transcription in Video Indexer

November 25, 2019 by cbn Leave a Comment

Multi-language speech transcription was recently introduced into Microsoft Video Indexer at the International Broadcasters Conference (IBC). It is available as a preview capability and customers can already start experiencing it in our portal. More details on all our IBC2019 enhancements can be found here.

Multi-language videos are common media assets in the globalization context, global political summits, economic forums, and sport press conferences are examples of venues where speakers use their native language to convey their own statements. Those videos pose a unique challenge for companies that need to provide automatic transcription for video archives of large volumes. Automatic transcription technologies expect users to explicitly determine the video language in advance to convert speech to text. This manual step becomes a scalability obstacle when transcribing multi-language content as one would have to manually tag audio segments with the appropriate language.

Microsoft Video Indexer provides a unique capability of automatic spoken language identification for multi-language content. This solution allows users to easily transcribe multi-language content without going through tedious manual preparation steps before triggering it. By that, it can save anyone with large archive of videos both time and money, and enable discoverability and accessibility scenarios.

Multi-language audio transcription in Video Indexer

The multi-language transcription capability is available as part of the Video Indexer portal. Currently, it supports four languages including English, French, German and Spanish, while expecting up to three different languages in an input media asset. While uploading a new media asset you can select the “Auto-detect multi-language” option as shown below.

1.	A new multi-language option available in the upload page of Video Indexer portal

Our application programming interface (API) supports this capability as well by enabling users to specify 'multi' as the language in the upload API. Once the indexing process is completed, the index JavaScript object notation (JSON) will include the underlying languages. Refer to our documentation for more details.

Additionally, each instance in the transcription section will include the language in which it was transcribed.

2.	A transcription snippet from Video Indexer timeline presenting different language segments

Customers can view the transcript and identified languages by time, jump to the specific places in the video for each language, and even see the multi-language transcription as video captions. The result transcription is also available as closed caption files (VTT, TTML, SRT, TXT, and CSV).

two languages

Methodology

Language identification from an audio signal is a complex task. Acoustic environment, speaker gender, and speaker age are among a variety of factors that affect this process. We represent audio signal using a visual representation, such as spectrograms, assuming that, different languages induce unique visual patterns which can be learned using deep neural networks.

Our solution has two main stages to determine the languages used in multi-language media content. First, it employs a deep neural network to classify audio segments with very high granularity, in other words, very few seconds. While a good model will successfully identify the underlying language, it can still miss-identify some segments due to similarities between languages. Therefore, we apply a second stage for examining these misses and smooth the results accordingly.

3.	A new insight pane showing the detected spoken languages and their exact occurrences on the timeline

Next steps

We introduced a differentiated capability for multi-language speech transcription. With this unique capability in Video Indexer, you can become more effective about the content of your videos as it allows you to immediately start searching across videos for different language segments. During the coming few months, we will be improving this capability by adding support for more languages and improving the model’s accuracy.

For more information, visit Video Indexer’s portal or the Video Indexer developer portal, and try this new capability. Read more about the new multi-language option and how to use it in our documentation.

Please use our UserVoice to share feedback and help us prioritize features or email visupport@microsoft.com with any questions.

Share on FacebookShare on TwitterShare on Google+Share on LinkedinShare on Pinterest

Filed Under: CLOUD

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Archives

  • September 2025
  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • December 2024
  • November 2024
  • October 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023
  • April 2023
  • March 2023
  • February 2023
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • August 2022
  • July 2022
  • June 2022
  • May 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • August 2021
  • May 2021
  • April 2021
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
  • April 2020
  • March 2020
  • February 2020
  • January 2020
  • December 2019
  • November 2019
  • October 2019
  • September 2019
  • August 2019
  • July 2019
  • June 2019
  • May 2019
  • April 2019
  • March 2019
  • February 2019
  • January 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • March 2016

Recent Posts

  • FabCon Vienna: Build data-rich agents on an enterprise-ready foundation
  • Agent Factory: Connecting agents, apps, and data with new open standards like MCP and A2A
  • Azure mandatory multifactor authentication: Phase 2 starting in October 2025
  • Microsoft Cost Management updates—July & August 2025
  • Protecting Azure Infrastructure from silicon to systems

Recent Comments

    Categories

    • Accounting
    • Accounting Software
    • BlockChain
    • Bookkeeping
    • CLOUD
    • Data Center
    • Financial Planning
    • IOT
    • Machine Learning & AI
    • SECURITY
    • Uncategorized
    • US Taxation

    Categories

    • Accounting (145)
    • Accounting Software (27)
    • BlockChain (18)
    • Bookkeeping (205)
    • CLOUD (1,321)
    • Data Center (214)
    • Financial Planning (345)
    • IOT (260)
    • Machine Learning & AI (41)
    • SECURITY (620)
    • Uncategorized (1,284)
    • US Taxation (17)

    Subscribe Our Newsletter

     Subscribing I accept the privacy rules of this site

    Copyright © 2025 · News Pro Theme on Genesis Framework · WordPress · Log in