MAI-Transcribe-1

Name: MAI-Transcribe-1
Brand: Microsoft

Released: 2026-04

MAI-Transcribe-1 Overview

MAI-Transcribe-1 is a speech-to-text model from Microsoft built on a transformer architecture with a bi-directional audio encoder, released in April 2026 as a Public Preview through Azure AI Foundry and Azure Speech. It targets enterprises and developers with high-volume batch transcription needs across 25 languages. Microsoft's FLEURS benchmark figures position it above OpenAI Whisper-large-v3 on all 25 tested languages, with batch processing rated at 2.5x the speed of the company's previous Azure Fast offering.
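
The FLEURS figures quoted for MAI-Transcribe-1 are word error rates (WER). As a rough illustration of what that metric measures, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name is hypothetical, and the whitespace tokenization is an assumption; the source does not describe the text normalization or scoring Microsoft used, so this is not a reproduction of its benchmark method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one deletion over six reference words, a WER of about 16.7%. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.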

Pros

Achieves a 3.8% average Word Error Rate (WER) on the FLEURS benchmark across the 25 languages most used in Microsoft products, the best overall WER among the models Microsoft compared.
Outperforms OpenAI Whisper-large-v3 across all 25 tested languages, and beats Gemini 3.1 Flash on 22 of the same 25.
Batch transcription runs at 2.5x the speed of Microsoft's own Azure Fast offering, so a batch job should take roughly 40% of the time it would on that baseline.
Microsoft cites approximately 50% lower GPU cost compared to leading alternatives, which translates to lower per-job expenses at scale.
On-premises deployment is supported alongside cloud hosting via Azure Speech, giving organizations that cannot send audio off-site a viable path to adoption.
At $0.36 USD per hour of audio on a pay-as-you-go basis, the pricing structure is straightforward to model for variable transcription volumes.
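
The pay-as-you-go rate quoted above makes cost estimation a simple multiplication. The sketch below is one way to model it, assuming charges scale linearly with audio duration, with no minimum charge and no rounding up to a billing increment. The source does not state the billing granularity, so treat those as assumptions, and the function name is hypothetical.

```python
from decimal import Decimal

# Pay-as-you-go list price quoted in this document (Public Preview, may change).
PRICE_PER_AUDIO_HOUR_USD = Decimal("0.36")


def estimate_cost_usd(
    audio_seconds: float,
    price_per_hour: Decimal = PRICE_PER_AUDIO_HOUR_USD,
) -> Decimal:
    """Estimate the transcription cost for a given amount of audio.

    Assumption: cost is strictly proportional to audio duration.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    # Round to whole cents for reporting.
    return (hours * price_per_hour).quantize(Decimal("0.01"))
```

Under these assumptions, 1,000 hours of audio comes to $360.00 and a 90-minute recording to $0.54. Commitment tiers and negotiated enterprise rates would need their own price inputs.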

Cons

Real-time transcription is absent at launch and listed only as planned for a future release, ruling out live captioning and real-time meeting transcription use cases for now.
Speaker diarization is also missing from the current release, so the model cannot distinguish between speakers in a recording, a significant gap for meeting transcription and interview workflows.
Contextual biasing, which allows models to prioritize domain-specific vocabulary, is not yet supported, limiting accuracy on specialized terminology such as medical or legal language.
The free MAI Playground is restricted to users in the United States at launch, leaving international developers without a no-cost evaluation path.
Output is limited to text transcription; there is no support for translated output, structured data formats beyond plain text, or audio outputs.
The model is currently in Public Preview status, which means the feature set, pricing, and API behavior are subject to change before general availability.

Specifications

Tool Category	voice / speech-to-text (ASR)
Model Name	MAI-Transcribe-1
Model Family	Microsoft AI (MAI) — first generation
Architecture	Transformer-based text decoder with bi-directional audio encoder
Release Status	Public Preview
Release Date	April 2, 2026
Benchmark (FLEURS, 25 Languages)	3.8% average Word Error Rate (WER)
FLEURS Ranking	#1 overall WER across top 25 languages by Microsoft product usage
Competitor Comparison	Beats OpenAI Whisper-large-v3 on all 25 languages; Gemini 3.1 Flash on 22 of 25; ElevenLabs Scribe v2 and OpenAI GPT-Transcribe on 15 of 25 each
Batch Transcription Speed	2.5x faster than Microsoft Azure Fast offering
GPU Efficiency	Approximately 50% lower GPU cost than leading alternatives
Supported Languages	25 (24 named here): English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, Vietnamese
Automatic Language Identification	Yes
Word-level Timestamps	Yes
Automatic Punctuation	Yes
Noise Robustness	Optimized for background noise, low-quality audio, overlapping speech
Real-time Transcription	Not yet supported (planned for upcoming release)
Diarization	Not yet supported (planned for upcoming release)
Contextual Biasing	Not yet supported (planned for upcoming release)
Input Audio Formats	MP3, WAV, FLAC
Maximum File Size	200 MB
Output Types	Text transcription
API Access	Yes
Platform	Microsoft Foundry (Azure AI Foundry) / Azure Speech
Deployment Options	Cloud via Azure Speech; on-premises also supported
SDK Support	Azure Speech SDK; REST API available
Pricing Model	Pay-as-you-go and commitment tiers
Pricing	$0.36 USD per hour of audio
Free Tier	MAI Playground available for free testing (US-only at launch)
Enterprise	Yes — negotiated rates available via Azure Reserved Capacity and enterprise agreements
MAI Playground Availability	United States only at launch
Integration	Already powering Microsoft Copilot Voice mode, Microsoft Teams transcription, Bing, PowerPoint, Azure Speech
Licensing	Microsoft proprietary
Compliance	Azure infrastructure — inherits Azure compliance and security stack
Training Opt-out / Data Use	Microsoft Responsible AI policies apply; custom voice creation requires approval process

Specifications should be verified from official manufacturer sources. Details may vary by region or configuration.
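
The input limits listed in the table (MP3, WAV, or FLAC; 200 MB maximum) can be checked locally before a file is submitted. The sketch below is a hypothetical pre-flight helper, not part of the Azure Speech SDK. It assumes the format can be judged from the file extension and that "200 MB" means 200,000,000 bytes; the source does not say whether the limit is decimal or binary megabytes, so the stricter reading is used.

```python
from pathlib import Path

# Formats and size limit as listed in the specification table.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac"}
MAX_FILE_BYTES = 200 * 1000 * 1000  # assumption: decimal megabytes


def validation_errors(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    errors = []
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format: {suffix or '(no extension)'}")
    if size_bytes > MAX_FILE_BYTES:
        errors.append(f"file too large: {size_bytes} bytes (limit {MAX_FILE_BYTES})")
    return errors
```

In practice the size would come from something like `os.path.getsize(path)`. Passing it in as an argument keeps the check easy to test without touching the file system. An extension check does not confirm the actual encoding, so the service may still reject a mislabeled file.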

Notes: The MAI Playground free testing environment is available in the United States only at launch. On-premises deployment is supported in addition to cloud hosting via Azure. The model is in Public Preview as of release; specifications and pricing may change before general availability.
