Microsoft

MAI-Transcribe-1

Released: 2026-04

MAI-Transcribe-1 Overview

MAI-Transcribe-1 is a speech-to-text model from Microsoft built on a transformer architecture with a bi-directional audio encoder. It was released in April 2026 as a Public Preview through Azure AI Foundry and Azure Speech, and it targets enterprises and developers with high-volume batch transcription needs across 25 languages. Microsoft's reported FLEURS benchmark results place it ahead of OpenAI Whisper-large-v3 on all 25 tested languages, and Microsoft rates its batch processing at 2.5x the speed of its own Azure Fast offering.

Pros

  • Achieves a 3.8% average Word Error Rate (WER) across 25 languages on the FLEURS benchmark, which Microsoft reports as the lowest overall WER across the languages it tested (its top 25 by product usage).
  • Outperforms OpenAI Whisper-large-v3 across all 25 tested languages, and beats Gemini 3.1 Flash on 22 of the same 25.
  • Batch transcription runs at 2.5x the speed of Microsoft's own Azure Fast offering, which meaningfully reduces turnaround time for high-volume workloads.
  • Microsoft cites approximately 50% lower GPU cost compared to leading alternatives, which translates to lower per-job expenses at scale.
  • On-premises deployment is supported alongside cloud hosting via Azure Speech, giving organizations that cannot send audio off-site a viable path to adoption.
  • At $0.36 USD per hour of audio on a pay-as-you-go basis, the pricing structure is straightforward to model for variable transcription volumes.
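
For readers who want to sanity-check WER figures like the 3.8% above on their own audio, the metric is conventionally defined as WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. The Python sketch below is a minimal illustration of that definition, not Microsoft's evaluation pipeline. The function name is hypothetical, and it assumes whitespace tokenization with no text normalization; published benchmarks usually normalize casing and punctuation first, and the normalization Microsoft applied is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Scores from this sketch are only comparable to published numbers if the same normalization and test set are used.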

Cons

  • Real-time transcription is absent at launch and listed only as planned for a future release, ruling out live captioning and real-time meeting transcription use cases for now.
  • Speaker diarization is also missing from the current release, meaning the model cannot distinguish between different speakers in a recording — a significant gap for meeting transcription and interview workflows.
  • Contextual biasing, which allows models to prioritize domain-specific vocabulary, is not yet supported, limiting accuracy on specialized terminology such as medical or legal language.
  • The free MAI Playground is restricted to users in the United States at launch, leaving international developers without a no-cost evaluation path.
  • Output is limited to text transcription; there is no support for translated output, structured data formats beyond plain text, or audio outputs.
  • The model is currently in Public Preview status, which means the feature set, pricing, and API behavior are subject to change before general availability.

Specifications

Tool Category: Voice / speech-to-text (ASR)
Model Name: MAI-Transcribe-1
Model Family: Microsoft AI (MAI), first generation
Architecture: Transformer-based text decoder with bi-directional audio encoder
Release Status: Public Preview
Release Date: April 2, 2026
Benchmark (FLEURS, 25 Languages): 3.8% average Word Error Rate (WER)
FLEURS Ranking: #1 overall WER across top 25 languages by Microsoft product usage
Competitor Comparison: Beats OpenAI Whisper-large-v3 on all 25 languages; Gemini 3.1 Flash on 22 of 25; ElevenLabs Scribe v2 and OpenAI GPT-Transcribe on 15 of 25 each
Batch Transcription Speed: 2.5x faster than Microsoft Azure Fast offering
GPU Efficiency: Approximately 50% lower GPU cost than leading alternatives
Supported Languages: 25 stated (24 named in this listing): English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, Vietnamese
Automatic Language Identification: Yes
Word-level Timestamps: Yes
Automatic Punctuation: Yes
Noise Robustness: Optimized for background noise, low-quality audio, overlapping speech
Real-time Transcription: Not yet supported (planned for upcoming release)
Diarization: Not yet supported (planned for upcoming release)
Contextual Biasing: Not yet supported (planned for upcoming release)
Input Audio Formats: MP3, WAV, FLAC
Maximum File Size: 200 MB
Output Types: Text transcription
API Access: Yes
Platform: Microsoft Foundry (Azure AI Foundry) / Azure Speech
Deployment Options: Cloud via Azure Speech; on-premises also supported
SDK Support: Azure Speech SDK; REST API available
Pricing Model: Pay-as-you-go and commitment tiers
Pricing: $0.36 USD per hour of audio
Free Tier: MAI Playground available for free testing (US-only at launch)
Enterprise: Yes; negotiated rates available via Azure Reserved Capacity and enterprise agreements
MAI Playground Availability: United States only at launch
Integration: Already powering Microsoft Copilot Voice mode, Microsoft Teams transcription, Bing, PowerPoint, Azure Speech
Licensing: Microsoft proprietary
Compliance: Azure infrastructure; inherits Azure compliance and security stack
Training Opt-out / Data Use: Microsoft Responsible AI policies apply; custom voice creation requires approval process
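
The input limits listed above (MP3, WAV, or FLAC, up to 200 MB) can be checked locally before a batch is submitted. The Python sketch below is one possible pre-flight check, not part of any Microsoft SDK. The function names are hypothetical, and it assumes "200 MB" means 200,000,000 bytes, the more conservative reading since the listing does not say whether the limit is decimal or binary. It inspects only the file extension and size, so it does not confirm that the audio is actually encoded in the format its name suggests.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".mp3", ".wav", ".flac"}  # formats named in the spec list
MAX_FILE_BYTES = 200 * 1000 * 1000            # assumption: decimal megabytes


def find_upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(
            f"unsupported extension {suffix!r}; expected one of {sorted(ALLOWED_SUFFIXES)}"
        )
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; assumed limit is {MAX_FILE_BYTES} bytes"
        )
    return problems


def check_local_file(path: str) -> list[str]:
    """Convenience wrapper that reads the name and size from a file on disk."""
    p = Path(path)
    return find_upload_problems(p.name, p.stat().st_size)


if __name__ == "__main__":
    print(find_upload_problems("standup.ogg", 250_000_000))
```

Keeping the rule check separate from the filesystem access makes the limits easy to adjust if they change before general availability.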

Specifications should be verified from official manufacturer sources. Details may vary by region or configuration.

Notes: The MAI Playground free testing environment is available in the United States only at launch. On-premises deployment is supported in addition to cloud hosting via Azure. The model is in Public Preview as of release; specifications and pricing may change before general availability.
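
Because the listed $0.36 per audio hour may change before general availability, a budgeting calculation is easier to maintain if the rate is a parameter rather than a hard-coded constant. The Python sketch below estimates pay-as-you-go cost under the assumption that billing is strictly proportional to audio duration, with no minimum charge, rounding increment, or commitment-tier discount; none of those billing details are given in this listing. The function name is hypothetical.

```python
from decimal import Decimal

# Pay-as-you-go rate from the listing; Public Preview pricing may change.
RATE_USD_PER_AUDIO_HOUR = Decimal("0.36")


def estimate_cost_usd(audio_seconds: float,
                      rate_per_hour: Decimal = RATE_USD_PER_AUDIO_HOUR) -> Decimal:
    """Estimate cost in USD, rounded to cents, assuming linear per-second billing."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    return (hours * rate_per_hour).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 90 minutes of audio at the listed rate.
    print(estimate_cost_usd(90 * 60))
    # 1,000 hours of audio at a hypothetical negotiated rate.
    print(estimate_cost_usd(1000 * 3600, rate_per_hour=Decimal("0.30")))
```

At the listed rate, 1,000 hours of audio works out to $360 under these assumptions; actual invoices depend on billing rules that should be confirmed with Microsoft.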