Issues with ConversationTranscriber Realtime STT over Private Endpoint – Requires OCSP/CRL Internet Access and an Undocumented Endpoint Format
I am using an Azure Speech Service resource and a VM in the same VNet. The Speech resource has a Private Endpoint configured, and the VM accesses it through the private IP. On the VM, I run a Python backend that uses ConversationTranscriber (Realtime Speech-to-Text) from the Speech SDK.
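For anyone reproducing this setup, here is a minimal sketch of how to confirm from the VM that the Speech hostname resolves to the private endpoint IP rather than a public address. The helper names and the hostname in the usage note are hypothetical placeholders, and the check relies only on how Python's `ipaddress` module classifies addresses.

```python
import ipaddress
import socket


def resolved_addresses(hostname: str, port: int = 443) -> list[str]:
    """Return the distinct IP addresses the hostname resolves to from this machine."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only if the list is non-empty and every address is flagged
    as private by ipaddress (this includes RFC 1918 ranges such as 10.0.0.0/8)."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )
```

On the VM, `all_private(resolved_addresses("my-speech-resource.cognitiveservices.azure.com"))` (placeholder hostname) should return `True` if the private DNS zone is linked correctly.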
Problem
Realtime transcription only works if the VM has outbound Internet access. Initially, I configured an NSG rule denying outbound Internet traffic, and ConversationTranscriber stopped receiving transcribed events. After switching to Azure Firewall and explicitly allowing several CA/OCSP/CRL endpoints (such as DigiCert and Microsoft certificate services), Realtime transcription started working again.
This behavior is unexpected because both the VM and Speech Service use the private endpoint, and communication should remain inside the Azure backbone.
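To see which revocation endpoints are involved, here is a rough sketch that lists the OCSP, CRL, and CA-issuer URLs embedded in the server certificate. It assumes the dictionary returned by Python's `ssl.SSLSocket.getpeercert()`; the function names are hypothetical. It only inspects the leaf certificate, so intermediate certificates may reference additional URLs, and the Speech SDK's own TLS stack may behave differently from Python's.

```python
import socket
import ssl


def revocation_urls(cert: dict) -> dict[str, list[str]]:
    """Collect OCSP, CRL and CA-issuer URLs from a dict shaped like
    the return value of ssl.SSLSocket.getpeercert()."""
    return {
        "ocsp": list(cert.get("OCSP", ())),
        "crl": list(cert.get("crlDistributionPoints", ())),
        "ca_issuers": list(cert.get("caIssuers", ())),
    }


def fetch_leaf_certificate(hostname: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Open a TLS connection and return the validated leaf certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

Calling `revocation_urls(fetch_leaf_certificate(host))` with the Speech resource hostname gives a starting point for the firewall allowlist, not a complete list.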
Unexpected SDK Behavior
The documentation describes using a private endpoint like:
wss://<private-endpoint-name>.cognitiveservices.azure.com
However, this endpoint does not work with ConversationTranscriber in my environment.
I must use the full STT endpoint path for the service to function:
wss://<service-name>.cognitiveservices.azure.com/stt/speech/recognition/conversation/cognitiveservices/v1?language=ja-JP
Only with this extended endpoint does the SDK succeed in establishing a connection and receiving transcription events.
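For reference, a minimal sketch of how the extended endpoint above can be assembled and passed to the Python Speech SDK. The helper names, the key, and the WAV path are placeholders. The SDK part requires the `azure-cognitiveservices-speech` package and reflects my understanding of its API, so treat it as an assumption and check it against the installed version.

```python
from urllib.parse import urlencode

# Path taken from the working endpoint shown above.
CONVERSATION_PATH = "/stt/speech/recognition/conversation/cognitiveservices/v1"


def conversation_endpoint(resource_name: str, language: str) -> str:
    """Build the full wss:// endpoint for the given Speech resource name."""
    query = urlencode({"language": language})
    return f"wss://{resource_name}.cognitiveservices.azure.com{CONVERSATION_PATH}?{query}"


def build_transcriber(resource_name: str, language: str, key: str, wav_path: str):
    """Create a ConversationTranscriber bound to the extended endpoint."""
    # Imported lazily so the URL helper works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=key,
        endpoint=conversation_endpoint(resource_name, language),
    )
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config
    )
    # Log final results and cancellations (connection failures surface here).
    transcriber.transcribed.connect(lambda evt: print("TRANSCRIBED:", evt.result.text))
    transcriber.canceled.connect(lambda evt: print("CANCELED:", evt))
    return transcriber
```

After `build_transcriber(...)`, calling `start_transcribing_async().get()` on the returned object starts the session. With outbound access blocked, the `canceled` handler is where an error would be expected to appear.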
Questions
- Why does Realtime STT using ConversationTranscriber require access to public OCSP/CRL certificate validation endpoints (DigiCert, Microsoft), even when using a Private Endpoint?
- Is this expected behavior for Speech Realtime?
- Is certificate revocation checking mandatory and not supported over private link?
- Why does the private-endpoint URL described in the documentation fail for ConversationTranscriber, while the full STT endpoint (/stt/speech/recognition/...) works?
- With the private endpoint + Azure Firewall rules already configured, is there anything additional required to ensure stable Realtime STT operation in a private network?
- Additional domains/ports?
- Any SDK-specific configuration?
- Do other Azure Cognitive Services also require outbound Internet access for OCSP/CRL validation when using Private Endpoints?
- Does Batch Transcription work entirely through the Private Endpoint without requiring any outbound Internet connectivity?
- Can Batch STT operate in a VM with zero Internet access (only private link)?
Additional Info
Speech Service: Private Endpoint enabled
VM: Private subnet
NSG Deny Internet → Realtime fails
Azure Firewall + allow CA/OCSP → Realtime works
Using Python SDK: ConversationTranscriber
- Screenshot attached showing the CA domains required for outbound connectivity