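Recently CSDN launched the campaign "Try Microsoft Azure AI Cognitive Services for Free, with Great Gifts to Give Away" (《0元试用微软 Azure人工智能认知服务,精美礼品大放送》), and it is still running. I signed up as soon as it started, but only found time today to actually try the services.
The campaign asks participants to write a blog post about their experience with at least three of these services: speech-to-text, text-to-speech, speech translation, text analytics, text translation, and language understanding.
After trying speech-to-text, text-to-speech, and speech translation, I decided to build a real-time speech translator, and it works really well.
Let's see how to do it. First, go to https://portal.azure.cn/ and sign in.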
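Getting the key
Type "Cognitive Services" (认知服务 in the Chinese portal) into the search box and confirm:
Then create a Speech service:
Enter a name, choose a location, select the free pricing tier, then create a new resource group and select it:
Then click Create. While the resource is being created, the portal shows that deployment is in progress:
When deployment finishes, click Go to resource:
Then click Keys and Endpoint to view the keys and the location/region:
There are two keys and either one works. Also write down the location/region: the program calls the service using the key and the region.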
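The snippets in this post paste the key directly into the source (masked here). One alternative is to read the key and region from environment variables so they never end up in the code. A minimal sketch, assuming hypothetical variable names SPEECH_KEY and SPEECH_REGION that you set yourself:

```python
import os


def load_speech_credentials(key_var="SPEECH_KEY", region_var="SPEECH_REGION"):
    """Return (key, region) read from environment variables.

    SPEECH_KEY / SPEECH_REGION are hypothetical names chosen for this sketch;
    set them to one of the two keys and the location/region shown on the
    "Keys and Endpoint" page.
    """
    key = os.environ.get(key_var, "").strip()
    region = os.environ.get(region_var, "").strip()
    # Collect every variable that is unset or blank so the error names them all
    missing = [name for name, value in ((key_var, key), (region_var, region)) if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return key, region
```

With this in place, a line such as `speech_key, service_region = load_speech_credentials()` could stand in for the hard-coded assignment in the later snippets.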
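A first look at Azure Cognitive Services
Azure Cognitive Services documentation: https://docs.azure.cn/zh-cn/cognitive-services/
Following the documentation, first install the Python package for the Azure Speech SDK: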
pip install azure-cognitiveservices-speech
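Let's start with speech-to-text.
Testing speech-to-text
Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-python
I copied the official sample code and modified it slightly so that it recognizes speech from the microphone: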
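import azure.cognitiveservices.speech as speechsdk
# Key (masked) and region from the "Keys and Endpoint" page
speech_key, service_region = "59392xxxxxxxxxx559de", "chinaeast2"
# speech_recognition_language selects the language to recognize
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region, speech_recognition_language="zh-cn")
# With no audio_config, the recognizer listens on the default microphone
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
print("说:", end="")  # "说" means "Say"
# Recognize a single utterance
result = speech_recognizer.recognize_once()
print(result.text)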
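speech_recognition_language sets the recognition language; here I set it to Chinese ("zh-cn").
I ran it and spoke one sentence into the microphone ("Microsoft's AI services are very easy to use"), and the program recognized exactly what I said: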
说:微软人工智能服务非常好用。
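The snippet above prints result.text unconditionally. recognize_once() can also come back with no match (for example, silence) or be canceled (for example, a wrong key or region), and then there is no usable text. A hedged sketch of checking result.reason first; recognized_text is a hypothetical helper, and its sdk parameter exists only so the logic can be exercised with a stand-in object, so in normal use you leave it as None:

```python
def recognized_text(result, sdk=None):
    """Return the recognized text, or None if nothing usable was recognized.

    `sdk` defaults to the real azure.cognitiveservices.speech module; it is a
    parameter only so the logic can be tested with a stand-in object.
    """
    if sdk is None:
        import azure.cognitiveservices.speech as sdk
    if result.reason == sdk.ResultReason.RecognizedSpeech:
        return result.text
    if result.reason == sdk.ResultReason.NoMatch:
        # e.g. silence or unintelligible audio
        print("No speech could be recognized:", result.no_match_details)
    elif result.reason == sdk.ResultReason.Canceled:
        # e.g. a wrong key/region or a network problem
        print("Recognition canceled:", result.cancellation_details.reason)
    return None
```

In the snippet above it would be used as `text = recognized_text(speech_recognizer.recognize_once())`.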
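Testing text-to-speech
Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python
The documentation also shows how to save the synthesized speech to a file, but here I only demonstrate playing it directly through the speaker: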
# Reuses speechsdk and speech_config from the speech-to-text snippet above
from azure.cognitiveservices.speech import SpeechSynthesizer
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config.speech_synthesis_language = "zh-cn"
# Play the synthesized audio through the default speaker
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
text_words = "微软人工智能服务非常好用。"
# speak_text_async returns a future; .get() blocks until synthesis finishes
result = speech_synthesizer.speak_text_async(text_words).get()
if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(result.reason)
The synthesized speech sounds very good.
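As mentioned above, the synthesized audio can also be written to a file instead of the speaker. A minimal sketch, assuming the filename argument of AudioOutputConfig as used in the text-to-speech quickstart; synthesize_to_wav is a hypothetical helper, and sdk is again only a test hook:

```python
def synthesize_to_wav(text, path, speech_config, sdk=None):
    """Synthesize `text` into the audio file at `path`; return True on success.

    synthesize_to_wav is a hypothetical helper for this sketch; `sdk`
    defaults to azure.cognitiveservices.speech and exists only as a test hook.
    """
    if sdk is None:
        import azure.cognitiveservices.speech as sdk
    # Send the audio to a file instead of the default speaker
    audio_config = sdk.audio.AudioOutputConfig(filename=path)
    synthesizer = sdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    result = synthesizer.speak_text_async(text).get()
    if result.reason != sdk.ResultReason.SynthesizingAudioCompleted:
        print("Synthesis did not complete:", result.reason)
        return False
    return True
```

For example, `synthesize_to_wav("微软人工智能服务非常好用。", "demo.wav", speech_config)` would reuse the speech_config from the earlier snippet.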
Testing speech translation
Documentation: https://docs.azure.cn/zh-cn/cognitive-services/speech-service/get-started-speech-translation?tabs=script%2Cwindowsinstall&pivots=programming-language-python
In my test, speech translation combines speech-to-text and translation in a single call:
# Reuses speechsdk, speech_key and service_region from the speech-to-text snippet
from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config)
def speakAndTranslation():
    """Recognize one utterance; return (original text, translation)."""
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # Speech was recognized but not translated
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)
    # Always return a pair so callers can unpack the result
    return None, None
print(speakAndTranslation())
Running this and saying one sentence gives:
('大家好才是真的好。', 'Everyone is really good.')
Since this returns both the original text and the translation, the speech translator below is built on this same interface.
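result.translations is a dictionary keyed by target-language code, which is why the code above looks up result.translations[to_language]. That suggests the same utterance can be translated into several languages at once by calling add_target_language more than once. A sketch under that assumption; both function names and the extra 'ja' target are illustrative, and sdk is only a test hook:

```python
def make_translation_recognizer(key, region, from_language, to_languages, sdk=None):
    """Build a TranslationRecognizer that translates into every code in to_languages."""
    if sdk is None:
        import azure.cognitiveservices.speech as sdk
    config = sdk.translation.SpeechTranslationConfig(
        subscription=key, region=region, speech_recognition_language=from_language)
    # One add_target_language call per desired output language
    for language in to_languages:
        config.add_target_language(language)
    return sdk.translation.TranslationRecognizer(translation_config=config)


def translate_once(recognizer, sdk=None):
    """Return (original text, {language: translation}); (None, {}) on failure."""
    if sdk is None:
        import azure.cognitiveservices.speech as sdk
    result = recognizer.recognize_once()
    if result.reason == sdk.ResultReason.TranslatedSpeech:
        return result.text, dict(result.translations)
    return None, {}
```

Usage might look like `text, translations = translate_once(make_translation_recognizer(speech_key, service_region, "zh-cn", ["en", "ja"]))`, followed by `translations.get("ja")`.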
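Building the speech translator
The program's overall logic: listen for one utterance, recognize it and translate it, print the original text and the English translation, then read the translation aloud; repeat until the user says "退出" ("quit").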
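That loop can be written with the Azure calls passed in as plain functions, which keeps the control flow separate from the SDK and lets it be tried without a microphone. A sketch in which run_translator, listen and say are hypothetical names; in the full code below, listen corresponds to speakAndTranslation and say to speak:

```python
def run_translator(listen, say, quit_word="退出", show=print):
    """Listen -> print -> speak the translation, until quit_word is heard.

    listen() must return (original_text, translation); either may be None.
    say(text) speaks the translation. Returns how many translations were spoken.
    """
    spoken = 0
    while True:
        text, translation = listen()
        # "说" = "Say", "译文" = "Translation", matching the article's output
        show("说:", text)
        show("译文:", translation)
        # Stop when the recognized text contains the quit word
        if text and quit_word in text:
            break
        # Skip speaking when recognition or translation failed
        if translation:
            say(translation)
            spoken += 1
    return spoken
```

Passing `lambda: next(utterances)` over a list of canned (text, translation) pairs is enough to walk through the loop without any Azure resources.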
Full code:
"""
Code by 小小明
CSDN homepage: https://blog.csdn.net/as604049322
"""
__author__ = '小小明'
__time__ = '2021/10/30'
import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioOutputConfig
# Key (masked) and region from the "Keys and Endpoint" page
speech_key, service_region = "59xxxxde", "chinaeast2"
# Speech synthesis: reads the translation through the default speaker
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region, speech_recognition_language="zh-cn")
# The text read aloud is the English translation, so use an English voice
speech_config.speech_synthesis_language = "en-US"
audio_config = AudioOutputConfig(use_default_speaker=True)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
# Speech translation: recognizes Chinese and translates it to English
from_language, to_language = 'zh-cn', 'en'
translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=speech_key, region=service_region, speech_recognition_language=from_language)
translation_config.add_target_language(to_language)
recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config)
def speakAndTranslation():
    """Listen for one utterance; return (original text, translation)."""
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        return result.text, result.translations[to_language]
    elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text, None
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print(result.no_match_details)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print(result.cancellation_details)
    # Return a pair even on failure so the caller can always unpack it
    return None, None
def speak(text_words):
    """Read text_words aloud and report any synthesis error."""
    result = speech_synthesizer.speak_text_async(text_words).get()
    if result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Synthesis canceled:", cancellation_details.reason)
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details:", cancellation_details.error_details)
while True:
    print("说:", end=" ")  # "说" = "Say"
    text, translation_text = speakAndTranslation()
    print(text)
    print("译文:", translation_text)  # "译文" = "Translation"
    # Saying "退出" ("quit") ends the program
    if text and "退出" in text:
        break
    # Only speak when there is a translation to read out
    if translation_text:
        speak(translation_text)
I gave it a quick run; the console output was:
说: 我只想进转过山和大海。
译文: I just want to go in and out of the mountains and the sea.
说: 也穿越,人山人海。
译文: Also through, the sea of people and mountains.
说: 我曾经目睹这一切全部都随风飘然。
译文: I've seen it all blow in the wind.
说: 转眼成空。
译文: It's empty.
说: 问,世间能有几多愁?
译文: Q, how much worry can there be in the world?
说: 退出。
译文: quit.
As for the spoken output, you'll have to try it for yourselves.