7.6 Microphone Management Section

7.6.1 Overview

By default, the robot can take interaction input from two types of microphone: external microphones (the depusheng handheld microphone and the flasher lapel microphone) and the internal microphones built into the robot. A silent mode is also available to disable interaction. Both the microphone source and silent mode can be set in the AimMaster software, and RPC interfaces are provided for the same purpose.

We also provide the raw microphone audio output (after noise reduction, echo cancellation, and VAD). Once Agibot's own interaction chain is disabled, this audio can be integrated into other interaction systems.

Internal microphone interaction logic (sound, face, lip shape, distance):

The robot's internal fan is relatively loud, so users should speak the wake-up word and the subsequent conversation as loudly as is practical.
The robot attends to the three largest faces in the center camera image and switches faces only when the face sizes change significantly.
During a conversation, lip movement is used to judge whether speech is ongoing, which helps reject interference and separate different speakers.
The recommended position is 0.5 m to 2 m in front of the robot. Very tall users (over 180 cm) who stand too close may fall outside the camera's field of view; the key requirement is that the face stays within it.

External microphones are directional and can be used by speaking directly into the microphone. There is no face recognition logic involved.

Additionally, the interaction supports secondary development. The agent can be set to different modes, allowing users to exit Agibot's cloud audio chain and output only the raw audio and face data for custom interaction agent development.

Silent mode is a sub-state of the normal mode and can be toggled at any time without restarting the agent.

7.6.2 RPC Interface for Switching Internal and External Microphones

Interface Name	`pb:/aimdk.protocol.AgentControlService/SetMicSourceRequest`
Function Summary	Switch between internal and external microphone sources
Interface Type	HTTP JSON RPC
URL	http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService/SetMicSourceRequest
Input Parameters	`{ "mic_source": 1 }` `mic_source`: 0 selects the internal microphone, 1 selects the external microphone; other values are invalid.
Output Parameters	`{ "header": { "code": "0", "msg": "SetVoiceEnable successfully", "trace_id": "", "domin": "" }, "state": "CommonState_UNKNOWN" }` `state`: can be ignored.
Example Script	examples/agent/SetMicSource.sh
Notes
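
As a rough illustration, the call above could be issued from Python using only the standard library. The helper names `build_rpc_request` and `set_mic_source` are hypothetical, and the use of HTTP POST with a JSON content type is an assumption; `examples/agent/SetMicSource.sh` remains the reference.

```python
import json
import urllib.request

# Base URL taken from the interface table above.
RPC_BASE = "http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService"

MIC_INTERNAL = 0  # built-in microphones
MIC_EXTERNAL = 1  # handheld or lapel microphone


def build_rpc_request(method: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one RPC method.

    Assumption: the service accepts HTTP POST with a JSON body.
    """
    return urllib.request.Request(
        f"{RPC_BASE}/{method}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_mic_source(mic_source: int, timeout: float = 5.0) -> dict:
    """Switch the microphone source and return the decoded JSON reply."""
    if mic_source not in (MIC_INTERNAL, MIC_EXTERNAL):
        raise ValueError("mic_source must be 0 (internal) or 1 (external)")
    request = build_rpc_request("SetMicSourceRequest", {"mic_source": mic_source})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Rejecting values other than 0 and 1 on the client side mirrors the "other values are invalid" rule and avoids a round trip to the robot.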

7.6.3 RPC Interface for Getting the Current Microphone Source

Interface Name	`pb:/aimdk.protocol.AgentControlService/GetMicSourceRequest`
Function Summary	Get the current microphone source
Interface Type	HTTP JSON RPC
URL	http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService/GetMicSourceRequest
Input Parameters	`{}`
Output Parameters	`{ "header": { "code": "0", "msg": "Get mic source successfully", "trace_id": "", "domin": "" }, "mic_source": 0 }` `mic_source`: 0 means the internal microphone, 1 means the external microphone; other values are invalid.
Example Script	examples/agent/GetMicSource.sh
Notes

7.6.4 RPC Interface for Setting Silent Mode

Interface Name	`pb:/aimdk.protocol.AgentControlService/SetVoiceEnable`
Function Summary	Set silent mode
Interface Type	HTTP JSON RPC
URL	http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService/SetVoiceEnable
Input Parameters	`{ "enable_voice": false }` `enable_voice`: false enables silent mode; true restores normal mode.
Output Parameters	`{ "header": { "code": "0", "msg": "SetVoiceEnable successfully", "trace_id": "", "domin": "" }, "state": "CommonState_UNKNOWN" }` `state`: can be ignored.
Example Script	examples/agent/SetVoiceEnable.sh
Notes

7.6.5 Query Silent Mode RPC Interface

Interface Name	`pb:/aimdk.protocol.AgentControlService/GetVoiceEnable`
Function Summary	Query the silent mode status
Interface Type	HTTP JSON RPC
URL	http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService/GetVoiceEnable
Input Parameters	`{}`
Output Parameters	`{ "header": { "code": "0", "msg": "GetVoiceEnable successfully", "trace_id": "", "domin": "" }, "enable_voice": true }` `enable_voice`: false means silent mode is active; true means normal mode.
Example Script	examples/agent/GetVoiceEnable.sh
Notes
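
Because `enable_voice` is the inverse of "silent", it is easy to misread a reply. The following minimal sketch interprets an already decoded GetVoiceEnable reply; the function name `is_silent` is hypothetical, and treating `code` "0" as success is an assumption carried over from the SetCustomWakeUpWord description.

```python
def is_silent(reply: dict) -> bool:
    """Return True when a decoded GetVoiceEnable reply reports silent mode."""
    header = reply.get("header", {})
    # Assumption: code "0" signals success for this interface as well.
    if str(header.get("code")) != "0":
        raise RuntimeError(f"GetVoiceEnable failed: {header.get('msg')}")
    # enable_voice == False means silent mode is active.
    return not reply["enable_voice"]
```

With the sample reply shown above (`"enable_voice": true`), `is_silent` returns `False`.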

7.6.6 Set Interaction Mode RPC Interface

Interface Name	`pb:/aimdk.protocol.AgentControlService/SetAgentPropertiesRequest`
Function Summary	Set interaction mode
Interface Type	HTTP JSON RPC
URL	http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService/SetAgentPropertiesRequest
Input Parameters	`{ "contents": { "properties": { "2": "only_voice" } } }` Available modes: `only_voice` outputs only the noise-reduced microphone audio on `/agent/process_audio_output`, with all downstream links disconnected; `voice_face` outputs the noise-reduced microphone audio on `/agent/process_audio_output` and the face recognition results on `/agent/vision/face_id`, with all downstream links disconnected; `normal` is the normal operation mode, in which interaction runs as usual.
Output Parameters	`{ "state": "CommonState_UNKNOWN" }`
Example Script	examples/agent/SetAgentPropertiesRequest.sh
Notes	The agent or the robot must be restarted after calling this interface for the change to take effect. A return value of `CommonState_UNKNOWN` is normal; call the GetAgentPropertiesRequest interface to check whether the interaction mode was switched successfully.

7.6.7 Get Interaction Mode RPC Interface

Interface Name	`pb:/aimdk.protocol.AgentControlService/GetAgentPropertiesRequest`
Function Summary	Query the interaction mode
Interface Type	HTTP JSON RPC
URL	http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService/GetAgentPropertiesRequest
Input Parameters	`{}`
Output Parameters	`{ "contents": { "properties": { "2": "only_voice" } } }` Possible modes: `only_voice` outputs only the noise-reduced microphone audio on `/agent/process_audio_output`, with all downstream links disconnected; `voice_face` outputs the noise-reduced microphone audio on `/agent/process_audio_output` and the face recognition results on `/agent/vision/face_id`, with all downstream links disconnected; `normal` is the normal operation mode, in which interaction runs as usual.
Example Script	examples/agent/GetAgentPropertiesRequest.sh
Notes
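
The set and get interfaces share the same `contents.properties` structure, so a small pair of helpers can build the request body and read the mode back. This is only a sketch: the function names are hypothetical, and the property key "2" is taken from the examples above without any knowledge of what other keys may exist.

```python
VALID_MODES = ("only_voice", "voice_face", "normal")
MODE_PROPERTY_KEY = "2"  # key used for the interaction mode in the examples above


def mode_payload(mode: str) -> dict:
    """Build the SetAgentPropertiesRequest body for one interaction mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown interaction mode: {mode!r}")
    return {"contents": {"properties": {MODE_PROPERTY_KEY: mode}}}


def mode_from_reply(reply: dict) -> str:
    """Extract the interaction mode from a decoded GetAgentPropertiesRequest reply."""
    return reply["contents"]["properties"][MODE_PROPERTY_KEY]
```

Since the set call returns `CommonState_UNKNOWN` either way, comparing `mode_from_reply(...)` against the requested mode after the restart is the practical way to confirm the switch.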

7.6.8 Noise-Reduced Microphone Audio Topic Interface

Interface Name	`/agent/process_audio_output`
Function Summary	Noise-reduced microphone audio interface
Interface Type	ROS2 Topic
Output Parameters	`{ "stream_id": 2, "vad_state": "AUDIO_VAD_STATE_PROCESSING", "audio_data": "..." }` `stream_id`: microphone identifier, 1 for the built-in microphone and 2 for the external microphone. `vad_state`: voice activity detection state, one of `AUDIO_VAD_STATE_NONE` = 0, `AUDIO_VAD_STATE_BEGIN` = 1, `AUDIO_VAD_STATE_PROCESSING` = 2, `AUDIO_VAD_STATE_END` = 3. `audio_data`: audio byte stream.
Example Script	examples/agent/get_voice.py
Notes	The ROS2 message type is `ros2_plugin_proto/msg/RosMsgWrapper`; source `prebuilt/ros2_plugin_proto_aarch64/share/ros2_plugin_proto/local_setup.bash` before use. Note: to obtain this audio, the robot must stay connected to the internet for at least 2 minutes after startup so that the audio-related authentication can complete; otherwise no audio is output. For offline use, confirm that the interface is producing audio before disconnecting from the network. Note: in the current version, the `vad_state` reported for the external microphone is incorrect. For a single voice input the expected state sequence is `122222222223`, but the actual output is `0111111111112`. The issue affects only external microphones (internal microphones work correctly) and is scheduled to be fixed in a future release. As a workaround in the current version, add 1 to each state value received from the external microphone.
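
The +1 workaround for the external microphone could be sketched as follows. The names `VadState` and `corrected_vad_state` are hypothetical, the function assumes the state has already been decoded to its numeric value, and clamping at `END` is a defensive assumption rather than documented behavior.

```python
from enum import IntEnum


class VadState(IntEnum):
    """Numeric values as listed for vad_state above."""
    NONE = 0
    BEGIN = 1
    PROCESSING = 2
    END = 3


EXTERNAL_MIC_STREAM_ID = 2  # stream_id 2 is the external microphone


def corrected_vad_state(stream_id: int, raw_state: int) -> VadState:
    """Apply the +1 compensation to external-microphone states only."""
    if stream_id == EXTERNAL_MIC_STREAM_ID:
        # Assumption: clamp at END so an unexpected raw 3 does not overflow.
        return VadState(min(raw_state + 1, int(VadState.END)))
    return VadState(raw_state)
```

Keeping the compensation in one function makes it straightforward to remove once the fix ships; internal-microphone states pass through unchanged.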

7.6.9 Local Face Registration Interface

This interface is neither an HTTP JSON RPC nor a ROS2 topic. Instead, it is provided as a standalone script, examples/agent/run_face_id_register.sh, whose content is as follows:

#!/bin/bash

# 1. The 'images' directory to register (at the same level as the shell script)
RUN_SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
IMAGES_DIR="${RUN_SCRIPT_DIR}/images"

# 2. Faceid base directory
FACEID_SCRIPT_DIR="/agibot/software/v0/scripts/agent/face_id/"
FACEID_LIB_DIR="/agibot/software/v0/bin"
FACEID_OFFLINE_FEAT="/agibot/data/param/interaction/face_id/offline_face_features"

# 3. Relative paths to the executable and configuration files
EXEC="${FACEID_SCRIPT_DIR}/face_id_register"
CONF="${FACEID_SCRIPT_DIR}/face_id_config.json"

chmod +x "$EXEC"
export LD_LIBRARY_PATH="${FACEID_LIB_DIR}":$LD_LIBRARY_PATH

# 4. Invocation
rm -rf "$FACEID_OFFLINE_FEAT"/*
"$EXEC" "$CONF" "$IMAGES_DIR"

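Place the face images to be registered in the images directory next to the script, then execute the script on ORIN to complete the registration. Afterwards, the mapping between IDs and images, together with each registration result, is stored in the Result.txt file in the same directory. An example is shown below. Blurry and small faces are registered successfully in this example, but clear frontal face images like 满足.png ("satisfactory") are still recommended to avoid degrading recognition rates. In the result column, 注册成功 means "registration successful", 人脸质量不满足要求 means "face quality does not meet the requirements", and 未检测到人脸 means "no face detected". The file names describe each test image: 侧脸 "side face", 过暗 "too dark", 模糊 "blurry", 过曝 "overexposed", 无人脸 "no face", 非人脸 "not a face", 人脸过小 "face too small".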

GID17648293009168001    满足.png          OK    注册成功
GID17648293018063607    侧脸.png          FAIL  人脸质量不满足要求
GID17648293020281011    过暗.png          FAIL  人脸质量不满足要求
GID17648293021878934    模糊.png          OK    注册成功
GID17648293024764703    过曝.png          FAIL  人脸质量不满足要求
GID17648293026684491    无人脸.png       FAIL  未检测到人脸
GID17648293028768487    非人脸.png       FAIL  未检测到人脸
GID17648293030305970    人脸过小.png    OK    注册成功
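
For scripted checks after registration, the Result.txt lines can be split into their four columns. This is a minimal sketch that assumes whitespace-separated columns and image file names without spaces; `RegistrationResult`, `parse_result_line`, and `parse_results` are hypothetical names.

```python
from typing import List, NamedTuple


class RegistrationResult(NamedTuple):
    user_id: str   # e.g. "GID" + timestamp + random 4-digit number
    image: str     # image file name
    ok: bool       # True when the status column is "OK"
    message: str   # result message as written by the tool


def parse_result_line(line: str) -> RegistrationResult:
    """Parse one Result.txt line into its four columns."""
    # Split on runs of whitespace, keeping the message (last column) whole.
    user_id, image, status, message = line.split(None, 3)
    return RegistrationResult(user_id, image, status == "OK", message.strip())


def parse_results(text: str) -> List[RegistrationResult]:
    """Parse the whole file content, skipping blank lines."""
    return [parse_result_line(line) for line in text.splitlines() if line.strip()]
```

Filtering the parsed list on `ok` gives a quick way to see which images need to be replaced before re-running the registration.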

Explanation of the face registration and recognition logic rules:

Place face images in JPG, PNG, or JPEG format in the images directory, with exactly one clear frontal face per image, then run the script to register them. After registration, restart the agent: on ORIN, run `aima em stop-app agent && aima em start-app agent`, or simply restart the robot.
The locally registered face features are stored in the /agibot/data/param/interaction/face_id/offline_face_features directory on ORIN.
A registered user ID is constructed as "GID" + timestamp + a random 4-digit number. At publication time, the current machine's SN (/agibot/data/info/sn) replaces "GID" to form the new UID.
Faces can also be uploaded through the Lingxin platform; this is referred to as the cloud-based face database. The cloud-based face database can be configured with greeting information and similar content. Once distributed to the robot, this content is stored in the /agibot/data/param/interaction/face_id/user_info.json file.
Each run of the registration script clears the existing local database, so always register the complete set of faces: maintain an images folder containing every face that needs to be recognized, and re-run the script after any addition, deletion, or modification.
Matching always checks the cloud database before the local database and stops at the first successful match.

7.6.10 Face Recognition Result Topic Interface

Interface Name	`/agent/vision/face_id`
Function Summary	Face recognition results
Interface Type	ROS2 Topic
Output Parameters	`{ "faces": [ { "timestamp": "1764829536028", "face_id": "A210041B50001917648293023663264", "confidence": 0.9811308, "face_rect": { "x": 326.0, "y": 961.0, "width": 105.0, "height": 119.0 }, "captured_feature_base64": "...", "reference_feature_base64": "..." } ] }` `timestamp`: timestamp of the interactive camera frame; subscribe to the `/aima/hal/camera/interactive/color` topic to find the corresponding image by this timestamp. `face_id`: for example `A210041B50001917648293023663264`, where the first 14 characters are the SN and the remainder is the face ID from the database. `confidence`: face match confidence, a value between 0 and 1. `face_rect`: face rectangle, which can be used to draw the face position in the camera image. `captured_feature_base64`: base64-encoded feature of the captured face, generally not needed. `reference_feature_base64`: base64-encoded feature of the reference face, generally not needed.
Example Script	examples/agent/get_face_id.py
Notes	The ROS2 message type for this interface is `ros2_plugin_proto/msg/RosMsgWrapper`. You need to source `prebuilt/ros2_plugin_proto_aarch64/share/ros2_plugin_proto/local_setup.bash` to use it. This interface will only publish if the interaction mode is set to `normal` or `voice_face`. No messages will be published in `only_voice` mode.
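
Once a message has been decoded into the JSON structure shown above (decoding the `RosMsgWrapper` itself is not covered here), the fields can be post-processed as in this sketch. The helper names are hypothetical, and the 0.8 confidence threshold is purely illustrative, not a documented value.

```python
from typing import Optional, Tuple

SN_LENGTH = 14  # the face_id description above gives 14 characters for the SN


def split_face_id(face_id: str) -> Tuple[str, str]:
    """Split a published face_id into (SN, database face ID)."""
    if len(face_id) <= SN_LENGTH:
        raise ValueError("face_id is too short to contain an SN prefix")
    return face_id[:SN_LENGTH], face_id[SN_LENGTH:]


def best_face(message: dict, min_confidence: float = 0.8) -> Optional[dict]:
    """Return the most confident face at or above the threshold, or None."""
    candidates = [
        face for face in message.get("faces", [])
        if face["confidence"] >= min_confidence
    ]
    return max(candidates, key=lambda face: face["confidence"], default=None)
```

For the sample `face_id` above, `split_face_id` yields the SN `A210041B500019` and the database ID `17648293023663264`.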

7.6.11 Wake-Up Result Reporting

Interface Name	`/agent/wakeup/pb_3Aaimdk_2Eprotocol_2EWakeUpResult`
Function Summary	Wake-up result reporting
Interface Type	ROS2 Topic
Output Parameters	`{ "language":"zh", "keyword":"远征远征", "timestamp":"1768812507657", "confidence":1.0, "wakeup_id":"event_wMXK9pT05JMXeRAOSc706", "is_success":true, "wakeup_type":"WAKEUP_NORMAL" }` `confidence`: voice match confidence, a value between 0 and 1. `keyword`: wake-up word. `language`: currently only Chinese is supported. `wakeup_type`: one of `WAKEUP_UNKNOWN` (wake-up with unknown source or state; placeholder or exception case), `WAKEUP_NORMAL` (normal voice wake-up, for example when a user says the default wake-up word), `WAKEUP_FACE_TRIGGERED` (wake-up triggered by face recognition), `WAKEUP_CUSTOM_TRIGGERED` (wake-up triggered by custom methods, such as custom wake words or external events).
Example Script	examples/agent/get_wakeup_result.py
Notes	The ROS2 message type is `ros2_plugin_proto/msg/RosMsgWrapper`. Run `source prebuilt/ros2_plugin_proto_aarch64/share/ros2_plugin_proto/local_setup.bash` before use.

7.6.12 Built-In Microphone Wake Word Configuration

Interface Name	`pb:/aimdk.protocol.AgentControlService/SetCustomWakeUpWord`
Function Summary	Set wake-up word for the built-in microphone.
Interface Type	HTTP JSON RPC
URL	http://192.168.100.110:59301/rpc/aimdk.protocol.AgentControlService/SetCustomWakeUpWord
Input Parameters	`{ "keywords":["你好你好"] }` `keywords`: the wake-up word.
Output Parameters	`{ "header": { "code": "0", "msg": "SetCustomWakeUpWord successfully", "trace_id": "", "domin": "" }, "state": "CommonState_UNKNOWN" }` `state=CommonState_UNKNOWN` is normal behavior. `code=0` means success; `code=1` means failure.
Example Script	examples/agent/SetCustomWakeUpWord.sh
Notes	Avoid binding an intelligent agent; otherwise the setting may not take effect after restarting `agent`. Only one wake-up word is supported.
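
Because only one wake-up word is supported and success is signalled through `code` rather than `state`, a client might wrap both rules as in the sketch below. The function names are hypothetical; sending the request itself is left to `examples/agent/SetCustomWakeUpWord.sh` or an equivalent HTTP call.

```python
def wake_word_payload(word: str) -> dict:
    """Build the SetCustomWakeUpWord body for exactly one wake-up word."""
    word = word.strip()
    if not word:
        raise ValueError("the wake-up word must not be empty")
    # Only one wake-up word is supported, so the list always has one entry.
    return {"keywords": [word]}


def wake_word_set_ok(reply: dict) -> bool:
    """Return True when the decoded reply reports success (code 0)."""
    # state is CommonState_UNKNOWN even on success, so only code is checked.
    return str(reply.get("header", {}).get("code")) == "0"
```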