By default, the robot accepts interaction input from two kinds of microphone: external microphones (the depusheng handheld microphone and the flasher lapel microphone) and internal microphones (the robot’s built-in microphones). A silent mode can also be set to disable interaction. Both the microphone source and silent mode can be set in the AimMaster software, and RPC interfaces are provided for the same two settings.
We also provide the microphone’s raw audio output (already processed with noise reduction, echo cancellation, and VAD). Once Agibot’s own interaction chain is disabled, this output can be used to obtain the robot’s microphone audio and integrate it into other interaction systems.
The robot’s internal fan is relatively loud, so users should speak the wake-up word and their subsequent speech as loudly as practical.
The robot attends mainly to the three largest faces in the center camera’s view. It switches to a different face only when the face sizes change significantly.
During a conversation, lip movement is used to determine whether the conversation is still ongoing, which helps reject interference and separate different speakers.
The recommended distance is 0.5 m to 2 m in front of the robot. Very tall users (over 180 cm) who stand too close may have their faces outside the camera’s field of view. Above all, keep the face within the camera’s view.
External microphones are directional and can be used by speaking directly into the microphone. There is no face recognition logic involved.
The interaction system also supports secondary development. The agent can be set to different modes that bypass Agibot’s cloud audio chain and output only the raw audio and face data, so that users can build a custom interaction agent.
Silent mode is a state within normal mode and can be toggled without restarting the agent.
7.6.2 RPC Interface for Switching Internal and External Microphones
only_voice: Output only the noise-reduced microphone audio on /agent/process_audio_output; all downstream stages are disconnected
voice_face: Output the noise-reduced microphone audio on /agent/process_audio_output and the face recognition results on /agent/vision/face_id; all downstream stages are disconnected
normal: Normal operation mode, interaction runs normally
Output Parameters
{
"state": "CommonState_UNKNOWN"
}
Example Script
examples/agent/SetAgentPropertiesRequest.sh
Notes
The agent or the robot must be restarted after calling this interface for the change to take effect.
A return value of CommonState_UNKNOWN is normal; call the GetAgentPropertiesRequest interface to check whether the interaction mode was switched successfully.
only_voice: Output only the noise-reduced microphone audio on /agent/process_audio_output; all downstream stages are disconnected
voice_face: Output the noise-reduced microphone audio on /agent/process_audio_output and the face recognition results on /agent/vision/face_id; all downstream stages are disconnected
normal: Normal operation mode, interaction runs normally
stream_id: Microphone identifier, 1 for built-in mic, 2 for external mic
vad_state: Voice activity detection state
AUDIO_VAD_STATE_NONE = 0
AUDIO_VAD_STATE_BEGIN = 1
AUDIO_VAD_STATE_PROCESSING = 2
AUDIO_VAD_STATE_END = 3
audio_data: Audio byte stream data
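The vad_state values above mark where an utterance starts and stops, so a consumer can cut the audio stream into utterances. The following is a minimal sketch, not the SDK's implementation: it assumes the fields have already been decoded from the message into (vad_state, audio_data) pairs, and that the END frame still carries audio. The names AudioVadState and split_utterances are hypothetical.

```python
from enum import IntEnum
from typing import Iterable, Iterator, Tuple


class AudioVadState(IntEnum):
    """VAD states, with the numeric values listed in the field description."""
    NONE = 0
    BEGIN = 1
    PROCESSING = 2
    END = 3


def split_utterances(frames: Iterable[Tuple[int, bytes]]) -> Iterator[bytes]:
    """Yield the concatenated audio of each BEGIN..END run.

    `frames` is an iterable of (vad_state, audio_data) pairs that the caller
    has already extracted from the received messages (assumption).
    """
    buffer = bytearray()
    in_speech = False
    for vad_state, audio_data in frames:
        state = AudioVadState(vad_state)
        if state is AudioVadState.BEGIN:
            # Start a new utterance, discarding any unfinished one.
            buffer = bytearray(audio_data)
            in_speech = True
        elif state is AudioVadState.PROCESSING and in_speech:
            buffer.extend(audio_data)
        elif state is AudioVadState.END and in_speech:
            # Assumption: the END frame still contains audio worth keeping.
            buffer.extend(audio_data)
            yield bytes(buffer)
            buffer = bytearray()
            in_speech = False
        # NONE frames, and frames outside an utterance, are dropped.
```

Each yielded bytes object can then be handed to a custom speech recognizer. Frames with different stream_id values would need to be separated before calling this function.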
Example Script
examples/agent/get_voice.py
Notes
The ROS2 message type is ros2_plugin_proto/msg/RosMsgWrapper, which requires sourcing prebuilt/ros2_plugin_proto_aarch64/share/ros2_plugin_proto/local_setup.bash before use.
Note: To obtain the following audio, the robot must be connected to the internet for at least 2 minutes after startup to complete the audio-related authentication process. Otherwise, there will be no raw audio output. If offline use is required, ensure that the interface has audio output before disconnecting from the network.
Face registration is not exposed as a conventional HTTP JSON RPC or ROS2 topic. Instead, it is performed by running a separate script, examples/agent/run_face_id_register.sh. The content of the script is as follows:
#!/bin/bash
# 1. The 'images' directory to register (at the same level as the shell script)
Place the face images to be registered in the images directory next to the script, then run the script on ORIN to complete the registration. Afterwards, the mapping between IDs and images, together with each registration result, is stored in the Result.txt file in the same directory. An example is shown below (the blurry and small faces are also registered successfully, but clear frontal face images such as satisfy.png are still recommended to avoid hurting the recognition rate):
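GID17648293009168001  satisfy.png         OK    Registration successful
GID17648293018063607  side_face.png       FAIL  Face quality does not meet requirements
GID17648293020281011  too_dark.png        FAIL  Face quality does not meet requirements
GID17648293021878934  blurry.png          OK    Registration successful
GID17648293024764703  overexposed.png     FAIL  Face quality does not meet requirements
GID17648293026684491  no_face.png         FAIL  No face detected
GID17648293028768487  not_a_face.png      FAIL  No face detected
GID17648293030305970  face_too_small.png  OK    Registration successful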
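To check a registration run programmatically, the result lines can be parsed into fields. The sketch below is illustrative only: the exact column separator used in Result.txt is an assumption (the pattern tolerates whitespace or no separator at all), and the names parse_result_line and failed_images are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout per line: ID, image file name, OK/FAIL, free-text message.
# \s* lets the pattern accept whitespace-separated or fused columns. With
# fused columns, a file name that starts with a digit would be ambiguous.
_LINE = re.compile(
    r"^(?P<uid>GID\d+)\s*"
    r"(?P<image>.+?\.(?i:png|jpe?g))\s*"
    r"(?P<status>OK|FAIL)\s*"
    r"(?P<message>.*)$"
)


class RegistrationResult(NamedTuple):
    uid: str
    image: str
    ok: bool
    message: str


def parse_result_line(line: str) -> Optional[RegistrationResult]:
    """Parse one line of Result.txt; return None if it does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return RegistrationResult(
        uid=match.group("uid"),
        image=match.group("image"),
        ok=match.group("status") == "OK",
        message=match.group("message"),
    )


def failed_images(lines: Iterable[str]) -> List[str]:
    """Return the file names of images whose registration failed."""
    results = (parse_result_line(line) for line in lines)
    return [r.image for r in results if r is not None and not r.ok]
```

Because each run clears the local database, a failed image is simply absent from it; replacing that image and re-running the script over the full images directory is the way to retry.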
Explanation of the face registration and recognition logic rules:
Place face images in JPG, JPEG, or PNG format in the images directory, with each image containing exactly one clear frontal face. Running the script registers the faces. After registration, restart the agent: on ORIN, run aima em stop-app agent && aima em start-app agent, or simply restart the robot.
The locally registered face features will be stored in the /agibot/data/param/interaction/face_id/offline_face_features directory on ORIN.
A registered user ID is constructed as “GID” + timestamp + a random 4-digit number. When a recognition result is published, the current machine’s SN (/agibot/data/info/sn) replaces “GID” to form the new UID.
Faces can also be uploaded through the Lingxin platform; we call this the cloud-based face database. The cloud-based face database can be configured with greeting information and similar content. Once this content has been delivered to the robot, it is stored in the /agibot/data/param/interaction/face_id/user_info.json file.
Each run of the registration script clears the existing local database. Always re-register the complete set of face data, i.e., maintain an images folder containing every face that needs to be recognized. Any addition, deletion, or modification requires re-running the registration script.
Matching always checks the cloud database before the local database, and stops at the first successful match.
timestamp: Timestamp, using the interactive camera frame timestamp. You can subscribe to the /aima/hal/camera/interactive/color topic to find the corresponding image based on this timestamp.
face_id: For example, A210041B50001917648293023663264, where the first 14 characters are the SN, followed by the database face ID part.
confidence: Face match confidence, a value between 0 and 1.
face_rect: Face rectangle, which can be used to draw the face position in the camera image.
captured_feature_base64: Base64-encoded feature of the captured face; generally not needed.
reference_feature_base64: Base64-encoded feature of the matched reference face; generally not needed.
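Since face_id combines the SN with the database face ID, it can be split again to look a match up in the registration results. This is a minimal sketch under two assumptions taken from the descriptions above: the SN is always the first 14 characters, and for locally registered faces the remainder is the ID from Result.txt without its “GID” prefix. The helper names are hypothetical.

```python
from typing import NamedTuple

SN_LENGTH = 14  # per the face_id description: the first 14 characters are the SN


class FaceId(NamedTuple):
    sn: str
    db_id: str


def split_face_id(face_id: str) -> FaceId:
    """Split a published face_id into (SN, database face ID)."""
    if len(face_id) <= SN_LENGTH:
        raise ValueError(f"face_id too short: {face_id!r}")
    return FaceId(sn=face_id[:SN_LENGTH], db_id=face_id[SN_LENGTH:])


def to_registration_id(face_id: str) -> str:
    """Rebuild the "GID..." form of the ID (assumes a locally registered face)."""
    return "GID" + split_face_id(face_id).db_id


parts = split_face_id("A210041B50001917648293023663264")
print(parts.sn)     # A210041B500019
print(parts.db_id)  # 17648293023663264
```

Faces matched from the cloud database may use a different ID scheme, so to_registration_id should only be applied to IDs known to come from the local database.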
Example Script
examples/agent/get_face_id.py
Notes
The ROS2 message type for this interface is `ros2_plugin_proto/msg/RosMsgWrapper`. You need to source `prebuilt/ros2_plugin_proto_aarch64/share/ros2_plugin_proto/local_setup.bash` to use it.
This interface will only publish if the interaction mode is set to `normal` or `voice_face`. No messages will be published in `only_voice` mode.
confidence: Voice match confidence, a value between 0 and 1.
keyword: Wake-up word.
language: Currently only Chinese is supported.
wakeup_type:
WAKEUP_UNKNOWN: Wake-up with unknown source or state (placeholder/exception case).
WAKEUP_NORMAL: Normal voice wake-up, for example when a user says the default wake-up word.
WAKEUP_FACE_TRIGGERED: Wake-up triggered by face recognition.
WAKEUP_CUSTOM_TRIGGERED: Wake-up triggered by custom methods, such as custom wake words or external events.
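A custom agent will usually want to decide which wake-up events to act on. The sketch below shows one possible policy, not Agibot's own logic: the WakeupResult container, the should_accept function, and the 0.5 threshold are all hypothetical, and the enum uses the names listed above because their numeric values are not given here.

```python
from dataclasses import dataclass
from enum import Enum


class WakeupType(Enum):
    """Wake-up sources, keyed by the names in the field description."""
    WAKEUP_UNKNOWN = "WAKEUP_UNKNOWN"
    WAKEUP_NORMAL = "WAKEUP_NORMAL"
    WAKEUP_FACE_TRIGGERED = "WAKEUP_FACE_TRIGGERED"
    WAKEUP_CUSTOM_TRIGGERED = "WAKEUP_CUSTOM_TRIGGERED"


@dataclass(frozen=True)
class WakeupResult:
    """Hypothetical container for the decoded fields of one wake-up message."""
    confidence: float  # voice match confidence, between 0 and 1
    keyword: str
    language: str
    wakeup_type: WakeupType


def should_accept(result: WakeupResult, min_confidence: float = 0.5) -> bool:
    """Example policy: drop unknown sources, threshold voice wake-ups."""
    if result.wakeup_type is WakeupType.WAKEUP_UNKNOWN:
        return False
    if result.wakeup_type is WakeupType.WAKEUP_NORMAL:
        # The 0.5 default is an illustrative value, not a documented one.
        return result.confidence >= min_confidence
    # Face-triggered and custom-triggered wake-ups are accepted as-is here.
    return True
```

The threshold should be tuned against real recordings, particularly given the fan noise on the built-in microphones.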
Example Script
examples/agent/get_wakeup_result.py
Notes
The ROS2 message type is ros2_plugin_proto/msg/RosMsgWrapper. Run source prebuilt/ros2_plugin_proto_aarch64/share/ros2_plugin_proto/local_setup.bash before use.
7.6.12 Built-In Microphone Wake Word Configuration