diff --git a/.gitignore b/.gitignore
index 454d569..9d5f617 100644
--- a/.gitignore
+++ b/.gitignore
@@ -21,6 +21,9 @@ literature/
 # But not the necessary ones
 !docs/figures/**/*
 
+# Videos
+video/
+
 # Presentation stuff
 *.pptx
diff --git a/README.md b/README.md
index 5369275..de1d08d 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,9 @@ marker on the chest.
 You can move a NAO around using a "Human Joystick" approach and make NAO
 imitate your arm motions. For more details, read our
-![report](docs/report.latex).
+[report](docs/report.latex) or watch our
+[video](https://drive.google.com/file/d/1h1MZO6kPd4VyjMptRxQcwrqtgqMW1eyE/view).
 
 Our package relies on the NAO being reachable from the computer and the
 environment variable `NAO_IP` being set to the IP address of the NAO.
diff --git a/docs/figures/operator_frames.png b/docs/figures/operator_frames.png
new file mode 100644
index 0000000..5cf3b84
Binary files /dev/null and b/docs/figures/operator_frames.png differ
diff --git a/docs/figures/robot_torso.png b/docs/figures/robot_torso.png
new file mode 100644
index 0000000..2c29e47
Binary files /dev/null and b/docs/figures/robot_torso.png differ
diff --git a/docs/report.latex b/docs/report.latex
index 886b677..53141d2 100644
--- a/docs/report.latex
+++ b/docs/report.latex
@@ -211,14 +211,39 @@ and the superscript defines the coordinate frame in which these coordinates
 are taken. So, for example, $r_{NAO hand}^{NAO torso}$ gives the coordinate of
 the hand of the NAO robot in the frame of the robot's torso.
 
+\begin{figure}
+  \centering
+  \begin{subfigure}[b]{0.45\linewidth}
+    \includegraphics[width=\linewidth]{figures/operator_frames.png}
+    \caption{Operator's chest and shoulder frames}
+    \label{fig:operator-frames}
+  \end{subfigure}
+  \begin{subfigure}[b]{0.45\linewidth}
+    \includegraphics[width=\linewidth]{figures/robot_torso.png}
+    \caption{NAO's torso frame}
+    \label{fig:nao-frames}
+  \end{subfigure}
+  \caption{Coordinate frames}
+  \label{fig:coord-frames}
+\end{figure}
+
 After the ArUco markers are detected and published on ROS TF, as was described
 in \autoref{ssec:vision}, we have the three vectors $r_{aruco,chest}^{webcam}$,
 $r_{aruco,lefthand}^{webcam}$ and $r_{aruco,righthand}^{webcam}$. We describe
 the retargeting for one hand, since it is symmetrical for the other hand. We
-also assume that all coordinate systems have the same orientation, with the
-z-axis pointing upwards, the x-axis pointing straight into webcam and the
-y-axis to the left of the webcam. Therefore, we can directly calculate the hand
-position in the user chest frame by the means of the following equation:
+also assume that the user's coordinate systems all have the same orientation,
+with the z-axis pointing upwards, the x-axis pointing straight into the webcam
+and the y-axis to the left of the webcam\footnote{This assumption holds
+  because in imitation mode the user always faces the camera directly and
+  stands up straight. We need it for robustness against the orientation of the
+  chest marker, since it can accidentally get tilted. If we bound the
+  coordinate system to the chest marker completely, we would need to place the
+  marker on the chest firmly and carefully, which is time consuming.}.
+Therefore, we can directly calculate the hand position in the user's chest
+frame by means of the following equation:
 $$r_{hand,user}^{chest,user} = r_{aruco,hand}^{webcam} - r_{aruco,chest}^{webcam}$$.
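+
+In ROS terms, this computation amounts to two TF lookups followed by an
+element-wise subtraction. The Python sketch below illustrates it for the left
+hand; the frame names \texttt{webcam}, \texttt{chest\_aruco} and
+\texttt{lefthand\_aruco} are illustrative placeholders rather than the exact
+names used in our package:
+
+\begin{verbatim}
+import rospy
+import tf
+
+rospy.init_node("retargeting_sketch")
+listener = tf.TransformListener()
+rate = rospy.Rate(10)
+while not rospy.is_shutdown():
+    try:
+        # r_{aruco,chest}^{webcam} and r_{aruco,hand}^{webcam}
+        chest, _ = listener.lookupTransform(
+            "webcam", "chest_aruco", rospy.Time(0))
+        hand, _ = listener.lookupTransform(
+            "webcam", "lefthand_aruco", rospy.Time(0))
+        # r_{hand,user}^{chest,user}: element-wise difference
+        r_hand_chest = [h - c for h, c in zip(hand, chest)]
+    except (tf.LookupException, tf.ConnectivityException,
+            tf.ExtrapolationException):
+        pass
+    rate.sleep()
+\end{verbatim}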
@@ -319,7 +344,107 @@ only inform the master about the occurrence of certain events, such as the
 fall or fall recovery, so that the master could deny requests for any
 activities, until the fall recovery is complete.
 
-\section{Drawbacks and conclusions}
+We will now illustrate our architecture using the interaction between the
+walker node and the master node as an example. This interaction is depicted
+in \autoref{fig:integration-example}. The walker node subscribes to the TF
+transform of the chest ArUco marker and requests a position update every
+0.1 seconds. If in the current cycle the marker happens to be outside of the
+buffer zone (see \autoref{fig:joystick}), or the rotation of the marker
+exceeds the motion threshold, the walker node will ask the master node for
+permission to start moving. The master node will receive the request, and if
+the current state of the system is either \textit{walking} or \textit{idle}
+(see \autoref{fig:overview}), the permission will be granted and the system
+will transition into the \textit{walking} state. If the robot is currently
+imitating the arm motions or has not yet recovered from a fall, the
+permission will not be granted and the system will remain in its current
+state\footnote{We did investigate the possibility of automatically switching
+  between walking and imitating, so that the robot always imitates while the
+  operator is within the buffer zone and stops imitating as soon as the
+  operator leaves it, but this approach requires more skill and concentration
+  from the operator, so the default setting is to explicitly ask the robot to
+  go into the imitating state and back to idle.}.
+The walker node will then receive the master's response, and if it was
+negative, any current movement will be stopped and the next cycle of the
+loop will begin. If the permission was granted, the walker will calculate
+the direction and the speed of the movement based on the marker position and
+will send a command to the robot over the NAOqi API to start moving. We use
+a non-blocking movement function, so that the movement objective can be
+updated on every loop iteration. Finally, if the marker is within the buffer
+zone, the walker node will command the robot to stop and will inform the
+master that the robot has stopped moving. Since the walker node gives up
+control in this case, the master's permission doesn't matter.
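+
+A single iteration of this loop can be sketched as follows. The buffer
+radius, the rotation threshold and the helper
+\texttt{ask\_master\_permission} (which stands for the request to the master
+node, e.g.\ a ROS service call) are illustrative placeholders:
+
+\begin{verbatim}
+import math
+import rospy
+import tf
+
+BUFFER_RADIUS = 0.15   # metres, illustrative value
+ROT_THRESHOLD = 0.3    # radians, illustrative value
+
+def walker_cycle(listener, motion, ask_master_permission):
+    # Marker position (assumed relative to the operator's
+    # neutral position) and yaw, both taken from TF.
+    pos, quat = listener.lookupTransform(
+        "webcam", "chest_aruco", rospy.Time(0))
+    yaw = tf.transformations.euler_from_quaternion(quat)[2]
+    offset = math.hypot(pos[0], pos[1])
+    if offset < BUFFER_RADIUS and abs(yaw) < ROT_THRESHOLD:
+        motion.stopMove()   # inside the buffer zone: stop and
+        return "stopped"    # inform the master (not shown)
+    if not ask_master_permission():
+        motion.stopMove()   # permission denied: stop and retry
+        return "denied"     # on the next cycle
+    # Non-blocking NAOqi call (ALMotion.moveToward), so the
+    # objective can be updated on every iteration.
+    if offset >= BUFFER_RADIUS:
+        motion.moveToward(pos[0] / offset, pos[1] / offset, 0.0)
+    else:
+        motion.moveToward(0.0, 0.0, math.copysign(0.5, yaw))
+    return "walking"
+\end{verbatim}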
+
+The final piece of our system is the speech-based command interface. Since
+the set of acceptable commands varies between states, the speech recognition
+controller must be aware of the current state of the system; therefore, the
+master node is responsible for this functionality. The master node runs an
+auxiliary loop in which a recognition target is sent to the speech server
+node. If a relevant word is detected, the master receives the result,
+updates the state accordingly and sends a new recognition target. If a state
+change occurs before any speech is detected, the master sends the speech
+server a cancellation request for the currently running objective and,
+again, sends a new target. This interaction is schematically displayed in
+\autoref{fig:master-speech}.
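+
+The sketch below shows one possible shape of this loop, assuming the speech
+server is exposed through an actionlib interface; \texttt{RecognizeAction},
+\texttt{RecognizeGoal} and the package \texttt{hrs\_msgs} are hypothetical,
+not our actual message definitions:
+
+\begin{verbatim}
+import actionlib
+import rospy
+# Hypothetical action type, e.g. client =
+#   actionlib.SimpleActionClient("speech", RecognizeAction)
+from hrs_msgs.msg import RecognizeAction, RecognizeGoal
+
+def speech_loop(client, get_state, vocabulary_for, apply_word):
+    state = get_state()
+    client.send_goal(RecognizeGoal(words=vocabulary_for(state)))
+    while not rospy.is_shutdown():
+        if client.wait_for_result(rospy.Duration(0.1)):
+            # A relevant word was detected: update the state.
+            apply_word(client.get_result().word)
+        elif get_state() == state:
+            continue    # nothing happened yet, keep listening
+        else:
+            # The state changed first: cancel the running target.
+            client.cancel_goal()
+        state = get_state()
+        client.send_goal(RecognizeGoal(words=vocabulary_for(state)))
+\end{verbatim}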
+
+\section{Conclusion and drawbacks}
+
+Upon completion of this project, our team successfully applied the knowledge
+acquired during the HRS lectures and tutorials to a complex practical task.
+We implemented an easy-to-use prototype of a teleoperation system which is
+fairly robust to environmental conditions. Furthermore, we researched
+several approaches to the implementation of Cartesian control and were able
+to create a Cartesian controller that is superior to the NAO's built-in one.
+Finally, we used ROS extensively and can now confidently employ it in future
+projects.
+
+Our resulting system has a few drawbacks, however, and there is room for
+future improvement. Some of these drawbacks are due to time constraints;
+others have to do with the limitations of the NAO itself. The first major
+drawback is the reliance on the NAO's built-in speech recognition for
+controlling the robot. Because of this, the operator has to be in the same
+room as the robot, which severely constrains the applicability of the
+teleoperation system. Furthermore, since the acting robot is the one
+detecting the speech, it is susceptible to the sounds it makes during
+operation (joint noises, warning notifications). Also, as the live
+demonstration revealed, using voice-based control in a crowded environment
+can lead to a high number of false positive detections and therefore to an
+unstable system. A simple solution is to use two NAO robots: one in the room
+with the operator, acting solely as a speech detection tool, and the other
+in a different room, performing the actions. A saner approach is to apply
+third-party speech recognition software to the webcam microphone feed, since
+speech recognition packages for ROS are available \cite{ros-speech}.
+However, because speech recognition wasn't the main objective of our
+project, we reserve this for possible future work.
+
+Another important issue, which can be a problem for remote operation, is the
+cables. A NAO is connected to the controlling computer over an Ethernet
+cable, and, due to the low capacity of the NAO's battery, the power cord
+needs to be plugged in most of the time. Without the direct oversight of the
+operator, it is impossible to know where the cables are relative to the
+robot, and therefore impossible to prevent the robot from tripping over them
+and falling. When it comes to battery power, the NAO has some autonomy; the
+Ethernet cable, however, cannot be removed, because the onboard Wi-Fi of the
+NAO is too slow to stream the video feed and the joint telemetry.
+
+A related issue is the relatively narrow field of view of the NAO's cameras.
+In a cordless setting, the camera feed might be sufficient for the operator
+to navigate the NAO through the environment. However, picking up objects
+while only seeing them through the robot's cameras is extremely difficult
+because of the narrow field of view and the lack of depth information. A
+possible solution to this issue and the previous one, which would allow
+operating a NAO without being in the same room with it, is to equip the
+robot's room with video cameras, so that some oversight is possible.
+
+Finally, there is a problem with the NAO's stability when it walks while
+carrying an object. Apparently, the NAOqi walking controller relies on the
+movement of the arms to stabilize the walk. If the arms are occupied by
+another task during the walk, the built-in controller doesn't intelligently
+compensate, which led to a significant number of falls during our
+experiments. Due to the time constraints, we weren't able to investigate any
+approaches to making the walking more stable. This, however, can be an
+interesting topic for future semester projects.
 
 % \begin{table}[htbp]
 % \caption{Table Type Styles}