diff --git a/docs/report.latex b/docs/report.latex index e8408d7..e8b4089 100644 --- a/docs/report.latex +++ b/docs/report.latex @@ -14,12 +14,17 @@ \usepackage{todonotes} \usepackage{hyperref} +\usepackage{fancyhdr} +\pagestyle{fancy} +\rhead{\thepage} +\lhead{Humanoid Robotic Systems} + \def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}} \begin{document} -\title{Humanoid Robotic Systems - ``Teleoperating NAO''} +\title{TUM ICS Humanoid Robotic Systems \\ ``Teleoperating NAO''} \author{Pavel Lutskov, Luming Li, Lukas Otter and Atef Kort} @@ -66,12 +71,12 @@ with a current estimation of the operator's pose, a sensor feedback based robot pose, as well as with the camera feed from both NAO's cameras and with the webcam view of the operator. In order for the user to be able to give explicit commands to the robot, such as a request to open or close the hands or to -temporarily suspend the operation, we implemented a simple voice command system. -Finally, to be able to accommodate different users and to perform control in -different conditions, a small calibration routine was developed, which would -quickly take a user through the process of setting up the teleoperation. -We elaborate on the tools and approaches that we used for implementation of the -user-facing features in \autoref{ssec:interface}. +temporarily suspend the operation, we implemented a simple voice command +system. Finally, to be able to accommodate different users and to perform +control in different conditions, a small calibration routine was developed, +which would quickly take a user through the process of setting up the +teleoperation. We elaborate on the tools and approaches that we used for +implementation of the user-facing features in \autoref{ssec:interface}. An example task that can be done using our teleoperation package might be the following.
The operator can safely and precisely navigate the robot through an @@ -126,13 +131,21 @@ the transforms of the markers with respect to the \verb|odom| frame \subsection{Interface}\label{ssec:interface} -\paragraph{Speech State Machine} +\paragraph{Speech Commands} -Based on NAOqi API and NAO built-in voice recognition +Based on the NAOqi API and NAO's built-in voice recognition, we built a Python +speech recognition server, providing a ROS action as a means of accessing it. +It was possible to reuse the results of HRS Tutorial 7, where a speech +recognition node was already implemented. Those results, however, were not +flexible enough for our purposes, and making the necessary adjustments was more +time-consuming than implementing a node in Python from scratch. It was our +design constraint that the robot only accepts commands which lead to state +changes that are reachable from the current state. We will provide further +detail on how the state dependency is implemented and how the speech +recognition is integrated with our system in \autoref{sec:integration}. -\begin{table} -\caption{Commands of the speech recognition module} -\begin{center} +\begin{table}[h] +\centering \begin{tabular}{|c|c|c|} \hline \textbf{Command}&\textbf{Action}&\textbf{Available in state} \\ @@ -150,10 +163,18 @@ Based on NAOqi API and NAO built-in voice recognition ``Close'' & Close hands & Idle, Imitation \\ \hline \end{tabular} -\label{tab_speech_states} -\end{center} +\caption{Commands of the speech recognition module} +\label{tab:speech-states} \end{table} +\autoref{tab:speech-states} lists the available commands, +depending on the state of the system. We tried to make the commands as short +and distinguishable as possible in order to minimize the number of +misunderstood commands. As a confirmation, the NAO repeats the recognized +command, or says ``nope'' if it detected some speech but couldn't recognize a +valid command.
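The state dependency of the command set could be sketched roughly as follows. This is a hypothetical Python illustration, not code from our package; the state and command identifiers are ours and chosen for readability:

```python
# Hypothetical sketch of state-dependent command filtering: only commands
# that are valid in the current state are offered to the recognizer, so a
# recognized word can never trigger an unreachable state change.
# State and command names are illustrative, not our actual identifiers.
ALLOWED_COMMANDS = {
    "idle":      {"start", "walk", "open", "close"},
    "imitation": {"stop", "open", "close"},
    "walking":   {"stop"},
}

def vocabulary(state):
    """Vocabulary the speech server should listen for in `state`."""
    return ALLOWED_COMMANDS.get(state, set())

def confirm(state, word):
    """The robot repeats a valid command, or answers 'nope' otherwise."""
    return word if word in vocabulary(state) else "nope"
```

In this scheme the recognizer never needs to reject a command after the fact: the master simply restricts the vocabulary before each recognition round.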
+Such brevity greatly speeds up the speech-based interaction, compared to +having NAO talk in full sentences. + \paragraph{Teleoperation Interface} In order to make it possible to operate @@ -165,7 +186,7 @@ The NAO-part contains video streams of the top and bottom cameras on the robot's head. These were created by subscribing to their respective topics (FIND NAME) using the \textit{rqt\_gui} package. Moreover, it also contains an rviz window which gives a visual representation of the NAO. For this, the robot's -joint positions are displayed by subscribing to the topic tf where the +joint positions are displayed by subscribing to the topic \verb|tf| where the coordinates and the different coordinate frames are published. We further used the \textit{NAO-meshes} package to create the 3D model of the NAO. @@ -191,20 +212,24 @@ the \textit{NAO-meshes} package to create the 3D model of the NAO. \subsection{Navigation}\label{ssec:navigation} -One of the two main feature in our robot is an intuitive navigation tool, which -allows the robot to navigate the environment by tracking the user movements. +Next, our system needed a way for the operator to command the robot to a +desired location. Furthermore, the operator has to be able to adjust the speed +of the robot's movement. To achieve this, we use an approach that we call the +``Human Joystick''. We implement this approach in a module called +\verb|walker|. -By fixing an ArUco marker on the user's chest, we can continuously track its -position and orientation in a three dimensional space and so capture its -motion. - -In order to simplify the task we define a buffer zone where the robot can only -track the orientation of the user then depending on which direction the user -will exit the zone the robot will either go forward, backward, left or right.
-Also the covered distance will influence the speed of the robot, the further -the user is from the center of the buffer zone the faster the movement of the -robot will be. The extent of the movement and buffer zone are determined -automatically through calibration. +Through the calibration procedure we determine the initial position of the +operator. Furthermore, we track the position of the operator by locating the +ArUco marker on the operator's chest. Then, we can map the current position of +the user to the desired direction and speed of the robot. For example, if the +operator steps to the right from the initial position, then the robot will be +moving to the right until the operator returns to the initial position. +The further the operator is from the origin, the faster the robot will move. In +order to control the rotation of the robot, the operator can slightly turn the +body clockwise or counterclockwise while being in the initial position so that +the marker can still be detected by the webcam. The speed of the rotation can +also be controlled by the magnitude of the operator's rotation. The process is +schematically illustrated in \autoref{fig:joystick}. \begin{figure} \centering @@ -213,6 +238,28 @@ automatically through calibration. \label{fig:joystick} \end{figure} +There is a small region around the initial position in which the operator +can stay without causing the robot to move. As soon as the operator exceeds the +movement threshold in some direction, the robot will slowly start moving in +that direction. We use the following relationship for calculating the robot's +speed: + +$$v = v_{min} + \frac{d - d_{thr}}{d_{max} - d_{thr}}(v_{max} - v_{min})$$ + +Here, $d$ denotes the operator's distance from the origin in that direction, +$d_{thr}$ is the minimum distance required for starting the movement, and +$d_{max}$ is the boundary of the control zone; $d_{thr}$ and $d_{max}$ are +determined through the calibration process.
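The speed mapping above can be sketched as a small helper function. This is an illustration under our own assumptions: the variable names are ours, and the saturation at $d_{max}$ and the zero speed inside the buffer zone reflect our reading of the formula rather than the exact code of the \verb|walker| module:

```python
def walk_speed(d, d_thr, d_max, v_min, v_max):
    """Map the operator's displacement d (in one direction) to a robot speed:
        v = v_min + (d - d_thr) / (d_max - d_thr) * (v_max - v_min)
    Inside the buffer zone (d < d_thr) the robot stays still; beyond the
    control-zone boundary d_max the speed saturates at v_max (assumption)."""
    if d < d_thr:
        return 0.0
    d = min(d, d_max)  # clamp to the control zone
    return v_min + (d - d_thr) / (d_max - d_thr) * (v_max - v_min)
```

With, say, $d_{thr}=0.1$ m, $d_{max}=0.5$ m and speeds between $v_{min}=0.02$ and $v_{max}=0.1$, stepping halfway through the control zone yields a speed halfway between the two bounds.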
Currently, there can only be +movement in one direction at a time, so in case the operator exceeds the +threshold in more than one direction, the robot will move in the direction with +the highest precedence. The forwards-backwards motion has the highest +precedence, followed by the sideways motion and, finally, the rotation. + +Our tests have shown that having control over the speed is crucial for the +success of the teleoperation. Aligning to an object is impossible if the +robot is walking at its maximum speed; on the other hand, walking around the +room at a fraction of the maximum speed is too slow. + \subsection{Imitation}\label{ssec:imitation} One of the main objectives of our project was the imitation of the operator @@ -303,17 +350,17 @@ joint motions need to be calculated by the means of Cartesian control. At first, we tried to employ the Cartesian controller that is shipped with the NAOqi SDK. We soon realized, however, that this controller was unsuitable for our task, because of two significant limitations. The first problem with -the NAO's controller is that it freezes, if the target is being updated too +the NAO's controller is that it freezes if the target is being updated too often: the arms of the robot start to stutter, and then make a final erratic motion once the program is terminated. However, arm teleoperation requires smoothness and therefore frequent updates of the target position, and the NAO controller didn't fit these requirements. A possible reason for such behavior -is a bug in the implementation, and it might be possible that this problem was -fixed in the later versions of the NAOqi SDK. +could be a bug in the implementation, and it might be possible that this +problem was fixed in later versions of the NAOqi SDK. Secondly, the controller of the NAO is not robust against \textit{singularities}.
Singularities occur when the kinematic chain loses one -of the degrees of freedom, and so in order to reach a desired position, the +or more degrees of freedom, and so in order to reach a desired position, the joint motors must apply infinite torques. Practically, for the imitation task this would mean that once the robot has its arms fully stretched, the arms would execute violent erratic motions which would hurt the robot or cause it to @@ -330,9 +377,9 @@ $$\dot{\theta} = J^{-1}\dot{r}$$ In this formula $\dot{r}$ denotes the 3D speed of the target, which is the result of the posture retargeting, namely $r_{hand,NAO}^{torso,NAO}$. $J$ is -the Jacobian matrix. The Jacobian matrix gives the relationship between -the joint angle speed and the resulting speed of the effector -on the end of the kinematic chain which the Jacobian matrix describes. +the Jacobian matrix \cite{jacobian}. The Jacobian matrix gives the relationship +between the joint angle speed and the resulting speed of the effector at the +end of the kinematic chain which the Jacobian matrix describes. We now apply a common simplification and state that @@ -374,19 +421,18 @@ The other method that we employed was to calculate the Jacobian matrix analytically. Since only rotational joints were available, the approximation for the Jacobian matrix, which is the tangent in rotational joints, can be calculated using the cross product between the rotational axis of a joint, -denoted by $e_j$, and the rotational vector \\ $r_{end}-r_{j}$, where $r_{end}$ +denoted by $e_j$, and the vector $r_{end}-r_{j}$, where $r_{end}$ is the position of the end effector (i.e.\ hand) and $r_{j}$ is the position of the joint. The following relation gives us one column of the Jacobian matrix. -We can get the rotational axis of a joint and the position of the joint in the -torso frame through NAOqi API.
$$ J_j = \frac{\partial r_{end}}{\partial\theta_j} = e_j \times (r_{end}-r_j) $$ -This can be repeated for each rotational joint until the whole matrix is -filled. +We can get the rotational axis of a joint and the position of the joint in the +torso frame through the NAOqi API. This can be repeated for each rotational +joint until the whole matrix is filled. The next step for the Cartesian controller is to determine the inverse Jacobian matrix for the inverse kinematics. For this, singular value decomposition is @@ -410,18 +456,22 @@ Then we can avoid the singularity behavior by setting to $0$ the entries in $\Sigma^{-1}$ that are above a threshold value $\tau = 50$, which we determined through experimentation. -Our test have shown, that our controller doesn't have the freezing behavior, -which is present in the NAO's own controller, and therefore the target of -the control can be updated with arbitrary frequency. Furthermore, our controller -shows no signs of producing violent arm motions, which means that our strategy -for handling singularities was effective. +The final control objective for the current loop iteration can be stated as: -\section{System Implementation and Integration} +$$\theta_{targ} = \theta_{cur} + \Delta\theta$$ + +Our tests have shown that our controller doesn't have the freezing behavior +present in the NAO's own controller, and therefore the target of the +control can be updated with arbitrary frequency. Furthermore, our controller +shows no signs of producing violent arm motions, which means that our strategy +for handling singularities was effective. The implementation for the whole +imitation routine resides in the \verb|imitator| module of our system. + +\section{System Implementation and Integration}\label{sec:integration} Now that the individual modules were designed and implemented, the whole system -needed to be assembled together.
It is crucial that the states of the robot and -the transitions between the states are well defined and correctly executed. The -state machine, that we designed, can be seen in the \autoref{fig:overview}. +needed to be assembled together. The state machine that we designed can be +seen in \autoref{fig:overview}. The software package was organized as a collection of ROS nodes, controlled by a single master node. The master node keeps track of the current system state, @@ -447,7 +497,7 @@ until the fall recovery is complete. We will now illustrate our architecture by using the interaction between the walker node and the master node as an example. This interaction is depicted in -\autoref{fig:master-walker}. The walker node subscribes to the TF +\autoref{fig:master-walker}. The walker node subscribes to the \verb|tf| transform of the chest ArUco marker, and requests a position update every 0.1 seconds. If in the current cycle the marker happens to be outside of the buffer zone (see \autoref{fig:joystick}), or the rotation of the marker exceeds the @@ -487,14 +537,14 @@ A final piece of our system is the speech-based command interface. Since in our system the acceptable commands vary between states, the speech recognition controller must be aware of the current state of the system; therefore, the master node is responsible for this functionality. The master node runs an -auxiliary loop, in which a recognition target is sent to the speech -server node. If a relevant word is detected, master receives the result and -updates the state accordingly and then sends a new recognition target. If a -state change occurred before any speech was detected, then the master sends a -cancellation request to the speech server for the currently running objective -and, again, sends a new target. +auxiliary loop, in which a recognition target is sent to the speech server +node, described in \autoref{ssec:interface}.
If a relevant word is detected, +the master receives the result, updates the state accordingly, and then sends +a new recognition target. If a state change occurs before any speech is +detected, the master sends a cancellation request to the speech server for +the currently running objective and, again, sends a new target. -\section{Conclusion and possible drawbacks} +\section{Conclusion and Possible Drawbacks} Upon completion of this project, our team successfully applied the knowledge that we acquired during the HRS lectures and tutorials to a complex practical