\documentclass[conference]{IEEEtran}
% \IEEEoverridecommandlockouts
% The preceding line is only needed to identify funding in the first footnote.
% If that is unneeded, please comment it out.

\usepackage{cite}
\usepackage{amsmath,amssymb,amsfonts}
% \usepackage{algorithmic}
\usepackage{graphicx}
\usepackage{textcomp}
\usepackage{xcolor}
\usepackage{subcaption}
\usepackage{hyperref}

\usepackage{fancyhdr}
\pagestyle{fancy}
\rhead{\thepage}
\lhead{Humanoid Robotic Systems}

\def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em
T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}}

\begin{document}

\title{TUM ICS Humanoid Robotic Systems \\ ``Teleoperating NAO''}

\author{Pavel Lutskov, Luming Li, Lukas Otter and Atef Kort}

\maketitle

\section{Project Description}

In this semester, the task of our group was to program a routine for the
teleoperation of a NAO robot. Using ArUco markers placed on the operator's
chest and hands, the position and posture of the operator are determined by
detecting the markers' locations with a webcam; appropriate commands are then
sent to the robot so that it imitates the motions of the operator. An overview
of the process can be seen in \autoref{fig:overview}. The main takeaway from
fulfilling this objective was practicing the skills that we acquired during the
Humanoid Robotic Systems course and getting familiar with the NAO robot as a
research and development platform.

\begin{figure}[h]
    \centering
    \includegraphics[width=\linewidth]{figures/teleoperation_overview.png}
    \caption{Overview of the defined states and their transitions.}
    \label{fig:overview}
\end{figure}

In closer detail, once the markers are detected, their coordinates relative to
the webcam are extracted. The position and orientation of the user's chest
marker are used to control the movement of the NAO around the environment. We
call this approach a ``Human Joystick'' and describe it in more detail in
\autoref{ssec:navigation}.

The relative locations of the chest and hand markers can be used to determine
the coordinates of the user's end effectors (i.e.\ hands) in the user's chest
frame. In order for the NAO to imitate the arm motions, these coordinates need
to be appropriately remapped into the NAO torso frame. With the knowledge of the
desired coordinates of the hands, the commands for the NAO joints can be
calculated by using the Cartesian control approach. We present a thorough
discussion of the issues we had to solve and the methods we used for arm motion
imitation in \autoref{ssec:imitation}.

Furthermore, in order to enable the most intuitive teleoperation, a user
interface needed to be developed. In our system, we present the operator with a
current estimate of the operator's pose, a sensor-feedback-based robot pose,
the camera feeds from both of NAO's cameras, and the webcam view of the
operator. In order for the user to be able to give explicit commands to the
robot, such as a request to open or close the hands or to temporarily suspend
the operation, we implemented a simple voice command system. Finally, to
accommodate different users and to perform control in different conditions, a
small calibration routine was developed, which quickly takes a user through the
process of setting up the teleoperation. We elaborate on the tools and
approaches that we used for the implementation of the user-facing features in
\autoref{ssec:interface}.

An example task that can be accomplished with our teleoperation package might
be the following. The operator can safely and precisely navigate the robot
through an uncharted environment with a high number of obstacles to some
lightweight object, such as an empty bottle, then make the robot pick up that
object and bring it back to the operator. Thanks to the high precision of the
arm motions and the constant operator input, the robot is able to pick up
objects of different shapes and sizes, applying different strategies when
needed. We demonstrate the functioning of our system in the supporting video.

We used ROS \cite{ros} as a framework for our implementation. ROS is a
well-established software framework for developing robot-targeted applications,
with a rich support infrastructure and a modular approach to logic
organization. For interacting with the robot we mainly relied on the NAOqi
\cite{naoqi} Python API. The advantage of using Python compared to C++ is a
much higher speed of development and more concise and readable resulting code.
We therefore used C++ only for the most computationally intensive parts of our
program, such as the ArUco marker detection, because of the efficiency of C++.

\section{System Overview}

\subsection{Vision}\label{ssec:vision}

The foundational building block of our project is a computer vision system for
detecting the position and orientation of ArUco markers \cite{aruco}. In our
implementation we closely follow the HRS Tutorial 4 and leverage the
functionality of the ROS ArUco library. One major difference from the lecture,
however, lies in finding the calibration matrix of the camera. In the tutorial
we could retrieve the camera intrinsics of the NAO's camera through a call to
the NAO API. In our case, however, a third-party webcam was used, whose
intrinsics we didn't know. In order to find the camera matrix, we used a common
approach based on the calculation of a homography matrix through a search for
corresponding points in a series of planar scenes \cite{homography}. In
particular, we used three checkerboard patterns and the Camera Calibration
Toolbox for Matlab \cite{cam-toolbox}. Our experiments confirmed that the
positions and orientations of the ArUco markers are calculated correctly, which
indicates that the calibration was accurate.

On a higher level, we extract the coordinates of the ArUco markers in the
webcam frame, then apply a rotational transformation so that the Z-coordinate
of the markers correctly corresponds to the height\footnote{In the camera frame
the Z-coordinate is parallel to the camera axis.}. Finally, we publish the
transforms of the markers with respect to the \verb|odom|
frame\footnote{The choice of the parent frame is arbitrary as long as it is
consistent throughout the project.} using ROS \verb|tf|.

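A minimal sketch of this republishing step could look as follows; the node,
topic, and frame names are illustrative rather than taken from our code, and
for brevity the marker orientation is forwarded unchanged.

\begin{verbatim}
#!/usr/bin/env python
# Sketch: republish a detected marker pose in the
# odom frame (names illustrative).
import rospy
import tf
from geometry_msgs.msg import PoseStamped

# A fixed rotation aligning the camera frame (Z along
# the optical axis) with a world-like frame (Z up);
# the exact angles depend on the webcam mounting.
CAM_TO_WORLD = tf.transformations.euler_matrix(
    -1.5708, 0.0, -1.5708)

def on_marker(msg, br):
    p = msg.pose.position
    # Rotate the camera-frame position so that the
    # Z-coordinate corresponds to the height.
    x, y, z, _ = CAM_TO_WORLD.dot((p.x, p.y, p.z, 1.0))
    q = msg.pose.orientation
    br.sendTransform((x, y, z), (q.x, q.y, q.z, q.w),
                     rospy.Time.now(),
                     'aruco_chest', 'odom')

if __name__ == '__main__':
    rospy.init_node('marker_republisher')
    br = tf.TransformBroadcaster()
    rospy.Subscriber('marker_pose', PoseStamped,
                     on_marker, br)
    rospy.spin()
\end{verbatim}
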
\begin{figure}
    \centerline{\includegraphics[width=0.8\linewidth]{figures/aruco.png}}
    \caption{ArUco marker detection on the operator.}
    \label{fig:aruco-detection}
\end{figure}

\subsection{Interface}\label{ssec:interface}

\paragraph{Speech Commands}

Based on the NAOqi API and NAO's built-in voice recognition, we built a Python
speech recognition server, providing a ROS action as a means of accessing it.
In principle, it was possible to reuse the results of the HRS Tutorial 7, where
a speech recognition node was already implemented. Those results, however, were
not flexible enough for our purposes, and making the necessary adjustments
would have been more time-consuming than implementing a node in Python from
scratch. It was our design constraint that the robot only accepts commands
which lead to state changes that are reachable from the current state. We
provide further detail on how the state dependency is implemented and how the
speech recognition is integrated with our system in \autoref{sec:integration}.

\begin{table}[h]
    \centering
    \begin{tabular}{|c|c|c|}
        \hline
        \textbf{Command}&\textbf{Action}&\textbf{Available in state} \\
        \hline
        ``Go'' & Wake Up & Sleep \\
        \hline
        ``Kill'' & Go to sleep & Idle, Imitation \\
        \hline
        ``Arms'' & Start imitation & Idle \\
        \hline
        ``Stop'' & Stop imitation & Imitation \\
        \hline
        ``Open'' & Open hands & Idle, Imitation \\
        \hline
        ``Close'' & Close hands & Idle, Imitation \\
        \hline
    \end{tabular}
    \caption{Commands of the speech recognition module}
    \label{tab:speech-states}
\end{table}

\autoref{tab:speech-states} lists the available commands, depending on the
state of the system. We tried to make them as short and distinguishable as
possible in order to minimize the number of misunderstood commands. As a
confirmation, the NAO repeats the recognized command, or says ``nope'' if it
detected some speech but couldn't recognize a valid command. Such brevity
greatly speeds up the speech-based interaction, compared to the case where the
NAO would talk in full sentences.

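As a sketch, the state dependency of the vocabulary can be expressed as a
simple mapping; the state and command names follow \autoref{tab:speech-states},
while the structure itself is illustrative.

\begin{verbatim}
# Allowed voice commands per system state
# (illustrative sketch).
ALLOWED = {
    'sleep':     ['go'],
    'idle':      ['kill', 'arms', 'open', 'close'],
    'imitation': ['kill', 'stop', 'open', 'close'],
}

def vocabulary(state):
    """Recognition target for the speech server."""
    return ALLOWED.get(state, [])
\end{verbatim}
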
\paragraph{Calibration}

In order to make our system more robust, we have included a routine to
calibrate it for different users. It can be run as an optional step before
executing the main application. This routine determines the threshold values
required for the ``Human Joystick'' approach that is used to control the NAO's
walker module, as well as various key points that are needed to properly map
the operator's arm motions to the NAO.

When the module is started, the NAO guides the operator through a number of
recording steps via spoken prompts. After a successful completion of the
calibration process, the determined values are written to the YAML file
\verb|config/default.yaml| \cite{yaml}. This file can then be accessed by the
other nodes in the system.

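A sketch of how the determined values might be persisted is shown below; the
key names and values are illustrative assumptions, not our actual configuration
layout.

\begin{verbatim}
# Sketch: persist calibration results for the
# other nodes (keys and values illustrative).
import yaml

calibration = {
    'arm_length': 0.62,   # operator arm length [m]
    'd_thr': 0.15,        # buffer-zone radius [m]
    'd_max': 0.45,        # control-zone boundary [m]
    # shoulder position in the chest frame [m]
    'shoulder_offset': [0.0, 0.18, 0.35],
}

with open('config/default.yaml', 'w') as f:
    yaml.safe_dump(calibration, f,
                   default_flow_style=False)
\end{verbatim}
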
\paragraph{Teleoperation Interface}

In order to make it possible to operate the NAO without visual contact, we have
developed a teleoperation interface. It allows the operator to receive visual
feedback from the NAO, as well as an estimate of the operator's current pose
and of the buffer and movement zones which are needed to navigate the robot.

The NAO part contains the feeds of the top and bottom cameras on the robot's
head. These were created by subscribing to their respective topics using the
\verb|rqt_gui| package. It additionally includes a visualization of the NAO in
rviz. For this, the robot's joint positions are displayed by subscribing to the
\verb|tf| topic, where the coordinates and the different coordinate frames are
published. We further used the \verb|nao_meshes| package to render a predefined
URDF 3D model of the NAO. It is shown in \autoref{fig:rviz-nao-model}.

Furthermore, the interface also presents an estimate of the current pose of the
operator, as well as the control zones for our ``Human Joystick'' approach, in
an additional \textit{rviz} window. For this, we created a separate node that
repeatedly publishes a model of the operator and the zones, consisting of
markers, to \textit{rviz}. Initially, the YAML file that contains the
parameters determined during the system calibration is read. According to those
parameters, the sizes of the markers that visualize the control zones are set.
Further, the height of the human model is set to 2.2 times the determined arm
length of the operator. The sizes of the other body parts are then scaled
depending on that height parameter and predefined weights. With this approach,
we tried to match the proportions of the human body as well as possible. The
position of the resulting body model is bound to the determined location of the
ArUco marker on the operator's chest, which is again received by subscribing to
the \verb|tf| topic. Thus, since the model is recreated and re-published in
each iteration of the node, it moves dynamically with the operator.

Moreover, for a useful interface it was crucial to have a dynamic
representation of the operator's arms in the model. After several tries, using
the other marker types (e.g.\ cylinders and arrows) turned out to be too
elaborate to implement, so we decided to use markers of the type
\textit{line-strip}, starting from points at the shoulders and ending at points
on the hands, for the model's arms. By using the shoulder points that were
defined in the body model and locking the points on the hands to the positions
that were determined for the markers in the operator's hands, we finally
created a model that represents the operator's arm positions and thereby
provides support for various tasks such as grabbing an object. The final model
is shown in \autoref{fig:rviz-human-model}. Just for reference, we also
included a marker of type \textit{sphere} that depicts the position of the
recording webcam.

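A condensed sketch of this arm visualization is given below; the frame and
topic names are illustrative, and the example points stand in for the
calibration and \verb|tf| data used by the real node.

\begin{verbatim}
# Sketch: visualize one arm as a LINE_STRIP
# marker in rviz (names illustrative).
import rospy
from visualization_msgs.msg import Marker
from geometry_msgs.msg import Point

def arm_marker(shoulder, hand, marker_id):
    """Two-point line strip from shoulder to hand."""
    m = Marker()
    m.header.frame_id = 'odom'
    m.header.stamp = rospy.Time.now()
    m.id = marker_id
    m.type = Marker.LINE_STRIP
    m.action = Marker.ADD
    m.scale.x = 0.03             # line width [m]
    m.color.g = m.color.a = 1.0  # opaque green
    m.points = [Point(*shoulder), Point(*hand)]
    return m

if __name__ == '__main__':
    rospy.init_node('human_model_publisher')
    pub = rospy.Publisher('human_model', Marker,
                          queue_size=10)
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        # Real points come from calibration and tf.
        pub.publish(arm_marker((0.0, 0.2, 1.4),
                               (0.3, 0.4, 1.0), 0))
        rate.sleep()
\end{verbatim}
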
In addition, we added a camera feed showing the operator. Within the feed,
ArUco markers are highlighted once they are detected. This was done by
including the output of the ArUco detection module in the interface. A sample
output is shown in \autoref{fig:aruco-detection}.

\begin{figure}
    \centering
    \begin{subfigure}[b]{0.4\linewidth}
        \includegraphics[width=\linewidth]{figures/interface_nao.png}
        \caption{}
        \label{fig:rviz-nao-model}
    \end{subfigure}
    \begin{subfigure}[b]{0.4\linewidth}
        \includegraphics[width=\linewidth]{figures/rviz_human.png}
        \caption{}
        \label{fig:rviz-human-model}
    \end{subfigure}
    \caption{NAO and operator in rviz.}
    \label{fig:interface}
\end{figure}

\subsection{Navigation}\label{ssec:navigation}

Next, our system needed a way for the operator to command the robot to a
desired location. Furthermore, the operator had to be able to adjust the speed
of the robot's movement. To achieve this, we use an approach that we call the
``Human Joystick''. We implement this approach in a module called
\verb|walker|.

Through the calibration procedure we determine the initial position of the
operator. Furthermore, we track the position of the operator by locating the
ArUco marker on the operator's chest. Then, we can map the current position of
the user to the desired direction and speed of the robot. For example, if the
operator steps to the right from the initial position, the robot will be moving
to the right until the operator returns to the initial position. The further
the operator is from the origin, the faster the robot will move. In order to
control the rotation of the robot, the operator can slightly turn the body
clockwise or counterclockwise while remaining in the initial position, so that
the marker can still be detected by the webcam. The speed of the rotation can
also be controlled by the magnitude of the operator's rotation. The process is
schematically illustrated in \autoref{fig:joystick}.

\begin{figure}
    \centering
    \includegraphics[width=0.8\linewidth]{figures/usr_pt.png}
    \caption{User position tracking model.}
    \label{fig:joystick}
\end{figure}

There is a small region around the initial position, in which the operator can
stay without causing the robot to move. As soon as the operator exceeds the
movement threshold in some direction, the robot will slowly start moving in
that direction. We use the following relationship for calculating the robot's
speed:

$$v = v_{min} + \frac{d - d_{thr}}{d_{max} - d_{thr}}(v_{max} - v_{min})$$

Here, $d$ denotes the operator's distance from the origin in that direction,
$d_{thr}$ is the minimum distance required for starting the movement, and
$d_{max}$ is the boundary of the control zone; $d_{thr}$ and $d_{max}$ are
determined through the calibration process. Currently, there can only be
movement in one direction at a time, so in case the operator exceeds the
threshold in more than one direction, the robot will move in the direction with
the higher precedence. The forwards-backwards motion has the highest
precedence, followed by the sideways motion and, finally, the rotation.

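A compact sketch of this speed mapping and precedence rule is given below; the
variable names mirror the formula above, while the velocity limits and the
shared rotation threshold are simplifying assumptions.

\begin{verbatim}
# Sketch of the "Human Joystick" mapping
# (names mirror the formula above).
def speed(d, d_thr, d_max, v_min, v_max):
    """Linear ramp inside the control zone."""
    if d <= d_thr:
        return 0.0      # buffer zone: stand still
    d = min(d, d_max)   # clamp to the zone boundary
    return v_min + (d - d_thr) / (d_max - d_thr) \
        * (v_max - v_min)

def walk_command(dx, dy, dtheta, cal):
    """One motion at a time: x, then y, then turn.
    (A real node would use a separate rotation
    threshold instead of reusing d_thr.)"""
    for value, axis in ((dx, 'x'), (dy, 'y'),
                        (dtheta, 'theta')):
        v = speed(abs(value), cal['d_thr'],
                  cal['d_max'], 0.1, 1.0)
        if v > 0.0:
            return axis, v if value >= 0 else -v
    return None, 0.0
\end{verbatim}
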
Our tests have shown that having control over the speed is crucial for the
success of the teleoperation. Aligning to an object is impossible if the robot
is walking at its maximum speed; on the other hand, walking around the room at
a fraction of the maximum speed is too slow.

\subsection{Imitation}\label{ssec:imitation}

One of the main objectives of our project was the imitation of the operator's
arm motions by the NAO. In order to perform this, first the appropriate mapping
between the relative locations of the detected ArUco markers and the desired
hand positions of the robot needs to be calculated. Then, based on the target
coordinates, the robot joint rotations need to be calculated.

\paragraph{Posture retargeting}

First, let us define the notation for the coordinates that we will use to
describe the posture retargeting procedure. Let $r$ denote the 3D $(x, y, z)$
coordinates; the subscript defines the object which has these coordinates, and
the superscript defines the coordinate frame in which these coordinates are
taken. So, for example, $r_{hand,NAO}^{torso,NAO}$ gives the coordinates of the
hand of the NAO robot in the frame of the robot's torso.

\begin{figure}
    \centering
    \begin{subfigure}[b]{0.45\linewidth}
        \includegraphics[width=\linewidth]{figures/operator_frames.png}
        \caption{Operator's chest and shoulder frames}
        \label{fig:operator-frames}
    \end{subfigure}
    \begin{subfigure}[b]{0.45\linewidth}
        \includegraphics[width=\linewidth]{figures/robot_torso.png}
        \caption{NAO's torso frame}
        \label{fig:nao-frames}
    \end{subfigure}
    \caption{Coordinate frames}
    \label{fig:coord-frames}
\end{figure}

After the ArUco markers are detected and published on ROS \verb|tf|, as
described in \autoref{ssec:vision}, we have the three vectors
$r_{aruco,chest}^{webcam}$, $r_{aruco,lefthand}^{webcam}$ and
$r_{aruco,righthand}^{webcam}$. We describe the retargeting for one hand, since
it is symmetrical for the other hand. We also assume that the user's coordinate
systems have the same orientation, with the z-axis pointing upwards, the x-axis
pointing straight into the webcam and the y-axis to the left of the
webcam\footnote{This assumption holds, because in the imitation mode the user
always faces the camera directly and stands straight up. We need this
assumption for robustness against the orientation of the chest marker, since it
can accidentally get tilted. If we bound the coordinate system to the chest
marker completely, we would need to place the marker on the chest firmly and
carefully, which is time-consuming.}. Therefore, we can directly calculate the
hand position in the user's chest frame by means of the following equation:

$$r_{hand,user}^{chest,user} = r_{aruco,hand}^{webcam} -
r_{aruco,chest}^{webcam}$$

Next, we remap the hand coordinates in the chest frame into the user's shoulder
frame, using the following relation:

$$r_{hand,user}^{shoulder,user} =
r_{hand,user}^{chest,user} - r_{shoulder,user}^{chest,user}$$

We know the coordinates of the user's shoulder in the user's chest frame from
the calibration procedure described in \autoref{ssec:interface}.

Now, we perform the retargeting of the user's hand coordinates to the desired
NAO hand coordinates in the NAO's shoulder frame with the following formula:

$$r_{hand,NAO}^{shoulder,NAO} =
\frac{L_{arm,NAO}}{L_{arm,user}} r_{hand,user}^{shoulder,user}$$

As before, we know the length of the user's arm through calibration and the
length of the NAO's arm through the specification provided by the manufacturer.

A final step of the posture retargeting is to obtain the coordinates of the end
effector in the torso frame. This can be done through the following relation:

$$r_{hand,NAO}^{torso,NAO} =
r_{hand,NAO}^{shoulder,NAO} + r_{shoulder,NAO}^{torso,NAO}$$

The coordinates of the NAO's shoulder in the NAO's torso frame can be obtained
through a call to the NAOqi API.

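Putting the previous relations together, the retargeting of one hand reduces to
a few vector operations. The following NumPy sketch mirrors the equations
above; the shoulder offsets and arm lengths would come from the calibration
file and the NAO specification.

\begin{verbatim}
# Sketch: posture retargeting for one hand
# (NumPy; offsets from calibration).
import numpy as np

def retarget_hand(r_hand_cam, r_chest_cam,
                  r_shoulder_chest, r_shoulder_torso,
                  l_arm_user, l_arm_nao):
    """Operator hand -> NAO hand target
    in the torso frame."""
    # hand in the user's chest frame
    r_chest = r_hand_cam - r_chest_cam
    # hand in the user's shoulder frame
    r_shoulder = r_chest - r_shoulder_chest
    # scale by the arm-length ratio
    r_nao = (l_arm_nao / l_arm_user) * r_shoulder
    # NAO hand in the torso frame
    return r_nao + r_shoulder_torso
\end{verbatim}
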
Now that the desired positions of the NAO's hands are known, the appropriate
joint motions need to be calculated by means of Cartesian control.

\paragraph{Cartesian control}

At first, we tried to employ the Cartesian controller that is shipped with the
NAOqi SDK. We soon realized, however, that this controller was unsuitable for
our task because of two significant limitations. The first problem with the
NAO's controller is that it freezes if the target is updated too often: the
arms of the robot start to stutter, and then make a final erratic motion once
the program is terminated. However, arm teleoperation requires smoothness and
therefore frequent updates of the target position, and the NAO controller
didn't fit these requirements. A possible reason for such behavior could be a
bug in the implementation, and it might be that this problem was fixed in later
versions of the NAOqi SDK.

Secondly, the controller of the NAO is not robust against
\textit{singularities}. Singularities occur when the kinematic chain loses one
or more degrees of freedom, so that in order to reach a desired position, the
joint motors must apply infinite torques. Practically, for the imitation task
this means that once the robot has its arms fully stretched, the arms would
execute violent erratic motions which could hurt the robot or cause it to lose
balance. Therefore, we needed to implement our own Cartesian controller, which
would allow us to operate the robot smoothly and not worry about
singularities.

In our case, the outputs of the Cartesian controller are the four angles of the
rotational joints for the shoulder and elbow of each arm of the NAO robot. The
angular velocities of the joints can be calculated using the following formula:

$$\dot{\theta} = J^{-1}\dot{r}$$

In this formula, $\dot{r}$ denotes the 3D velocity of the target, which is the
result of the posture retargeting, namely $r_{hand,NAO}^{torso,NAO}$. $J$ is
the Jacobian matrix \cite{jacobian}. The Jacobian matrix gives the relationship
between the joint angular velocities and the resulting velocity of the effector
at the end of the kinematic chain which the Jacobian matrix describes.

We now apply a common simplification and state that

$$\Delta \theta \approx J^{-1}\Delta r$$

Here, $\Delta$ denotes a small change in the angle or the position. We use

$$\Delta r = \frac{r_{desired} - r_{current}}{K},\ K = 10$$

This means that we want $r$ to make a small movement in the direction of the
desired position.

Now we need to calculate the Jacobian matrix. There are two main ways to
determine it. The first is the numerical method, where the approximation is
obtained by checking how the end effector moves under small joint angle
changes. For this, we can approximate each column of the Jacobian matrix as
follows:

$$
J_j = \frac{\partial r}{\partial\theta_j} \approx
\frac{\Delta r}{\Delta\theta_j} =
\left(
\begin{array}{ccc}
\frac{\Delta r_x}{\Delta\theta_j} &
\frac{\Delta r_y}{\Delta\theta_j} &
\frac{\Delta r_z}{\Delta\theta_j}
\end{array}
\right)^{T}
$$

We tested this approach; the results, however, were rather unstable, and due to
the lack of time we didn't investigate possible ways to make this approach
perform better. A possible reason for the bad performance of this method could
be the imprecise readings from the NAO's joint sensors and the imprecise
calculation of the position of the end effector, also performed by the NAO
internally.

The other method that we employed was to calculate the Jacobian matrix
analytically. Since only rotational joints are available, each column of the
Jacobian matrix can be calculated using the cross product between the
rotational axis of a joint, denoted by $e_j$, and the vector $r_{end}-r_{j}$,
where $r_{end}$ is the position of the end effector (i.e.\ hand) and $r_{j}$ is
the position of the joint. The following relation gives us one column of the
Jacobian matrix:

$$
J_j = \frac{\partial r_{end}}{\partial\theta_j} =
e_j \times (r_{end}-r_j)
$$

We can get the rotational axis of a joint and the position of the joint in the
torso frame through the NAOqi API. This can be repeated for each rotational
joint until the whole matrix is filled.

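A sketch of this construction is shown below; in our system the joint axes and
positions would be queried from the NAOqi API, while here they are plain NumPy
arrays.

\begin{verbatim}
# Sketch: analytic Jacobian for a chain of
# rotational joints (NumPy).
import numpy as np

def jacobian(axes, joint_positions, r_end):
    """Column j is e_j x (r_end - r_j); all vectors
    are expressed in the torso frame."""
    cols = [np.cross(e, r_end - r_j)
            for e, r_j in zip(axes, joint_positions)]
    return np.column_stack(cols)   # 3 x n matrix
\end{verbatim}
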
The next step for the Cartesian controller is to determine the inverse of the
Jacobian matrix for the inverse kinematics. For this, singular value
decomposition is used, which is given by

$$J = U\Sigma V^T$$

Then, the inverse can be calculated by

$$J^{-1} = V \Sigma^{-1} U^T$$

One advantage of this approach is that it can be employed to find a
pseudoinverse of a non-square matrix. Furthermore, the diagonal matrix $\Sigma$
has $J$'s singular values $\sigma_j$ on its main diagonal. If any of the
singular values is close to zero, $J$ has lost rank and a singularity occurs.
We obtain $\Sigma^{-1}$ by transposing $\Sigma$ and taking the reciprocal of
each diagonal entry:

$$\Sigma^{-1}_{jj} = \frac{1}{\sigma_j}$$

Then we can avoid the singular behavior by setting to $0$ the entries in
$\Sigma^{-1}$ that are above a threshold value $\tau = 50$, which we determined
through experimentation.

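A sketch of this singularity-robust inversion, with $\tau$ as above:

\begin{verbatim}
# Sketch: singularity-robust pseudoinverse via
# SVD truncation (tau = 50 as in the text).
import numpy as np

def robust_pinv(J, tau=50.0):
    """Invert J, zeroing reciprocal singular values
    that exceed tau (near-singular directions)."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    with np.errstate(divide='ignore'):
        s_inv = 1.0 / s
    s_inv[s_inv > tau] = 0.0
    return Vt.T @ np.diag(s_inv) @ U.T
\end{verbatim}
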
The final control objective for the current loop iteration can be stated as:

$$\theta_{targ} = \theta_{cur} + \Delta\theta$$

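Using the two sketches above, one iteration of the control loop condenses to a
few lines; reading the joint state and sending $\theta_{targ}$ to the joints
would go through the NAOqi API.

\begin{verbatim}
# Sketch: one iteration of the Cartesian control
# loop (K = 10 as above).
def control_step(theta, r_current, r_desired,
                 axes, joint_positions, K=10.0):
    delta_r = (r_desired - r_current) / K
    J = jacobian(axes, joint_positions, r_current)
    delta_theta = robust_pinv(J) @ delta_r
    return theta + delta_theta   # theta_targ
\end{verbatim}
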
Our tests have shown that our controller doesn't exhibit the freezing behavior
present in the NAO's own controller, and therefore the target of the control
can be updated with an arbitrary frequency. Furthermore, our controller shows
no signs of producing violent arm motions, which means that our strategy for
handling singularities was effective. The implementation of the whole imitation
routine resides in the \verb|imitator| module of our system.

\section{System Implementation and Integration}\label{sec:integration}

Now that the individual modules were designed and implemented, the whole system
needed to be assembled. The state machine that we designed can be seen in
\autoref{fig:overview}.

The software package was organized as a collection of ROS nodes, controlled by
a single master node. The master node keeps track of the current system state,
and the slave nodes consult the master node to check whether they are allowed
to perform an action. To achieve this, the master node creates a server for a
ROS service named \verb|inform_masterloop|; the service call takes as arguments
the name of the caller and the desired action, and responds with a Boolean
value indicating whether permission to perform the action was granted. The
master node can then update the system state based on the received action
requests and the current state. Some slave nodes, such as the walking and
imitation nodes, run in a high-frequency loop and therefore consult the master
in each iteration of the loop. Other nodes, such as the fall detector, only
inform the master about the occurrence of certain events, such as a fall or a
fall recovery, so that the master can deny requests for any activities until
the fall recovery is complete.

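As a sketch, a slave node might consult the master each cycle as shown below;
the service name \verb|inform_masterloop| is from our system, while the service
type and its fields are illustrative assumptions.

\begin{verbatim}
# Sketch: a slave node asking the master for
# permission (service type is hypothetical).
import rospy
from teleop_nao.srv import InformMasterloop

def request_permission(caller, action):
    """True if the master grants the action."""
    rospy.wait_for_service('inform_masterloop')
    ask = rospy.ServiceProxy('inform_masterloop',
                             InformMasterloop)
    return ask(caller, action).granted
\end{verbatim}
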
\begin{figure}[h]
    \centering
    \includegraphics[width=0.9\linewidth]{figures/sys_arch.png}
    \caption{Overview of the interactions in the system.}
    \label{fig:impl_overview}
\end{figure}

We will now illustrate our architecture using the interaction between the
walker node and the master node as an example. This interaction is depicted in
\autoref{fig:master-walker}. The walker node subscribes to the \verb|tf|
transform of the chest ArUco marker and requests a position update every 0.1
seconds. If in the current cycle the marker happens to be outside of the buffer
zone (see \autoref{fig:joystick}), or the rotation of the marker exceeds the
motion threshold, the walker node will ask the master node for permission to
start moving. The master node will receive the request, and if the current
state of the system is either \textit{walking} or \textit{idle} (see
\autoref{fig:overview}), then the permission will be granted and the system
will transition into the \textit{walking} state. If the robot is currently
imitating the arm motions or has not yet recovered from a fall, then the
permission will not be granted and the system will remain in its current
state\footnote{We did research the possibility of automatic switching between
walking and imitating, so that the robot always imitates when the operator is
within the buffer zone and stops imitating as soon as the operator leaves the
buffer zone, but this approach requires more skill and concentration from the
operator, so the default setting is to explicitly ask the robot to go into the
imitating state and back into idle.}. The walker node will then receive the
master's response, and in case it was negative, any current movement will be
stopped and the next cycle of the loop will begin. In case the permission was
granted, the walker will calculate the direction and speed of the movement
based on the marker position, and will send a command to the robot over the
NAOqi API to start moving. We use a non-blocking movement function, so that the
movement objective can be updated with every loop iteration. Finally, if the
marker is within the buffer zone, the robot will be commanded to stop by the
walker node, and the master will be informed that the robot has stopped moving.
Since in this case the walker node gives up control, the permission from the
master doesn't matter.

\begin{figure}[h]
    \centering
    \includegraphics[width=0.9\linewidth]{figures/master_walker.png}
    \caption{Interaction between master and walker modules.}
    \label{fig:master-walker}
\end{figure}

A final piece of our system is the speech-based command interface. Since in our
system the acceptable commands vary between states, the speech recognition
controller must be aware of the current state of the system; therefore, the
master node is responsible for this functionality. The master node runs an
auxiliary loop in which a recognition target is sent to the speech server node
described in \autoref{ssec:interface}. If a relevant word is detected, the
master receives the result, updates the state accordingly, and then sends a new
recognition target. If a state change occurred before any speech was detected,
the master sends a cancellation request to the speech server for the currently
running objective and, again, sends a new target.

\section{Conclusion and Possible Drawbacks}

Upon completion of this project, our team successfully applied the knowledge
that we acquired during the HRS lectures and tutorials to a complex practical
task. We implemented an easy-to-use prototype of a teleoperation system, which
is fairly robust to environmental conditions. Furthermore, we researched
several approaches to the implementation of Cartesian control and were able to
create a Cartesian controller which is superior to the NAO's built-in one.
Finally, we extensively used ROS and can now confidently employ ROS in future
projects.

Our resulting system has a few drawbacks, however, and there is room for future
improvements. Some of these drawbacks are due to the time constraints, while
others have to do with the limitations of the NAO itself. The first major
drawback is the reliance on the NAO's built-in speech recognition for
controlling the robot. Because of this, the operator has to be in the same room
as the robot, which severely constrains the applicability of the teleoperation
system. Furthermore, since the acting robot is the one detecting the speech, it
can be susceptible to the sounds it makes during operation (joint noises,
warning notifications). Also, as the live demonstration revealed, using
voice-based control in a crowded environment can lead to a high number of false
positive detections and therefore to instability of the system. A simple
solution is to use two NAO robots, one of which is in the room with the
operator, acting solely as a speech detection tool, while the other one is in
another room performing the actions. A saner approach is to apply third-party
speech recognition software to the webcam microphone feed, since
speech-recognition packages for ROS are available \cite{ros-speech}. However,
because speech recognition wasn't the main objective of our project, we reserve
this for possible future work.

Another important issue, which can be a problem for remote operation, are the
cables. A NAO is connected to the controlling computer over an Ethernet cable,
and also, due to the low capacity of the NAO's battery, the power cord needs to
be plugged in most of the time. The problem with this is that without the
direct oversight of the operator, it is impossible to know where the cables are
relative to the robot, so it is impossible to prevent the robot from tripping
over the cables and falling. When it comes to battery power, the NAO has some
autonomy; the Ethernet cable, however, cannot be removed, because the onboard
Wi-Fi of the NAO is too slow to allow streaming of the video feed and joint
telemetry.

A related issue is the relatively narrow field of view of the NAO's cameras. In
a cordless setup, the camera feed might be sufficient for the operator to
navigate the NAO through the environment. However, picking up objects when only
seeing them through the robot's cameras is extremely difficult, because of the
narrow field of view and the lack of depth information. A possible solution to
this issue and the previous one, which would enable operating a NAO without
being in the same room with it, is to equip the robot's room with video
cameras, so that some oversight is possible.

Finally, there is a problem with the NAO's stability when it walks carrying an
object. Apparently, the NAOqi walking controller relies on the movement of the
arms to stabilize the walking. It seems that if the arms are occupied by some
other task during the walk, the built-in controller doesn't try to
intelligently compensate, which led to a significant number of falls during our
experiments. Due to the time constraints, we weren't able to investigate
approaches to make the walking more stable. This, however, could be an
interesting topic for future semester projects.

\bibliography{references}{}
\bibliographystyle{IEEEtran}

\end{document}