final fixes on the report

2020-08-05 22:12:57 +02:00
parent 27a1beff70
commit 3e850713ca


@@ -68,11 +68,11 @@ The text data, before being supplied to the neural network, has to pass several
preprocessing stages. These stages, as implemented in this project, form an
\textit{input pipeline}, which is depicted in \autoref{fig:pipeline}. First,
the pipeline node called \textit{Tokenizer} reads a character stream from a
text file. This node is responsible for replacing every character that is not
an ASCII letter with whitespace, normalizing the stream by setting all
remaining characters to lowercase, and finally splitting the stream into
tokens (words) and passing the words one-by-one to the next pipeline stage.
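As a minimal sketch of this normalization step (illustrative only, not the
project's actual code), the replacement and lowercasing could look as follows
in C:
\begin{verbatim}
#include <ctype.h>
#include <stddef.h>

/* Minimal sketch: map every character that is not an ASCII letter
 * to a space and lowercase the rest; tokens are then obtained by
 * splitting on whitespace. */
static void normalize(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c > 127 || !isalpha(c))
            buf[i] = ' ';
        else
            buf[i] = (char)tolower(c);
    }
}
\end{verbatim}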
\begin{figure}
\centering
@@ -95,8 +95,8 @@ window is filled it is sent down the pipeline for training batch assembly. In
the system implemented in this project, a context window of size 5 is used.
In the final stage of the input pipeline, the node called \textit{Batcher}
accumulates the context windows into batches, which can then be requested by
\textit{Learner} nodes containing the neural network for the actual neural
network training.
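For illustration, a batch can be pictured as a fixed-size array of context
windows of vocabulary indices. The struct below is a hypothetical sketch; only
the window size of 5 is given above, and the batch size of 64 is an
assumption:
\begin{verbatim}
#define WINDOW_SIZE 5    /* context window size used in this project */
#define BATCH_SIZE  64   /* assumed batch size, for illustration only */

/* Hypothetical layout of a batch: BATCH_SIZE context windows, each
 * holding WINDOW_SIZE vocabulary indices. */
typedef struct {
    long windows[BATCH_SIZE][WINDOW_SIZE];
    int  filled;         /* number of windows accumulated so far */
} Batch;
\end{verbatim}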
The other dimension of the parallelism employed in this system is the
@@ -115,8 +115,8 @@ sequentially.
In the presented system, there is one central node, called the
\textit{Dispatcher}, that is responsible for storing the model weights,
distributing the weights to the \textit{Learner} nodes (which perform the
actual training), and collecting the weights at the end of a training round and
computing their average. \autoref{fig:modes} demonstrates that the system
allows for each \textit{Learner} to have its own input pipeline, for one
single input pipeline to be shared among all Learners, or for some intermediate
@@ -268,7 +268,7 @@ to have 32 dimensions.
read by the \verb|library.py| module on start-up, and the vocabulary, the test
dataset, and the parameters of training are stored as global module objects. The
\verb|bridge.pyx| module then imports \verb|library.py| and defines several
public C API functions for the \verb|main.c| code to access the configuration
parameters, to perform a word index lookup, or to evaluate a neural network
based on the test dataset.
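Cython generates a \verb|bridge.h| header declaring the functions marked
\verb|public| in \verb|bridge.pyx|, which \verb|main.c| can include directly.
The sketch below shows how such a call might look; the configuration accessor
name is hypothetical, as only \verb|get_tokens| and \verb|vocab_idx_of| are
named in this report:
\begin{verbatim}
#include <stdio.h>
#include "bridge.h"  /* header generated by Cython from bridge.pyx */

/* Sketch: query a configuration value through the public C API.
 * The accessor name below is hypothetical. */
void show_config(void)
{
    long n = get_config_long("embedding_dims");  /* hypothetical */
    printf("embedding dimensions: %ld\n", n);
}
\end{verbatim}
Note that such public functions only become callable after the embedded Python
interpreter has been initialized and the \verb|bridge| module has been
imported.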
@@ -277,7 +277,7 @@ function in the \verb|main.c| file, which receives as an argument the path to a
text file from which the tokens will be read. It then calls the function
\verb|get_tokens(WordList* wl, const char* filename)|, defined in the
\verb|bridge.pyx| file. The \verb|WordList| structure is a dynamically growable
list of \verb|Word| structs that records the number of \verb|Word|s in the
list as well as the memory available for storing the \verb|Word|s. A
\verb|Word| structure is a wrapper around a C \verb|char*|, keeping track of
the memory allocated to the pointer. The function \verb|get_tokens| consults a
@@ -285,7 +285,7 @@ global dictionary contained in \verb|bridge.pyx| that keeps track of the file
names for which a token generator already exists. If the generator for the file
has not yet been created, or if it is already exhausted, then a new generator is
created by calling the \verb|token_generator(filename)| function, defined in
\verb|library.py|, which returns a generator that yields a list of tokens
for each line of the file. A list of words is then queried from the
generator, and the \verb|WordList| structure is populated with the words from
the list, expanding the memory allocated to it if needed. The \verb|tokenizer|
@@ -302,10 +302,10 @@ up their indices in the vocabulary by calling the \verb|vocab_idx_of(Word* w)|
function defined in \verb|bridge.pyx|. That function performs a dictionary
lookup for the word, based on the \verb|config/vocab.txt| file, and returns its
index on success or \verb|-1| if the word is not known. The Filter will
assemble valid indices in a \verb|long* window| variable until enough words are
received to send the context window to the Batcher. If a word received from the
Tokenizer is empty, the Filter sets the first element in the context window to
\verb|-1| and sends the window to a Batcher for termination.
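A condensed sketch of this filtering loop is given below; \verb|receive_word|,
\verb|word_is_empty|, and \verb|send_window| are placeholder names standing in
for the project's actual communication primitives:
\begin{verbatim}
#define WINDOW_SIZE 5

typedef struct Word Word;              /* opaque here; defined in main.c */
extern Word *receive_word(void);       /* placeholder primitives */
extern int   word_is_empty(const Word *w);
extern long  vocab_idx_of(Word *w);    /* defined in bridge.pyx */
extern void  send_window(const long *window);

void filter_loop(void)
{
    long window[WINDOW_SIZE];
    int filled = 0;
    for (;;) {
        Word *w = receive_word();      /* next token from the Tokenizer */
        if (word_is_empty(w)) {        /* empty word: input exhausted */
            window[0] = -1;            /* termination marker */
            send_window(window);
            return;
        }
        long idx = vocab_idx_of(w);
        if (idx == -1)
            continue;                  /* unknown word: skip it */
        window[filled++] = idx;
        if (filled == WINDOW_SIZE) {   /* window complete: pass it on */
            send_window(window);
            filled = 0;
        }
    }
}
\end{verbatim}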
\paragraph{Batcher} A Batcher is a rather simple pure C routine that first
assembles the context windows into a batch, simultaneously converting
@@ -319,7 +319,7 @@ the Learner sends \verb|-1| when announcing itself, then the Batcher will
terminate immediately.
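The Batcher's control flow might be sketched as follows, reusing the
hypothetical \verb|Batch| struct from the earlier sketch;
\verb|wait_for_learner|, \verb|receive_window|, and \verb|send_batch| are
again placeholders:
\begin{verbatim}
extern long wait_for_learner(void);    /* returns the announcement value */
extern void receive_window(long *window);
extern void send_batch(const Batch *b);

void batcher_loop(void)
{
    Batch batch = { .filled = 0 };
    for (;;) {
        if (wait_for_learner() == -1)  /* Learner requested shutdown */
            return;
        while (batch.filled < BATCH_SIZE) {
            long *w = batch.windows[batch.filled];
            receive_window(w);
            if (w[0] == -1)            /* Filter signalled termination */
                return;
            batch.filled++;
        }
        send_batch(&batch);            /* hand the full batch over */
        batch.filled = 0;
    }
}
\end{verbatim}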
\paragraph{Learner} A Learner, implemented in the \verb|learner| function in
\verb|main.c|, first creates a TensorFlow neural network object and stores the
network as a \verb|PyObject*|. It also initializes a C \verb|WeightList| struct
to store the network weights and to serve as a buffer for communication with
the Dispatcher. It then waits for the Dispatcher to announce a new training