final fixes on the report
@@ -68,11 +68,11 @@ The text data, before being supplied to the neural network, has to pass several
 preprocessing stages. These stages, as implemented in this project, form an
 \textit{input pipeline}, which is depicted in \autoref{fig:pipeline}. First,
 the pipeline node called \textit{Tokenizer} reads a character stream from a
-text file. This node is responsible for replacing all non-ASCII alphabetic
-characters in the stream with whitespace, normalizing the stream by setting all
-remaining alphabetic characters to lowercase, and finally splitting the stream
-into tokens (words) and passing the words one-by-one to the next pipeline
-stage.
+text file. This node is responsible for replacing all non-alphabetic and
+non-ASCII characters in the stream with whitespace, normalizing the stream by
+setting all remaining alphabetic characters to lowercase, and finally splitting
+the stream into tokens (words) and passing the words one-by-one to the next
+pipeline stage.
 
 \begin{figure}
 \centering
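
A minimal sketch of the normalization the revised wording describes, with the helper name normalize_stream assumed for illustration (the project's actual Tokenizer code is not shown in this hunk):

#include <ctype.h>
#include <stdio.h>

/* Blank out non-alphabetic and non-ASCII bytes and lowercase the rest,
 * so the remaining stream can be split into tokens on whitespace. */
static void normalize_stream(char *buf)
{
    for (char *p = buf; *p != '\0'; ++p) {
        unsigned char c = (unsigned char)*p;
        if (c > 127 || !isalpha(c))
            *p = ' ';               /* non-ASCII or non-alphabetic */
        else
            *p = (char)tolower(c);  /* normalize case */
    }
}

int main(void)
{
    char line[] = "Hello, Wörld 42!";
    normalize_stream(line);
    printf("%s\n", line);  /* splitting on whitespace yields "hello", "w", "rld" */
    return 0;
}
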
@@ -95,8 +95,8 @@ window is filled it is sent down the pipeline for training batch assembly. In
 the system implemented in this project a context window of size 5 is used.
 
 In the final stage of the input pipeline, the node called \textit{Batcher}
-accumulates the context windows into batches, which can then be requested by a
-\textit{Learner} node containing the neural network for the actual neural
+accumulates the context windows into batches, which can then be requested by
+\textit{Learner} nodes containing the neural network for the actual neural
 network training.
 
 The other dimension of the parallelism employed in this system is the
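
A sketch of the batch accumulation this hunk refers to; Batch, WINDOW_SIZE and BATCH_SIZE are illustrative names, and the real Batcher additionally serves the finished batches to Learners on request:

#include <string.h>

#define WINDOW_SIZE 5
#define BATCH_SIZE  64

typedef struct {
    long windows[BATCH_SIZE][WINDOW_SIZE]; /* word indices, one row per window */
    int  count;                            /* windows collected so far */
} Batch;

/* Append one context window; returns 1 once the batch is full.
 * The caller ships the batch and resets count afterwards. */
static int batch_add(Batch *b, const long *window)
{
    memcpy(b->windows[b->count], window, WINDOW_SIZE * sizeof(long));
    return ++b->count == BATCH_SIZE;
}
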
@@ -115,8 +115,8 @@ sequentially.
 
 In the presented system, there is one central node, called the
 \textit{Dispatcher}, that is responsible for storing the model weights,
-distributing the weights to the \textit{Learner} nodes, which perform the
-actual training, and collecting the weights at the end of a training round and
+distributing the weights to the \textit{Learner} nodes (which perform the
+actual training) and collecting the weights at the end of a training round and
 computing their average. \autoref{fig:modes} demonstrates that the system
 allows for each \textit{Learner} to have its own input pipeline, or for one
 single input pipeline to be shared among all Learners, or for some intermediate
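
The averaging at the end of a round could look like the following sketch, assuming flattened per-Learner weight buffers (the names n_learners and n_weights are illustrative, not taken from the project):

/* Element-wise mean over the weight buffers collected from all Learners. */
static void average_weights(float **learner_weights, float *avg,
                            int n_learners, long n_weights)
{
    for (long i = 0; i < n_weights; ++i) {
        float sum = 0.0f;
        for (int l = 0; l < n_learners; ++l)
            sum += learner_weights[l][i];
        avg[i] = sum / (float)n_learners;
    }
}
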
@@ -268,7 +268,7 @@ to have 32 dimensions.
 read by the \verb|library.py| module on start-up, and the vocabulary, the test
 dataset and the parameters of training are stored as global module objects. The
 \verb|bridge.pyx| then imports the \verb|library.py| module and defines several
-C public API functions for the \verb|main.c| code to access the configuration
+public C API functions for the \verb|main.c| code to access the configuration
 parameters, or to perform a word index lookup or evaluate a neural network
 based on the test dataset.
 
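
For context, the standard CPython embedding sequence that lets main.c reach functions declared public in bridge.pyx; Cython generates a bridge.h header for such functions, and the project's actual initialization code may differ from this sketch:

#include <Python.h>
#include "bridge.h"  /* generated by Cython from the cdef public declarations */

PyObject *PyInit_bridge(void);  /* module init function emitted by Cython */

int main(void)
{
    /* Register the compiled-in module before interpreter start-up. */
    PyImport_AppendInittab("bridge", PyInit_bridge);
    Py_Initialize();
    /* Importing bridge runs its module body, which imports library.py. */
    PyImport_ImportModule("bridge");
    /* ...the public C API functions from bridge.h are usable from here... */
    Py_FinalizeEx();
    return 0;
}
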
@@ -277,7 +277,7 @@ function in the \verb|main.c| file, which receives as an argument the path to a
 text file, from which the tokens will be read. It then calls a function
 \verb|get_tokens(WordList* wl, const char* filename)|, defined in the
 \verb|bridge.pyx| file. The \verb|WordList| structure is a dynamically growable
-list of \verb|Word| structs, that records the number of \verb|Word|s in the
+list of \verb|Word| structs that records the number of \verb|Word|s in the
 list as well as the memory available for storing the \verb|Word|s. A
 \verb|Word| structure is a wrapper around the C \verb|char*|, keeping track of
 the memory allocated to the pointer. The function \verb|get_tokens| consults a
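
A plausible layout for the two structures, matching the description in this hunk; the field names are guesses, only the overall shape is given in the report:

#include <stddef.h>

typedef struct {
    char   *data;  /* the token characters, NUL-terminated */
    size_t  cap;   /* bytes allocated to data */
} Word;

typedef struct {
    Word   *words; /* dynamically growable array of Words */
    size_t  len;   /* number of Words currently stored */
    size_t  cap;   /* number of Words the allocation can hold */
} WordList;
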
@@ -285,7 +285,7 @@ global dictionary contained in \verb|bridge.pyx| that keeps track of the file
 names for which a token generator already exists. If the generator for the file
 was not yet created, or if it is already empty, then a new generator is
 created, by calling the \verb|token_generator(filename)| function, defined in
-\verb|library.py|, which returns the generator that yields a list of tokens
+\verb|library.py|, which returns a generator that yields a list of tokens
 from a line in the file, line by line. A list of words is then queried from the
 generator, and the \verb|WordList| structure is populated with the words from
 the list, expanding the memory allocated to it if needed. The \verb|tokenizer|
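
The "expanding the memory allocated to it if needed" step could be a geometric-growth helper like this sketch; wordlist_reserve is a hypothetical name, reusing the Word/WordList layout sketched earlier:

#include <stdlib.h>

/* Ensure capacity for at least `needed` Words; returns 0 on success. */
static int wordlist_reserve(WordList *wl, size_t needed)
{
    if (needed <= wl->cap)
        return 0;
    size_t new_cap = wl->cap ? wl->cap * 2 : 16;  /* grow geometrically */
    while (new_cap < needed)
        new_cap *= 2;
    Word *p = realloc(wl->words, new_cap * sizeof(Word));
    if (p == NULL)
        return -1;  /* allocation failed; old buffer still valid */
    wl->words = p;
    wl->cap   = new_cap;
    return 0;
}
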
@@ -302,10 +302,10 @@ up their indices in the vocabulary by calling the \verb|vocab_idx_of(Word* w)|
 function defined in \verb|bridge.pyx|. That function performs a dictionary
 lookup for the word, based on the \verb|config/vocab.txt| file, and returns its
 index on success or \verb|-1| if the word is not known. The Filter will
-assemble the indices in a \verb|long* window| variable until enough words are
+assemble valid indices in a \verb|long* window| variable until enough words are
 received to send the context window to the Batcher. If a word received from the
 Tokenizer is empty, the Filter sets the first element in the context window to
-\verb|-1| and sends the window to the Batcher for termination.
+\verb|-1| and sends the window to a Batcher for termination.
 
 \paragraph{Batcher} A Batcher is a rather simple pure C routine, that first
 assembles the context windows into a batch, simultaneously converting
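
A sketch of the Filter logic this hunk touches: valid indices accumulate in the window, unknown words are dropped, and an empty word turns the window into a termination message. The send_window helper stands in for the project's actual communication primitive, and the Word type is the one sketched earlier:

#define WINDOW_SIZE 5

extern long vocab_idx_of(Word *w);           /* defined in bridge.pyx */
extern void send_window(const long *window); /* placeholder for the real send */

/* Feed one word from the Tokenizer into the current context window. */
static void filter_word(Word *w, long *window, int *filled)
{
    if (w->data[0] == '\0') {   /* empty word: request termination */
        window[0] = -1;
        send_window(window);
        return;
    }
    long idx = vocab_idx_of(w);
    if (idx == -1)
        return;                 /* not in the vocabulary: skip */
    window[(*filled)++] = idx;
    if (*filled == WINDOW_SIZE) {
        send_window(window);    /* full window goes to a Batcher */
        *filled = 0;
    }
}
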
@@ -319,7 +319,7 @@ the Learner sends \verb|-1| when announcing itself, then the Batcher will
 terminate immediately.
 
 \paragraph{Learner} A Learner, implemented in \verb|learner| function in
-\verb|main.c| first creates a TensorFlow neural network object and stores the
+\verb|main.c|, first creates a TensorFlow neural network object and stores the
 network as a \verb|PyObject*|. It also initializes a C \verb|WeightList| struct
 to store the network weights and to serve as a buffer for communication with
 the Dispatcher. It then waits for the Dispatcher to announce a new training
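
A plausible shape for the WeightList buffer used to exchange weights with the Dispatcher, inferred from the description; field names are guesses, not the project's actual definition:

#include <stddef.h>

typedef struct {
    float  *data;     /* flattened weights of one network variable */
    size_t  size;     /* number of floats in data */
} Weights;

typedef struct {
    Weights *entries; /* one entry per trainable variable */
    size_t   n_entries;
} WeightList;
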