I am currently trying to understand the architecture behind the word2vec neural net learning algorithm, for representing words as vectors based on their context.
<
I find the previous answers here a bit overcomplicated - a projection layer is just a simple matrix multiplication, or in the context of NN, a regular/dense/linear layer, without the non-linear activation in the end (sigmoid/tanh/relu/etc.) The idea is to project the (e.g.) 100K-dimensions discrete vector into a 600-dimensions continuous vector (I chose the numbers here randomly, "your mileage may vary"). The exact matrix parameters are learned through the training process.
What happens before/after already depends on the model and context, and is not what OP asks.
(In practice you wouldn't even bother with the matrix multiplication (as you are multiplying a 1-hot vector which has 1 for the word index and 0's everywhere else), and would treat the trained matrix as a lookout table (i.e. the 6257th word in the corpus = the 6257th row/column (depends how you define it) in the projection matrix).)