Basically I'm trying to implement backpropagation in a network. I know the backpropagation algorithm is hard-coded, but I'm trying to make it functional first.
The initial weights you're using in your network are pretty large. Typically you want to initialize weights in a sigmoid-activation neural network in proportion to the inverse of the square root of the fan-in of the unit. So, for units in layer i of the network, choose initial weights uniformly between -n^{-1/2} and +n^{-1/2}, where n is the number of units in layer i-1. (See http://www.willamette.edu/~gorr/classes/cs449/precond.html for more information.)
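Here's a minimal sketch of that initialization scheme using numpy; the function name `init_weights` and the 2-4-1 layer sizes are just illustrative assumptions, not taken from your code:

```python
import numpy as np

def init_weights(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Draw weights uniformly from [-1/sqrt(fan_in), +1/sqrt(fan_in)],
    where fan_in is the number of units feeding into this layer."""
    bound = 1.0 / np.sqrt(fan_in)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

# Example: a 2-4-1 network (2 inputs, 4 hidden units, 1 output)
w_hidden = init_weights(2, 4)
w_output = init_weights(4, 1)
```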
The learning rate parameter that you seem to be using is also fairly large, which can cause your network to "bounce around" during training. I'd experiment with different values for this, on a log scale: 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, ... until you find one that appears to work better.
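A simple way to do that sweep is to loop over the candidate rates and compare the final training error for each. This is only a sketch: `train_network` here is a hypothetical stand-in for whatever your training loop is called, and the epoch count is arbitrary:

```python
# Hypothetical: train_network(...) runs your training loop and
# returns the final training error for the given learning rate.
learning_rates = [0.2, 0.1, 0.05, 0.02, 0.01, 0.005]

best_rate, best_error = None, float("inf")
for rate in learning_rates:
    error = train_network(learning_rate=rate, epochs=5000)
    print(f"learning rate {rate}: final error {error:.6f}")
    if error < best_error:
        best_rate, best_error = rate, error

print(f"best learning rate found: {best_rate}")
```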
You're really only training on two examples (though the network you're using should be able to model these two points easily). You can increase the diversity of your training dataset by adding noise to the existing inputs and expecting the network to produce the correct output. I've found that this helps sometimes when using a squared-error loss (like you're using) and trying to learn a binary boolean operator like XOR, since there are very few input-output pairs in the true function domain to train with.
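One way to do that augmentation, assuming you're working with numpy arrays (the array layout and noise scale below are just assumptions you'd adapt to your setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# The four true XOR input-output pairs.
xor_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
xor_targets = np.array([[0], [1], [1], [0]], dtype=float)

def noisy_batch(n_copies=25, noise_scale=0.05):
    """Tile the XOR examples and add small Gaussian noise to the inputs,
    keeping the clean targets, to build a larger, more diverse training set."""
    inputs = np.tile(xor_inputs, (n_copies, 1))
    targets = np.tile(xor_targets, (n_copies, 1))
    inputs += rng.normal(0.0, noise_scale, size=inputs.shape)
    return inputs, targets

train_x, train_y = noisy_batch()
```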
Also, I'd like to make a general suggestion that might help in your approach to problems like this: add a little bit of code that will allow you to monitor the current error of the network when given a known input-output pair (or entire "validation" dataset).
If you can monitor the error of the network during training, it will help you see more clearly when the network is converging -- the error should decrease steadily as you train the network. If it bounces all around, you'll know that you're either using too large a learning rate or need to otherwise adapt your training dataset. If the error increases, something is wrong with your gradient computations.
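For the monitoring suggestion above, something as small as this is enough; `forward` is a hypothetical name for whatever function maps a batch of inputs through your network:

```python
import numpy as np

def validation_error(forward, inputs, targets):
    """Mean squared error of the network on a known input-output set."""
    predictions = forward(inputs)
    return float(np.mean((predictions - targets) ** 2))

# Inside the training loop, print the error every so often, e.g.:
# for epoch in range(num_epochs):
#     ...one backprop update...
#     if epoch % 100 == 0:
#         print(epoch, validation_error(forward, xor_inputs, xor_targets))
```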