Question
My goal is to develop a DQN agent that chooses its actions based on a certain strategy/policy. I previously worked with OpenAI Gym environments, but now I wanted to create my own RL environment.
At this stage, the agent shall either choose a random action or choose its action based on the predictions of a deep neural network (defined in the class DQN).
So far, I have set up both the neural network model and my environment. The NN receives states as its input; a state is one of 11 possible scalar values ranging from 9.5 to 10.5 (9.5, 9.6, ..., 10.4, 10.5). Since we're dealing with RL, the agent generates its data during the training process. The output should be either 0 or 1, corresponding to the recommended action.
Now I would like to feed my agent a scalar value, e.g. a sample state of x = 10, and let it decide on the action to take (Agent.select_action() is called). At that point I encounter an issue related to the input shape/input dimension.
Here's the code:
1. DQN Class:
import random
from collections import deque

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

class DQN():
    def __init__(self, state_size, action_size, lr):
        self.state_size = state_size
        self.action_size = action_size
        self.lr = lr
        self.model = Sequential()
        self.model.add(Dense(128, input_dim=self.state_size, activation='relu'))
        self.model.add(Dense(128, activation='relu'))
        self.model.add(Dense(self.action_size, activation='linear'))
        self.model.compile(optimizer=Adam(lr=self.lr), loss='mse')
        self.model.summary()

    def model_info(self):
        model_description = ('\n\n---Model_INFO Summary: The model was passed {} state sizes,'
                             '\n {} action sizes and a learning rate of {} -----'
                             .format(self.state_size, self.action_size, self.lr))
        return model_description

    def predict(self, state):
        return self.model.predict(state)

    def train(self, state, q_values):
        self.state = state
        self.q_values = q_values
        return self.model.fit(state, q_values, verbose=0)

    def load_weights(self, path):
        self.model.load_weights(path)

    def save_weights(self, path):
        self.model.save_weights(path)
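As an aside, a quick way to confirm which input shape a compiled Keras model expects is its model.input_shape attribute; a small sketch (the variable name dqn_test is made up here for illustration):

dqn_test = DQN(state_size=11, action_size=2, lr=0.01)
print(dqn_test.model.input_shape)  # -> (None, 11): a batch axis plus 11 features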
2. Agent Class:
NUM_EPISODES = 100
MAX_STEPS_PER_EPISODE = 100
EPSILON = 0.5
EPSILON_DECAY_RATE = 0.001
EPSILON_MIN = 0.01
EPSILON_MAX = 1
DISCOUNT_FACTOR = 0.99
REPLAY_MEMORY_SIZE = 50000
BATCH_SIZE = 50
TRAIN_START = 100
ACTION_SPACE = [0, 1]
STATE_SIZE = 11
LEARNING_RATE = 0.01
class Agent():
    def __init__(self, num_episodes, max_steps_per_episode, epsilon, epsilon_decay_rate,
                 epsilon_min, epsilon_max, discount_factor, replay_memory_size, batch_size,
                 train_start):
        self.num_episodes = NUM_EPISODES
        self.max_steps_per_episode = MAX_STEPS_PER_EPISODE
        self.epsilon = EPSILON
        self.epsilon_decay_rate = EPSILON_DECAY_RATE
        self.epsilon_min = EPSILON_MIN
        self.epsilon_max = EPSILON_MAX
        self.discount_factor = DISCOUNT_FACTOR
        self.replay_memory_size = REPLAY_MEMORY_SIZE
        self.replay_memory = deque(maxlen=self.replay_memory_size)
        self.batch_size = BATCH_SIZE
        self.train_start = TRAIN_START
        self.action_space = ACTION_SPACE
        self.action_size = len(self.action_space)
        self.state_size = STATE_SIZE
        self.learning_rate = LEARNING_RATE
        self.model = DQN(self.state_size, self.action_size, self.learning_rate)

    def select_action(self, state):
        random_value = np.random.rand()
        if random_value < self.epsilon:
            print('random_value = ', random_value)
            chosen_action = random.choice(self.action_space)  # EXPLORATION strategy
            print('Agent randomly chooses the following EXPLORATION action:', chosen_action)
        else:
            print('random_value = {} is greater than epsilon'.format(random_value))
            state = np.float32(state)  # cast the passed state to float32
            prediction_by_model = self.model.predict(state)
            chosen_action = np.argmax(prediction_by_model[0])  # EXPLOITATION strategy
            print('NN chooses the following EXPLOITATION action:', chosen_action)
        return chosen_action
if __name__ == "__main__":
    agent_test = Agent(NUM_EPISODES, MAX_STEPS_PER_EPISODE, EPSILON, EPSILON_DECAY_RATE,
                       EPSILON_MIN, EPSILON_MAX, DISCOUNT_FACTOR, REPLAY_MEMORY_SIZE,
                       BATCH_SIZE, TRAIN_START)

    # Test of select_action function:
    state = 10
    state = np.array(state)
    print(state.shape)
    print(agent_test.select_action(state))
Here's the traceback I get when running this code:
ValueError: Error when checking input: expected dense_209_input to have 2 dimensions, but got array with shape ()
I am unsure why this error about 2 dimensions occurs, since I configured the NN in the DQN class to receive input of only 1 dimension.
I have already read through similar questions on Stack Overflow (Keras Sequential model input shape, Keras model input shape wrong, Keras input explanation: input_shape, units, batch_size, dim, etc.). However, I have not yet been able to adapt the suggestions to my use case.
Do you have any suggestions or hints? Thank you for your help!
Answer 1:
There are several problems here. First, what you call state_size is actually a state space, i.e. the collection of all possible states your agent can be in. The state size is actually 1, since there is only one value you want to pass as a state.
When you define your input layer here:
self.model.add(Dense(128, input_dim=self.state_size, activation='relu'))
you say that your input dimension will be equal to 11, but then when you call predict, you pass it a single number (10).
So you either need to modify input_dim to receive only one number, or you can define your state as a one-hot vector such as state = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), where each position corresponds to one possible state (from 9.5 to 10.5). So when the state is 9.5 your state vector is [1, 0, 0, ..., 0], and so on (see the sketch below).
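A minimal sketch of the one-hot option, assuming the 11 states are spaced 0.1 apart (the helper name encode_state is made up for illustration):

import numpy as np

STATES = np.round(np.arange(9.5, 10.55, 0.1), 1)  # [9.5, 9.6, ..., 10.5], 11 values

def encode_state(value):
    # Map a scalar state in [9.5, 10.5] to a one-hot vector of length 11.
    one_hot = np.zeros(len(STATES), dtype=np.float32)
    one_hot[int(np.argmax(STATES == round(value, 1)))] = 1.0
    return one_hot

state = encode_state(10.0)    # index 5 -> [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
batch = state.reshape(1, -1)  # Keras predict() expects shape (batch_size, 11)
# prediction = agent_test.model.predict(batch)  # works with input_dim=11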
The second problem is that when you define your state, you should put square brackets:
state = np.array([10])
otherwise the array's shape is (), as I am sure you've found out.
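Note also that Keras Dense layers expect a 2-D input of shape (batch_size, input_dim), so even a single sample needs a leading batch axis; with input_dim=1 that means a (1, 1) array. A minimal sketch of the shapes involved:

import numpy as np

print(np.array(10).shape)    # () -> 0-D array, which triggers the ValueError above
print(np.array([10]).shape)  # (1,) -> 1-D array
state = np.array([[10.0]])   # (1, 1) -> one sample with one feature
# with input_dim=1 in the DQN class, this matches what predict() expects:
# prediction = agent_test.model.predict(state)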
Hope it helps! Let me know if you need any clarification.
Source: https://stackoverflow.com/questions/60753759/keras-model-input-shape-dimension-error-for-rl-agent