Description
Hi @gsurma,
Thank you for the wonderful code and the Medium article. I tried implementing your code, but the loss in my model blows up after some time.
These are the hyper-parameters I used:
```python
# initialize environment
env = MainGymWrapper.wrap(gym.make('SpaceInvaders-v0'))
# env = gym.make('SpaceInvaders-v0')

# define hyperparameters
total_step_limit = 5000000
wandb.config.episodes = 1000
GAMMA = 0.99
MEMORY_SIZE = 350000
BATCH_SIZE = 32
TRAINING_FREQUENCY = 4
TARGET_NETWORK_UPDATE_FREQUENCY = 40000
MODEL_PERSISTENCE_UPDATE_FREQUENCY = 10000
REPLAY_START_SIZE = 50000
action_size = env.action_space.n
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.1
EXPLORATION_TEST = 0.02
EXPLORATION_STEPS = 425000
EXPLORATION_DECAY = (EXPLORATION_MAX - EXPLORATION_MIN) / EXPLORATION_STEPS
wandb.config.batch_size = 32
wandb.config.learning_rate = 0.00025
input_shape = (4, 84, 84)
```
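For context, the exploration constants above give a linear epsilon schedule: epsilon anneals from `EXPLORATION_MAX` down to `EXPLORATION_MIN` over `EXPLORATION_STEPS` steps. A minimal sketch of what I assume that schedule looks like (the `epsilon_at` helper is just for illustration, it's not from your code):

```python
EXPLORATION_MAX = 1.0
EXPLORATION_MIN = 0.1
EXPLORATION_STEPS = 425000
EXPLORATION_DECAY = (EXPLORATION_MAX - EXPLORATION_MIN) / EXPLORATION_STEPS

def epsilon_at(step):
    # Linearly anneal epsilon, then hold it at EXPLORATION_MIN
    return max(EXPLORATION_MIN, EXPLORATION_MAX - step * EXPLORATION_DECAY)
```

So exploration is fully random at step 0 and settles at 10% random actions after 425k steps.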
The CNN architecture is the same. I also applied `np.sign` to clip the rewards.
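To be concrete, by reward clipping I mean mapping each raw Atari score to {-1, 0, +1}, as in the original DQN setup (this helper is a sketch of my implementation, not code from your repo):

```python
import numpy as np

def clip_reward(reward):
    # Collapse raw game scores to their sign: -1, 0, or +1
    return float(np.sign(reward))
```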
Could you guide me on what might be going wrong?