Question: How to implement a custom environment in Keras-RL / OpenAI GYM?
Background:
I'm a complete newbie to Reinforcement Learning and have been searching for a framework/module to easily navigate this treacherous terrain. In my search I've come across two modules: keras-rl and OpenAI GYM.
I can get both of them to work on the examples they have shared on their wikis, but those come with predefined environments and have little or no information on how to set up my own custom environment.
I would be really thankful if anyone could point me towards a tutorial, or just explain to me how I can set up a non-game environment.
Solution:
I've been working with these libraries for some time and can share some of my experiments.
Let us first consider, as an example of a custom environment, a text environment: https://github.com/openai/gym/blob/master/gym/envs/toy_text/hotter_colder.py
For a custom environment, a couple of things should be defined (a minimal sketch putting them all together follows this list).
- Constructor `__init__` method
- Action space
- Observation space (see gym/gym/spaces in the openai/gym repository on GitHub for all available gym spaces; a space is a kind of data structure describing valid values)
- `_seed` method (not sure that it's mandatory)
- `_step` method accepting an action as a parameter and returning the observation (state after the action), the reward (for the transition to the new observational state), done (boolean flag), and some optional additional info
- `_reset` method that implements the logic of a fresh start of an episode
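To make the list above concrete, here is a minimal sketch of such an environment. It assumes the older gym API this answer was written against, where the underscored `_step` / `_reset` / `_seed` methods are the override points (newer gym releases use `step` / `reset` / `seed` without the underscore), and the state and reward logic is purely illustrative:

    import gym
    import numpy as np
    from gym import spaces
    from gym.utils import seeding

    class CustomEnv(gym.Env):
        """Toy example: drive a scalar state towards zero."""
        metadata = {'render.modes': ['human', 'ansi']}

        def __init__(self):
            # 3 discrete actions: 0 = decrease, 1 = keep, 2 = increase the state
            self.action_space = spaces.Discrete(3)
            # Observation is a single float in [-1, 1]
            self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(1,))
            self._seed()
            self._reset()

        def _seed(self, seed=None):
            self.np_random, seed = seeding.np_random(seed)
            return [seed]

        def _reset(self):
            # Fresh start of an episode
            self.state = self.np_random.uniform(low=-1.0, high=1.0, size=(1,))
            self.steps = 0
            self.action_taken = None
            return self.state

        def _step(self, action):
            assert self.action_space.contains(action)
            self.action_taken = action          # remembered for the optional _render below
            # Illustrative transition: the action nudges the state up or down
            self.state = np.clip(self.state + (action - 1) * 0.1, -1.0, 1.0)
            self.steps += 1
            reward = -abs(float(self.state[0]))  # closer to zero is better
            done = self.steps >= 20 or abs(float(self.state[0])) < 0.05
            return self.state, reward, done, {}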
Optionally, you can create a `_render` method with something like:
    import sys
    from io import StringIO

    def _render(self, mode='human', **kwargs):
        outfile = StringIO() if mode == 'ansi' else sys.stdout
        outfile.write('State: ' + repr(self.state) +
                      ' Action: ' + repr(self.action_taken) + '\n')
        return outfile
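To sanity-check that the pieces fit together, you can run a random agent against the environment. This snippet assumes the CustomEnv sketch above and an older gym version, where the public reset() / step() / render() methods dispatch to the underscored ones you implemented:

    env = CustomEnv()
    obs = env.reset()                               # calls your _reset()
    done = False
    while not done:
        action = env.action_space.sample()          # random "agent"
        obs, reward, done, info = env.step(action)  # calls your _step()
        print(env.render(mode='ansi').getvalue())   # calls your _render()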
Also, for better code flexibility, you can define the logic of your reward in a `_get_reward` method and the changes to the observation caused by taking an action in a `_take_action` method.
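A sketch of what that split could look like inside the CustomEnv above; `_take_action` and `_get_reward` are just helper names suggested here, not part of the gym API, and the state and reward logic remains illustrative:

    def _take_action(self, action):
        # Apply the action's effect to the internal state
        self.action_taken = action
        self.state = np.clip(self.state + (action - 1) * 0.1, -1.0, 1.0)

    def _get_reward(self):
        # All reward shaping lives in one place
        return -abs(float(self.state[0]))

    def _step(self, action):
        self._take_action(action)
        self.steps += 1
        reward = self._get_reward()
        done = self.steps >= 20 or abs(float(self.state[0])) < 0.05
        return self.state, reward, done, {}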