The aim of this exercise is to demonstrate reinforcement learning within a ‘world’ consisting of a grid-like area populated by a range of pre-programmed (‘dumb’) agents. The dumb agents will interact with the area, and with one another, to some degree. They will also share the area with some ‘intelligent’ agents, whose behaviours will be dictated by what they learn through reinforcement learning. Furthermore, a central machine-learning ‘queen’ agent will learn from the combined experiences of all the intelligent agents and issue certain instructions to the individual agents.
The intelligent agents will form sensible behaviours, stratagems and patterns based on their own experience. They will then interact with the dumb agents in a meaningful way that is conducive to the goals of the intelligent agents. The queen agent will provide some guidance as to how the intelligent agents can best act, including the co-ordination of any multi-agent efforts.
An example of this may be a police force (intelligent agents) managing a civil populace (dumb agents), with a police chief (queen) overseeing the police officers. The police may be assigned goals, such as ‘ensuring positive morale of the populace’, ‘reducing injury to officers’ and ‘protecting civic property’. Through experimenting with behaviours (‘doing nothing’, ‘murdering troublemakers’, etc.) the police force and police chief may learn the best way to govern the city. The results of this experiment may then be translated into suggested ‘real-world’ approaches for dealing with problems.
The example given is not a commitment to a certain scenario, and is used only for illustration.
Key Stage 1: To create a grid-based area, and a basic agent which can navigate the area and has certain states. In the example scenario above, these may include injury and the ability to fight, hunger and the ability to eat, and so forth.
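As a rough illustration of this stage, the sketch below models a bounded grid and one agent that carries a position plus two state values. All names here (`Grid`, `Agent`, `MOVES`, the `injury` and `hunger` fields) and the rule that moving increases hunger are assumptions made for the example, not part of the plan itself.

```python
from dataclasses import dataclass

# Hypothetical names and rules throughout; nothing here is fixed by the plan.
MOVES = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}


@dataclass
class Grid:
    width: int
    height: int

    def in_bounds(self, x: int, y: int) -> bool:
        return 0 <= x < self.width and 0 <= y < self.height


@dataclass
class Agent:
    x: int
    y: int
    injury: int = 0  # 0 means unhurt
    hunger: int = 0  # 0 means fully fed

    def move(self, grid: Grid, direction: str) -> bool:
        """Step one cell; stay put and return False if the move leaves the grid."""
        dx, dy = MOVES[direction]
        nx, ny = self.x + dx, self.y + dy
        if not grid.in_bounds(nx, ny):
            return False
        self.x, self.y = nx, ny
        self.hunger += 1  # assumed rule: moving makes the agent hungrier
        return True

    def eat(self) -> None:
        """Reset hunger; stands in for 'the ability to eat'."""
        self.hunger = 0


grid = Grid(width=5, height=5)
agent = Agent(x=0, y=0)
moved_off = agent.move(grid, "west")  # blocked by the western edge
moved_on = agent.move(grid, "east")   # succeeds, agent is now at (1, 0)
```

Keeping the grid and the agent as separate objects means later stages can place many agents on one grid without changing either class.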
Key Stage 2: To add a set of behaviours to the basic agent created above, and to create many variations of that agent to represent a group of stereotypes within a society. Examples of this may include ‘alpha male’, ‘mother’, ‘child’, ‘shy female’, and so on. At the completion of this stage the agents in the scenario should go about their business autonomously, requiring no human input (although to no goal or purpose).
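One possible way to express such variations is to keep a single fixed decision rule and vary only a few trait values per stereotype. The sketch below does this for two of the stereotypes mentioned; the trait names (`aggression`, `wander`), their values, and the action labels are invented for illustration.

```python
import random
from dataclasses import dataclass


# Hypothetical traits and numbers; the plan does not specify any of them.
@dataclass(frozen=True)
class Stereotype:
    name: str
    aggression: float  # chance of choosing 'fight' when another agent is adjacent
    wander: float      # chance of moving when nobody is adjacent


STEREOTYPES = {
    "alpha male": Stereotype("alpha male", aggression=0.8, wander=0.5),
    "child": Stereotype("child", aggression=0.1, wander=0.9),
}


def choose_action(kind: Stereotype, neighbour_adjacent: bool, rng: random.Random) -> str:
    """Pre-programmed rule: no learning, only fixed probabilities per stereotype."""
    if neighbour_adjacent:
        return "fight" if rng.random() < kind.aggression else "retreat"
    return "move" if rng.random() < kind.wander else "idle"


def run(kind: Stereotype, steps: int, seed: int) -> list:
    """Run one dumb agent with no human input; a neighbour appears on even steps."""
    rng = random.Random(seed)
    return [choose_action(kind, neighbour_adjacent=(t % 2 == 0), rng=rng)
            for t in range(steps)]


alpha_log = run(STEREOTYPES["alpha male"], steps=200, seed=0)
child_log = run(STEREOTYPES["child"], steps=200, seed=0)
```

Because both runs use the same seed and draw one random number per step, any step on which the child fights is also one on which the alpha male fights, so the difference between the two logs comes from the trait values alone.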
Key Stage 3: To introduce an intelligent agent to the scenario, that is capable of interacting with the dumb agents created above, but has no pre-programmed behaviours of its own. This agent will have knowledge of state, experience and goal, and will decide its own actions based on these (and initially, some degree of randomised experimentation).
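The plan does not name a learning algorithm. As one hedged possibility, the sketch below uses tabular Q-learning with epsilon-greedy exploration: the Q-table plays the role of ‘experience’, the reward encodes the ‘goal’, and epsilon supplies the randomised experimentation. The class `QAgent`, the toy corridor environment in `step`, and all parameter values are assumptions for the example.

```python
import random
from collections import defaultdict


# Assumption: Q-learning is only one candidate algorithm for this stage.
class QAgent:
    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def greedy(self, state):
        """Best known action for this state, breaking ties at random."""
        best = max(self.q[(state, a)] for a in self.actions)
        return self.rng.choice([a for a in self.actions if self.q[(state, a)] == best])

    def act(self, state):
        """Explore with probability epsilon, otherwise exploit experience."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(state)

    def learn(self, state, action, reward, next_state, done):
        """Standard Q-learning update towards reward plus discounted best next value."""
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def step(state, action, length=5):
    """Toy corridor of cells 0..length-1; reward 1 for reaching the last cell."""
    nxt = max(0, min(length - 1, state + (1 if action == "right" else -1)))
    done = nxt == length - 1
    return nxt, (1.0 if done else 0.0), done


agent = QAgent(actions=["left", "right"])
for _ in range(200):  # 200 training episodes, each starting at cell 0
    state, done = 0, False
    while not done:
        action = agent.act(state)
        nxt, reward, done = step(state, action)
        agent.learn(state, action, reward, nxt, done)
        state = nxt
```

The agent starts with no preference between actions and ends up favouring ‘right’ in every cell purely from the rewards it has seen, which is the property this stage asks for.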
Key Stage 4: To ‘clone’ the above agent, and have many of the same agents working in tandem, each with their own set of experiences, and as such, each with their own set of behaviours.
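A minimal sketch of the cloning idea, assuming each agent stores its experience in its own table: deep-copying a template gives every clone independent storage, so different experiences lead to different preferences. The `Learner` class, the action names and the reward values are hypothetical stand-ins.

```python
import copy
from collections import defaultdict


class Learner:
    """Hypothetical stand-in for the intelligent agent: one running value per action."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.values = defaultdict(float)

    def learn(self, action, reward):
        # Move the stored value part of the way towards the observed reward.
        self.values[action] += self.alpha * (reward - self.values[action])

    def preferred(self):
        return max(self.values, key=self.values.get)


template = Learner()
officers = [copy.deepcopy(template) for _ in range(2)]  # deep copies share no state

# Each clone sees different (assumed) rewards for the same two actions.
officers[0].learn("patrol", 1.0)
officers[0].learn("negotiate", 0.2)
officers[1].learn("patrol", 0.1)
officers[1].learn("negotiate", 0.9)
```

Using `copy.deepcopy` rather than sharing one object matters here: a shallow or shared table would merge the clones' experiences and remove the individuality this stage is meant to produce.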
Key Stage 5: To introduce a queen agent, who shall act as a central repository of experiences for all the intelligent agents, and as such is able to co-ordinate and manage the other agents – without revoking the base individuality of the intelligent agents resulting from their separate individual experiences.
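One way this could work, sketched under stated assumptions: the queen pools every reported experience and advises the action with the best pooled mean reward, while each agent treats that advice as a bonus on top of its own values rather than as an order. The `Queen` class, the `decide` function, the `trust` parameter and the example states and actions are all invented for illustration.

```python
from collections import defaultdict


class Queen:
    """Hypothetical central repository that pools the agents' experiences."""

    def __init__(self):
        self.log = []  # every (agent_id, state, action, reward) report
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def report(self, agent_id, state, action, reward):
        self.log.append((agent_id, state, action, reward))
        self.total[(state, action)] += reward
        self.count[(state, action)] += 1

    def advise(self, state, actions):
        """Action with the best pooled mean reward, or None if nothing is known."""
        seen = [a for a in actions if self.count[(state, a)] > 0]
        if not seen:
            return None
        return max(seen, key=lambda a: self.total[(state, a)] / self.count[(state, a)])


def decide(own_values, advice, actions, trust=0.5):
    """Agent-side choice: own values plus a fixed bonus for the advised action."""
    def score(action):
        return own_values.get(action, 0.0) + (trust if action == advice else 0.0)
    return max(actions, key=score)


actions = ["patrol", "negotiate"]
queen = Queen()
queen.report(0, "crowd", "patrol", 0.2)
queen.report(1, "crowd", "negotiate", 0.9)
queen.report(2, "crowd", "negotiate", 0.7)

advice = queen.advise("crowd", actions)
follower = decide({"patrol": 0.3, "negotiate": 0.1}, advice, actions)
dissenter = decide({"patrol": 0.9, "negotiate": 0.1}, advice, actions)
```

In this sketch an agent with weak opinions follows the queen, while an agent whose own experience strongly favours another action keeps its choice, which is one reading of co-ordination ‘without revoking the base individuality’ of the agents.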