Creatures Learn Action Selection

Creatures Learn Action Selection in a Multi-Motivational System

Course: CS 263C - Animats Based Modeling (topic in artificial intelligence)
Quarter: Fall 2006
Professor: Michael G. Dyer
Final Report

Course Description

Animats are mobile/sensing animal-like software agents embedded in simulated dynamic environments. Emphasis on modeling: goal-oriented behavior via neurocontrollers, adaptation via reinforcement learning, evolutionary programming. Animat-based tasks include foraging, mate finding, predation, navigation, predator avoidance, cooperative nest construction, communication, and parenting.

Project Overview

In this project I explore how creatures learn action selection in a multi-motivational system by applying Konidaris and Barto's multi-motivational design framework to a scenario of warfare, survival, and mating. Specifically, I constructed a scenario with two warring species: a type of spider with superior fighting skills and a type of ant with superior speed. I intended to show that the framework, specifically its adjustment of motivation priorities, would allow the two species to coexist.

I engineered both species to have the same independent motivations: increasing their population and destroying the other population. I did not want to bias their priorities initially, instead relying on learned adjustments of drive priorities to reach equilibrium. I predicted that with a proper system to update priorities, the spiders would prioritize destroying ants, while the ants would prioritize increasing population.

Learning was challenging in this scenario because a creature cannot learn from its own death, so creatures must learn vicariously by observing others. Furthermore, action selection in a war environment can be crucial to a species’ survival, especially when its motivations conflict.

The project was implemented in C++ and OpenGL.

References

1. G. Konidaris and A. Barto, "An Adaptive Robot Motivational System," From Animals to Animats 9: The Ninth International Conference on the Simulation of Adaptive Behavior (SAB'06), 2006.

Summary of Motivational Framework

Konidaris and Barto [1] introduce a motivational system design framework for agents that have multiple, possibly conflicting motivations. The framework consists of a set of drives, each representing an individual goal of the agent. Each drive has a satiation level and a priority parameter (which maps to a specific priority curve). The reward of a drive is calculated from these factors, which change over time with the agent's current situation (i.e., they are modified through a learning process). The agent’s goal at any given time is to maximize the total reward through its actions.

I was particularly interested in the framework’s robustness, since the drive levels are numerically comparable and the drive priorities are highly precise. Konidaris and Barto’s experiment, which gave a single agent drives for two types of resources, inspired me to create a more complex environment containing two species with identical drives. Rather than traditional “consumption” goals, I gave the species goals whose satiation effects are more challenging to model.

Environment and Creatures

The world is a 2-D Cartesian grid with boundaries.

There are two types of species in the world:

- Spider: 3 times the strength of an ant, giving them an advantage in fighting.
- Ant: 2 times the speed of a spider, giving them an advantage in mating.

These species were picked arbitrarily and are not intended to represent real spiders or ants. The creatures have no physical size in the world (each creature occupies a single location).

Each creature has a perception range of approximately 10% of the board. Within this range it can perceive the following:

- Size of its own population
- Size of the other population
- Births of its own or the other species
- Deaths of its own or the other species

Actions

Both species have only two actions: “fighting” and “mating”. For either action, a creature begins by choosing a random location on the grid as a target and then walks in a straight line towards it. When fighting, if it encounters (within a distance threshold) a member of the other species that is also looking to fight, the two engage in a fight. Similarly, if two creatures of the same species are looking to mate and cross paths, they mate (there is no sex assignment). If no opponent or mate is encountered before reaching the target, a new random target is selected on the grid.
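The movement rule above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `Vec2`, `stepToward`, and `withinThreshold` are hypothetical, and continuous coordinates with a fixed step length per tick are assumptions.

```cpp
#include <cmath>

struct Vec2 { double x; double y; };

// Moves pos one step of length `speed` toward target along a straight line.
// Returns true once the target is reached, at which point the caller would
// pick a new random target (assumed behavior, per the description above).
bool stepToward(Vec2& pos, const Vec2& target, double speed) {
    const double dx = target.x - pos.x;
    const double dy = target.y - pos.y;
    const double dist = std::hypot(dx, dy);
    if (dist <= speed) {  // close enough to arrive during this step
        pos = target;
        return true;
    }
    pos.x += speed * dx / dist;
    pos.y += speed * dy / dist;
    return false;
}

// True when two creatures are close enough to fight or mate.
bool withinThreshold(const Vec2& a, const Vec2& b, double threshold) {
    return std::hypot(a.x - b.x, a.y - b.y) <= threshold;
}
```

Under these assumptions, an ant would simply be given twice the `speed` value of a spider.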

Only two participants are allowed to fight or mate at one time. If an ant engages in a fight with a spider that is at 1/3 health, the spider dies and the ant is unharmed. Otherwise, the ant dies and the spider's health decreases by 1/3. Mating produces another creature instantly.
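The fight rule can be illustrated with a short sketch. The types and the function name `resolveFight` are hypothetical, and storing spider health as an integer number of thirds is an assumption made to avoid floating-point comparisons against 1/3.

```cpp
// Spider health is counted in thirds: 3 = full health, 1 = one third left.
struct Spider {
    int healthThirds = 3;
    bool alive = true;
};

struct Ant {
    bool alive = true;
};

enum class FightResult { SpiderDies, AntDies };

// Resolves one spider-versus-ant fight according to the rule above.
FightResult resolveFight(Spider& spider, Ant& ant) {
    if (spider.healthThirds <= 1) {
        // Spider is at 1/3 health: it dies and the ant is unharmed.
        spider.healthThirds = 0;
        spider.alive = false;
        return FightResult::SpiderDies;
    }
    // Otherwise the ant dies and the spider loses 1/3 of its health.
    ant.alive = false;
    spider.healthThirds -= 1;
    return FightResult::AntDies;
}
```

With this rule a spider at full health survives two fights and dies in its third, which matches the stated 3-to-1 strength ratio.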

A creature always chooses the action (mating or fighting) that generates the maximum reward at the given time. This reward is calculated using the framework’s reward function and depends on the drive levels and observed population sizes.
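The selection step might look like the sketch below. The framework's actual reward function is not reproduced in this report, so `placeholderReward` is a loudly labeled stand-in (priority times remaining need times expected satiation gain), not Konidaris and Barto's formula. The names and the constant `a` passed in are likewise hypothetical.

```cpp
#include <algorithm>

enum class Action { Mate, Fight };

struct Drive {
    double satiation;  // assumed to lie in [0, 1]
    double priority;   // assumed to lie in [0, 1]
};

// PLACEHOLDER reward, not the framework's real function: a drive pays more
// when its priority is high, its satiation is low, and the gain is large.
double placeholderReward(const Drive& d, double expectedGain) {
    const double gain = std::min(expectedGain, 1.0 - d.satiation);
    return d.priority * (1.0 - d.satiation) * gain;
}

// Picks the action with the larger placeholder reward. Expected gains are
// scaled by the observed population sizes, as in the satiation rules.
Action selectAction(const Drive& increasePop, const Drive& destroy,
                    int samePop, int otherPop, double a) {
    const double mateGain = a / std::max(samePop, 1);
    const double fightGain = a / std::max(otherPop, 1);
    const double mateReward = placeholderReward(increasePop, mateGain);
    const double fightReward = placeholderReward(destroy, fightGain);
    return mateReward >= fightReward ? Action::Mate : Action::Fight;
}
```

Ties are broken in favor of mating here, which is an arbitrary choice for the sketch.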

Updating Satiation

I update the satiation levels of the creatures based on the following guidelines:

- When a creature mates, its increase-population satiation rises by a / (size of its own population).
- When a creature witnesses a death of its own species, its increase-population satiation drops by b / (size of its own population).
- When a creature kills another creature, its destroy satiation rises by a / (size of the other population).
- When a creature witnesses a birth in the other species, its destroy satiation drops by b / (size of the other population).

where a and b are constants and a >> b. Satiation adjustments are scaled by population size so that mating and fighting are rewarded most when they have the largest effect on the population sizes.

For instance, a creature that mates when its population size is 3 should be much happier than one that mates when its population size is 100.

Learning Priorities

All creatures learn periodically by making slight adjustments to their priority parameters. The adjustment applied to each drive's priority parameter is:

Increase Population Drive
g (Deaths^2 – Births^2)
Destroy Drive
g (ForeignBirths^2 – Kills^2)

where g is a constant. The quantities above are all observed in the creature’s perception range and are squared to produce larger adjustments for learning periods where more data was sampled.

This project uses a Lamarckian inheritance scheme in which each child inherits the drive priority parameters of its older parent (who would have the priority curves that best reflect the environment).
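A minimal sketch of this inheritance rule, with a hypothetical `Creature` type. Measuring age in simulation ticks and favoring the first parent on a tie are assumptions.

```cpp
struct Creature {
    int age;                     // assumed to be counted in simulation ticks
    double increasePopPriority;  // learned priority parameter
    double destroyPriority;      // learned priority parameter
};

// Creates a newborn that copies the learned priority parameters of the
// older of its two parents. Ties favor the first parent (arbitrary choice).
Creature makeChild(const Creature& p1, const Creature& p2) {
    const Creature& older = (p1.age >= p2.age) ? p1 : p2;
    return Creature{0, older.increasePopPriority, older.destroyPriority};
}
```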

Experiments

Below are three experiments I performed. Note the graphs are based on the average satiations and priority parameters over the entire population.

1. 50/50% action selection

This was my baseline experiment, in which the creatures did not use the drives and chose between mating and fighting with equal probability. It demonstrated the initial advantage the spiders had over the ants. The chart below shows the population sizes of ants and spiders over time. The spiders quickly dominate the ants.

[Figure: population sizes of ants and spiders over time]

2. Fixed Drive Priorities

In this experiment, the drives were used (including priorities), but the priorities were kept fixed (i.e. creatures do not learn). In this case, because of the priority curves, a creature attempts to satisfy the drive with the lowest satiation level. The left figure shows the population sizes over time: the ants survive longer than before, but are still eventually destroyed by the spiders. The right figure shows the satiation levels of the two species. The ants’ satiation level for mating is much higher than their satiation level for destroying, since the ants have a smaller population and are generally “happier” with each mating. However, in response to the lower destroy satiation level, they choose to fight more often than to mate. As it turns out, this is not the best strategy for the ants’ survival.

[Figures: left, population sizes over time; right, satiation levels of the two species]

3. Adjusting Drive Priorities

The final experiment uses periodically adjusted drive priorities (i.e. creatures learn). The left figure shows that after a slight oscillation in population sizes, the ants have learned to survive alongside the spiders. The equilibrium ratio of the two population sizes is roughly proportional to the physical dominance the spiders have over the ants. The middle figure shows satiation level patterns similar to those in the previous experiment. The right figure shows the priority parameters adjusting over time. Note that the ants now prioritize mating much higher than destroying, since it was recognized as a more difficult drive to satiate.

[Figures: left, population sizes over time; middle, satiation levels; right, priority parameters over time]

The movie below displays a sample run of experiment #3. The top area shows statistics on satiation and priority levels (which are values between 0 and 1). The satiation and priority levels are shown separately for spiders and ants. The individual red and white bars correspond to each creature's destroy and mate levels, respectively. The colony average is displayed in decimal form to the right of each row, along with a dashed-line representation. Total population sizes are displayed in the bottom left.

[Movie: sample run of experiment #3]

Conclusion

I wanted to show two things in the experiment: that the spiders would prioritize destroying ants, and that the ants would prioritize increasing their population. The experiment only showed the latter to be true. In fact, the spiders only decreased their priority parameters for the destroy drive, which was unexpected.

Nonetheless, this project shows an application of a multi-motivational system where agents, given a limited amount of perception and intelligence, can work as a "team" in the presence of multiple choices.

Another surprising aspect of this project was how difficult it was to keep the populations from growing exponentially. At first I reasoned that it is generally easier for a mating to occur than a fight, but after several experiments I realized that the most important factor in the growth patterns was the adjustment of the priorities. When adjusting the priority parameters, I squared the numbers of births and deaths for two reasons. First, squaring gives more weight to learning periods with more samples, which are a more reliable representation of the environment. Second, it helped to control the population: as the population grew, the priority parameter adjustments became larger and helped to prevent the system from breaking down. I concluded that the larger the population sizes, the larger the priority adjustments need to be for a stable system.