Visual Attention Controller for an Autonomous Humanoid

Course: CS 269 - Current Topics in Artificial Intelligence: Humanoid Character Simulation
Quarter: Winter 2007
Professor: Petros Faloutsos
Final Report

Course Description

Coverage of a variety of state-of-the-art techniques in humanoid computer animation. Covers the latest literature in human animation and visual modeling. Topics include muscle modeling and simulation, skinning, motor control, motion capture techniques, physics-based techniques and control, facial animation, and hardware accelerated rendering techniques.

Project Overview

This project introduces a new DANCE plug-in simulator that automatically gives an articulated object (skeleton) the appearance of having visual attention in its environment.  Features of the visual attention controller include object tracking and awareness, a simulated field of vision, short-term memory, and idle looking ("daydreaming").  The controller can process moving and static objects, as well as objects of different colors and sizes.

This project presents a simple solution to the problem of character visual attention in an unknown environment.  This is a challenging problem because humanoid characters usually appear robotic, like empty shells unaware of their surroundings.  Visual attention is important for giving the user a sense of realism, showing that the character perceives its environment and conveying the character's thought process.  The problem is particularly difficult because visual attention is an elusive quality that resists precise definition.  Furthermore, a fully realistic simulation would require input from many fields of study, such as computer vision and cognitive science.

This controller narrows the scope of the problem by focusing on a limited set of perceptible properties of the environment, namely object velocity, color, and size.  It is intended to give the user a basic sense of the character's awareness of its environment through its gaze actions.  It works by first registering the visual attention controller with a skeleton articulated object, and then registering the various objects in the scene that the skeleton can perceive.

The project was implemented in C++ and OpenGL, using the DANCE software package.

Movie Quick Links (more details below)

When no objects are present, agent reverts to idle looking mode
Tracking a bouncing ball that eventually stops, then moves again
Ball unnoticed until it reaches field of view
Closer objects appear larger (more salient) despite their smaller size
Larger moving object steals attention away
More attractive colors are viewed before dull colors


References

1.  C. Nothegger, S. Winter, and M. Raubal, "Computation of the Salience of Features", 2003.
2.  Y. Kim, M. van Velsen, and R. Hill Jr., "Modeling Dynamic Perceptual Attention in Complex Virtual Environments", 2005.
3.  C. Peters and C. O'Sullivan, "Bottom-Up Visual Attention for Virtual Human Animation", 2003.


Field of View

The character was given a field of view of 190 degrees horizontal and 90 degrees vertical, with a maximum distance threshold, as recommended in [2].  Within this field of view the character can detect the presence of an object, along with its properties such as size, color, and velocity.  It can detect when a static object begins to move, and when a moving object becomes static.  It can also detect when a static object has changed position to another static state during a period when the object was out of the character's field of view.
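The field-of-view test described above can be sketched as a cone check in the head's local frame.  This is an illustrative reconstruction, not the plug-in's actual code; the `Vec3` type and function names are assumptions.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static double len(const Vec3& v) { return std::sqrt(dot(v, v)); }

// Returns true if 'object' falls inside the character's field of view:
// 190 degrees horizontally, 90 degrees vertically, within maxDistance.
// 'forward' is the head's unit look direction; 'up' is the head's unit up vector.
bool inFieldOfView(const Vec3& head, const Vec3& forward, const Vec3& up,
                   const Vec3& object, double maxDistance) {
    Vec3 toObj = { object.x - head.x, object.y - head.y, object.z - head.z };
    double dist = len(toObj);
    if (dist > maxDistance || dist == 0.0) return false;

    // Decompose the offset into the head's local frame (right = forward x up).
    Vec3 right = { forward.y*up.z - forward.z*up.y,
                   forward.z*up.x - forward.x*up.z,
                   forward.x*up.y - forward.y*up.x };
    double f = dot(toObj, forward);
    double r = dot(toObj, right);
    double u = dot(toObj, up);

    // Horizontal and vertical angles off the line of sight, in degrees.
    double hAngle = std::atan2(r, f) * 180.0 / kPi;
    double vAngle = std::atan2(u, f) * 180.0 / kPi;

    // Half-angles: 95 degrees horizontal, 45 degrees vertical.
    return std::fabs(hAngle) <= 95.0 && std::fabs(vAngle) <= 45.0;
}
```

Note that with a 190-degree horizontal field, an object directly to the character's side (90 degrees off the line of sight) is still visible, which matches the peripheral-vision behavior shown in the movie below.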

The following is a movie demonstrating the field of view of the character.  The blue lines indicate the character's peripheral vision, although the actual viewing volume is much larger than shown.  The character remains in its idle looking state until the ball bounces into the field of vision, attracting the character’s attention.

(click to watch movie)

Appearance of Visual Awareness

Visual attention of the character is simulated through neck rotations, indicating where the character is looking.  The rotations are based on aligning the line of sight to a gaze target (position) in the 3D world.  The neck is a ball joint, supporting rotations along the X, Y, and Z axes.
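Aligning the line of sight to a gaze target amounts to computing yaw and pitch angles toward the target and clamping them to the joint limits.  The sketch below is a minimal illustration under assumed conventions (yaw about the vertical axis, pitch about the horizontal axis, forward along +Z); the actual plug-in works through DANCE's joint representation.

```cpp
#include <algorithm>
#include <cmath>

struct GazeAngles { double yaw, pitch; };  // radians

// Yaw/pitch that aim the line of sight from the head position (hx,hy,hz)
// at the target (tx,ty,tz), clamped to hypothetical neck joint limits.
GazeAngles aimAtTarget(double hx, double hy, double hz,
                       double tx, double ty, double tz,
                       double yawLimit, double pitchLimit) {
    double dx = tx - hx, dy = ty - hy, dz = tz - hz;
    double yaw   = std::atan2(dx, dz);                          // left/right
    double pitch = std::atan2(dy, std::sqrt(dx*dx + dz*dz));    // up/down
    // Clamping to joint limits is what makes the actual gaze direction
    // deviate from the target line, as described above.
    yaw   = std::max(-yawLimit,   std::min(yawLimit,   yaw));
    pitch = std::max(-pitchLimit, std::min(pitchLimit, pitch));
    return { yaw, pitch };
}
```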

The plug-in allows the gaze target to be manually edited by GUI controls, or to be clicked and dragged in the window display.  The plug-in displays the target and a line indicating the actual gaze direction, which can deviate from the target line if the neck rotations are restricted by the joint limits. 

The plug-in supports two action modes: manual transitioning and object tracking.  The former supports manual target edits during simulation, forcing the character to look briefly at a new target before returning to its original action.  The latter is used when the character must follow an object with its gaze; the object may move more quickly than the neck can rotate, requiring smooth transitions.

Idle Looking

When no objects are present in the field of view of the character, or all objects in the field of view have 100% certainty, the character will resort to idle looking or “daydreaming”.

This algorithm combines large and small jumps of the gaze target in randomized intervals of time and distance.  Smooth transitions using acceleration and deceleration are used for a reasonably realistic response.  The algorithm favors larger horizontal head movements over vertical head movements and favors small changes in depth movement.

In detail, the algorithm first computes a new destination target by applying a randomized horizontal and vertical rotation to the current sight vector, projected at a randomly incremented or decremented distance from the current target's depth.  It then interpolates the motion between the current and destination targets, accelerating over the first half of the distance and decelerating over the rest.  During this process the destination target is moved in an arbitrary direction in small increments, producing a non-linear path of rotation.  This non-linear path is more visually appealing, since linear motion appears robotic.

The following movie demonstrates this "daydreaming" state.

(click to watch movie)

Object Salience

When multiple objects are present in the field of vision that have yet to be explored, the algorithm computes a salience score associated with each object and selects the object with the highest salience score to be viewed first.  This salience score is also used to determine how long an object is viewed.

The salience score is the average of the volume salience score and color salience score.

The following equation is used to calculate the volume salience score:
Volume Salience = Object Bounding Box Volume / (Distance From Head * Self Volume) 

This results in a larger salience for objects that are closer to the character’s head.  The self volume factor normalizes this measurement for comparisons with the color salience score.

The color salience score is computed by first converting the RGB value of the color to HSB (Hue, Saturation, and Brightness) and taking the average of the saturation and brightness. 

This was loosely based on the research from [1].

The left movie demonstrates volume salience.  The red sphere is significantly smaller in volume than the orange cube, but since it is closer in distance to the head of the character, it generates a larger salience score and is viewed first.

The right movie demonstrates color salience.  A single red and a single yellow sphere are placed in a chain of dull-colored spheres.  The character first looks at the red and yellow spheres for a length of time, and then quickly glances over the remaining dull items due to their low salience scores.

(click to watch movie)

(click to watch movie)

Short Term Memory

The short term memory stores information about all registered objects in the scene.  This memory is necessary to prevent the character from getting stuck on certain objects or in cycles [3].  

Each object is associated with the following information:

- Certainty value – 0% = completely unknown, 100% = fully explored.
- Last known state – static or moving.
- Last known position in the 3D world.

The certainty value represents how well the character understands the object.  All objects begin with a certainty of 0%.  When an object's certainty value reaches 100%, the character will move on to explore other objects.  The rate at which the certainty of an object is updated depends on the salience score of the object and on whether it is moving or static.  The certainty will decrease when an object state change is detected in the field of view of the character.  Such state changes include a static object starting to move, a moving object stopping, or a static object reappearing at a different static position.

In the movie below, the character tracks the moving ball for a while until it is comfortable with it (object certainty reaches 100%) and resorts to idle looking.  When the ball stops bouncing, the character detects this change of state and quickly glances at the object again.  The ball finally begins to move again and the character detects this change of state as well.

(click to watch movie)

Algorithm Detail

The following is a detailed outline of the algorithm loop:

  • Determine which objects are in the FOV

      • Sort the objects detected in the FOV into a list of moving objects and a list of static objects.

  • Decide which object to look at

      • In the presence of moving objects with certainty less than 100%, select the moving object with the highest salience score and look at it.

      • Otherwise, in the presence of static objects with certainty less than 100%, select the static object with the highest salience score and look at it.

      • Otherwise, resort to idle looking.

  • Update memory

      • If the current line of sight is close to the target object, increase its certainty.

      • The certainty increase rate is determined by taking a base rate, depending on whether the object is moving or static, dividing it by the object's salience score, and clamping the result within threshold limits.  As a result, more salient objects are viewed for longer periods of time, while less salient objects are quickly glanced over.

      • If the certainty of an object reaches 100%, store its state as static/moving and its current world position.

      • If a moving object in the FOV has 100% certainty and was last remembered as static, decrease its certainty significantly.

      • If a static object in the FOV has 100% certainty and was last remembered as moving, decrease its certainty moderately.

      • If a static object in the FOV has 100% certainty and was last remembered at a different world position, decrease its certainty moderately.

  • Rotate neck to target

      • Produce smooth rotation transitions from the current target to the destination target.
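The "decide which object to look at" step above can be sketched as a single selection pass.  The `Percept` record and function name are hypothetical; the real plug-in operates on DANCE articulated objects.

```cpp
#include <vector>

// Hypothetical record for an object currently detected in the FOV.
struct Percept {
    int id;
    bool moving;
    double certainty;   // 0.0 .. 1.0
    double salience;
};

// Returns the id of the object to look at, or -1 to fall back to idle
// looking.  Moving objects with certainty below 100% win over static
// ones; within each class, the highest salience score wins.
int chooseGazeTarget(const std::vector<Percept>& inFov) {
    const Percept* best = nullptr;
    for (const Percept& p : inFov) {
        if (p.certainty >= 1.0) continue;  // fully explored, skip
        if (!best
            || (p.moving && !best->moving)
            || (p.moving == best->moving && p.salience > best->salience))
            best = &p;
    }
    return best ? best->id : -1;
}
```

When every object in view is fully explored (or the list is empty), the function returns -1 and the controller falls back to the idle-looking behavior described earlier.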

Below is a movie that demonstrates key features of this algorithm.  The scenario begins with a small orange ball bouncing into the field of view of the character, attracting its attention.  Soon after, an enormous purple ball falls from the sky and steals the attention away from the smaller ball.  Once the purple ball stops bouncing, the character looks back at the orange ball which continues to move.  Lastly, once the character feels comfortable with the orange ball, it looks back at the purple ball for a while due to its high salience.

(click to watch movie)
