Visual Attention Controller for an Autonomous Humanoid
Course: CS 269 - Current Topics in Artificial Intelligence: Humanoid Character Simulation

Course Description
Coverage of a variety of state-of-the-art techniques in humanoid computer animation. Covers the latest literature in human animation and visual modeling. Topics include muscle modeling and simulation, skinning, motor control, motion capture techniques, physics-based techniques and control, facial animation, and hardware accelerated rendering techniques.
Project Overview
This project introduces a new DANCE plug-in simulator that automatically gives an articulated object (skeleton) the appearance of having visual attention in its environment. Features of the visual attention controller include object tracking and awareness, a simulated field of vision, short term memory, and idle looking ("daydreaming"). The controller can process moving and static objects, as well as objects of different colors and sizes.
This project presents a simple solution to the problem of character visual attention in an unknown environment. The problem is challenging because humanoid characters usually appear robotic, seemingly unaware of their environment, like empty shells. Visual attention is important for giving the user a sense of realism, conveying that the character perceives its environment, and for informing the user of the character's thought process. The problem is also difficult because visual attention is an elusive quality that resists definition; furthermore, a fully realistic simulation would require input from many fields of study, such as computer vision and cognitive science.
This controller narrows the scope of the problem by focusing on a limited set of perceptible properties of the environment, namely object velocity, color, and size. It is intended to give the user a basic sense of the character's awareness of its environment through its gaze actions. It works by first registering the visual attention controller with a skeleton articulated object, and then registering the various objects in the scene that the skeleton can perceive.
The project was implemented in C++ and OpenGL, using the DANCE software package.
References
1. C. Nothegger, S. Winter, M. Raubal, "Computation of the Salience of Features", 2003.
2. Y. Kim, M. van Velsen, and R. Hill Jr., "Modeling Dynamic Perceptual Attention in Complex Virtual Environments", 2005.
3. C. Peters and C. O'Sullivan, "Bottom-Up Visual Attention for Virtual Human Animation", 2003.
Perception
The character was given a field of view of 190 degrees horizontal and 90 degrees vertical, with a maximum distance threshold, as recommended in [2]. Within this field of view the character can detect the presence of an object and its properties, such as size, color, and velocity. It can detect when a static object begins to move and when a moving object becomes static. It can also detect when a static object has changed position to another static state during a period when the object was out of the character's field of view. The following movie demonstrates the field of view of the character. The blue lines indicate the character's peripheral vision, although the actual viewing volume is much larger than shown. The character remains in its idle looking state until the ball bounces into the field of vision, attracting the character's attention.
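The in-view test described above can be sketched as a small predicate. This is an illustrative reconstruction, not the project's actual code: the 190°/90° angles come from the text, while the distance threshold, the function name, and the head-local coordinate convention (gaze down +Z) are assumptions.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

const double kPi = 3.14159265358979323846;
// Half-angles of the 190° horizontal / 90° vertical field of view.
const double kHalfHorizRad = 95.0 * kPi / 180.0;
const double kHalfVertRad  = 45.0 * kPi / 180.0;
const double kMaxDistance  = 50.0;   // assumed threshold value

// Returns true when the object lies inside the viewing volume. The
// position is assumed to be in the head's local frame, with the line
// of sight pointing down +Z.
bool inFieldOfView(const Vec3& p)
{
    double dist = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
    if (dist > kMaxDistance) return false;

    // Horizontal angle off the line of sight (X-Z plane).
    double yaw = std::atan2(p.x, p.z);
    // Vertical angle off the line of sight.
    double pitch = std::atan2(p.y, std::sqrt(p.x * p.x + p.z * p.z));
    return std::fabs(yaw) <= kHalfHorizRad && std::fabs(pitch) <= kHalfVertRad;
}
```

Because the horizontal half-angle exceeds 90°, objects slightly behind the character's shoulders still register, which matches the unusually wide 190° field reported above.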
Appearance of Visual Awareness
Visual attention of the character is simulated through neck rotations, indicating where the character is looking. The rotations are computed by aligning the line of sight with a gaze target (a position in the 3D world). The neck is a ball joint, supporting rotations about the X, Y, and Z axes. The plug-in allows the gaze target to be edited manually through GUI controls, or to be clicked and dragged in the display window. The plug-in displays the target and a line indicating the actual gaze direction, which can deviate from the target line when the neck rotations are restricted by the joint limits. The plug-in supports two action modes: manual transitioning and object tracking. The former supports manual target edits during simulation, forcing the character to look briefly at a new target before returning to its original action. The latter is used when the character must follow an object with its gaze; the object may move more quickly than the neck can rotate, requiring smooth transitions.
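The alignment and joint-limit behavior above can be sketched as follows. The ±60° yaw and ±40° pitch limits are assumptions for illustration, not values from the project; the clamping step is what makes the displayed gaze line deviate from the target line when the target falls outside the joint limits.

```cpp
#include <algorithm>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Neck pose as yaw/pitch angles in degrees (roll omitted for brevity).
struct NeckPose { double yawDeg, pitchDeg; };

const double kYawLimitDeg   = 60.0;   // assumed joint limit
const double kPitchLimitDeg = 40.0;   // assumed joint limit

// Aim the line of sight at a target given in the torso's local frame,
// with the rest gaze pointing down +Z.
NeckPose aimAt(double tx, double ty, double tz)
{
    double yaw   = std::atan2(tx, tz) * 180.0 / kPi;
    double pitch = std::atan2(ty, std::sqrt(tx * tx + tz * tz)) * 180.0 / kPi;
    // Joint limits: the actual gaze direction can deviate from the
    // target direction when the requested rotation exceeds them.
    return { std::clamp(yaw,   -kYawLimitDeg,   kYawLimitDeg),
             std::clamp(pitch, -kPitchLimitDeg, kPitchLimitDeg) };
}
```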
Idle Looking
When no objects are present in the field of view of the character, or all objects in the field of view have 100% certainty, the character resorts to idle looking, or "daydreaming". This algorithm combines large and small jumps of the gaze target at randomized intervals of time and distance. Smooth transitions using acceleration and deceleration produce a reasonably realistic response. The algorithm favors large horizontal head movements over vertical head movements, and favors small changes in depth. In detail, the algorithm first computes a new destination target by applying a randomized horizontal and vertical rotation to the current sight vector, projected at a randomly incremented or decremented distance from the current target's depth. It then interpolates the motion between the current and destination targets, accelerating over the first half of the distance and decelerating over the rest. During this process the destination target is moved in an arbitrary direction in small increments, producing a non-linear path of rotation. This non-linear path is more visually appealing, since linear motion appears robotic. The following movie demonstrates the daydreaming state.
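The two pieces of the idle-looking step described above, randomized destination selection and the accelerate-then-decelerate transition, might be sketched like this. The sampling ranges (±40° horizontal, ±10° vertical, ±0.5 depth) are illustrative stand-ins for the project's actual values; only the bias toward horizontal movement and small depth changes comes from the text.

```cpp
#include <random>

// Gaze state: direction as yaw/pitch plus a depth along the sight line.
struct Gaze { double yawDeg, pitchDeg, depth; };

// Pick the next "daydream" destination: a wide horizontal range, a
// narrow vertical range, and a small random depth change.
Gaze nextIdleTarget(const Gaze& cur, std::mt19937& rng)
{
    std::uniform_real_distribution<double> horiz(-40.0, 40.0); // wide
    std::uniform_real_distribution<double> vert(-10.0, 10.0);  // narrow
    std::uniform_real_distribution<double> depth(-0.5, 0.5);   // small
    return { cur.yawDeg + horiz(rng),
             cur.pitchDeg + vert(rng),
             cur.depth + depth(rng) };
}

// Fraction of the travel completed at normalized time t in [0, 1]:
// accelerate to the halfway point, decelerate over the rest
// (a standard piecewise-quadratic ease, assumed here).
double easeInOut(double t)
{
    if (t < 0.5)
        return 2.0 * t * t;       // accelerating half
    double u = 1.0 - t;
    return 1.0 - 2.0 * u * u;     // decelerating half
}
```

The per-frame perturbation of the destination target (which bends the path away from a straight line) is omitted; it would simply add a small random offset to the destination each frame while the interpolation runs.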
Object Salience
When multiple objects that have yet to be explored are present in the field of vision, the algorithm computes a salience score for each object and selects the object with the highest score to be viewed first. The salience score also determines how long an object is viewed. It is the average of a volume salience score and a color salience score. The volume salience score grows larger for objects that are closer to the character's head, and a self volume factor normalizes the measurement so it can be compared with the color salience score. The color salience score is computed by first converting the RGB value of the object's color to HSB (Hue, Saturation, Brightness) and then taking the average of the saturation and brightness. This was loosely based on the research in [1]. The left movie demonstrates volume salience. The red sphere is significantly smaller in volume than the orange cube, but since it is closer to the head of the character, it generates a larger salience score and is viewed first. The right movie demonstrates color salience. A single red sphere and a single yellow sphere are placed in a chain of dull-colored spheres. The character first looks at the red and yellow spheres for a length of time, and then quickly glances over the remaining dull items due to their low salience scores.
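The color salience computation described above can be sketched directly: a standard RGB-to-HSB conversion, keeping only the saturation and brightness components, then averaging. The volume salience formula itself is not reproduced here, so `combineSalience` shows only the stated average of the two scores; both function names are illustrative.

```cpp
#include <algorithm>

// Color salience: convert RGB (components in [0, 1]) to HSB and
// average the saturation and brightness. Hue is not needed.
double colorSalience(double r, double g, double b)
{
    double maxc = std::max({r, g, b});
    double minc = std::min({r, g, b});
    double brightness = maxc;
    double saturation = (maxc > 0.0) ? (maxc - minc) / maxc : 0.0;
    return 0.5 * (saturation + brightness);
}

// Overall salience is the average of the volume and color scores.
double combineSalience(double volumeScore, double colorScore)
{
    return 0.5 * (volumeScore + colorScore);
}
```

A saturated red (1, 0, 0) scores 1.0 while a mid gray (0.5, 0.5, 0.5) scores only 0.25, which matches the movie: vivid spheres hold the gaze longer than dull ones.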
Short Term Memory
The short term memory stores information about all registered objects in the scene. This memory is necessary to prevent the character from getting stuck on certain objects or in cycles [3]. Each object is associated with stored state, including a certainty value. The certainty value represents how well the character understands the object. All objects begin with a certainty of 0%. When an object's certainty reaches 100%, the character moves on to explore other objects. The rate at which an object's certainty is updated depends on its salience score and on whether it is moving or static. The certainty decreases when an object state change is detected in the character's field of view; such state changes include a static object starting to move, a moving object stopping, or a static object changing to another static position. In the movie below, the character tracks the moving ball until it is comfortable with it (the object's certainty reaches 100%) and resorts to idle looking. When the ball stops bouncing, the character detects this change of state and quickly glances at the object again. The ball finally begins to move again, and the character detects this change of state as well.
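The certainty behavior above might be sketched as follows. The field names, the specific growth rates, and the size of the drop on a state change are all assumptions; only the structure, certainty starting at 0%, growing at a salience- and motion-dependent rate, saturating at 100%, and decreasing on a detected state change, comes from the text.

```cpp
#include <algorithm>

// One short-term-memory entry per registered object (illustrative).
struct MemoryEntry {
    double certainty = 0.0;   // 0..100 percent; all objects start at 0%
    bool   moving    = false;
};

// Certainty grows at a rate that depends on the object's salience and
// on whether it is moving or static (moving objects are assumed to
// take longer to "understand"), saturating at 100%.
void updateCertainty(MemoryEntry& e, double salience, double dt)
{
    double rate = (e.moving ? 10.0 : 20.0) * salience;  // %/sec, assumed
    e.certainty = std::min(100.0, e.certainty + rate * dt);
}

// A detected state change (start/stop moving, or a jump between static
// positions) lowers certainty so the character glances back.
void onStateChange(MemoryEntry& e)
{
    e.certainty = std::max(0.0, e.certainty - 50.0);    // assumed drop
}
```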
Algorithm Detail
Each iteration of the algorithm loop updates the character's perception of the scene and its short term memory, selects the most salient unexplored object in view as the gaze target, and falls back to idle looking when no such object remains.
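The target-selection step at the heart of the loop can be sketched from the behavior described in the preceding sections; this is a reconstruction, not the project's actual code, and all names are illustrative.

```cpp
#include <vector>

// Per-object state visible to the controller each frame (illustrative).
struct SceneObject {
    double salience;    // combined volume + color salience score
    double certainty;   // 0..100 percent, from short term memory
    bool   inView;      // inside the field of view this frame
};

// Returns the index of the object to gaze at this frame, or -1 to
// fall back to idle looking ("daydreaming").
int selectGazeTarget(const std::vector<SceneObject>& objs)
{
    int best = -1;
    for (int i = 0; i < (int)objs.size(); ++i) {
        // Only in-view objects not yet fully understood compete.
        if (!objs[i].inView || objs[i].certainty >= 100.0) continue;
        if (best < 0 || objs[i].salience > objs[best].salience) best = i;
    }
    return best;
}
```

This reproduces the behavior of the final movie: the huge purple ball wins while unexplored, and once every in-view object reaches 100% certainty the function returns -1 and the character daydreams.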
Below is a movie that demonstrates key features of this algorithm. The scenario begins with a small orange ball bouncing into the field of view of the character, attracting its attention. Soon after, an enormous purple ball falls from the sky and steals the attention away from the smaller ball. Once the purple ball stops bouncing, the character looks back at the orange ball, which continues to move. Lastly, once the character feels comfortable with the orange ball, it looks back at the purple ball for a while due to its high salience.