MAT200A 02W

Report by: Purva Gujar




Gesture Recognition

....recognizing the semantics of motion



What is Gesture Recognition?
A gesture can be defined as a motion of the limbs or body made to express or help express thought or to emphasize speech.
Biologists define "gesture" broadly, stating, "the notion of gesture is to embrace all kinds of instances where an individual engages in movements whose communicative intent is paramount, manifest, and openly acknowledged".

Gestures are an important aspect of human interaction, both interpersonally and in the context of man-machine interfaces. There are many facets to the modeling and recognition of human gesture: gestures can be expressed through hands, faces, or the entire body. Gesture recognition is an important skill for robots that work closely with humans. Gestures help to clarify spoken commands and are a compact means of communicating geometric information.

Machine gesture and sign language recognition is, as the name suggests, about recognition of gestures and/or sign language using computers. A primary goal of gesture recognition research is to create a system which can identify specific human gestures and use them to convey information or for device control.

Gesture recognition addresses both static and dynamic gestures. Although static gesture recognition (or static pose recognition) is the easier problem, dynamic gesture recognition is more robust in certain cases. The popular approaches to gesture recognition are: 1. model-based, 2. feature extraction (spatial and temporal features), and 3. computational-model based.

Approaches to gesture recognition can be broadly classified into two types: vision-based and non-vision-based. The hardware used for gathering information about body positioning is thus either image-based (cameras, infrared lights, moving lights, etc.) or device-based (instrumented gloves, styli, position trackers, etc.), although hybrids are beginning to appear. Interfaces that let the human body control a computer through gestures typically rely on hand movements. In the vision-based approach, a camera reads the movements of the human body and communicates the data to a computer, which uses the gestures as input to convey information or control devices. For example, a person clapping his hands together in front of a camera can produce the sound of cymbals being crashed together when the gesture is fed through a computer.
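
As a concrete illustration of the vision-based approach, the sketch below watches a camera with OpenCV and fires an action whenever a large amount of motion (such as a clap) appears between consecutive frames. The threshold values and the trigger callback are illustrative assumptions, not part of any system described in this report.

```python
# Minimal sketch of a vision-based gesture front end: frame differencing
# with OpenCV. Threshold values and the trigger callback are illustrative.
import cv2

def watch_for_motion(trigger, pixel_threshold=50_000):
    cap = cv2.VideoCapture(0)                      # default camera
    ok, previous = cap.read()
    if not ok:
        raise RuntimeError("camera not available")
    previous = cv2.cvtColor(previous, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, previous)         # pixels that changed since the last frame
        _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        if cv2.countNonZero(mask) > pixel_threshold:
            trigger()                              # e.g. play a cymbal-crash sample
        previous = gray
    cap.release()

# watch_for_motion(lambda: print("crash!"))
```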

However, capturing the data is only the first step. The second step, recognizing the sign or gesture once it has been captured, is much more challenging, especially in a continuous stream. Designing a robust system poses further challenges.







What can Gesture Recognition do? Gesture recognition has applications in all walks of life.

Recognizing gestures as input might make computers more accessible for the physically impaired and make interaction more natural for young children. It could also allow more expressive and nuanced communication with a computer. Gesture recognition is already being used for interaction with 3-D immersive environments.

Proposed applications include word processing driven by hand sign language input, games, and other entertainment and educational applications in which hand motion could produce multimedia effects.

Finger pointing is one area of study as a way to select or move objects around. Face tracking, eye motion, and lip reading are also being considered as ways to provide interaction. There have also been multimedia experiments in which the entire human body and its range of motions were used to produce computer effects.

One way gesture recognition is being used is to help the physically impaired to interact with computers, such as interpreting sign language. The technology also has the potential to change the way users interact with computers by eliminating input devices such as joysticks, mice, and keyboards and allowing the unencumbered body to give signals to the computer through gestures such as finger pointing.

In addition to hand and body movement, gesture recognition technology can also be used to read facial expressions, speech (i.e., lip reading), and eye movements.





Gesture Recognition - Corporate Links



Use Your Head UseYourHead is Cybernet Systems Corporation's low-cost, gesture-recognition-based software that allows a user to instruct the computer to take actions simply by making head gestures. It uses a camera as the input device to the computer vision system. Users can assign specific actions to their head movements. They can duck their head left and right in games as they make the same motion in real life. A term paper can be autosaved when they look down at their notes. They may move their field of view in a flight simulator without ever looking away from the monitor. The uses for UseYourHead are limited only by the user's imagination.

CyberGlove This is an award-winning instrumented glove by Immersion 3D.

The CyberGlove® is a fully instrumented glove that provides up to 22 high-accuracy joint-angle measurements. It uses proprietary resistive bend-sensing technology to accurately transform hand and finger motions into real-time digital joint-angle data. The VirtualHand® Studio software converts the data into a graphical hand that mirrors the subtle movements of the physical hand. It is available in 18-sensor and 22-sensor models, and for either hand.
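
To make the idea of joint-angle data concrete, here is a small, hypothetical sketch of how raw resistive bend-sensor readings from a 22-sensor glove might be converted to joint angles with a per-sensor linear calibration. The class names, calibration model, and numbers are assumptions for illustration only; this is not the CyberGlove SDK or its data format.

```python
# Hypothetical illustration of turning raw bend-sensor readings into joint
# angles for a 22-sensor glove. The linear gain/offset calibration is an
# assumption for illustration, not a real vendor API.
from dataclasses import dataclass
from typing import List

NUM_SENSORS = 22

@dataclass
class GloveCalibration:
    gains: List[float]    # degrees per raw unit, one per sensor
    offsets: List[float]  # raw reading at zero degrees, one per sensor

def raw_to_angles(raw: List[int], cal: GloveCalibration) -> List[float]:
    """Map raw resistive-bend readings to joint angles in degrees."""
    assert len(raw) == NUM_SENSORS
    return [(r - o) * g for r, o, g in zip(raw, cal.offsets, cal.gains)]

# Example: a flat calibration and one frame of made-up raw data.
cal = GloveCalibration(gains=[0.1] * NUM_SENSORS, offsets=[512.0] * NUM_SENSORS)
frame = [600] * NUM_SENSORS
print(raw_to_angles(frame, cal))  # about 8.8 degrees of flexion at every joint
```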

The CyberGlove has been used in a wide variety of real-world applications, including digital prototype evaluation, virtual reality biomechanics, and animation. The CyberGlove has become the de facto standard for high-performance hand measurement and real-time motion capture.

Some samples: CyberGlove, Fish Animation, Flight Simulator

Talking Glove Virtual Technologies (VTi) has been developing advanced human-interface technologies for many years and has created a wide range of quality products for vertical market applications. The Talking Glove allows a deaf person using sign language to communicate with a person who does not understand sign language at all.

Interactive Dialog-Box Agent The OSU Motion Recognition Laboratory believes that one of the next major steps in the advancement of computing devices is not only making them faster, but also making them more interactive and responsive to the user. This project involves the design and implementation of a prototype perceptual user interface for a responsive dialog-box agent. The method incorporates real-time computer vision techniques to recognize user acknowledgements from natural head gestures (nod = yes, shake = no). IBM PupilCam technology together with anthropometric head and face measures is first used to detect the location of the user's face. Salient facial features are then tracked to compute the global 2-D motion direction of the head at each frame, which is then interpreted so that the corresponding action can be taken. Using an IBM PupilCam and a Pentium III 1 GHz computer, the vision system runs at 30 Hz.
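
The nod/shake decision described above can be illustrated with a small sketch: given per-frame 2-D head displacements from a face tracker, the dominant axis of accumulated motion decides between a nod (yes) and a shake (no). The thresholds and function names are assumptions, not the OSU project's actual parameters.

```python
# Sketch of a nod/shake classifier: vertical-dominant motion means "yes",
# horizontal-dominant motion means "no". Thresholds are illustrative.
def classify_head_gesture(motions, min_total=20.0):
    """motions: list of (dx, dy) head displacements, one per frame."""
    horizontal = sum(abs(dx) for dx, _ in motions)
    vertical = sum(abs(dy) for _, dy in motions)
    if horizontal + vertical < min_total:
        return None                      # too little movement to decide
    return "yes" if vertical > horizontal else "no"

print(classify_head_gesture([(0.2, 8.0), (0.1, -7.8), (0.3, 8.1)]))   # yes (nod)
print(classify_head_gesture([(9.0, 0.2), (-8.7, 0.1), (8.9, 0.3)]))   # no (shake)
```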

Real Time Animation Modern Cartoons, Inc., based in Venice Beach, California, specializes in real time animation for broadcast television. They are pushing an emerging art form that they call “real-time performance animation” onto American airwaves and into the prime time spotlight. Their success is driven by two goals: a quest to perfect character-based motion-capture techniques, and a dedication to building a creative animation studio. This revolution is taking the form of a unique marriage of technological innovation, entrepreneurial spirit, and fine artistry.

SignStream SignStream™ is a database tool for the analysis of linguistic data captured on video, being developed at Boston University. Although SignStream is being designed specifically for working with data from American Sign Language, the tool may be applied to any kind of language data captured on video. In particular, SignStream is well suited to the study of other signed languages as well as to studies that include the gestural component of spoken languages. One goal of the SignStream project is also to develop a large database of coded American Sign Language utterances.

Interactive Robot Programming This project, carried out at Carnegie Mellon University, demonstrated gesture-based control of a mobile robot using single-handed gestures. Compared to a mouse and keyboard, hand gestures can convey geometric and temporal data with high redundancy. They are rich in vocabulary while being intuitive to users. These features make hand gestures an attractive tool for interacting with robots. The project's research team has developed a gesture spotting and recognition algorithm based on a Hidden Markov Model (HMM).
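
In the spirit of that HMM-based approach, the sketch below trains one Gaussian HMM per gesture on sequences of hand-feature vectors and labels a new sequence with the model that assigns it the highest log-likelihood. It uses the third-party hmmlearn package; the feature representation and model sizes are assumptions, not the CMU project's actual design.

```python
# Sketch of HMM-based gesture classification: one Gaussian HMM per gesture,
# recognition by maximum log-likelihood. Feature choice and sizes are assumed.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_gesture_models(training_data, n_states=4):
    """training_data: dict mapping gesture name -> list of (T_i, D) feature arrays."""
    models = {}
    for name, sequences in training_data.items():
        X = np.vstack(sequences)                   # stack all sequences
        lengths = [len(seq) for seq in sequences]  # remember sequence boundaries
        model = GaussianHMM(n_components=n_states, covariance_type="diag")
        model.fit(X, lengths)
        models[name] = model
    return models

def recognize(models, sequence):
    """Return the gesture whose HMM gives the sequence the highest score."""
    return max(models, key=lambda name: models[name].score(sequence))
```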

Motion Processor Toshiba Corporation has announced the development of a prototype motion processor that supports real-time recognition and display of kinetic, three-dimensional objects on a PC. The motion processor's ability to recognize and detect the movements of hand images points the way to a more natural, gesture-based interface between people and computers.

The new motion processor receives infrared light which is emitted from a light source and reflected by the hand. The dissipation of the light ensures that the intensity of reflected light from the background is too weak to detect. Moreover, by using the reflectance and directional information of the object surface, the motion processor is able to use variations in reflected light intensity to construct a 3D image of the hand. The commercialized image processor will provide support for computers that recognize sign language and advanced interactive applications in multimedia and edutainment. It will also facilitate access to computers for the physically challenged and aged, and provide children with a simpler-to-use interface than the keyboard.
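
A toy version of the intensity-to-depth idea: assuming roughly uniform hand reflectance and an inverse-square falloff of the infrared illumination, brighter pixels are treated as closer and dim background pixels are ignored. This simplified model is an assumption for illustration, not Toshiba's actual motion-processor algorithm.

```python
# Simplified sketch of recovering relative depth from reflected-light intensity,
# assuming uniform reflectance and inverse-square falloff: d ~ sqrt(k / I).
import numpy as np

def intensity_to_depth(intensity, k=1.0, background_cutoff=0.05):
    """intensity: 2-D array of reflected-IR values normalized to [0, 1]."""
    depth = np.full(intensity.shape, np.inf)   # background is treated as far away
    mask = intensity > background_cutoff       # dim background reflections are ignored
    depth[mask] = np.sqrt(k / intensity[mask])
    return depth

frame = np.array([[0.01, 0.20, 0.80],
                  [0.02, 0.25, 0.90]])
print(intensity_to_depth(frame))               # brighter pixels map to smaller depths
```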





Gesture Recognition - Art Links



Surface Drawing By Steven Schkolne

Surface Drawing allows artists and designers to create 3D shapes comfortably and naturally with hand motions and physical tools.

As the hand is moved through space, the trail of its motion is recorded by the computer as a stroke. These strokes appear to float in the air using the head-tracked stereoscopic display environment of the responsive workbench. In analogy to traditional drawing, strokes are combined to make complex organic shapes. This medium facilitates the early phases of creation that are not supported in traditional computer tools. The resulting shapes have an organic, physical quality. This technology is useful for many situations when people create 3D shapes, from architecture and industrial design to fine art and digital movies.
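
The stroke idea can be sketched as a small data structure: sample the tracked hand position over time and group the samples into polyline strokes that can later be rendered or combined into shapes. The names and structure here are illustrative assumptions, not Schkolne's implementation.

```python
# Sketch of stroke capture: each stroke is the polyline traced by the hand
# between a "pen down" and a "pen up" event. Names are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]

@dataclass
class Stroke:
    points: List[Point3D] = field(default_factory=list)

class StrokeRecorder:
    def __init__(self):
        self.strokes: List[Stroke] = []
        self._current: Optional[Stroke] = None

    def pen_down(self):                    # hand starts a new stroke
        self._current = Stroke()
        self.strokes.append(self._current)

    def sample(self, position: Point3D):   # called once per tracker update
        if self._current is not None:
            self._current.points.append(position)

    def pen_up(self):                      # hand finishes the stroke
        self._current = None
```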

Sensorband By Edwin van der Heide, Zbigniew Karkowski, and Atau Tanaka

Sensorband is a trio of musicians using interactive technology. Gestural interfaces - ultrasound, infrared, and bioelectric sensors - become musical instruments. Edwin plays the MIDIconductor, machines worn on his hands that send and receive ultrasound signals, measuring the hands' rotational positions and relative distance. Zbigniew activates his instrument by the movement of his arms in the space around him, cutting through invisible infrared beams mounted on a scaffolding structure. Atau plays the BioMuse, a system that tracks neural signals (EMG), translating electrical signals from the body into digital data. Together, Sensorband creates a live group dynamic, bringing a visceral physical element to interactive technologies.

Trans Plant By Christa Sommerer & Laurent Mignonneau

"Trans Plant" was a interactive computer installation developed for the Tokyo Metropolitan Museum of Photography.

In "Trans Plant" visitors enter a semi-circled room and become part of a virtual jungle which starts to surround them. As the visitor steps forward into the installation space, he sees himself inside a projection screen in front of him. By walking freely and without any devices , he soon discovers that grass is growing, wherever he walks, following each step and movement he does. When stopping and staying still, trees and bushes will grow on the place where he stands. The size, color and shape of these plants depend on the size of the person. Moving the body slightly backwards or forwards the color density changes as well. Each visitor creates different plants, bringing up his/her own personal forest, that is an expression of his personal attention and feeling for the virtual space.

Intro Act By Christa Sommerer & Laurent Mignonneau

"Intro Act" was an interactive computer installation developed for the 95 Biennale de Lyon and in is now in the permanent collection of the Musee d'Art Contemporain in Lyon France. It represents a universe of unexplored abstract organic forms that react and interact with human beings.

In "Intro Act" visitors enter the installation space and immediately will find themselves projected into a virtual space in front of them. As they move their body in the real space, different three dimensional evolution of abstract organic forms are synchronized and linked to the visitors movement and gestures. Like exploring a different universe, the visitor will try to orient himself, finding out which movement will cause which event. Lifting for example his arm, will suddenly lead to extensive development of wild growth explosions out of his hand. Other behavior and movement will lead to destruction of organic forms, whereas other gestures will lead to construction, expansion and differentiation of the virtual species. The visitor continuously sees himself inside this three dimensional world, he defines it, creates it, destroys it and explores it.

Interactive Theater By Naoka Tosa

Interactive movies, in which interaction capabilities are introduced into movies, are considered to be a new type of media that integrates various media, including telecommunications, movies, and video games. In interactive movies, people enter cyberspace and enjoy the development of a story there by interacting with the characters in the story.

Contact Water By Taisuke Murakami

New types of interactive art and entertainment can now be created by mixed reality (MR) technology, which merges the real and virtual worlds in real time. Real world scenes are augmented with virtual world images synthesized by computer graphics. Wearing a see-through head-mounted display (HMD), players can experience the virtual world while maintaining the sense of reality that comes from being able to see objects existing in the real world.

When the players step into the play field, a computer-generated water surface appears on each player's palm. Each player gets aquatic animals, such as dolphins or belugas, from the fountain in the center of the field. The aquatic animals display various gestures following their respective player's orders, which are given via hand movements. For example, leaning the hand to one side makes the water flow, and the aquatic animals start to swim upward. Moving the hand up and down makes them breach the surface. Waving to both sides makes them dance with their upper bodies above the surface. Their whistles and squeaks, specific to the respective aquatic animals, produce a more vivid ambience. Thus hand movements allow players to communicate interactively with the aquatic animals.
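
A minimal sketch of that kind of hand-gesture mapping, assuming hypothetical per-frame measurements of hand tilt, vertical velocity, and waving rate; the thresholds and command names are illustrative, not the installation's actual logic.

```python
# Sketch: map simple hand-pose measurements to commands for the virtual
# aquatic animals. Thresholds and command names are illustrative assumptions.
def hand_to_command(tilt_deg, vertical_velocity, wave_rate_hz):
    if wave_rate_hz > 1.5:
        return "dance"        # waving to both sides
    if abs(vertical_velocity) > 0.3:
        return "breach"       # moving the hand up and down
    if abs(tilt_deg) > 20:
        return "swim"         # leaning the hand makes the water flow
    return "idle"

print(hand_to_command(tilt_deg=30, vertical_velocity=0.0, wave_rate_hz=0.0))  # swim
```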

Knit One, Swim Two By Ingrid Bachmann

This installation, Knit One, Swim Two, was done at the Hearst Art Gallery. In this installation, two 14-foot steel knitting needles can be manipulated by the viewer. On the other side of a gallery wall, viewers see clear cylinders of water moving up and down; these are weights activated by the motion of the knitting needles. The weights' movements are tracked by a sensor system, fed into a computer and transformed into an image on a monitor in front of the knitting needles.

Interactive Arenas Vivid Group creates and develops interactive arenas used primarily in the museum, science centre, and hall-of-fame industry. The Mandala® Gesture Xtreme (GX) system allows for suspension of disbelief by placing participants into computer-generated landscapes, permitting interaction with icons and symbols within that world.

Some products by Vivid Group are described below.

Japan: Recycling Pavilion This interactive exhibit has been installed at a recycling plant outside of Tokyo. During tours, groups of school children have the chance to interact and learn about the importance of recycling, and how to go about separating different types of recyclable material.

The "game" educates participants on 'recyclable materials' versus 'non-recyclable materials', then as the participant grows more confident they may choose a more difficult level, attempting to separate a greater variety of recyclable materials. The students must attempt to deposit the materials in their respective recycle bins, racing against the time clock to accumulate as many recycle points as possible.

Tokyo Water Cycle In this installation, created for the Mount Fuji Museum, visitors learn about the various stages and states of water as it journeys through the environment.

This journey begins in a lake, where heat evaporates a tiny water particle into the air. You ride the water particle into the clouds, where you attempt to combine with other water particles to form a water droplet big enough to fall from the sky. When you fall, you land on the ground, seep into an underground stream, and attempt to navigate your way through the stream past water taps, clogged sections, and other blockages. If you make it through the underground "maze", the stream rejoins the lake, completing the water cycle.