Motion capture

From Wikipedia, the free encyclopedia

Motion capture, motion tracking, or mocap are terms used to describe the process of recording movement and translating that movement onto a digital model. Initially invented in Scotland, it is used in military, entertainment, sports, and medical applications. In filmmaking it refers to recording actions of human actors, and using that information to animate digital character models in 3D animation. When it includes face, fingers and captures subtle expressions, it is often referred to as performance capture.

[edit] The procedure

In motion capture sessions, movements of one or more actors are sampled many times per second, although with most techniques (recent developments from ILM use images for 2D motion capture and project into 3D) motion capture records only the movements of the actor, not his/her visual appearance. This animation data is mapped to a 3D model so that the model performs the same actions as the actor. This is comparable to the older technique of rotoscope where the visual appearance of the motion of an actor was filmed, then the film used as a guide for the frame by frame motion of a hand-drawn animated character.

Camera movements can also be motion captured so that a virtual camera in the scene will pan, tilt, or dolly around the stage driven by a camera operator, while the actor is performing and the motion capture system can capture the camera and props as well as the actor's performance. This allows the computer generated characters, images and sets, to have the same perspective as the video images from the camera. A computer processes the data and displays the movements of the actor, providing the desired camera positions in terms of objects in the set. Retroactively obtaining camera movement data from the captured footage is known as match moving.

[edit] Advantages

Motion capture offers several advantages over traditional computer animation of a 3D model:

More rapid, even real time results can be obtained. In entertainment applications this can reduce the costs of keyframe-based animation. For example: Hand Over

The amount of work does not vary with the complexity or length of the performance to the same degree when using traditional techniques. This allows many tests to be done with different styles or deliveries.

Complex movement and realistic physical interactions such as secondary motions, weight and exchange of forces can be easily recreated in a physically accurate manner.

[edit] Disadvantages

Specific hardware and special programs are required to obtain and process the data.

The cost of the software and equipment, personnel required can be prohibitive for small productions.

The capture system may have specific requirements for the space it is operated in depending on camera field of view.

When problems occur it is easier to reshoot the scene rather than trying to manipulate the data. Only a few systems allow real time viewing of the data to decide if the take needs to be redone.

Capturing motion for quadruped characters can be difficult.

The results are limited to what can be performed within the capture volume without extra editing of the data.

Movement that does not follow the laws of physics generally cannot be captured.

Traditional animation techniques such as added emphasis on anticipation and follow through, secondary motion or manipulating the shape of the character as with squash and stretch animation techniques must be added later.

If the computer model has different proportions from the capture subject artifacts may occur. For example, if a cartoon character has large, over-sized hands, these may intersect strangely with any other body part when the human actor brings them too close to his body.

The real life performance may not translate on to the computer model as expected.

[edit] Applications

Video games often use motion capture to animate athletes, martial artists, and other in-game characters.

Movies use motion capture for CG effects, in some cases replacing traditional cel animation, and for completely computer-generated creatures, such as Jar Jar Binks, Gollum, The Mummy, and King Kong.

Sinbad: Beyond the Veil of Mists was the first movie made primarily with motion capture, although many character animators also worked on the film.

In producing entire feature films with computer animation, the industry is currently split between studios that use motion capture, and studios that do not. Out of the three nominees for the 2006 Academy Award for Best Animated Feature, two of the nominees (Monster House and the winner Happy Feet) used motion capture, and only Pixar's Cars was animated without motion capture. In the ending credits of Pixar's film Ratatouille, a stamp appears labelling the film as "100% Pure Animation -- No Motion Capture!"

Motion capture has begun to be used extensively to produce films which attempt to simulate or approximate the look of live-action cinema, with nearly photorealistic digital character models. The Polar Express used motion capture to allow Tom Hanks to perform as several distinct digital characters (in which he also provided the voices). The 2007 adaptation of the saga Beowulf animated digital characters whose appearances were based in part on the actors who provided their motions and voices. The Walt Disney Company has announced that it will distribute Robert Zemeckis's A Christmas Carol and Tim Burton's Alice in Wonderland using this technique.

Virtual Reality and Augmented Reality allow users to interact with digital content in real-time. This can be useful for training simulations, visual perception tests, or performing a virtual walk-throughs in a 3D environment. Motion capture technology is frequently used in digital puppetry systems to drive computer generated characters in real-time.

Gait analysis is the major application of motion capture in clinical medicine. Markerless motion capture allows clinicians to evaluate human motion, without burdening patients with body suits or tracking devices. This allows patients to move freely within a defined area, using cameras that map the silhouette of the person and fit 3 to 24 perspectives to a model of the person, to track range of motion, gait, and several other biometric factors, and streams this information live into analytical software. Because this system removes the markers, patients, physicians and analysts are able to collect quantifiable data in real-time with less patient inconvenience, although they tend to have centimeter resolution verses the sub millimeter resolution of most marker based systems.

[edit] Methods and systems

Motion tracking or motion capture started as a photogrammetric analysis tool in biomechanics research in the 1970s and 1980s, and expanded into education, training, sports and recently computer animation for cinema and video games as the technology matured. A performer wears markers near each joint to identify the motion by the positions or angles between the markers. Acoustic, inertial, LED, magnetic or reflective markers, or combinations of any of these, are tracked, optimally at least two times the rate of the desired motion, to submillimeter positions.

[edit] Optical systems

Optical systems utilize data captured from image sensors to triangulate the 3D position of a subject between one or more cameras calibrated to provide overlapping projections. Data acquisition is traditionally implemented using special markers attached to an actor; however, more recent systems are able to generate accurate data by tracking surface features identified dynamically for each particular subject. Tracking a large number of performers or expanding the capture area is accomplished by the addition of more cameras. These systems produce data with 3 degrees of freedom for each marker, and rotational information must be inferred from the relative orientation of three or more markers; for instance shoulder, elbow and wrist markers providing the angle of the elbow.

[edit] Optical: Passive Markers

A dancer wearing a suit used in an optical motion capture system

Several markers are placed at specific points on an actor's face during facial optical motion capture

Passive optical system use markers coated with a retroreflective material to reflect light back that is generated near the cameras lens. The camera's threshold can be adjusted so only the bright reflective markers will be sampled, ignoring skin and fabric.

The centroid of the marker is estimated as a position within the 2 dimensional image that is captured. The grayscale value of each pixel can be used to provide sub-pixel accuracy by finding the centroid of the Gaussian.

An object with markers attached at known positions is used to calibrate the cameras and obtain their positions and the lens distortion of each camera is measured. Providing two calibrated cameras see a marker, a 3 dimensional fix can be obtained. Typically a system will consist of around 6 to 24 cameras. Systems of over three hundred cameras exist to try to reduce marker swap. Extra cameras are required for full coverage around the capture subject and multiple subjects.

Vendors have constraint software to reduce problems from marker swapping since all markers appear identical. Unlike active marker systems and magnetic systems, passive systems do not require the user to wear wires or electronic equipment rather hundreds of rubber balls with reflective tape, which needs to be replaced periodically. The markers are usually attached directly to the skin (as in biomechanics), or they are velcroed to a performer wearing a full body spandex/lycra suit designed specifically for motion capture. This type of system can capture large numbers of markers at frame rates as high as 2000fps. The frame rate for a given system is often traded off between resolution and speed so a 4 megapixel system runs at 370 hertz normally but can reduce the resolution to .3 megapixels and then run at 2000 hertz. Typical systems are $100,000 for 4 megapixel 360 hertz systems, and $50,000 for .3 megapixel 120 hertz systems.

[edit] Optical: Active marker

Active optical systems triangulate positions by illuminating one LED at a time very quickly or multiple LEDs with software to identify them by their relative positions, somewhat akin to celestial navigation. Rather than reflecting light back that is generated externally, the markers themselves are powered to emit their own light. Since Inverse Square law provides 1/4 the power at 2 times the distance, this can increase the distances and volume for capture. ILM used active Markers in Van Helsing to allow capture of the Harpies on very large sets. The power to each marker can be provided sequentially in phase with the capture system providing a unique identification of each marker for a given capture frame at a cost to the resultant frame rate. The ability to identify each marker in this manner is useful in realtime applications. The alternative method of identifying markers is to do it algorithmically requiring extra processing of the data.

[edit] Optical: Time modulated active marker

A high-resolution active marker system with 3,600 × 3,600 resolution at 480 hertz providing real time submillimeter positions.

Active marker systems can further be refined by strobing one marker on at a time, or tracking multiple markers over time and modulating the amplitude or pulse width to provide marker ID. 12 megapixel spatial resolution modulated systems show more subtle movements than 4 megapixel optical systems by having both higher spatial and temporal resolution. Directors can see the actors performance in real time, and watch the results on the mocap driven CG character. The unique marker IDs reduce the turnaround, by eliminating marker swapping and providing much cleaner data than other technologies. LEDs with onboard processing and a radio synchronization allow motion capture outdoors in direct sunlight, while capturing at 480 frames per second due to a high speed electronic shutter. Computer processing of modulated IDs allows less hand cleanup or filtered results for lower operational costs. This higher accuracy and resolution requires more processing than passive technologies, but the additional processing is done at the camera to improve resolution via a subpixel or centroid processing, providing both high resolution and high speed. These motion capture systems are typically under $50,000 for an eight camera, 12 megapixel spatial resolution 480 hertz system with one actor.

IR sensors can compute their location when lit by mobile multi-LED emitters, e.g. in a moving car. With Id per marker, these sensor tags can be worn under clothing and tracked at 500 Hz in broad daylight.

[edit] Optical: Semi-passive Imperceptible Marker

One can reverse the traditional approach based on high speed cameras. Systems such as Prakash use inexpensive multi-LED high speed projectors. The specially built multi-LED IR projectors optically encode the space. Instead of retro-reflective or active light emitting diode (LED) markers, the system uses photosensitive marker tags to decode the optical signals. By attaching tags with photo sensors to scene points, the tags can compute not only their own locations of each point, but also their own orientation, incident illumination, and reflectance.

These tracking tags that work in natural lighting conditions and can be imperceptibly embedded in attire or other objects. The system supports an unlimited number of tags in a scene, with each tag uniquely identified to eliminate marker reacquisition issues. Since the system eliminates a high speed camera and the corresponding high-speed image stream, it requires significantly lower data bandwidth. The tags also provide incident illumination data which can be used to match scene lighting when inserting synthetic elements. The technique appears ideal for on-set motion capture or real-time broadcasting of virtual sets but has yet to be proven.

[edit] Optical: Markerless

Emerging techniques and research in computer vision are leading to the rapid development of the markerless approach to motion capture. Markerless systems such as those developed at Stanford, MIT, and Max Planck Institute, do not require subjects to wear special equipment for tracking. Special computer algorithms are designed to allow the system to analyze multiple streams of optical input and identify human forms, breaking them down into constituent parts for tracking. Applications of this technology extend deeply into popular imagination about the future of computing technology.

One commercially available markerless system designed by Organic Motion was featured during Intel CEO Paul Otellini's keynote address at the 2008 Consumer Electronics Show in Las Vegas. During the demonstration, singer Steven Harwell of the band Smash Mouth performed live while tracking data generated in realtime by the markerless system were instantaneously fed into the Unreal Engine 3. By using the motion capture system as an input device, the game engine utilized tracking data to animate a virtual Steve located within a garage scene. The demonstration showcases the adaptability of markerless technology in service industries such as patient care wherein a variety of subjects could benefit from motion analysis without the need for extensive user calibrations. These systems work well with large motions, but tend to have difficulties with fingers, faces, wrist rotations and small motions. Some systems require no special suits, while others prefer special colors to identify limbs.

[edit] Non-optical systems

[edit] Inertial systems

Inertial Motion Capture technology is based on miniature inertial sensors, biomechanical models and sensor fusion algorithms. It's an easy to use and cost-efficient way for full-body human motion capture. The motion data of the inertial sensors (inertial guidance system) is transmitted wirelessly to a PC or laptop, where the full body motion is recorded or viewed. Most inertial systems use Gyroscopes to measure rotations. These rotations are translated to a skeleton in the software. Much like optical markers, the more gyro's the more human like the data. No external cameras, emitters or markers are needed for relative motions. Inertial mocap systems capture the full six degrees of freedom body motion of a human in real-time. Benefits of using Inertial systems include; No solving, freedom from studios as most systems are portable, and large capture areas. These systems are similar to the Wii controllers but much more sensitive and having much greater resolution and update rate. They can accurately measure the direction to the ground to within a degree. The popularity of inertial systems is rising amongst independent game developers, mainly because of the quick and easy set up resulting in a fast pipeline. A range of suits are now available from various manufacturers and base price's range from $25,000 to $80,000 USD.

[edit] Mechanical motion

Mechanical motion capture systems directly track body joint angles and are often referred to as exo-skeleton motion capture systems, due to the way the sensors are attached to the body. A performer attaches the skeletal-like structure to their body and as they move so do the articulated mechanical parts, measuring the performer’s relative motion. Mechanical motion capture systems are real-time, relatively low-cost, free-of-occlusion, and wireless (untethered) systems that have unlimited capture volume. Typically, they are rigid structures of jointed, straight metal or plastic rods linked together with potentiometers that articulate at the joints of the body. These suits tend to be in the $25,000 to $75,000 range plus an external absolution positioning system.

[edit] Magnetic systems

Magnetic systems calculate position and orientation by the relative magnetic flux of three orthogonal coils on both the transmitter and each receiver. The relative intensity of the voltage or current of the three coils allows these systems to calculate both range and orientation by meticulously mapping the tracking volume. JZZ Technologies, Inc uses this hardware in their E-Factor motion capture analysis program. The sensor output is 6DOF, which provides useful results obtained with two-thirds the number of markers required in optical systems; one on upper arm and one on lower arm for elbow position and angle. The markers are not occluded by nonmetallic objects but are susceptible to magnetic and electrical interference from metal objects in the environment, like rebar (steel reinforcing bars in concrete) or wiring, which affect the magnetic field, and electrical sources such as monitors, lights, cables and computers. The sensor response is nonlinear, especially toward edges of the capture area. The wiring from the sensors tends to preclude extreme performance movements. The capture volumes for magnetic systems are dramatically smaller than they are for optical systems. With the magnetic systems, there is a distinction between “AC” and “DC” systems: one uses square pulses, the other uses sine wave pulse.

[edit] Related techniques

[edit] Facial Motion Capture

Most traditional motion capture hardware vendors provide for some type of low resolution facial capture utilizing anywhere from 32 to 300 markers with either an active or passive marker system. All of these solutions are limited by the time it takes to apply the markers, calibrate the positions and process the data. Ultimately the technology also limits their resolution and raw output quality levels.

High Fidelity Facial Motion Capture, also known as Performance Capture, is the next generation of fidelity and is utilized to record the more complex movements in a human face in order to capture higher degrees of emotion. Facial capture is currently arranging itself in several distinct camps. Studio Pendulum's Alter Ego and ArtemDigital's NVisage are utilizing traditional Vicon based motion capture data enhanced with custom software to layer and control blend shapes. Three companies, Image Metrics, Mova and CaptiveMotion have developed proprietary systems for capturing facial performances.

Image Metrics extends the blend shaped based solution by allowing markerless tracking of HD video captured of an artists performance. CaptiveMotion's Embody and Mova's Contour capture the actual topology of an actors face which allows for a greater level of detail without the limits of dealing with a pre-defined blend shape library.

[edit] RF Positioning

RF (radio frequency) positioning systems are becoming more viable as higher frequency RF devices allow greater precision than older RF technologies. The speed of light is 30 centimeters per nanosecond (billionth of a second), so a 10 gigahertz (billion cycles per second) RF signal enables an accuracy of about 3 centimeters. By measuring amplitude to a quarter wavelength, it is possible to improve the resolution down to about 8 mm. To achieve the resolution of optical systems, frequencies of 50 gigahertz or higher are needed, which are almost as line of sight and as easy to block as optical systems. Multipath and reradiation of the signal are likely to cause additional problems, but these technologies will be ideal for tracking larger volumes with reasonable accuracy, since the required resolution at 100 meter distances isn’t likely to be as high.

[edit] Non Traditional Systems

An alternative approach was developed where the actor is given an unlimited walking area through the use of a rotating sphere, similar to a hamster ball, which contains internal sensors recording the angular movements, removing the need for external cameras and other equipment. Even though this technology could potentially lead to much lower costs for mocap, the basic sphere is only capable of recording a single continuous direction. Additional sensors worn on the person would be needed to record anything more.

A studio in the Netherlands is using a 6DOF (Degrees of freedom) motion platform with an integrated omni-directional treadmill with high resolution optical motion capture to achieve the same effect. The captured person can walk in an unlimited area, negotiating different uneven terrains. Applications include medical rehabilitation for balance training, biomechanical research and virtual reality.

[edit] See also

Computer graphics portal

Video games
Animation
Rotoscope
Match moving, also known as “motion tracking”
Gollum, a character in The Lord of the Rings movies who was partially animated with motion capture (the motion capture was used as a reference even when the original data wasn't used.)
List of motion and gesture file formats
Hand Over

[edit] External links

Truebones motion capture resources and software
Gesture Follower project at Ircam - Centre Pompidou, Paris MaxMSP Motion tracking system
Human motion analysis for Motion Capturing Paper of introduction to the beginning of Mocap technologies.
JZZ Technologies, Inc- E-Factor motion capture analysis program
VirtualCinematography.org—Several papers on Universal Capture use in Matrix films
CGW Article on motion capture
Motion Capture Resources
Free Repository of Motion Capture Data
Free Data, membership and information on Motion Capture
The World Records of Motion Capture -- As of 2008-02-09, link has been overrun by spammers. Click on HomePage button on left, then "World Records" link to get to good page.
Free Motion Capture data - Full body and Face MOCAP data
MoCapData — Over 4,000 Free Motion Capture data by VICON
Motek Entertainment — Thousands of optical motion capture sequences in BVH and FBX formats.
Centroid Online — Hundreds of motion capture sequences in BVH and FBX formats.
Markerless MoCap for Interaction — Markerless full-body motion capture for human-computer interaction using a single camera.
Motion Capture Videos
Mocap Service by EDDADESIGN - Full Body & Facial Mocap Service in Barcelona. VICON Technology
Mocap Service by Centroid3D - Full Body & Facial Mocap Service at Pinewood Studios, UK. Motion Analysis Technology
Research-Grade Motion Capture Technology - Wireless motion capture systems by Northern Digital

Research groups addressing the task of Markerless Motion Capture: