WO2009061283A2 - Human motion analysis system and method - Google Patents

Human motion analysis system and method

Info

Publication number
WO2009061283A2
Authority
WO
WIPO (PCT)
Prior art keywords
human
motion
posture
candidates
postures
Prior art date
Application number
PCT/SG2008/000428
Other languages
French (fr)
Other versions
WO2009061283A3 (en
Inventor
Wee Kheng Leow
Ruixuan Wang
Chee-Seng Mark Lee
Dongfeng Xing
Hon Wai Leong
Original Assignee
National University Of Singapore
Priority date
Filing date
Publication date
Application filed by National University Of Singapore filed Critical National University Of Singapore
Publication of WO2009061283A2 publication Critical patent/WO2009061283A2/en
Publication of WO2009061283A3 publication Critical patent/WO2009061283A3/en

Classifications

    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B24/00Electric or electronic controls for exercising apparatus of preceding groups; Controlling or monitoring of exercises, sportive games, training or athletic performances
    • A63B24/0003Analysing the course of a movement or motion sequences during an exercise or trainings sequence, e.g. swing for golf or tennis
    • A63B24/0006Computerised comparison for qualitative assessment of motion sequences or the course of a movement
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/103Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1113Local tracking of patients, e.g. in a hospital or private home
    • A61B5/1114Tracking parts of the body
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/103Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1116Determining posture transitions
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/103Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1121Determining geometric values, e.g. centre of rotation or angular range of movement
    • A61B5/1122Determining geometric values, e.g. centre of rotation or angular range of movement of movement trajectories
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/103Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
    • A61B5/11Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
    • A61B5/1126Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb using a particular sensing technique
    • A61B5/1128Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb using a particular sensing technique using image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/162Segmentation; Edge detection involving graph-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B24/00Electric or electronic controls for exercising apparatus of preceding groups; Controlling or monitoring of exercises, sportive games, training or athletic performances
    • A63B24/0003Analysing the course of a movement or motion sequences during an exercise or trainings sequence, e.g. swing for golf or tennis
    • A63B24/0006Computerised comparison for qualitative assessment of motion sequences or the course of a movement
    • A63B2024/0012Comparing movements or motion sequences with a registered reference
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2102/00Application of clubs, bats, rackets or the like to the sporting activity ; particular sports involving the use of balls and clubs, bats, rackets, or the like
    • A63B2102/32Golf
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2220/00Measuring of physical parameters relating to sporting activity
    • A63B2220/80Special sensors, transducers or devices therefor
    • A63B2220/806Video cameras
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2225/00Miscellaneous features of sport apparatus, devices or equipment
    • A63B2225/20Miscellaneous features of sport apparatus, devices or equipment with means for remote communication, e.g. internet or the like
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63BAPPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2225/00Miscellaneous features of sport apparatus, devices or equipment
    • A63B2225/50Wireless data transmission, e.g. by radio transmitters or telemetry
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • the present invention relates broadly to a method and system for human motion analysis.
  • 2D video-based software such as V1 Pro [V1 Pro, swing analysis software, www.v1golf.com], MotionView [MotionView, golf swing video and motion analysis software, www.golfcoachsystems.com/golf-swing-software.html], MotionCoach [MotionCoach, golf swing analysis system, www.motioncoach.com], and cSwing 2008 [cSwing 2008, video swing analysis program, www.cswing.com]
  • 3D motion capture systems such as Vicon [Vicon 3D motion capture system, www.vicon.com/applications/sports.html] and MAC Eagle [Motion Analysis Corporation, Eagle motion capture system, www.motionanalysis.com] capture 3D human motion by tracking reflective markers attached to the human body and computing the markers' positions in 3D. Using specialized cameras, these systems can capture 3D motion efficiently and accurately. Given the captured 3D motion, it is relatively easy for an add-on algorithm to compute the motion discrepancies of the user's motion relative to domain-specific reference motion. However, they are not equipped with intelligent software for automatic assessment of the motion discrepancies based on domain-specific assessment criteria. They are very expensive systems requiring six or more cameras to function effectively. They are also cumbersome to set up and difficult to use. These are passive marker-based systems.
  • the markers are LEDs that each blink a special code that uniquely identifies the marker.
  • Such systems can resolve some tracking difficulties of passive marker-based system.
  • the LEDs are connected by cables which supply electricity for them to operate.
  • Such a tethered system places restriction on the kind of motion that can be captured. So, it is less versatile than untethered systems.
  • U.S. Patents US 4891748 and US 7095388 disclose systems that capture the video of a person performing a physical skill, project the reference video of an expert scaled according to the body size of the person, and compare the motion in the videos of the person and the expert. In these systems, motion comparison is performed only in 2D videos. They are not accurate enough and may fail due to depth ambiguity in 3D motion and self-occlusions of body parts.
  • Japanese Patent JP 2794018 discloses a golf swing analysis system that attaches a large number of markers onto a golfer's body and club, and captures a sequence of golf swing images using a camera. The system then computes the markers' coordinates in 2D, and compares the coordinate data with those in selected reference data.
  • US Patent Publication US 2006/0211522 discloses a system of colored markers placed on a baseball player's arms, legs, bat, pitching mat, etc. for manually facilitating the proper form of the player's body. No computerized analysis and comparison is described in the patent.
  • US Patent US 5907819 discloses a golf swing analysis system that attaches motion sensors on the golfer's body. The sensors record the player's motion and send the data to a computer through connecting cables to analyze the player's motion.
  • Japanese Patents JP 9-154996, JP 2001-614, and European Patent EP 1688746 describe similar systems that attach sensors to the human body.
  • US Patent Publication 2002/0115046 and US Patent 6567536 disclose similar systems except that a video camera is also used to capture video information which is synchronized with the sensor data. Since the sensors are connected to the computer by cables, the motion type that can be captured is restricted. These are tethered systems, as opposed to the marker-based systems described above, which are untethered.
  • US Patent US 7128675 discloses a method of analyzing a golf swing by attaching two lasers to the putter. A camera connected to a computer records the laser traces and provides feedback to the golfer regarding his putting swing. For the same reason as the methods that use motion sensors, the motion type that can be captured is restricted.
  • a method for human motion analysis comprising the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
  • the method may further comprise the step of determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
  • the method may further comprise the step of visualizing said differences to a user.
  • Extracting the sets of 2D body regions may comprise one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
  • Determining the 3D human posture candidates may comprise the steps of generating a first 3D human posture candidate; and flipping a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
  • Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
  • Selecting the sequence of 3D human postures from the 3D human posture candidates may be based on a least cost path among the 3D human posture candidates for the respective frames.
  • Selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
  • a system for human motion analysis comprising means for capturing one or more 2D input videos of the human motion; means for extracting sets of 2D body regions from respective frames of the 2D input videos; means for determining 3D human posture candidates for each of the extracted sets of 2D body regions; and means for selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
  • the system may further comprise means for determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
  • the system may further comprise means for visualizing said differences to a user.
  • the means for extracting the sets of 2D body regions may perform one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
  • the means for determining the 3D human posture candidates may generate a first 3D human posture candidate; and flips a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
  • Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
  • the means for selecting the sequence of 3D human postures from the 3D human posture candidates may determine a least cost path among the 3D human posture candidates for the respective frames.
  • the means for selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise means for refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
  • a data storage medium having computer code means for instructing a computing device to execute a method for human motion detection, the method comprising the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
  • Figure 1 illustrates the block diagram of a human motion analysis system with the camera connected directly to the computer, according to an example embodiment.
  • Figure 2 shows a schematic top-down view drawing of an example embodiment comprising a camera.
  • Figure 3(a) illustrates the performer standing in a standard posture.
  • Figure 3(b) illustrates a 3D model of the performer standing in a standard posture according to an example embodiment.
  • the dots denote joints, straight lines denote bones connecting the joints, and gray scaled regions denote body parts.
  • Figure 4 illustrates an example of body region extraction.
  • Figure 4(a) shows an input image and
  • Figure 4(b) shows the extracted body regions, according to an example embodiment.
  • Figure 5 illustrates the flipping of the depth orientation of body part b in the z-direction to the new orientation denoted by a dashed line, according to an example embodiment.
  • Figure 6 illustrates an example result of posture candidate estimation according to an example embodiment
  • Figure 6(a) shows the input image with a posture candidate overlaid.
  • Figure 6(b) shows the skeletons of the posture candidates viewed from the front. At this viewing angle, all the posture candidates overlap exactly.
  • Figure 6(c) shows the skeletons of the posture candidates viewed from the side. Each candidate is shown with a different gray scale.
  • Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures (dark gray scale) with the corresponding expert's postures (lighter gray scale) according to an example embodiment.
  • the overlapping postures can be rotated in 3D to show different views.
  • the estimated performer's postures can also be overlapped with the input images for visual verification of their correctness.
  • Figure 8 illustrates an example display of colored-coded regions overlapped with an input image for quick assessment according to an example embodiment.
  • the darker gray scale regions indicate large error, the lighter gray scale regions indicate moderate error, and the transparent regions indicate negligible or no error.
  • Figure 9 illustrates the block diagram of a human motion analysis system with the camera and output device connected to the computer through a computer network, according to an example embodiment.
  • Figure 10 illustrates the block diagram of a human motion analysis system with the wireless input and output device, such as a hand phone or Personal Digital Assistant equipped with a camera, connected to the computer through a wireless network, according to an example embodiment.
  • Figure 11 shows a schematic top-down view of an example embodiment comprising multiple cameras arranged in a straight line.
  • Figure 12 shows a schematic top view of an example embodiment comprising multiple cameras placed around the performer.
  • Figure 13 shows a flow chart illustrating a method for human motion detection according to an example embodiment.
  • Figure 14 shows a schematic drawing of a computer system for implementing the method and system of an example embodiment.
  • the described example embodiments provide a system and method for acquiring a human performer's motion in one or more 2D videos, analyzing the 2D videos, comparing the performer's motion in the 2D videos and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion.
  • the system in example embodiments comprises one or more 2D cameras, a computer, an external storage device, and a display device. In a single camera configuration, the camera acquires the performer's motion in a 2D video and passes the 2D video to a computing device. In a multiple camera configuration, the cameras acquire the performer's motion simultaneously in multiple 2D videos and pass the 2D videos to the computing device.
  • terms such as "calculating", "determining", "generating", "initializing", "outputting", or the like refer to the action and processes of a computer system, or similar electronic device, that manipulates and transforms data represented as physical quantities within the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.
  • the present specification also discloses apparatus for performing the operations of the methods.
  • Such apparatus may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Various general purpose machines may be used with programs in accordance with the teachings herein.
  • the construction of more specialized apparatus to perform the required method steps may be appropriate.
  • the structure of a conventional general purpose computer will appear from the description below.
  • the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code.
  • the computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein.
  • the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.
  • the computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer.
  • the computer readable medium may also include a hard-wired medium such as exemplified in the internet system, or wireless medium such as exemplified in the GSM mobile telephone system.
  • the invention may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the system can also be implemented as a combination of hardware and software modules.
  • ASIC Application Specific Integrated Circuit
  • the 3D difference can include 3D joint angle difference, 3D velocity difference, etc. depending on the requirements of the application domain. 7. Visualizing and highlighting the 3D difference in a display device.
  • An example embodiment of the present invention provides a system and method for acquiring a human performer's motion in one 2D video, analyzing the 2D video, comparing the performer's motion in the 2D video and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion.
  • FIG. 1 shows a schematic block diagram of the example embodiment of a human motion analysis system 100.
  • the system 100 comprises a camera unit 102 coupled to a processing unit, here in the form of a computer 104.
  • the computer 104 is further coupled to an output device 106, and an external storage device 108.
  • the example embodiment comprises a stationary camera 200 with a fixed lens, which is used to acquire a 2D video m' of the performer's 202 entire motion.
  • the 2D video is then analyzed and compared with a 3D reference motion M of an expert.
  • the difference between the performer's 202 2D motion and the expert's 3D reference motion is computed.
  • the system displays and highlights the difference in an output device 106 ( Figure 1).
  • the software component implemented on the computer 104 ( Figure 1) in the example embodiment comprises the following processing stages:
  • the method for Stage 1 in an example embodiment comprises a background subtraction technique described in [C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1998], an iterative graph-cut segmentation technique described in [C. Rother, V. Kolmogorov, and A. Blake. Grabcut - interactive foreground extraction using iterated graph cuts. In Proceedings of ACM SIGGRAPH, 2004], and a skin detection technique described in [MJ. Jones and J.M. Rehg. Statistical color models with application to skin detection. International. Journal of Computer Vision, 46:81-96, 2002]. The contents of those references are hereby incorporated by cross references.
  • Figure 4 illustrates an example result of body region extraction.
  • Figure 4(a) shows an input image
  • Figure 4(b) shows the extracted body region.
  • the lighter gray scale region is extracted by the iterative graph-cut segmentation technique
  • the darker gray scale parts are extracted using skin detection and iterative graph-cut segmentation techniques.
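  • By way of a non-limiting illustration only, the following sketch shows one way the three techniques named above for Stage 1 might be combined using OpenCV; the specific functions (an OpenCV mixture-of-Gaussians background subtractor standing in for the adaptive background mixture model, GrabCut for the iterative graph-cut, and a fixed YCrCb range for skin detection), the thresholds, and the fusion rule are assumptions made here for illustration, not the patented implementation:

```python
import cv2
import numpy as np

# Illustrative only: combines background subtraction, iterative graph-cut
# segmentation (GrabCut) and skin detection, as named for Stage 1.

bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

def extract_body_region(frame):
    # 1. Background subtraction with an adaptive mixture-of-Gaussians model.
    fg_mask = bg_subtractor.apply(frame)
    fg_mask = cv2.medianBlur(fg_mask, 5)

    # 2. Iterative graph-cut (GrabCut) seeded by the foreground mask
    #    (assumes the subtractor found at least some foreground pixels).
    gc_mask = np.where(fg_mask > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame, gc_mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)
    body = np.isin(gc_mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)

    # 3. Skin detection (simple YCrCb range) to recover exposed arms and face.
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127)) // 255

    return cv2.bitwise_or(body, skin)
```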
  • the method for Stage 2 in the example embodiment comprises computing the parameters of a scaled-orthographic camera projection, which include the camera's 3D rotation angles, camera position (c_x, c_y), and scale factor s. It is assumed that the performer's posture at the first image frame of the video is the same as a standard calibration posture (for example, Figure 3).
  • the method comprises the following steps:
  • Projecting a 3D model of the performer at the calibration posture under the default camera parameters and rendering it as a 2D projected body region. This step can be performed using OpenGL [OpenGL, www.opengl.org] in the example embodiment. The content of that reference is hereby incorporated by cross-reference.
  • the 3D model of the performer can be provided in different forms. For example, a template 3D model may be used, that has been generated to function as a generic template for a large cross section of possible performers.
  • a 3D model of an actual performer may first be generated, which will involve an additional pre-processing step for generation of the customized 3D model, as will be appreciated and is understood by a person skilled in the art.
  • PCA principal component analysis
  • Compute the camera position as the difference between the centers scaled by the scale factor, i.e. c_x = (p'_x - p_x)/s and c_y = (p'_y - p_y)/s.
  • the calibration method for Stage 2 in the example embodiment thus derives the camera parameters for the particular human motion analysis system in question. It will be appreciated by a person skilled in the art that the same parameters can later be used for human motion analysis of a different performer, provided that the camera settings remain the same for the different performer. On the other hand, as mentioned above, a customized calibration using a customized 3D model of an actual performer may be performed for each performer if desired, in different embodiments.
  • the method for stage S2 may comprise using other existing algorithms for the camera calibration, such as, for example, the Camera Calibration Toolbox for Matlab [www.vision.caltech.edu/bouguetj/calib_doc/], the contents of which are hereby incorporated by cross-reference.
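  • As a minimal illustrative sketch only (not the exact calibration procedure of the example embodiment), the scale factor and camera position of a scaled-orthographic model might be estimated from the extracted body region and from the 3D model's projection under default camera parameters; the use of region areas and centroids below is an assumption made here for illustration:

```python
import numpy as np

# Illustrative sketch of scaled-orthographic calibration: region_mask is the
# extracted body region at the calibration posture, model_mask is the 3D model
# projected under the default camera parameters (both binary masks, assumed).

def region_stats(mask):
    ys, xs = np.nonzero(mask)
    center = np.array([xs.mean(), ys.mean()])
    return center, float(len(xs))

def calibrate_scaled_orthographic(region_mask, model_mask):
    p_prime, area_region = region_stats(region_mask)   # extracted body region
    p, area_model = region_stats(model_mask)           # projected 3D model

    # Scale factor: ratio of linear extents (square root of the area ratio).
    s = np.sqrt(area_region / area_model)

    # Camera position from the difference between the region centers,
    # i.e. c_x = (p'_x - p_x) / s and c_y = (p'_y - p_y) / s.
    c = (p_prime - p) / s
    return s, c
```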
  • the method for Stage 3 in the example embodiment comprises estimating the approximate temporal correspondence C(t') and the approximate rigid transformation T_t' that best align the posture B_C(t') in the 3D reference motion to the extracted body region S'_t'.
  • each transformation T_t' at time t' can be determined by finding the best match between the extracted body region S'_t' and the 2D projected model body region P(T(B_C(t'))):
  • T_t' = arg min_T d(P(T(B_C(t'))), S'_t'), where the optimal T_t' is computed using a sampling technique.
  • the method of computing the optimal temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d(t', C(t')) denote the difference d(P(T_t'(B_C(t'))), S'_t'), and let D denote an (L' + 1) x (L + 1) correspondence matrix.
  • each matrix element at (t', t) corresponds to the possible frame correspondence between t' and t, and the correspondence cost is d(t', t).
  • a path in D is a sequence of frame correspondences for t' = 0, ..., L'.
  • the least cost path is obtained by tracing back the path from D(L', L) to D(0, 0).
  • the optimal C(t') is given by the least cost path.
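  • By way of a non-limiting illustration, the dynamic program above can be sketched as follows; the cost entries d(t', t) are left abstract (for example, a silhouette mismatch measure), and the DTW-style step pattern used here is an assumption rather than the exact recursion of the example embodiment:

```python
import numpy as np

# Illustrative dynamic-programming alignment: cost[t_in, t_ref] plays the role
# of d(t', t); the monotonic step pattern is a DTW-style assumption.

def temporal_alignment(cost):
    n_in, n_ref = cost.shape
    D = np.full((n_in, n_ref), np.inf)
    D[0, 0] = cost[0, 0]
    for i in range(n_in):
        for j in range(n_ref):
            if i == 0 and j == 0:
                continue
            prev = min(D[i - 1, j] if i > 0 else np.inf,               # repeat reference frame
                       D[i, j - 1] if j > 0 else np.inf,               # skip reference frame
                       D[i - 1, j - 1] if i > 0 and j > 0 else np.inf) # advance both
            D[i, j] = cost[i, j] + prev

    # Trace back the least-cost path from D(L', L) to D(0, 0).
    corr = {}
    i, j = n_in - 1, n_ref - 1
    while True:
        corr[i] = j                     # C(t'): reference frame assigned to input frame t'
        if i == 0 and j == 0:
            break
        moves = []
        if i > 0 and j > 0:
            moves.append((D[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            moves.append((D[i - 1, j], i - 1, j))
        if j > 0:
            moves.append((D[i, j - 1], i, j - 1))
        _, i, j = min(moves)
    return corr

# Example: C = temporal_alignment(np.random.rand(120, 90))
```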
  • the method for Stage 4 in the example embodiment estimates 3D posture candidates that align with the extracted body regions. That is, for each time t', find a set {B'_t',l} of 3D posture candidates whose 2D projected model body regions match the extracted body region S'_t'.
  • the example embodiment uses a nonparametric implementation of the Belief Propagation (BP) technique described in [E.B. Sudderth, A.T. Ihler, W.T. Freeman, and A.S. Willsky. Nonparametric belief propagation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 605-612, 2003] and [M. Isard. Pampas: Real-valued graphical models for computer vision. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2003].
  • BP Belief Propagation
  • the temporally aligned posture in the 3D reference motion obtained in Stage 3 forms the initial estimate for each frame.
  • Project each body part at each pose sample to compute the mean image positions of its joints. Then, starting from the root body part, generate a pose sample for each body part such that the body part at the pose sample is connected to its parent body part, and the projected image positions of its joints match the computed mean positions of its joints.
  • Figure 6 illustrates example posture candidates in Figures 6(b) and (c) generated from an input image in Figure 6(a).
  • In Figure 6(b), the skeletons of the posture candidates are viewed from the front. At this viewing angle, all the posture candidates overlap exactly, given the way in which they are derived, as explained above for the example embodiment.
  • Figure 6(c) shows the different skeletons of the posture candidates viewed from the side, illustrating the differences between the respective posture candidates.
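  • As a non-limiting illustration of why such candidates project identically (cf. Figure 5), the sketch below enumerates depth-flipped posture candidates under a kinematic-chain representation (parent indices and an (n, 3) array of joint positions) assumed purely for illustration; it is not the belief-propagation implementation of the example embodiment:

```python
import numpy as np
from itertools import product

# Illustrative enumeration of depth-flipped posture candidates. Under a
# (scaled-)orthographic projection, negating the z-offset of a bone about its
# parent joint leaves the 2D projection unchanged, so each flippable body part
# contributes a binary depth ambiguity.

def flip_bone_depth(joints, parents, part, sign):
    """Negate (sign = -1) the z-offset of `part` relative to its parent and
    translate its whole subtree rigidly so limb lengths are preserved."""
    joints = joints.copy()
    offset = joints[part] - joints[parents[part]]
    delta = offset * np.array([0.0, 0.0, sign - 1.0])   # zero when sign = +1
    stack = [part]
    while stack:                                         # the part and its descendants
        j = stack.pop()
        joints[j] += delta
        stack.extend(k for k, p in enumerate(parents) if p == j)
    return joints

def generate_candidates(base_joints, parents, flippable_parts):
    """Enumerate candidates whose orthographic projections coincide.
    `flippable_parts` is assumed to exclude the root joint."""
    candidates = []
    for signs in product((1.0, -1.0), repeat=len(flippable_parts)):
        joints = base_joints
        for part, sign in zip(flippable_parts, signs):
            joints = flip_bone_depth(joints, parents, part, sign)
        candidates.append(joints)
    return candidates
```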
  • the method for Stage 5 in the example embodiment comprises refining the estimate of the temporal correspondence C(t') and selecting the best posture candidates B'_t',l that best match the corresponding reference postures B_C(t').
  • the method of computing the optimal refined temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d(t', t, l) denote the difference between posture candidate l at input frame t' and the reference posture at frame t.
  • Let D denote an (L' + 1) x (L + 1) x N correspondence matrix, where N is the maximum number of posture candidates at any time t'.
  • each matrix element at (t', t, l) corresponds to the possible correspondence between t', t, and l, and the correspondence cost is d(t', t, l).
  • the least cost path is obtained by tracing back the path from D(L', L, l(L')) to D(0, 0, l(0)).
  • the optimal C(t') and l(t') are given by the least cost path.
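  • A simplified Viterbi-style sketch of the candidate selection is given below; it assumes the temporal correspondence has already been fixed, and it uses an illustrative per-candidate data cost and a joint-displacement smoothness cost rather than the full three-index dynamic program described above:

```python
import numpy as np

# Illustrative Viterbi-style selection of one posture candidate per frame.
# data_cost[t, l] is an assumed mismatch cost between candidate l at frame t
# and the aligned reference posture; candidates[t][l] is an (n, 3) joint array.

def select_posture_sequence(data_cost, candidates):
    n_frames, n_cand = data_cost.shape
    D = np.full((n_frames, n_cand), np.inf)
    back = np.zeros((n_frames, n_cand), dtype=int)
    D[0] = data_cost[0]

    def smooth_cost(prev_pose, pose):
        # total joint displacement between consecutive frames (assumed smoothness term)
        return float(np.linalg.norm(prev_pose - pose))

    for t in range(1, n_frames):
        for l in range(n_cand):
            costs = [D[t - 1, k] + smooth_cost(candidates[t - 1][k], candidates[t][l])
                     for k in range(n_cand)]
            back[t, l] = int(np.argmin(costs))
            D[t, l] = data_cost[t, l] + min(costs)

    # Trace back the least-cost path to obtain l(t') for every frame.
    path = [int(np.argmin(D[-1]))]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```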
  • the method for Stage 6 in the example embodiment comprises computing the 3D difference between the selected 3D posture candidate B'_t',l(t') and the corresponding 3D reference posture B_C(t') at each time t'.
  • the 3D difference can include 3D joint angle difference, 3D joint velocity difference, etc. depending on the specific coaching requirements of the sports.
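  • By way of a non-limiting illustration, 3D joint angle and joint velocity differences might be computed from aligned joint-position arrays as sketched below; the representation of a posture as an (n_joints, 3) array and the (joint, parent, child) triples are assumptions made here for illustration:

```python
import numpy as np

# Illustrative 3D difference computation on aligned postures represented as
# (n_joints, 3) arrays of joint positions; `triples` lists (joint, parent,
# child) index triples defining the angle measured at each joint (assumed).

def joint_angle(joints, j, parent, child):
    u = joints[parent] - joints[j]
    v = joints[child] - joints[j]
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def joint_angle_differences(performer, reference, triples):
    """Per-joint 3D angle difference (degrees) between two aligned postures."""
    return {j: abs(joint_angle(performer, j, p, c) - joint_angle(reference, j, p, c))
            for (j, p, c) in triples}

def joint_velocity_differences(performer_seq, reference_seq, fps):
    """Per-frame, per-joint 3D velocity difference between aligned sequences
    of shape (n_frames, n_joints, 3), via finite differences."""
    v_perf = np.diff(performer_seq, axis=0) * fps
    v_ref = np.diff(reference_seq, axis=0) * fps
    return np.linalg.norm(v_perf - v_ref, axis=-1)   # shape (n_frames - 1, n_joints)
```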
  • the method for Stage 7 in the example embodiment comprises displaying and highlighting the 3D difference in a display device.
  • An example display of detailed 3D difference is illustrated in Figure 7.
  • Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures e.g. 700 (dark gray scale) with the corresponding expert's postures e.g. 702 (lighter gray scale) according to an example embodiment.
  • the overlapping postures can be rotated in 3D to show different views (compare rows 704 and 706).
  • the estimated performer's postures can also be overlapped with the input images (row 708) for visual verification of their correctness.
  • Figure 8 illustrates an example display of colored-coded regions e.g. 800, 802 overlapped with an input image 804 for quick assessment according to an example embodiment.
  • the darker gray scale regions e.g. 800 indicate large error
  • the lighter gray scale regions e.g. 802 indicate moderate error
  • the transparent regions e.g. 806 indicate negligible or no error.
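  • A possible sketch of such a quick-assessment overlay is given below; the error thresholds and colours are illustrative assumptions only, since the example embodiment merely specifies that large, moderate and negligible errors are rendered distinguishably:

```python
import cv2
import numpy as np

# Illustrative colour-coded overlay: each body-part mask is tinted according
# to its error magnitude; thresholds (in degrees) and colours are assumptions.

def overlay_errors(image, part_masks, part_errors, large=30.0, moderate=15.0):
    """part_masks[name]: binary mask (H, W); part_errors[name]: error value."""
    out = image.copy()
    for name, mask in part_masks.items():
        err = part_errors.get(name, 0.0)
        if err >= large:
            colour = (0, 0, 255)        # large error (red tint)
        elif err >= moderate:
            colour = (0, 255, 255)      # moderate error (yellow tint)
        else:
            continue                    # negligible error: left untinted
        tint = np.zeros_like(out)
        tint[mask > 0] = colour
        out = cv2.addWeighted(out, 1.0, tint, 0.4, 0.0)
    return out
```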
  • the 2D input video is first segmented into the corresponding performer's motion segments.
  • the method of determining the corresponding performer's segment boundary for each reference segment boundary t comprises the following steps:
  • the corresponding performer's segment boundary t* can be determined from the estimated temporal correspondence.
  • the input body region is extracted with the help of colored markers.
  • the appendages carried by the performer e.g., a golf club
  • the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input video acquired in a previous session.
  • the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input videos acquired in previous sessions that best matches the 3D reference motion of the expert.
  • the camera 900 and output device 902 are connected to a computer 904 through a computer network 906, as shown in Figure 9.
  • the computer 904 is coupled to. an external storage device 908 directly in this example.
  • a wireless input and output device 1000 such as a hand phone or Personal Digital Assistant equipped with a camera, is connected to a computer 1002 through a wireless network 1004, as shown in Figure 10.
  • the computer 1002 is coupled to an external storage device 1006 directly in this example.
  • multiple cameras 1101-1103 are arranged along a straight line, as shown in Figure 11. Each camera acquires a portion of the performer's 1104 entire motion when the performer 1104 passes in front of the respective camera. This embodiment also allows the system to acquire high-resolution video of a user whose body motion spans a large arena.
  • multiple cameras 1201-1204 are placed around the performer 1206, as shown in Figure 12. This arrangement allows different cameras to capture the frontal view of the performer 1206 when he faces different cameras.
  • the calibration method for the stage S2 processing, in addition to calibration of each of the individual cameras as described above for the single camera embodiment, further comprises computing the relative positions and orientations between the cameras using an inter-relation algorithm between the cameras, as will be appreciated by a person skilled in the art.
  • inter-relation algorithms are understood in the art, and will not be described in more detail herein. Reference is made, for example, to [R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, McGraw-Hill, 1995] and [R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000] for example algorithms for use in such an embodiment. The contents of those references are hereby incorporated by cross-reference.
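  • As one standard illustration (not taken from the cited references), the relative pose between two calibrated cameras can be derived from their individual extrinsics as sketched below, assuming each camera i maps world points X to camera coordinates via x_i = R_i X + t_i:

```python
import numpy as np

# Illustrative derivation of the relative pose between two calibrated cameras
# from their individual extrinsics (R_i, t_i), assuming x_i = R_i @ X + t_i.

def relative_pose(R1, t1, R2, t2):
    """Return (R12, t12) such that x2 = R12 @ x1 + t12 for any scene point."""
    R12 = R2 @ R1.T
    t12 = t2 - R12 @ t1
    return R12, t12
```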
  • This stage segments the human body in each image frame of the input video.
  • the human body, the arms, and the background are assumed to have different colors so that they can be separated. This assumption is reasonable and easily satisfied, for instance, for a user who wears a short-sleeved colored shirt and stands in front of a background of a different color.
  • the background can be a natural scene which is nonuniform in color.
  • This stage is achieved using a combination of background removal, a graph-cut algorithm and skin color detection. In case the background is uniform, the segmentation algorithm can be simplified.
  • This stage computes the camera's extrinsic parameters, assuming that its intrinsic parameters have already been pre-computed. This stage can be achieved using existing camera calibration algorithms.
  • This stage estimates the approximate temporal correspondence between 3D reference motion and 2D input video.
  • Dynamic Programming technique is used to estimate the temporal correspondence between the input video and the reference motion by matching the 2D projections of 3D postures in the reference motion with the segmented human body in the 2D input video.
  • This stage also estimates the approximate global rotation and translation of the user's body relative to the 3D reference motion.
  • This stage selects the best posture candidates that form smooth motion over time. It also refines the temporal correspondence estimated in Stage 2. This stage is accomplished using Dynamic Programming.
  • the framework of the example embodiments can be applied to analyze various types of motion by adopting appropriate 3D reference motion. It will be appreciated by a person skilled in the art that by adapting the system and method to handle specific application domains, these stages can be refined and optimized to reduce computational costs and improve efficiency.
  • Figure 13 shows a flow chart 1300 illustrating a method for human motion detection according to an example embodiment.
  • one or more 2D input videos of the human motion are captured.
  • sets of 2D body regions are extracted from respective frames of the 2D input videos.
  • 3D human posture candidates are determined for each of the extracted sets of 2D body regions.
  • a sequence of 3D human postures from the 3D human posture candidates for the respective frames is selected as representing the human motion in 3D.
  • the method and system of the example embodiment can be implemented on a computer system 1400, schematically shown in Figure 14. It may be implemented as software, such as a computer program being executed within the computer system 1400, and instructing the computer system 1400 to conduct the method of the example embodiment.
  • the computer system 1400 comprises a computer module 1402, input modules such as a keyboard 1404 and mouse 1406 and a plurality of output devices such as a display 1408, and printer 1410.
  • the computer module 1402 is connected to a computer network 1412 via a suitable transceiver device 1414, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN).
  • LAN Local Area Network
  • WAN Wide Area Network
  • the computer module 1402 in the example includes a processor 1418, a Random Access Memory (RAM) 1420 and a Read Only Memory (ROM) 1422.
  • the computer module 1402 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 1424 to the display 1408, and I/O interface 1426 to the keyboard 1404.
  • I/O Input/Output
  • the components of the computer module 1402 typically communicate via an interconnected bus 1428 and in a manner known to the person skilled in the relevant art.
  • the application program is typically supplied to the user of the computer system 1400 encoded on a data storage medium such as a CD-ROM or flash memory carrier and read utilising a corresponding data storage medium drive of a data storage device 1430.
  • the application program is read and controlled in its execution by the processor 1418.
  • Intermediate storage of program data maybe accomplished using RAM 1420.

Abstract

A method and system for human motion analysis. The method comprises the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.

Description

Human Motion Analysis System and Method
FIELD OF INVENTION
The present invention relates broadly to a method and system for human motion analysis.
BACKGROUND
There are two general types of systems that can be used for motion analysis: 2D video-based software and 3D motion capture systems. 2D video-based software such as V1 Pro [V1 Pro, swing analysis software, www.v1golf.com], MotionView [MotionView, golf swing video and motion analysis software, www.golfcoachsystems.com/golf-swing-software.html], MotionCoach [MotionCoach, golf swing analysis system, www.motioncoach.com], and cSwing 2008 [cSwing 2008, video swing analysis program, www.cswing.com] provide a set of tools for the user to manually assess his performance. Such software is affordable but lacks the intelligence to perform the assessment automatically. The assessment accuracy depends on the user's competence in using the software. Such systems perform assessment only in 2D, which is less accurate than 3D assessment. For example, accuracy may be reduced due to depth ambiguity in 3D motion and self-occlusions of body parts.
3D motion capture systems such as Vicon [Vicon 3D motion capture system, www.vicon.com/applications/sports.html] and MAC Eagle [Motion Analysis Corporation, Eagle motion capture system, www.motionanalysis.com] capture 3D human motion by tracking reflective markers attached to the human body and computing the markers' positions in 3D. Using specialized cameras, these systems can capture 3D motion efficiently and accurately. Given the captured 3D motion, it is relatively easy for an add-on algorithm to compute the motion discrepancies of the user's motion relative to domain-specific reference motion. However, they are not equipped with intelligent software for automatic assessment of the motion discrepancies based on domain-specific assessment criteria. They are very expensive systems requiring six or more cameras to function effectively. They are also cumbersome to set up and difficult to use. These are passive marker-based systems.
There is also available an active marker-based system. In the system, the markers are LEDs that each blink a special code that uniquely identifies the marker. Such systems can resolve some tracking difficulties of passive marker-based system. However, the LEDs are connected by cables which supply electricity for them to operate. Such a tethered system places restriction on the kind of motion that can be captured. So, it is less versatile than untethered systems.
U.S. Patents US 4891748 and US 7095388 disclose systems that capture the video of a person performing a physical skill, project the reference video of an expert scaled according to the body size of the person, and compare the motion in the videos of the person and the expert. In these systems, motion comparison is performed only in 2D videos. They are not accurate enough and may fail due to depth ambiguity in 3D motion and self-occlusions of body parts.
Japanese Patent JP 2794018 discloses a golf swing analysis system that attaches a large number of markers onto a golfer's body and club, and captures a sequence of golf swing images using a camera. The system then computes the markers' coordinates in 2D, and compares the coordinate data with those in selected reference data.
US Patents US 2004/0209698 and US 7097459 disclose systems similar to JP 2794018 except that two or more cameras are used to capture multiple simultaneous image sequences. Therefore, they have the potential to compute 3D coordinates. These are essentially marker-based motion capture systems.
US Patent Publication US 2006/0211522 discloses a system of colored markers placed on a baseball player's arms, legs, bat, pitching mat, etc. for manually facilitating the proper form of the player's body. No computerized analysis and comparison is described in the patent. US Patent US 5907819 discloses a golf swing analysis system that attaches motion sensors on the golfer's body. The sensors record the player's motion and send the data to a computer through connecting cables to analyze the player's motion.
Japanese Patents JP 9-154996, JP 2001-614, and European Patent EP 1688746 describe similar systems that attach sensors to the human body. US Patent Publication 2002/0115046 and US Patent 6567536 disclose similar systems except that a video camera is also used to capture video information which is synchronized with the sensor data. Since the sensors are connected to the computer by cables, the motion type that can be captured is restricted. These are tethered systems, as opposed to the marker-based systems described above, which are untethered.
US Patent US 7128675 discloses a method of analyzing a golf swing by attaching two lasers to the putter. A camera connected to a computer records the laser traces and provides feedback to the golfer regarding his putting swing. For the same reason as the methods that use motion sensors, the motion type that can be captured is restricted.
A need therefore exists to provide a human motion analysis system and method that seek to address at least one of the above-mentioned problems.
SUMMARY
In accordance with a first aspect of the present invention there is provided a method for human motion analysis, the method comprising the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
The method may further comprise the step of determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures. The method may further comprise the step of visualizing said differences to a user.
Extracting the sets of 2D body regions may comprise one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
Determining the 3D human posture candidates may comprise the steps of generating a first 3D human posture candidate; and flipping a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
Selecting the sequence of 3D human postures from the 3D human posture candidates may be based on a least cost path among the 3D human posture candidates for the respective frames.
Selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
In accordance with a second aspect of the present invention there is provided a system for human motion analysis, the system comprising means for capturing one or more 2D input videos of the human motion; means for extracting sets of 2D body regions from respective frames of the 2D input videos; means for determining 3D human posture candidates for each of the extracted sets of 2D body regions; and means for selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
The system may further comprise means for determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
The system may further comprise means for visualizing said differences to a user.
The means for extracting the sets of 2D body regions may perform one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
The means for determining the 3D human posture candidates may generate a first 3D human posture candidate; and flips a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
Generating the first 3D human posture candidate may comprise temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
The means for selecting the sequence of 3D human postures from the 3D human posture candidates may determine a least cost path among the 3D human posture candidates for the respective frames.
The means for selecting the sequence of 3D human postures from the 3D human posture candidates may further comprise means for refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion. In accordance with a third aspect of the present invention there is provided a data storage medium having computer code means for instructing a computing device to execute a method for human motion detection, the method comprising the steps of capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will be better understood and readily apparent to one of ordinary skill in the art from the following written description, by way of example only, and in conjunction with the drawings, in which:
Figure 1 illustrates the block diagram of a human motion analysis system with the camera connected directly to the computer, according to an example embodiment.
Figure 2 shows a schematic top-down view drawing of an example embodiment comprising a camera. Figure 3(a) illustrates the performer standing in a standard posture. Figure 3(b) illustrates a 3D model of the performer standing in a standard posture according to an example embodiment. The dots denote joints, straight lines denote bones connecting the joints, and gray scaled regions denote body parts.
Figure 4 illustrates an example of body region extraction. Figure 4(a) shows an input image and Figure 4(b) shows the extracted body regions, according to an example embodiment.
Figure 5 illustrates the flipping of the depth orientation of body part b in the z-direction to the new orientation denoted by a dashed line, according to an example embodiment. Figure 6 illustrates an example result of posture candidate estimation according to an example embodiment. Figure 6(a) shows the input image with a posture candidate overlaid. Figure 6(b) shows the skeletons of the posture candidates viewed from the front. At this viewing angle, all the posture candidates overlap exactly. Figure 6(c) shows the skeletons of the posture candidates viewed from the side. Each candidate is shown with a different gray scale.
Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures (dark gray scale) with the corresponding expert's postures (lighter gray scale) according to an example embodiment. The overlapping postures can be rotated in 3D to show different views. The estimated performer's postures can also be overlapped with the input images for visual verification of their correctness.
Figure 8 illustrates an example display of colored-coded regions overlapped with an input image for quick assessment according to an example embodiment. The darker gray scale regions indicate large error, the lighter gray scale regions indicate moderate error, and the transparent regions indicate negligible or no error.
Figure 9 illustrates the block diagram of a human motion analysis system with the camera and output device connected to the computer through a computer network, according to an example embodiment.
Figure 10 illustrates the block diagram of a human motion analysis system with the wireless input and output device, such as a hand phone or Personal Digital Assistant equipped with a camera, connected to the computer through a wireless network, according to an example embodiment. Figure 11 shows a schematic top-down view of an example embodiment comprising multiple cameras arranged in a straight line.
Figure 12 shows a schematic top view of an example embodiment comprising multiple cameras placed around the performer.
Figure 13 shows a flow chart illustrating a method for human motion detection according to an example embodiment.
Figure 14 shows a schematic drawing of a computer system for implementing the method and system of an example embodiment.
DETAILED DESCRIPTION
The described example embodiments provide a system and method for acquiring a human performer's motion in one or more 2D videos, analyzing the 2D videos, comparing the performer's motion in the 2D videos and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion. The system in example embodiments comprises one or more 2D cameras, a computer, an external storage device, and a display device. In a single camera configuration, the camera acquires the performer's motion in a 2D video and passes the 2D video to a computing device. In a multiple camera configuration, the cameras acquire the performer's motion simultaneously in multiple 2D videos and pass the 2D videos to the computing device.
Some portions of the description which follows are explicitly or implicitly presented in terms of algorithms and functional or symbolic representations of operations on data within a computer memory. These algorithmic descriptions and functional or symbolic representations are the means used by those skilled in the data processing arts to convey most effectively the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities, such as electrical, magnetic or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated.
Unless specifically stated otherwise, and as apparent from the following, it will be appreciated that throughout the present specification, discussions utilizing terms such as
"calculating", "determining", "generating", "initializing", "outputting", or the like, refer to the action and processes of a computer system,, or similar electronic device, that manipulates and transforms data represented as physical quantities within the the computer system into other data similarly represented as physical quantities within the computer system or other information storage, transmission or display devices.
The present specification also discloses apparatus for performing the operations of the methods. Such apparatus may be specially constructed for the required purposes, or may comprise a general purpose computer or other device selectively activated or reconfigured by a computer program stored in the computer. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose machines may be used with programs in accordance with the teachings herein. Alternatively, the construction of more specialized apparatus to perform the required method steps may be appropriate. The structure of a conventional general purpose computer will appear from the description below.
In addition, the present specification also implicitly discloses a computer program, in that it would be apparent to the person skilled in the art that the individual steps of the method described herein may be put into effect by computer code. The computer program is not intended to be limited to any particular programming language and implementation thereof. It will be appreciated that a variety of programming languages and coding thereof may be used to implement the teachings of the disclosure contained herein. Moreover, the computer program is not intended to be limited to any particular control flow. There are many other variants of the computer program, which can use different control flows without departing from the spirit or scope of the invention.
Furthermore, one or more of the steps of the computer program may be performed in parallel rather than sequentially. Such a computer program may be stored on any computer readable medium. The computer readable medium may include storage devices such as magnetic or optical disks, memory chips, or other storage devices suitable for interfacing with a general purpose computer. The computer readable medium may also include a hard-wired medium such as exemplified in the internet system, or wireless medium such as exemplified in the GSM mobile telephone system.
The computer program when loaded and executed on such a general-purpose computer effectively results in an apparatus that implements the steps of the preferred method.
The invention may also be implemented as hardware modules. More particularly, in the hardware sense, a module is a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist. Those skilled in the art will appreciate that the system can also be implemented as a combination of hardware and software modules.
The motion analysis and comparison is performed in the following stages in an example embodiment:
1. Extracting the performer's body regions in each image frame of the 2D videos.
2. Calibrating the parameters of the cameras.
3. Estimating the temporal correspondence and rigid transformations that best align the postures in a 3D reference motion to the body regions in the image frames.
4. Estimating the 3D posture candidates that produce the human body regions in the image frames, using the results obtained in Stage 3 as the initial estimates.
5. Selecting the 3D posture candidate that best matches the human body region in each time instant of the 2D video and refining the temporal correspondence between the 2D video and the 3D reference motion. In the case of a multiple-camera configuration, the selected 3D posture candidate simultaneously best matches the human body regions in each time instant of the multiple 2D videos.
6. Computing the 3D difference between the selected 3D posture candidates and the corresponding 3D reference postures. The 3D difference can include 3D joint angle difference, 3D velocity difference, etc., depending on the requirements of the application domain.
7. Visualizing and highlighting the 3D difference in a display device.
An example embodiment of the present invention provides a system and method for acquiring a human performer's motion in one 2D video, analyzing the 2D video, comparing the performer's motion in the 2D video and a 3D reference motion of an expert, computing the 3D differences between the performer's motion and the expert's motion, and delivering information regarding the 3D difference to the performer for improving the performer's motion.
Figure 1 shows a schematic block diagram of the example embodiment of a human motion analysis system 100. The system 100 comprises a camera unit 102 coupled to a processing unit, here in the form of a computer 104. The computer 104 is further coupled to an output device 106, and an external storage device 108.
With reference to Figure 2, the example embodiment comprises a stationary camera 200 with a fixed lens, which is used to acquire a 2D video m' of the performer's 202 entire motion. The 2D video is then analyzed and compared with a 3D reference motion M of an expert. The difference between the performer's 202 2D motion and the expert's 3D reference motion is computed. The system displays and highlights the difference in an output device 106 (Figure 1). The software component implemented on the computer 104 (Figure 1) in the example embodiment comprises the following processing stages:
1. Extracting the input body region S'_t' in each image I'_t' at time t' of the video m'.
2. Calibrating the parameters of the camera 200.
3. Estimating the temporal correspondence C(t') between input video time t' and reference time t, and the rigid transformations T_t' that best align the posture B_C(t') in the 3D reference motion to the body region S'_t' in image I'_t' for each time t'.
4. Estimating the 3D posture candidates B'_t',l' that align with the input body regions S'_t' in the input images I'_t', using the results obtained in Stage 3 as the initial estimates.
5. Selecting the 3D posture candidate that best matches the input body region S'_t' for each time t', and refining the temporal correspondence C(t').
6. Computing the 3D difference between the selected 3D posture candidate B'_t' and the corresponding 3D reference posture B_C(t') at each time t'.
7. Visualizing and highlighting the 3D difference in the display device 106 (Figure 1).
The method for Stage 1 in an example embodiment comprises a background subtraction technique described in [C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1998], an iterative graph-cut segmentation technique described in [C. Rother, V. Kolmogorov, and A. Blake. Grabcut - interactive foreground extraction using iterated graph cuts. In Proceedings of ACM SIGGRAPH, 2004], and a skin detection technique described in [M.J. Jones and J.M. Rehg. Statistical color models with application to skin detection. International Journal of Computer Vision, 46:81-96, 2002]. The contents of those references are hereby incorporated by cross-reference. In different example embodiments, for videos with a simple background, the background subtraction technique is sufficient. For videos with a complex background, the iterative graph-cut and skin detection techniques should be used. Figure 4 illustrates an example result of body region extraction. Figure 4(a) shows an input image and Figure 4(b) shows the extracted body region. The lighter gray scale region is extracted by the iterative graph-cut segmentation technique, and the darker gray scale parts are extracted using skin detection and iterative graph-cut segmentation techniques.

The method for Stage 2 in the example embodiment comprises computing the parameters of a scaled-orthographic camera projection, which include the camera's 3D rotation angles (θ_x, θ_y, θ_z), camera position (c_x, c_y), and scale factor s. It is assumed that the performer's posture at the first image frame of the video is the same as a standard calibration posture (for example, Figure 3). The method comprises the following steps:
1. Setting the camera parameters to default values: θ_x = θ_y = θ_z = 0, c_x = c_y = 0, s = 1.
2. Projecting a 3D model of the performer at the calibration posture under the default camera parameters and rendering it as a 2D projected model body region. This step can be performed using OpenGL [OpenGL, www.opengl.org] in the example embodiment. The content of that reference is hereby incorporated by cross-reference. It is noted that in different example embodiments, the 3D model of the performer can be provided in different forms. For example, a template 3D model may be used that has been generated to function as a generic template for a large cross-section of possible performers. In another embodiment, a 3D model of an actual performer may first be generated, which will involve an additional pre-processing step for generation of the customized 3D model, as will be appreciated by a person skilled in the art.
3. Computing the principal direction and the principal length h of the 2D projected model body region by applying principal component analysis (PCA) on the pixel positions in the projected model body region. The principal direction is the first eigenvector computed by PCA, and the principal length is the maximum length of the model body region along the principal direction.
4. Computing the principal direction and the principal length h' of the extracted captured body region in the first image frame of the video in a similar way.
5. Computing the camera scale s = h' / h.
6. Computing the camera position (c_x, c_y). Compute the center (p'_x, p'_y) of the extracted body region and the center (p_x, p_y) of the 2D projected model body region, and compute the camera position as the scaled difference between the centers, i.e. c_x = (p'_x - p_x) / s and c_y = (p'_y - p_y) / s.
7. Computing the camera rotation angle θ_z about the Z-axis as the angular difference between the principal directions of the extracted body region and the 2D projected model body region. Camera rotation angles θ_x and θ_y are omitted.
The calibration method for stage 2 in the example embodiment thus derives the camera parameters for the particular human motion analysis system in question. It will be appreciated by a person skilled in the art that the same parameters can later be used for human motion analysis of a different performer, provided that the camera settings remain the same for the different performer. On the other hand, as mentioned above, a customized calibration using customized 3D models of an actual performer may be performed for each performer if desired, in different embodiments.
It is noted that in different embodiments, the method for stage S2 may comprise using other existing algorithms for the camera calibration, such as for example the "camera calibration tool box for MatLab" [www.vision.Caltech.edu/bouguetj/calib_doc/], the contents of which are hereby incorporated by cross-reference.
The method for Stage 3 in the example embodiment comprises estimating the approximate temporal correspondence C(t') and the approximate rigid transformation T_t' that best align the posture B_C(t') in the 3D reference motion to the extracted body region
S'_t' in image I'_t' for each time t' = 0, ..., L', where L' + 1 is the length of the video sequence. The length of the 3D reference motion is L + 1, for t = 0, ..., L. The estimation is subject to a temporal order constraint: for any two temporally ordered postures in the performer's motion, the two corresponding postures in the reference motion have the same temporal order. That is, for any t'_1 and t'_2 such that t'_1 < t'_2, C(t'_1) < C(t'_2).
Given a particular C, each transformation T_t' at time t' can be determined by finding the best match between the extracted body region S'_t' and the 2D projected model body region P(T(B_C(t'))):

T_t' = arg min_T d_S(P(T(B_C(t'))), S'_t')

where the optimal T_t' is computed using a sampling technique described in [Sampling methods, www.statpac.com/surveys/sampling/htm]. The content of that reference is hereby incorporated by cross-reference. The method for computing the difference d_S(S, S') between two image regions S and S' comprises computing two parts:

d_S(S, S') = λ_A d_A(A, A') + λ_E d_E(E, E')

where d_A is the amount of overlap between the set A of pixels in the silhouette of the 2D projected model body region and the set A' of pixels in the silhouette of the extracted body region in the video image, d_E is the Chamfer distance described in [M.A. Butt and P. Maragos, Optimum design of chamfer distance transforms, IEEE Transactions on Image Processing, 7(10), 1998, 1477-1484] between the set E of edges in the 2D projected model body region and the set E' of edges in the extracted body region, and λ_A and λ_E are constant parameters. The content of that reference is hereby incorporated by cross-reference.
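As an illustration only, the region difference d_S could be approximated along the lines of the following Python sketch, in which the overlap term is taken as one minus the intersection-over-union of the silhouettes and the edge term as a symmetric chamfer distance computed from a Euclidean distance transform; these particular choices and the SciPy-based implementation are assumptions of the sketch, not the exact formulation of the embodiment.

```python
# Hedged sketch of the region difference d_S defined above; the overlap term
# uses (1 - intersection-over-union) and the edge term a symmetric chamfer
# distance via a Euclidean distance transform -- reasonable stand-ins only.
import numpy as np
from scipy.ndimage import distance_transform_edt

def overlap_term(sil_a, sil_b):
    inter = np.logical_and(sil_a, sil_b).sum()
    union = np.logical_or(sil_a, sil_b).sum()
    return 1.0 - inter / max(union, 1)

def chamfer_term(edges_a, edges_b):
    # The distance transform is zero on edge pixels, so sampling it at the
    # other edge set gives the average nearest-edge distance.
    dt_b = distance_transform_edt(~edges_b)
    dt_a = distance_transform_edt(~edges_a)
    return 0.5 * (dt_b[edges_a].mean() + dt_a[edges_b].mean())

def region_difference(sil_model, edges_model, sil_body, edges_body,
                      lam_a=1.0, lam_e=1.0):
    return lam_a * overlap_term(sil_model, sil_body) + \
           lam_e * chamfer_term(edges_model, edges_body)
```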
The method of computing the optimal temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d(t', C(t')) denote the difference d_S:

d(t', C(t')) = d_S(P(T_t'(B_C(t'))), S'_t')

Let D denote a (L' + 1) × (L + 1) correspondence matrix. Each matrix element at (t', t) corresponds to the possible frame correspondence between t' and t, and the correspondence cost is d(t', t). A path in D is a sequence of frame correspondences for t' = 0, ..., L' such that each t' has a unique corresponding t = C(t'). It is assumed that C(0) = 0 and C(L') = L. Let D(t', t) denote the least cost from the frame pair (0, 0) up to (t', t) on the least cost path, and D(0, 0) = d(0, 0). Then, the optimal solution given by D(L', L) can be recursively computed using dynamic programming as follows:

D(t', t) = d(t', t) + min_{i = 0, ..., W} D(t' - 1, t - 1 - i)

where W limits how many reference frames may be skipped between consecutive input frames. Once D(L', L) is computed, the least cost path is obtained by tracing back the path from D(L', L) to D(0, 0). The optimal C(t') is given by the least cost path.
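The dynamic programming recursion above may be illustrated by the following Python sketch, which assumes the frame differences d(t', t) have already been evaluated into a cost matrix and that W bounds how many reference frames may be skipped between consecutive input frames; the function is a simplified illustration rather than the embodiment's implementation.

```python
# Sketch of the Stage 3 temporal alignment by dynamic programming,
# assuming cost[t_in, t_ref] already holds d(t', t).
import numpy as np

def temporal_alignment(cost, W=3):
    """Return C with C[t'] = index of the matching reference frame."""
    n_in, n_ref = cost.shape                  # (L'+1) x (L+1) correspondence matrix
    D = np.full((n_in, n_ref), np.inf)
    back = np.zeros((n_in, n_ref), dtype=int)
    D[0, 0] = cost[0, 0]                      # C(0) = 0 is assumed
    for tp in range(1, n_in):
        for t in range(1, n_ref):
            # D(t', t) = d(t', t) + min over i = 0..W of D(t'-1, t-1-i)
            lo = max(t - 1 - W, 0)
            prev = D[tp - 1, lo:t]
            i = int(np.argmin(prev))
            D[tp, t] = cost[tp, t] + prev[i]
            back[tp, t] = lo + i
    # Trace the least-cost path back from D(L', L) to D(0, 0).
    C = [n_ref - 1]                           # C(L') = L is assumed
    for tp in range(n_in - 1, 0, -1):
        C.append(back[tp, C[-1]])
    return list(reversed(C))
```

Stage 5 below extends the same recursion with a third index over posture candidates.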
The method for Stage 4 in the example embodiment estimates 3D posture candidates that align with the extracted body regions. That is, for each time t', find a set {B'_t',l'} of 3D posture candidates whose 2D projected model body regions P(T_t'(B'_t',l')) match the extracted body region S'_t' in the input image I'_t'. The computation of the 3D posture candidates is subject to the joint angle limit constraint: the valid joint rotation of each body part is limited to physically possible ranges. The example embodiment uses a nonparametric implementation of the Belief Propagation (BP) technique described in [E.B. Sudderth, A.T. Ihler, W.T. Freeman, and A.S. Willsky. Nonparametric belief propagation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 605-612, 2003; M. Isard. Pampas: Real-valued graphical models for computer vision. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 613-620, 2003; G. Hua and Y. Wu. Multi-scale visual tracking by sequential belief propagation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 826-833, 2004; E.B. Sudderth, M.I. Mandel, W.T. Freeman, and A.S. Willsky. Visual hand tracking using nonparametric belief propagation. In IEEE CVPR Workshop on Generative Model based Vision, 2004]. The contents of those references are hereby incorporated by cross-reference.
It comprises the following steps:
1. Run the nonparametric BP algorithm to generate pose samples for each body part, using the results in Stage 3 as the initial estimates. That is, based on the results in Stage 3, the temporally aligned posture in the 3D reference motion forms the initial estimate for each frame.
2. Determine a best matching pose for each body part.
• If the pose samples of each body part converge to a single state, choose any pose sample as the best pose for this body part.
• If the pose samples of each body part do not converge to a single state, project each body part at each pose sample to compute the mean image positions of its joints. Then, starting from the root body part, generate a pose sample for each body part such that the body part at the pose sample is connected to its parent body part, and the projected image positions of its joints match the computed mean positions of its joints.
3. Generate the first posture candidate. For each body part, starting from the root body part, modify the depth orientation of the best pose sample such that it has the same depth orientation as that in the corresponding reference posture. All the pose samples are combined into a posture candidate by translating the depth coordinate in each sample, if necessary, such that the neighboring body parts are connected.
4. Generate new 3D posture candidates. Starting from the first 3D posture candidate, flip the depth orientation of n body parts about their parent joints, starting with n = 1, while keeping the body parts connected at the joints. Figure 5 illustrates flipping of body part b from a position k' to k around a parent joint at j.
5. The above step is repeated for n = 1, 2, ..., until N posture candidates are generated.
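Steps 4 and 5 above may be illustrated with the following simplified Python sketch, which represents each body part by a bone vector relative to its parent joint (listed parent-before-child), negates the depth component of selected subsets of bones, and re-assembles the joints so that the parts remain connected; under scaled-orthographic projection such flips leave the 2D projection unchanged. The data layout and the enumeration order are assumptions of this sketch, not the BP-based estimator itself.

```python
# Simplified sketch of posture-candidate generation by flipping the depth
# (z) orientation of subsets of body parts about their parent joints.
from itertools import combinations
import numpy as np

def assemble(bones, parents, root=np.zeros(3)):
    """Accumulate bone vectors into joint positions so parts stay connected.
    Bones are assumed ordered so each parent precedes its children."""
    joints = [None] * len(bones)
    for j, parent in enumerate(parents):        # parents[j] == -1 for the root bone
        base = root if parent < 0 else joints[parent]
        joints[j] = base + bones[j]
    return np.stack(joints)

def flip_candidates(bones, parents, max_candidates=16):
    """Enumerate candidates with the depth of n = 1, 2, ... bones negated."""
    candidates = [assemble(bones, parents)]     # the first posture candidate
    n_parts = len(bones)
    for n in range(1, n_parts + 1):
        for subset in combinations(range(n_parts), n):
            flipped = bones.copy()
            flipped[list(subset), 2] *= -1.0    # flip depth about the parent joint
            candidates.append(assemble(flipped, parents))
            if len(candidates) >= max_candidates:
                return candidates
    return candidates
```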
Figure 6 illustrates example posture candidates in Figures 6(b) and (c) generated from an input image in Figure 6(a). In Figure 6(b) the skeletons of the posture candidates are viewed from the front. At this viewing angle, all the posture candidates overlap exactly, given the nature of how they have been derived, explained above for the example embodiment. Figure 6(c) shows the different skeletons of the posture candidates viewed from the side, illustrating the differences between the respective posture candidates.

The method for Stage 5 in the example embodiment comprises refining the estimate of the temporal correspondence C(t') and selecting the best posture candidates B'_t',l' that best match the corresponding reference postures B_C(t').

The refinement is subject to the temporal ordering constraint: for any t'_1 and t'_2 such that t'_1 < t'_2, C(t'_1) < C(t'_2); and to a constraint of small rate of change of posture errors: for each t', Δε_t' / Δt' = (ε_t' - ε_{t' - Δt'}) / Δt' is small.
The method of computing the optimal refined temporal correspondence C(t') comprises the application of dynamic programming as follows. Let d_C(t', t, l') denote the 3D posture difference between the posture candidate B'_t',l' and the reference posture B_t, which is measured as the mean difference between the orientations of the bones in the postures. Let d_B(t', t, s, l', k') denote the change of posture difference between the corresponding pairs (B'_t',l', B_t) and (B'_{t'-1},k', B_s).

Let D denote a (L' + 1) × (L + 1) × N correspondence matrix, where N is the maximum number of posture candidates at any time t'. Each matrix element at (t', t, l') corresponds to the possible correspondence between t', t, and l', and the correspondence cost is d_C(t', t, l'). A path in D is a sequence of correspondences for t' = 0, ..., L' such that each t' has a unique corresponding t = C(t') and a unique corresponding posture candidate l' = l(t'). It is assumed that C(0) = 0 and C(L') = L. Let D(t', t, l') denote the least cost from the triplet (0, 0, l'_0) up to (t', t, l') on the least cost path, and D(0, 0, l'_0) = d_C(0, 0, l'_0). Then, the optimal solution given by D(L', L, l(L')) can be recursively computed using dynamic programming as follows:

D(t', t, l(t')) = min_{l'} D(t', t, l')
l(t') = arg min_{l'} D(t', t, l')

where

D(t', t, l') = d_C(t', t, l') + min_{i, k'} { D(t' - 1, t - 1 - i, k') + d_B(t', t, t - 1 - i, l', k') }

Once D(L', L, l(L')) is computed, the least cost path is obtained by tracing back the path from D(L', L, l(L')) to D(0, 0, l(0)). The optimal C(t') and l(t') are given by the least cost path.
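For illustration, this refined selection can be sketched in Python as a direct extension of the Stage 3 recursion to a three-dimensional table, assuming d_C has been precomputed into an array and d_B is supplied as a function; tracing back the least cost path then proceeds exactly as in Stage 3. The array layout and function signature below are assumptions of this sketch.

```python
# Sketch of the Stage 5 dynamic program over (t', t, l'); illustrative only.
import numpy as np

def select_candidates(d_c, d_b, W=3):
    """d_c[t', t, l'] = posture difference to the reference posture;
    d_b(t', t, s, l', k') = change of posture difference. Returns the DP table."""
    n_in, n_ref, N = d_c.shape
    D = np.full((n_in, n_ref, N), np.inf)
    D[0, 0, :] = d_c[0, 0, :]
    for tp in range(1, n_in):
        for t in range(1, n_ref):
            for l in range(N):
                best = np.inf
                for i in range(W + 1):
                    s = t - 1 - i
                    if s < 0:
                        break
                    for k in range(N):
                        best = min(best, D[tp - 1, s, k] + d_b(tp, t, s, l, k))
                D[tp, t, l] = d_c[tp, t, l] + best
    return D
```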
The method for Stage 6 in the example embodiment comprises computing the 3D difference between the selected 3D posture candidate B'_t' and the corresponding 3D reference posture B_C(t') at each time t'. The 3D difference can include 3D joint angle difference, 3D joint velocity difference, etc., depending on the specific coaching requirements of the sports.
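By way of illustration only, per-bone orientation differences and finite-difference joint velocities could be computed along the lines of the following Python sketch; the bone-vector representation is an assumption of the sketch rather than a detail of the embodiment.

```python
# Illustrative Stage 6 measurements: per-bone 3D orientation differences (in
# degrees) between the selected candidate and the reference posture, plus a
# finite-difference joint velocity.
import numpy as np

def bone_angle_differences(cand_bones, ref_bones):
    """Angle between corresponding unit bone direction vectors."""
    a = cand_bones / np.linalg.norm(cand_bones, axis=1, keepdims=True)
    b = ref_bones / np.linalg.norm(ref_bones, axis=1, keepdims=True)
    cos = np.clip((a * b).sum(axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def joint_velocity(joints_t, joints_prev, dt):
    """3D joint velocities by finite differences between consecutive frames."""
    return (joints_t - joints_prev) / dt
```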
The method for Stage 7 in the example embodiment comprises displaying and highlighting the 3D difference in a display device. An example display of detailed 3D difference is illustrated in Figure 7. Figure 7 illustrates an example display of detailed 3D difference by overlapping the estimated performer's postures e.g. 700 (dark gray scale) with the corresponding expert's postures e.g. 702 (lighter gray scale) according to an example embodiment. The overlapping postures can be rotated in 3D to show different views (compare rows 704 and 706). The estimated performer's postures can also be overlapped with the input images (row 708) for visual verification of their correctness.
An example display of color-coded errors for quick assessment is illustrated in Figure 8. Figure 8 illustrates an example display of color-coded regions e.g. 800, 802 overlapped with an input image 804 for quick assessment according to an example embodiment. The darker gray scale regions e.g. 800 indicate large error, the lighter gray scale regions e.g. 802 indicate moderate error, and the transparent regions e.g. 806 indicate negligible or no error.
In another embodiment where the 3D reference motion contains multiple predefined motion segments, such as Taichi motion, the 2D input video is first segmented into the corresponding performer's motion segments. The method of determining the corresponding performer's segment boundary for each reference segment boundary t_i comprises the following steps:
1. Determine the initial estimate of the performer's motion segment boundary t' by C(t') = t_i.
2. Obtain a temporal window [t' - ω, t' + ω], where ω is the window size.
3. Find one or more smooth sequences of posture candidates in the temporal window.
• Correct posture candidates should change smoothly over time. Suppose B'_τ,l' and B'_τ+1,k' are correct posture candidates; then the 3D posture difference between them, d_B(B'_τ,l', B'_τ+1,k'), which is measured as the mean difference between the orientations of the bones in the postures, is small for any τ ∈ [t' - ω, t' + ω].
• Choose a posture candidate for each τ ∈ [t' - ω, t' + ω] to obtain a sequence of posture candidates that satisfies the condition that d_B(B'_τ,l', B'_τ+1,k') is small for each τ.
4. Find candidate segment boundaries.
• For each smooth sequence of posture candidates, find the candidate segment boundary τ ∈ [t' - ω, t' + ω] and the corresponding posture candidate at τ that satisfies the segment boundary condition: at a segment boundary, there are large changes of motion directions for some joints.
• Denote a candidate segment boundary found above as τ_k and the corresponding posture candidate as B'_τ_k.
5. Identify the optimal segment boundary τ*.
The posture candidate at the optimal segment boundary τ* should be the most similar to the corresponding reference posture B_t_i. Therefore, τ* can be determined as follows:

k* = arg min_k d_B(B'_τ_k, B_t_i)
τ* = τ_k*.
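For illustration, the boundary search in steps 2 to 5 above might be sketched in Python as follows, assuming 3D joint trajectories and posture candidates are available for the frames in the temporal window and that a posture-difference function d_b is supplied; the direction-change score and the tie-breaking rule are simplifications of this sketch, not the disclosed procedure.

```python
# Hedged sketch of segment-boundary detection within the temporal window:
# score frames by the change in joint motion direction, then pick the scored
# frame whose posture is most similar to the reference boundary posture.
import numpy as np

def direction_change(joints):
    """Per-frame sum of angles between consecutive joint displacement vectors.
    joints has shape (T, J, 3); the result has one score per interior frame."""
    disp = np.diff(joints, axis=0)                      # (T-1, J, 3)
    a, b = disp[:-1], disp[1:]
    cos = (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) *
                             np.linalg.norm(b, axis=-1) + 1e-8)
    return np.arccos(np.clip(cos, -1, 1)).sum(-1)       # shape (T-2,)

def find_boundary(joints_window, postures_window, ref_posture, d_b, top_k=3):
    change = direction_change(joints_window)
    candidates = np.argsort(change)[-top_k:] + 1        # frames with largest direction change
    errors = [d_b(postures_window[c], ref_posture) for c in candidates]
    return int(candidates[int(np.argmin(errors))])      # most similar to the reference posture
```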
In another example embodiment, the input body region is extracted with the help of colored markers.
In another example embodiment, the appendages carried by the performer, e.g. a golf club, are also segmented. In another example embodiment, the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input video acquired in a previous session.
In another example embodiment, the 3D reference motion of the expert is replaced by the 3D posture sequence of the performer computed from the input videos acquired in previous sessions that best matches the 3D reference motion of the expert.
In another example embodiment, the camera 900 and output device 902 are connected to a computer 904 through a computer network 906, as shown in Figure 9. The computer 904 is coupled to an external storage device 908 directly in this example.
In another example embodiment, a wireless input and output device 1000, such as a hand phone or Personal Digital Assistant equipped with a camera, is connected to a computer 1002 through a wireless network 1004, as shown in Figure 10. The computer 1002 is coupled to an external storage device 1006 directly in this example.
In another example embodiment, multiple cameras 1101-1103 are arranged along a straight line, as shown in Figure 11. Each camera acquires a portion of the performer's 1104 entire motion when the performer 1104 passes in front of the respective camera. This embodiment also allows the system to acquire high-resolution video of a user whose body motion spans a large arena.
In another example embodiment, multiple cameras 1201-1204 are placed around the performer 1206, as shown in Figure 12. This arrangement allows different cameras to capture the frontal view of the performer 1206 when he faces different cameras.
In another example embodiment, the arrangements of the cameras discussed above are combined.
In the multi-camera configurations in different example embodiments, for example those shown in Figures 11 and 12, the calibration method for the stage S2 processing, in addition to calibration of each of the individual cameras as described above for the single camera embodiment, further comprises computing the relative positions and orientations between the cameras using an inter-relation algorithm between the cameras, as will be appreciated by a person skilled in the art. Such inter-relation algorithms are understood in the art, and will not be described in more detail herein. Reference is made for example to [R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, McGraw-Hill, 1995] and [R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2000] for example algorithms for use in such an embodiment. The contents of those references are hereby incorporated by cross-reference.
Example embodiments of the method and system for human motion analysis can have the following framework of stages:
1. Input Video Segmentation
This stage segments the human body in each image frame of the input video. The human body, the arms, and the background are assumed to have different colors so that they can be separated. This assumption is reasonable and easily satisfied, for instance, for a user who wears a short-sleeved colored shirt and stands in front of a background of a different color. The background can be a natural scene which is nonuniform in color. This stage is achieved using a combination of background removal, graph-cut algorithm and skin color detection. In case the background is uniform, the segmentation algorithm can be simplified.
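Purely as an illustration of this combination, the following OpenCV-based Python sketch chains background subtraction (MOG2), GrabCut refinement and a simple HSV skin-color rule; the thresholds and the specific operators are assumptions of the sketch, not values prescribed by the embodiment.

```python
# Rough sketch of the segmentation stage: background subtraction for an
# initial foreground, refined with GrabCut (an iterative graph-cut method)
# and augmented with a simple skin-color rule in HSV space.
import cv2
import numpy as np

backsub = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def segment_body(frame_bgr):
    fg = backsub.apply(frame_bgr)                       # coarse foreground mask
    if not np.any(fg):                                  # background model not learned yet
        return np.zeros(frame_bgr.shape[:2], bool)
    mask = np.where(fg > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    cv2.grabCut(frame_bgr, mask, None, bgd, fgd, 3, cv2.GC_INIT_WITH_MASK)
    body = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
    # Add skin-colored pixels (helps recover arms lost by background subtraction).
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255)) > 0
    return body | skin
```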
2. Camera Calibration
This stage computes the camera's extrinsic parameters, assuming that its intrinsic parameters have already been pre-computed. This stage can be achieved using existing camera calibration algorithms.
3. Estimation of Approximate Temporal Correspondence
This stage estimates the approximate temporal correspondence between 3D reference motion and 2D input video. Dynamic Programming technique is used to estimate the temporal correspondence between the input video and the reference motion by matching the 2D projections of 3D postures in the reference motion with the segmented human body in the 2D input video. This stage also estimates the approximate global rotation and translation of the user's body relative to the 3D reference motion.
4. Estimation of Posture Candidates
This stage estimates, for each 2D input video frame, a set of 3D posture candidates that can produce 2D projections that are the same as that in the input video frame. This is performed using an improved version of the Belief Propagation method. In a single-camera system, these sets typically have more than one posture candidate each due to depth ambiguity and occlusion. In a multiple-camera system, the number of posture candidates may be reduced.
5. Selection of best posture candidates
This stage selects the best posture candidates that form smooth motion over time. It also refines the temporal correspondence estimated in Stage 3. This stage is accomplished using Dynamic Programming.
The framework of the example embodiments can be applied to analyze various types of motion by adopting appropriate 3D reference motion. It will be appreciated by a person skilled in the art that by adapting the system and method to handle specific application domains, these stages can be refined and optimized to reduce computational costs and improve efficiency.
Figure 13 shows a flow chart 1300 illustrating a method for human motion detection according to an example embodiment. At step 1302, one or more 2D input videos of the human motion are captured. At step 1304, sets of 2D body regions are extracted from respective frames of the 2D input videos. At step 1306, 3D human posture candidates are determined for each of the extracted sets of 2D body regions. At step 1308, a sequence of 3D human postures from the 3D human posture candidates for the respective frames is selected as representing the human motion in 3D.
The method and system of the example embodiment can be implemented on a computer system 1400, schematically shown in Figure 14. It may be implemented as software, such as a computer program being executed within the computer system 1400, and instructing the computer system 1400 to conduct the method of the example embodiment. The computer system 1400 comprises a computer module 1402, input modules such as a keyboard 1404 and mouse 1406 and a plurality of output devices such as a display 1408, and printer 1410.
The computer module 1402 is connected to a computer network 1412 via a suitable transceiver device 1414, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN).
The computer module 1402 in the example includes a processor 1418, a Random Access Memory (RAM) 1420 and a Read Only Memory (ROM) 1422. The computer module 1402 also includes a number of Input/Output (I/O) interfaces, for example I/O interface 1424 to the display 1408, and I/O interface 1426 to the keyboard 1404.
The components of the computer module 1402 typically communicate via an interconnected bus 1428 and in a manner known to the person skilled in the relevant art.
The application program is typically supplied to the user of the computer system 1400 encoded on a data storage medium such as a CD-ROM or flash memory carrier and read utilising a corresponding data storage medium drive of a data storage device 1430. The application program is read and controlled in its execution by the processor 1418. Intermediate storage of program data may be accomplished using RAM 1420.
It will be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.

Claims

1. A method for human motion analysis, the method comprising the steps of: capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
2. The method as claimed in claim 1 , further comprising the step of determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
3. The method as claimed in claim 2, further comprising the step of visualizing said differences to a user.
4. The method as claimed in any one of the preceding claims, wherein extracting the sets of 2D body regions comprises one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
5. The method as claimed in any one of the preceding claims, wherein determining the 3D human posture candidates comprises the steps of: generating a first 3D human posture candidate; and flipping a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
6. The method as claimed in claim 5, wherein generating the first 3D human posture candidate comprises temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
7. The method as claimed in any one of the preceding claims, wherein selecting the sequence of 3D human postures from the 3D human posture candidates is based on a least cost path among the 3D human posture candidates for the respective frames.
8. The method as claimed in claim 7, wherein selecting the sequence of 3D human postures from the 3D human posture candidates further comprises refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
9. A system for human motion analysis, the system comprising: means for capturing one or more 2D input videos of the human motion; means for extracting sets of 2D body regions from respective frames of the 2D input videos; means for determining 3D human posture candidates for each of the extracted sets of 2D body regions; and means for selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
10. The system as claimed in claim 9, further comprising means for determining differences between 3D reference data for said human motion and the selected sequence of 3D human postures.
11. The system as claimed in claim 10, further comprising means for visualizing said differences to a user.
12. The system as claimed in any one of claims 9 to 11 , wherein the means for extracting the sets of 2D body regions performs one or more of a group consisting of background subtraction, iterative graph-cut segmentation and skin detection.
13. The system as claimed in any one of claims 9 to 12, wherein the means for determining the 3D human posture candidates generates a first 3D human posture candidate; and flips a depth orientation of body parts represented in the first 3D human posture candidate around respective joints to generate further 3D human posture candidates from the first 3D human posture candidate.
14. The system as claimed in claim 13, wherein generating the first 3D human posture candidate comprises temporally aligning the extracted sets of 2D body portions from each frame with 3D reference data of the human motion and adjusting the 3D reference data to match the 2D body portions.
15. The system as claimed in any one of claims 9 to 14, wherein the means for selecting the sequence of 3D human postures from the 3D human posture candidates determines a least cost path among the 3D human posture candidates for the respective frames.
16. The system as claimed in claim 15, wherein the means for selecting the sequence of 3D human postures from the 3D human posture candidates further comprises means for refining a temporal alignment of the extracted sets of 2D body portions from each frame with 3D reference data of the human motion.
17. A data storage medium having computer code means for instructing a computing device to execute a method for human motion detection, the method comprising the steps of: capturing one or more 2D input videos of the human motion; extracting sets of 2D body regions from respective frames of the 2D input videos; determining 3D human posture candidates for each of the extracted sets of 2D body regions; and selecting a sequence of 3D human postures from the 3D human posture candidates for the respective frames as representing the human motion in 3D.
PCT/SG2008/000428 2007-11-09 2008-11-07 Human motion analysis system and method WO2009061283A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US262707P 2007-11-09 2007-11-09
US61/002,627 2007-11-09

Publications (2)

Publication Number Publication Date
WO2009061283A2 true WO2009061283A2 (en) 2009-05-14
WO2009061283A3 WO2009061283A3 (en) 2009-07-09

Family

ID=40626373

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2008/000428 WO2009061283A2 (en) 2007-11-09 2008-11-07 Human motion analysis system and method

Country Status (1)

Country Link
WO (1) WO2009061283A2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8944939B2 (en) 2012-02-07 2015-02-03 University of Pittsburgh—of the Commonwealth System of Higher Education Inertial measurement of sports motion
CN105664462A (en) * 2016-01-07 2016-06-15 北京邮电大学 Auxiliary training system based on human body posture estimation algorithm
CN109716354A (en) * 2016-10-12 2019-05-03 英特尔公司 The complexity of human interaction object identification reduces
US10398359B2 (en) 2015-07-13 2019-09-03 BioMetrix LLC Movement analysis system, wearable movement tracking sensors, and associated methods
WO2021085453A1 (en) * 2019-10-31 2021-05-06 株式会社ライゾマティクス Recognition processing device, recognition processing program, recognition processing method, and visualizer system
CN112998693A (en) * 2021-02-01 2021-06-22 上海联影医疗科技股份有限公司 Head movement measuring method, device and equipment
EP3933669A1 (en) * 2020-06-29 2022-01-05 KS Electronics Co., Ltd. Posture comparison and correction method using application configured to check two golf images and result data in overlapping state
EP3911423A4 (en) * 2019-01-15 2022-10-26 Shane Yang Augmented cognition methods and apparatus for contemporaneous feedback in psychomotor learning
GB2608576A (en) * 2021-01-07 2023-01-11 Wizhero Ltd Exercise performance system
EP4083926A4 (en) * 2019-12-27 2023-07-05 Sony Group Corporation Information processing device, information processing method and information processing program
US11804076B2 (en) 2019-10-02 2023-10-31 University Of Iowa Research Foundation System and method for the autonomous identification of physical abuse


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111410A (en) * 1989-06-23 1992-05-05 Kabushiki Kaisha Oh-Yoh Keisoku Kenkyusho Motion analyzing/advising system
US5886788A (en) * 1996-02-09 1999-03-23 Sony Corporation Apparatus and method for detecting a posture
US6124862A (en) * 1997-06-13 2000-09-26 Anivision, Inc. Method and apparatus for generating virtual views of sporting events
US6256418B1 (en) * 1998-04-13 2001-07-03 Compaq Computer Corporation Method and system for compressing a sequence of images including a moving figure
WO2006117374A2 (en) * 2005-05-03 2006-11-09 France Telecom Method for three-dimensionally reconstructing an articulated member or a set of articulated members

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8944939B2 (en) 2012-02-07 2015-02-03 University of Pittsburgh—of the Commonwealth System of Higher Education Inertial measurement of sports motion
US9851374B2 (en) 2012-02-07 2017-12-26 University of Pittsburgh—of the Commonwealth System of Higher Education Inertial measurement of sports motion
US10398359B2 (en) 2015-07-13 2019-09-03 BioMetrix LLC Movement analysis system, wearable movement tracking sensors, and associated methods
CN105664462A (en) * 2016-01-07 2016-06-15 北京邮电大学 Auxiliary training system based on human body posture estimation algorithm
CN109716354A (en) * 2016-10-12 2019-05-03 英特尔公司 The complexity of human interaction object identification reduces
CN109716354B (en) * 2016-10-12 2024-01-09 英特尔公司 Complexity reduction for human interactive recognition
US11638853B2 (en) 2019-01-15 2023-05-02 Live View Sports, Inc. Augmented cognition methods and apparatus for contemporaneous feedback in psychomotor learning
EP3911423A4 (en) * 2019-01-15 2022-10-26 Shane Yang Augmented cognition methods and apparatus for contemporaneous feedback in psychomotor learning
US11804076B2 (en) 2019-10-02 2023-10-31 University Of Iowa Research Foundation System and method for the autonomous identification of physical abuse
JP2021071953A (en) * 2019-10-31 2021-05-06 株式会社ライゾマティクス Recognition processor, recognition processing program, recognition processing method, and visualization system
JP7281767B2 (en) 2019-10-31 2023-05-26 株式会社アブストラクトエンジン Recognition processing device, recognition processing program, recognition processing method, and visualization system
WO2021085453A1 (en) * 2019-10-31 2021-05-06 株式会社ライゾマティクス Recognition processing device, recognition processing program, recognition processing method, and visualizer system
EP4083926A4 (en) * 2019-12-27 2023-07-05 Sony Group Corporation Information processing device, information processing method and information processing program
EP3933669A1 (en) * 2020-06-29 2022-01-05 KS Electronics Co., Ltd. Posture comparison and correction method using application configured to check two golf images and result data in overlapping state
CN113926172A (en) * 2020-06-29 2022-01-14 韩标电子 Posture comparison and correction method using application program configured to check two golf images and result data in overlapped state
GB2608576A (en) * 2021-01-07 2023-01-11 Wizhero Ltd Exercise performance system
CN112998693A (en) * 2021-02-01 2021-06-22 上海联影医疗科技股份有限公司 Head movement measuring method, device and equipment
CN112998693B (en) * 2021-02-01 2023-06-20 上海联影医疗科技股份有限公司 Head movement measuring method, device and equipment

Also Published As

Publication number Publication date
WO2009061283A3 (en) 2009-07-09

Similar Documents

Publication Publication Date Title
WO2009061283A2 (en) Human motion analysis system and method
Memo et al. Head-mounted gesture controlled interface for human-computer interaction
US9898651B2 (en) Upper-body skeleton extraction from depth maps
US9235753B2 (en) Extraction of skeletons from 3D maps
EP2707834B1 (en) Silhouette-based pose estimation
US8755569B2 (en) Methods for recognizing pose and action of articulated objects with collection of planes in motion
CN108960045A (en) Eyeball tracking method, electronic device and non-transient computer-readable recording medium
US20100208038A1 (en) Method and system for gesture recognition
CN110544301A (en) Three-dimensional human body action reconstruction system, method and action training system
Van der Aa et al. Umpm benchmark: A multi-person dataset with synchronized video and motion capture data for evaluation of articulated human motion and interaction
CN111488775B (en) Device and method for judging degree of visibility
WO2014139079A1 (en) A method and system for three-dimensional imaging
JP6515039B2 (en) Program, apparatus and method for calculating a normal vector of a planar object to be reflected in a continuous captured image
JP2000251078A (en) Method and device for estimating three-dimensional posture of person, and method and device for estimating position of elbow of person
Elhayek et al. Fully automatic multi-person human motion capture for vr applications
Zou et al. Automatic reconstruction of 3D human motion pose from uncalibrated monocular video sequences based on markerless human motion tracking
US8948461B1 (en) Method and system for estimating the three dimensional position of an object in a three dimensional physical space
He Generation of human body models
Ingwersen et al. Evaluating current state of monocular 3d pose models for golf
Marcialis et al. A novel method for head pose estimation based on the “Vitruvian Man”
El-Sallam et al. Towards a Fully Automatic Markerless Motion Analysis System for the Estimation of Body Joint Kinematics with Application to Sport Analysis.
US20230154091A1 (en) Joint rotation inferences based on inverse kinematics
KR101844367B1 (en) Apparatus and Method for Head pose estimation using coarse holistic initialization followed by part localization
Hori et al. Silhouette-Based 3D Human Pose Estimation Using a Single Wrist-Mounted 360° Camera
Cordea et al. 3D head pose recovery for interactive virtual reality avatars

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08847494

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 08847494

Country of ref document: EP

Kind code of ref document: A2