Sunday, May 31, 2015

Real-Time Vision-Based Hand Tracking and Gesture Recognition

PhD (Electrical and Computer Engineering) thesis by Qing Chen, 2008


Introduction

This research aims to detect free-air hand gestures from video input. A standard web camera is used for video capture, and the implementation has two parts:
  1. Recognizing the hand from the video input
  2. Identifying and classifying selected hand gestures
One of the main strengths of this research is that it proposes a real-time gesture recognition system built on a divide-and-conquer strategy. The authors use a combination of statistical and syntactic analysis for gesture recognition. The 3D position of the hand is recovered according to the camera's perspective projection. For high-level hand gesture recognition, a stochastic context-free grammar (SCFG) is used to analyze the syntactic structure of the hand gestures, with the terminal strings converted from the postures detected by the low level of the architecture.


Contribution

To achieve natural and immersive human-computer interaction, the human hand can be used as an interface device. Hand gestures are a powerful human-to-human communication channel, and they form a major part of information transfer in our everyday life.
Early research on vision-based hand tracking and gesture recognition usually needed the help of markers or colored gloves. In current state-of-the-art vision-based hand tracking and gesture recognition techniques, research is more focused on tracking the bare hand and identifying hand gestures without the help of any markers or gloves, and this thesis is focused on that purpose.
Since many current approaches are still limited by a lack of speed, accuracy, robustness and real-time support, their contribution is to build a real-time 3D hand tracking and gesture recognition system for the purpose of human-computer interaction (HCI).

In the first chapter, the researcher lists the following as the contributions of this research.

  1. A two-level system architecture is implemented, which combines the advantages of statistical and syntactic pattern recognition approaches effectively, and achieves real-time, accurate and robust hand tracking and gesture recognition with one camera as the input device.
  2. A parallel cascade structure for the architecture's low level is implemented using the AdaBoost learning algorithm and a set of Haar-like features. This structure can correctly extract a set of hand postures and track the hand motion in 3D in real time.
  3. The hand gestures are analyzed based on an SCFG, which defines the composite properties based on the constituent hand postures. The probability assigned to each production rule of the SCFG can be used to control the "wanted" and "unwanted" gestures: smaller probabilities are assigned to "unwanted" gestures and larger values to "wanted" gestures, so that the resulting SCFG generates the "wanted" gestures with higher probability.
  4. For hand motion analysis, given the uncertainty of hand trajectories, ambiguous versions can be disambiguated by looking for the SCFG that has the highest probability of generating the input string. The motion patterns can be controlled by adjusting the probabilities associated with the production rules so that the resulting SCFG generates the standard motion patterns with higher probability.

How does it relate to my work?

The researchers have suggested some performance requirements that need to be achieved by a 3D hand recognition system. They are,
  • Real-time performance
  • Accuracy
  • Robustness
  • Scalability
  • User-independence
I also need to consider the above requirements, since my research needs to recognize the hand as well (a recent trend in free-hand 3D hand recognition is to use the Leap Motion sensor, so I have to check whether that sensor meets these requirements).

The second chapter, the literature review, starts by describing the skeleton structure and the joints of the human hand.
Due to the high DOF of the human hand (as the hand skeleton figure in the thesis shows), hand gesture recognition becomes a very challenging problem. When talking about hand gesture recognition, there are two concepts we should know:
  • Hand Posture: a hand posture is a static hand pose and its current location without any movements involved. 
  • Hand Gesture: a hand gesture is a sequence of hand postures connected by continuous hand or finger movements over a short period of time.
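To make the distinction concrete, here is a minimal sketch (my own illustration, not code from the thesis) of how a posture and a gesture could be represented:

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class HandPosture:
        """A static hand pose at a single instant, with no movement involved."""
        label: str                             # e.g. "fist", "palm", "point"
        position: Tuple[float, float, float]   # 3D location of the hand

    @dataclass
    class HandGesture:
        """A sequence of postures connected by movement over a short time."""
        postures: List[HandPosture]
        duration_s: float

    # A "wave" gesture would then be a short sequence of "palm" postures
    # whose positions move left and right over roughly a second.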
Lenman et al. suggested that the design space for gestural commands can be characterized along three dimensions. The researcher used that design space for his research as follows:
  1. The intuition aspect: the selected gestures should be intuitive and comfortable for the user to learn and to remember. The gestures should be straightforward, so the least effort is required for the user to learn them. The user should be able to use their natural hand configurations and not be required to learn any specific or complex hand configurations, which can easily cause fatigue and make the user uncomfortable.
  2. The articulatory aspect: the selected gestures should be easy to recognize and should not cause confusion for the user. Gestures involving complicated hand poses and finger movements should be avoided because they are difficult to articulate and repeat.
  3. The technology aspect: to be viable, the selected gestures must take into account the properties of the employed algorithms and techniques. The required data and information must be extractable and analyzable from the selected gesture commands without causing excessive computational cost for the employed approach.
The above approach is important when I select suitable hand gestures for my research. After that, the researcher has reviewed and summarized some vision-based hand tracking and gesture recognition systems proposed by other researchers (Table 2.1 of the thesis). Those approaches can be categorized as appearance-based and 3D hand model-based approaches. Some appearance-based algorithms use statistical methods while others use syntactic methods. The researcher selected an appearance-based approach with a hybrid method (both statistical and syntactic). He has listed a set of popular features and algorithms used to detect human hands and recognize gestures in appearance-based approaches:

  • Colors and Shapes
  • Hand Features
  • Optical Flow
  • Mean Shift
  • SIFT Features
  • Stereo Image
  • Viola-Jones Algorithm
I think that the low-level hand detection part is less important for my work, since real-time hand tracking sensors like the Leap Motion are available now. In the summary of the second chapter, the researcher compares appearance-based vs. 3D hand model-based approaches and statistical vs. syntactic approaches. He says that it is easier for appearance-based approaches to achieve real-time performance due to their comparatively simpler 2D image features. Some of the drawbacks and limitations of 3D hand model-based approaches can be listed as follows:


  • Since the 3D hand model is a complex articulated deformable object with many degrees of freedom, a very large image database is required to cover all the characteristic hand images under different views.
  • They lack the capability to deal with singularities that arise from ambiguous views.
  • Most current 3D hand model-based approaches focus on real-time tracking of global hand motions and local finger motions under restricted lighting and background conditions.
  • There is a scalability problem: a 3D hand model with specific kinematic parameters cannot deal with the wide variety of hand sizes of different people.
Since statistical approaches and numeric measurements might not be enough to represent the complex structures of the patterns and activities of the hand, it is more appropriate and effective to use a syntactic approach. The elementary parts used to syntactically describe a complex pattern or an activity are called primitives. For computer vision applications, the principles for identifying the primitives are:
  • The number of primitive types should be small.
  • The primitives selected must be able to form an appropriate object representation.
  • Primitives should be easily segmentable from the image.
  • Primitives should be easily recognizable using some statistical pattern recognition method. 
  • Primitives should correspond with significant natural elements of the object structure being described.
After the primitives are extracted, a grammar representing a set of rules must be defined.
G = (Vt, Vn, P, S)
In this model,
  • Vt is the set of terminal symbols,
  • Vn is the set of non-terminal symbols,
  • P is a finite set of production rules, and
  • S is the start symbol.
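As a minimal illustration (a toy example of my own, not the grammar from the thesis), the four-tuple can be written down directly in code, with detected postures as the terminal symbols:

    # Toy gesture grammar G = (Vt, Vn, P, S); postures are the terminals.
    Vt = {"fist", "palm", "point"}          # terminals: detected postures
    Vn = {"GESTURE", "GRAB", "RELEASE"}     # non-terminals
    S = "GESTURE"                           # start symbol
    P = {                                   # production rules
        "GESTURE": [["GRAB", "RELEASE"]],
        "GRAB":    [["palm", "fist"]],      # an open hand closing into a fist
        "RELEASE": [["fist", "palm"]],      # a fist opening back into a palm
    }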
In addition to the grammar, decision tree classifiers are used to break a complex decision problem into a series of simpler decisions, so that the final conclusion resembles the desired solution. This type of classifier deduces the conclusion through iterative multistage decisions based on individual features at each node of the decision tree. To improve the overall classification accuracy, different classifiers can be combined so that the overall performance is optimized (a minimal sketch follows the list below). The researcher introduces some examples of the application of syntactic approaches in computer vision. They are:
  1. Picture Description Language (PDL) proposed by Shaw
  2. The grammar defined by Hand et al.
  3. Tree like approach suggested by Jones et al.
  4. Etc…
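Returning to the decision-tree idea mentioned above, here is a minimal sketch of a posture classifier built on scikit-learn's decision tree; the feature names and training values are made up for illustration:

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training data: one row of simple image features per
    # sample, e.g. [aspect_ratio, extent, finger_count], with a posture label.
    X = [[1.8, 0.55, 5], [1.0, 0.90, 0], [2.5, 0.40, 1]]
    y = ["palm", "fist", "point"]

    clf = DecisionTreeClassifier(max_depth=3)
    clf.fit(X, y)

    # Each node of the tree tests a single feature, so the complex decision
    # is broken into a series of simpler per-feature decisions.
    print(clf.predict([[1.7, 0.60, 5]]))   # -> most likely "palm"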


The third chapter of the thesis describes the two-level architecture of the suggested design. In the first part of the chapter, he explains how the selection of postures and gestures happens. He has used a taxonomy proposed by Quek to understand hand gestures of different classes:

  • Unintentional movements: hand motions that carry no intention to communicate information.
  • Manipulative gestures: the ones used to act on objects in an environment (such as picking up a box).
  • Communicative gestures: the ones intended to communicate information.

Since the architecture doesn't interpret all hand gestures, a set of hand postures has to be selected, and the hand gestures are then defined using those postures. A table in the thesis shows the selected hand postures and the hand gestures implemented from them.

Since this approach consists of two levels, the low level recognizes and tracks the user's hand postures, and the high level takes responsibility for gesture recognition and motion analysis. I think that in my research I mostly need to consider how to recognize gestures from the large stream of hand postures continuously given by the Leap Motion sensor.

Chapter four of the thesis is about the extraction and tracking of the 3D hand posture from the video input in real time. The researcher has used Haar-like features for this, along with some other algorithms. This chapter is less important for me, since the Leap Motion sensor can handle the detection.
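Still, to get a feel for this style of detection, here is a minimal sketch in the spirit of the Viola-Jones/Haar-cascade approach using OpenCV. Note that OpenCV does not ship a hand cascade; the XML file below is a hypothetical cascade that would have to be trained first (as the thesis does with AdaBoost):

    import cv2

    # Hypothetical cascade trained on one hand posture (e.g. an open palm).
    palm_cascade = cv2.CascadeClassifier("palm_posture_cascade.xml")

    cap = cv2.VideoCapture(0)   # the default web camera
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Scan the frame at multiple scales with the Haar-feature cascade.
        palms = palm_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5)
        for (x, y, w, h) in palms:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("postures", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()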

Chapter five describes the high-level hand gesture recognition using a context-free grammar. The approach also uses a probability measure for better accuracy. Here an SCFG is used; SCFGs extend CFGs in the same way that hidden Markov models (HMMs) extend regular grammars, but SCFGs have more flexibility than HMMs.
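To see how the probabilities on the production rules steer recognition toward "wanted" gestures, here is a small sketch using NLTK's PCFG support (a toy grammar of my own, not the one from the thesis):

    import nltk
    from nltk.parse import ViterbiParser

    # Toy SCFG: the "wanted" gesture expansion gets the higher probability,
    # so ambiguous posture strings parse to it with higher likelihood.
    grammar = nltk.PCFG.fromstring("""
        GESTURE -> GRAB RELEASE [0.8] | RELEASE GRAB [0.2]
        GRAB -> 'palm' 'fist' [1.0]
        RELEASE -> 'fist' 'palm' [1.0]
    """)

    parser = ViterbiParser(grammar)
    postures = ["palm", "fist", "fist", "palm"]   # string from the low level
    for tree in parser.parse(postures):
        print(tree, tree.prob())                  # best parse and its probability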

Chapter six describes how the suggested system was evaluated after it was implemented inside a virtual 3D gaming environment. In the game, the user drives a car to a destination using free-hand commands, while obeying some traffic signs as well.

Advantages and Disadvantages

Advantages

  • Real-time recognition
  • Better accuracy
  • Low hardware requirements (only a web cam is needed besides the PC)
  • Very good robustness against different lighting conditions and a certain degree of robustness against image rotations.

Disadvantages

  • Limited set of hand postures and gestures.
  • To achieve robustness against cluttered backgrounds, background subtraction and noise removal need to be applied.

Suggested future work


  • More diversified hand samples from different people can be used in the training process so that the classifiers become more user-independent.
  • Context-awareness for the gesture recognition system: the same gesture performed within different contexts and environments can have different semantic meanings. For example, with the background extracted from the video, if a computer is detected, we can say that a pointing gesture means turning on the computer in an office. However, if a stove is detected in the background, we can be fairly sure that the user is in a kitchen and the pointing gesture probably means turning on the stove.
  • Track and recognize multiple objects such as human faces, eye gaze and hand gestures at the same time.

Thursday, May 21, 2015

Virtual Environments for Stroke Rehabilitation: Examining a Novel Technology Against End-user, Clinical and Management Demands with Reference to UK Care Provision

By Jamie O’Brien, A thesis for Doctor of Engineering


Introduction

Since there is growing interest in the use of virtual reality (VR)-based systems for rehabilitation purposes, there is a need to understand their strengths and limitations. The aim of this thesis is to determine the value of this technology with respect to user focus, clinical effectiveness, marketability and contextual meaningfulness, with reference to the end users of those systems. The researcher has given a good introduction to stroke and stroke rehabilitation and done a good literature review as well.

Contribution

As I previously mentioned, stroke and stroke rehabilitation have been described well. His research focuses on treatments for rehabilitation of the upper limb (a term used among clinicians to describe the functionality of a patient’s shoulder, arm and hand). He has also critically reviewed the available literature.

How does it relate to my work?

In the first chapter, the thesis gives a brief introduction to stroke and stroke rehabilitation. He has defined stroke and explained how it occurs. Then, a brief introduction to stroke rehabilitation is given and the term neuroplasticity is introduced. The researcher discusses physiotherapy and occupational therapy, which lead neuroplasticity to success.
Next, he explains the function of the upper limb. In this section, he describes open-chain and closed-chain movements and three common types of grip (power grip, dynamic grip and pinch grip). Then he gives an introduction to the rehabilitation of the upper limb.
In the following section, widely used clinical approaches which are common to both occupational therapy and physiotherapy are explained. Those are: Carr and Shepherd’s motor optimisation approach, the Bobath approach and Brunnstrom movement therapy.

A basic framework for participating in a virtual environment is described. It consists of the following sections:
  • Vision: abstraction and ‘ecology’
  • Prehension: the problem of degrees of freedom
  • Sensory feedback and motor control
  • Gesture in a virtual environment
Then he outlines skill relearning in virtual environments; the stages of engram formation, closed and open skills, and the process by which skills are acquired are also explained.

The literature review is given in the third chapter. He has categorized the review into selected topics:
  • Physiotherapy: Upper limb function (Purdue pegboard test, PHANToM, box and block test, Fugl-Meyer Assessment, manual function test)
  • Occupational Therapy: Task- and game-based interventions
  • Occupational Therapy: Activities of daily living (ADL) interventions
  • Neuropsychology: Way-finding, urban environment training, public transport navigation, virtual shopping mall
  • Telerehabilitation
I think that the physiotherapy work relates to my research since it directly targets VR for upper limb function. But occupational therapy goes beyond my research, since my aim is not to implement a VR game or a related solution.

After reviewing the literature, the researcher has come up with some parameters which need to be considered. Those are:
  • Simplicity in system design tends to result in clearer methods and outcomes 
  • Outcome measurements of the system must relate to clinical concerns
  • The system can only be deemed usable if its acceptability to the therapist and patient has been evaluated
  • Clinical disparities in the sample population have an effect on outcome measurements which ought to be included in the analysis
  • The technology must interface with clinical practice
  • There may be a range of unseen factors impacting the recovery process which ought to be considered and precluded
  • The consequences of ‘risky’ virtual environments are not well understood
Finally, he presents the research areas covered in his review in a table as well.


Wednesday, May 20, 2015

Close Range Depth Sensing Cameras for Virtual Reality based Hand Rehabilitation

IEEE Citation: D. Charles, K. Pedlow, S. McDonough, K. Shek and T. Charles, 'Close range depth sensing cameras for virtual reality based hand rehabilitation', Jnl of Assistive Technologies, vol. 8, no. 3, pp. 138-149, 2014.



Introduction

This paper researches how close-range depth sensing cameras can be used for hand rehabilitation. The researchers use the Leap Motion sensor, which tracks hands and fingers with low latency and a precision of up to 0.01 mm, to implement VR-based rehabilitation games. The research team includes two individuals from a serious games software company, two academics from Health and Rehabilitation Technologies, and an academic from Computing and Engineering. The Computing academic, along with the CEO of the company, co-supervised a software engineer within the company throughout the development process. They built three games which can be used in the hand rehabilitation process; the games are simulations of hand-focused rehabilitation tasks. They presented the games to a set of physiotherapists and occupational therapists and evaluated them using their responses to questionnaires and the time scores taken after playing the simulated games.
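As a point of reference for my own work, a minimal sketch of reading hand and fingertip positions with the legacy Leap Motion SDK v2 Python bindings might look like this (assuming the Leap service is running and the SDK's Leap module is on the path):

    import Leap   # legacy Leap Motion SDK v2 Python bindings

    controller = Leap.Controller()

    def read_hand_positions():
        frame = controller.frame()        # the most recent tracking frame
        for hand in frame.hands:
            pos = hand.palm_position      # millimetres, sensor coordinates
            print("hand at (%.1f, %.1f, %.1f) mm" % (pos.x, pos.y, pos.z))
            for finger in hand.fingers:
                print("  fingertip:", finger.tip_position)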

Contribution and Novelty

This research builds on previous work related to depth sensing cameras and VR-based hand rehabilitation. The novelty of this research is the use of the Leap Motion sensor for the implementation, and the way it is evaluated. Generally, evaluation of a hand rehabilitation game is done with patients who have motor impairments in their hands. But in this scenario, the evaluation was done by presenting the games to a group of physiotherapists and occupational therapists, since the aim is to suggest a workable, fully completed solution that can be used for hand rehabilitation.

How does it relate to my work?
In the introduction of this paper, the researchers review some commercial-off-the-shelf (COTS) gaming systems like the PlayStation EyeToy, Nintendo Wiimote and Microsoft Kinect, since one of the research approaches is to investigate sensor-based COTS games for rehabilitation. The researchers show that the use of COTS games in rehabilitation has been found to be more engaging than conventional therapies. This is one of the main motivations for my research. Since COTS games are mainly implemented for game-play rather than being tailored to treatments, many researchers are investigating and designing new ways to bring the effectiveness of COTS games into rehabilitation therapies.

The development process of the software, described under the title "Design and Development Process", explains the key points to think about and follow while using an evolutionary approach like agile methodology. Additionally, they have reviewed the SDKs and 3D engines they considered. Since it is important to build a proof of concept (POC) for my research, these reviews are quite important for me when selecting suitable SDKs and engines for my purpose. They have selected and described three well-known rehabilitation tasks for virtualisation. This is also a plus point, since I can get an idea about how the POC needs to be built. In their description, they also define some parameters which need to be calculated while performing those tasks. This helps me get an idea about the evaluation of a task.

Before starting the evaluation of the implemented games, they presented them to the participants and demonstrated the game-play. This is an important thing to do in the evaluation of my system too; otherwise, participants are not clearly aware of the system and the evaluation process. They have evaluated the system using the game itself and using a questionnaire as well.

Advantages and Disadvantages

Advantages

  • High accuracy in tracking hands and fingers, since the Leap Motion is used
  • The Leap Motion has a low maintenance cost
  • Results show that clinicians were largely positive about the use of the Leap Motion approach in their practice, particularly for use in the home
  • Limited graphics and CPU processing power needed

Disadvantages

  • User does not experience tactile feedback
  • Motion sickness and vertigo
  • Requires additional setup which can be troublesome for widespread home use.
  • The implemented UI is not easy to use.

Suggested future work

The researchers included a question for getting the participants' comments about the implementation; participants suggested making the game more fun and engaging. They also suggested implementing different types of games with a range of activities suitable for different age groups.

Refer to the paper at emeraldinsight.com: http://www.emeraldinsight.com/doi/full/10.1108/JAT-02-2014-0007



Wednesday, May 13, 2015

A Handle Bar Metaphor for Virtual Object Manipulation with Mid-Air Interaction

IEEE Citation: P. Song, W. Goh, W. Hutama, C. Fu and X. Liu, 'A handle bar metaphor for virtual object manipulation with mid-air interaction', Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems - CHI '12, 2012.



Introduction

This research presents a novel way to manipulate 3D objects using both hands in a mid-air environment. The approach doesn't use any wearable devices. User interaction is achieved using an image depth sensor, and a set of bi-manual and uni-manual gestures is implemented. POINT, OPEN and CLOSE are implemented as single-hand gestures and are used to select a particular virtual object or a set of objects. Bi-manual gestures are suggested and implemented for object manipulation using ROTATION, TRANSLATION and SCALE (RTS).
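The core of the metaphor is that a rigid virtual bar held between the two hands carries the manipulation information. Below is a minimal sketch of how translation, one axis of rotation and scale could be derived from two tracked hand positions; the geometry is the obvious reading of the metaphor, and the function name is my own, not from the paper:

    import math

    def handlebar_rts(left, right, prev_left, prev_right):
        """Derive translation, yaw rotation and scale from two hand
        positions; each argument is an (x, y, z) tuple from the sensor."""
        mid = [(l + r) / 2 for l, r in zip(left, right)]
        prev_mid = [(l + r) / 2 for l, r in zip(prev_left, prev_right)]
        translation = [m - p for m, p in zip(mid, prev_mid)]  # midpoint motion

        # Moving the hands apart or together scales the object.
        scale = math.dist(left, right) / math.dist(prev_left, prev_right)

        # Yaw: change of the bar's heading in the horizontal (x, z) plane.
        yaw = math.atan2(right[2] - left[2], right[0] - left[0])
        prev_yaw = math.atan2(prev_right[2] - prev_left[2],
                              prev_right[0] - prev_left[0])
        rotation = yaw - prev_yaw

        return translation, rotation, scale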

Contribution and Novelty

This paper suggests an effective visual control metaphor between the user's gestures and the corresponding virtual object manipulations. This new metaphor can be used in most areas where 3D objects are manipulated, such as CAD and 3D printing. Although the researchers designed this for large-scale screens, it can be used with small displays too. Since the interaction is mid-air, the use of large screens gives more immersion to the activity. The novelty of this research lies in suggesting a new UI (user interaction) metaphor and a set of real-world hand gestures to use within it. They have successfully evaluated their work as well.

How does it relate to my work?

As the title describes, this research paper is about a bi-manual user interaction metaphor. Since my research idea is to suggest a bi-manual UI metaphor for stroke rehabilitation, it is necessary to have a good knowledge of the existing bi-manual UI metaphors. I think the content of this paper will help me a lot, as the authors have suggested a new UI metaphor, implemented a software prototype and evaluated it as well. The way they avoid immersion syndrome is quite appreciable, and it is an important thing to consider in mid-air interaction, since the sensor is continuously detecting the hand. The researchers have defined a set of manipulation states as a state transition diagram of gestures for manipulating objects, which suggests a way to implement such a metaphor using a state machine.
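As an illustration of that idea (my own sketch, not the exact diagram from the paper), the manipulation states can be driven by a simple transition table keyed on the pair of detected hand postures:

    # Hypothetical states and transitions for a handlebar-style interaction.
    # Keys: (current_state, (left_posture, right_posture)) -> next_state
    TRANSITIONS = {
        ("IDLE",         ("POINT", "OPEN")):  "SELECTING",
        ("SELECTING",    ("CLOSE", "CLOSE")): "MANIPULATING",  # fists grab the bar
        ("MANIPULATING", ("OPEN",  "OPEN")):  "IDLE",          # palms release it
    }

    def step(state, left_posture, right_posture):
        # Unlisted posture pairs keep the current state; this is one simple
        # way to mitigate immersion syndrome, since unintended poses change
        # nothing.
        return TRANSITIONS.get((state, (left_posture, right_posture)), state)

    state = "IDLE"
    state = step(state, "POINT", "OPEN")    # -> "SELECTING"
    state = step(state, "CLOSE", "CLOSE")   # -> "MANIPULATING"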

Advantages and Disadvantages

Advantages

The paper itself mentions some advantages of the proposed handlebar metaphor, as follows:
  • Physical familiarity: most of the bi-manual gestures used to manipulate the virtual handlebar are intuitive for most users, since some day-to-day tasks, such as cycling and lawn mowing, are done using a handlebar.
  • Rich variety of 3D manipulation operations: this metaphor gives the ability to manipulate an object in 7 degrees of freedom (7-DOF), allowing users to switch between operations without changing the gestures or the operational mode. Multi-object manipulation can also be done as easily as single-object manipulation.
  • Supporting both object and non-object centered manipulations: the virtual handlebar pierced through the 3D object in the scene can be aligned and manipulated separately, so the user can position the handlebar inside the 3D object (object centered) or outside of it (non-object centered).
  • Good semantic mapping: this metaphor has a good semantic mapping to the real physical world. Bi-manual gestures like grabbing and releasing are mapped inside the metaphor to the same tasks in the scene: when the user clenches two fists, the handlebar is grabbed, and when the palms are opened, the handlebar is released.
  • Accommodating sensor limitations: even though the Kinect sensor has a limited resolution, this method can still identify the 3D pose information accurately.

Disadvantages

  • One of the main disadvantages the researchers faced is the inability of the Kinect depth sensor to detect hand orientation, so the researchers had to use a perpendicular handlebar arrangement to rotate the handlebar about the x-axis.
  • Lack of haptic feedback is also a disadvantage for a more immersive environment; only visual feedback is given.

Suggested future work

When researchers suggest future work, they describe what they would do if they had another six months to a year to continue the research. In this research, they have not suggested any future work; instead, they have suggested some application examples of their model. As future work, I suggest that better accuracy and hand orientation detection could be gained using the Leap Motion depth sensing device.

I found a video explanation of this research from YouTube as well.

Ref: https://www.youtube.com/watch?v=p0EM9Ejv0r0



Immersion syndrome: this happens when every hand movement is captured and constantly interpreted by the system, which can lead to undesirable operations due to misinterpretation of the user’s unintended hand gestures.

Refer to the paper at the ACM Digital Library: http://dl.acm.org/citation.cfm?id=2208585&preflayout=tabs