Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Human action recognition using saliency-based global and local features

Abdulmunem, Ashwan 2017. Human action recognition using saliency-based global and local features. PhD Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version (11MB, available for download)
PDF - Supplemental Material (2MB, restricted to repository staff only)


Recognising human actions from video sequences is one of the most important topics in computer vision and has been extensively researched over recent decades; however, it remains a challenging task, especially in real-world scenarios, owing to background clutter, partial occlusion, and changes in scale, viewpoint, lighting, and appearance. Human action recognition underpins many applications, including video surveillance systems, human-computer interaction, and robotics for human behaviour characterisation. In this thesis, we aim to introduce new features and methods to enhance and develop human action recognition systems. Specifically, we introduce three methods for human action recognition.

In the first approach, we present a novel framework for human action recognition based on salient object detection and a combination of local and global descriptors. Saliency Guided Feature Extraction (SGFE) is proposed to detect salient objects and extract features on the detected objects. We then propose a simple strategy to identify and process only those video frames that contain salient objects. Processing salient objects instead of all frames not only makes the algorithm more efficient but, more importantly, also suppresses the interference of background pixels. We combine this approach with a new combination of local and global descriptors, namely 3D SIFT and Histograms of Oriented Optical Flow (HOOF). The resulting Saliency Guided 3D SIFT and HOOF (SGSH) feature is used with a multi-class support vector machine (SVM) classifier for human action recognition.

The second proposed method is a novel 3D extension of Gradient Location and Orientation Histograms (3D GLOH), which provides discriminative local features representing both the gradient orientations and their relative locations.
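For context, the HOOF global descriptor mentioned above summarises per-frame motion by binning optical-flow vectors by orientation, weighted by magnitude. The following is a minimal sketch of that idea, not the thesis implementation; the bin count, angle range, and L1 normalisation are illustrative assumptions:

```python
import numpy as np

def hoof(flow, n_bins=8):
    """Histogram of Oriented Optical Flow for one frame pair (sketch).

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements.
    Each flow vector votes into an orientation bin, weighted by its
    magnitude; the histogram is L1-normalised so it does not depend
    on the number of moving pixels.
    """
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)                      # vector magnitudes
    ang = np.arctan2(dy, dx)                    # orientations in [-pi, pi]
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    hist, _ = np.histogram(ang, bins=edges, weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy flow field: every pixel moving rightwards with unit speed,
# so all the mass lands in the bin containing angle 0.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
h = hoof(flow)
```

In a full pipeline, such per-frame histograms would be concatenated or pooled over the video, alongside the local 3D SIFT features, before SVM classification.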
We further propose a human action recognition system based on the Bag of Visual Words model, combining the new 3D GLOH local features with Histograms of Oriented Optical Flow (HOOF) global features. Together with the strategy from our first approach of extracting features only in salient regions, the overall system outperforms existing feature descriptors for human action recognition on challenging video datasets.

Finally, we propose to extract minimal representative information, namely deforming skeleton graphs corresponding to foreground shapes, to effectively represent actions and remove the influence of changes in illumination, subject appearance, and background. We propose a novel approach to action recognition based on matching skeleton graphs, combining a static pairwise graph similarity measure, computed using Optimal Subsequence Bijection, with Dynamic Time Warping to robustly handle topological and temporal variations.

We have evaluated the proposed methods through extensive experiments on widely used human action datasets, including KTH, UCF Sports, TV Human Interaction (TVHI), Olympic Sports, and UCF11. Experimental results show the effectiveness of our methods for action recognition.
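The temporal-alignment step of the skeleton-graph approach rests on Dynamic Time Warping, which matches two sequences played at different speeds. A minimal generic sketch follows; it is not the thesis implementation, and the scalar cost function here merely stands in for the pairwise graph-similarity measure that the thesis computes with Optimal Subsequence Bijection:

```python
import numpy as np

def dtw(seq_a, seq_b, dist=lambda a, b: abs(a - b)):
    """Dynamic Time Warping distance between two sequences (sketch).

    Finds a monotone warping path aligning seq_a and seq_b so that
    temporally stretched or compressed versions of the same action
    still match. `dist` is a placeholder per-element cost; in the
    thesis setting it would be a graph-similarity measure between
    per-frame skeleton graphs.
    """
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same "action" at two speeds aligns with zero cost:
d = dtw([1, 2, 3], [1, 1, 2, 2, 3, 3])
```

The quadratic table makes the alignment robust to local timing variation, which is exactly the temporal nuisance the skeleton-graph method needs to factor out.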

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Date of First Compliant Deposit: 22 December 2017
Last Modified: 16 Apr 2021 13:33
