Contextual Action Recognition Using Tube Convolutional Neural Network (T-CNN)

S. Venkata Kiran, R.P. Singh

Abstract


Deep learning has been shown to achieve excellent results for image classification and object detection. However, its impact on video analysis has been limited due to the complexity of video data and the lack of annotations. Previous convolutional neural network (CNN) based video action detection approaches usually consist of two major steps: frame-level action proposal generation and association of proposals across frames. Moreover, most of these methods employ a two-stream CNN framework to handle spatial and temporal features separately. In this paper, we propose an end-to-end deep learning framework called Tube Convolutional Neural Network (T-CNN) for action detection in videos. The proposed architecture is a unified deep network that can recognize and localize actions based on 3D convolution features. A video is first divided into equal-length clips, and for each clip a set of tube proposals is generated based on 3D Convolutional Network (ConvNet) features. Finally, the tube proposals of different clips are linked together using network flow, and spatio-temporal action detection is performed on these linked video proposals. Extensive experiments on several video datasets demonstrate the superior performance of T-CNN for classifying and localizing actions in both trimmed and untrimmed videos, compared with the state of the art.
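To make the three stages concrete, here is a minimal sketch of the pipeline the abstract describes: splitting a video into equal-length clips, scoring per-clip tube proposals, and linking proposals across clips. Everything in it is illustrative rather than the authors' implementation: the clip length, the candidate boxes, the intensity-based actionness score (standing in for real 3D ConvNet features), and the greedy clip-to-clip linking (standing in for the network-flow formulation) are all assumptions chosen for readability.

```python
import numpy as np

CLIP_LEN = 8  # assumed clip length; the real value is a hyperparameter

def split_into_clips(video):
    """Stage 1: divide a video (num_frames, H, W, 3) into equal-length,
    non-overlapping clips of CLIP_LEN frames each."""
    n_clips = video.shape[0] // CLIP_LEN
    return [video[i * CLIP_LEN:(i + 1) * CLIP_LEN] for i in range(n_clips)]

def tube_proposals(clip, candidate_boxes):
    """Stage 2: score candidate tubes for one clip. A tube proposal here is
    a single box (x1, y1, x2, y2) held fixed across the clip's frames plus
    an actionness score. Mean pixel intensity is a toy stand-in for the
    3D ConvNet (C3D-style) features used in the paper."""
    proposals = []
    for (x1, y1, x2, y2) in candidate_boxes:
        score = float(clip[:, y1:y2, x1:x2].mean())
        proposals.append({"box": (x1, y1, x2, y2), "score": score})
    return proposals

def iou(a, b):
    """Spatial intersection-over-union of two boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def link_tubes(per_clip_proposals):
    """Stage 3: chain proposals across clips into a video-level tube by
    greedily picking, in each clip, the proposal that best combines its own
    actionness with spatial overlap to the previous clip's pick. (The paper
    solves this globally with network flow; greedy keeps the sketch short.)"""
    tube = [max(per_clip_proposals[0], key=lambda p: p["score"])]
    for proposals in per_clip_proposals[1:]:
        prev_box = tube[-1]["box"]
        tube.append(max(proposals,
                        key=lambda p: p["score"] + iou(prev_box, p["box"])))
    return tube

if __name__ == "__main__":
    video = np.random.rand(32, 64, 64, 3)              # 32 synthetic frames
    boxes = [(0, 0, 32, 32), (16, 16, 48, 48), (32, 32, 64, 64)]
    clips = split_into_clips(video)
    linked = link_tubes([tube_proposals(c, boxes) for c in clips])
    for t, p in enumerate(linked):
        print(f"clip {t}: box={p['box']} actionness={p['score']:.3f}")
```

Note that the actual method links proposals globally, so a weak match in one clip can be traded against stronger matches elsewhere; the greedy version above only looks one clip back and is chosen purely for brevity.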


