.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "build/examples_datasets/kinetics400.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_build_examples_datasets_kinetics400.py: Prepare the Kinetics400 dataset ================================== `Kinetics400 `_ is an action recognition dataset of realistic action videos, collected from YouTube. With 306,245 short trimmed videos from 400 action categories, it is one of the largest and most widely used dataset in the research community for benchmarking state-of-the-art video action recognition models. This tutorial will go through the steps of preparing this dataset for GluonCV. Download -------- Please refer to the `official website `_ on how to download the videos. Note that the downloaded videos will consume about 450G disk space, make sure there is enough space before downloading. The crawling process can take several days. Once download is complete, please rename the folder names (since class names have white space and parenthese) for ease of processing. Suppose the videos are downloaded to ``~/.mxnet/datasets/kinetics400``, there will be three folders in it: ``annotations``, ``train`` and ``val``. You can use the following command to rename the folder names: .. code-block:: bash # sudo apt-get install detox detox -r train/ detox -r val/ Decode into frames ------------------ If you decide to train your model using video data directly, you can skip this section. The easiest way to prepare the videos in frames format is to download helper script :download:`kinetics400.py<../../../scripts/datasets/kinetics400.py>` and run the following command: .. code-block:: bash python kinetics400.py --src_dir ~/.mxnet/datasets/kinetics400/train --out_dir ~/.mxnet/datasets/kinetics400/rawframes_train --decode_video --new_width 450 --new_height 340 python kinetics400.py --src_dir ~/.mxnet/datasets/kinetics400/val --out_dir ~/.mxnet/datasets/kinetics400/rawframes_val --decode_video --new_width 450 --new_height 340 This script will help you decode the videos to raw frames. We specify the width and height of frames for resizing because this will save lots of disk space without losing much accuracy. All the resized frames will consume 2.9T disk space. If we don't specify the dimension, the original decoded frames will consume 6.8T disk space. The data preparation process may take a while. The total time to prepare the dataset depends on your machine. For example, it takes about 8 hours on an AWS EC2 instance with EBS and using 56 workers. Generate the training files --------------------------- The last step is to generate training files for standard data loading. You can run the helper script as: .. code-block:: bash python kinetics400.py --build_file_list --frame_path ~/.mxnet/datasets/kinetics400/rawframes_train --subset train --shuffle python kinetics400.py --build_file_list --frame_path ~/.mxnet/datasets/kinetics400/rawframes_val --subset val --shuffle Now you can start training your action recognition models on Kinetics400 dataset. .. note:: You need at least 4T disk space to download and extract the dataset. SSD (Solid-state disks) is preferred over HDD because of faster speed. You may need to install ``Cython`` and ``mmcv`` by ``pip install Cython mmcv``. .. GENERATED FROM PYTHON SOURCE LINES 66-74 Read with GluonCV ----------------- The prepared dataset can be loaded with utility class :py:class:`gluoncv.data.Kinetics400` directly. In this tutorial, we provide three examples to read data from the dataset, (1) load one frame per video; (2) load one clip per video, the clip contains five consecutive frames; (3) load three clips evenly per video, each clip contains 12 frames. .. GENERATED FROM PYTHON SOURCE LINES 77-79 We first show an example that randomly reads 5 videos each time, randomly selects one frame per video and performs center cropping. .. GENERATED FROM PYTHON SOURCE LINES 79-96 .. code-block:: default import os from gluoncv.data import Kinetics400 from mxnet.gluon.data import DataLoader from mxnet.gluon.data.vision import transforms from gluoncv.data.transforms import video transform_train = transforms.Compose([ video.VideoCenterCrop(size=224), video.VideoToTensor() ]) # Default location of the data is stored on ~/.mxnet/datasets/kinetics400. # You need to specify ``setting`` and ``root`` for Kinetics400 if you decoded the video frames into a different folder. train_dataset = Kinetics400(train=True, transform=transform_train) train_data = DataLoader(train_dataset, batch_size=5, shuffle=True) .. GENERATED FROM PYTHON SOURCE LINES 97-99 We can see the shape of our loaded data as below. ``extra`` indicates if we select multiple crops or multiple segments from a video. Here, we only pick one frame per video, so the ``extra`` dimension is 1. .. GENERATED FROM PYTHON SOURCE LINES 99-104 .. code-block:: default for x, y in train_data: print('Video frame size (batch, extra, channel, height, width):', x.shape) print('Video label:', y.shape) break .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Video frame size (batch, extra, channel, height, width): (5, 1, 3, 224, 224) Video label: (5,) .. GENERATED FROM PYTHON SOURCE LINES 105-107 Here is the second example that randomly reads 25 videos each time, randomly selects one clip per video and performs center cropping. A clip can contain N consecutive frames, e.g., N=5. .. GENERATED FROM PYTHON SOURCE LINES 107-111 .. code-block:: default train_dataset = Kinetics400(train=True, new_length=5, transform=transform_train) train_data = DataLoader(train_dataset, batch_size=5, shuffle=True) .. GENERATED FROM PYTHON SOURCE LINES 112-114 We can see the shape of our loaded data as below. Now we have another ``depth`` dimension which indicates how many frames in each clip (a.k.a, the temporal dimension). .. GENERATED FROM PYTHON SOURCE LINES 114-119 .. code-block:: default for x, y in train_data: print('Video frame size (batch, extra, channel, depth, height, width):', x.shape) print('Video label:', y.shape) break .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Video frame size (batch, extra, channel, depth, height, width): (5, 1, 3, 5, 224, 224) Video label: (5,) .. GENERATED FROM PYTHON SOURCE LINES 120-121 Let's plot one training sample with 5 consecutive video frames. index 0 is image, 1 is label .. GENERATED FROM PYTHON SOURCE LINES 121-146 .. code-block:: default from matplotlib import pyplot as plt # subplot 1 for video frame 1 fig = plt.figure() fig.add_subplot(1,5,1) frame1 = train_dataset[150][0][0,:,0,:,:].transpose((1,2,0)).asnumpy()*255.0 plt.imshow(frame1.astype('uint8')) # subplot 2 for video frame 2 fig.add_subplot(1,5,2) frame2 = train_dataset[150][0][0,:,1,:,:].transpose((1,2,0)).asnumpy()*255.0 plt.imshow(frame2.astype('uint8')) # subplot 3 for video frame 3 fig.add_subplot(1,5,3) frame3 = train_dataset[150][0][0,:,2,:,:].transpose((1,2,0)).asnumpy()*255.0 plt.imshow(frame3.astype('uint8')) # subplot 4 for video frame 4 fig.add_subplot(1,5,4) frame4 = train_dataset[150][0][0,:,3,:,:].transpose((1,2,0)).asnumpy()*255.0 plt.imshow(frame4.astype('uint8')) # subplot 5 for video frame 5 fig.add_subplot(1,5,5) frame5 = train_dataset[150][0][0,:,4,:,:].transpose((1,2,0)).asnumpy()*255.0 plt.imshow(frame5.astype('uint8')) # display plt.show() .. image-sg:: /build/examples_datasets/images/sphx_glr_kinetics400_001.png :alt: kinetics400 :srcset: /build/examples_datasets/images/sphx_glr_kinetics400_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 147-149 The last example is that we randomly read 5 videos each time, select 3 clips evenly per video and performs center cropping. A clip contains 12 consecutive frames. .. GENERATED FROM PYTHON SOURCE LINES 149-153 .. code-block:: default train_dataset = Kinetics400(train=True, new_length=12, num_segments=3, transform=transform_train) train_data = DataLoader(train_dataset, batch_size=5, shuffle=True) .. GENERATED FROM PYTHON SOURCE LINES 154-156 We can see the shape of our loaded data as below. Now the ``extra`` dimension is 3, which indicates we have three segments for each video. .. GENERATED FROM PYTHON SOURCE LINES 156-161 .. code-block:: default for x, y in train_data: print('Video frame size (batch, extra, channel, depth, height, width):', x.shape) print('Video label:', y.shape) break .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Video frame size (batch, extra, channel, depth, height, width): (5, 3, 3, 12, 224, 224) Video label: (5,) .. GENERATED FROM PYTHON SOURCE LINES 162-167 Read with VideoLoader --------------------- In case you don't want to decode videos into frames, we provide a fast video loader, `Decord `_, to read the dataset. We still use the utility class :py:class:`gluoncv.data.Kinetics400`. The usage is similar to using image loader. .. GENERATED FROM PYTHON SOURCE LINES 169-171 For example, if we want to randomly read 5 videos, randomly selects one frame per video and performs center cropping. .. GENERATED FROM PYTHON SOURCE LINES 171-181 .. code-block:: default from gluoncv.utils.filesystem import import_try_install import_try_install('decord') # Since we are loading videos directly, we need to change the ``root`` location. # ``tiny_train_videos`` contains a small subset of Kinetics400 dataset, which is used for demonstration only. train_dataset = Kinetics400(root=os.path.expanduser('~/.mxnet/datasets/kinetics400/tiny_train_videos'), train=True, transform=transform_train, video_loader=True, use_decord=True) train_data = DataLoader(train_dataset, batch_size=5, shuffle=True) .. rst-class:: sphx-glr-script-out Out: .. code-block:: none WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip. Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue. To avoid this problem you can invoke Python with '-m pip' instead of running pip directly. Collecting decord Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl (13.6 MB) Requirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from decord) (1.19.5) Installing collected packages: decord Successfully installed decord-0.6.0 WARNING: You are using pip version 21.0.1; however, version 21.3.1 is available. You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command. .. GENERATED FROM PYTHON SOURCE LINES 182-183 We can see the shape of our loaded data as below. .. GENERATED FROM PYTHON SOURCE LINES 183-188 .. code-block:: default for x, y in train_data: print('Video frame size (batch, extra, channel, height, width):', x.shape) print('Video label:', y.shape) break .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Video frame size (batch, extra, channel, height, width): (5, 1, 3, 224, 224) Video label: (5,) .. GENERATED FROM PYTHON SOURCE LINES 189-191 Here is the second example that randomly reads 25 videos each time, randomly selects one clip per video and performs center cropping. A clip can contain N consecutive frames, e.g., N=5. .. GENERATED FROM PYTHON SOURCE LINES 191-197 .. code-block:: default train_dataset = Kinetics400(root=os.path.expanduser('~/.mxnet/datasets/kinetics400/tiny_train_videos'), train=True, new_length=5, transform=transform_train, video_loader=True, use_decord=True) train_data = DataLoader(train_dataset, batch_size=5, shuffle=True) .. GENERATED FROM PYTHON SOURCE LINES 198-199 We can see the shape of our loaded data as below. .. GENERATED FROM PYTHON SOURCE LINES 199-204 .. code-block:: default for x, y in train_data: print('Video frame size (batch, extra, channel, depth, height, width):', x.shape) print('Video label:', y.shape) break .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Video frame size (batch, extra, channel, depth, height, width): (5, 1, 3, 5, 224, 224) Video label: (5,) .. GENERATED FROM PYTHON SOURCE LINES 205-207 The last example is that we randomly read 5 videos each time, select 3 clips evenly per video and performs center cropping. A clip contains 12 consecutive frames. .. GENERATED FROM PYTHON SOURCE LINES 207-213 .. code-block:: default train_dataset = Kinetics400(root=os.path.expanduser('~/.mxnet/datasets/kinetics400/tiny_train_videos'), train=True, new_length=12, num_segments=3, transform=transform_train, video_loader=True, use_decord=True) train_data = DataLoader(train_dataset, batch_size=5, shuffle=True) .. GENERATED FROM PYTHON SOURCE LINES 214-215 We can see the shape of our loaded data as below. .. GENERATED FROM PYTHON SOURCE LINES 215-220 .. code-block:: default for x, y in train_data: print('Video frame size (batch, extra, channel, depth, height, width):', x.shape) print('Video label:', y.shape) break .. rst-class:: sphx-glr-script-out Out: .. code-block:: none Video frame size (batch, extra, channel, depth, height, width): (5, 3, 3, 12, 224, 224) Video label: (5,) .. GENERATED FROM PYTHON SOURCE LINES 221-223 We also support other video loaders, e.g., OpenCV VideoReader, but Decord is significantly faster. We refer the users to read the documentations for more information. .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 7.742 seconds) .. _sphx_glr_download_build_examples_datasets_kinetics400.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: kinetics400.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: kinetics400.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_