.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "build/examples_datasets/kinetics400.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_build_examples_datasets_kinetics400.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_build_examples_datasets_kinetics400.py:

Prepare the Kinetics400 dataset
==================================

`Kinetics400 <https://deepmind.com/research/open-source/kinetics>`_  is an action recognition dataset
of realistic action videos, collected from YouTube. With 306,245 short trimmed videos
from 400 action categories, it is one of the largest and most widely used dataset in the research
community for benchmarking state-of-the-art video action recognition models. This tutorial
will go through the steps of preparing this dataset for GluonCV.


Download
--------

Please refer to the `official website <https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics>`_ on how to download the videos.
Note that the downloaded videos will consume about 450G disk space, make sure there is enough space before downloading. The crawling process can take several days.

Once download is complete, please rename the folder names (since class names have white space and parenthese) for ease of processing. Suppose the videos are
downloaded to ``~/.mxnet/datasets/kinetics400``, there will be three folders in it: ``annotations``, ``train`` and ``val``. You can use the following command to
rename the folder names:

.. code-block:: bash

   # sudo apt-get install detox
   detox -r train/
   detox -r val/

Decode into frames
------------------

If you decide to train your model using video data directly, you can skip this section.

The easiest way to prepare the videos in frames format is to download helper script
:download:`kinetics400.py<../../../scripts/datasets/kinetics400.py>` and run the following command:

.. code-block:: bash

   python kinetics400.py --src_dir ~/.mxnet/datasets/kinetics400/train --out_dir ~/.mxnet/datasets/kinetics400/rawframes_train --decode_video --new_width 450 --new_height 340
   python kinetics400.py --src_dir ~/.mxnet/datasets/kinetics400/val --out_dir ~/.mxnet/datasets/kinetics400/rawframes_val --decode_video --new_width 450 --new_height 340

This script will help you decode the videos to raw frames. We specify the width and height of frames for resizing because this will save lots of disk space without losing much accuracy.
All the resized frames will consume 2.9T disk space. If we don't specify the dimension, the original decoded frames will consume 6.8T disk space.
The data preparation process may take a while. The total time to prepare the dataset depends on your machine. For example, it takes about 8 hours on an AWS EC2 instance with EBS and using 56 workers.


Generate the training files
---------------------------

The last step is to generate training files for standard data loading. You can run the helper script as:

.. code-block:: bash

   python kinetics400.py --build_file_list --frame_path ~/.mxnet/datasets/kinetics400/rawframes_train --subset train --shuffle
   python kinetics400.py --build_file_list --frame_path  ~/.mxnet/datasets/kinetics400/rawframes_val --subset val --shuffle

Now you can start training your action recognition models on Kinetics400 dataset.

.. note::

   You need at least 4T disk space to download and extract the dataset. SSD
   (Solid-state disks) is preferred over HDD because of faster speed.

   You may need to install ``Cython`` and ``mmcv`` by ``pip install Cython mmcv``.

.. GENERATED FROM PYTHON SOURCE LINES 66-74

Read with GluonCV
-----------------

The prepared dataset can be loaded with utility class :py:class:`gluoncv.data.Kinetics400` directly.
In this tutorial, we provide three examples to read data from the dataset,
(1) load one frame per video;
(2) load one clip per video, the clip contains five consecutive frames;
(3) load three clips evenly per video, each clip contains 12 frames.

.. GENERATED FROM PYTHON SOURCE LINES 77-79

We first show an example that randomly reads 5 videos each time, randomly selects one frame per video and
performs center cropping.

.. GENERATED FROM PYTHON SOURCE LINES 79-96

.. code-block:: default


    import os
    from gluoncv.data import Kinetics400
    from mxnet.gluon.data import DataLoader
    from mxnet.gluon.data.vision import transforms
    from gluoncv.data.transforms import video

    transform_train = transforms.Compose([
        video.VideoCenterCrop(size=224),
        video.VideoToTensor()
    ])

    # Default location of the data is stored on ~/.mxnet/datasets/kinetics400.
    # You need to specify ``setting`` and ``root`` for Kinetics400 if you decoded the video frames into a different folder.
    train_dataset = Kinetics400(train=True, transform=transform_train)
    train_data = DataLoader(train_dataset, batch_size=5, shuffle=True)


.. GENERATED FROM PYTHON SOURCE LINES 97-99

We can see the shape of our loaded data as below. ``extra`` indicates if we select multiple crops or multiple segments
from a video. Here, we only pick one frame per video, so the ``extra`` dimension is 1.

.. GENERATED FROM PYTHON SOURCE LINES 99-104

.. code-block:: default

    for x, y in train_data:
        print('Video frame size (batch, extra, channel, height, width):', x.shape)
        print('Video label:', y.shape)
        break


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Video frame size (batch, extra, channel, height, width): (5, 1, 3, 224, 224)
    Video label: (5,)


.. GENERATED FROM PYTHON SOURCE LINES 105-107

Here is the second example that randomly reads 25 videos each time, randomly selects one clip per video and
performs center cropping. A clip can contain N consecutive frames, e.g., N=5.

.. GENERATED FROM PYTHON SOURCE LINES 107-111

.. code-block:: default


    train_dataset = Kinetics400(train=True, new_length=5, transform=transform_train)
    train_data = DataLoader(train_dataset, batch_size=5, shuffle=True)


.. GENERATED FROM PYTHON SOURCE LINES 112-114

We can see the shape of our loaded data as below. Now we have another ``depth`` dimension which
indicates how many frames in each clip (a.k.a, the temporal dimension).

.. GENERATED FROM PYTHON SOURCE LINES 114-119

.. code-block:: default

    for x, y in train_data:
        print('Video frame size (batch, extra, channel, depth, height, width):', x.shape)
        print('Video label:', y.shape)
        break


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Video frame size (batch, extra, channel, depth, height, width): (5, 1, 3, 5, 224, 224)
    Video label: (5,)


.. GENERATED FROM PYTHON SOURCE LINES 120-121

Let's plot one training sample with 5 consecutive video frames. index 0 is image, 1 is label

.. GENERATED FROM PYTHON SOURCE LINES 121-146

.. code-block:: default

    from matplotlib import pyplot as plt
    # subplot 1 for video frame 1
    fig = plt.figure()
    fig.add_subplot(1,5,1)
    frame1 = train_dataset[150][0][0,:,0,:,:].transpose((1,2,0)).asnumpy()*255.0
    plt.imshow(frame1.astype('uint8'))
    # subplot 2 for video frame 2
    fig.add_subplot(1,5,2)
    frame2 = train_dataset[150][0][0,:,1,:,:].transpose((1,2,0)).asnumpy()*255.0
    plt.imshow(frame2.astype('uint8'))
    # subplot 3 for video frame 3
    fig.add_subplot(1,5,3)
    frame3 = train_dataset[150][0][0,:,2,:,:].transpose((1,2,0)).asnumpy()*255.0
    plt.imshow(frame3.astype('uint8'))
    # subplot 4 for video frame 4
    fig.add_subplot(1,5,4)
    frame4 = train_dataset[150][0][0,:,3,:,:].transpose((1,2,0)).asnumpy()*255.0
    plt.imshow(frame4.astype('uint8'))
    # subplot 5 for video frame 5
    fig.add_subplot(1,5,5)
    frame5 = train_dataset[150][0][0,:,4,:,:].transpose((1,2,0)).asnumpy()*255.0
    plt.imshow(frame5.astype('uint8'))
    # display
    plt.show()


.. image-sg:: /build/examples_datasets/images/sphx_glr_kinetics400_001.png
   :alt: kinetics400
   :srcset: /build/examples_datasets/images/sphx_glr_kinetics400_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 147-149

The last example is that we randomly read 5 videos each time, select 3 clips evenly per video and
performs center cropping. A clip contains 12 consecutive frames.

.. GENERATED FROM PYTHON SOURCE LINES 149-153

.. code-block:: default


    train_dataset = Kinetics400(train=True, new_length=12, num_segments=3, transform=transform_train)
    train_data = DataLoader(train_dataset, batch_size=5, shuffle=True)


.. GENERATED FROM PYTHON SOURCE LINES 154-156

We can see the shape of our loaded data as below. Now the ``extra`` dimension is 3, which indicates we
have three segments for each video.

.. GENERATED FROM PYTHON SOURCE LINES 156-161

.. code-block:: default

    for x, y in train_data:
        print('Video frame size (batch, extra, channel, depth, height, width):', x.shape)
        print('Video label:', y.shape)
        break


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Video frame size (batch, extra, channel, depth, height, width): (5, 3, 3, 12, 224, 224)
    Video label: (5,)


.. GENERATED FROM PYTHON SOURCE LINES 162-167

Read with VideoLoader
---------------------

In case you don't want to decode videos into frames, we provide a fast video loader, `Decord <https://github.com/zhreshold/decord>`_, to read the dataset.
We still use the utility class :py:class:`gluoncv.data.Kinetics400`. The usage is similar to using image loader.

.. GENERATED FROM PYTHON SOURCE LINES 169-171

For example, if we want to randomly read 5 videos, randomly selects one frame per video and
performs center cropping.

.. GENERATED FROM PYTHON SOURCE LINES 171-181

.. code-block:: default


    from gluoncv.utils.filesystem import import_try_install
    import_try_install('decord')

    # Since we are loading videos directly, we need to change the ``root`` location.
    # ``tiny_train_videos`` contains a small subset of Kinetics400 dataset, which is used for demonstration only.
    train_dataset = Kinetics400(root=os.path.expanduser('~/.mxnet/datasets/kinetics400/tiny_train_videos'), train=True,
                                transform=transform_train, video_loader=True, use_decord=True)
    train_data = DataLoader(train_dataset, batch_size=5, shuffle=True)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
    Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
    To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
    Collecting decord
      Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl (13.6 MB)
    Requirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from decord) (1.19.5)
    Installing collected packages: decord
    Successfully installed decord-0.6.0
    WARNING: You are using pip version 21.0.1; however, version 21.3.1 is available.
    You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.


.. GENERATED FROM PYTHON SOURCE LINES 182-183

We can see the shape of our loaded data as below.

.. GENERATED FROM PYTHON SOURCE LINES 183-188

.. code-block:: default

    for x, y in train_data:
        print('Video frame size (batch, extra, channel, height, width):', x.shape)
        print('Video label:', y.shape)
        break


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Video frame size (batch, extra, channel, height, width): (5, 1, 3, 224, 224)
    Video label: (5,)


.. GENERATED FROM PYTHON SOURCE LINES 189-191

Here is the second example that randomly reads 25 videos each time, randomly selects one clip per video and
performs center cropping. A clip can contain N consecutive frames, e.g., N=5.

.. GENERATED FROM PYTHON SOURCE LINES 191-197

.. code-block:: default


    train_dataset = Kinetics400(root=os.path.expanduser('~/.mxnet/datasets/kinetics400/tiny_train_videos'),
                                train=True, new_length=5, transform=transform_train,
                                video_loader=True, use_decord=True)
    train_data = DataLoader(train_dataset, batch_size=5, shuffle=True)


.. GENERATED FROM PYTHON SOURCE LINES 198-199

We can see the shape of our loaded data as below.

.. GENERATED FROM PYTHON SOURCE LINES 199-204

.. code-block:: default

    for x, y in train_data:
        print('Video frame size (batch, extra, channel, depth, height, width):', x.shape)
        print('Video label:', y.shape)
        break


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Video frame size (batch, extra, channel, depth, height, width): (5, 1, 3, 5, 224, 224)
    Video label: (5,)


.. GENERATED FROM PYTHON SOURCE LINES 205-207

The last example is that we randomly read 5 videos each time, select 3 clips evenly per video and
performs center cropping. A clip contains 12 consecutive frames.

.. GENERATED FROM PYTHON SOURCE LINES 207-213

.. code-block:: default


    train_dataset = Kinetics400(root=os.path.expanduser('~/.mxnet/datasets/kinetics400/tiny_train_videos'), train=True,
                                new_length=12, num_segments=3, transform=transform_train,
                                video_loader=True, use_decord=True)
    train_data = DataLoader(train_dataset, batch_size=5, shuffle=True)


.. GENERATED FROM PYTHON SOURCE LINES 214-215

We can see the shape of our loaded data as below.

.. GENERATED FROM PYTHON SOURCE LINES 215-220

.. code-block:: default

    for x, y in train_data:
        print('Video frame size (batch, extra, channel, depth, height, width):', x.shape)
        print('Video label:', y.shape)
        break


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Video frame size (batch, extra, channel, depth, height, width): (5, 3, 3, 12, 224, 224)
    Video label: (5,)


.. GENERATED FROM PYTHON SOURCE LINES 221-223

We also support other video loaders, e.g., OpenCV VideoReader, but Decord is significantly faster.
We refer the users to read the documentations for more information.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  7.742 seconds)


.. _sphx_glr_download_build_examples_datasets_kinetics400.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: kinetics400.py <kinetics400.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: kinetics400.ipynb <kinetics400.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_