Have you ever wanted to try the ludicrous power of 24 CPU cores? I sure have! In this post, I experiment with Google Compute Engine to transcode H264 files to H265.
I have a lot of 1080p H264 movies, and they take up too much space: a 30-minute movie takes almost 1 GB of storage. However, I found out that converting those files to H265 can save more than 70% of the space at the same quality. Good news, right? Well, the hard pill to swallow is that we are trading space for computational complexity. On my laptop, with an Intel Core i7-6500U (2 cores, 4 threads) and 8 GB of memory, encoding takes twice as long as the movie itself. For example, a 1-hour movie takes 2 hours to encode. I have 24 files, each 30 minutes long, that I want to convert. That means it will take a full day to convert everything!
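To make that arithmetic explicit, here is a small Python sketch. The function names are my own, and the inputs (about 1 GB per file, 70% savings, encoding at 2x the video duration) are the rough figures from the paragraph above, so treat the results as ballpark numbers.

```python
# Rough estimate of total encoding time and storage after conversion,
# using the approximate figures from the text.

def total_encode_hours(num_files: int, minutes_per_file: float, time_ratio: float) -> float:
    """Wall-clock hours to encode num_files one after another.

    time_ratio is encoding time divided by video duration
    (2.0 means an encode takes twice as long as the video plays).
    """
    return num_files * minutes_per_file * time_ratio / 60


def space_after_gb(num_files: int, gb_per_file: float, savings: float) -> float:
    """Total storage after conversion, given a fractional saving (0.7 = 70%)."""
    return num_files * gb_per_file * (1 - savings)


print(total_encode_hours(24, 30, 2.0))         # 24.0 hours, i.e. a full day
print(round(space_after_gb(24, 1.0, 0.7), 1))  # 7.2 GB instead of about 24 GB
```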
Surely there is a better solution, right? For example, leveraging GPU capabilities. I got a 2x speed-up by using Intel's hardware encoding solution, Intel QuickSync. The problem is that GPU-based encoding tends to produce lower image quality and bigger files. When I encoded a 1 GB file, I got an 80 MB file from CPU encoding and a 300 MB file from Intel QuickSync, both with similar presets. A 220 MB difference might not seem like much, but with a lot of files it piles up quickly. Another option is to buy a GPU with a newer, better, faster chipset. I am stingy, though, and can't justify spending money on a new GPU.
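The sketch below scales that single 80 MB vs 300 MB measurement to a whole library. It assumes every file behaves like the one I tested, which real content will not do exactly. The constant and function names are illustrative.

```python
# Extra storage from QuickSync encoding compared with CPU (libx265) encoding,
# assuming every source file shrinks like the single 1 GB test file did.

CPU_MB = 80   # output size from CPU encoding (one measurement)
QSV_MB = 300  # output size from Intel QuickSync (one measurement)


def extra_storage_gb(num_files: int, cpu_mb: float = CPU_MB, qsv_mb: float = QSV_MB) -> float:
    """Additional storage in GB (1 GB = 1000 MB) if every file uses QuickSync."""
    return num_files * (qsv_mb - cpu_mb) / 1000


print(extra_storage_gb(1))   # 0.22
print(extra_storage_gb(24))  # 5.28
```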
Fortunately, I heard that Google gives free credit (300 Euro) for testing their services. So why not use Google Compute Engine for the encoding? After all, we can pick a virtual machine with a powerful CPU. For most people, internet upload speed may make this solution impractical before we even test it. Regardless, I am curious whether Google Compute Engine can give me good speed when encoding H264 files to H265.
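To see why upload speed matters, here is a minimal sketch that estimates upload time. The 10 Mbit/s figure is a hypothetical example, not a measurement from this post. The calculation also ignores protocol overhead, so real transfers will take somewhat longer.

```python
# Time needed to upload the source files before cloud encoding can start.
# The upload speed used below is hypothetical; plug in your own connection.


def upload_hours(total_gb: float, upload_mbit_per_s: float) -> float:
    """Hours to upload total_gb gigabytes (1 GB = 8000 Mbit) at a steady rate."""
    return total_gb * 8000 / upload_mbit_per_s / 3600


# 24 files of roughly 1 GB each at a hypothetical 10 Mbit/s upload speed
print(round(upload_hours(24, 10), 2))  # 5.33
```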
First, let's decide which type of virtual machine to use in Google Compute Engine. You can view the available types here: https://cloud.google.com/compute/docs/machine-types. I chose the high-CPU machine type for this experiment because H265 encoding is a CPU-bound process. I opted for a spot instance (https://medium.com/google-cloud/google-clouds-spot-instances-win-big-and-you-should-too-5b244ca3facf) because it is cheaper than an on-demand instance and suits batch processing like this. The machines run in the europe-west4-a zone. I performed the experiment with various core counts. The table below shows the exact machine configurations that I used for encoding:
You might object that this is not a fair test because I used different memory sizes. However, as you increase the core count, the minimum memory size you can pick also increases.
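The sketch below shows how memory limits scale with core count. The per-vCPU bounds are an assumption based on my understanding of Google's documentation for N1 machines: the predefined n1-highcpu types pair each vCPU with 0.9 GB, and N1 custom types allow 0.9 GB to 6.5 GB per vCPU. Limits differ by machine family and change over time, so check the machine-types page linked above.

```python
# Assumed per-vCPU memory bounds for N1 machines (n1-highcpu uses the minimum).
# Verify against current Compute Engine documentation; this ignores rounding rules.
from typing import Tuple

MIN_GB_PER_VCPU = 0.9
MAX_GB_PER_VCPU = 6.5


def memory_range_gb(vcpus: int) -> Tuple[float, float]:
    """Return the (minimum, maximum) memory in GB allowed for a vCPU count."""
    return (round(vcpus * MIN_GB_PER_VCPU, 1), round(vcpus * MAX_GB_PER_VCPU, 1))


for vcpus in (4, 8, 16, 24):
    low, high = memory_range_gb(vcpus)
    print(f"{vcpus:>2} vCPUs: {low} GB to {high} GB")
```

Because the minimum grows linearly with the vCPU count, a 24-core machine cannot be configured with as little memory as a 4-core one.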
I used FFmpeg version 4.0.2 to encode a 30-minute file with the following settings:
ffmpeg -i input.mkv -c:v libx265 -q:v 22 -c:a libopus output.mkv
The -c:v option selects the video codec; here we choose the libx265 encoder available in FFmpeg. The -q:v option sets the target video quality: lower values mean higher quality at the expense of more storage. The recommended value is actually 28, but I chose 22 because 28 gave me unpleasant artifacts. Note that for libx265, FFmpeg's documented constant-quality option is -crf (28 is the x265 default), so -crf 22 is the more explicit way to express this setting.
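To run this over all 24 files without babysitting the terminal, a small Python wrapper can build the same command for each file. This is a minimal sketch, not the script I used. The folder layout and the `-h265` output suffix are hypothetical, and `encode_all()` needs ffmpeg on your PATH. The quality flag is a parameter so you can swap `-q:v` for `-crf`.

```python
# Batch-encode a folder of .mkv files with the ffmpeg settings from this post.
# Folder layout and "-h265" output suffix are hypothetical choices.
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(src: Path, dst: Path, quality: int = 22, quality_flag: str = "-q:v") -> List[str]:
    """Return the ffmpeg argument list for one file.

    quality_flag defaults to -q:v as in the original command; pass "-crf"
    to use libx265's constant rate factor option instead.
    """
    return [
        "ffmpeg",
        "-i", str(src),
        "-c:v", "libx265",
        quality_flag, str(quality),
        "-c:a", "libopus",
        str(dst),
    ]


def encode_all(folder: Path, quality_flag: str = "-q:v") -> None:
    """Encode every .mkv in folder one after another, skipping finished outputs."""
    for src in sorted(folder.glob("*.mkv")):
        if src.stem.endswith("-h265"):
            continue  # this is an output of a previous run, not a source
        dst = src.with_name(f"{src.stem}-h265.mkv")
        if dst.exists():
            continue  # already converted
        # check=True stops the batch if ffmpeg reports an error
        subprocess.run(build_ffmpeg_cmd(src, dst, quality_flag=quality_flag), check=True)
```

For example, `encode_all(Path("movies"), quality_flag="-crf")`. The files run sequentially because x265 already spreads a single encode across multiple threads.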
The -c:a option selects the audio codec. libopus is the encoder library for the Opus codec, a high-performance audio format that offers better sound quality at lower bitrates than MP3, Ogg Vorbis, or AAC. For reference, MP3 typically needs 320 kb/s for CD-like quality, while Opus can achieve the same result at only 96 kb/s.
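Here is what those two bitrates mean for the audio track of one 30-minute file. This is a simple constant-bitrate estimate that ignores container overhead.

```python
# Approximate audio stream size at the bitrates quoted above
# (320 kb/s MP3 vs 96 kb/s Opus), ignoring container overhead.


def audio_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Size in MB (1 MB = 1000 kB) of a constant-bitrate audio stream."""
    return bitrate_kbps * minutes * 60 / 8 / 1000


print(audio_size_mb(320, 30))  # 72.0
print(audio_size_mb(96, 30))   # 21.6
```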
From the graph, we can see that we only get faster-than-real-time encoding with 24 CPU cores. The biggest performance jump comes from going from 4 to 8 CPU cores. At 4 cores, encoding is actually twice as slow as on my 2-core laptop. This is interesting because Google Compute Engine runs on much faster Intel Xeon Skylake processors. It suggests that a "CPU core" in the virtual machine does not perform like a core of the underlying physical CPU. One likely reason is that on Compute Engine each vCPU is a single hardware hyper-thread, so a 4-vCPU machine gets only 2 physical cores. The host machine may also be shared with other tenants.
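"Real time" here means the speed factor that FFmpeg prints while encoding (speed=1x). The sketch below converts the ratios from the text into that factor. The 4-vCPU value is derived from "two times slower than my laptop", not read from the graph.

```python
# Convert "encoding time relative to video length" into the speed factor
# FFmpeg reports (speed=1x means real time), using ratios from the text.


def speed_factor(video_minutes: float, encode_minutes: float) -> float:
    """Media time processed per unit of wall-clock time."""
    return video_minutes / encode_minutes


laptop = speed_factor(60, 120)  # a 1-hour movie takes 2 hours
four_vcpu = laptop / 2          # "two times slower than my laptop"
print(laptop, four_vcpu)        # 0.5 0.25
print(four_vcpu >= 1.0)         # False: well below real time
```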
The remaining performance gains are much smaller than the jump from 4 to 8 CPU cores. Going from 8 to 16 CPU cores improves the speed by only about 0.33x, and going from 16 to 24 by only about 0.25x. This could be because the encoder's algorithm design prevents it from benefiting from additional cores. The following pictures show that with 24 CPU cores, the per-core load was lower than on the 16-core machine. From 16 CPU cores upward, the CPU is no longer used at 100%.
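A generic way to reason about these diminishing returns is Amdahl's law. It models the work that cannot run in parallel, and it is a textbook model, not a description of x265's internals. The parallel fraction `p = 0.9` below is a hypothetical value for illustration, not something measured in this experiment.

```python
# Amdahl's law: ideal speedup when only a fraction p of the work parallelizes.
# p = 0.9 is a hypothetical value, not fitted to this experiment's results.


def amdahl_speedup(p: float, cores: int) -> float:
    """Ideal speedup over one core when a fraction p of the work parallelizes."""
    return 1 / ((1 - p) + p / cores)


p = 0.9
previous = None
for cores in (4, 8, 16, 24):
    s = amdahl_speedup(p, cores)
    gain = "" if previous is None else f" (x{s / previous:.2f} vs previous)"
    print(f"{cores:>2} cores: {s:.2f}x{gain}")
    previous = s
```

Even with 90% of the work parallel, each step up in core count buys less than the one before. That matches the trend in the graph, although the real cause in x265 may differ.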
High-CPU Google Compute Engine machines can accelerate transcoding, but only when we use more than 8 CPU cores. Below that, if encoding speed is all you care about, you may be better off using your own computer. Machines with more than 8 CPU cores are prohibitively expensive, and that does not include the time and money needed to transfer your files to Google Compute Engine. The conclusion? It is impractical. Don't do it.
Note that in this experiment, I used the same FFmpeg parameters on every machine. Parameters better optimized for multi-core transcoding might exist. Your mileage may vary.
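If you want to try that, FFmpeg can pass options straight to x265 through `-x265-params`. The sketch below builds such a command. `pools` and `frame-threads` are real x265 threading options, but the values are hypothetical starting points to benchmark, not tuned results. The sketch also uses `-crf` instead of the `-q:v` from my runs.

```python
# Build an ffmpeg command that passes x265 threading options via -x265-params.
# The option values are hypothetical and should be benchmarked per machine.
from typing import Dict, List


def x265_params(options: Dict[str, object]) -> str:
    """Join options into the key=value:key=value form that -x265-params expects."""
    return ":".join(f"{key}={value}" for key, value in options.items())


def build_tuned_cmd(src: str, dst: str, options: Dict[str, object]) -> List[str]:
    """ffmpeg arguments for libx265 at CRF 22 with extra x265 options."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265", "-crf", "22",
        "-x265-params", x265_params(options),
        "-c:a", "libopus",
        dst,
    ]


print(" ".join(build_tuned_cmd("input.mkv", "output.mkv", {"pools": 24, "frame-threads": 4})))
```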