This demonstrates that a $64 Intel Pentium G6400 processor, without any GPU or other hardware accelerator, is more than capable of running Google’s Inception-V2 based Mask R-CNN and processing a 3M-pixel real-time video at its native resolution of 2688×1520 (i.e. without down-sampling) and at the full frame rate of 20 fps. Such performance is equivalent to 53 Nvidia GeForce GTX TITAN X cards [1] or, using today’s most cost-effective GPUs, 61 Nvidia GeForce RTX 2060 cards paired with 61 AMD Ryzen 7 1700X processors, costing over $46,000 [2], which is more than 700 times the cost of the Intel Pentium G6400 processor used in this demo. Thus **the Videolytics.AI platform has achieved an astonishing 99.8% cost reduction!**

- Google reports that the running time of mask_rcnn_inception_v2_coco is 79 ms per 600×600 image on an Nvidia GeForce GTX TITAN X card, whose single-precision throughput is 6144 GFLOPS. To estimate the running time per 3M-pixel (2688×1520) image, the original resolution can be divided into 600×600 subsections overlapping by 300 pixels in both the vertical and horizontal dimensions, for a total of 32 subsections. This guarantees that any object smaller than the 300×300 overlap will be contained within at least one subsection. An object larger than 300×300, however, may not be contained within any single subsection and may be missed. To handle larger objects, a down-sampled image at ¼ resolution is also processed. In total, 33 runs of mask_rcnn_inception_v2_coco are needed per 3M-pixel image, taking 33 × 79 ms ≈ 2.6 seconds. To process the above 3M-pixel real-time video at the full 20 fps, it therefore takes server nodes with 53 Nvidia GeForce GTX TITAN X cards.
- As of early 2020, the most cost-effective GPU for deep learning is the Nvidia GeForce RTX 2060 [3] (list price $359), whose single-precision throughput is 5242 GFLOPS. Since the running time of mask_rcnn_inception_v2_coco per 3M-pixel (2688×1520) image is 2.6 seconds on an Nvidia GeForce GTX TITAN X, whose single-precision throughput is 6144 GFLOPS, it is estimated to be 3.1 seconds on an Nvidia GeForce RTX 2060, assuming running time scales inversely with throughput. In a cost-effective system, the GeForce RTX 2060 is usually paired with an AMD Ryzen 7 1700X [4] (list price $400). To process a 3M-pixel camera at the full frame rate of 20 fps, it takes 61 pairs of AMD Ryzen 7 1700X and Nvidia GeForce RTX 2060, at a total cost of $46,299.
- In February 2020, Lambda tested the latest NVIDIA GPUs for deep learning and showed that the GeForce RTX 2060 is the most cost-effective in terms of the number of images processed per second.
- In one study of hardware for deep learning, “RTX 2060 Vs GTX 1080Ti Deep Learning Benchmarks”, the Nvidia GeForce RTX 2060 is paired with an AMD Ryzen 7 1700X.
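
The arithmetic behind the estimates above can be retraced with a short script. This is only a sketch of the calculation as described in the notes, not part of the Videolytics.AI platform. All input figures are taken from the text; the function name `estimate` and the constant names are made up for illustration. Two assumptions are worth flagging: running time is assumed to scale inversely with single-precision throughput, and the RTX 2060 count is rounded to the nearest integer, which reproduces the quoted 61 (rounding up would give 62).

```python
import math

# Figures quoted in the notes above.
MS_PER_RUN_TITAN_X = 79        # ms per 600x600 image on a GTX TITAN X
RUNS_PER_FRAME = 32 + 1        # 32 overlapping subsections + 1 quarter-resolution pass
FPS = 20                       # full frame rate of the 2688x1520 video
TITAN_X_GFLOPS = 6144          # single-precision throughput
RTX_2060_GFLOPS = 5242         # single-precision throughput
RTX_2060_PRICE = 359           # USD, list price
RYZEN_1700X_PRICE = 400        # USD, list price
PENTIUM_G6400_PRICE = 64       # USD


def estimate():
    """Retrace the GPU-equivalence and cost estimates (illustrative helper)."""
    # Seconds per 3M-pixel frame on one GTX TITAN X.
    titan_seconds = RUNS_PER_FRAME * MS_PER_RUN_TITAN_X / 1000
    # Cards needed to keep up with 20 fps, rounded up.
    titan_cards = math.ceil(titan_seconds * FPS)

    # Assumption: running time scales inversely with GFLOPS.
    rtx_seconds = titan_seconds * TITAN_X_GFLOPS / RTX_2060_GFLOPS
    # Rounded to nearest to match the quoted figure; ceil would give one more.
    rtx_pairs = round(rtx_seconds * FPS)

    total_cost = rtx_pairs * (RTX_2060_PRICE + RYZEN_1700X_PRICE)
    cost_ratio = total_cost / PENTIUM_G6400_PRICE
    cost_reduction = 1 - PENTIUM_G6400_PRICE / total_cost

    return {
        "titan_seconds": titan_seconds,
        "titan_cards": titan_cards,
        "rtx_seconds": rtx_seconds,
        "rtx_pairs": rtx_pairs,
        "total_cost": total_cost,
        "cost_ratio": cost_ratio,
        "cost_reduction": cost_reduction,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:.4g}" if isinstance(value, float) else f"{name}: {value}")
```

Keeping the inputs as named constants makes it easy to substitute a different GPU price or throughput and see how the comparison changes.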