What is Horovod?

Review of distributed training framework for TensorFlow, developed by Uber
20 October 2017

What is Horovod?

Horovod is a distributed training framework for TensorFlow. The goal of Horovod is to make distributed Deep Learning fast and easy to use.

Why not traditional Distributed TensorFlow?

The primary motivation for this project is to make it easy to take a single-GPU TensorFlow program and successfully train it on many GPUs faster. This has two aspects:

  1. How many modifications does one have to make to a program to make it distributed, and how easy is it to run?
  2. How much faster would it run in distributed mode?

Internally at Uber we found that it's much easier for people to understand an MPI model that requires minimal changes to source code than to understand how to set up regular Distributed TensorFlow.

To give some perspective on that, this commit into our fork of TF Benchmarks shows how much code can be removed if one doesn't need to worry about towers and manually averaging gradients across them, tf.train.Server(), tf.train.ClusterSpec(), tf.train.SyncReplicasOptimizer(), tf.train.replica_device_setter() and so on. If none of these things make sense to you, don't worry: you don't have to learn them if you use Horovod.
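
As a rough illustration of what "averaging gradients" means in data-parallel training, the sketch below simulates it in plain Python: each worker computes gradients on its own shard of data, and every worker then applies the element-wise average. This is a conceptual sketch only, not Horovod's implementation; the function name `average_gradients` and the sample values are hypothetical and chosen purely for illustration.

```python
from typing import List


def average_gradients(per_worker_grads: List[List[float]]) -> List[float]:
    """Return the element-wise mean of the gradient vectors from all workers."""
    if not per_worker_grads:
        raise ValueError("need gradients from at least one worker")
    num_workers = len(per_worker_grads)
    length = len(per_worker_grads[0])
    if any(len(g) != length for g in per_worker_grads):
        raise ValueError("all workers must report gradients of the same shape")
    # Sum each parameter's gradient across workers, then divide by the worker count.
    return [
        sum(g[i] for g in per_worker_grads) / num_workers
        for i in range(length)
    ]


# Hypothetical gradients from four workers (for example, four GPUs),
# each holding gradients for the same two parameters.
grads = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [7.0, 8.0],
]
averaged = average_gradients(grads)
print(averaged)  # [4.0, 5.0]
```

In a real multi-GPU job this averaging happens over the network on every training step, which is the bookkeeping the paragraph above says one no longer has to write by hand.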

In addition to being easy to use, Horovod is fast. Below is a chart showing a benchmark run on 32 servers with 4 Pascal GPUs each, connected by a RoCE-capable 25 Gbit/s network:

Horovod Benchmark

Horovod achieves 90% scaling efficiency for both Inception V3 and ResNet-101, and 79% scaling efficiency for VGG-16.
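
To make these percentages concrete, here is a small sketch that converts scaling efficiency into an effective speedup. It assumes the usual definition (throughput on N GPUs divided by N times the single-GPU throughput), which the text does not spell out, and it assumes the quoted figures refer to the full setup of 32 servers with 4 GPUs each. The function name `effective_speedup` is illustrative.

```python
def effective_speedup(num_gpus: int, efficiency: float) -> float:
    """Speedup over a single GPU.

    Assumes efficiency = throughput_N / (N * throughput_1),
    so speedup = throughput_N / throughput_1 = N * efficiency.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return num_gpus * efficiency


num_gpus = 32 * 4  # 32 servers with 4 GPUs each = 128 GPUs

# Efficiencies quoted in the text above.
reported = [("Inception V3", 0.90), ("ResNet-101", 0.90), ("VGG-16", 0.79)]
for model, efficiency in reported:
    speedup = effective_speedup(num_gpus, efficiency)
    print(f"{model}: roughly {speedup:.1f}x a single GPU")
```

Under these assumptions, 90% efficiency on 128 GPUs corresponds to about 115 times the throughput of one GPU, and 79% to about 101 times.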

While installing MPI and NCCL may seem like an extra hassle, it only needs to be done once by the team that handles infrastructure, while everyone else in the company who builds models can enjoy the simplicity of training them at scale.

Learn more at GitHub.

GNOME Reports on Its Fight With a Patent Troll

The GNOME Foundation is accused of infringing patent 9,936,086 in the Shotwell photo manager
22 October 2019

The GNOME Foundation reported on the steps it has taken to defend itself against the lawsuit brought by Rothschild Patent Imaging LLC, a patent troll. Rothschild Patent Imaging LLC offered to withdraw the lawsuit in exchange for the purchase of a license to use the patent in Shotwell. The license fee is a five-figure sum. Although buying a license would be the easiest way out, and litigation will cost a great deal of money and effort, the GNOME Foundation decided to reject the deal and fight to the end.

Agreeing to the deal would endanger other open source projects that could become targets of this patent troll. As long as the patent, which covers obvious and widely used methods of working with images, remains valid, it can be used as a weapon in further attacks. A dedicated fund, the GNOME Patent Troll Defense Fund, has been set up to finance GNOME's defense in court and the effort to invalidate the patent (for example, by proving earlier use of the technologies it describes).

Shearman & Sterling has been engaged to defend the GNOME Foundation and has already filed three documents with the court:

  1. A motion to dismiss the case entirely. The defense argues that the patent at issue is invalid and that the technologies it describes are not eligible for intellectual property protection in software.
  2. An answer to the complaint, which questions whether GNOME should be the defendant in such claims. The document attempts to show that the patent named in the lawsuit cannot be used to bring claims against Shotwell or any other free software.
  3. A counterclaim that will prevent Rothschild Patent Imaging LLC from backing out and picking a less stubborn victim once it realizes how serious GNOME is about invalidating the patent.

The lawsuit accuses the GNOME Foundation of infringing patent 9,936,086 in the Shotwell photo manager. The patent dates from 2008 and describes a technique for wirelessly connecting an image capture device (a phone or webcam) to an image-receiving device (a computer) and then selectively transmitting images filtered by date, location, and other parameters. According to the plaintiff, a program infringes the patent if it can import images from a camera, group them by certain criteria, and send them to external sites (for example, a social network or photo service).