After several leaks and rumors, Google finally unveiled the Pixel 5 and Pixel 4a 5G earlier this year in September. As expected, the devices came with a host of new Google Camera features that set them apart from other Android phones on the market. These include Cinematic Pan for shake-free panning on videos, Locked and Active Stabilization modes, Night Sight support in Portrait Mode, and a Portrait Light feature to adjust the lighting of portrait shots automatically. A few weeks after the launch, Google released most of these features for older Pixel devices via a Google Photos update. And now, the company has shared some details about the technology behind the Portrait Light feature.
As per a recent blog post from the company, the Portrait Light feature was inspired by the off-camera lights used by portrait photographers. It enhances portrait shots by modeling a repositionable light source that can be added to the scene. When the light source is added automatically, machine learning is used to set its direction and intensity so that it complements the photo's existing lighting.
As Google explains, the feature makes use of novel machine learning models that were trained using a diverse dataset of photographs captured in the Light Stage computational illumination system. These models enable two algorithmic capabilities:
- Automatic directional light placement: Based on the machine learning algorithm, the feature automatically places an artificial light source that is consistent with how a professional photographer would have placed an off-camera light source in the real world.
- Synthetic post-capture relighting: Based on the direction and intensity of the existing light in a portrait shot, the machine learning algorithm adds a synthetic light that looks realistic and natural.
For the automatic directional light placement, Google trained a machine learning model to estimate a high dynamic range, omnidirectional illumination profile for a scene based on an input portrait. This new lighting estimation model can find the direction, relative intensity, and color of all light sources in the scene coming from all directions, treating the face as a light probe. It also estimates the head pose of the subject using MediaPipe Face Mesh. Based on the aforementioned data, the algorithm then determines the direction for the synthetic light.
Once the synthetic light's direction and intensity are established, a second machine learning model adds the synthetic light source to the original photo. This model was trained using millions of pairs of portraits, both with and without extra lights. The dataset was generated by photographing seventy different people using the Light Stage computational illumination system, a spherical lighting rig that includes 64 cameras with different viewpoints and 331 individually-programmable LED light sources.
Each of the seventy subjects was captured while illuminated one-light-at-a-time (OLAT) by each of the 331 LEDs. This generated their reflectance field, i.e., their appearance as illuminated by the discrete sections of the spherical environment. The reflectance field encoded the unique color and light-reflecting properties of the subject’s skin, hair, and clothing and determined how shiny or dull each material appeared in the photos.
These OLAT images were then linearly added together to render realistic images of the subject as they would appear in any image-based lighting environment, with complex light transport phenomena like subsurface scattering correctly represented.
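Because light transport is linear, relighting from OLAT captures reduces to a weighted sum. The sketch below illustrates the idea with NumPy; the function name, array shapes, and toy data are illustrative assumptions, not Google's actual pipeline.

```python
import numpy as np

def relight_from_olat(olat_images, env_weights):
    """Render a subject under a new lighting environment by linearly
    combining one-light-at-a-time (OLAT) captures.

    olat_images: (n_lights, H, W, 3) linear-radiance OLAT photos.
    env_weights: (n_lights, 3) per-light RGB intensities sampled from
                 the target image-based lighting environment.
    """
    # Each pixel's radiance is a linear function of the lights, so a
    # weighted sum of the OLAT images reproduces the subject as lit by
    # the combined environment (summing over the n_lights axis).
    return np.einsum('nhwc,nc->hwc', olat_images, env_weights)

# Toy example: 331 lights illuminating tiny 4x4 "photos".
rng = np.random.default_rng(0)
olat = rng.random((331, 4, 4, 3))
weights = rng.random((331, 3))
relit = relight_from_olat(olat, weights)
print(relit.shape)  # (4, 4, 3)
```

Linearity is what makes this work: doubling every light's weight doubles the rendered image, and effects baked into the OLAT captures, such as subsurface scattering, are carried through the sum for free.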
Then, instead of training the machine learning algorithm to predict the output relit images directly, Google trained the model to output a low-resolution quotient image that could be applied to the original input image to produce the desired output. This method is computationally efficient and encourages only low-frequency lighting changes without impacting high-frequency image details that are directly transferred from the input image to maintain quality.
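The quotient-image step can be sketched as a per-pixel multiplication: upsample the low-resolution quotient and multiply it into the input. This is a minimal illustration under assumed array shapes (the function name and nearest-neighbor upsampling are my own choices, not the published implementation).

```python
import numpy as np

def apply_quotient_image(input_image, quotient_lowres):
    """Relight a full-resolution photo with a low-resolution quotient image.

    input_image:     (H, W, 3) original photo in linear color.
    quotient_lowres: (h, w, 3) predicted per-pixel lighting ratio, h << H.
    """
    H, W, _ = input_image.shape
    qh, qw, _ = quotient_lowres.shape
    # Upsample the quotient to full resolution (nearest-neighbor here;
    # a real pipeline would use smoother interpolation).
    rows = np.arange(H) * qh // H
    cols = np.arange(W) * qw // W
    quotient_full = quotient_lowres[rows[:, None], cols[None, :], :]
    # Multiplying applies the low-frequency lighting change while the
    # high-frequency detail comes directly from the input image.
    return input_image * quotient_full
```

Because the network only has to predict the low-resolution ratio rather than a full relit image, inference is cheap and fine detail cannot be smeared by the model.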
Furthermore, Google trained a machine learning model to emulate the optical behavior of light sources reflecting off relatively matte surfaces. To do so, the company trained the model to estimate the surface normals given the input photo and then applied Lambert’s law to compute a “light visibility map” for the desired lighting direction. This light visibility map is then provided as input to the quotient image predictor to ensure that the model is trained using physics-based insights.
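For a matte (Lambertian) surface, the "light visibility map" described above amounts to the clamped cosine between each surface normal and the light direction. Here is a minimal sketch of that computation; the function name and shapes are assumptions for illustration.

```python
import numpy as np

def light_visibility_map(normals, light_dir):
    """Compute a Lambertian light visibility map.

    normals:   (H, W, 3) unit surface normals estimated from the photo.
    light_dir: (3,) vector pointing toward the desired light source.
    """
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir = light_dir / np.linalg.norm(light_dir)
    # Lambert's cosine law: diffuse intensity is proportional to the
    # cosine of the angle between normal and light direction, clamped
    # at zero for surfaces facing away from the light.
    return np.clip(normals @ light_dir, 0.0, None)
```

Feeding this map to the quotient-image predictor gives the network a physics-based prior on where the new light should brighten the face, rather than leaving it to learn shading geometry from scratch.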
While all of this may seem like a lengthy process that would take the Pixel 5's mid-range hardware a fair bit of time to complete, Google claims that the Portrait Light feature was optimized to run at interactive frame rates on mobile devices, with a total model size of under 10MB.