Resizing/Converting Frames

Efficiently resizing Frames and converting to RGB-based pixel formats, layouts and data types on the GPU.

Most ML models are trained on RGB frames at a very small resolution, which is not something the Camera natively produces. To use these models, you must first convert your Frame to RGB and resize it to match the input tensor's resolution.

For example, the BlazeFace ML model declares its input tensor size as follows:

Input: [1, 3, 128, 128] (float32)

This notation typically refers to NCHW:

  • N: Number of batches = 1
  • C: Number of channels per pixel = 3 (implying RGB)
  • H: Height of the image = 128 pixels
  • W: Width of the image = 128 pixels
  • float32: The data type of each scalar value = float32
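As a sanity check, the expected input buffer size can be computed from these dimensions. A minimal sketch (nchwByteSize is a hypothetical helper for illustration, not part of any library):

```typescript
// Hypothetical helper - computes the byte size of an NCHW float32 tensor.
const BYTES_PER_FLOAT32 = 4

function nchwByteSize(n: number, c: number, h: number, w: number): number {
  // Total number of scalars times bytes per scalar (float32 = 4 bytes)
  return n * c * h * w * BYTES_PER_FLOAT32
}

// BlazeFace's [1, 3, 128, 128] float32 tensor:
nchwByteSize(1, 3, 128, 128) // → 196608 bytes (192 KiB)
```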

No Camera natively produces 128x128 RGB Images, especially not in float32 (as uint8 is much more compact), so we have to convert Frames manually.

VisionCamera includes a GPU-accelerated resizing and pixel-format conversion pipeline in the react-native-vision-camera-resizer package.

Dependency Required

The Resizer requires react-native-vision-camera-resizer to be installed.

Using the resizer

To create a Resizer, use useResizer(...) and configure its width, height, channelOrder, dataType and pixelLayout:

import { useResizer } from 'react-native-vision-camera-resizer'

function App() {
  const { resizer } = useResizer({
    width: 128,
    height: 128,
    channelOrder: 'rgb',
    dataType: 'float32',
    pixelLayout: 'planar',
  })
}

Alternatively, outside of a React component, create a Resizer imperatively with createResizer(...):

import { createResizer } from 'react-native-vision-camera-resizer'

const resizer = await createResizer({
  width: 128,
  height: 128,
  channelOrder: 'rgb',
  dataType: 'float32',
  pixelLayout: 'planar',
})

Then, in your Frame Processor, call resize(...):

const { resizer } = ...
const frameOutput = useFrameOutput({
  pixelFormat: 'yuv',
  onFrame(frame) {
    'worklet'
    const resized = resizer.resize(frame)
    const pixels = resized.getPixelBuffer()
    // call BlazeFace with `pixels` now
    resized.dispose()
    frame.dispose()
  }
})

Alternatively, with the imperative API:

const resizer = ...
const frameOutput = HybridCameraFactory.createFrameOutput({
  // ...options
  pixelFormat: 'yuv',
  onFrame(frame) {
    'worklet'
    const resized = resizer.resize(frame)
    const pixels = resized.getPixelBuffer()
    // call BlazeFace with `pixels` now
    resized.dispose()
    frame.dispose()
  }
})

The returned GPUFrame holds the resized buffer (see getPixelBuffer()) in the target width/height, channel order, pixel layout and data type - which you can then pass to your ML model, for example to ONNX Runtime, TFLite, or another inference runtime. Make sure to dispose() the GPUFrame once you are done using it, otherwise the pipeline stalls.
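To guarantee dispose() runs even when inference throws, the resize call can be wrapped in a try/finally. A minimal sketch using a stand-in DisposableFrame type and a hypothetical withFrame helper (neither is part of the resizer API):

```typescript
// Stand-in for a disposable GPU frame - the real type comes from
// react-native-vision-camera-resizer.
interface DisposableFrame {
  dispose(): void
  disposed: boolean
}

// Runs `fn` with the frame and disposes it afterwards, even on error,
// so the pipeline never stalls on a leaked frame.
function withFrame<T>(frame: DisposableFrame, fn: (f: DisposableFrame) => T): T {
  try {
    return fn(frame)
  } finally {
    frame.dispose()
  }
}

// Mock frame to demonstrate the pattern:
const frame: DisposableFrame = {
  disposed: false,
  dispose() { this.disposed = true },
}
withFrame(frame, () => { /* run inference here */ })
// frame.disposed is now true
```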

Availability and Fallbacks

The Resizer pipeline is GPU-accelerated using Metal on iOS, and Vulkan on Android. On Android, the Resizer pipeline requires Vulkan extensions such as VK_ANDROID_external_memory_android_hardware_buffer or VK_EXT_queue_family_foreign, which are often only available on Android 8.0+.

To check if the current device supports the GPU-accelerated Resizer pipeline, use isResizerAvailable() and fall back to a custom CPU implementation otherwise:

import { isResizerAvailable, createResizer } from 'react-native-vision-camera-resizer'

async function getResizer(options: ResizerOptions): Promise<Resizer | CPUFallbackResizer> {
  if (isResizerAvailable()) {
    // The GPU-accelerated Resizer pipeline is available!
    return await createResizer(options)
  } else {
    // The GPU-accelerated Resizer pipeline is not available on this device,
    // fall back to a CPU implementation.
    return ...
  }
}

Input Frame Constraints

Input Frame Pixel Formats

The Resizer expects the input Frame to be a GPU-backed buffer, so it's recommended to configure the CameraFrameOutput to stream Frames in pixelFormat='yuv'.

For maximum performance, you can set pixelFormat='native', but only if your currently selected CameraFormat's nativePixelFormat is either 'yuv-420-8-bit-full' or 'private', otherwise the input Frame might use a pixel format not supported by the Resizer (like 'raw-bayer-packed96-12-bit').

Warning

Do not set the CameraFrameOutput's pixelFormat to 'rgb', as this performs an unnecessary conversion in the Camera pipeline. The Resizer already converts to RGB, which is faster because it's GPU accelerated.

Input Frame Size

The bigger the Frame is, the more work the Resizer has to do. To speed up the pipeline, it is recommended to select a CameraFormat with a resolution only as large as needed. A good default is a format close to the screen's size:

const device = ...
const format = useCameraFormat(device, Templates.Preview)

Alternatively, with the imperative API:

const device = ...
const format = getCameraFormat(device, Templates.Preview)

Output Configurations

Float vs Int

Most ML models are trained on float32 data types, which means each value in a pixel is a 32-bit float ranging from 0.0 to 1.0. While GPUs (and NPUs) are optimized for floating-point data types, it may be worth quantizing the model to use uint8 instead, as it is much more compact in memory (4x smaller), which could improve performance and overall device thermals.
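To illustrate the size difference, the sketch below maps normalized float32 values to uint8, shrinking each value from 4 bytes to 1 (float01ToUint8 is a hypothetical helper, not library code):

```typescript
// Hypothetical helper - maps normalized floats (0.0..1.0) to uint8 (0..255).
function float01ToUint8(values: Float32Array): Uint8Array {
  const out = new Uint8Array(values.length)
  for (let i = 0; i < values.length; i++) {
    // Clamp to 0..1, then scale to 0..255 and round to the nearest integer
    out[i] = Math.round(Math.min(Math.max(values[i], 0), 1) * 255)
  }
  return out
}

const floats = new Float32Array([0.0, 0.5, 1.0])
const bytes = float01ToUint8(floats)
// floats.byteLength === 12, bytes.byteLength === 3 - 4x smaller
```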

Tip

See ChannelOrder and DataType for the supported output formats the Resizer can convert to.

Pixel Layout

The pixelLayout specifies how the individual channels per pixel are arranged in memory.

  • 'interleaved' stores complete pixels next to each other, which corresponds to (N)HWC:

    RGBRGBRGBRGB
    RGBRGBRGBRGB
    RGBRGBRGBRGB

    Use 'interleaved' if your ML model input tensor is (N)HWC shaped - e.g.: [1, H, W, 3].

  • 'planar' stores one full channel plane after the other, which corresponds to (N)CHW:

    RRRRRRRRRRRR
    GGGGGGGGGGGG
    BBBBBBBBBBBB

    Use 'planar' if your ML model input tensor is (N)CHW shaped - e.g.: [1, 3, H, W].
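The two layouts imply different flat-buffer indexing. A sketch of the index math (interleavedIndex and planarIndex are assumed helper names, not library code):

```typescript
// Flat-buffer index of pixel (x, y), channel c, in an interleaved (HWC) buffer:
// rows of complete pixels, channels packed together per pixel.
function interleavedIndex(x: number, y: number, c: number, width: number, channels: number): number {
  return (y * width + x) * channels + c
}

// Flat-buffer index in a planar (CHW) buffer:
// one full channel plane after another.
function planarIndex(x: number, y: number, c: number, width: number, height: number): number {
  return c * (height * width) + y * width + x
}

// For a 128x128 RGB image, the G value (c = 1) of pixel (2, 0):
interleavedIndex(2, 0, 1, 128, 3) // → 7
planarIndex(2, 0, 1, 128, 128)    // → 16386
```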

Tip

See PixelLayout for a list of all memory pixel layouts.

Scale Mode

When the Frame's aspect ratio doesn't match the Resizer's output aspect ratio, the Resizer either has to scale the Frame to 'cover' the output (which crops out any overflow), or 'contain' the Frame inside the output (which adds black bars around underflowing areas).
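The underlying math: 'cover' scales by the larger of the two axis ratios, 'contain' by the smaller. A sketch (scaleFactor is a hypothetical helper, not part of the resizer API):

```typescript
// Hypothetical helper - computes the uniform scale factor applied to a
// srcW x srcH Frame to fit it into a dstW x dstH output.
function scaleFactor(
  srcW: number, srcH: number,
  dstW: number, dstH: number,
  mode: 'cover' | 'contain'
): number {
  const sx = dstW / srcW
  const sy = dstH / srcH
  // 'cover' picks the larger factor (overflow is cropped),
  // 'contain' picks the smaller one (black bars fill the rest).
  return mode === 'cover' ? Math.max(sx, sy) : Math.min(sx, sy)
}

// A 1920x1080 Frame resized into a 128x128 output:
scaleFactor(1920, 1080, 128, 128, 'cover')   // ≈ 0.1185 (128/1080)
scaleFactor(1920, 1080, 128, 128, 'contain') // ≈ 0.0667 (128/1920)
```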

Tip

See ScaleMode for more information.

Orientation and Mirroring

The Resizer automatically counter-rotates and possibly counter-mirrors the Frame to be in its intended up-right and non-mirrored presentation.

It is recommended to set enablePhysicalBufferRotation to false to prevent the Camera from performing this on its own, as the Resizer is GPU-accelerated and typically much faster.

Multiple Resizers

It is fine to run multiple Resizer instances if you have multiple different ML models, although it is recommended to convert your ML models to a common input format to avoid duplicating conversion overhead. You can even run Resizers in parallel - see "Async Frame Processing" for more information.