In the era of Edge AI, privacy and performance are no longer a trade-off. When dealing with sensitive health data, such as skin imaging, users are increasingly wary of uploading photos to a central server. This is where Computer Vision on the browser becomes a game-changer.
By leveraging TensorFlow.js, MediaPipe, and WebGPU, we can build a privacy-first AI application that performs real-time skin lesion segmentation and feature extraction directly on the client's device. No data leaves the browser, and the inference is lightning-fast thanks to hardware acceleration.
In this tutorial, we’ll explore how to combine the structural power of MediaPipe with the classification prowess of MobileNetV3 to create a preliminary skin health screening tool.
Running models locally reduces latency and eliminates server costs. We use MediaPipe to isolate the area of interest (segmentation) and then pass that specific region to a lightweight MobileNetV3 model for feature analysis.
```mermaid
graph TD
    A[Webcam Stream] --> B{MediaPipe Segmenter}
    B -->|Isolate Skin Region| C[ROI Extraction]
    C --> D[MobileNetV3 Inference]
    D --> E[WebGL/WebGPU Acceleration]
    E --> F[React UI Overlay]
    F --> G[Real-time Feedback]
```
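The "ROI Extraction" node in the diagram is worth making concrete. One simple approach — a sketch, with `computeROIBounds` as a hypothetical helper name — scans the segmentation mask (where 0 means background) for the bounding box of segmented pixels:

```javascript
// Hypothetical helper for the "ROI Extraction" stage: given a flat
// per-pixel category mask (0 = background), find the bounding box
// of the segmented region so we can crop it for classification.
function computeROIBounds(mask, width, height) {
  let minX = width, minY = height, maxX = -1, maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y * width + x] > 0) {
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
      }
    }
  }
  if (maxX < 0) return null; // nothing was segmented in this frame
  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}
```

The returned rectangle can be fed straight into `ctx.drawImage` to crop the region onto an offscreen canvas before inference.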
To follow along, you'll need a basic understanding of React and these libraries:

- `@tensorflow/tfjs` for in-browser model inference.
- `@mediapipe/tasks-vision` (or the older `@mediapipe/selfie_segmentation`) for ROI detection.

First, let's set up our React component and initialize the MediaPipe segmenter. This allows us to "see" the user's skin and ignore the background.
```jsx
import React, { useRef, useEffect } from 'react';
import { ImageSegmenter, FilesetResolver } from "@mediapipe/tasks-vision";

const SkinScreener = () => {
  const videoRef = useRef(null);
  const canvasRef = useRef(null);

  const initSegmenter = async () => {
    // Fetch the WASM runtime that backs the vision tasks
    const vision = await FilesetResolver.forVisionTasks(
      "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
    );
    const segmenter = await ImageSegmenter.createFromOptions(vision, {
      baseOptions: {
        modelAssetPath: "path/to/selfie_segmenter.tflite",
        delegate: "GPU" // 🚀 Using WebGL/WebGPU
      },
      runningMode: "VIDEO",
      outputCategoryMask: true,
    });
    return segmenter;
  };

  // ... useEffect logic to start webcam

  return (
    <>
      <video ref={videoRef} autoPlay playsInline muted />
      <canvas ref={canvasRef} />
    </>
  );
};

export default SkinScreener;
```
Once we have the region of interest (ROI), we feed it into MobileNetV3. MobileNetV3 is optimized for mobile CPUs and edge devices, making it perfect for our use case.
```javascript
import * as tf from '@tensorflow/tfjs';

// Load the pre-trained MobileNetV3 model once and reuse it across calls
const modelPromise = tf.loadGraphModel('model/mobilenet_v3_skin/model.json');

const runInference = async (roiCanvas) => {
  const model = await modelPromise;

  // Pre-process the image: resize to the model's input size and normalize
  const tensor = tf.tidy(() =>
    tf.browser.fromPixels(roiCanvas)
      .resizeNearestNeighbor([224, 224])
      .expandDims(0)
      .div(255.0)
  );

  const prediction = model.predict(tensor);
  const data = await prediction.data();

  // Free the GPU memory backing the intermediate tensors
  tensor.dispose();
  prediction.dispose();

  // Return the raw per-class confidence scores
  return data;
};
```
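The raw score array isn't very readable on its own. A small helper can map it to a label and confidence — note that the label names below are purely illustrative placeholders, not the classes of any real model:

```javascript
// Hypothetical label set — replace with the classes your model was trained on.
const LABELS = ["benign", "atypical", "needs_review"];

// Pick the highest-confidence class from the score array
// returned by runInference (a Float32Array of probabilities).
function topPrediction(scores, labels = LABELS) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return { label: labels[best], confidence: scores[best] };
}
```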
While this tutorial covers the basics of edge inference, production-grade medical AI requires more robust pipelines, including better data augmentation and specialized quantization techniques to shrink models without losing accuracy.
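As one concrete example of the quantization step, the `tensorflowjs_converter` CLI can emit float16 weights, roughly halving model size with minimal accuracy loss. This is a sketch — the paths are placeholders and flag names vary between converter versions, so check the docs for the version you have installed:

```shell
# Convert a TF SavedModel into a TF.js graph model with float16 weights.
# Paths below are placeholders; --quantize_float16 halves weight storage.
tensorflowjs_converter \
  --input_format=tf_saved_model \
  --output_format=tfjs_graph_model \
  --quantize_float16 \
  ./saved_model_dir \
  ./model/mobilenet_v3_skin
```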
For a deeper dive into production-ready AI patterns, advanced model optimization, and deployment strategies for high-performance vision systems, I highly recommend checking out the official technical deep-dives at WellAlly Blog. It’s a fantastic resource for developers looking to bridge the gap between "it works on my machine" and "it scales for millions."
We want to give the user immediate feedback. By using requestAnimationFrame, we can create a seamless loop that updates the UI as the camera moves.
```javascript
let frameCount = 0;

const processFrame = async (segmenter, model) => {
  if (videoRef.current.readyState >= 2) {
    const result = await segmenter.segmentForVideo(videoRef.current, performance.now());

    // Draw the mask on our canvas
    const ctx = canvasRef.current.getContext('2d');
    drawMask(ctx, result.categoryMask);

    // Analyze every 30th frame to save battery
    frameCount++;
    if (frameCount % 30 === 0) {
      const screeningResult = await runInference(canvasRef.current);
      updateUI(screeningResult);
    }
  }
  requestAnimationFrame(() => processFrame(segmenter, model));
};
```
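The `drawMask` helper used in the loop isn't shown above; here's a minimal sketch. It assumes MediaPipe's `MPMask` API (`getAsUint8Array`, `width`, `height`, `close`) and treats any non-zero category as the segmented region — the green tint is an arbitrary choice:

```javascript
// Pure part: turn per-pixel category indices into RGBA pixels.
// Any non-zero category is tinted; background (0) stays transparent.
function maskToRGBA(categories, rgba = [0, 255, 0, 120]) {
  const out = new Uint8ClampedArray(categories.length * 4);
  for (let i = 0; i < categories.length; i++) {
    if (categories[i] > 0) {
      out.set(rgba, i * 4);
    }
  }
  return out;
}

// Browser part: paint the tint onto the overlay canvas.
function drawMask(ctx, categoryMask) {
  const { width, height } = categoryMask;
  const pixels = maskToRGBA(categoryMask.getAsUint8Array());
  ctx.putImageData(new ImageData(pixels, width, height), 0, 0);
  categoryMask.close(); // release the mask's underlying WASM memory
}
```

Calling `close()` matters here: MediaPipe masks are backed by WASM memory that the JavaScript garbage collector doesn't reclaim on its own.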
By combining MediaPipe for spatial awareness and TensorFlow.js for deep learning inference, we’ve built a powerful, private, and efficient screening tool. This approach significantly reduces the barrier to entry for preliminary health checks, all while keeping user data where it belongs: on their own device.
What's next?
- Try the `tfjs-backend-webgpu` backend for even faster inference on supported browsers.

Have you tried running vision models in the browser? Drop a comment below or share your projects! Happy coding! 💻🔥
Found this useful? Don't forget to check out the more advanced tutorials over at wellally.tech/blog to stay ahead of the curve!