On-Skin Smart Watch Handwriting HCI with Deep Learning
My Journey in Smartwatch Innovation: Building the Foundation for On-Skin Handwriting
Hey folks! As a student researcher, I dove into a cool project to fix one of the biggest annoyances with smartwatches: those tiny screens that make typing or gesturing a hassle. My work focused on turning the back of your hand into a touchpad by "listening" to the sounds your finger makes when sliding on your skin. This was my UROP project, and it laid the groundwork for a bigger research paper called "Swift," co-authored with my professor and team. Let me break it down simply!
The Challenge and My Idea
Smartwatches are awesome for notifications, but interacting with them? Not so much. Your finger blocks the screen (the "fat finger" problem), and voice input isn't always private or quiet. Inspired by earlier ideas like SkinWatch (using light to detect skin stretches) and FingerSound (a ring that picks up scraping noises), I proposed using the watch's microphones to capture the friction sounds of finger slides on the hand. The goal: use AI (deep learning) to track these slides in real time, starting with simple straight-line gestures to prove the concept.
What I Built: A Smart Data Pipeline
The heart of my project was creating a reliable way to collect and process data for training AI models. Here's how it worked:
- Setup: I rigged a watch with two mics for stereo sound and a ring with an LED on the sliding finger. Videos recorded the LED's movement at 120 frames per second for accurate "ground truth" directions, while audio captured the friction at 48 kHz.
- Collection Rules: In a quiet room, start audio first, then video. Slide fingers flat on the wrist without touching cables—aim for 120+ seconds per session.
- Processing Magic: Using Python scripts:
  - Finger Tracker.py: Tracks the LED in videos using color filters and ArUco markers for precise coordinates.
  - Gesture Processor.py: Segments slides based on speed, calculates angles (0-360°), and filters out curvy or short ones for clean data.
  - Audio Processor.py: Syncs audio to video using cross-correlation (matching sound intensity to movement speed), creating labeled audio clips.
- Output: High-quality datasets with time-frequency sound slices tagged by gesture angles, ready for AI training. I even added visualizations like polar plots to show directions.
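To make the "ready for AI training" part concrete, here's a minimal sketch of how one labeled clip could become a training example. It assumes librosa for the time-frequency features and an 8-way discretization of the angle label; the actual feature parameters and label scheme in my scripts may differ.

```python
import numpy as np
import librosa  # assumption: librosa handles loading and mel features

SAMPLE_RATE = 48_000  # audio capture rate from the watch mics

def make_training_example(clip_path, angle_deg, n_classes=8):
    """Turn one labeled gesture clip into a (feature, label) pair.

    The time-frequency "slice" is a log-mel spectrogram of the friction
    sound; the label discretizes the 0-360 degree direction into n_classes
    bins (a hypothetical labeling choice).
    """
    audio, _ = librosa.load(clip_path, sr=SAMPLE_RATE, mono=True)

    mel = librosa.feature.melspectrogram(
        y=audio, sr=SAMPLE_RATE, n_fft=1024, hop_length=256, n_mels=64
    )
    log_mel = librosa.power_to_db(mel, ref=np.max)

    # Map the gesture angle to a class index (e.g., 8 compass-like bins).
    label = int(angle_deg // (360 / n_classes)) % n_classes
    return log_mel.astype(np.float32), label
```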
This pipeline turned messy raw data into usable training sets, handling variations across people.
*Figure: The OpenCV algorithm I wrote to track fingertip (LED) movement, which serves as ground truth for the gesture labels.*
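For flavor, here's a minimal sketch of that kind of LED tracking, assuming a reddish LED, the OpenCV ≥ 4.7 ArUco API, and illustrative HSV thresholds; the real Finger Tracker.py differs in its details.

```python
import cv2
import numpy as np

# ArUco setup (OpenCV >= 4.7 API); the marker anchors a skin-fixed reference frame.
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(aruco_dict, cv2.aruco.DetectorParameters())

def track_led(video_path):
    """Return a per-frame list of (LED centroid in pixels, ArUco corners)."""
    cap = cv2.VideoCapture(video_path)
    points = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break

        # Color filter: isolate the bright LED in HSV space (thresholds are placeholders).
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, (0, 120, 200), (10, 255, 255))

        # LED centroid from the mask's image moments.
        m = cv2.moments(mask)
        led = (m["m10"] / m["m00"], m["m01"] / m["m00"]) if m["m00"] > 0 else None

        # ArUco marker corners give scale and a stable coordinate frame on the skin.
        corners, ids, _ = detector.detectMarkers(frame)
        points.append((led, corners if ids is not None else None))
    cap.release()
    return points
```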
*Figure: Gesture video trajectories and time frames; the second plot is a distribution visualization of the gesture directions.*
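Here's a minimal sketch of the segmentation and angle labeling that Gesture Processor.py performs; the speed, length, and curvature thresholds below are made-up placeholders, and the polar histogram mirrors the direction-distribution plot above.

```python
import numpy as np
import matplotlib.pyplot as plt

FPS = 120  # video frame rate

def segment_and_label(xy, speed_thresh=2.0, min_frames=12, max_curviness=0.2):
    """Split a fingertip trajectory into slides and label each with its angle.

    xy: (N, 2) array of per-frame LED coordinates (pixels). A slide is a run
    of frames whose speed exceeds speed_thresh (px/frame); slides that are
    too short or too curvy are discarded.
    """
    speed = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    moving = np.append(speed > speed_thresh, False)  # sentinel closes a trailing slide

    gestures, start = [], None
    for i, m in enumerate(moving):
        if m and start is None:
            start = i
        elif not m and start is not None:
            seg = xy[start:i + 1]
            s, start = start, None
            if len(seg) < min_frames:
                continue  # too short to be a deliberate slide
            chord = np.linalg.norm(seg[-1] - seg[0])
            path = np.linalg.norm(np.diff(seg, axis=0), axis=1).sum()
            if path == 0 or 1 - chord / path > max_curviness:
                continue  # too curvy to count as a straight slide
            dx, dy = seg[-1] - seg[0]
            angle = np.degrees(np.arctan2(-dy, dx)) % 360  # image y grows downward
            gestures.append({"start_s": s / FPS, "end_s": i / FPS, "angle_deg": angle})
    return gestures

def plot_direction_distribution(gestures):
    """Polar histogram of gesture directions (like the distribution plot above)."""
    angles = np.radians([g["angle_deg"] for g in gestures])
    ax = plt.subplot(projection="polar")
    ax.hist(angles, bins=36)
    plt.show()
```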
*Figure: Video ground truths matched to audio segments via cross-correlation between fingertip velocity and audio strength, which cuts out only the gesture and audio pairs we care about.*
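And a minimal sketch of the alignment idea, assuming the mic audio is already loaded as a mono float array, the per-frame fingertip speed comes from the tracker, and SciPy's correlation helpers are available; Audio Processor.py's exact envelope and windowing choices may differ.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

SR = 48_000     # audio sample rate
ENV_RATE = 120  # compare both signals at the video frame rate (120 fps)

def align_offset(audio, fingertip_speed):
    """Estimate how long after the audio the video started (in seconds) by
    cross-correlating the audio intensity envelope with the fingertip speed."""
    # Short-window RMS envelope of the friction sound, one value per video frame.
    hop = SR // ENV_RATE
    n = len(audio) // hop
    env = np.sqrt((audio[: n * hop].reshape(n, hop) ** 2).mean(axis=1))

    # Zero-mean both signals so the correlation peak reflects shape, not level.
    env = env - env.mean()
    speed = np.asarray(fingertip_speed, dtype=float)
    speed = speed - speed.mean()

    corr = correlate(env, speed, mode="full")
    lag = correlation_lags(len(env), len(speed), mode="full")[corr.argmax()]
    return lag / ENV_RATE  # video time t maps to audio time t + offset

def cut_clip(audio, offset_s, start_s, end_s):
    """Extract the audio samples for one gesture given in video time."""
    return audio[int((start_s + offset_s) * SR): int((end_s + offset_s) * SR)]
```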
How It Led to "Swift": The Full Paper
My data tools were key to expanding the idea. After my UROP project, my professor, the team, and I built "Swift," which we submitted to CHI 2026. It recognizes all 26 lowercase letters plus space/backspace gestures on unmodified watches! We added active ultrasound (the watch speaker emits inaudible tones, and the mic picks up Doppler shifts from the moving finger) to better distinguish similar letters (like 'c' vs. 'o'). With my pipeline collecting diverse data, plus AI tricks like mixup and noise augmentation, Swift hit 87% accuracy after quick personalization. Users in tests loved its silent, natural feel, though letters with tails (like 'j') were trickier to write on skin.
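Since mixup and noise augmentation are standard training tricks, here's a minimal sketch of what they look like: mixup on a batch of spectrogram slices with one-hot labels, and Gaussian noise added to raw waveforms at a target SNR. Swift's actual hyperparameters, noise sources, and training framework may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x, y_onehot, alpha=0.2):
    """Standard mixup: blend random pairs of examples and their one-hot labels.
    x: (B, F, T) batch of spectrogram slices, y_onehot: (B, C) labels."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]

def add_noise(waveform, snr_db=20.0):
    """Noise augmentation: add Gaussian noise to the raw waveform at a target
    signal-to-noise ratio (dB) before feature extraction."""
    sig_power = np.mean(waveform ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return waveform + rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
```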
Looking Ahead
This project showed how sound-based touch can make wearables more intuitive. Next? More data from diverse folks, curved gestures, and full integration into apps. Imagine texting without looking—game-changer for on-the-go life! If you're into HCI or AI, hit me up with questions. 🚀