COOKCREDIT RESEARCH

The bottleneck in kitchen manipulation isn't the policy.
It's the reward.

Egocentric manipulation datasets capture what hands do. They rarely capture whether what hands did was any good. We produce structured reward signals from real cooking — fineness, uniformity, speed — as a byproduct of scoring the cook.

R(t) = w1 * exp(-k * |ln(A_obs / A_tgt)|) + w2 * exp(-c * CV_robust) + w3 * S_speed * clamp(C/100 * s)
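where t is a manipulation episode, A_obs the observed piece area, A_tgt the target piece area, CV_robust the robust coefficient of variation (1.4826 * MAD / median of the piece areas), and S_speed the speed score, which is quality-gated by consistency confidence s. The weights follow the scoring rubric: w1 = 0.35 (fineness), w2 = 0.45 (consistency), w3 = 0.20 (speed).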
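A minimal Python sketch of R(t) may make the structure easier to see. The weights come from the Scoring Rubric on this page. The decay constants K and C_DECAY, the [0, 1] clamp range, and the reading of C as a 0-100 consistency score are assumptions made for illustration, since the page does not specify them.

```python
import math

# Weights from the scoring rubric (fineness, consistency, speed).
W_FINENESS, W_CONSISTENCY, W_SPEED = 0.35, 0.45, 0.20

# Assumed decay constants; the calibrated values are not given on the page.
K = 2.0        # hypothetical decay rate for the fineness term
C_DECAY = 3.0  # hypothetical decay rate for the consistency term


def clamp(x, lo=0.0, hi=1.0):
    """Limit x to the range [lo, hi]. The [0, 1] range is an assumption."""
    return max(lo, min(hi, x))


def reward(a_obs, a_tgt, cv_robust, s_speed, c_score, s_conf):
    """Composite reward for one episode.

    a_obs, a_tgt : observed and target piece area (same units, both > 0)
    cv_robust    : robust coefficient of variation of the piece areas
    s_speed      : speed score, assumed to lie in [0, 1]
    c_score      : assumed 0-100 consistency score (the C in the formula)
    s_conf       : consistency confidence s, assumed to lie in [0, 1]
    """
    # The log ratio penalizes pieces that are too large or too small equally.
    fineness = math.exp(-K * abs(math.log(a_obs / a_tgt)))
    consistency = math.exp(-C_DECAY * cv_robust)
    # Speed only counts to the extent that quality is high (the gate).
    gate = clamp(c_score / 100.0 * s_conf)
    return (W_FINENESS * fineness
            + W_CONSISTENCY * consistency
            + W_SPEED * s_speed * gate)
```

Under these assumptions an on-target, perfectly uniform, fully gated episode scores 1.0, and a fast episode with a closed gate can earn at most 0.80, which is how fast-and-sloppy is kept below slow-and-precise.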
FIGURE 1

Reward signal density: scripted lab demonstrations vs. in-situ cooking sessions

Each dot represents a timestamped reward observation. Lab trajectories (left) produce sparse terminal rewards at episode end. CookCredit sessions (right) produce dense, per-stroke reward signals throughout the manipulation episode — piece measurements after every cut, quality metrics at every pause, and a composite score at session close. The density ratio exceeds 40:1 for a typical 60-second episode.

OBSERVATION

Current egocentric datasets (Ego4D, Ego-Exo4D, EgoVerse) provide observation-rich, reward-sparse episodes. A hand pose trajectory tells a policy where to move. It does not tell the policy whether the result was a clean brunoise or a ragged hack. The missing signal is structured output quality: what did the cut produce, and how close was it to the task specification?

CookCredit instruments the cook's phone to capture 21 hand landmarks per frame, a blade trajectory derived via wrist-anchored SE(2) prediction, and — critically — instance-level piece geometry at every measurement window. The scoring function R(t) above is not learned. It is calibrated against expert human evaluators (Spearman r > 0.7, n >= 30) and deployed as a deterministic, interpretable reward.
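The calibration criterion above (Spearman correlation above 0.7 on at least 30 rated sessions) can be checked with a few lines of Python. This is a sketch under stated assumptions, not the CookCredit calibration code: the function names are hypothetical, and it uses the textbook definition of Spearman correlation as the Pearson correlation of average ranks.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) if either input is constant


def passes_calibration(model_scores, expert_scores, min_rho=0.7, min_n=30):
    """True if there are enough rated sessions and rank agreement is high enough."""
    if len(model_scores) != len(expert_scores):
        raise ValueError("score lists must have the same length")
    return (len(model_scores) >= min_n
            and spearman(model_scores, expert_scores) > min_rho)
```

Rank correlation only asks whether the score orders sessions the way the experts do, so it is insensitive to the experts using a different numeric scale.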

IMPLICATION

A kitchen robot trained with CookCredit episodes receives not just a demonstration of how to chop, but a graded demonstration — this chop scored 84, that one scored 62, here is why. Reward-dense imitation learning converges faster and generalizes better than reward-sparse alternatives. The cook doesn't know they're training a robot. They're just trying to pass.

[1] Xu et al. "EgoVerse: Scaling Egocentric Data for Robot Manipulation." GaTech RL2 Lab, 2025.
[2] Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." RSS 2023.
[3] Grauman et al. "Ego4D: Around the World in 3,000 Hours of Egocentric Video." CVPR 2022.
[4] CookCredit. "Wrist-Anchored Blade Localization for Cooking Skill Assessment." Internal, 2026.

Hand Pose Estimation + Blade Prediction

Most hand pose research happens in labs, on scripted tasks, with controlled lighting and clean backgrounds. We think the interesting data is somewhere else — in a real kitchen, at dinner time, with a sharp knife and a pile of onions. CookCredit runs egocentric hand pose estimation on every frame of every session, building dense manipulation trajectories from the one environment that matters most and gets studied least: the home.

hand pose estimation · SE(2) wrist transform · EMA offset learning · kinematic peak detection · blade IoU validation
Live metrics: keypoints · confidence (%) · strokes · cadence (Hz)
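The page names an SE(2) wrist transform and EMA offset learning without giving details. One plausible reading, sketched below in Python, is that the blade point is stored as a fixed offset in the wrist's coordinate frame, mapped into the image with the wrist's position and rotation, and refined with an exponential moving average whenever the blade is directly observed. All function names, the choice of a single 2D offset, and the smoothing factor alpha are assumptions.

```python
import math


def wrist_to_image(wrist_xy, wrist_theta, offset_xy):
    """Apply the wrist's SE(2) pose (rotation + translation) to a wrist-frame point."""
    c, s = math.cos(wrist_theta), math.sin(wrist_theta)
    ox, oy = offset_xy
    return (wrist_xy[0] + c * ox - s * oy,
            wrist_xy[1] + s * ox + c * oy)


def image_to_wrist(wrist_xy, wrist_theta, point_xy):
    """Inverse transform: express an image-frame point in the wrist frame."""
    c, s = math.cos(wrist_theta), math.sin(wrist_theta)
    dx, dy = point_xy[0] - wrist_xy[0], point_xy[1] - wrist_xy[1]
    return (c * dx + s * dy, -s * dx + c * dy)


def ema_update(offset_xy, observed_offset_xy, alpha=0.1):
    """Blend a newly observed offset into the running estimate.

    alpha is a hypothetical smoothing factor: higher values adapt faster
    but pass more detection noise through to the estimate.
    """
    return tuple((1 - alpha) * o + alpha * z
                 for o, z in zip(offset_xy, observed_offset_xy))


def predict_blade(wrist_xy, wrist_theta, offset_xy):
    """Predict the blade point from the wrist pose and the learned offset."""
    return wrist_to_image(wrist_xy, wrist_theta, offset_xy)
```

In use, each frame with a confident blade detection would call image_to_wrist on the detection and feed the result to ema_update, and frames without a detection would fall back on predict_blade.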

Piece Segmentation + Measurement

A robot can learn to move its arm through space. The harder problem is knowing whether what it produced was any good. CookCredit closes that loop — every session generates instance-level measurements of the output, not just a recording of the motion. Piece geometry, uniformity metrics, and quality scores, structured and timestamped, ready to serve as reward signals for any system learning to use a blade.

N pieces: contours extracted via findContours(RETR_EXTERNAL)
median area (mm²): P50(piece_areas)
robust CV: 1.4826 * MAD / median
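The robust CV formula above, 1.4826 * MAD / median, is short enough to write out. The sketch below uses only the Python standard library; the function name and the list-of-areas input are assumptions. The constant 1.4826 scales the median absolute deviation so that it matches the standard deviation for normally distributed data.

```python
from statistics import median


def robust_cv(piece_areas):
    """Robust coefficient of variation: 1.4826 * MAD / median.

    piece_areas: piece areas in mm^2, one value per segmented piece.
    Raises statistics.StatisticsError on an empty list.
    """
    med = median(piece_areas)
    # MAD: median of the absolute deviations from the median.
    mad = median([abs(a - med) for a in piece_areas])
    return 1.4826 * mad / med
```

Because both the center and the spread are medians, one badly mis-segmented piece barely moves the result: the areas [8, 9, 10, 11, 12] and [8, 9, 10, 11, 100] give the same robust CV, whereas a standard-deviation-based CV would differ sharply between them.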

Scoring Rubric

Skill isn't one number. It's the relationship between what you intended, what you produced, and how efficiently you got there. Our scoring model separates target accuracy from consistency from speed, penalizes each on independent curves, and gates speed on quality so fast-and-sloppy never outscores slow-and-precise. The result is a score that agrees with expert evaluators — not because we tuned it to a benchmark, but because we calibrated it against their judgment directly.

fineness: target distance, w = 0.35
consistency: robust CV penalty, w = 0.45
speed: quality-gated, w = 0.20
composite: weighted sum

Data Pipeline + Training Flywheel

Every session a cook completes becomes a structured episode — hand poses, blade trajectories, stroke events, and quality labels — stored in a columnar format designed for cross-embodiment research. The same data that scores a cook today can train a manipulation policy tomorrow. We're not building a dataset. We're building a pipeline that grows every time someone picks up a knife.

Live pipeline metrics: frames captured · landmarks / s · zarr arrays · episodes in DB · model mAP50
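To make the idea of a structured, column-oriented episode concrete, here is a small Python sketch. The field names and layout are hypothetical, not the CookCredit schema; the only details taken from the page are the four kinds of content (hand poses, blade trajectories, stroke events, quality labels) and the 21 landmarks per frame.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One session stored column-wise: one list per field, one entry per frame."""
    timestamps_s: list = field(default_factory=list)
    hand_landmarks: list = field(default_factory=list)   # 21 (x, y) pairs per frame
    blade_xy: list = field(default_factory=list)         # one blade point per frame
    stroke_frames: list = field(default_factory=list)    # frame indices of stroke events
    quality_labels: dict = field(default_factory=dict)   # e.g. {"composite": 84}

    def append_frame(self, t, landmarks, blade_xy, is_stroke=False):
        """Add one frame to every per-frame column and return its index."""
        if len(landmarks) != 21:
            raise ValueError("expected 21 hand landmarks per frame")
        index = len(self.timestamps_s)
        self.timestamps_s.append(t)
        self.hand_landmarks.append(list(landmarks))
        self.blade_xy.append(tuple(blade_xy))
        if is_stroke:
            self.stroke_frames.append(index)
        return index

    def num_frames(self):
        return len(self.timestamps_s)
```

Keeping each field in its own column means a consumer that only needs, say, blade trajectories and quality labels can read those without touching the landmark data, and each column maps naturally onto one array in an array store.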

What CookCredit does today.

Measures what you made — the size of every piece, how uniform they are, how quickly you worked. A score that means something, from a camera that was watching closely.

What CookCredit does next.

Shows you how to get better. Live rhythm feedback. Blade angle consistency. Session-over-session progress. The score becomes a coach.

Every session teaches something.

When you cook, your hands move with years of practice — techniques passed down, adapted, perfected in your own kitchen.

CookCredit captures that. Not video of you. The motion of your hands. The geometry of what you made. Anonymized skill, from real kitchens, preparing real food.

The kind of knowledge that's never been written down — until now.

21 landmarks. Every frame. Your phone sees what a chef instructor sees.

From your kitchen to theirs.

CookCredit is building a way for home cooks to cook for the people around them. Scoring is how you prove you're ready.