COOKCREDIT RESEARCH

The bottleneck in kitchen manipulation isn't the policy.
It's the reward.

Egocentric manipulation datasets capture what hands do. They rarely capture whether what hands did was any good. We produce structured reward signals from real cooking — fineness, uniformity, speed — as a byproduct of scoring the cook.

R(t) = w1 * exp(-k * |ln(A_obs / A_tgt)|) + w2 * exp(-c * CV_robust) + w3 * S_speed * clamp(C/100 * s)
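where t is a manipulation episode, A_obs is the observed piece area, A_tgt is the target piece area, CV_robust is the robust coefficient of variation (1.4826 * MAD / median), w1 to w3 are the component weights, and the speed term S_speed is gated on quality by the clamp factor, in which s is the consistency confidence.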
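
A minimal Python sketch of R(t) as written above. The default weights are the rubric values listed on this page (0.35, 0.45, 0.20). The decay constants k and c, the clamp bounds of [0, 1], and the reading of C as a score on a 0 to 100 scale are assumptions, since the page does not specify them; all function and parameter names are hypothetical.

```python
import math


def clamp(x, lo=0.0, hi=1.0):
    """Limit x to [lo, hi]. The [0, 1] bounds are an assumption."""
    return max(lo, min(hi, x))


def reward(a_obs, a_tgt, cv_robust, s_speed, c_score, s_conf,
           w=(0.35, 0.45, 0.20), k=1.0, c=1.0):
    """Composite reward for one manipulation episode.

    a_obs, a_tgt : observed and target piece area (same units, both > 0)
    cv_robust    : robust coefficient of variation of piece areas
    s_speed      : speed score before gating
    c_score      : C in the formula, assumed to lie in 0..100
    s_conf       : consistency confidence s
    k, c         : decay constants (placeholder values, not from the source)
    """
    if a_obs <= 0 or a_tgt <= 0:
        raise ValueError("piece areas must be positive")
    # Fineness: symmetric penalty on the log-ratio, so 2x too large
    # costs the same as 2x too small.
    fineness = math.exp(-k * abs(math.log(a_obs / a_tgt)))
    # Consistency: decays as the spread of piece sizes grows.
    consistency = math.exp(-c * cv_robust)
    # Quality gate on speed.
    gate = clamp(c_score / 100.0 * s_conf)
    return w[0] * fineness + w[1] * consistency + w[2] * s_speed * gate
```

Under these assumptions, a gate of zero (C = 0 or s = 0) removes the speed term entirely, so a fast session can earn at most w1 + w2.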

FIGURE 1

Reward signal density: scripted lab demonstrations vs. in-situ cooking sessions

Each dot represents a timestamped reward observation. Lab trajectories (left) produce sparse terminal rewards at episode end. CookCredit sessions (right) produce dense, per-stroke reward signals throughout the manipulation episode — piece measurements after every cut, quality metrics at every pause, and a composite score at session close. The density ratio exceeds 40:1 for a typical 60-second episode.

OBSERVATION

Current egocentric datasets (Ego4D [3], Ego-Exo4D, EgoVerse [1]) provide observation-rich, reward-sparse episodes. A hand pose trajectory tells a policy where to move. It does not tell the policy whether the result was a clean brunoise or a ragged hack. The missing signal is structured output quality: what did the cut produce, and how close was it to the task specification?

CookCredit instruments the cook's phone to capture 21 hand landmarks per frame, a blade trajectory derived via wrist-anchored SE(2) prediction [4], and, critically, instance-level piece geometry at every measurement window. The scoring function R(t) above is not learned. It is calibrated against expert human evaluators (Spearman rank correlation > 0.7, n >= 30) and deployed as a deterministic, interpretable reward.
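
The calibration criterion can be sketched as a rank correlation between rubric scores and expert ratings. This is a minimal standard-library version, assuming one paired (rubric score, expert rating) per session. The function names are hypothetical, and the tie handling (average ranks) is a common convention rather than something this page specifies. Only the thresholds 0.7 and 30 come from the text.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def passes_calibration(rubric_scores, expert_ratings, min_rho=0.7, min_n=30):
    """Apply the stated criterion: correlation above 0.7 on at least 30 pairs."""
    if len(rubric_scores) < min_n:
        return False
    return spearman(rubric_scores, expert_ratings) > min_rho
```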

IMPLICATION

A kitchen robot trained with CookCredit episodes receives not just a demonstration of how to chop but a graded demonstration: this chop scored 84, that one scored 62, and here is why. We expect reward-dense imitation learning to converge faster and generalize better than reward-sparse alternatives. The cook doesn't know they're training a robot. They're just trying to pass.

[1] Xu et al. "EgoVerse: Scaling Egocentric Data for Robot Manipulation." GaTech RL2 Lab, 2025.
[2] Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." RSS 2023.
[3] Grauman et al. "Ego4D: Around the World in 3,000 Hours of Egocentric Video." CVPR 2022.
[4] CookCredit. "Wrist-Anchored Blade Localization for Cooking Skill Assessment." Internal, 2026.

Hand Pose Estimation + Blade Prediction

Most hand pose research happens in labs, on scripted tasks, with controlled lighting and clean backgrounds. We think the interesting data is somewhere else — in a real kitchen, at dinner time, with a sharp knife and a pile of onions. CookCredit runs egocentric hand pose estimation on every frame of every session, building dense manipulation trajectories from the one environment that matters most and gets studied least: the home.

hand pose estimation, SE(2) wrist transform, EMA offset learning, kinematic peak detection, blade IoU validation
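
To make the wrist-anchored idea concrete, here is a hedged sketch of two of the stages named above. The blade tip is predicted by applying the wrist's SE(2) pose (a 2D rotation plus translation) to an offset expressed in the wrist frame, and that offset is refined with an exponential moving average whenever a blade observation is available. All class names, the smoothing factor alpha, and the update rule are illustrative assumptions, not CookCredit's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class SE2:
    """Planar rigid transform: rotation theta (radians), then translation."""
    x: float
    y: float
    theta: float

    def apply(self, px, py):
        """Map a point from the wrist frame to the image frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return (self.x + c * px - s * py, self.y + s * px + c * py)

    def inverse_apply(self, wx, wy):
        """Map a point from the image frame back to the wrist frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        dx, dy = wx - self.x, wy - self.y
        return (c * dx + s * dy, -s * dx + c * dy)


class BladeOffsetEMA:
    """Tracks the blade-tip offset in the wrist frame with an EMA."""

    def __init__(self, offset=(0.0, 0.0), alpha=0.2):
        self.offset = offset
        self.alpha = alpha  # hypothetical smoothing factor in (0, 1]

    def update(self, wrist, observed_tip):
        """Blend a newly observed tip position into the stored offset."""
        ox, oy = wrist.inverse_apply(*observed_tip)
        a = self.alpha
        self.offset = ((1 - a) * self.offset[0] + a * ox,
                       (1 - a) * self.offset[1] + a * oy)
        return self.offset

    def predict(self, wrist):
        """Predict the tip in the image frame from the wrist pose alone."""
        return wrist.apply(*self.offset)
```

Keeping the offset in the wrist frame means the estimate stays valid as the hand moves and rotates, so the blade can still be predicted on frames where it is occluded.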

Reported values: keypoints, confidence, strokes, cadence (Hz)

Piece Segmentation + Measurement

A robot can learn to move its arm through space. The harder problem is knowing whether what it produced was any good. CookCredit closes that loop — every session generates instance-level measurements of the output, not just a recording of the motion. Piece geometry, uniformity metrics, and quality scores, structured and timestamped, ready to serve as reward signals for any system learning to use a blade.

Metric        Detail               Computation
N pieces      contours extracted   findContours(RETR_EXTERNAL)
median area   mm²                  P50(piece_areas)
robust CV     MAD / median         1.4826 * MAD / median
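
The robust CV statistic above can be computed as follows. This is a minimal sketch, assuming piece areas arrive as a plain list of positive values in mm²; the function name is hypothetical. The 1.4826 factor scales the median absolute deviation (MAD) so that it estimates the standard deviation for normally distributed data.

```python
from statistics import median


def robust_cv(piece_areas):
    """Robust coefficient of variation: 1.4826 * MAD / median."""
    areas = list(piece_areas)
    if not areas:
        raise ValueError("need at least one piece area")
    med = median(areas)
    if med <= 0:
        raise ValueError("median area must be positive")
    # MAD: median of absolute deviations from the median.
    mad = median(abs(a - med) for a in areas)
    return 1.4826 * mad / med
```

Because both the center and the spread are medians, a single stray piece (a fused pair of cubes, say) moves the statistic far less than it would move a mean-and-standard-deviation CV.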

Scoring Rubric

Skill isn't one number. It's the relationship between what you intended, what you produced, and how efficiently you got there. Our scoring model separates target accuracy from consistency from speed, penalizes each on independent curves, and gates speed on quality so fast-and-sloppy never outscores slow-and-precise. The result is a score that agrees with expert evaluators — not because we tuned it to a benchmark, but because we calibrated it against their judgment directly.

Component     Basis               Weight
fineness      target distance     w = 0.35
consistency   robust CV penalty   w = 0.45
speed         quality-gated       w = 0.20
composite     weighted sum

Data Pipeline + Training Flywheel

What CookCredit does today.

What CookCredit does next.

Every session teaches something.

From your kitchen to theirs.
