Measuring movement with computer vision

Today, I was measuring my movement using computer vision. 

Because of my interests, the numbers below come from Karate kata, which I need to improve.


Technique                          Speed (ms)   Performance Score (%)
Morote-uke Gedan-barai (migi)      410          68
Chudan-tsuki (migi)                330          76
Yoko-tsuki (migi)                  290          82
Morote-uke Gedan-barai (hidari)    395          70
Chudan-tsuki (hidari)              320          78
Yoko-tsuki (hidari)                285          84




At a basic level, video is a time-stamped measurement tool. Every frame is a discrete snapshot of body position at a known interval. At 60 frames per second, each frame represents about 16.7 milliseconds. That is already within the range where clinically meaningful differences in reaction time, initiation latency, acceleration, and braking can be observed.


What this allows, even from a simple smartphone recording, is segmentation of movement into phases. We can identify true stillness, first detectable motion, peak velocity, end position, and recovery. By counting frames between these events, we can estimate timing for each phase with reasonable accuracy, typically within ±15–20 ms at 60 fps. This is sufficient to distinguish preparation from initiation, early acceleration from late acceleration, and controlled stopping from overshoot.
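As a minimal sketch of the frame-counting idea, assuming each event has been hand-labelled to the nearest frame: the event names follow the text, but the frame indices and the function name `phase_durations_ms` are made up for illustration.

```python
def phase_durations_ms(event_frames, fps):
    """Return (phase_name, duration_ms, worst_case_error_ms) per phase.

    event_frames: list of (event_name, frame_index) in chronological order.
    """
    frame_ms = 1000.0 / fps
    phases = []
    for (name_a, frame_a), (name_b, frame_b) in zip(event_frames, event_frames[1:]):
        if frame_b < frame_a:
            raise ValueError("events must be in chronological order")
        duration = (frame_b - frame_a) * frame_ms
        # Assumption: each event is labelled to the nearest frame, so it can be
        # off by up to half a frame. A duration between two events can then be
        # off by up to one full frame in the worst case.
        phases.append((f"{name_a} -> {name_b}", duration, frame_ms))
    return phases


# Hypothetical frame indices from a 60 fps clip (not real measurements).
events = [
    ("stillness", 0),
    ("first motion", 12),
    ("peak velocity", 24),
    ("end position", 30),
    ("recovery", 48),
]
phases = phase_durations_ms(events, fps=60)
for name, duration, err in phases:
    print(f"{name}: {duration:.0f} ms (+/- {err:.1f} ms)")
```

At 60 fps the worst-case error per phase comes out at one frame, about 16.7 ms, which is consistent with the ±15–20 ms range quoted above.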


From a biomechanical perspective, the most valuable measurements are not raw speed, but timing relationships. For example, which segment initiates first, the pelvis, trunk, shoulder, or distal limb? Even without full 3D motion capture, relative timing can be inferred by observing when each segment begins to move from frame to frame. Small delays of 30–50 ms are visible and repeatable when the video is consistent.
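One way to sketch the sequencing question in code, assuming you already have a per-frame position for each body segment (for example from a pose tracker): the tracks, the pixel threshold, and the function names here are hypothetical, not my actual pipeline.

```python
def onset_frame(positions, threshold):
    """Index of the first frame whose displacement from frame 0 exceeds threshold."""
    baseline = positions[0]
    for i, p in enumerate(positions):
        if abs(p - baseline) > threshold:
            return i
    return None  # the segment never moved beyond the threshold


def onset_delays_ms(tracks, fps, threshold):
    """Delay of each segment's onset relative to the earliest-moving segment."""
    onsets = {name: onset_frame(pos, threshold) for name, pos in tracks.items()}
    moving = {name: f for name, f in onsets.items() if f is not None}
    if not moving:
        return {}
    first = min(moving.values())
    return {name: (f - first) * 1000.0 / fps for name, f in moving.items()}


# Hypothetical 1D displacement tracks in pixels, one value per frame.
tracks = {
    "pelvis": [0, 0, 0, 1, 3, 6, 10, 15, 21, 28],
    "trunk":  [0, 0, 0, 0, 0, 1, 4, 9, 15, 22],
    "fist":   [0, 0, 0, 0, 0, 0, 0, 1, 5, 14],
}
delays = onset_delays_ms(tracks, fps=60, threshold=2)
for name, delay in sorted(delays.items(), key=lambda item: item[1]):
    print(f"{name}: +{delay:.1f} ms")
```

The threshold matters: set it too low and tracker noise reads as motion, too high and every onset is detected late. It only needs to be the same for every segment and every session for the relative timing to be comparable.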


Acceleration profiles can also be approximated. By tracking the displacement of a joint or endpoint frame by frame, we can derive velocity curves and infer acceleration patterns. While this is not laboratory-grade kinetics, it is enough to distinguish relaxed, late-accelerating motion from tense, early-loaded motion. This distinction is key in my Karate kata training. In practice, it often correlates strongly with efficiency, injury risk, and motor control quality.
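A rough sketch of deriving velocity and acceleration from tracked positions using finite differences. The position values are invented, the units are pixels, and `peak_velocity_fraction` is just one illustrative way to tell early-loaded from late-accelerating motion, not an established metric.

```python
def finite_difference(values, dt):
    """Central differences in the interior, one-sided differences at the ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out


def peak_velocity_fraction(positions, fps):
    """Where in the movement (0.0 = start, 1.0 = end) the peak speed occurs."""
    velocity = finite_difference(positions, 1.0 / fps)
    speeds = [abs(v) for v in velocity]
    return speeds.index(max(speeds)) / (len(positions) - 1)


# Hypothetical endpoint positions in pixels, one value per frame at 60 fps.
late_accelerating = [0, 0.5, 1.5, 3.5, 7.5, 15.5, 20.0]
early_loaded = [0, 6, 11, 15, 18, 19.5, 20]

velocity = finite_difference(late_accelerating, 1.0 / 60)      # pixels per second
acceleration = finite_difference(velocity, 1.0 / 60)           # pixels per second^2

print(peak_velocity_fraction(late_accelerating, fps=60))
print(peak_velocity_fraction(early_loaded, fps=60))
```

Differentiating twice amplifies tracking noise, so in practice the positions would likely need smoothing before the acceleration curve is trusted.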


Another important capability is repeatability analysis. By comparing multiple executions of the same movement, video analysis can reveal whether a motor pattern is stable or variable. As motor control improves, trajectories become more consistent even as speed increases. This is something clinicians often feel intuitively when watching patients, but video allows it to be tracked objectively over time.
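A simple way to put a number on repeatability, sketched under the assumption that each execution has been resampled to the same number of frames: the trial data and the function name `trajectory_variability` are hypothetical.

```python
from statistics import mean, pstdev


def trajectory_variability(trials):
    """Mean across-trial standard deviation, computed frame by frame.

    trials: list of position sequences, all the same length.
    Lower values mean the executions follow a more consistent trajectory.
    """
    if len(trials) < 2:
        raise ValueError("need at least two trials to compare")
    if len({len(t) for t in trials}) != 1:
        raise ValueError("all trials must have the same number of frames")
    return mean(pstdev(frame_values) for frame_values in zip(*trials))


# Hypothetical endpoint positions (pixels) for three executions per session.
early_session = [[0, 2, 5, 9], [0, 3, 7, 10], [0, 1, 4, 8]]
later_session = [[0, 2, 5, 9], [0, 2, 5.5, 9], [0, 2, 5, 9.5]]

print(trajectory_variability(early_session))
print(trajectory_variability(later_session))
```

Tracking this one number per technique across sessions gives a trend line for consistency that is independent of raw speed.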


It is also possible to infer aspects of neuromuscular tension indirectly. Excessive co-contraction often manifests as hesitation before movement, uneven velocity curves, micro-stutters, or prolonged deceleration. Relaxed, efficient movement tends to look sudden, clean, and symmetrical. While video cannot directly measure muscle activation, these kinematic signatures are surprisingly reliable when compared across sessions.


There are, of course, limits. Video cannot directly measure force, joint moments, or muscle activation. Depth perception is limited without stereo or depth cameras. Absolute joint angles are less reliable than relative timing unless the camera setup is carefully controlled. But for timing, sequencing, coordination, and motor learning, standard video is far more powerful than most people assume.


In short, video analysis sits in a useful middle ground. It is not a replacement for EMG, force plates, or optical motion capture, but it is far more than subjective observation. When used consistently, it becomes a practical tool for studying motor control, rehabilitation progress, and skill refinement, especially when the goal is improving timing, coordination, and efficiency rather than raw strength.


If your interest comes from physiotherapy, this approach aligns well with modern motor learning principles. It supports external observation, objective feedback, and gradual refinement without overloading the patient or practitioner with complex instrumentation.


I hope this gives you a clear picture of what can be done, and why it is both technically sound and clinically relevant.

My next step is to extend this to true 3D perception.







As an Amazon Associate I earn from qualifying purchases.

Multi-agent software development



A multi-agent foundation LLM is built for parallel, role-separated work on complex problems. It assumes that meaningful software development is not linear. Architecture, implementation, testing, refactoring, validation, and documentation are distinct cognitive tasks that benefit from concurrent execution. In this model, multiple specialized agents operate at the same time within a shared project state. Each agent has a defined role, constraints, and success criteria. One agent may reason purely about system architecture and invariants; another may implement code within a restricted scope; a third may generate or run tests; while a fourth evaluates performance, safety, or maintainability.

The defining feature is not chat conversation, but coordination. Agents exchange concrete artifacts such as diffs, test results, failure reports, and design notes. Progress is driven by state changes in the codebase rather than by turn-by-turn chat dialogue. Work can continue asynchronously until stopping conditions are met, for example, passing tests or satisfying performance thresholds. The human orchestrator acts more like an architect and technical lead, resolving ambiguities, arbitrating disagreements, and deciding when outputs are ready to merge.

A chat-agent foundation LLM, by contrast, is optimized for interactive reasoning with a single dominant thread of control. This is the familiar conversational model embedded in IDEs and chat interfaces. It excels at local transformations, explanation, debugging, and short-horizon planning. Context is ephemeral and largely conversational. The agent responds to prompts, proposes changes, and waits for feedback. There is no native concept of parallel roles, persistent task ownership, or long-running autonomous execution. Architectural coherence, testing strategy, and long-term state management remain primarily the responsibility of the human.

When you know what you want to change and need help expressing or validating it quickly, the chat-agent foundation LLM is the fastest tool. Its limitation is scale. As projects grow, the cognitive load of maintaining architecture, tests, and constraints in a single conversational stream increases, and progress becomes serial.

A local-agent medium LLM, in the 7B to 32B class, occupies a different position entirely. It is constrained by model capacity but empowered by locality and control. These agents are best used as embedded workers inside well-defined pipelines. They are particularly effective when given narrow responsibilities such as refactoring a module, enforcing style rules, extracting structure from text, or performing deterministic transformations. Because they run locally, they can be integrated deeply into tooling, scheduled jobs, and automated workflows without network round-trips or sending data to a third party.

However, local agents rely heavily on external structure. They do not substitute for system-level reasoning across a large codebase. Instead, they amplify it when paired with strong orchestration, clear prompts, and explicit constraints. In practice, they function best as components within a broader multi-agent system rather than as standalone decision-makers.

Seen together, these three modes form a hierarchy. The chat-agent foundation LLM accelerates individual thought. The local-agent medium LLM reliably and repeatedly executes bounded work. The multi-agent foundation LLM coordinates execution across time, roles, and abstractions. 

If you want me to explain how I implement each one, leave a comment.



NVFP4



Keep the original full-precision model frozen as a teacher, then train the quantized NVFP4 model as a student to match the teacher’s output distributions using KL-divergence, rather than retraining the entire model with task losses.
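The loss itself can be sketched in a few lines of plain Python. This shows only the KL term for one token position, under two assumptions of mine: the direction is KL(teacher || student), which is the common convention for distillation, and the logits are made-up numbers. No actual NVFP4 quantization or training step is shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL between the frozen teacher's and the quantized student's output distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, q_student)


# Hypothetical logits for a 3-token vocabulary.
teacher = [2.0, 1.0, 0.0]
close_student = [1.9, 1.1, 0.0]
far_student = [0.0, 1.0, 2.0]

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

In training, this value would be averaged over all token positions in a batch and minimized with respect to the student's weights only, since the teacher stays frozen.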

Distillation from the teacher model is the key point.




Tamashii

Tamashii 魂 is usually translated as “soul,” but in Japanese practice it is less about an inner, metaphysical essence and more about a way of being present in action. When people speak of doing something “with tamashii,” they are not talking about emotion or passion alone. They are pointing to a form of moral and technical alignment, where intention, effort, and execution are inseparable.


In the context of doing things right, tamashii refers to the seriousness with which a task is taken, regardless of scale or audience. It is the refusal to treat any action as trivial. Whether polishing a floor, writing a line of code, forging a blade, or serving tea, the act is approached as complete in itself. Nothing is deferred to later. There is no shortcut justified by invisibility. The work carries the worker’s name, even if no one ever sees it.


This is why tamashii is often discussed alongside craft rather than belief. A sword with tamashii is not one imbued with mysticism, but one made without compromise. The smith did not rush cooling, did not accept a minor flaw, did not say “good enough.” Over time, this attitude becomes visible in the object. The result feels right, balanced, trustworthy. That feeling is not magic. It is accumulated care.


Tamashii also implies accountability beyond rules. Rules can be followed mechanically. Tamashii requires judgment. It asks, “Is this correct?” not “Is this permitted?” In Japanese workplaces and dojos, this distinction matters. Someone may technically meet requirements yet still be told their work lacks tamashii. What is missing is sincerity of effort, awareness of impact, or respect for the lineage of the task itself.


There is a quiet ethical dimension here. To act with tamashii is to acknowledge that actions shape the self. You are not only producing an outcome, you are becoming the kind of person who does things a certain way. Cutting corners is not just a practical decision, it is a formative one. Over time, habits harden. Tamashii resists that erosion by insisting on care even when tired, unseen, or under pressure.


In martial arts, this shows up as consistency rather than intensity. A strike done with tamashii is not the hardest strike, but the most correct one, with body, breath, timing, and intent aligned. Power emerges as a byproduct of correctness. The same principle applies outside the dojo. When the process is right, results follow naturally. When the process is compromised, results become fragile.


Importantly, tamashii is not perfectionism. Perfectionism is anxious and self-referential. Tamashii is calm. It accepts human limitation but refuses indifference. Mistakes can happen, but care must be evident. The difference is felt immediately by others, especially those trained in the same discipline.


In modern contexts, tamashii often survives quietly. It appears in engineers who document systems for the next person, writers who revise for clarity rather than praise, parents who keep small promises, even when inconvenient. These acts are rarely celebrated, but they build trust. Over time, people learn who can be relied on. That reputation is not built on claims, but on repeated, unglamorous correctness.


So when tamashii is invoked in relation to doing things right, it is not a poetic flourish. It is a practical standard. Do the thing fully. Respect the work. Leave no residue of laziness or excuse. Let the quality of attention be visible in the result. That is tamashii in action, not as an abstract soul, but as a lived discipline.




