NVFP4
Keep the original full-precision model frozen as a teacher, then train the quantized NVFP4 model as a student to match the teacher's output distributions using KL divergence, rather than retraining the entire model with task losses.
Distillation from the frozen teacher is the key point.
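A minimal sketch of that distillation loss, in pure Python with made-up logits (illustrative only; real training would use a framework such as PyTorch, and the temperature value here is an arbitrary assumption):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Loss the quantized student minimizes against the frozen teacher."""
    p = softmax(teacher_logits, temperature)  # teacher: frozen, no gradients
    q = softmax(student_logits, temperature)  # student: the NVFP4 model being trained
    return kl_divergence(p, q)

# Identical outputs give zero loss; quantization error shows up as positive loss.
print(distillation_loss([2.0, 1.0, 0.5], [2.0, 1.0, 0.5]))
print(distillation_loss([2.0, 1.0, 0.5], [1.9, 1.1, 0.4]))
```

The loss is zero only when the student reproduces the teacher's distribution exactly, which is why no task labels are needed.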
Tamashii
Tamashii 魂 is usually translated as “soul,” but in Japanese practice it is less about an inner, metaphysical essence and more about a way of being present in action. When people speak of doing something “with tamashii,” they are not talking about emotion or passion alone. They are pointing to a form of moral and technical alignment, where intention, effort, and execution are inseparable.
In the context of doing things right, tamashii refers to the seriousness with which a task is taken, regardless of scale or audience. It is the refusal to treat any action as trivial. Whether polishing a floor, writing a line of code, forging a blade, or serving tea, the act is approached as complete in itself. Nothing is deferred to later. There is no shortcut justified by invisibility. The work carries the worker’s name, even if no one ever sees it.
This is why tamashii is often discussed alongside craft rather than belief. A sword with tamashii is not one imbued with mysticism, but one made without compromise. The smith did not rush cooling, did not accept a minor flaw, did not say “good enough.” Over time, this attitude becomes visible in the object. The result feels right, balanced, trustworthy. That feeling is not magic. It is accumulated care.
Tamashii also implies accountability beyond rules. Rules can be followed mechanically. Tamashii requires judgment. It asks, “Is this correct?” not “Is this permitted?” In Japanese workplaces and dojos, this distinction matters. Someone may technically meet requirements yet still be told their work lacks tamashii. What is missing is sincerity of effort, awareness of impact, or respect for the lineage of the task itself.
There is a quiet ethical dimension here. To act with tamashii is to acknowledge that actions shape the self. You are not only producing an outcome, you are becoming the kind of person who does things a certain way. Cutting corners is not just a practical decision, it is a formative one. Over time, habits harden. Tamashii resists that erosion by insisting on care even when tired, unseen, or under pressure.
In martial arts, this shows up as consistency rather than intensity. A strike done with tamashii is not the hardest strike but the most correct one: body, breath, timing, and intent aligned. Power emerges as a byproduct of correctness. The same principle applies outside the dojo. When the process is right, results follow naturally. When the process is compromised, results become fragile.
Importantly, tamashii is not perfectionism. Perfectionism is anxious and self-referential. Tamashii is calm. It accepts human limitation but refuses indifference. Mistakes can happen, but care must be evident. The difference is felt immediately by others, especially those trained in the same discipline.
In modern contexts, tamashii often survives quietly. It appears in engineers who document systems for the next person, writers who revise for clarity rather than praise, parents who keep small promises, even when inconvenient. These acts are rarely celebrated, but they build trust. Over time, people learn who can be relied on. That reputation is not built on claims, but on repeated, unglamorous correctness.
So when tamashii is invoked in relation to doing things right, it is not a poetic flourish. It is a practical standard. Do the thing fully. Respect the work. Leave no residue of laziness or excuse. Let the quality of attention be visible in the result. That is tamashii in action, not as an abstract soul, but as a lived discipline.
Running a 24B Mistral Model Locally with MLX on macOS
LLM model storage
Store model weights once, outside any tool-specific directory. This allows us to share them across multiple machines.
For example:
~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit
Create a clean conda environment
Use a conda environment defined in a YAML file. Do not reuse the base environment.
environment_mlx.yml:
name: env_mlx
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - mlx-lm
Create and activate the environment:
conda env create -f environment_mlx.yml
conda activate env_mlx
Install MLX native binaries
Inside the activated environment, install MLX explicitly so the Metal backend is available:
pip install -U mlx
Verify MLX is correctly linked:
python -c "import mlx.core as mx; print(mx)"
You should see a native .so module load without errors.
Run the model
Use the MLX CLI directly:
mlx_lm.generate \
--model ~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit \
--prompt "What LLM model are you?"
On first run, expect a short Metal warm-up. A successful output confirms that the setup is complete.
==========
I'm a Large Language Model trained by Mistral AI.
==========
Prompt: 8 tokens, 14.191 tokens-per-sec
Generation: 13 tokens, 15.885 tokens-per-sec
Peak memory: 25.096 GB
Keizoku
Keizoku (継続) means continuity, or sustained effort over time. It is not loud persistence or visible struggle, but quiet repetition that compounds almost unnoticed. In Japanese practice, keizoku shows up in daily training, careful habits, and small actions done even when motivation fades. Progress is trusted to time rather than force. What matters is not intensity, but returning again and again, calmly, without drama. This is how skill deepens, character settles, and change becomes permanent.
Pen
Ima
Attention placed fully on the present dissolves anxiety about outcomes.
This phrase reminds you to act clearly now, without borrowing stress from the future.
Ambition narrows awareness.
Presence widens it.
Buns
MOCHI (餅)
Mochi is made from glutinous rice that is steamed and pounded into a stretchy, chewy dough. When filled with sweet red bean paste (anko), it is usually called daifuku. The texture is elastic, slightly sticky, and dense. It is unmistakably Japanese and closely tied to ritual, New Year customs, and seasonal sweets.
BAO WITH RED BEAN (あんまん)
Bao with sweet red bean filling, called anman in Japan, is made from wheat flour dough that is leavened and steamed. The texture is soft, fluffy, and bread-like. This style comes from Chinese culinary tradition and is common as street food or convenience-store fare.
Core difference:
Mochi is rice-based and chewy; bao is wheat-based and fluffy. Mochi is deeply ceremonial and traditional in Japan; bao is adopted and everyday.
NVidia Caterpillar
I keep hearing the same subtext when people talk seriously about modern perception systems, and the podcast made it explicit. Perception is no longer about a clever model running on top of generic hardware. It is a full stack problem where sensors, electronics, data, simulation, training, deployment, and iteration speed all matter equally. If one of those layers is sloppy, the system fails no matter how good the neural network looks on a benchmark.
What resonated most is how far we have moved away from the idea that perception starts with data and ends with inference. In practice, perception starts with physics. Photons, vibrations, motion, noise, timing, power stability, thermal drift. These shape the data long before a model ever sees it. If you ignore this layer, you end up compensating with bigger models, more compute, and endless data cleaning. That is not sophistication, it is waste.
This is where the opportunity for making becomes obvious. Instead of building generic robots or chasing full autonomy, the real leverage is in building small, purpose-built perception instruments. A node, not a platform. One sensing problem, one or two sensors, tightly integrated electronics, deterministic timing, clean power, and just enough local intelligence to extract structure from the signal. Everything else can be pushed upstream.
The podcast emphasized simulation and synthetic data as first-class tools, not backups. That only works if your hardware is well defined. When you control the sensor characteristics, the sampling, the noise profile, and the geometry, simulation becomes meaningful. When your hardware is ad hoc, synthetic data becomes fiction. Making your own electronics is what closes that gap. It turns simulation into a usable engineering tool rather than a marketing slide.
From a practical standpoint, this reframes how I think about AI on the edge. The device does not need to be smart in a human sense. It needs to be precise. Timestamping, synchronization, filtering, event detection, compression, maybe a small embedding or classifier. That is enough. The heavy reasoning, training, and iteration live on a workstation or server where iteration is cheap. Edge intelligence exists to reduce ambiguity and bandwidth, not to impress.
The build loop becomes very concrete. Design a small board around a camera, IMU, microphone, or low-cost LiDAR. Get the clocking right. Get the power right. Mount it correctly. Collect data you trust. Augment it with simulation that actually matches the device. Train a narrow model for one task. Deploy it back. Observe failure modes. Revise both the electronics and the model. Repeat. This loop is faster and more educational than any abstract model comparison.
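The "just enough local intelligence" part of that loop can be sketched in a few lines. Everything below is hypothetical (the sensor stream is simulated, the window and threshold are invented): the node timestamps each sample, smooths it, and emits an event only when the filtered signal crosses a threshold, so structure leaves the device instead of raw data:

```python
import time
from collections import deque

class EdgeNode:
    """Minimal perception node: timestamp, filter, detect, emit only events."""

    def __init__(self, window=5, threshold=0.5):
        self.samples = deque(maxlen=window)  # sliding window for smoothing
        self.threshold = threshold

    def ingest(self, raw_value, timestamp=None):
        """Timestamp a raw sample, smooth it, and return an event or None."""
        ts = timestamp if timestamp is not None else time.monotonic()
        self.samples.append(raw_value)
        smoothed = sum(self.samples) / len(self.samples)  # moving-average filter
        if smoothed > self.threshold:
            # Only structured events leave the node, not the raw stream.
            return {"t": ts, "value": round(smoothed, 3), "kind": "threshold_crossed"}
        return None

node = EdgeNode(window=3, threshold=0.5)
stream = [0.1, 0.2, 0.1, 0.9, 1.1, 1.0, 0.2, 0.1]  # simulated sensor values
events = [e for i, v in enumerate(stream) if (e := node.ingest(v, timestamp=i))]
print(events)  # only the samples where the smoothed signal exceeded 0.5
```

Eight raw samples reduce to three events; the bandwidth saving and the determinism of the filter are the point, not the sophistication of the detector.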
What I take away most strongly is that iteration speed beats theoretical optimality. Teams and individuals who can close the loop from field failure back to retraining and redeployment will always outperform those chasing perfect architectures. Custom hardware accelerates that loop because it removes unknowns. You know what the sensor is doing because you built it.
For anyone interested in building perception with AI (#make), the path is clear. Do not start with autonomy. Start with perception primitives. Build devices that see, hear, or feel one thing well. Treat electronics as part of the learning system, not a carrier for it. When physics and electronics are handled with care, the AI becomes smaller, simpler, and more reliable. That is not a compromise. That is good engineering.
3D Perception for AI Machines
I like to plan what the near future will hold for us.
Movies like this one convince me that 3D perception is the interface between AI machines and the real world.
Shōbudō New Year's resolutions and goals
Yesterday, Kevin Cummings Sensei and Michael Morris Master Sensei mentioned that they trained under Matt Sensei. Our Shōbudō(尚武道)karate traces its lineage to Toyama Kanken’s Toyama-ryū(當山流), transmitted through Shūdōkan dōjō(修道館道場)in Naha, Okinawa(沖縄). At this point, I do not yet have further details.
To my pleasant surprise, as a 2026 New Year's resolution, Sensei discussed tradition, discipline, and the use of the Japanese language. I really welcome this because, as a student of the cultural anthropology of East Asia and an ex-U.S. Marine, I am a traditionalist and dislike the current trend of "McDojos."

Seiretsu(整列) Line up! Students line up in sempai(先輩)to kōhai(後輩)order on a single line, by belt rank first and age second, juniors closer to the door.
Musubi-dachi(結び立ち) Attention stance. Heels together, toes slightly apart, used when receiving formal instruction and during opening and closing.
Hachiji-dachi(八字立ち) Ready stance. Feet shoulder-width apart, forming a stable hourglass shape, used while awaiting the next command.
Shōmen ni rei(正面に礼) Bow to the front. A formal bow toward the shōmen(正面)of the dōjō(道場).
Sensei ni rei(先生に礼) Bow to the instructor. A formal bow directed to the sensei(先生)leading the class.
Onegai shimasu(お願いします) Spoken before training begins. A formal request to train together under instruction.
Joint Embedding Predictive Architecture (JEPA)
Overview
Recently, I came across a paper on the "Joint Embedding Predictive Architecture" (JEPA), co-authored by Yann LeCun, which may change how we think about robotic perception.
JEPA (Joint Embedding Predictive Architecture) is a method for training machines to understand the world by learning what can be predicted rather than what can be named. The goal is not to describe what is seen, but to grasp the meaning of a situation well enough to anticipate what will happen next.
At its core, JEPA trains a system to understand the world's affordances: what is stable, what can change, and which actions lead to which outcomes. It does this without putting that understanding into any particular language or visual image. Instead, the system learns an internal sense of “what makes sense” by being asked to infer missing pieces of its experience from what remains visible.
This approach is especially natural for 3D perception. Geometry carries constraints that appearance alone cannot provide. Surfaces support or block motion, empty space allows motion, and spatial relationships persist even when parts of the scene are occluded. By learning to predict what must be present but is not directly observed, the system internalizes these constraints as part of its understanding.
The representations learned in this way are not optimized for classification or reconstruction. They are optimized for coherence, or meaning over time. They capture the structure of the environment in a form that remains meaningful as the viewpoint changes or as actions unfold.
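Where that loss lives can be shown with a toy sketch (pure Python, made-up weights and patches; real JEPA uses deep encoders and learned predictors). The key structural point is that the prediction error is measured in the shared embedding space, not in pixel space, so the system is rewarded for coherent representations rather than exact reconstruction:

```python
def encode(patch, weights):
    """Toy encoder: a fixed linear map from an input patch to an embedding."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]

def predict(context_embedding, weights):
    """Toy predictor: infer the embedding of the hidden region from the visible one."""
    return [sum(w * x for w, x in zip(row, context_embedding)) for row in weights]

def latent_loss(predicted, target):
    """JEPA-style loss: squared distance in embedding space, not pixel space."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

# Hypothetical scene split into a visible context patch and a masked target patch.
context_patch = [0.2, 0.8, 0.5]
target_patch = [0.6, 0.1, 0.9]

encoder_w = [[0.5, 0.1, 0.0], [0.0, 0.3, 0.7]]  # shared encoder weights (invented)
predictor_w = [[1.0, 0.2], [0.1, 0.9]]          # predictor weights (would be learned)

z_context = encode(context_patch, encoder_w)    # embed what is visible
z_target = encode(target_patch, encoder_w)      # embed what is hidden
z_predicted = predict(z_context, predictor_w)   # infer the hidden embedding

loss = latent_loss(z_predicted, z_target)       # training would minimize this
print(round(loss, 4))
```

Nothing here reconstructs the target patch itself; the predictor is only asked to land near the target's embedding, which is exactly the "infer missing pieces of experience" objective described above.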
This makes JEPA a foundation for robotics. It provides a way to build perception grounded in the physical world, in which meaning emerges from what can happen rather than from how things are labeled. Subsequent chapters will build on this idea, one concept at a time.
The Real World
We all experience the world before we can describe it. Long before words or drawings, there is experience. The body moves, the environment responds, and patterns slowly become familiar.
A hand presses against a table and feels resistance. An object released from the hand falls. A path allows movement; a wall does not. These are not lessons delivered through instruction. They are discovered through contact, repeated until they no longer surprise. Understanding begins as a sense of what tends to happen.
There is no clear boundary between perception and action. To function at all, a learning system must stay oriented within this flow, forming expectations that are stable enough to guide the next move.
Not every detail can be predicted. Surfaces vary. Shadows change. Yet most of this variation can be ignored. What matters are constraints that hold. Objects occupy space. Support prevents falling. Push transfers force. These regularities persist even as appearances change.
From these regularities, intuition forms. A cup is understood not as an image but as something that can be grasped, filled, tilted, or dropped. The meaning lies in what can be done and what will follow. These are affordances. They are learned through action, not description.
This kind of understanding is wordless and implicit. It manifests as confidence in movement and restraint, with fewer surprises over time. Prediction improves, and with it, the ability to act without hesitation.
Only later do we begin to name what we already know. Words arrive as mental shortcuts. They point to patterns that existed long before they were labeled. Language facilitates communication, but it rests on a foundation that would exist without it.
The real world teaches first by enforcing consequences. It rewards correct predictions and corrects incorrect ones. From this process, a stable internal sense of space, time, and possibility emerges. That sense precedes abstraction and underpins it.
Text Is Not Enough
Language appears to be intelligent because it is visible. We can read, hear, and test it. When a system produces fluent text, it gives the impression of understanding. But language is not where understanding begins. It is where it ends up after something quieter and more physical has already occurred.
Text is a record of experience, not the experience itself. It assumes a shared world. When we say that a cup fell, we do not explain gravity, support, or collision. We assume them. A reader already knows what falling means because they have lived in a world where objects drop when released. Language works by leaning on that prior knowledge, not by creating it.
This is the limit of language-only systems. They learn correlations between symbols that already compress the world. They never see the constraints that produced those symbols. They do not feel resistance, continuity, or consequence. As a result, they can describe situations fluently while lacking an internal sense of what can actually happen next.
Real understanding is predictive. It is the ability to anticipate outcomes without narrating them. When you reach for a door handle, you already know how it will turn. When you step onto a surface, you trust it will support your weight. This knowledge is not verbal. It is a lived model of the world, shaped by repeated exposure to structure and constraint.
To move beyond language, we must return to those foundations. Systems must learn from signals that carry the world’s structure directly: space, motion, geometry, and time. They must form internal representations that stay coherent as viewpoints change and actions unfold. The goal is not to predict words or images, but to predict what remains stable and what can change.
This is the shift ahead. From describing the world to modeling it. From naming outcomes to anticipating them. Language will remain useful, but it will sit on top of something deeper: an internal sense of how the world works, learned from the world itself.


