Model storage
Store model weights once, outside any tool-specific directory. This lets the same files be reused by multiple tools and synced across machines.
For example:
~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit
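One way to populate this directory is the Hugging Face CLI. This is a sketch, assuming the huggingface_hub package is installed; the repo ID mirrors the directory layout above:
huggingface-cli download mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit \
  --local-dir ~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit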
Create a clean conda environment
Use a conda environment defined in a YAML file. Do not reuse the base environment.
environment_mlx.yml:
name: env_mlx
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - mlx-lm
Create and activate the environment:
conda env create -f environment_mlx.yml
conda activate env_mlx
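Before installing anything, it can help to confirm that the environment's Python is the one on PATH; this quick sanity check should print a path ending in envs/env_mlx:
python -c "import sys; print(sys.prefix)"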
Install MLX native binaries
Inside the activated environment, install MLX explicitly so the Metal backend is available:
pip install -U mlx
Verify MLX is correctly linked:
python -c "import mlx.core as mx; print(mx)"
The printed module should resolve to a native .so file, and the import should complete without errors.
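To also confirm that computation lands on the Metal GPU rather than the CPU, a minimal check is to print the default device and evaluate a small array; on Apple silicon the expected output includes Device(gpu, 0):
python -c "import mlx.core as mx; print(mx.default_device()); print(mx.sum(mx.ones((4, 4))))"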
Run the model
Use the mlx-lm CLI directly:
mlx_lm.generate \
--model ~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit \
--prompt "What LLM model are you?"
On first run, expect a short Metal warm-up. A successful output confirms that the setup is complete.
==========
I'm a Large Language Model trained by Mistral AI.
==========
Prompt: 8 tokens, 14.191 tokens-per-sec
Generation: 13 tokens, 15.885 tokens-per-sec
Peak memory: 25.096 GB
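The same model can also be driven from Python instead of the CLI. The sketch below uses the load and generate helpers shipped by mlx-lm; max_tokens is an illustrative value, and note that the CLI applies the tokenizer's chat template automatically, while this sketch sends the prompt as a raw string:
import os
from mlx_lm import load, generate

# Load weights from the shared MODELS directory (expand ~ manually)
model_path = os.path.expanduser(
    "~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit"
)
model, tokenizer = load(model_path)

# verbose=True prints generation speed and memory stats, like the CLI
text = generate(model, tokenizer, prompt="What LLM model are you?",
                max_tokens=100, verbose=True)
print(text)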