Running a 24B Mistral Model Locally with MLX on macOS

LLM model storage

Store model weights once, outside any tool-specific directory. This allows us to share them across multiple machines.

For example:

~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit
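If the weights are not on disk yet, one way to fetch them into that directory is the Hugging Face CLI. This is a sketch, assuming the huggingface_hub CLI is installed and the repository name matches the example path above:

pip install -U "huggingface_hub[cli]"

huggingface-cli download mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit \
  --local-dir ~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit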


Create a clean conda environment

Use a conda environment defined in a YAML file. Do not reuse the base environment.

environment_mlx.yml:

name: env_mlx
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - mlx-lm


Create and activate the environment:


conda env create -f environment_mlx.yml
conda activate env_mlx
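
To confirm the new environment is the one actually in use, a quick check with standard shell commands (nothing specific to this setup):

which python
python --version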



Install MLX native binaries

Inside the activated environment, install MLX explicitly so the Metal backend is available:

pip install -U mlx

Verify MLX is correctly linked:

python -c "import mlx.core as mx; print(mx)"

The command should print the module path, which points to a native .so extension, with no import errors.
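
For a slightly stronger check that the Metal backend is actually used, query the default device and run a tiny computation. mx.default_device, mx.ones, and mx.eval are standard mlx.core calls; on Apple silicon the device should report as gpu:

python -c "import mlx.core as mx; print(mx.default_device()); mx.eval(mx.ones((4, 4)) @ mx.ones((4, 4))); print('Metal OK')"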


Run the model


Use the MLX CLI directly:


mlx_lm.generate \
  --model ~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit \
  --prompt "What LLM model are you?"


On first run, expect a short Metal warm-up. A successful output confirms that the setup is complete.


==========

I'm a Large Language Model trained by Mistral AI.

==========

Prompt: 8 tokens, 14.191 tokens-per-sec

Generation: 13 tokens, 15.885 tokens-per-sec

Peak memory: 25.096 GB
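
The same model can also be driven from Python with the mlx-lm API. A minimal sketch, assuming the local path used above and following the load/generate pattern from the mlx-lm documentation:

from pathlib import Path
from mlx_lm import load, generate

# Local path to the quantized weights (same directory as in the CLI example).
model_path = Path(
    "~/Documents/MODELS/mlx-community/Mistral-Small-3.2-24B-Instruct-2506-8bit"
).expanduser()

# Load the model and tokenizer from disk.
model, tokenizer = load(str(model_path))

# Instruct models expect the chat template; apply it if the tokenizer has one.
prompt = "What LLM model are you?"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints tokens-per-second and memory stats like the CLI does.
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)

The Python route is convenient when generation needs to be embedded in a larger script rather than called through the shell.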




