WW-PGD: Projected Gradient Descent optimizer
Announcing: WW-PGD, the WeightWatcher Projected Gradient Descent optimizer
I just released WW-PGD, a small PyTorch add-on that wraps standard optimizers (SGD, Adam, AdamW, etc.) and applies an epoch-boundary spectral projection using WeightWatcher diagnostics.
Elevator pitch: WW-PGD explicitly nudges each layer toward the Exact Renormalization Group (ERG) critical manifold during training.
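The real API lives in the repo's QuickStart notebook; the sketch below only illustrates the general wrapping pattern in a hypothetical form (names like `EpochProjectedOptimizer`, `base_step`, and `project` are illustrative, not WW-PGD's actual interface):

```python
import numpy as np

class EpochProjectedOptimizer:
    """Hypothetical sketch: delegate each step to a base optimizer,
    then apply a spectral projection at epoch boundaries."""

    def __init__(self, base_step, project, steps_per_epoch):
        self.base_step = base_step          # e.g. an SGD/Adam/AdamW update
        self.project = project              # epoch-boundary projection
        self.steps_per_epoch = steps_per_epoch
        self.t = 0                          # global step counter

    def step(self, params):
        self.base_step(params)              # ordinary optimizer update
        self.t += 1
        if self.t % self.steps_per_epoch == 0:
            # Project each layer's weights (WW-PGD blends/ramps this in)
            for i, W in enumerate(params):
                params[i] = self.project(W)

# Toy demo: weight decay as the base step, unit-norm rescaling as a
# stand-in projection (WW-PGD's real projection targets the ESD tail).
def weight_decay_step(ps):
    for p in ps:
        p -= 0.1 * p

params = [np.ones(4)]
opt = EpochProjectedOptimizer(
    base_step=weight_decay_step,
    project=lambda W: W / np.linalg.norm(W),
    steps_per_epoch=5,
)
for _ in range(5):
    opt.step(params)
print(np.linalg.norm(params[0]))  # 1.0: projected at the epoch boundary
```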
Theory in short
• HTSR critical condition: layer power-law exponent α ≈ 2
• SETOL ERG condition: the trace-log of the eigenvalues over the spectral tail vanishes, i.e. Σ log λᵢ = 0 for the tail eigenvalues
WW-PGD makes these explicit optimization targets, rather than post-hoc diagnostics.
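For concreteness, here is a rough numpy sketch of both diagnostics for a single weight matrix. It uses a Hill estimator and a fixed tail fraction as crude stand-ins for weightwatcher's power-law fit and tail selection, which are more sophisticated:

```python
import numpy as np

def layer_diagnostics(W, tail_frac=0.25):
    """Illustrative HTSR/SETOL diagnostics for one weight matrix W.

    alpha     : power-law exponent of the ESD tail (Hill estimator,
                a crude stand-in for weightwatcher's fit); ~2 is critical
    trace_log : sum of log eigenvalues over the tail; the SETOL ERG
                condition asks for this to be 0
    """
    # Eigenvalues of the correlation matrix W^T W (the ESD)
    evals = np.sort(np.linalg.eigvalsh(W.T @ W))[::-1]   # descending
    k = max(2, int(tail_frac * len(evals)))              # naive tail choice
    tail = evals[:k]
    # Hill estimator for the power-law exponent alpha
    alpha = 1.0 + k / np.sum(np.log(tail / tail[-1]))
    trace_log = float(np.sum(np.log(tail)))
    return alpha, trace_log

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 50)) / np.sqrt(50)  # random baseline layer
alpha, tlog = layer_diagnostics(W)
print(alpha, tlog)
```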

How it works
- Runs weightwatcher (ww) at epoch boundaries
- Uses ww layer quality metrics to identify the spectral tail
- Selects the optimal tail guess at each epoch
- Applies a stable projected-gradient-descent update to the layer spectral density via a proximal, Cayley-like step
- Retracts the weights so that the SETOL ERG condition is satisfied exactly
- Blends the projected weights back in (with warmup + ramping to avoid early instability)
In other words, it projects the output of your base optimizer onto the ERG critical manifold, the feasible set of a spectrally constrained optimization problem.
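To illustrate the retraction idea only (this is not WW-PGD's actual proximal, Cayley-like step, and it uses a naive fixed tail fraction), one can rescale the tail singular values so the trace-log constraint holds exactly, then blend the projected weights back in:

```python
import numpy as np

def erg_retract(W, tail_frac=0.25, blend=0.1):
    """Naive sketch: rescale the top (tail) singular values of W so the
    sum of log eigenvalues over the tail is exactly 0, then blend the
    projected weights back in (WW-PGD warms up / ramps this blend)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(tail_frac * len(s)))
    # Eigenvalues of W^T W are s_i^2; pick a scale c such that
    # sum_i log(c * s_i^2) = 0  =>  c = exp(-sum_i log(s_i^2) / k)
    c = np.exp(-np.sum(np.log(s[:k] ** 2)) / k)
    s_proj = s.copy()
    s_proj[:k] *= np.sqrt(c)
    W_proj = (U * s_proj) @ Vt                   # reassemble projected weights
    constraint = np.sum(np.log(s_proj[:k] ** 2))  # 0 by construction
    return (1.0 - blend) * W + blend * W_proj, constraint

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)) / np.sqrt(32)
W_new, constraint = erg_retract(W, blend=1.0)   # full projection, no blend
print(constraint)                                # ~0 up to rounding
```

Note the blend parameter: with `blend < 1` the update is only a partial step toward the constraint set, which is one simple way to avoid the early-training instability the warmup/ramping addresses.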
Scope (important)
This first public release focuses on training small models from scratch; it is not yet intended for large-scale fine-tuning. It's a first proof of concept of the approach.
So far, WW-PGD has been tested on:
- 3-layer MLPs (MNIST / FashionMNIST)
- nanoGPT-style small Transformer models
Larger architectures and fine-tuning workflows are active work in progress.
Early results (FashionMNIST, 35 epochs, mean ± std)
Below I compare a small (3-layer) MLP, trained on FashionMNIST for 35 epochs with WW-PGD, against default AdamW:
• Train test: Baseline 98.05% ± 0.13 vs WW-PGD 97.99% ± 0.17
• Augmented test: Baseline 96.24% ± 0.17 vs WW-PGD 96.23% ± 0.20
Translation: accuracy is roughly neutral at this scale, but WW-PGD gives you a spectral control knob and full per-epoch tuning.
Repo & QuickStart
Repo: https://github.com/CalculatedContent/WW_PGD
QuickStart (with MLP3+FashionMNIST example): https://github.com/CalculatedContent/WW_PGD/blob/main/WW_PGD_QuickStart.ipynb
More info: https://weightwatcher.ai/ww_pgd.html
If you're experimenting with training and optimization on your own models, or want a data-free spectral health monitor plus projection step, I'd love feedback, especially on other optimizers or small Transformer setups.
Join us on the WeightWatcher Community Discord to discuss:
https://discord.com/invite/uVVsEAcfyF
A big thanks to Hari Kishan Prakash for helping out here.
And, as always, if you need help with AI, reach out to me here. #talkToChuck
