WW-PGD: Projected Gradient Descent optimizer

Announcing: ๐—ช๐—ช-๐—ฃ๐—š๐—— โ€” ๐—ช๐—ฒ๐—ถ๐—ด๐—ต๐˜๐—ช๐—ฎ๐˜๐—ฐ๐—ต๐—ฒ๐—ฟ ๐—ฃ๐—ฟ๐—ผ๐—ท๐—ฒ๐—ฐ๐˜๐—ฒ๐—ฑ ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜ ๐Ÿš€

I just released WW-PGD, a small PyTorch add-on that wraps standard optimizers (SGD, Adam, AdamW, etc.) and applies an epoch-boundary spectral projection using WeightWatcher diagnostics.
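The wrapping pattern looks roughly like the sketch below. This is illustrative only: `ww_pgd_project` is a placeholder name for the epoch-boundary projection (a sketch of it appears later in this post), not the repo's confirmed API, so see the QuickStart notebook for the real interface.

```python
import torch

# Illustrative training-loop pattern (not the repo's exact API).
# Assumes `model`, `train_loader`, and `ww_pgd_project` (sketched
# later in this post) are defined. The inner optimizer runs unchanged.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for epoch in range(35):
    for x, y in train_loader:
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()                    # plain AdamW step
    ww_pgd_project(model, epoch)      # epoch-boundary spectral projection
```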

Elevator pitch: WW-PGD explicitly nudges each layer toward the Exact Renormalization Group (ERG) critical manifold during training.

𝗧𝗵𝗲𝗼𝗿𝘆 𝗶𝗻 𝘀𝗵𝗼𝗿𝘁

• HTSR critical condition: α ≈ 2 (the fitted power-law exponent of each layer's eigenvalue spectrum)

• SETOL ERG condition: trace-log(λ) over the spectral tail = 0, i.e. Σ log λᵢ = 0 for the tail eigenvalues

WW-PGD makes these explicit optimization targets, rather than post-hoc diagnostics.
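To see what these two targets mean numerically, here is a small diagnostic snippet. It uses the actual weightwatcher `analyze()` API; the `alpha` and `xmin` columns match recent ww releases, but check your version. The tail selection here is a simplification of WW-PGD's per-epoch tail search, and it assumes ww's default, unnormalized ESD.

```python
import numpy as np
import torch.nn as nn
import weightwatcher as ww

# Diagnostics only; the projection itself is sketched further down.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

details = ww.WeightWatcher(model=model).analyze()
print(details[["layer_id", "alpha"]])        # HTSR target: alpha ~ 2

# ERG check for the first Linear layer: eigenvalues of W^T W lying in
# the power-law tail (above the fitted xmin) should have log-sum zero.
W = model[0].weight.detach().numpy()
evals = np.linalg.eigvalsh(W.T @ W)
tail = evals[evals > float(details.iloc[0]["xmin"])]
print("trace-log over tail:", np.log(tail).sum())   # SETOL ERG target: 0
```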

๐—›๐—ผ๐˜„ ๐—ถ๐˜ ๐˜„๐—ผ๐—ฟ๐—ธ๐˜€

  • Runs weightwatcher (ww) at epoch boundaries
  • Uses ww layer quality metrics to identify the spectral tail
  • Selects the optimal tail guess at each epoch
  • Applies a stable projected gradient descent update to the layer spectral density via a proximal, Cayley-like step
  • Retracts to exactly satisfy the SETOL ERG condition
  • Blends the projected weights back in (with warmup + ramping to avoid early instability)

In other words, it projects the output of your optimizer onto the ERG critical manifold, the feasible set in a spectrally constrained optimization problem (see the sketch below).
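To make the projection concrete, here is a minimal, self-contained sketch of such a project-retract-blend step, under simplifying assumptions of my own: a fixed top-k tail instead of ww's fitted tail selection, a plain multiplicative retraction instead of the proximal, Cayley-like step, and a linear ramp on the blend coefficient.

```python
import torch

def ww_pgd_project(model, epoch, warmup=5, ramp=10, gamma_max=0.2, k=10):
    """Illustrative sketch, not the repo's implementation.

    Retracts each Linear layer's top-k spectral tail onto the ERG
    manifold (sum of log tail eigenvalues = 0), then blends the
    projected weights back in with a warmed-up, ramped coefficient.
    """
    if epoch < warmup:                               # skip early epochs
        return
    gamma = gamma_max * min(1.0, (epoch - warmup + 1) / ramp)
    with torch.no_grad():
        for m in model.modules():
            if not isinstance(m, torch.nn.Linear):
                continue
            U, S, Vh = torch.linalg.svd(m.weight, full_matrices=False)
            kk = min(k, S.numel())
            lam = S[:kk] ** 2                        # tail eigenvalues of W W^T
            c = torch.exp(-torch.log(lam).mean())    # sum(log(c * lam)) == 0
            S_proj = S.clone()
            S_proj[:kk] = S[:kk] * c.sqrt()          # retract tail exactly
            W_proj = U @ torch.diag(S_proj) @ Vh
            m.weight.mul_(1 - gamma).add_(gamma * W_proj)  # blend back in
```

The retraction identity does the work here: multiplying the k tail eigenvalues by c = exp(-mean(log λ)) makes their log-sum exactly zero, which is the SETOL ERG condition stated above.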

𝗦𝗰𝗼𝗽𝗲 (𝗶𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝘁)

This first public release is focused on training small models from scratch; it is not yet intended for large-scale fine-tuning. It's a first proof of concept of the approach.

So far, WW-PGD has been tested on:

  • 3-layer MLPs (MNIST / FashionMNIST)
  • nano-GPTโ€“style small Transformer models

Larger architectures and fine-tuning workflows are active work in progress.

๐—˜๐—ฎ๐—ฟ๐—น๐˜† ๐—ฟ๐—ฒ๐˜€๐˜‚๐—น๐˜๐˜€ (๐—™๐—ฎ๐˜€๐—ต๐—ถ๐—ผ๐—ป๐— ๐—ก๐—œ๐—ฆ๐—ง, ๐Ÿฏ๐Ÿฑ ๐—ฒ๐—ฝ๐—ผ๐—ฐ๐—ต๐˜€, ๐—บ๐—ฒ๐—ฎ๐—ป ยฑ ๐˜€๐˜๐—ฑ)

Below I show the layer alphas for a small (3-layer) MLP trained on FashionMNIST for 35 epochs, compared to default AdamW:

โ€ข ๐๐ฅ๐š๐ข๐ง ๐ญ๐ž๐ฌ๐ญ: Baseline 98.05% ยฑ 0.13ย vsย WW-PGD 97.99% ยฑ 0.17

โ€ข ๐€๐ฎ๐ ๐ฆ๐ž๐ง๐ญ๐ž๐ ๐ญ๐ž๐ฌ๐ญ: Baseline 96.24% ยฑ 0.17ย vsย WW-PGD 96.23% ยฑ 0.20

Translation: accuracy is roughly neutral at this scale, but WW-PGD gives you a spectral control knob and full per-epoch tuning.

𝗥𝗲𝗽𝗼 & 𝗤𝘂𝗶𝗰𝗸𝗦𝘁𝗮𝗿𝘁

🧩 Repo: https://github.com/CalculatedContent/WW_PGD

📓 QuickStart (with MLP3+FashionMNIST example): https://github.com/CalculatedContent/WW_PGD/blob/main/WW_PGD_QuickStart.ipynb

๐Ÿ” ๐— ๐—ผ๐—ฟ๐—ฒ ๐—ถ๐—ป๐—ณ๐—ผ: https://weightwatcher.ai/ww_pgd.html

If you're experimenting with training and optimization on your own models, or want a data-free spectral health monitor + projection step, I'd love feedback, especially on other optimizers or small Transformer setups.

Join us on the weightwatcher Community Discord to discuss:

💬 https://discord.com/invite/uVVsEAcfyF

A big thanks to Hari Kishan Prakash for helping out here.

And, as always, if you need help with AI, reach out to me here. #talkToChuck
