Learning Rate Annealing Improves Tuning Robustness in Stochastic Optimization
arXiv:2503.09411v2 Announce Type: replace-cross Abstract: The learning rate in stochastic gradient methods is a critical hyperparameter that is notoriously costly to tune via standard grid search, especially for training modern large-scale models with billions of parameters. We identify a theoretical advantage of learning rate annealing schemes that decay the learning rate to zero at a polynomial rate, such as the widely used cosine schedule, by demonstrating their increased robustness to misspecification of the initial learning rate, as arises under a coarse grid search. […]
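As a concrete illustration of the kind of schedule the abstract refers to, below is a minimal sketch of standard cosine annealing decaying the learning rate to zero. The function name `cosine_lr` and the parameters `eta_max` (the grid-searched initial learning rate) and `total_steps` are hypothetical choices for this sketch, not notation from the paper; the polynomial-rate decay near the end of training follows from the Taylor expansion of the cosine.

```python
import math

def cosine_lr(step: int, total_steps: int, eta_max: float) -> float:
    """Cosine annealing from eta_max down to zero over total_steps.

    Near the end of training (t -> 1) the schedule decays polynomially:
    0.5 * (1 + cos(pi * t)) ~ (pi**2 / 4) * (1 - t)**2.
    """
    t = step / total_steps
    return eta_max * 0.5 * (1.0 + math.cos(math.pi * t))

# Illustrative usage (hypothetical values): even if eta_max is off by a
# constant factor from a coarse grid search, the schedule sweeps through
# a range of smaller learning rates as it anneals to zero.
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_lr(step, total_steps=1000, eta_max=0.1))
```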