digitado

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

digitado ⋅ 14 January 2026

arXiv:2601.07853v1 Announce Type: new Abstract: Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk assessment, and automated decision-making, where their abilities to plan, invoke tools, and manipulate mutable state introduce new security risks in high-stakes and highly regulated financial environments. However, existing safety evaluations largely focus on language-model-level content compliance or abstract agent settings, failing to capture execution-grounded risks arising from real operational workflows and state-changing actions. To bridge this gap, we […]

technocracy

Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

digitado ⋅ 12 March 2026

arXiv:2603.10079v1 Announce Type: new Abstract: We analyse SGD training of a shallow, fully connected network in the NTK scaling and provide a quantitative theory of the catapult phase. We identify an explicit criterion separating two behaviours: when an explicit function $G$, depending only on the kernel, learning rate $\eta$ and data, is positive, SGD produces large NTK-flattening spikes with high probability; when $G<0$, their probability decays like $(n/\eta)^{-\vartheta/2}$, for an explicitly characterised $\vartheta \in (0,\infty)$. This yields a concrete […]
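
As a rough intuition for what a catapult spike looks like, the Python sketch below runs full-batch gradient descent (not SGD) on a toy two-parameter model f = u·v with target 0, a simplification assumed here rather than the paper's network. Its tangent kernel is λ = u² + v²; when the learning rate satisfies 2 < ηλ₀ < 4, the loss first grows, the kernel shrinks below 2/η, and training then converges. The name `catapult_run` and all constants are illustrative, and the sketch does not compute the paper's criterion G.

```python
def catapult_run(u: float, v: float, lr: float, steps: int):
    """Gradient descent on L(u, v) = (u*v)**2 / 2 (toy model f = u*v, target y = 0).

    Records the loss and the tangent kernel lambda = u**2 + v**2 at every step.
    """
    losses, kernels = [], []
    for _ in range(steps):
        r = u * v                      # residual f - y with y = 0
        losses.append(0.5 * r * r)
        kernels.append(u * u + v * v)  # |grad f|^2, the (scalar) tangent kernel
        # Simultaneous update with dL/du = r * v and dL/dv = r * u
        u, v = u - lr * r * v, v - lr * r * u
    return losses, kernels


if __name__ == "__main__":
    lr = 2.5  # lambda_0 = 1.0001, so 2 < lr * lambda_0 < 4: the catapult regime
    losses, kernels = catapult_run(u=1.0, v=0.01, lr=lr, steps=200)
    peak = max(range(len(losses)), key=losses.__getitem__)
    print(f"initial loss {losses[0]:.2e}, peak {losses[peak]:.2e} at step {peak}")
    print(f"kernel {kernels[0]:.4f} -> {kernels[-1]:.4f}; stability edge 2/lr = {2 / lr}")
```

With these values the loss rises by more than two orders of magnitude within about ten steps while the kernel drops from about 1.0 to about 0.5, below the 2/η = 0.8 edge, after which the loss decays geometrically.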

MM-algorithms for traditional and convex NMF with Tweedie and Negative Binomial cost functions and empirical evaluation

digitado ⋅ 11 March 2026

arXiv:2603.09601v1 Announce Type: cross Abstract: Non-negative matrix factorisation (NMF) is a widely used tool for unsupervised learning and feature extraction, with applications ranging from genomics to text analysis and signal processing. Standard formulations of NMF are typically derived under Gaussian or Poisson noise assumptions, which may be inadequate for data exhibiting overdispersion or other complex mean-variance relationships. In this paper, we develop a unified framework for both traditional and convex NMF under a broad class of distributional assumptions, […]
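
For the traditional (non-convex) NMF case, one well-known majorization-minimization scheme is the family of multiplicative updates for the β-divergence. The Tweedie deviance with power p equals twice the β-divergence with β = 2 − p, and raising each update ratio to the exponent returned by `mm_exponent` makes every half-step a monotone MM step. This is an assumed stand-in sketched with NumPy, not the paper's algorithm: it does not cover convex NMF or the Negative Binomial cost, and `nmf_tweedie` is a hypothetical name.

```python
import numpy as np


def beta_divergence(V, Y, beta):
    """Sum of elementwise beta-divergences d_beta(V | Y)."""
    if beta == 1:  # generalized Kullback-Leibler (Poisson, Tweedie p = 1)
        return float(np.sum(V * np.log(V / Y) - V + Y))
    if beta == 0:  # Itakura-Saito (Gamma, Tweedie p = 2)
        return float(np.sum(V / Y - np.log(V / Y) - 1))
    return float(np.sum((V**beta + (beta - 1) * Y**beta - beta * V * Y**(beta - 1))
                        / (beta * (beta - 1))))


def mm_exponent(beta):
    """Exponent gamma(beta) that turns the multiplicative update into a monotone MM step."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)


def nmf_tweedie(V, rank, power, iters=200, seed=0, eps=1e-12):
    """Factorize V ~ W @ H under the Tweedie deviance with power p (beta = 2 - p)."""
    beta = 2.0 - power
    g = mm_exponent(beta)
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.5, 1.5, size=(V.shape[0], rank))
    H = rng.uniform(0.5, 1.5, size=(rank, V.shape[1]))
    losses = [beta_divergence(V, W @ H, beta)]
    for _ in range(iters):
        WH = W @ H  # update H with W fixed
        H *= ((W.T @ (V * WH**(beta - 2))) / (W.T @ WH**(beta - 1) + eps)) ** g
        WH = W @ H  # update W with H fixed
        W *= (((V * WH**(beta - 2)) @ H.T) / (WH**(beta - 1) @ H.T + eps)) ** g
        losses.append(beta_divergence(V, W @ H, beta))
    return W, H, losses
```

For example, `nmf_tweedie(V, rank=2, power=1.5)` fits a compound Poisson-Gamma (Tweedie p = 1.5) model; because every step is an MM step, the recorded `losses` should be non-increasing up to floating-point error.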

Learning State-Tracking from Code Using Linear RNNs

digitado ⋅ 16 February 2026

In recent years, state-tracking tasks, particularly permutation composition, have become a testbed for understanding the limits of sequence model architectures such as Transformers and RNNs (linear and non-linear). However, these are usually framed as sequence-to-sequence tasks: learning to map actions (permutations) to states, which is incompatible with the next-token prediction setting commonly used to train language models. We address this gap by converting permutation composition into code via REPL traces that interleave state reveals (through print statements) with variable transformations. We show […]
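
To make the idea of a REPL trace concrete, here is a hypothetical trace generator; the exact trace format used in the paper is not given in this excerpt. Each action rearranges a list `s` by a random permutation, and with probability `reveal_prob` a `print(s)` line reveals the current state, so a next-token predictor must track the unrevealed states implicitly. `make_repl_trace` and its parameters are illustrative names.

```python
import random


def make_repl_trace(n=5, steps=6, reveal_prob=0.5, seed=0):
    """Build a Python-REPL-style trace of permutation composition.

    Returns the trace text and the final state. Each action line is valid
    Python; state reveals appear only some of the time, and the trace always
    ends with a print so the final state is the next-token target.
    """
    rng = random.Random(seed)
    state = list(range(n))
    lines = [f">>> s = {state}"]
    for _ in range(steps):
        perm = list(range(n))
        rng.shuffle(perm)
        state = [state[i] for i in perm]  # apply the action (a permutation)
        lines.append(f">>> s = [s[i] for i in {tuple(perm)}]")
        if rng.random() < reveal_prob:    # optional intermediate state reveal
            lines.append(">>> print(s)")
            lines.append(str(state))
    lines.append(">>> print(s)")          # final query: the model must predict this
    lines.append(str(state))
    return "\n".join(lines), state
```

Calling `make_repl_trace(n=5, steps=6)` returns the trace text together with the final state; the last line of the trace is the state a model trained with next-token prediction would have to produce.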

Fast and Large-Scale Unbalanced Optimal Transport via its Semi-Dual and Adaptive Gradient Methods

digitado ⋅ 12 February 2026

arXiv:2602.10697v1 Announce Type: cross Abstract: Unbalanced Optimal Transport (UOT) has emerged as a robust relaxation of standard Optimal Transport, particularly effective for handling outliers and mass variations. However, scalable algorithms for UOT, specifically those based on Stochastic Gradient Descent (SGD), remain largely underexplored. In this work, we address this gap by analyzing the semi-dual formulation of Entropic UOT and demonstrating its suitability for adaptive gradient methods. While the semi-dual is a standard tool for large-scale balanced OT, its geometry […]
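
A minimal NumPy sketch of the semi-dual idea, under assumptions not stated in this excerpt: both marginals are relaxed with τ·KL penalties, the entropic term is ε·KL(π | a⊗b), and the potential f is eliminated in closed form as f_i = −(τε/(τ+ε)) log Σ_j b_j exp((g_j − C_ij)/ε). What remains is a concave function of g, maximized here with Adam. `semi_dual` and `solve_uot_adam` are hypothetical names, the hyperparameters are illustrative, and the paper's exact formulation and step-size rules may differ.

```python
import numpy as np


def semi_dual(g, a, b, C, eps, tau):
    """Value, gradient in g, and transport plan of the entropic-UOT semi-dual.

    Assumed setup: KL marginal penalties with weight tau on both sides and
    entropic regularization eps * KL(pi | a x b); f is a closed-form best response.
    """
    # Numerically stable log S_i = log sum_j b_j exp((g_j - C_ij) / eps)
    Z = (g[None, :] - C) / eps + np.log(b)[None, :]
    zmax = Z.max(axis=1, keepdims=True)
    log_S = zmax[:, 0] + np.log(np.exp(Z - zmax).sum(axis=1))
    f = -(tau * eps / (tau + eps)) * log_S
    P = np.outer(a, b) * np.exp((f[:, None] + g[None, :] - C) / eps)
    value = (-tau * np.sum(a * (np.exp(-f / tau) - 1))
             - tau * np.sum(b * (np.exp(-g / tau) - 1))
             - eps * np.sum(P - np.outer(a, b)))
    grad = b * np.exp(-g / tau) - P.sum(axis=0)  # envelope theorem (f is optimal)
    return float(value), grad, P


def solve_uot_adam(a, b, C, eps=0.1, tau=1.0, lr=0.01, iters=3000):
    """Maximize the concave semi-dual in g with Adam; returns g, plan, value history."""
    g = np.zeros_like(b)
    m, v = np.zeros_like(g), np.zeros_like(g)
    b1, b2, adam_eps = 0.9, 0.999, 1e-8
    history = []
    for t in range(1, iters + 1):
        value, grad, _ = semi_dual(g, a, b, C, eps, tau)
        history.append(value)
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad**2
        step = (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + adam_eps)
        g = g + lr * step  # ascent, since the dual is maximized
    _, _, P = semi_dual(g, a, b, C, eps, tau)
    return g, P, history
```

Because f is an exact best response, the gradient in g is simply the relaxed target marginal b·exp(−g/τ) minus the column sums of the current plan, so each step costs one pass over the cost matrix C.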

Deep Dense Exploration for LLM Reinforcement Learning via Pivot-Driven Resampling

digitado ⋅ 15 February 2026

Effective exploration is a key challenge in reinforcement learning for large language models: discovering high-quality trajectories within a limited sampling budget from the vast natural language sequence space. Existing methods face notable limitations: GRPO samples exclusively from the root, saturating high-probability trajectories while leaving deep, error-prone states under-explored. Tree-based methods blindly disperse budgets across trivial or unrecoverable states, causing sampling dilution that fails to uncover rare correct suffixes and destabilizes local baselines. To address this, we propose Deep […]
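
For context on the GRPO baseline mentioned above, its group-relative advantage normalizes each sampled completion's reward by the mean and standard deviation of its group. The sketch below uses the population standard deviation (implementations differ) and shows one way root-only sampling can waste budget: when every completion in a group receives the same reward, all advantages are zero and the group contributes no learning signal. This illustrates the baseline only, not the proposed pivot-driven resampling; both function names are hypothetical.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for completions sampled from the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def zero_signal_fraction(groups):
    """Fraction of groups whose rewards are all identical (all advantages zero)."""
    return sum(1 for g in groups if max(g) == min(g)) / len(groups)
```

For instance, `grpo_advantages([1.0, 0.0, 0.0, 1.0])` gives roughly `[1, -1, -1, 1]`, while a saturated group such as `[1.0, 1.0, 1.0]` yields all zeros.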

When Models Know When They Do Not Know: Calibration, Cascading, and Cleaning

digitado ⋅ 14 January 2026

arXiv:2601.07965v1 Announce Type: new Abstract: When a model knows when it does not know, many possibilities emerge. The first question is how to enable a model to recognize that it does not know. A promising approach is to use confidence, computed from the model’s internal signals, to reflect its ignorance. Prior work in specific domains has shown that calibration can provide reliable confidence estimates. In this work, we propose a simple, effective, and universal training-free method that applies […]
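
As a generic illustration of calibration and cascading (not the training-free method the abstract proposes, whose details are cut off here), the sketch below measures expected calibration error with equal-width bins and routes an input to a larger model only when the smaller model's softmax confidence falls below a threshold. The temperature is assumed to have been fitted beforehand on held-out data; `cascade`, `large_predict` and the 0.8 threshold are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: size-weighted mean |accuracy - confidence|."""
    n = len(confidences)
    ece = 0.0
    for k in range(n_bins):
        lo, hi = k / n_bins, (k + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (k == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece


def cascade(small_logits, large_predict, threshold=0.8, temperature=1.0):
    """Answer with the small model if its calibrated confidence clears the
    threshold, otherwise defer to the large model. Returns (label, deferred)."""
    probs = softmax(small_logits, temperature)
    conf = max(probs)
    if conf >= threshold:
        return probs.index(conf), False
    return large_predict(), True
```

The threshold trades cost for accuracy: raising it defers more inputs to the expensive model, which only pays off if the confidences are well calibrated, as measured by a low ECE.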

Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent

digitado ⋅ 12 March 2026

arXiv:2603.10184v1 Announce Type: new Abstract: Statistical inference with bandit data presents fundamental challenges due to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability as a sufficient condition for valid inference under adaptivity. This paper develops a systematic theory of stability for bandit algorithms based on stochastic mirror descent, a broad algorithmic framework that includes the widely used EXP3 algorithm as a special case. Our contributions are threefold. First, we establish a […]
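
For reference, EXP3 can be written as stochastic mirror descent on the probability simplex with the negative-entropy mirror map, which yields exponential weights over importance-weighted loss estimates. The sketch below is plain EXP3 with Bernoulli rewards; it does not include the additional regularization whose stability properties the paper studies, and the arm means, horizon and step size are illustrative assumptions.

```python
import math
import random


def exp3(arm_means, horizon, eta, seed=0):
    """EXP3 with importance-weighted loss estimates on Bernoulli-reward arms.

    The exponential-weights probabilities below are the closed-form mirror
    descent (negative-entropy) update applied to the cumulative loss estimates.
    Returns per-arm pull counts and the last sampling distribution.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    cum_loss_est = [0.0] * k
    pulls = [0] * k
    probs = [1.0 / k] * k
    for _ in range(horizon):
        lowest = min(cum_loss_est)  # shift exponents for numerical stability
        weights = [math.exp(-eta * (L - lowest)) for L in cum_loss_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        # Unbiased estimate: only the pulled arm is updated, scaled by 1 / p
        cum_loss_est[arm] += (1.0 - reward) / probs[arm]
    return pulls, probs
```

A common tuning is η = sqrt(2 ln K / (K T)) for K arms and horizon T; the 1/p factor in the estimate is the importance weighting that makes adaptively collected data hard to analyse with classical asymptotics.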

Building Production-Ready RAG Systems with Free LLMs: From Zero to Analysis-Ready in 6 Steps

digitado ⋅ 17 February 2026

Introduction

When I started exploring Retrieval-Augmented Generation (RAG) systems for incident analysis, I realized that jumping straight into paid APIs such as Claude or OpenAI wasn’t practical for learning and experimentation. Instead, I wanted to build something completely local, free to run, and powerful enough to handle real production scenarios. This article documents my journey building a fully functional RAG system that analyzes production incidents by learning from past issues, without spending a dime on API calls. Everything runs on […]
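
The article's actual stack is cut off in this excerpt, so the following is only a hypothetical sketch of the retrieval half of such a pipeline. It uses a dependency-free TF-IDF retriever over past incident reports, assumed here in place of the embedding model and vector store a production RAG system would typically use, plus a helper that assembles the prompt a local LLM would receive. All function names are illustrative, and no model call is made.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """TF-IDF vectors (as dicts) for a small corpus of past incident reports."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return idf, vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, docs, idf, vectors, k=2):
    """Return the k past incidents most similar to the query."""
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_prompt(query, context_docs):
    """Assemble the augmented prompt that would be sent to a local LLM."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (f"Past incidents:\n{context}\n\n"
            f"New incident: {query}\n"
            "Using only the past incidents above, suggest likely root causes.")
```

Swapping `build_index` and `retrieve` for an embedding model leaves `build_prompt` unchanged, which keeps retrieval and generation independently testable.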

130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone?

digitado ⋅ 8 January 2026

arXiv:2601.03298v1 Announce Type: new Abstract: This is a brief description of a project that has already autoformalized a large portion of the general topology in the Munkres textbook (241 pages in total, in 7 chapters and 39 sections). The project has been running since November 21, 2025 and, as of January 4, 2026, has produced 160k lines of formalized topology. Most of it (about 130k lines) was done in two weeks, from December 22 to January […]