digitado

Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation

digitado ⋅ 1 April 2026

arXiv:2603.28769v1 Announce Type: new Abstract: Evaluating large language models at scale remains a practical bottleneck for many organizations. While existing evaluation frameworks work well for thousands of examples, they struggle when datasets grow to hundreds of thousands or millions of samples. This scale is common when assessing model behavior across diverse domains or conducting comprehensive regression testing. We present Spark-LLM-Eval, a distributed evaluation framework built natively on Apache Spark. The system treats evaluation as a data-parallel problem, partitioning examples across executors and aggregating results with proper statistical […]
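The data-parallel idea in the excerpt above can be illustrated without a cluster. The following is a minimal sketch in plain Python, not the framework's actual code: each partition is reduced to a small tuple of sufficient statistics, the tuples are merged, and a mean with a confidence interval is computed from the merged result. The function names, the round-robin `split` helper, and the choice of a normal-approximation 95% interval are all assumptions made for illustration; the truncated abstract does not say which statistics the system reports.

```python
import math
from functools import reduce

def partition_stats(scores):
    """Map step (hypothetical): reduce one partition to (count, sum, sum of squares)."""
    return (len(scores), sum(scores), sum(x * x for x in scores))

def merge_stats(a, b):
    """Reduce step: sufficient statistics from two partitions simply add up."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def mean_with_ci(stats, z=1.96):
    """Mean and normal-approximation interval (an assumed choice of statistic)."""
    n, s, ss = stats
    mean = s / n
    # Unbiased sample variance, clamped at zero against rounding error.
    var = max((ss - n * mean * mean) / (n - 1), 0.0)
    half = z * math.sqrt(var / n)
    return mean, mean - half, mean + half

def split(items, k):
    """Round-robin split into k partitions, standing in for executors."""
    return [items[i::k] for i in range(k)]

# Per-example scores (1 = correct, 0 = incorrect); toy data for illustration.
scores = [1, 0, 1, 1, 0, 1, 1, 1]
total = reduce(merge_stats, map(partition_stats, split(scores, 3)))
mean, low, high = mean_with_ci(total)
```

Because the per-partition tuples merge by addition, the merged statistics do not depend on how the examples were partitioned, which is the property that lets this kind of aggregation scale out.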

technocracy

Not all tokens are needed (NAT): token efficient reinforcement learning

digitado ⋅ 10 March 2026

arXiv:2603.06619v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a key driver of progress in large language models, but scaling RL to long chain-of-thought (CoT) trajectories is increasingly constrained by backpropagation over every generated token. Even with optimized rollout engines, full-token updates can consume a large fraction of total training cost, turning token length into a hidden tax on RL. We introduce Not All Tokens Are Needed (NAT), a unified framework that makes the token budget a […]

technocracy

MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments

digitado ⋅ 12 January 2026

arXiv:2601.05368v1 Announce Type: new Abstract: We present MOSAIC-GS, a novel, fully explicit, and computationally efficient approach for high-fidelity dynamic scene reconstruction from monocular videos using Gaussian Splatting. Monocular reconstruction is inherently ill-posed due to the lack of sufficient multiview constraints, making accurate recovery of object geometry and temporal coherence particularly challenging. To address this, we leverage multiple geometric cues, such as depth, optical flow, dynamic object segmentation, and point tracking. Combined with rigidity-based motion constraints, these cues allow […]

technocracy

Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

digitado ⋅ 25 March 2026

arXiv:2603.22644v1 Announce Type: new Abstract: We consider a PAC-Bayes type learning rule for binary classification, balancing the training error of a randomized “posterior” predictor with its KL divergence to a pre-specified “prior”. This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule, to continuous priors and randomized predictions. With a balancing parameter of $\lambda=1$ this learning rule recovers an (empirical) Bayes posterior and a modified variant recovers the profile posterior, linking […]

technocracy

Influence Diagnostics in High-dimensional M-estimation: Precise Asymptotics

digitado ⋅ 13 July 2026

arXiv:2607.09250v1 Announce Type: new Abstract: The impact of a given training point on a statistical model is classically measured through its leave-one-out influence, which quantifies the effect of its removal from the training set on the model accuracy. While the statistics of leave-one-out influences are well understood in the low-dimensional, large sample limit $n \to \infty, d = O(1)$, they become more intricate in high dimensions, as the influence of a given sample develops non-trivial dependencies on all other training samples. […]
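To make the leave-one-out definition in the excerpt above concrete, here is a minimal sketch that computes influences by brute-force refitting. It assumes a deliberately tiny model (a one-dimensional least-squares slope through the origin) and measures "accuracy" as mean squared error on a separate test set; both choices, and all function names, are illustrative assumptions and not the paper's setting, which concerns high-dimensional M-estimation.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the model y = slope * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mse(slope, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loo_influences(train_x, train_y, test_x, test_y):
    """Influence of point i = test loss without point i minus test loss with it."""
    full_loss = mse(fit_slope(train_x, train_y), test_x, test_y)
    influences = []
    for i in range(len(train_x)):
        xs = train_x[:i] + train_x[i + 1:]
        ys = train_y[:i] + train_y[i + 1:]
        influences.append(mse(fit_slope(xs, ys), test_x, test_y) - full_loss)
    return influences

# Toy data: the last training point is an outlier relative to y = x.
train_x, train_y = [1, 2, 3, 4], [1, 2, 3, 10]
test_x, test_y = [1, 2, 3], [1, 2, 3]
influences = loo_influences(train_x, train_y, test_x, test_y)
```

With this sign convention a negative influence means that removing the point lowers the test loss; in the toy data the outlier has the most negative value.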

technocracy

A Cache-Aware Hybrid Sieve Combining Segmentation and Bit-Packing for Fast Prime Generation

digitado ⋅ 29 January 2026

arXiv:2601.19909v1 Announce Type: new Abstract: Prime generation is a fundamental task in cryptography, number theory, and randomized algorithms. While the classical Sieve of Eratosthenes is simple and efficient in theory, its practical performance on modern central processing units is often limited by memory access inefficiencies. This paper introduces a cache-aware hybrid sieve that integrates segmentation, bit-packing, and cache-line-aligned block processing to optimize memory bandwidth and level one and level two cache locality. The proposed approach reduces memory usage […]
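Two of the ingredients named in the excerpt above, segmentation and bit-packing, can be sketched in a few lines. The code below is an illustrative assumption, not the paper's implementation: it sieves odd numbers only, stores one bit per odd candidate, and processes the range in fixed-size segments. The function names and the `segment_bytes` default are hypothetical, and cache-line alignment is not modeled because Python gives no control over memory layout.

```python
import math

def small_primes(limit):
    """Plain Sieve of Eratosthenes, used only for the base primes up to `limit`."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, f in enumerate(flags) if f]

def segmented_primes(n, segment_bytes=32 * 1024):
    """Primes <= n via a segmented, bit-packed, odds-only sieve (a sketch)."""
    if n < 2:
        return []
    base = small_primes(math.isqrt(n))
    primes = [2]
    odds_per_segment = segment_bytes * 8  # one bit per odd number
    low = 3
    while low <= n:
        high = min(low + 2 * (odds_per_segment - 1), n)
        count = (high - low) // 2 + 1      # odd candidates in [low, high]
        bits = bytearray((count + 7) // 8)  # bit set = composite
        for p in base:
            if p == 2:
                continue
            if p * p > high:
                break
            # First odd multiple of p inside the segment, not below p*p.
            start = max(p * p, ((low + p - 1) // p) * p)
            if start % 2 == 0:
                start += p
            for m in range(start, high + 1, 2 * p):
                i = (m - low) >> 1
                bits[i >> 3] |= 1 << (i & 7)
        for i in range(count):
            if not bits[i >> 3] & (1 << (i & 7)):
                primes.append(low + 2 * i)
        low += 2 * odds_per_segment
    return primes
```

Calling `segmented_primes(30)` returns `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`. The segment size bounds the working set, which is the property a cache-aware design would tune to the target cache level.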

technocracy

Flatter Tokens are More Valuable for Speculative Draft Model Training

digitado ⋅ 28 January 2026

arXiv:2601.18902v1 Announce Type: new Abstract: Speculative Decoding (SD) is a key technique for accelerating Large Language Model (LLM) inference, but it typically requires training a draft model on a large dataset. We approach this problem from a data-centric perspective, finding that not all training samples contribute equally to the SD acceptance rate. Specifically, our theoretical analysis and empirical validation reveal that tokens inducing flatter predictive distributions from the target model are more valuable than those yielding sharply peaked […]
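One simple way to make "flatter predictive distribution" concrete is Shannon entropy: a uniform distribution has the highest entropy and a one-hot distribution has zero. The sketch below ranks token positions by the entropy of the target model's distribution at each position. Using entropy as the flatness measure is an assumption made here for illustration; the truncated abstract does not say which measure the authors use, and the function names and toy distributions are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; higher values mean a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_by_flatness(token_dists):
    """Indices of token positions, sorted from flattest to most peaked."""
    return sorted(
        range(len(token_dists)),
        key=lambda i: entropy(token_dists[i]),
        reverse=True,
    )

# Toy target-model distributions over a 4-word vocabulary (made-up values).
dists = [
    [0.97, 0.01, 0.01, 0.01],  # sharply peaked
    [0.25, 0.25, 0.25, 0.25],  # uniform, as flat as possible
    [0.70, 0.10, 0.10, 0.10],  # in between
]
order = rank_by_flatness(dists)
```

A data-selection step built on this ranking would keep the positions at the front of `order`; whether that matches the paper's selection rule is not stated in the excerpt.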

technocracy

McDonald’s tests Google-backed AI drive-thru ordering system

digitado ⋅ 10 June 2026

McDonald’s is testing a new AI system that can take drive-thru orders and support restaurant operations. The system, called ArchIQ and nicknamed “Archy,” was introduced during the company’s Worldwide convention, according to Restaurant Business. It is being tested at five McDonald’s locations in the United States, though the company has not named the restaurants involved. A video shared on X by a McDonald’s franchise owner showed the system greeting customers, processing order changes, displaying the final total, and […]

technocracy

Confidence Aware Reinforcement Learning: Advancing Large Language Models in Dynamic Environments

digitado ⋅ 21 April 2026

Building Large Language Model Predictive Confidence to Navigate Uncertainty with Resiliency and Conviction

Environments are in constant change as the physical world and contextual signals evolve to reflect new meaning or redefine ground truth. Large language models (LLMs) have also evolved to interpret and understand these signals, from visual inspection of the physical world to formulating contextual meaning of synthetic datasets. The rate of change in these signals is increasing at an exponential gradient that is unbounded and […]

technocracy

The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity

digitado ⋅ 22 May 2026

arXiv:2605.21492v1 Announce Type: new Abstract: No feature ranking can be simultaneously faithful, stable, and complete when features are collinear. For collinear pairs, ranking reduces to a coin flip. We prove this impossibility, quantify it for four model classes, resolve it via ensemble averaging (DASH), and machine-verify it with 305 Lean 4 theorems. We characterize the complete attribution design space: exactly two families of methods exist — faithful-complete methods (unstable, with rankings that flip up to 50% of the […]
