Meta-Research on Backdoors: Dataset and Threat Model Shifts in Multimodal Backdoor Attacks
Backdoor attacks enable adversaries to embed malicious behavior into machine learning models by poisoning training data with triggers. Research has largely focused on backdoors in unimodal models. However, the rise of multimodal systems, e.g., vision–language models (VLMs) and multimodal large language models (MLLMs), has significantly expanded the attack surface. Multimodal backdoors can exploit cross-modal triggers, representation-level manipulation, instruction-conditioned behaviors, and test-time activation pathways that are unavailable in unimodal models. Nevertheless, quantifying progress in this field remains challenging due […]