digitado

Linux Kernel Recency Matters, CVE Severity Doesn’t, and History Fades

digitado ⋅ 2 February 2026

arXiv:2601.22196v1. Abstract: In 2024, the Linux kernel became its own Common Vulnerabilities and Exposures (CVE) Numbering Authority (CNA), formalizing how kernel vulnerabilities are identified and tracked. We analyze the anatomy and dynamics of kernel CVEs using metadata, associated commits, and patch latency to understand what drives patching. Results show that severity and Common Vulnerability Scoring System (CVSS) metrics have a negligible association with patch latency, whereas kernel recency is a reasonable predictor in survival models. […]

technocracy

A Semantic Observer Layer for Autonomous Vehicles: Pre-Deployment Feasibility Study of VLMs for Low-Latency Anomaly Detection

digitado ⋅ 1 April 2026

arXiv:2603.28888v1. Abstract: Semantic anomalies (context-dependent hazards that pixel-level detectors cannot reason about) pose a critical safety risk in autonomous driving. We propose a semantic observer layer: a quantized vision-language model (VLM) running at 1–2 Hz alongside the primary AV control loop, monitoring for semantic edge cases, and triggering fail-safe handoffs when detected. Using Nvidia Cosmos-Reason1-7B with NVFP4 quantization and FlashAttention2, we achieve ~500 ms inference, a ~50x speedup over the unoptimized FP16 baseline (no quantization, standard PyTorch attention) […]

Recursive Knowledge Synthesis for Multi-LLM Systems: Stability Analysis and Tri-Agent Audit Framework

digitado ⋅ 15 January 2026

arXiv:2601.08839v1. Abstract: This paper presents a tri-agent cross-validation framework for analyzing stability and explainability in multi-model large language systems. The architecture integrates three heterogeneous LLMs (used for semantic generation, analytical consistency checking, and transparency auditing) into a recursive interaction cycle. This design induces Recursive Knowledge Synthesis (RKS), where intermediate representations are continuously refined through mutually constraining transformations irreducible to single-model behavior. Across 47 controlled trials using public-access LLM deployments (October 2025), we evaluated system stability via four […]

Probabilistic Inference and Learning with Stein’s Method

digitado ⋅ 10 March 2026

arXiv:2603.07467v1. Abstract: This monograph provides a rigorous overview of theoretical and methodological aspects of probabilistic inference and learning with Stein’s method. Recipes are provided for constructing Stein discrepancies from Stein operators and Stein sets, and properties of these discrepancies such as computability, separation, convergence detection, and convergence control are discussed. Further, the connection between Stein operators and Stein variational gradient descent is set out in detail. The main definitions and results are precisely stated, and […]

Open Responses

digitado ⋅ 16 January 2026

This is the standardization effort I’ve most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs. Open Responses aims to provide exactly that as a documented standard, derived from OpenAI’s Responses API. I was hoping for one based on their older Chat Completions API since so many other products have cloned that already, but basing it on Responses does make sense since that […]

Does Your AI Have a Personality Problem?

digitado ⋅ 24 June 2026

Why the way AI interacts with employees may matter as much as what it can do.

What Google Calls Expert Advice Is People Giving Opinions on the Internet

digitado ⋅ 8 May 2026

Google has just taken a very revealing step with its search engine: its AI-generated answers will incorporate more "perspectives" from forums, blogs, social networks and, very prominently, Reddit, under a label as suggestive as it is dangerous: "expert advice". The idea raises an uncomfortable question: since when does an opinion upvoted by an anonymous community become "expert advice"? Reddit is many things. It is archive, conversation, venting, recommendation, niche culture, collective memory and, on occasion, […]

The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research

digitado ⋅ 24 February 2026

arXiv:2602.18458v1. Abstract: Reproducibility crises across sciences highlight the limitations of the paper-centric review system in assessing the rigor and reproducibility of research. AI agents that autonomously design and generate large volumes of research outputs exacerbate these challenges. In this work, we address the growing challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We propose the first execution-grounded evaluation framework that verifies research beyond narrative review by examining […]

Introducing Showboat and Rodney, so agents can demo what they’ve built

digitado ⋅ 10 February 2026

A key challenge working with coding agents is having them both test what they’ve built and demonstrate that software to you, their overseer. This goes beyond automated tests – we need artifacts that show their progress and help us see exactly what the agent-produced software is able to do. I’ve just released two new tools aimed at this problem: Showboat and Rodney. Sections: Proving code actually works; Showboat: Agents build documents to demo their work; Rodney: CLI browser automation; […]

In-Context Function Learning in Large Language Models

digitado ⋅ 12 February 2026

Large language models (LLMs) can learn from a few demonstrations provided at inference time. We study this in-context learning phenomenon through the lens of Gaussian Processes (GPs). We build controlled experiments where models observe sequences of multivariate scalar-valued function samples drawn from known GP priors. We evaluate prediction error in relation to the number of demonstrations and compare against two principled references: (i) an empirical GP-regression learner that gives a lower bound on achievable error, and (ii) the […]
