Reproducible DQN / Double DQN / Dueling comparison with diagnostics and generalization tests (LunarLander-v3)
I wanted to compare vanilla DQN, Double DQN (DDQN), and Dueling DDQN beyond final reward alone, so I built a structured training and evaluation setup around LunarLander-v3. Instead of tracking only episode return, I monitored:

• activation and gradient distributions
• update-to-data ratios for optimizer diagnostics
• action gap and Q-value dynamics
• win rate with 95% confidence intervals
• generalization via human-prefix rollouts

The strongest model (<9k params) achieves a 98.4% win rate (±0.24%, 95% CI) across 10k seeds. The […]
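For concreteness, here is a minimal NumPy sketch (my own illustration, not the post's actual code) of two of the pieces above: the Double DQN target computation, where the online network selects the next action and the target network evaluates it, and a normal-approximation confidence interval for a Bernoulli win rate:

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN bootstrap targets.

    q_online_next, q_target_next: (batch, n_actions) Q-values at s'.
    The online net picks argmax actions; the target net scores them,
    which decouples action selection from evaluation and reduces
    the overestimation bias of vanilla DQN.
    """
    best_actions = np.argmax(q_online_next, axis=1)
    q_eval = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * q_eval

def win_rate_ci(wins, n, z=1.96):
    """Win-rate estimate with normal-approximation CI half-width."""
    p = wins / n
    half_width = z * np.sqrt(p * (1.0 - p) / n)
    return p, half_width
```

With 9,840 wins out of 10,000 evaluation seeds, `win_rate_ci` gives p = 0.984 and a half-width of roughly ±0.25%, in the same ballpark as the reported ±0.24% (the post's exact figure may come from a Wilson or bootstrap interval).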