digitado

CoReflect: Conversational Evaluation via Co-Evolutionary Simulation and Reflective Rubric Refinement

digitado ⋅ January 21, 2026

arXiv:2601.12208v1 Announce Type: new Abstract: Evaluating conversational systems in multi-turn settings remains a fundamental challenge. Conventional pipelines typically rely on manually defined rubrics and fixed conversational context, a static approach that limits coverage and fails to capture the diverse, emergent behaviors of dialogue models. To address this, we introduce CoReflect (Conversational Evaluation via Co-Evolutionary Simulation and Reflective Rubric Refinement), which unifies dialogue simulation and evaluation into an adaptive, iterative process. CoReflect employs a conversation planner that generates structured templates […]

technocracy

How Amazon uses Amazon Nova models to automate operational readiness testing for new fulfillment centers

digitado ⋅ February 10, 2026

Amazon is a global ecommerce and technology company that operates a vast network of fulfillment centers to store, process, and ship products to customers worldwide. The Amazon Global Engineering Services (GES) team is responsible for facilitating operational readiness across the company’s rapidly expanding network of fulfillment centers. When launching new fulfillment centers, Amazon must verify that each facility is properly equipped and ready for operations. This process is called operational readiness testing (ORT) and typically requires 2,000 hours […]

technocracy

Building specialized AI without sacrificing intelligence: Nova Forge data mixing in action

digitado ⋅ March 2, 2026

Large language models (LLMs) perform well on general tasks but struggle with specialized work that requires understanding proprietary data, internal processes, and industry-specific terminology. Supervised fine-tuning (SFT) adapts LLMs to these organizational contexts. SFT can be implemented through two distinct methodologies: Parameter-Efficient Fine-Tuning (PEFT), which updates only a subset of model parameters, offering faster training and lower computational costs while maintaining reasonable performance improvements; and full-rank SFT, which updates all model parameters rather than a subset and incorporates more […]

technocracy

Varying-Coefficient Mixture of Experts Model

digitado ⋅ January 6, 2026

arXiv:2601.01699v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) is a flexible framework that combines multiple specialized submodels (“experts”) by assigning covariate-dependent weights (“gating functions”) to each expert, and has been commonly used for analyzing heterogeneous data. Existing statistical MoE formulations typically assume constant coefficients for covariate effects within the expert or gating models, which can be inadequate for longitudinal, spatial, or other dynamic settings where covariate influences and latent subpopulation structure evolve across a known dimension. We propose a […]

technocracy

Context Lake: A System Class Defined by Decision Coherence

digitado ⋅ January 27, 2026

arXiv:2601.17019v1 Announce Type: new Abstract: AI agents are increasingly the primary consumers of data, operating continuously to make concurrent, irreversible decisions. Traditional data systems designed for human analysis cycles become correctness bottlenecks under this operating regime. When multiple agents operate over shared resources, their actions interact before reconciliation is possible. Correctness guarantees that apply after the decision window therefore fail to prevent conflicts. We introduce the Decision Coherence Law: for agents that take irreversible actions whose effects interact, […]

technocracy

All sorts of interesting flags and artifacts will fly to the Moon on Artemis II

digitado ⋅ January 22, 2026

NASA’s first astronauts to fly to the Moon in more than 50 years will pay tribute to the lunar and space exploration missions that preceded them, as well as aviation and American history, by taking with them artifacts and mementos representing those past accomplishments. NASA, on Wednesday, January 21, revealed the contents of the Artemis II mission’s Official Flight Kit (OFK), continuing a tradition dating back to the Apollo program of packing a duffel bag-sized pouch of symbolic […]

technocracy

Enhancing Scientific Literature Chatbots with Retrieval-Augmented Generation: A Performance Evaluation of Vector and Graph-Based Systems

digitado ⋅ February 23, 2026

arXiv:2602.17856v1 Announce Type: new Abstract: This paper investigates the enhancement of scientific literature chatbots through retrieval-augmented generation (RAG), with a focus on evaluating vector- and graph-based retrieval systems. The proposed chatbot leverages both structured (graph) and unstructured (vector) databases to access scientific articles and gray literature, enabling efficient triage of sources according to research objectives. To systematically assess performance, we examine two use-case scenarios: retrieval from a single uploaded document and retrieval from a large-scale corpus. Benchmark test […]

technocracy

How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets

digitado ⋅ February 11, 2026

arXiv:2511.20605v3 Announce Type: replace-cross Abstract: We introduce and analyse active learning markets as a way to purchase labels, in situations where analysts aim to acquire additional data to improve model fitting, or to better train models for predictive analytics applications. This comes in contrast to the many proposals that already exist to purchase features and examples. By originally formalising the market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. […]

technocracy

Robust Federated Learning via Byzantine Filtering over Encrypted Updates

digitado ⋅ February 5, 2026

Federated Learning (FL) aims to train a collaborative model while preserving data privacy. However, the distributed nature of this approach still raises privacy and security issues, such as the exposure of sensitive data due to inference attacks and the influence of Byzantine behaviors on the trained model. In particular, achieving both secure aggregation and Byzantine resilience remains challenging, as existing solutions often address these aspects independently. In this work, we propose to address these challenges through a novel […]

technocracy

Context-Aware Pesticide Recommendation via Few-Shot Pest Recognition for Precision Agriculture

digitado ⋅ January 6, 2026

arXiv:2601.00243v1 Announce Type: new Abstract: Effective pest management is crucial for enhancing agricultural productivity, especially for crops such as sugarcane and wheat that are highly vulnerable to pest infestations. Traditional pest management methods depend heavily on manual field inspections and the use of chemical pesticides. These approaches are often costly, time-consuming, labor-intensive, and can have a negative impact on the environment. To overcome these challenges, this study presents a lightweight framework for pest detection and pesticide recommendation, designed […]
