digitado

HECTOR: Hybrid Editable Compositional Object References for Video Generation

digitado ⋅ 11 March 2026

arXiv:2603.08850v1 Announce Type: new Abstract: Real-world videos naturally portray complex interactions among distinct physical objects, effectively forming dynamic compositions of visual elements. However, most current video generation models synthesize scenes holistically and therefore lack mechanisms for explicit compositional manipulation. To address this limitation, we propose HECTOR, a generative pipeline that enables fine-grained compositional control. In contrast to prior methods, HECTOR supports hybrid reference conditioning, allowing generation to be simultaneously guided by static images and/or dynamic videos. Moreover, users can […]


technocracy

An Empirical Analysis of Community and Coding Patterns in OSS4SG vs. Conventional OSS

digitado ⋅ 8 January 2026

arXiv:2601.03430v1 Announce Type: new Abstract: Open Source Software for Social Good (OSS4SG) projects aim to address critical societal challenges, such as healthcare access and community safety. Understanding the community dynamics and contributor patterns in these projects is essential for ensuring their sustainability and long-term impact. However, while extensive research has focused on conventional Open Source Software (OSS), little is known about how the mission-driven nature of OSS4SG influences its development practices. To address this gap, we conduct a […]


technocracy

Group Contrastive Learning for Weakly Paired Multimodal Data

digitado ⋅ 5 February 2026

arXiv:2602.04021v1 Announce Type: cross Abstract: We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data where samples across modalities are weakly paired through shared perturbation labels but lack direct correspondence. Our primary contribution is GroupCLIP, a novel group-level contrastive loss that bridges the gap between CLIP for paired cross-modal data and SupCon for uni-modal supervised contrastive learning, addressing a fundamental gap in contrastive learning for weakly-paired settings. We integrate GroupCLIP with an on-the-fly backtranslating autoencoder […]


technocracy

We study pandemics, and the resurgence of measles is a grim sign of what’s coming

digitado ⋅ 12 March 2026

In the three decades between 1993 and 2024, measles in the US was relatively rare—a few hundred cases each year, at most. But suddenly, the disease has become so entrenched in American life that it sometimes fails to make headlines when a new outbreak erupts. As of March 2026, measles has been continuously circulating around the US for more than a year, starting with an outbreak in Texas that lasted from January to August 2025. Before that outbreak […]


technocracy

Final-year project thesis (mémoire PFE)

digitado ⋅ 24 March 2026

Submitted by /u/Outrageous_Elk717


technocracy

Leveraging Language Models and RAG for Efficient Knowledge Discovery in Clinical Environments

digitado ⋅ 9 January 2026

arXiv:2601.04209v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly recognized as valuable tools across the medical environment, supporting clinical, research, and administrative workflows. However, strict privacy and network security regulations in hospital settings require that sensitive data be processed within fully local infrastructures. Within this context, we developed and evaluated a retrieval-augmented generation (RAG) system designed to recommend research collaborators based on PubMed publications authored by members of a medical institution. The system utilizes PubMedBERT for […]


technocracy

[R] How are you managing long-running preprocessing jobs at scale? Curious what’s actually working

digitado ⋅ 24 March 2026

We’re a small ML team for a project and we keep running into the same wall: large preprocessing jobs (think 50–100GB datasets) running on a single machine take hours, and when something fails halfway through, it’s painful. We’ve looked at Prefect, Temporal, and a few others — but they all feel like they require a full-time DevOps person to set up and maintain properly. And most of our team is focused on the models, not the infrastructure. Curious […]
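The pain point described above (a multi-hour job that fails halfway and must start over) can often be softened without any orchestration framework by processing the data in chunks and recording progress. The following is a minimal sketch, not something taken from the post: the function name process_in_chunks, the JSON checkpoint layout, and the work callback are all hypothetical, and it assumes that work is idempotent, since a crash between finishing a chunk and writing the checkpoint would cause that chunk to be redone.

```python
import json
import os


def process_in_chunks(items, chunk_size, work, checkpoint_path):
    """Apply `work(index, chunk)` to each chunk, resuming from a checkpoint.

    Hypothetical helper for illustration. Returns the number of chunks
    processed during this call.
    """
    # Read the index of the next unprocessed chunk, if a checkpoint exists.
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_chunk"]

    n_chunks = (len(items) + chunk_size - 1) // chunk_size
    for i in range(start, n_chunks):
        chunk = items[i * chunk_size:(i + 1) * chunk_size]
        work(i, chunk)  # may raise; the checkpoint then still points at chunk i

        # Write the checkpoint atomically: temp file first, then rename.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_chunk": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)

    return n_chunks - start
```

Rerunning the same call after a failure skips the chunks that already completed. For datasets in the 50–100GB range the items list would normally be a list of file paths or byte ranges rather than in-memory records.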


technocracy

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

digitado ⋅ 26 February 2026

Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters […]


technocracy

Rivian reveals pricing and trim details for its R2 SUV

digitado ⋅ 12 March 2026

Between the antics particular to a certain car company and the industrial chaos that was set off by COVID (then compounded by the invasion of Ukraine) it’s easy to have become cynical about things like timelines. And yet, when Rivian showed off a midsize electric vehicle in 2024 and said it would go on sale during the first half of 2026, it meant it: deliveries of the first R2 SUVs will begin this spring. As a new automaker […]


technocracy

Beyond the noise: intrinsic dimension estimation with optimal neighbourhood identification

digitado ⋅ 28 January 2026

arXiv:2405.15132v4 Announce Type: replace Abstract: The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID depends on the scale at which the data are analysed. Quite typically at a small scale, the ID is very large, as the data are affected by measurement errors. At large scale, the […]
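To make the scale dependence concrete, here is a small sketch of how measurement noise can inflate an intrinsic dimension estimate at small scales. It does not reproduce the method of the paper above, whose abstract is truncated here. It swaps in the well-known TwoNN estimator (Facco et al., 2017), which uses the ratio of each point's second to first nearest-neighbour distance. The helper names twonn_id and noisy_plane, the sample size, and the noise level are illustrative assumptions.

```python
import math
import random


def twonn_id(points):
    """TwoNN maximum-likelihood estimate of the intrinsic dimension.

    The ratio mu = r2 / r1 of second to first nearest-neighbour distances
    follows a Pareto law with exponent d, giving d = N / sum(log(mu_i)).
    """
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point (fine for small N).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def noisy_plane(n, noise, seed=0):
    """Sample n points from a unit square embedded in 3D, with Gaussian
    noise of standard deviation `noise` on the third coordinate."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.gauss(0.0, noise)) for _ in range(n)]


clean = twonn_id(noisy_plane(300, noise=0.0))   # data lie exactly on a plane
noisy = twonn_id(noisy_plane(300, noise=0.2))   # noise larger than neighbour spacing
print(f"clean plane: {clean:.2f}, noisy plane: {noisy:.2f}")
```

On the clean plane the estimate should land near 2. Once the noise is larger than the typical nearest-neighbour spacing, the estimator sees a three-dimensional cloud and the value rises, which is the small-scale effect the abstract describes.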
