DRAGON: Robust Classification for Very Large Collections of Software Repositories
arXiv:2602.09071v1 Announce Type: new Abstract: The ability to automatically classify source code repositories with "topics" that reflect their content and purpose is very useful, especially when navigating or searching through large software collections. However, existing approaches often rely heavily on README files and other metadata, which are frequently missing, limiting their applicability in real-world large-scale settings. We present DRAGON, a repository classifier designed for very large and diverse software collections. It operates entirely on lightweight signals commonly stored […]