I’ve been comparing crates on crates.io against their upstream repositories in an effort to detect (and, ultimately, help prevent) supply chain attacks like the xz backdoor, where the code published in a package doesn’t match the code in its repository.
The results of these comparisons for the most popular 999 crates by download count are now available. These come with a bunch of caveats that I’ll get into below, but I hope it’s a useful starting point for discussing code provenance in the Rust ecosystem.
No evidence of malicious activity was detected as part of this work, and approximately 83% of the current versions of these popular crates match their upstream repositories exactly.
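To make the idea concrete, here is a minimal sketch of this kind of comparison. It is not the tooling behind these results: it assumes the published crate and a repository checkout have already been unpacked into two local directories, and the names `hash_tree` and `compare_trees` are made up for illustration. It flags files that exist only in the crate or whose contents differ from the repository.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(crate_dir, repo_dir):
    """Report files in the crate that are missing from, or differ from, the repo.

    Files that exist only in the repository are ignored, since a published
    crate normally ships a subset of the repository.
    """
    crate, repo = hash_tree(crate_dir), hash_tree(repo_dir)
    return {
        "only_in_crate": sorted(crate.keys() - repo.keys()),
        "modified": sorted(
            name for name in crate.keys() & repo.keys() if crate[name] != repo[name]
        ),
    }
```

A real comparison needs more care than this. For example, `cargo package` adds `.cargo_vcs_info.json` and `Cargo.toml.orig` and rewrites `Cargo.toml`, so those files would show up here as differences even for an honest crate.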
Good work.
I don’t know if @kornel@programming.dev still lurks here, but I think he did/does related/similar analysis for https://lib.rs.
@BB_C Yes, implemented here: https://gitlab.com/lib.rs/main/-/blob/main/tarball/src/comparator.rs?ref_type=heads