Ludovic Courtès

“Towards reproducible minimal source code tarballs?” by @jas4711:
blog.josefsson.org/2024/04/01/

I think “make dist”-generated tarballs are just one part of the xz debacle (and not the most frightening part), but at least we can do something about them: when they’re the byproduct of a build process, we can build them from source (like Debian does); when they add something that’s not in the VCS (such as .po files), we can at least ensure a reproducible build process as Simon advocates here.

Vivien the Trumpeting Elephant

@civodul @jas4711 As I use the extended GNU build system for my own personal projects, I find it a bit frustrating that the PO files would not be present in the source tree, but I understand that if they were, then we would have a big bunch of undesirable “Update PO translation” commits.

As for Gnulib more specifically, I have noticed that in several places, you can safely commit (semi-)generated files, and rely on syntax-check to detect when they should be updated.

Vivien the Trumpeting Elephant

@civodul @jas4711 A somewhat relevant example is how the NEWS file is checked, but there are other examples.

As for “and distributions could package the gnulib git repository (up to some current version)”, I’m not sure it would be easy, because git allows itself to re-pack its objects whenever it wants, so further steps may be needed for the content of the .git repository to remain reproducible over time.

Vivien the Trumpeting Elephant

@civodul @jas4711

Finally, I think we should not forget that the modification time of the .texi file impacts the modification date that is printed in the final output of the manual. Setting it to 1970 is awkward in this case.

Simon Josefsson

@gugurumbe @civodul Having modtime influence PDF output seems bad - can’t we use a @DATE@ macro set at ./configure-time that the .texi file @includes? It should probably be the time of the last commit to the project.
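
(Editor's sketch, not from the thread.) One way to read the suggestion above: take the committer timestamp of the last commit from git and turn it into `@set` lines that a .texi file could @include, so the printed date no longer depends on file modification times. The function names and the optional `path` argument are hypothetical; the `UPDATED` and `UPDATED-MONTH` variable names are borrowed from what Automake writes to version.texi, which is an assumption about what the manual expects.

```python
import datetime
import subprocess

# Fixed English month names, so the output does not depend on the locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def last_commit_timestamp(repo_dir=".", path=None):
    """Return the committer time (Unix seconds) of the last commit.

    If `path` is given, only commits touching that path are considered.
    Requires git and a non-shallow history to be meaningful.
    """
    cmd = ["git", "-C", repo_dir, "log", "-1", "--format=%ct"]
    if path is not None:
        cmd += ["--", path]
    out = subprocess.run(cmd, check=True, capture_output=True,
                         text=True).stdout.strip()
    if not out:
        raise ValueError("no matching commit found")
    return int(out)


def texinfo_date_lines(timestamp):
    """Format a Unix timestamp (interpreted in UTC) as Texinfo @set lines."""
    d = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    month = MONTHS[d.month - 1]
    return (f"@set UPDATED {d.day} {month} {d.year}\n"
            f"@set UPDATED-MONTH {month} {d.year}\n")
```

A configure-time step could write `texinfo_date_lines(last_commit_timestamp())` to a small include file. Passing `path="doc/"` (a made-up location) would restrict the lookup to commits that touched the manual.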

Vivien the Trumpeting Elephant

@jas4711 @civodul I think you have to look through the git log to find the latest commit that affected the manual in a non-trivial way, because the last-modified date can also be used to detect documentation drift.

Vivien the Trumpeting Elephant

@jas4711 @civodul Another option I considered to avoid the dependency on git is to have another syntax-check that would check that you also changed the modification date of the manual in the most recent commit that modified it. It’s harmless to skip this check if you have a shallow clone of the repository or do not have git installed.

Simon Josefsson

@gugurumbe @civodul How does .git content changing break ability to include the entire git repository in Debian’s gnulib package?

Vivien the Trumpeting Elephant

@jas4711 @civodul It would be difficult to consider the .git repository as source code, if it’s just a big bunch of binary blobs that can change for unclear reasons.

Simon Josefsson

@gugurumbe @civodul binary blobs that can change for any reason sounds awfully similar to other binary packages in Debian.

Vivien the Trumpeting Elephant

@civodul @jas4711 (for the PO files, I store them in a separate, unrelated branch of the same repository).

Simon Josefsson

@gugurumbe @civodul PO files in a separate branch is an interesting approach! I still worry that they effectively are vendored files. How about storing po/SHA256SUMS in git and verifying any downloaded files against that? There is a data-focused doctrine that says data should only be maintained at its preferred canonical location and not duplicated, and I think it makes sense for source files too.
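
(Editor's sketch, not from the thread.) The po/SHA256SUMS idea could look roughly like this, assuming the checksum file uses the plain `sha256sum` output format (hex digest, whitespace, file name, with an optional `*` marker before the name). The function names are hypothetical, and the escaped file names that `sha256sum` emits for unusual characters are not handled.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sums(sums_file, base_dir):
    """Check files in base_dir against a sha256sum-style list.

    Returns the names that are missing or whose digest does not match;
    an empty list means every listed file verified.
    """
    failures = []
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # drop the binary-mode marker, if any
        target = Path(base_dir) / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures
```

A bootstrap step would then download the .po files, call `verify_sums("po/SHA256SUMS", "po")`, and stop if the returned list is not empty.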

Vivien the Trumpeting Elephant

@jas4711 @civodul They are vendored files that can crash things if you mess up format strings. You can also target specific locales (so, specific countries) and no one checks their programs in foreign locales in CI systems.

Gettext has tools in place to check that the format arguments stay the same when it detects a format string, so that is good.

Ludovic Courtès

@jas4711 @gugurumbe Another option is to use a Git submodule for .po files.

Ludovic Courtès

@gugurumbe @jas4711 FWIW, Guix and the Shepherd have .po files checked in and periodically updated. I don’t see this as a problem. After all, they’re source, too.

Janneke

@civodul @jas4711
The benefits of Reproducible tarballs are a no-brainer to me.

I've been carrying and developing reproducible source tarball patches for Autotools and GNU Mes for quite some time, partly courtesy of Timothy Sample.

I'm embarrassed and confused that after over 10y of Reproducible Builds, GNU and Autotools still need to get used to these ideas (and don't seem to make any progress at all).
