public inbox for isar-users@googlegroups.com
From: Jan Kiszka <jan.kiszka@siemens.com>
To: Henning Schild <henning.schild@siemens.com>,
	Venkata.Pyla@toshiba-tsip.com
Cc: isar-users@googlegroups.com, daniel.sangorrin@toshiba.co.jp,
	dinesh.kumar@toshiba-tsip.com
Subject: Re: [isar] reproducible build failures
Date: Mon, 6 Sep 2021 06:57:55 +0200
Message-ID: <af93e424-cfb6-d9bc-a15d-e3096cd66be3@siemens.com>
In-Reply-To: <20210903191057.4eb2394d@md1za8fc.ad001.siemens.net>

On 03.09.21 19:10, Henning Schild wrote:
> Hi there,
> 
> Am Fri, 3 Sep 2021 15:19:21 +0000
> schrieb <Venkata.Pyla@toshiba-tsip.com>:
> 
>> Hi,
>>
>> I am using isar system in isar-cip-core project [1] where I found
>> some reproducible failures, which may be good to fix in the isar
>> system. I am not good in modifying the isar system, so could you
>> please guide me to fix these problems?
> 
> Well ... for isar and maybe to some degree also to debian, a truly
> reproducible build would be a new topic that so far has been ignored.
> 

Not completely new, see Debian wiki.

>> Here are the steps to check the reproducible failures in
>> isar-cip-core project:
>> https://gitlab.com/cip-project/cip-core/isar-cip-core/-/issues/12
>> https://gitlab.com/cip-project/cip-core/isar-cip-core/-/issues/13
>>
>> I also verified the reproducibility in the isar system and found
>> similar failures that are copied below:
>> ==============================================
>> tmp/gpghomefHv8eRhk43/
>> tmp/gpghomefHv8eRhk43/private-keys-v1.d/
>> usr/share/doc/hello/changelog.Debian.gz
>> var/cache/debconf/config.dat
>> var/cache/debconf/config.dat-old
>> var/cache/ldconfig/aux-cache
>> var/lib/dpkg/info/enable-fsck.md5sums
>> var/lib/dpkg/info/example-raw.md5sums
>> var/lib/dpkg/info/hello.md5sums
>> var/lib/dpkg/info/isar-disable-apt-cache.md5sums
>> var/lib/dpkg/info/isar-exclude-docs.md5sums
>> var/lib/dpkg/info/sshd-regen-keys.md5sums
>> var/lib/initramfs-tools/4.19.0-17-amd64
>> var/lib/systemd/catalog/database
>> var/log/alternatives.log
>> var/log/bootstrap.log
>> var/log/dpkg.log
>> var/log/apt/history.log
>> var/log/apt/term.log
>> ==============================================
> 
> That said and looking at the list ... it all seems harmless. Maybe not
> _all_ but a log file or a date here and there can maybe be ignored.
> 
> I never really got the idea ... if one wants "exactly" the same result,
> there is no reason to rebuild. You just store/distribute the
> binary result. But hey you might have your reasons and explain those.

The reasons are the same as for reproducible package builds: validating
that your supply chain, in this case the "last mile", is consistent.
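
As a minimal sketch of such a consistency check, assuming two build
artifacts are available as local files: the helper name same_artifact is
hypothetical, and only sha256sum from coreutils is used. Note that archives
may differ in embedded metadata such as timestamps even when the file
contents match, so comparing unpacked trees is often more informative.

```shell
# Hypothetical helper: succeed only if both files have the same SHA-256.
same_artifact() {
    # Read via stdin so the file name does not end up in the output.
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

Usage would look like "same_artifact build1.img build2.img && echo identical",
with both file names being placeholders.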

> 
>> Steps to check reproducible failures in isar
>> ====================================
>> $ . isar-init-build-env ../build1 && bitbake mc:qemuamd64-buster-tgz:isar-image-base
>> $ . isar-init-build-env ../build2 && bitbake mc:qemuamd64-buster-tgz:isar-image-base
>> $ mkdir -p rootfs1 rootfs2
>> $ tar -xzvf ./build1/tmp/deploy/images/qemuamd64/isar-image-base-debian-buster-qemuamd64.tar.gz -C ./rootfs1/
>> $ tar -xzvf ./build2/tmp/deploy/images/qemuamd64/isar-image-base-debian-buster-qemuamd64.tar.gz -C ./rootfs2/
>> $ rsync -nrclv ./rootfs1/ ./rootfs2/ > difference.txt
>> ====================================
> 
> This is not even remotely close. Here you have been really lucky and
> all the "diff" you got was caused by the build. If you introduce a long
> pause between the builds ... you will get actually very different
> results.
> That is a feature .. because isar is tracking debian. But in scenarios
> like yours it can be seen as a bug. In which case you need to build
> against a custom debian mirror or snapshot.debian.org. (unfortunately
> hard because the servers have rate limits, but isar-image-base could
> work, or you restart that bitbake a few times)
> Snapshot is also a good way to try that ... try a "buster" from a few
> months ago for "build1".
> 
> Or if you want to track build1 but _not_ track build2 you should use
> ISAR_USE_CACHED_BASE_REPO = "1" for that offline rebuild.

Obviously, this only makes sense when building against a snapshot, like
we do in production builds as well, or via the local offline cache.

> 
>>
>> From the reproducible failures I found there are three different
>> areas to fix these problem
>>
>> 1.       Changelog file generation, which is embedding the build time
>> date value at here
>> (https://github.com/ilbers/isar/blob/master/meta/classes/debianize.bbclass#L34)
> 
> That is a good finding if we want to do something about the "problem".
> One could maybe derive the "date" from the file-modification time of
> the recipe calling deb_debianize.
> But now you have fun with git and will need git-restore-mtime. We could
> also force people to put a fixed string there and only call date if
> that string is not in place.
> 
>> 2.       Log files generated by different application, which are
>> adding build time values, I think we can remove these files if it is
>> not required after build. ( I tried at here
>> https://github.com/ilbers/isar/blob/master/meta/classes/image.bbclass#L183
>> but it did not work)
> 
> Doing it there would be a good place. You could also use
> ROOTFS_POSTPROCESS_COMMAND which allows such things for layers so you
> do not need to touch the core.
> 
> I could envision something like
> 
> for f in "find all files not owned by any package":
> 	if f start.with(/etc)
> 		continue
>   	if other funny exception
> 		continue
> 	rm f
> 
> In addition to what you want ... this would also shrink that rootfs,
> which would be nice even for people that do not care about repro.
> logs, tmpdirs, caches would be nice to get rid of.
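
The pseudocode above could be sketched in shell roughly as follows. This is
only an illustration under assumptions, not an isar implementation: the
function name list_unowned is hypothetical, ownership is taken from dpkg's
per-package *.list files inside the unpacked rootfs, and the exception list
(/etc and dpkg's own database) is just an example. It only prints candidates;
deleting them is deliberately left to the caller. Files reachable through
merged-/usr symlinks or dpkg diversions may be misreported.

```shell
# Hypothetical helper: print files in an unpacked rootfs ($1) that no
# installed package claims, according to var/lib/dpkg/info/*.list.
list_unowned() {
    rootfs=$1
    owned=$(mktemp) || return 2
    present=$(mktemp) || return 2
    # Paths recorded by dpkg for installed packages (absolute, one per line).
    cat "$rootfs"/var/lib/dpkg/info/*.list 2>/dev/null | LC_ALL=C sort -u > "$owned"
    # Files and symlinks actually present, as absolute paths inside the rootfs.
    (cd "$rootfs" && find . \( -type f -o -type l \) | sed 's|^\.||') \
        | LC_ALL=C sort -u > "$present"
    # Keep what is present but not owned, minus the example exceptions.
    LC_ALL=C comm -13 "$owned" "$present" \
        | grep -v -e '^/etc/' -e '^/var/lib/dpkg/'
    rm -f "$owned" "$present"
}
```

Calling "list_unowned ./rootfs1" on one of the unpacked trees would give a
list to review before anything is removed.
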
> 
>> 3.       Cache and temporary files, I think we can delete these files
>> also.
> 
> See previous. Just do all at once asking the package manager which
> files it does not know. This will also enforce a really nice discipline
> on users to not abuse ROOTFS_POSTPROCESS_COMMAND to smuggle files into
> the rootfs.
> 
>> Please guide me to fix these issues.
> 
> So while i am not 100% with the whole repro idea ... and whether it can
> really be done in complex layers ... because you are really not
> building a complicated thing here ...
> 
> More real use-cases will contain many more packages built by isar,
> maybe introducing their own share of "repro" mistakes.
> So the "cherry on the cake" would be a helper script to allow anyone to
> spot repro diffs. It would run the same build twice, once online and once
> with ISAR_USE_CACHED_BASE_REPO, and spit out two folders and a diff summary.
> To give anyone with repro in mind a chance to check their layer. In
> fact one has to wonder if such a script should be added to OE or
> already exists there.
> And one would add that to CI to find new problems as they are
> introduced.
> 
> I think that allowing to provide a DEBIAN_CHANGELOG_DATE to enforce a
> string and not call date could be an interesting patch.
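
A minimal sketch of that idea, assuming the changelog date is produced by a
shell helper: DEBIAN_CHANGELOG_DATE is the variable name proposed above and
not an existing isar setting, and the function name changelog_date is
hypothetical. The fallback keeps the current behaviour of calling date.

```shell
# Hypothetical helper: emit the date for the debian/changelog trailer.
changelog_date() {
    if [ -n "${DEBIAN_CHANGELOG_DATE:-}" ]; then
        # Fixed string supplied by the recipe or layer: stable across builds.
        printf '%s\n' "$DEBIAN_CHANGELOG_DATE"
    else
        # Fallback: current time in the format "date -R" prints,
        # which is what debian/changelog expects.
        date -R
    fi
}
```

With the variable set, two builds at different times would write the same
changelog trailer; with it unset, nothing changes for existing users.
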
> 
> And that a "delete everything not owned by any package" would make a
> really nice addition as well.
> 
> Both as steps that are on their own valuable and happen to also work in
> the direction of reproducible builds.
> 

It's a worthwhile goal, no doubt. But it will also be a journey that has
to start somewhere, ideally small. I think we have a couple of nice
first steps here. Adding tooling or at least method descriptions to
detect non-reproducibility in more complex layers will surely be helpful
as well.
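
One possible shape for such tooling, as a rough sketch rather than a finished
helper: given two unpacked rootfs directories, print every file whose content
differs or that exists on one side only. The function names manifest and
repro_diff are hypothetical. Only regular file contents are compared;
symlinks, ownership, permissions and timestamps are ignored here.

```shell
# Print "<sha256>  <path>" for every regular file below directory $1.
manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort)
}

# Hypothetical helper: list paths that differ between trees $1 and $2.
repro_diff() {
    m1=$(mktemp) || return 2
    m2=$(mktemp) || return 2
    manifest "$1" > "$m1"
    manifest "$2" > "$m2"
    # comm -3 keeps lines unique to either side; strip the column tab
    # and the checksum so that only the affected paths remain.
    LC_ALL=C comm -3 "$m1" "$m2" \
        | tr -d '\t' \
        | sed 's/^[0-9a-f]*  //' \
        | LC_ALL=C sort -u
    rm -f "$m1" "$m2"
}
```

Run as "repro_diff ./rootfs1 ./rootfs2"; empty output means no content
differences were found. Something like this could also run in CI to catch new
problems as they are introduced.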

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
