Subject: Re: [isar] reproducible build failures
From: Jan Kiszka
To: Henning Schild, Venkata.Pyla@toshiba-tsip.com
Cc: isar-users@googlegroups.com, daniel.sangorrin@toshiba.co.jp, dinesh.kumar@toshiba-tsip.com
Date: Mon, 6 Sep 2021 06:57:55 +0200
In-Reply-To: <20210903191057.4eb2394d@md1za8fc.ad001.siemens.net>

On 03.09.21 19:10, Henning Schild wrote:
> Hi there,
>
> On Fri, 3 Sep 2021 15:19:21 +0000
> wrote :
>
>> Hi,
>>
>> I am using isar system in isar-cip-core project [1] where I found
>> some reproducible failures, which may be good to fix in the isar
>> system. I am not good in modifying the isar system, so could you
>> please guide me to fix these problems?
>
> Well ...
> for isar and maybe to some degree also to debian, a truly
> reproducible build would be a new topic that so far has been ignored.
>

Not completely new, see Debian wiki.

>> Here are the steps to check the reproducible failures in
>> isar-cip-core project:
>> https://gitlab.com/cip-project/cip-core/isar-cip-core/-/issues/12
>> https://gitlab.com/cip-project/cip-core/isar-cip-core/-/issues/13
>>
>> I also verified the reproducibility in the isar system and found
>> similar failures that are copied below:
>> ==============================================
>> tmp/gpghomefHv8eRhk43/
>> tmp/gpghomefHv8eRhk43/private-keys-v1.d/
>> usr/share/doc/hello/changelog.Debian.gz
>> var/cache/debconf/config.dat
>> var/cache/debconf/config.dat-old
>> var/cache/ldconfig/aux-cache
>> var/lib/dpkg/info/enable-fsck.md5sums
>> var/lib/dpkg/info/example-raw.md5sums
>> var/lib/dpkg/info/hello.md5sums
>> var/lib/dpkg/info/isar-disable-apt-cache.md5sums
>> var/lib/dpkg/info/isar-exclude-docs.md5sums
>> var/lib/dpkg/info/sshd-regen-keys.md5sums
>> var/lib/initramfs-tools/4.19.0-17-amd64
>> var/lib/systemd/catalog/database
>> var/log/alternatives.log
>> var/log/bootstrap.log
>> var/log/dpkg.log
>> var/log/apt/history.log
>> var/log/apt/term.log
>> ==============================================
>
> That said and looking at the list ... it all seems harmless. Maybe not
> _all_ but a log file or a date here and there can maybe be ignored.
>
> I never really got the idea ... if one wants "exactly" the same result,
> there is no reason to rebuild. You just store/distribute the
> binary result. But hey you might have your reasons and explain those.

The reasons are the same as for reproducible package build: Validate
that your supply chain, in this case the "last mile", is consistent.

>
>> Steps to check reproducible failures in isar
>> ====================================
>> $ . isar-init-build-env ../build1 && bitbake mc:qemuamd64-buster-tgz:isar-image-base
>> $ . isar-init-build-env ../build2 && bitbake mc:qemuamd64-buster-tgz:isar-image-base
>> $ mkdir -p rootfs1 rootfs2
>> $ tar -xzvf ./build1/tmp/deploy/images/qemuamd64/isar-image-base-debian-buster-qemuamd64.tar.gz -C ./rootfs1/
>> $ tar -xzvf ./build2/tmp/deploy/images/qemuamd64/isar-image-base-debian-buster-qemuamd64.tar.gz -C ./rootfs2/
>> $ rsync -nrclv ./rootfs1/ ./rootfs2/ > difference.txt
>> ====================================
>
> This is not even remotely close. Here you have been really lucky and
> all the "diff" you got was caused by the build. If you introduce a long
> pause between the builds ... you will get actually very different
> results.
> That is a feature .. because isar is tracking debian. But in scenarios
> like yours it can be seen as a bug. In which case you need to build
> against a custom debian mirror or snapshot.debian.org. (unfortunately
> hard because the servers have rate limits, but isar-image-base could
> work, or you restart that bitbake a few times)
> Snapshot is also a good way to try that ... try a "buster" from a few
> months ago for "build1".
>
> Or if you want to track build1 but _not_ track build2 you should use
> ISAR_USE_CACHED_BASE_REPO = "1" for that offline rebuild.

Obviously, this only makes sense when building against a snapshot, like
we do in production builds as well, or via the local offline cache.

>
>>
>> From the reproducible failures I found there are three different
>> areas to fix these problem
>>
>> 1. Changelog file generation, which is embedding the build time
>> date value at here
>> (https://github.com/ilbers/isar/blob/master/meta/classes/debianize.bbclass#L34)
>
> That is a good finding if we want to do something about the "problem".
> One could maybe derive the "date" from the file-modification time of
> the recipe calling deb_debianize.
> But now you have fun with git and will need git-restore-mtime.
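
To make the mtime idea concrete, here is a minimal sketch of deriving the
changelog date from the recipe file instead of the build clock. It is not
what debianize.bbclass does today. The function name changelog_date is
hypothetical, and honouring SOURCE_DATE_EPOCH (the reproducible-builds.org
convention) as an override is my assumption, not something proposed in
this thread.

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime


def changelog_date(recipe_path, environ=os.environ):
    """Return a date string in the style of `date -R` for debian/changelog.

    Hypothetical helper: uses SOURCE_DATE_EPOCH when it is set, otherwise
    the recipe file's mtime, so the result does not depend on build time.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        timestamp = int(epoch)
    else:
        timestamp = int(os.stat(recipe_path).st_mtime)
    # Always render in UTC so the builder's time zone cannot leak in.
    return format_datetime(datetime.fromtimestamp(timestamp, tz=timezone.utc))
```

As noted above, the mtime branch is only stable if the checkout preserves
modification times, which a plain git clone does not.
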
> We could
> also force people to put a fixed string there and only call date if
> that string is not in place.
>
>> 2. Log files generated by different application, which are
>> adding build time values, I think we can remove these files if it is
>> not required after build. ( I tried at here
>> https://github.com/ilbers/isar/blob/master/meta/classes/image.bbclass#L183
>> but it did not work)
>
> Doing it there would be a good place. You could also use
> ROOTFS_POSTPROCESS_COMMAND which allows such things for layers so you
> do not need to touch the core.
>
> I could envision something like
>
> for f in "find all files not owned by any package":
>     if f start.with(/etc)
>         continue
>     if other funny exception
>         continue
>     rm f
>
> In addition to what you want ... this would also shrink that rootfs,
> which would be nice even for people that do not care about repro.
> logs, tmpdirs, caches would be nice to get rid of.
>
>> 3. Cache and temporary files, I think we can delete these files
>> also.
>
> See previous. Just do all at once asking the package manager which
> files it does not know. This will also enforce a really nice discipline
> on users to not abuse ROOTFS_POSTPROCESS_COMMAND to smuggle files into
> the rootfs.
>
>> Please guide me to fix these issues.
>
> So while i am not 100% with the whole repro idea ... and whether it can
> really be done in complex layers ... because you are really not
> building a complicated thing here ...
>
> More real use-cases will contain many more packages build by isar,
> maybe introducing their own share of "repro" mistakes.
> So the "cherry on the cake" would be a helper script to allow anyone to
> spot repro diffs. It would run the same build, one online once
> ISAR_USE_CACHED_BASE_REPO and spit out two folders and a diff summary.
> To give anyone with repro in mind a chance to check their layer. In
> fact one has to wonder if such a script should be added to OE or
> already exists there.
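
The "two folders and a diff summary" part of such a helper could look
roughly like the sketch below. It only compares two already-extracted
rootfs trees and does not run the builds. All function names are
hypothetical, and this is an illustration, not an existing isar or OE
tool.

```python
import hashlib
import os
import stat


def entry_signature(path):
    """Describe one filesystem entry without following symlinks."""
    info = os.lstat(path)
    if stat.S_ISDIR(info.st_mode):
        return "dir"
    if stat.S_ISLNK(info.st_mode):
        return "symlink:" + os.readlink(path)
    if not stat.S_ISREG(info.st_mode):
        # Device nodes, FIFOs and sockets are compared by type only.
        return "special:%o" % stat.S_IFMT(info.st_mode)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def snapshot(root):
    """Map every entry below root (relative path) to its signature."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries[os.path.relpath(full, root)] = entry_signature(full)
    return entries


def diff_summary(root_a, root_b):
    """Return paths that exist on one side only or differ in content."""
    a, b = snapshot(root_a), snapshot(root_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }
```

With the directories from the steps quoted earlier, this would be called
as diff_summary("rootfs1", "rootfs2"). Ownership, permissions and
timestamps are deliberately not compared here.
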
> And one would add that to CI to find new problems as they are
> introduced.
>
> I think that allowing to provide a DEBIAN_CHANGELOG_DATE to enforce a
> string and not call date could be an interesting patch.
>
> And that a "delete everything not owned by any package" would make a
> really nice addition as well.
>
> Both as steps that are on their own valuable and happen to also work in
> the direction of reproducible builds.
>

It's a worthwhile goal, no doubt. But it will also be a journey that has
to start somewhere, ideally small. I think we have a couple of nice first
steps here. Adding tooling or at least method descriptions to detect
non-reproducibility in more complex layers will surely be helpful as
well.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
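
For the "delete everything not owned by any package" step, a dry-run
sketch might start like this. It relies on dpkg recording each package's
files in var/lib/dpkg/info/*.list inside the rootfs. The function names
and the keep_prefixes exception list are hypothetical, and it only
reports candidates rather than removing anything.

```python
import glob
import os


def owned_paths(rootfs):
    """Collect every path recorded in dpkg's per-package *.list files."""
    owned = set()
    pattern = os.path.join(rootfs, "var/lib/dpkg/info", "*.list")
    for list_file in glob.glob(pattern):
        with open(list_file, encoding="utf-8", errors="surrogateescape") as handle:
            owned.update(line.rstrip("\n") for line in handle)
    return owned


def unowned_files(rootfs, keep_prefixes=("/etc", "/var/lib/dpkg")):
    """Return in-image paths of files that no package claims.

    keep_prefixes lists subtrees that are never reported, e.g. /etc
    and dpkg's own database.
    """
    owned = owned_paths(rootfs)
    result = []
    for dirpath, _dirnames, filenames in os.walk(rootfs):
        for name in filenames:
            full = os.path.join(dirpath, name)
            in_image = "/" + os.path.relpath(full, rootfs)
            if any(in_image == p or in_image.startswith(p + "/")
                   for p in keep_prefixes):
                continue
            if in_image not in owned:
                result.append(in_image)
    return sorted(result)
```

A real implementation would need more exceptions than this: files
created by maintainer scripts, diversions, and paths that dpkg lists
under /bin or /lib while they physically live under /usr on merged-usr
systems would all be reported as unowned here.
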