Subject: Re: Idea for implementing reproducible builds
To: isar-users
References: <3467a5ec-182e-8c9a-cd19-7ad898323be7@siemens.com> <20180522223224.GE5882@yssyq.radix50.net>
From: Claudius Heine
Message-ID: <89f104dc-f192-8364-92f2-1345ea11207c@siemens.com>
Date: Wed, 23 May 2018 10:22:10 +0200
In-Reply-To: <20180522223224.GE5882@yssyq.radix50.net>

Hi Baurzhan,

On 2018-05-23 00:32, Baurzhan Ismagulov wrote:
> Hello Claudius,
>
> On Tue, May 22, 2018 at 01:55:21PM +0200, Claudius Heine wrote:
>> I am still working on reproducible builds and here is my current idea to
>> solve this.
>>
>> Simple put: Mount the /var/cache/apt/archives of the images and buildchroot
>> to the isar-bootstrap root file system and then create a tarball of it.
>> This way we have a tarball of the build just after debootstrap + upgrade
>> with the one 'apt update' step done, but without any other changes to it
>> and all used packages already in the apt package cache.
>>
>> When restoring just skip most of the isar-bootstrap steps and extract the
>> tarball instead, since the packages are available in the package cache and
>> the package index is not updated it will use the packages from the cache.
>>
>> This way we would side step the obstacle to make debootstrap reproducible by
>> just using its product while the reset of the process can be redone by isar.
>
> Thanks for sharing.
>
> As I understand it:
>
> 1. The user runs bitbake isar-image-base, which
>    1. Debootstraps a rootfs
>    2. Tars it
>    3. Unpacks the tar into buildchroot/rootfs and isar-image-base/rootfs

Not exactly. See my RFC patches. I described the process in bullet points in
the cover letter.

> 2. The user adds the tarball to the product repo

No to the last point. I go into detail below.

>
> Is this correct?
>
>
> In this scenario:
>
> * Step 1: How does bitbake decide whether to debootstrap or use the tarball?

In my proposed patchset I use the 'ISAR_BOOTSTRAP_TARBALL' variable.

>
> * Step 2: If I have the following repo, where should the tar file be located
>   and versioned?
>
>   myrepo
>   - meta
>   - meta-isar
>   - product1
>   - product2

The tar file has to be versioned outside of the repo, since there is a
1-to-many relationship between a source repo commit and its tarballs. For
instance, an openssl update would not necessarily mean a new change to the
repo, just a new build.

>
> * If two products built from one repo have non-identical rootfses, what does
>   the tarball contain?

The tarball contains just what is done by debootstrap + apt update + apt
config. All installation of further packages is done later and is not part
of the tarball.

But you are pointing out an interesting topic.
We have to make sure that the isar-bootstrap rootfs does not contain any
product specific configuration. I could imagine that our current
implementation of the multi-repo support might be too simple.

> * What is the user supposed to do if he wants to update the tar to the current
>   upstream, fully or in part?

AFAIK a partial update is always a pain, and I am not sure if that is
something we should support in our first implementation. Fully updating just
means not using the tarball and building everything again.

> Considering our existing use cases, I'd suggest a couple of changes to your
> concept.
>
>
> Let's abbreviate our copy of Debian artifacts as "debian-mirror" (be it in form
> of a tarball or anything else).
>
>
> I see the following use cases:
>
> U1. debian-mirror doesn't exist. Create debian-mirror from upstream.

This is done in my proposal. It just uses the apt cache, which contains only
the used packages, not the whole Debian mirror.

> U2.1. debian-mirror is versioned, e.g. in git.

That is left to the user IMO, because it is part of choosing a backup
strategy. Maybe people want to use btrfs-snapshots, tape drives or something
else for this. The tarball also doesn't contain anything that is part of
bitbake, so the downloads directory needs to be saved as well.

> U2.2. Use debian-mirror for buildchroot/rootfs and isar-image-base/rootfs.

In my proposal the debian-mirror is created from buildchroot and
isar-image-base and is then used to rebuild isar-bootstrap, from which both
recipes get their base rootfs.

> U2.3. Don't use upstream for building buildchroot/rootfs and
>       isar-image-base/rootfs.

Since those packages are available in the cache, no additional download is
needed.

> U3.1. debian-mirror exists. Update all packages from upstream into
>       debian-mirror.

Why is that needed? You could just delete the debian-mirror, and it is then
recreated from the current upstream anyway.

>
> U3.2. debian-mirror exists.
>       Update chosen packages from upstream into debian-mirror. E.g., openssl,
>       optionally its dependencies, optionally its dependents.

Currently that means that the apt index needs to be updated partially. I
don't know if it's possible to update this index on a package + dependency
level, but I doubt it. The result is that we would need to merge the upstream
index with our own and pin all other packages to the old version. Even if we
just create a complete mirror of all Debian mirrors, updating just one
package with its dependencies is a serious scripting effort.

Because of the complexity involved I would postpone this feature.

> U3.3. debian-mirror exists and is used by two products. One product has to be
>       updated. The other one will be updated later. For product 1, update
>       chosen or all packages from upstream to debian-mirror. Product 2 should
>       still use the old packages.

Just build one product without the 'ISAR_BOOTSTRAP_TARBALL' variable set,
while the other still uses this variable.

> U3.4. Remove packages not used in any previous commit.

I am currently not sure what you mean by that. Why would there be packages
that aren't used in any previous commits?

> Given those, I'd suggest using debs as versioned entities instead of the rootfs
> tarball.

I don't get your reasoning here. All of those requirements, apart from one,
can be met with my solution. And that one requirement is hard in any case.

> Create an apt repo with dpkg-scanpackages and dpkg-scansources and use it to
> debootstrap buildchroot and isar-image-base.
>
> This would address U2.3 and U3.3. This has been tested in practice, works
> well, and is in my opinion the best way to solve the problem.

U2.3 and U3.3 are no problem with my approach AFAIK.

I looked into this and came across some difficulties when using an
alternative debian-mirror repo that is generated from the used packages:

1. You need to change the apt repo URLs. Yes, multiple ones, since we support
   multi-repos in isar. How are we handling this?
   Are we throwing stuff from different repos together? Or are we creating
   multiple local repos, one for every used repo, and then setting them back
   later? Both solutions can cause (un)expected problems. How are we dealing
   with updates from upstream then?

2. How are we installing additional packages that are currently not part of
   the debian-mirror? If it's just a different repo, those packages would not
   be part of the package index, so they would not be available. If it's a
   complete mirror of the repos, then it contains many packages that aren't
   needed.

> With versioned tarballs, an update of a single package would make the whole
> tarball change. This makes the history unreadable,

Yes, this could be solved by adding some generated information about the
tarball in a text file next to it.

> wastes disk space, and many

Maybe we could try some tar options to make xdeltas smaller. Or we could
also try to put the apt index and the package cache outside of the tar file.

My first, discarded idea was to just extract the package cache + apt index,
store those, then while building generate an apt repo from them and use this
repo as the debootstrap and main apt repo to install those packages. That
makes handling of repo URLs and installing new packages difficult, as
described.

But maybe extracting the package cache and index next to the debootstrapped
root file system might be a good compromise.

> tools (including git) have problems with big files.

Then don't use those tools for handling binary backups, because they aren't
fit for the job. There is git-annex or btrfs-snapshots, or maybe create
incremental tarballs [1].

> From the UX perspective, I'd prefer to separate building images from preparing
> debian-mirror if possible. A separate command / task / bitbake run with a var
> set / unset, etc. E.g., bitbake -C createmirror isar-image-base, bitbake -C
> updatemirror isar-image-base, etc.

Ok.
In the current RFC patchset this file is created all the time, but I don't
have an issue with changing that.

> Please include user documentation when you provide patches.

After we have agreed on something, I will document it.

Thanks,
Claudius

[1] https://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
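
The idea of keeping some generated information about the tarball in a text
file next to it could look roughly like the Python sketch below. It assumes
that the tarball contains the apt package cache under var/cache/apt/archives
and that the cached files follow the usual name_version_arch.deb naming. The
function names and the manifest format are made up for illustration and are
not part of the RFC patchset.

```python
import tarfile
from urllib.parse import unquote

# Assumption: the apt package cache sits at this path inside the tarball.
APT_ARCHIVES = "var/cache/apt/archives/"


def deb_manifest(tarball_path):
    """Return sorted (name, version, arch) tuples for the cached .deb files."""
    entries = []
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            path = member.name
            if path.startswith("./"):
                path = path[2:]
            if not (member.isfile()
                    and path.startswith(APT_ARCHIVES)
                    and path.endswith(".deb")):
                continue
            base = path[len(APT_ARCHIVES):-len(".deb")]
            if "/" in base:
                continue  # ignore subdirectories such as partial/
            parts = base.split("_")
            if len(parts) != 3:
                continue  # not named name_version_arch.deb
            name, version, arch = parts
            # apt escapes the epoch colon as %3a in cached file names
            entries.append((name, unquote(version), arch))
    return sorted(entries)


def write_manifest(tarball_path, manifest_path):
    """Write one 'name version arch' line per cached package."""
    with open(manifest_path, "w") as out:
        for name, version, arch in deb_manifest(tarball_path):
            out.write(f"{name} {version} {arch}\n")
```

Because the manifest is sorted plain text, two tarballs can be compared with
an ordinary diff of their manifests, which would show, for example, that only
openssl changed between two builds.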