public inbox for isar-users@googlegroups.com
From: Claudius Heine <claudius.heine.ext@siemens.com>
To: isar-users <isar-users@googlegroups.com>
Subject: Re: Idea for implementing reproducible builds
Date: Wed, 23 May 2018 10:22:10 +0200
Message-ID: <89f104dc-f192-8364-92f2-1345ea11207c@siemens.com>
In-Reply-To: <20180522223224.GE5882@yssyq.radix50.net>

Hi Baurzhan,

On 2018-05-23 00:32, Baurzhan Ismagulov wrote:
> Hello Claudius,
> 
> On Tue, May 22, 2018 at 01:55:21PM +0200, Claudius Heine wrote:
>> I am still working on reproducible builds and here is my current idea to
>> solve this.
>>
>> Simply put: Mount the /var/cache/apt/archives of the images and buildchroot
>> to the isar-bootstrap root file system and then create a tarball of it. This
>> way we have a tarball of the build just after debootstrap + upgrade with the
>> one 'apt update' step done, but without any other changes to it and all used
>> packages already in the apt package cache.
>>
>> When restoring just skip most of the isar-bootstrap steps and extract the
>> tarball instead, since the packages are available in the package cache and
>> the package index is not updated it will use the packages from the cache.
>>
>> This way we would sidestep the obstacle of making debootstrap reproducible by
>> just using its product, while the rest of the process can be redone by isar.
> 
> Thanks for sharing.
> 
> As I understand it:
> 
> 1. The user runs bitbake isar-image-base, which
>     1. Debootstraps a rootfs
>     2. Tars it
>     3. Unpacks the tar into buildchroot/rootfs and isar-image-base/rootfs

Not exactly. See my RFC patches. I described the process in bullet 
points in the cover letter.

> 2. The user adds the tarball to the product repo

No to this last point. I go into detail below.

> 
> Is this correct?
> 
> 
> In this scenario:
> 
> * Step 1: How does bitbake decide whether to debootstrap or use the tarball?

In my proposed patchset I use the 'ISAR_BOOTSTRAP_TARBALL' variable.
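
As a rough illustration of that decision (not the code from the RFC
patches), here is a minimal Python sketch. The function name, the
returned labels and the dict standing in for the bitbake datastore are
all assumptions made for this example; only the variable name comes
from the patchset.

```python
def choose_bootstrap_action(variables):
    """Decide how the base rootfs is produced.

    `variables` is a plain dict standing in for the bitbake datastore
    (an assumption for this sketch). The returned labels are
    hypothetical names, not actual isar task names.
    """
    tarball = (variables.get("ISAR_BOOTSTRAP_TARBALL") or "").strip()
    if tarball:
        # A tarball is configured: extract it instead of bootstrapping.
        return ("restore_from_tarball", tarball)
    # No tarball configured: take the normal debootstrap path.
    return ("debootstrap", None)
```

An unset or empty variable falls through to the debootstrap path, which
matches "fully updating is just not using the tarball".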

> 
> * Step 2: If I have the following repo, where should the tar file be located
>    and versioned?
> 
>    myrepo
>    - meta
>    - meta-isar
>    - product1
>    - product2

The tar file has to be versioned outside of the repo, since there is a 
1-to-many relationship between a source repo commit and the tarballs 
built from it.

For instance, an openssl update would not necessarily mean a new change 
to the repo, just a new build.

> 
> * If two products built from one repo have non-identical rootfses, what does
>    the tarball contain?

The tarball contains just what is done by debootstrap + apt update + apt 
config. Installation of all further packages is done later and is not 
part of the tarball.

But you are pointing out an interesting topic. We have to make sure that 
the isar-bootstrap rootfs does not contain any product-specific 
configuration. I could imagine that our current implementation of the 
multi-repo support might be too simple.


> * What is the user supposed to do if he wants to update the tar to the current
>    upstream, fully or in part?

AFAIK a partial update is always a pain, and I am not sure that it is 
something we should support in our first implementation. A full update 
just means not using the tarball and building everything again.

> Considering our existing use cases, I'd suggest a couple of changes to your
> concept.
> 
> 
> Let's abbreviate our copy of Debian artifacts as "debian-mirror" (be it in form
> of a tarball or anything else).
> 
> 
> I see the following use cases:
> 
> U1. debian-mirror doesn't exist. Create debian-mirror from upstream.

This is done in my proposal. It just uses the apt cache, which contains 
only the packages actually used, not the whole Debian mirror.

> U2.1. debian-mirror is versioned, e.g. in git.

That is left to the user IMO, because it is part of choosing a backup 
strategy. Maybe people want to use btrfs snapshots, tape drives or 
something else for this.

The tarball also doesn't contain anything that is part of bitbake, so 
the downloads directory needs to be saved as well.

> U2.2. Use debian-mirror for buildchroot/rootfs and isar-image-base/rootfs.

In my proposal the debian-mirror is created from buildchroot and 
isar-image-base and then is used to rebuild the isar-bootstrap where 
both recipes get their base rootfs from.

> U2.3. Don't use upstream for building buildchroot/rootfs and
>        isar-image-base/rootfs.

Since those packages are available in the cache, no additional download 
is needed.

> U3.1. debian-mirror exists. Update all packages from upstream into
>        debian-mirror.

Why is that needed? You could just delete the debian-mirror and then it 
is recreated with the current upstream anyway.

> 
> U3.2. debian-mirror exists. Update chosen packages from upstream into
>        debian-mirror. E.g., openssl, optionally its dependencies, optionally its
>        dependents.

Currently that means that the apt index needs to be updated partially.
I don't know if it's possible to update this index on a package + 
dependency level, but I doubt it.
The result is that we would need to merge the upstream index with our 
own and pin all other packages to their old versions.

Even if we just create a complete copy of the whole Debian mirror, 
updating just one package with its dependencies is a serious scripting 
effort.

Because of the complexity involved I would postpone this feature.
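
To give an idea of the pinning half of that work, here is a minimal
Python sketch that renders apt preferences stanzas holding every
package at its known version except the ones chosen for update. The
function name and its inputs are assumptions for illustration. It does
not solve the harder part: the pinned versions still have to be listed
in an index that apt can see, which is the index-merging problem
described above.

```python
def render_pins(installed, to_update, priority=1001):
    """Render apt preferences stanzas that hold packages at known versions.

    installed: dict mapping package name -> version we want to keep
    to_update: set of package names allowed to move to a newer version
    priority:  a pin priority above 1000 makes apt stick to the pinned
               version even if that means a downgrade
    """
    stanzas = []
    for name in sorted(installed):
        if name in to_update:
            continue  # left unpinned, so apt may pick the newer version
        stanzas.append(
            f"Package: {name}\n"
            f"Pin: version {installed[name]}\n"
            f"Pin-Priority: {priority}\n"
        )
    # Stanzas in a preferences file are separated by blank lines.
    return "\n".join(stanzas)
```

Note that the dependencies of the updated package would have to be
added to `to_update` by the caller; resolving them is not covered here.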

> U3.3. debian-mirror exists and is used by two products. One product has to be
>        updated. The other one will be updated later. For product 1, update
>        chosen or all packages from upstream to debian-mirror. Product 2 should
>        still use the old packages.

Just build one product without the 'ISAR_BOOTSTRAP_TARBALL' variable 
set, while the other still uses this variable.

> U3.4. Remove packages not used in any previous commit.

I am currently not sure what you mean by that. Why would there be 
packages that aren't used in any previous commits?

> Given those, I'd suggest using debs as versioned entities instead of the rootfs
> tarball.

I don't get your reasoning here. All of those requirements apart from 
one can be met with my solution, and that one requirement is hard in 
any case.

> Create an apt repo with dpkg-scanpackages and dpkg-scansources and use it to
> debootstrap buildchroot and isar-image-base.
> 
> This would address U2.3 and U3.3. This has been tested in practice, works
> well, and is in my opinion the best way to solve the problem.

U2.3 and U3.3 are no problem with my approach AFAIK.

I looked into this and ran into some difficulties when using an 
alternative debian-mirror repo that is generated from the used packages:

     1. You need to change the apt repo URLs. Yes, multiple ones, since we
        support multi-repos in isar. How are we handling this? Are we
        throwing stuff from different repos together? Or are we creating
        multiple local repos, one for every used repo, and then setting
        them back later? Both solutions can cause (un)expected problems.
        How are we dealing with updates from upstream then?
     2. How are we installing additional packages that are currently not
        part of the debian-mirror? If it's just a different repo, those
        packages would not be part of the package index, so they
        would not be available. If it's a complete mirror of the
        repos, then it contains many packages that aren't needed.

> With versioned tarballs, an update of a single package would make the whole
> tarball change. This makes the history unreadable,

Yes, this could be solved by adding some generated information about the 
tarball in a text file next to it.
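
A minimal Python sketch of such a generated text file, assuming the
tarball keeps the cached packages under var/cache/apt/archives and that
the files follow the usual name_version_arch.deb naming. The function
names and the manifest layout are made up for this example.

```python
import tarfile
from pathlib import PurePosixPath
from urllib.parse import unquote


def list_cached_debs(tarball_path, cache_dir="var/cache/apt/archives"):
    """Return sorted (package, version, arch) tuples for cached .deb files."""
    entries = []
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            name = member.name
            if name.startswith("./"):
                name = name[2:]  # tolerate archives created with "tar -C root ."
            path = PurePosixPath(name)
            if not member.isfile() or path.suffix != ".deb":
                continue
            if str(path.parent) != cache_dir:
                continue
            parts = path.stem.split("_")
            if len(parts) != 3:
                continue  # not in name_version_arch form, skip it
            package, version, arch = parts
            # Cached file names encode the epoch colon as %3a; decode it.
            entries.append((package, unquote(version), arch))
    return sorted(entries)


def write_manifest(tarball_path, manifest_path):
    """Write one 'package version arch' line per cached package."""
    lines = [f"{p} {v} {a}" for p, v, a in list_cached_debs(tarball_path)]
    with open(manifest_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines
```

Because the lines are sorted, a plain text diff between two manifests
shows which packages changed between two tarballs.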

> wastes disk space, and many

Maybe we could try some tar options to make xdeltas smaller. Or we could 
also try to put the apt index and the package cache outside of the tar 
file.

My first, discarded idea was to just extract the package cache + apt 
index, store those, then generate an apt repo from them while building 
and use this repo both for debootstrap and as the main apt repo to 
install those packages. That makes handling of repo URLs and installing 
new packages difficult, as described. But maybe extracting the package 
cache and index next to the debootstrapped root file system might be a 
good compromise.

> tools (including git) have problems with big files.

Then don't use those tools for handling binary backups, because they 
aren't fit for the job. There is git-annex or btrfs snapshots, or we 
could maybe create incremental tarballs [1].

>  From the UX perspective, I'd prefer to separate building images from preparing
> debian-mirror if possible. A separate command / task / bitbake run with a var
> set / unset, etc. E.g., bitbake -C createmirror isar-image-base, bitbake -C
> updatemirror isar-image-base, etc.

Ok. In the current RFC patchset this file is created all the time; I 
don't have an issue with changing that.

> Please include user documentation when you provide patches.

Once we have agreed on something, I will document it.

Thanks,
Claudius

[1] https://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

