public inbox for isar-users@googlegroups.com
 help / color / mirror / Atom feed
* Idea for implementing reproducible builds
@ 2018-05-22 11:55 Claudius Heine
  2018-05-22 13:47 ` Andreas Reichel
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Claudius Heine @ 2018-05-22 11:55 UTC (permalink / raw)
  To: isar-users

Hi,

I am still working on reproducible builds and here is my current idea to 
solve this.

Simple put: Mount the /var/cache/apt/archives of the images and 
buildchroot to the isar-bootstrap root file system and then create a 
tarball of it. This way we have a tarball of the build just after 
debootstrap + upgrade with the one 'apt update' step done, but without 
any other changes to it and all used packages already in the apt package 
cache.

When restoring just skip most of the isar-bootstrap steps and extract 
the tarball instead, since the packages are available in the package 
cache and the package index is not updated it will use the packages from 
the cache.

This way we would side step the obstacle to make debootstrap 
reproducible by just using its product while the reset of the process 
can be redone by isar.
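
A rough shell sketch of the two steps (paths and variable names are 
illustrative):

    # create: tar up the isar-bootstrap rootfs incl. the filled apt cache
    sudo tar -czf isar-bootstrap-${DISTRO}-${DISTRO_ARCH}.tgz \
        -C ${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH} \
        --exclude='./dev/*' --exclude='./proc/*' --exclude='./sys/*' .

    # restore: extract instead of debootstrap + 'apt update' + 'apt upgrade'
    sudo mkdir -p ${ROOTFSDIR}
    sudo tar -xzf isar-bootstrap-${DISTRO}-${DISTRO_ARCH}.tgz -C ${ROOTFSDIR}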

Is this solution enough? Or are you seeing any problem with this approach?

Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Idea for implementing reproducible builds
  2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
@ 2018-05-22 13:47 ` Andreas Reichel
  2018-05-22 14:24   ` Claudius Heine
  2018-05-22 22:32 ` Baurzhan Ismagulov
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Andreas Reichel @ 2018-05-22 13:47 UTC (permalink / raw)
  To: [ext] Claudius Heine; +Cc: isar-users

On Tue, May 22, 2018 at 01:55:21PM +0200, [ext] Claudius Heine wrote:
> Hi,
> 
> I am still working on reproducible builds and here is my current idea to
> solve this.
> 
> Simple put: Mount the /var/cache/apt/archives of the images and buildchroot
> to the isar-bootstrap root file system and then create a tarball of it. This
> way we have a tarball of the build just after debootstrap + upgrade with the
> one 'apt update' step done, but without any other changes to it and all used
> packages already in the apt package cache.
> 
> When restoring just skip most of the isar-bootstrap steps and extract the
> tarball instead, since the packages are available in the package cache and
> the package index is not updated it will use the packages from the cache.
> 
> This way we would side step the obstacle to make debootstrap reproducible by
> just using its product while the reset of the process can be redone by isar.
"while the reset of the process can be redone by Isar."
I think I did not understand that exactly :)

Andreas

> 
> Is this solution enough? Or are you seeing any problem with this approach?
> 
> Claudius
> 
> -- 
> DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
> 

-- 
Andreas Reichel
Dipl.-Phys. (Univ.)
Software Consultant

Andreas.Reichel@tngtech.com, +49-174-3180074
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterfoehring
Geschaeftsfuehrer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Mueller
Sitz: Unterfoehring * Amtsgericht Muenchen * HRB 135082


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Idea for implementing reproducible builds
  2018-05-22 13:47 ` Andreas Reichel
@ 2018-05-22 14:24   ` Claudius Heine
  0 siblings, 0 replies; 33+ messages in thread
From: Claudius Heine @ 2018-05-22 14:24 UTC (permalink / raw)
  To: Andreas Reichel; +Cc: isar-users

Hi,

On 2018-05-22 15:47, Andreas Reichel wrote:
> On Tue, May 22, 2018 at 01:55:21PM +0200, [ext] Claudius Heine wrote:
>> Hi,
>>
>> I am still working on reproducible builds and here is my current idea to
>> solve this.
>>
>> Simple put: Mount the /var/cache/apt/archives of the images and buildchroot
>> to the isar-bootstrap root file system and then create a tarball of it. This
>> way we have a tarball of the build just after debootstrap + upgrade with the
>> one 'apt update' step done, but without any other changes to it and all used
>> packages already in the apt package cache.
>>
>> When restoring just skip most of the isar-bootstrap steps and extract the
>> tarball instead, since the packages are available in the package cache and
>> the package index is not updated it will use the packages from the cache.
>>
>> This way we would side step the obstacle to make debootstrap reproducible by
>> just using its product while the reset of the process can be redone by isar.
> "while the reset of the process can be redone by Isar."
> I think I did not understand that exactly :)

That's a typo. Should be "while the rest of the process can be redone by 
Isar".

What I mean by that is that everything Isar does after the base 
isar-bootstrap file system is generated, as well as everything that is 
independent of the isar-bootstrap file system, will be done in the same 
way again. Only the generation of the isar-bootstrap root file system is 
changed from using debootstrap + 'apt update' + 'apt upgrade' + config 
deployment to just extracting the tarball.

Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Idea for implementing reproducible builds
  2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
  2018-05-22 13:47 ` Andreas Reichel
@ 2018-05-22 22:32 ` Baurzhan Ismagulov
  2018-05-23  8:22   ` Claudius Heine
  2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Baurzhan Ismagulov @ 2018-05-22 22:32 UTC (permalink / raw)
  To: isar-users

Hello Claudius,

On Tue, May 22, 2018 at 01:55:21PM +0200, Claudius Heine wrote:
> I am still working on reproducible builds and here is my current idea to
> solve this.
> 
> Simple put: Mount the /var/cache/apt/archives of the images and buildchroot
> to the isar-bootstrap root file system and then create a tarball of it. This
> way we have a tarball of the build just after debootstrap + upgrade with the
> one 'apt update' step done, but without any other changes to it and all used
> packages already in the apt package cache.
> 
> When restoring just skip most of the isar-bootstrap steps and extract the
> tarball instead, since the packages are available in the package cache and
> the package index is not updated it will use the packages from the cache.
> 
> This way we would side step the obstacle to make debootstrap reproducible by
> just using its product while the reset of the process can be redone by isar.

Thanks for sharing.

As I understand it:

1. The user runs bitbake isar-image-base, which
   1. Debootstraps a rootfs
   2. Tars it
   3. Unpacks the tar into buildchroot/rootfs and isar-image-base/rootfs
2. The user adds the tarball to the product repo

Is this correct?


In this scenario:

* Step 1: How does bitbake decide whether to debootstrap or use the tarball?

* Step 2: If I have the following repo, where should the tar file be located
  and versioned?

  myrepo
  - meta
  - meta-isar
  - product1
  - product2

* If two products built from one repo have non-identical rootfses, what does
  the tarball contain?

* What is the user supposed to do if he wants to update the tar to the current
  upstream, fully or in part?


Considering our existing use cases, I'd suggest a couple of changes to your
concept.


Let's abbreviate our copy of Debian artifacts as "debian-mirror" (be it in the
form of a tarball or anything else).


I see the following use cases:

U1. debian-mirror doesn't exist. Create debian-mirror from upstream.

U2.1. debian-mirror is versioned, e.g. in git.

U2.2. Use debian-mirror for buildchroot/rootfs and isar-image-base/rootfs.

U2.3. Don't use upstream for building buildchroot/rootfs and
      isar-image-base/rootfs.

U3.1. debian-mirror exists. Update all packages from upstream into
      debian-mirror.

U3.2. debian-mirror exists. Update chosen packages from upstream into
      debian-mirror. E.g., openssl, optionally its dependencies, optionally its
      dependents.

U3.3. debian-mirror exists and is used by two products. One product has to be
      updated. The other one will be updated later. For product 1, update
      chosen or all packages from upstream to debian-mirror. Product 2 should
      still use the old packages.

U3.4. Remove packages not used in any previous commit.


Given those, I'd suggest using debs as versioned entities instead of the rootfs
tarball.

Create an apt repo with dpkg-scanpackages and dpkg-scansources and use it to
debootstrap buildchroot and isar-image-base.
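
A minimal sketch of such a repo (layout and paths are illustrative; the
[trusted=yes] option just skips signing for the sketch):

    cd debian-mirror            # directory holding the collected .debs / .dscs
    dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz
    dpkg-scansources . /dev/null | gzip -9c > Sources.gz
    # then point an apt source at it, e.g.:
    #   deb [trusted=yes] file:///path/to/debian-mirror ./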

This would address U2.3 and U3.3. This has been tested in practice, works
well, and is in my opinion the best way to solve the problem.

With versioned tarballs, an update of a single package would make the whole
tarball change. This makes the history unreadable and wastes disk space, and
many tools (including git) have problems with big files.


From the UX perspective, I'd prefer to separate building images from preparing
debian-mirror if possible. A separate command / task / bitbake run with a var
set / unset, etc. E.g., bitbake -C createmirror isar-image-base, bitbake -C
updatemirror isar-image-base, etc.


Please include user documentation when you provide patches.


I think that in the end, we'll have to provide specialized infrastructure for
managing debian-mirror(s). It may or may not be based on anything from
https://wiki.debian.org/DebianRepository/Setup?action=show&redirect=HowToSetupADebianRepository
.


With kind regards,
Baurzhan.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 0/3] Reproducible build
  2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
  2018-05-22 13:47 ` Andreas Reichel
  2018-05-22 22:32 ` Baurzhan Ismagulov
@ 2018-05-23  6:32 ` claudius.heine.ext
  2018-05-23  6:32   ` [RFC PATCH 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives claudius.heine.ext
                     ` (4 more replies)
  2018-05-23 13:26 ` [RFC PATCH v2 " claudius.heine.ext
                   ` (3 subsequent siblings)
  6 siblings, 5 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23  6:32 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

Hi,

this patchset contains an implementation of my proposed solution for
reproducible builds.

I am currently not quite sure whether this is the right approach, but it
is the simplest one I can think of at the moment.

As already described in my proposal, this patchset does the following:

  1. Takes care that the package cache in the isar-bootstrap root file
     system contains all the packages used for this distro/architecture.
  2. A tarball is created after the package cache contains all the
     packages needed by the image.
  3. This tarball can be used as the basis of subsequent builds by
     setting a bitbake variable.

This is just a first draft of this feature; maybe we can further improve
some steps, and maybe there are better ideas to improve the usability.
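
A possible usage sketch (file name is illustrative, following the naming
pattern from patch 2):

    # first build: also produces the bootstrap tarball in DEPLOY_DIR_IMAGE
    bitbake isar-image-base

    # subsequent builds: restore from the tarball instead of bootstrapping
    # again, e.g. via local.conf
    ISAR_BOOTSTRAP_TARBALL = "/path/to/isar-bootstrap-debian-stretch-armhf-isar-image-base-qemuarm.tgz"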

Cheers,
Claudius

Claudius Heine (3):
  meta/isar-bootstrap-helper+dpkg.bbclass: bind mount
    /var/cache/apt/archives
  meta/classes/image: added isar_bootstrap_tarball task
  meta/isar-bootstrap: add 'do_restore_from_tarball' task

 meta/classes/dpkg.bbclass                     |  5 ++++
 meta/classes/image.bbclass                    | 10 +++++++
 meta/classes/isar-bootstrap-helper.bbclass    |  9 ++++++-
 .../isar-bootstrap/isar-bootstrap.bb          | 27 ++++++++++++++++++-
 4 files changed, 49 insertions(+), 2 deletions(-)

-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives
  2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
@ 2018-05-23  6:32   ` claudius.heine.ext
  2018-05-23  6:32   ` [RFC PATCH 2/3] meta/classes/image: added isar_bootstrap_tarball task claudius.heine.ext
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23  6:32 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

Bind mount the /var/cache/apt/archives directory to the original
isar-bootstrap root file system, so that the cache is shared between all
images based on it. A central package cache is the first step towards
reproducible builds.

This should allow faster execution of subsequent builds.

Signed-off-by: Claudius Heine <ch@denx.de>
---
 meta/classes/dpkg.bbclass                  | 5 +++++
 meta/classes/isar-bootstrap-helper.bbclass | 9 ++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/meta/classes/dpkg.bbclass b/meta/classes/dpkg.bbclass
index c8d4ac5..5422b9a 100644
--- a/meta/classes/dpkg.bbclass
+++ b/meta/classes/dpkg.bbclass
@@ -5,6 +5,11 @@ inherit dpkg-base
 
 # Build package from sources using build script
 dpkg_runbuild() {
+    DEBOOTSTRAP_DIR="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/"
     E="${@ bb.utils.export_proxies(d)}"
+    mountpoint -q "${BUILDCHROOT_DIR}/var/cache/apt/archives" || \
+        sudo mount --bind \
+            "$DEBOOTSTRAP_DIR/var/cache/apt/archives" \
+            "${BUILDCHROOT_DIR}/var/cache/apt/archives"
     sudo -E chroot ${BUILDCHROOT_DIR} /build.sh ${PP}/${PPS}
 }
diff --git a/meta/classes/isar-bootstrap-helper.bbclass b/meta/classes/isar-bootstrap-helper.bbclass
index 76e20f6..fa68a9f 100644
--- a/meta/classes/isar-bootstrap-helper.bbclass
+++ b/meta/classes/isar-bootstrap-helper.bbclass
@@ -17,8 +17,10 @@ setup_root_file_system() {
               -o Debug::pkgProblemResolver=yes"
     CLEAN_FILES="${ROOTFSDIR}/etc/hostname ${ROOTFSDIR}/etc/resolv.conf"
 
+    DEBOOTSTRAP_DIR="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/"
+
     sudo cp -Trpfx \
-        "${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/" \
+        "$DEBOOTSTRAP_DIR" \
         "$ROOTFSDIR"
 
     echo "deb file:///isar-apt ${DEBDISTRONAME} main" | \
@@ -27,6 +29,10 @@ setup_root_file_system() {
     echo "Package: *\nPin: release n=${DEBDISTRONAME}\nPin-Priority: 1000" | \
         sudo tee "$ROOTFSDIR/etc/apt/preferences.d/isar" >/dev/null
 
+    sudo mount --bind \
+        "$DEBOOTSTRAP_DIR/var/cache/apt/archives" \
+        "$ROOTFSDIR/var/cache/apt/archives"
+
     sudo mount --bind ${DEPLOY_DIR_APT}/${DISTRO} $ROOTFSDIR/isar-apt
     sudo mount -t devtmpfs -o mode=0755,nosuid devtmpfs $ROOTFSDIR/dev
     sudo mount -t proc none $ROOTFSDIR/proc
@@ -55,6 +61,7 @@ setup_root_file_system() {
             /usr/bin/apt-get purge -y ${IMAGE_CFG_PACKAGE}
     fi
     if [ "clean" = ${CLEAN} ]; then
+        sudo umount -l "$ROOTFSDIR/var/cache/apt/archives"
         sudo -E chroot "$ROOTFSDIR" \
             /usr/bin/apt-get autoremove --purge -y
         sudo -E chroot "$ROOTFSDIR" \
-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 2/3] meta/classes/image: added isar_bootstrap_tarball task
  2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
  2018-05-23  6:32   ` [RFC PATCH 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives claudius.heine.ext
@ 2018-05-23  6:32   ` claudius.heine.ext
  2018-05-23  6:32   ` [RFC PATCH 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task claudius.heine.ext
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23  6:32 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

This patch adds the 'isar_bootstrap_tarball' task to the image bbclass.

This task creates a tarball of the isar-bootstrap file system after the
image was generated. This tarball can be later used to regenerate the
image from within isar.

Signed-off-by: Claudius Heine <ch@denx.de>
---
 meta/classes/image.bbclass | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
index 3bdcb2f..5276825 100644
--- a/meta/classes/image.bbclass
+++ b/meta/classes/image.bbclass
@@ -71,3 +71,13 @@ do_copy_boot_files() {
 addtask copy_boot_files before do_build after do_rootfs
 do_copy_boot_files[dirs] = "${DEPLOY_DIR_IMAGE}"
 do_copy_boot_files[stamp-extra-info] = "${DISTRO}-${MACHINE}"
+
+do_isar_bootstrap_tarball() {
+    DEBOOTSTRAP_DIR="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/"
+    ISAR_BOOTSTRAP_TARBALL="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}-${PN}-${MACHINE}.tgz"
+    sudo tar -czf "${ISAR_BOOTSTRAP_TARBALL}" -C ${DEBOOTSTRAP_DIR} \
+        --exclude='./dev/*' --exclude='./proc/*' --exclude='./sys/*' .
+}
+addtask isar_bootstrap_tarball before do_build after do_rootfs
+do_isar_bootstrap_tarball[dirs] = "${DEPLOY_DIR_IMAGE}"
+do_isar_bootstrap_tarball[stamp-extra-info] = "${DISTRO}-${MACHINE}"
-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task
  2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
  2018-05-23  6:32   ` [RFC PATCH 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives claudius.heine.ext
  2018-05-23  6:32   ` [RFC PATCH 2/3] meta/classes/image: added isar_bootstrap_tarball task claudius.heine.ext
@ 2018-05-23  6:32   ` claudius.heine.ext
  2018-05-23 14:30   ` [RFC PATCH 0/3] Reproducible build Maxim Yu. Osipov
  2018-05-24 16:00   ` Henning Schild
  4 siblings, 0 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23  6:32 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

This new task 'do_restore_from_tarball' is deactivated by default
("noexec" flag is set) and is only activated if the
'ISAR_BOOTSTRAP_TARBALL' variable is set.

The 'ISAR_BOOTSTRAP_TARBALL' variable should point to the
isar-bootstrap tarball generated by an image recipe.

If this variable is set, the tasks that normally create the
isar-bootstrap root file system are deactivated; instead the
'do_restore_from_tarball' task is activated and restores the
isar-bootstrap root file system from the tarball.

Signed-off-by: Claudius Heine <ch@denx.de>
---
 .../isar-bootstrap/isar-bootstrap.bb          | 27 ++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb b/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb
index 2089386..998113d 100644
--- a/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb
+++ b/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb
@@ -229,12 +229,37 @@ do_apt_update() {
 }
 addtask apt_update before do_build after do_apt_config_install
 
+python() {
+    if d.getVar("ISAR_BOOTSTRAP_TARBALL", True):
+        d.setVarFlag("do_generate_keyring", "noexec", "1")
+        d.setVarFlag("do_apt_config_prepare", "noexec", "1")
+        d.setVarFlag("do_bootstrap", "noexec", "1")
+        d.setVarFlag("do_apt_config_install", "noexec", "1")
+        d.setVarFlag("do_apt_update", "noexec", "1")
+        d.delVarFlag("do_restore_from_tarball", "noexec")
+}
+
+do_restore_from_tarball[noexec] = "1"
+do_restore_from_tarball[stamp-extra-info] = "${DISTRO}-${DISTRO_ARCH}"
+do_restore_from_tarball() {
+    if [ -e "${ROOTFSDIR}" ]; then
+       sudo umount -l "${ROOTFSDIR}/dev" || true
+       sudo umount -l "${ROOTFSDIR}/proc" || true
+       sudo rm -rf "${ROOTFSDIR}"
+    fi
+    sudo mkdir -p "${ROOTFSDIR}"
+    sudo tar xf "${ISAR_BOOTSTRAP_TARBALL}" -C "${ROOTFSDIR}" 
+    sudo mount -t devtmpfs -o mode=0755,nosuid devtmpfs ${ROOTFSDIR}/dev
+    sudo mount -t proc none ${ROOTFSDIR}/proc
+}
+addtask restore_from_tarball before do_build after do_unpack
+
 do_deploy[stamp-extra-info] = "${DISTRO}-${DISTRO_ARCH}"
 do_deploy[dirs] = "${DEPLOY_DIR_IMAGE}"
 do_deploy() {
     ln -Tfsr "${ROOTFSDIR}" "${DEPLOY_DIR_IMAGE}/${PN}-${DISTRO}-${DISTRO_ARCH}"
 }
-addtask deploy before do_build after do_apt_update
+addtask deploy before do_build after do_apt_update do_restore_from_tarball
 
 CLEANFUNCS = "clean_deploy"
 clean_deploy() {
-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Idea for implementing reproducible builds
  2018-05-22 22:32 ` Baurzhan Ismagulov
@ 2018-05-23  8:22   ` Claudius Heine
  2018-05-23 11:34     ` Claudius Heine
  2018-06-04 11:48     ` Baurzhan Ismagulov
  0 siblings, 2 replies; 33+ messages in thread
From: Claudius Heine @ 2018-05-23  8:22 UTC (permalink / raw)
  To: isar-users

Hi Baurzhan,

On 2018-05-23 00:32, Baurzhan Ismagulov wrote:
> Hello Claudius,
> 
> On Tue, May 22, 2018 at 01:55:21PM +0200, Claudius Heine wrote:
>> I am still working on reproducible builds and here is my current idea to
>> solve this.
>>
>> Simple put: Mount the /var/cache/apt/archives of the images and buildchroot
>> to the isar-bootstrap root file system and then create a tarball of it. This
>> way we have a tarball of the build just after debootstrap + upgrade with the
>> one 'apt update' step done, but without any other changes to it and all used
>> packages already in the apt package cache.
>>
>> When restoring just skip most of the isar-bootstrap steps and extract the
>> tarball instead, since the packages are available in the package cache and
>> the package index is not updated it will use the packages from the cache.
>>
>> This way we would side step the obstacle to make debootstrap reproducible by
>> just using its product while the reset of the process can be redone by isar.
> 
> Thanks for sharing.
> 
> As I understand it:
> 
> 1. The user runs bitbake isar-image-base, which
>     1. Debootstraps a rootfs
>     2. Tars it
>     3. Unpacks the tar into buildchroot/rootfs and isar-image-base/rootfs

Not exactly. See my RFC patches. I described the process in bullet 
points in the cover letter.

> 2. The user adds the tarball to the product repo

No to this last point; I go into detail below.

> 
> Is this correct?
> 
> 
> In this scenario:
> 
> * Step 1: How does bitbake decide whether to debootstrap or use the tarball?

In my proposed patchset I use the 'ISAR_BOOTSTRAP_TARBALL' variable.
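
If it is unset, the usual bootstrap tasks run; if it is set, e.g. in
local.conf (path illustrative), do_restore_from_tarball runs instead:

    ISAR_BOOTSTRAP_TARBALL = "${DEPLOY_DIR_IMAGE}/isar-bootstrap-debian-stretch-armhf-isar-image-base-qemuarm.tgz"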

> 
> * Step 2: If I have the following repo, where should the tar file be located
>    and versioned?
> 
>    myrepo
>    - meta
>    - meta-isar
>    - product1
>    - product2

The tarfile has to be versioned outside of the repo, since there is a 
1-to-many relationship between the source repo commit and the tarball.

For instance, openssl updates would not necessarily mean a change to 
the repo, just a new build.

> 
> * If two products built from one repo have non-identical rootfses, what does
>    the tarball contain?

The tarball contains just what is done by debootstrap + apt update + apt 
config. All installation of further packages is done later and is not 
part of the tarball.

But you are pointing out an interesting topic. We have to make sure that 
the isar-bootstrap rootfs does not contain any product-specific 
configuration. I could imagine that our current implementation of the 
multi-repo support might be too simple.


> * What is the user supposed to do if he wants to update the tar to the current
>    upstream, fully or in part?

AFAIK partial updates are always a pain and I am not sure if that is 
something we should support in our first implementation. A full update 
just means not using the tarball and building everything again.

> Considering our existing use cases, I'd suggest a couple of changes to your
> concept.
> 
> 
> Let's abbreviate our copy of Debian artifacts as "debian-mirror" (be it in form
> of a tarball or anything else).
> 
> 
> I see the following use cases:
> 
> U1. debian-mirror doesn't exist. Create debian-mirror from upstream.

This is done in my proposal. It just uses the apt cache, which contains 
just the used packages, not the whole Debian mirror.

> U2.1. debian-mirror is versioned, e.g. in git.

That is left to the user IMO, because that belongs to the choice of 
the backup strategy. Maybe people want to use btrfs snapshots, a tape 
drive or something else for this.

The tarball also doesn't contain anything that is part of bitbake, so 
the downloads directory needs to be saved as well.

> U2.2. Use debian-mirror for buildchroot/rootfs and isar-image-base/rootfs.

In my proposal the debian-mirror is created from buildchroot and 
isar-image-base and is then used to rebuild isar-bootstrap, from which 
both recipes get their base rootfs.

> U2.3. Don't use upstream for building buildchroot/rootfs and
>        isar-image-base/rootfs.

Since those packages are available in the cache, no additional download 
is needed.

> U3.1. debian-mirror exists. Update all packages from upstream into
>        debian-mirror.

Why is that needed? You could just delete the debian-mirror and then it 
is recreated with the current upstream anyway.

> 
> U3.2. debian-mirror exists. Update chosen packages from upstream into
>        debian-mirror. E.g., openssl, optionally its dependencies, optionally its
>        dependents.

Currently that means that the apt index needs to be updated partially.
I don't know if it's possible to update this index on a package + 
dependency level, but I doubt it.
The result of this is that we need to merge the upstream index with our 
own and pin all other packages to the old version.

Even if we just create a complete mirror of the upstream Debian repos, 
updating just one package with its dependencies is a serious scripting effort.

Because of the complexity involved I would postpone this feature.

> U3.3. debian-mirror exists and is used by two products. One product has to be
>        updated. The other one will be updated later. For product 1, update
>        chosen or all packages from upstream to debian-mirror. Product 2 should
>        still use the old packages.

Just build one product without the 'ISAR_BOOTSTRAP_TARBALL' variable 
set, while the other still uses this variable.

> U3.4. Remove packages not used in any previous commit.

I am currently not sure what you mean by that. Why would there be 
packages that aren't used in any previous commits?

> Given those, I'd suggest using debs as versioned entities instead of the rootfs
> tarball.

I don't get your reasoning here. All of those requirements apart from 
one can be met with my solution. And this requirement is hard in any case.

> Create an apt repo with dpkg-scanpackages and dpkg-scansources and use it to
> debootstrap buildchroot and isar-image-base.
> 
> This would address U2.3 and U3.3. This has been tested in practice, works
> well, and is in my opinion the best way to solve the problem.

U2.3 and U3.3 are no problem with my approach AFAIK.

I looked into this and came up with some difficulties when using an 
alternative debian-mirror repo that is generated from the used packages:

     1. You need to change the apt repo URLs. Yes, multiple ones, since we
        support multi-repos in Isar. How are we handling this? Are we
        throwing stuff from different repos together? Or are we creating
        multiple local repos for every used repo and then switching them
        back later? Both solutions can cause (un)expected problems.
        How are we dealing with updates from upstream then?
     2. How are we installing additional packages that are currently not
        part of the debian-mirror? If it's just a different repo, those
        packages would not be part of the package index, so those
        packages would not be available. If it's a complete mirror of the
        repos, then it contains many packages that aren't needed.

> With versioned tarballs, an update of a single package would make the whole
> tarball change. This makes the history unreadable,

Yes, this could be solved by adding some generated information about the 
tarball in a text file next to it.
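
E.g. something like this could record the cached package versions next to 
the tarball (variable names as in the RFC patches; just a sketch):

    ls "${DEBOOTSTRAP_DIR}/var/cache/apt/archives/"*.deb \
        | xargs -n1 basename | sort > "${ISAR_BOOTSTRAP_TARBALL}.packages"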

> wastes disk space, and many

Maybe we could try some tar options to make xdeltas smaller. Or we could 
also try to put the apt index and the package cache outside of the tar file.

My first discarded idea was to just extract the package cache + apt 
index, store those, then while building generate an apt repo from them 
and use this repo as the debootstrap and main apt repo to install those 
packages. That makes handling of repo URLs and installing new packages 
difficult as described. But maybe extracting the package cache and 
index next to the debootstrapped root file system might be a good compromise.

> tools (including git) have problems with big files.

Then don't use those tools for handling binary backups, because they 
aren't fit for the job. There is git-annex or btrfs snapshots, or one 
could create incremental tarballs [1].
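
With GNU tar that would look roughly like this (see [1]; file names are 
illustrative):

    # level-0 dump; also records file state in the snapshot file
    tar --create --listed-incremental=bootstrap.snar -f bootstrap-0.tar rootfs/
    # a later run with the same snapshot file only stores what changed
    tar --create --listed-incremental=bootstrap.snar -f bootstrap-1.tar rootfs/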

>  From the UX perspective, I'd prefer to separate building images from preparing
> debian-mirror if possible. A separate command / task / bitbake run with a var
> set / unset, etc. E.g., bitbake -C createmirror isar-image-base, bitbake -C
> updatemirror isar-image-base, etc.

Ok. In the current RFC patchset this file is created all the time; I 
don't have an issue with changing that.

> Please include user documentation when you provide patches.

After we have agreed on something, I will document it.

Thanks,
Claudius

[1] https://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Idea for implementing reproducible builds
  2018-05-23  8:22   ` Claudius Heine
@ 2018-05-23 11:34     ` Claudius Heine
  2018-06-04 11:48     ` Baurzhan Ismagulov
  1 sibling, 0 replies; 33+ messages in thread
From: Claudius Heine @ 2018-05-23 11:34 UTC (permalink / raw)
  To: isar-users

Hi,

On 2018-05-23 10:22, [ext] Claudius Heine wrote:
>>
>> U3.2. debian-mirror exists. Update chosen packages from upstream into
>>        debian-mirror. E.g., openssl, optionally its dependencies, 
>> optionally its
>>        dependents.
> 
> Currently that means that the apt index needs to be updated partially.
> I don't know if its possible to update this index on a package + 
> dependency level, but I doubt it.
> The result of this is that we need to merge upstream index with our own 
> and pin all other packages to the old version.
> 
> Even if we just create a complete mirror of all debian mirror, updating 
> just one package with its dependencies is a serious scripting effort.
> 
> Because of the complexity involved I would postpone this feature.

About this point: one way to implement this on top of this 
implementation could be:

     1. Use a script that takes a package name and repository path and
        then generates a list of deb packages that describes the current
        version of this package + its dependencies.
     2. This list of packages can be inserted into a bitbake recipe,
        which downloads those packages, adds them into the isar-apt
        repository and installs them to the root file system.

This way we would be explicit about these partial updates. Of course 
this script that parses the apt repo might become complex, but there 
might be libraries or tools I currently don't know about that could help here.
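
For step 1, a rough sketch of what such a script could do (using openssl 
from the example above; run against an apt setup of the target distro; 
just an illustration, not tested):

    # resolve the current dependency closure of one package ...
    pkgs=$(apt-cache depends --recurse --no-recommends --no-suggests \
               --no-conflicts --no-breaks --no-replaces --no-enhances \
               openssl | grep '^[a-z0-9]' | sort -u)
    # ... and fetch the matching .debs into the current directory
    apt-get download $pkgs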

TBH I'm mainly an Archlinux user, and with Archlinux partial updates 
aren't supported, so I am a bit worried about this feature. It may be 
alright to use this in Debian, but I have no idea how well that works in 
practice. So being very explicit when using it might be a good way to do 
this.

---

I also took a look at aptly [1]. What aptly is missing, AFAIK, is a way to 
operate as an apt caching proxy. By this I mean that the mirror command 
would only download the index, not all packages, and that packages would 
be downloaded when apt requests them instead of via the 'aptly repo 
import' command. If that could be implemented in aptly, it might be an 
alternative to this approach. Partial updates might then be easier to do.

Claudius

[1] https://www.aptly.info

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 0/3] Reproducible build
  2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
                   ` (2 preceding siblings ...)
  2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
@ 2018-05-23 13:26 ` claudius.heine.ext
  2018-05-23 13:26 ` [RFC PATCH v2 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives claudius.heine.ext
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23 13:26 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

Hi,

Changes from v1:
  - Made the 'isar_bootstrap_tarball' task manual
  - Rebased to current next and added 'do_set_locale' to noexec if
    isar-bootstrap-tarball is used.

Claudius Heine (3):
  meta/isar-bootstrap-helper+dpkg.bbclass: bind mount
    /var/cache/apt/archives
  meta/classes/image: added isar_bootstrap_tarball task
  meta/isar-bootstrap: add 'do_restore_from_tarball' task

 meta/classes/dpkg.bbclass                     |  5 ++++
 meta/classes/image.bbclass                    | 10 +++++++
 meta/classes/isar-bootstrap-helper.bbclass    |  9 +++++-
 .../isar-bootstrap/isar-bootstrap.bb          | 28 ++++++++++++++++++-
 4 files changed, 50 insertions(+), 2 deletions(-)

-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives
  2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
                   ` (3 preceding siblings ...)
  2018-05-23 13:26 ` [RFC PATCH v2 " claudius.heine.ext
@ 2018-05-23 13:26 ` claudius.heine.ext
  2018-05-23 13:26 ` [RFC PATCH v2 2/3] meta/classes/image: added isar_bootstrap_tarball task claudius.heine.ext
  2018-05-23 13:26 ` [RFC PATCH v2 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task claudius.heine.ext
  6 siblings, 0 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23 13:26 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

Bind mount the /var/cache/apt/archives directory to the original
isar-bootstrap root file system, so that the cache is shared between all
images based on it. A central package cache is the first step towards
reproducible builds.

This should allow faster execution of subsequent builds.

Signed-off-by: Claudius Heine <ch@denx.de>
---
 meta/classes/dpkg.bbclass                  | 5 +++++
 meta/classes/isar-bootstrap-helper.bbclass | 9 ++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/meta/classes/dpkg.bbclass b/meta/classes/dpkg.bbclass
index c8d4ac5..5422b9a 100644
--- a/meta/classes/dpkg.bbclass
+++ b/meta/classes/dpkg.bbclass
@@ -5,6 +5,11 @@ inherit dpkg-base
 
 # Build package from sources using build script
 dpkg_runbuild() {
+    DEBOOTSTRAP_DIR="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/"
     E="${@ bb.utils.export_proxies(d)}"
+    mountpoint -q "${BUILDCHROOT_DIR}/var/cache/apt/archives" || \
+        sudo mount --bind \
+            "$DEBOOTSTRAP_DIR/var/cache/apt/archives" \
+            "${BUILDCHROOT_DIR}/var/cache/apt/archives"
     sudo -E chroot ${BUILDCHROOT_DIR} /build.sh ${PP}/${PPS}
 }
diff --git a/meta/classes/isar-bootstrap-helper.bbclass b/meta/classes/isar-bootstrap-helper.bbclass
index 76e20f6..fa68a9f 100644
--- a/meta/classes/isar-bootstrap-helper.bbclass
+++ b/meta/classes/isar-bootstrap-helper.bbclass
@@ -17,8 +17,10 @@ setup_root_file_system() {
               -o Debug::pkgProblemResolver=yes"
     CLEAN_FILES="${ROOTFSDIR}/etc/hostname ${ROOTFSDIR}/etc/resolv.conf"
 
+    DEBOOTSTRAP_DIR="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/"
+
     sudo cp -Trpfx \
-        "${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/" \
+        "$DEBOOTSTRAP_DIR" \
         "$ROOTFSDIR"
 
     echo "deb file:///isar-apt ${DEBDISTRONAME} main" | \
@@ -27,6 +29,10 @@ setup_root_file_system() {
     echo "Package: *\nPin: release n=${DEBDISTRONAME}\nPin-Priority: 1000" | \
         sudo tee "$ROOTFSDIR/etc/apt/preferences.d/isar" >/dev/null
 
+    sudo mount --bind \
+        "$DEBOOTSTRAP_DIR/var/cache/apt/archives" \
+        "$ROOTFSDIR/var/cache/apt/archives"
+
     sudo mount --bind ${DEPLOY_DIR_APT}/${DISTRO} $ROOTFSDIR/isar-apt
     sudo mount -t devtmpfs -o mode=0755,nosuid devtmpfs $ROOTFSDIR/dev
     sudo mount -t proc none $ROOTFSDIR/proc
@@ -55,6 +61,7 @@ setup_root_file_system() {
             /usr/bin/apt-get purge -y ${IMAGE_CFG_PACKAGE}
     fi
     if [ "clean" = ${CLEAN} ]; then
+        sudo umount -l "$ROOTFSDIR/var/cache/apt/archives"
         sudo -E chroot "$ROOTFSDIR" \
             /usr/bin/apt-get autoremove --purge -y
         sudo -E chroot "$ROOTFSDIR" \
-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 2/3] meta/classes/image: added isar_bootstrap_tarball task
  2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
                   ` (4 preceding siblings ...)
  2018-05-23 13:26 ` [RFC PATCH v2 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives claudius.heine.ext
@ 2018-05-23 13:26 ` claudius.heine.ext
  2018-05-23 13:26 ` [RFC PATCH v2 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task claudius.heine.ext
  6 siblings, 0 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23 13:26 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

This patch adds the 'isar_bootstrap_tarball' task to the image bbclass.

This task creates a tarball of the isar-bootstrap file system after the
image was generated. This tarball can be later used to regenerate the
image from within isar.

This task needs to be triggered manually.

Signed-off-by: Claudius Heine <ch@denx.de>
---
 meta/classes/image.bbclass | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
index 3bdcb2f..3702a5f 100644
--- a/meta/classes/image.bbclass
+++ b/meta/classes/image.bbclass
@@ -71,3 +71,13 @@ do_copy_boot_files() {
 addtask copy_boot_files before do_build after do_rootfs
 do_copy_boot_files[dirs] = "${DEPLOY_DIR_IMAGE}"
 do_copy_boot_files[stamp-extra-info] = "${DISTRO}-${MACHINE}"
+
+do_isar_bootstrap_tarball() {
+    DEBOOTSTRAP_DIR="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}/"
+    ISAR_BOOTSTRAP_TARBALL="${DEPLOY_DIR_IMAGE}/isar-bootstrap-${DISTRO}-${DISTRO_ARCH}-${PN}-${MACHINE}.tgz"
+    sudo tar -czf "${ISAR_BOOTSTRAP_TARBALL}" -C ${DEBOOTSTRAP_DIR} \
+        --exclude='./dev/*' --exclude='./proc/*' --exclude='./sys/*' .
+}
+addtask isar_bootstrap_tarball after do_rootfs
+do_isar_bootstrap_tarball[dirs] = "${DEPLOY_DIR_IMAGE}"
+do_isar_bootstrap_tarball[stamp-extra-info] = "${DISTRO}-${MACHINE}"
-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task
  2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
                   ` (5 preceding siblings ...)
  2018-05-23 13:26 ` [RFC PATCH v2 2/3] meta/classes/image: added isar_bootstrap_tarball task claudius.heine.ext
@ 2018-05-23 13:26 ` claudius.heine.ext
  6 siblings, 0 replies; 33+ messages in thread
From: claudius.heine.ext @ 2018-05-23 13:26 UTC (permalink / raw)
  To: isar-users; +Cc: Claudius Heine

From: Claudius Heine <ch@denx.de>

This new task 'do_restore_from_tarball' is deactivated by default
("noexec" flag is set) and is only activated if the
'ISAR_BOOTSTRAP_TARBALL' variable is set.

The 'ISAR_BOOTSTRAP_TARBALL' variable should point to the
isar-bootstrap tarball generated by an image recipe.

If this variable is set, the tasks that normally create the
isar-bootstrap root file system are deactivated; instead the
'do_restore_from_tarball' task is activated and restores the
isar-bootstrap root file system from the tarball.

Signed-off-by: Claudius Heine <ch@denx.de>
---
 .../isar-bootstrap/isar-bootstrap.bb          | 28 ++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb b/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb
index bb3992b..02c09aa 100644
--- a/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb
+++ b/meta/recipes-core/isar-bootstrap/isar-bootstrap.bb
@@ -239,12 +239,38 @@ do_apt_update() {
 }
 addtask apt_update before do_build after do_apt_config_install do_set_locale
 
+python() {
+    if d.getVar("ISAR_BOOTSTRAP_TARBALL", True):
+        d.setVarFlag("do_generate_keyring", "noexec", "1")
+        d.setVarFlag("do_apt_config_prepare", "noexec", "1")
+        d.setVarFlag("do_set_locale", "noexec", "1")
+        d.setVarFlag("do_bootstrap", "noexec", "1")
+        d.setVarFlag("do_apt_config_install", "noexec", "1")
+        d.setVarFlag("do_apt_update", "noexec", "1")
+        d.delVarFlag("do_restore_from_tarball", "noexec")
+}
+
+do_restore_from_tarball[noexec] = "1"
+do_restore_from_tarball[stamp-extra-info] = "${DISTRO}-${DISTRO_ARCH}"
+do_restore_from_tarball() {
+    if [ -e "${ROOTFSDIR}" ]; then
+       sudo umount -l "${ROOTFSDIR}/dev" || true
+       sudo umount -l "${ROOTFSDIR}/proc" || true
+       sudo rm -rf "${ROOTFSDIR}"
+    fi
+    sudo mkdir -p "${ROOTFSDIR}"
+    sudo tar xf "${ISAR_BOOTSTRAP_TARBALL}" -C "${ROOTFSDIR}" 
+    sudo mount -t devtmpfs -o mode=0755,nosuid devtmpfs ${ROOTFSDIR}/dev
+    sudo mount -t proc none ${ROOTFSDIR}/proc
+}
+addtask restore_from_tarball before do_build after do_unpack
+
 do_deploy[stamp-extra-info] = "${DISTRO}-${DISTRO_ARCH}"
 do_deploy[dirs] = "${DEPLOY_DIR_IMAGE}"
 do_deploy() {
     ln -Tfsr "${ROOTFSDIR}" "${DEPLOY_DIR_IMAGE}/${PN}-${DISTRO}-${DISTRO_ARCH}"
 }
-addtask deploy before do_build after do_apt_update
+addtask deploy before do_build after do_apt_update do_restore_from_tarball
 
 CLEANFUNCS = "clean_deploy"
 clean_deploy() {
-- 
2.17.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
                     ` (2 preceding siblings ...)
  2018-05-23  6:32   ` [RFC PATCH 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task claudius.heine.ext
@ 2018-05-23 14:30   ` Maxim Yu. Osipov
  2018-05-23 15:20     ` Claudius Heine
  2018-05-24 16:00   ` Henning Schild
  4 siblings, 1 reply; 33+ messages in thread
From: Maxim Yu. Osipov @ 2018-05-23 14:30 UTC (permalink / raw)
  To: claudius.heine.ext, isar-users; +Cc: Claudius Heine

Hi Claudius,

I've looked through the discussion thread.

As far as I understand, with the proposed approach we don't have
the ability to reproduce this tarball - it contains an unversioned 
snapshot of the isar-bootstrap rootfs, containing an unversioned snapshot 
of the Debian package cache used to create the rootfs. It's fine if you 
just want to reproduce the current build locally from scratch in your 
sandbox while avoiding the debootstrap stage (fetching packages again, etc.).

Do you have another use-case scenario in mind?

E.g. to share this tarball with other developers (linked to a particular 
version of the Isar tree) so they can fully reproduce the build?

If yes, how do you plan to version/manage such a growing list of tarballs? 
As was mentioned in the discussion, upgrading one package from the Debian 
repo will result in another tarball.

Kind regards,
Maxim.

On 05/23/2018 08:32 AM, claudius.heine.ext@siemens.com wrote:

> From: Claudius Heine <ch@denx.de>
> 
> Hi,
> 
> this patchset contains a implementation of my proposed solution for
> reproducible builds.
> 
> I am currenlty not quite sure if that is the right approach, but it is
> the simplest I can think of currently.
> 
> As already described in my proposal, this patchset does the following:
> 
>    1. Takes care that the package cache in the isar-bootstrap root file
>       system contains all the packages used for this distro/architecture.
>    2. A tarball is created after the package cache contains all the
>       packages needed by the image.
>    3. This tarball can be used as the basis of subsequent builds by
>       setting a bitbake variable.
> 
> This is just a first draft of this feature, maybe we can further improve
> some steps and maybe there are better ideas to improve the usability.
> 
> Cheers,
> Claudius
> 
> Claudius Heine (3):
>    meta/isar-bootstrap-helper+dpkg.bbclass: bind mount
>      /var/cache/apt/archives
>    meta/classes/image: added isar_bootstrap_tarball task
>    meta/isar-bootstrap: add 'do_restore_from_tarball' task
> 
>   meta/classes/dpkg.bbclass                     |  5 ++++
>   meta/classes/image.bbclass                    | 10 +++++++
>   meta/classes/isar-bootstrap-helper.bbclass    |  9 ++++++-
>   .../isar-bootstrap/isar-bootstrap.bb          | 27 ++++++++++++++++++-
>   4 files changed, 49 insertions(+), 2 deletions(-)
> 


-- 
Maxim Osipov
ilbers GmbH
Maria-Merian-Str. 8
85521 Ottobrunn
Germany
+49 (151) 6517 6917
mosipov@ilbers.de
http://ilbers.de/
Commercial register Munich, HRB 214197
General Manager: Baurzhan Ismagulov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-05-23 14:30   ` [RFC PATCH 0/3] Reproducible build Maxim Yu. Osipov
@ 2018-05-23 15:20     ` Claudius Heine
  0 siblings, 0 replies; 33+ messages in thread
From: Claudius Heine @ 2018-05-23 15:20 UTC (permalink / raw)
  To: Maxim Yu. Osipov, claudius.heine.ext, isar-users

[-- Attachment #1: Type: text/plain, Size: 3603 bytes --]

Hi Maxim.

On Wed, 2018-05-23 at 16:30 +0200, Maxim Yu. Osipov wrote:
> Hi Claudius,
> 
> I've looked through discussion thread.
> 
> As far as I understood with the proposed approach we don't have
> the ability to reproduce this tarball - it contains some unversioned 
> snapshot of isar-bootstrap rootfs, containing unversioned snapshot
> of 
> debian's packages cache used to create rootfs. It's fine if you just 
> want to reproduce locally the current build from the scratch in your 
> sandbox by avoiding debootstrap stage (fetching again packages, etc).
> 
> Do you have another use-case scenario in mind?
> 
> F.e. to share this tarball with other developers (linked to
> particular 
> version of isar tree) so they can fully reproduce the build?
> 
> If yes, how do you plan to version/manage such growing list of
> tarballs? 
> As it was mentioned in the discussion, upgrading one package from
> debian 
> repo will result to other tarball.

My focus in tackling reproducibility was twofold:

  1. Output the build input in some form
  2. Allow the build input to be used in the subsequent builds while
     allowing customization e.g. fixes to isar packages.

I chose the form of the output to be just one tarball because it's
pretty simple to move around. To back it up somewhere you can extract
it if you like, maybe put it into an OSTree repo, or just create
incremental tarballs as I described before. How this tarball is
versioned and how and where it is stored was not in the scope of my work
and belongs to the backup mechanism chosen by the users, IMO. We can
just point to the files that need backing up. That is how OE does it as
well with the downloads directory; they just don't pack it together,
but they also don't have to worry about a root fs with permissions.

Now to the choice of the 'build input'. Of course it would be great to
make the debootstrap step reproducible as well, but that means I would
have to create a repository with the packages. Creating a new repository
from the packages in the cache means that later adding new packages
that were not part of the cache isn't possible, since they aren't part
of the repository.

My earlier suggestion (months ago) was using a manually controllable
repository proxy. That could be in the form of an HTTP web server or
proxy. This proxy would fetch and store the index and packages from
upstream. That is serious implementation effort, however.

apt-cacher-ng and aptly are missing important features that would need
to be implemented there. So we could try to interest those projects in
this and propose patches, and maybe we should do that. Aptly looks
really nice, but I don't know how they stand on creating partial
repositories where the index contains entries that aren't available in
the repo itself.

So the way to solve this locally in Isar was just to use what debootstrap
outputs + all the used packages from the cache as the base for the next
build. I agree that this currently isn't the nicest solution possible,
but it's pretty simple to implement and could be expanded upon. If this
isn't enough, then I would need to look into aptly, for example, to create
a more extensive solution.

Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

            PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153
                              Keyserver: hkp://pool.sks-keyservers.net

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
                     ` (3 preceding siblings ...)
  2018-05-23 14:30   ` [RFC PATCH 0/3] Reproducible build Maxim Yu. Osipov
@ 2018-05-24 16:00   ` Henning Schild
  2018-05-25  8:10     ` Claudius Heine
  4 siblings, 1 reply; 33+ messages in thread
From: Henning Schild @ 2018-05-24 16:00 UTC (permalink / raw)
  To: [ext] claudius.heine.ext@siemens.com; +Cc: isar-users, Claudius Heine

Am Wed, 23 May 2018 08:32:03 +0200
schrieb "[ext] claudius.heine.ext@siemens.com"
<claudius.heine.ext@siemens.com>:

> From: Claudius Heine <ch@denx.de>
> 
> Hi,
> 
> this patchset contains a implementation of my proposed solution for
> reproducible builds.
> 
> I am currenlty not quite sure if that is the right approach, but it is
> the simplest I can think of currently.

I did not look at the patches yet. And because it sounds so simple my
first reaction is that it can not be complete.
One thing we will need for sure is the sources that lead to the
packages we built ourselves, otherwise we can not rebuild them later on.
And that seems to be a tricky part, not covered by stealing the cache.
Maybe stealing the DLDIR of bitbake as well?

> As already described in my proposal, this patchset does the following:
> 
>   1. Takes care that the package cache in the isar-bootstrap root file
>      system contains all the packages used for this
> distro/architecture. 2. A tarball is created after the package cache
> contains all the packages needed by the image.

Are you sure that "apt-get clean" is the only reason for cache
eviction? What will happen if I install a ton of packages; won't apt
want to save space at some point?

Henning

>   3. This tarball can be used as the basis of subsequent builds by
>      setting a bitbake variable.
> 
> This is just a first draft of this feature, maybe we can further
> improve some steps and maybe there are better ideas to improve the
> usability.
> 
> Cheers,
> Claudius
> 
> Claudius Heine (3):
>   meta/isar-bootstrap-helper+dpkg.bbclass: bind mount
>     /var/cache/apt/archives
>   meta/classes/image: added isar_bootstrap_tarball task
>   meta/isar-bootstrap: add 'do_restore_from_tarball' task
> 
>  meta/classes/dpkg.bbclass                     |  5 ++++
>  meta/classes/image.bbclass                    | 10 +++++++
>  meta/classes/isar-bootstrap-helper.bbclass    |  9 ++++++-
>  .../isar-bootstrap/isar-bootstrap.bb          | 27
> ++++++++++++++++++- 4 files changed, 49 insertions(+), 2 deletions(-)
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-05-24 16:00   ` Henning Schild
@ 2018-05-25  8:10     ` Claudius Heine
  2018-05-25 11:57       ` Maxim Yu. Osipov
  0 siblings, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-05-25  8:10 UTC (permalink / raw)
  To: Henning Schild, [ext] claudius.heine.ext@siemens.com; +Cc: isar-users


[-- Attachment #1.1: Type: text/plain, Size: 3194 bytes --]

Hi Henning,

On 05/24/2018 06:00 PM, Henning Schild wrote:
> Am Wed, 23 May 2018 08:32:03 +0200
> schrieb "[ext] claudius.heine.ext@siemens.com"
> <claudius.heine.ext@siemens.com>:
> 
>> From: Claudius Heine <ch@denx.de>
>>
>> Hi,
>>
>> this patchset contains a implementation of my proposed solution for
>> reproducible builds.
>>
>> I am currenlty not quite sure if that is the right approach, but it is
>> the simplest I can think of currently.
> 
> I did not look at the patches yet. And because it sounds so simple my
> first reaction is that it can not be complete.
> One thing we will need for sure is the sources that lead to the
> packages we built ourselfs, otherwise we can not rebuild them later on.
> And that seems to be a tricky part, not covered by stealing the cache.

You are right, this solution is not complete and Rome was not built in
one day. My goal was to improve the situation by just one small step and
then build on top of it.

> Maybe stealing the DLDIR of bitbake as well?
> 
>> As already described in my proposal, this patchset does the following:
>>
>>   1. Takes care that the package cache in the isar-bootstrap root file
>>      system contains all the packages used for this
>> distro/architecture. 2. A tarball is created after the package cache
>> contains all the packages needed by the image.
> 
> Are you sure that "apt-get clean" is the only reason for cache
> eviction? What will happen if i install a ton of packages, not that apt
> will want to safe space at some point.

Yes, it might be useful to set apt.conf to disable all autocleaning
options.
But normally apt removes packages from the cache only if they are no longer
downloadable, and since the local index of the upstream repos is not
updated, it shouldn't detect that they are no longer downloadable and
therefore won't remove them. Disabling this completely is still the better
option.
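
E.g. a config fragment like this could be dropped into
/etc/apt/apt.conf.d/ of the bootstrap rootfs (option names should be
double-checked against the apt version in use; this is just a sketch):

    # keep downloaded .debs in /var/cache/apt/archives
    APT::Keep-Downloaded-Packages "true";
    # disable the periodic autoclean job
    APT::Periodic::AutocleanInterval "0";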

Claudius

> 
> Henning
> 
>>   3. This tarball can be used as the basis of subsequent builds by
>>      setting a bitbake variable.
>>
>> This is just a first draft of this feature, maybe we can further
>> improve some steps and maybe there are better ideas to improve the
>> usability.
>>
>> Cheers,
>> Claudius
>>
>> Claudius Heine (3):
>>   meta/isar-bootstrap-helper+dpkg.bbclass: bind mount
>>     /var/cache/apt/archives
>>   meta/classes/image: added isar_bootstrap_tarball task
>>   meta/isar-bootstrap: add 'do_restore_from_tarball' task
>>
>>  meta/classes/dpkg.bbclass                     |  5 ++++
>>  meta/classes/image.bbclass                    | 10 +++++++
>>  meta/classes/isar-bootstrap-helper.bbclass    |  9 ++++++-
>>  .../isar-bootstrap/isar-bootstrap.bb          | 27
>> ++++++++++++++++++- 4 files changed, 49 insertions(+), 2 deletions(-)
>>
> 

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

           PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153
                             Keyserver: hkp://pool.sks-keyservers.net


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-05-25  8:10     ` Claudius Heine
@ 2018-05-25 11:57       ` Maxim Yu. Osipov
  2018-05-25 17:04         ` Claudius Heine
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Yu. Osipov @ 2018-05-25 11:57 UTC (permalink / raw)
  To: Claudius Heine, Henning Schild, [ext] claudius.heine.ext@siemens.com
  Cc: isar-users

Hi Claudius,

Let me summarize the patchset status.

1. This patchset is just a first step towards a more generic, traceable 
reproducible build - tarball versioning/reproducibility features are out 
of scope of this patch set.

2. Baurzhan always asks to provide some bits of information in the 
documentation when a new feature is added (or changed).

3. Henning asked about "stealing DL_DIR of bitbake as well" (see his 
email below). What is your opinion?

Kind regards,
Maxim.

On 05/25/2018 10:10 AM, Claudius Heine wrote:
> Hi Henning,
> 
> On 05/24/2018 06:00 PM, Henning Schild wrote:
>> Am Wed, 23 May 2018 08:32:03 +0200
>> schrieb "[ext] claudius.heine.ext@siemens.com"
>> <claudius.heine.ext@siemens.com>:
>>
>>> From: Claudius Heine <ch@denx.de>
>>>
>>> Hi,
>>>
>>> this patchset contains a implementation of my proposed solution for
>>> reproducible builds.
>>>
>>> I am currenlty not quite sure if that is the right approach, but it is
>>> the simplest I can think of currently.
>>
>> I did not look at the patches yet. And because it sounds so simple my
>> first reaction is that it can not be complete.> One thing we will need for sure is the sources that lead to the
>> packages we built ourselfs, otherwise we can not rebuild them later on.
>> And that seems to be a tricky part, not covered by stealing the cache.
> 
> You are right, this solution is not complete and Rom was not build on
> one day. My goal was to improve the situation just one small step and
> then build on top of it.
> 
>> Maybe stealing the DLDIR of bitbake as well?
>>
>>> As already described in my proposal, this patchset does the following:
>>>
>>>    1. Takes care that the package cache in the isar-bootstrap root file
>>>       system contains all the packages used for this
>>> distro/architecture. 2. A tarball is created after the package cache
>>> contains all the packages needed by the image.
>>
>> Are you sure that "apt-get clean" is the only reason for cache
>> eviction? What will happen if i install a ton of packages, not that apt
>> will want to safe space at some point.
> 
> Yes, I might be useful to set the apt.conf to disable all autocleaning
> options.
> But normally apt removes packages from cache only if they are no longer
> downloadable and since the local index of the upstream repos are not
> updated it shouldn't detect if they are no longer downloadable and
> therefore not remove them. Disabling this completely is still the better
> option.
> 
> Claudius
> 
>>
>> Henning
>>
>>>    3. This tarball can be used as the basis of subsequent builds by
>>>       setting a bitbake variable.
>>>
>>> This is just a first draft of this feature, maybe we can further
>>> improve some steps and maybe there are better ideas to improve the
>>> usability.
>>>
>>> Cheers,
>>> Claudius
>>>
>>> Claudius Heine (3):
>>>    meta/isar-bootstrap-helper+dpkg.bbclass: bind mount
>>>      /var/cache/apt/archives
>>>    meta/classes/image: added isar_bootstrap_tarball task
>>>    meta/isar-bootstrap: add 'do_restore_from_tarball' task
>>>
>>>   meta/classes/dpkg.bbclass                     |  5 ++++
>>>   meta/classes/image.bbclass                    | 10 +++++++
>>>   meta/classes/isar-bootstrap-helper.bbclass    |  9 ++++++-
>>>   .../isar-bootstrap/isar-bootstrap.bb          | 27
>>> ++++++++++++++++++- 4 files changed, 49 insertions(+), 2 deletions(-)
>>>
>>
> 


-- 
Maxim Osipov
ilbers GmbH
Maria-Merian-Str. 8
85521 Ottobrunn
Germany
+49 (151) 6517 6917
mosipov@ilbers.de
http://ilbers.de/
Commercial register Munich, HRB 214197
General Manager: Baurzhan Ismagulov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-05-25 11:57       ` Maxim Yu. Osipov
@ 2018-05-25 17:04         ` Claudius Heine
  2018-06-04 11:37           ` Baurzhan Ismagulov
  0 siblings, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-05-25 17:04 UTC (permalink / raw)
  To: Maxim Yu. Osipov, Henning Schild, [ext] claudius.heine.ext@siemens.com
  Cc: isar-users

[-- Attachment #1: Type: text/plain, Size: 11726 bytes --]

Hi Maxim,

On Fri, 2018-05-25 at 13:57 +0200, Maxim Yu. Osipov wrote:
> Hi Claudius,
> 
> Let me summarize the patchset status.
> 
> 1. This patchset is just a first step towards more generic traceable 
> reproducible build - tarball versioning/reproducibility features are
> out 
> of scope of this patch set.

It is the first step and an RFC patch. It is mainly meant to support the
discussion about the possible options. So if we are on the same page
about the solution for reproducibility in Isar and how to go forward
from there, I will post a non-RFC patch with documentation.

> 2. Baurzhan always asks to provide some bits of information into the 
> documentation when a new feature is added (or changed).

See answer above.

> 
> 3. Henning asked about "stealing DL_DIR of bitbake as well" (see his 
> email below). What is your opinion?

My opinion is that we should document what files need to be archived
in order to reproduce the complete build. It is also my opinion that we
(as in isar) cannot dictate to the user how she has to archive those
artifacts.

There are many different systems for archiving binary files available
and we can offer them some suggestions or even write some example code
for a couple of systems, but we should still be able to support all of
their own ideas. Those code examples can be in the form of some
shell/python scripts, but I would be against binding isar to one such
system at this point.

To be honest I didn't 100% get what Henning meant by 'stealing DL_DIR
of bitbake'. I suppose that he meant putting files from there into the
tarball as well? But I currently don't see a good reason for doing so.
My patch was about producing an artifact that isn't covered by the
DL_DIR and the normal reproducibility mechanisms of bitbake, mixing
this just causes redundancy and confusion IMO.
I also skipped answering this question because I thought I answered
that in the passage before by pointing out that any expansions to this
(like source packages) can be done later.

Just to make it clear: I don't want to shut down discussion about this
by replying to every argument against this solution that it can be done
later. Henning's and Baurzhan's critique points are very welcome,
because they ask: "Can your solution really be expanded to include
those use-cases/requirements?"

For instance, this partial update feature is something that is not so
easy to do with this simple mechanism. It would be much easier to do
if aptly had "proxy caching" support [1] and we used that to solve
reproducibility in isar. Also, Henning's point about source packages
could be handled more easily with some coding inside an apt repo
proxy/web server. So are we just saving complexity now and getting it
later in heaps, or are we gaining a simple normal case while having some
hurdles in the odd special one? I don't know yet. Please tell me!

What we have now is a solution space, from a simple solution like this
RFC patch to a possibly complex solution with an "apt caching proxy".
Maybe someone can think of a good solution in between, or some
important feature or UX concern will require a more complex approach.

Here are some ideas I have seen mentioned and my opinion on them
including some pros and cons I just came up with. This is from memory,
so please correct me if I remembered something incorrectly:

Idea 0: Store tarball of debootstrap output with filled apt cache and
use that to restore isar-bootstrap.
Critique 0: That's, in short, my 'simple solution'
    Pro: simple to implement
    Con: Debootstrap process is not done on every build.
         Archival of a binary root file system is strange.
         How to archive source packages?
           => add apt-get source to the installation process
              (see the sketch after this list of ideas)
         How to handle partial update?
           => write a script that generates an isar recipe that deploys
              those packages to the isar-apt repo.

Idea 1: Generate a repository from the cache and use that for the next
debootstrap run.
Critique 1: Similar to my 'simple solution' but adds the creation of an
additional repository to it. -> higher complexity
    Pro: debootstrap process is done on every build.
    Con: Different apt repo urls are used.
           For me that is a no-go, because that means the configuration
           is different between the initial and subsequent builds.
         How to add new packages later? (maybe like partial update?)
         How to handle multiple repos?
           => map all repos from initial run to the local one.
              And then what? => cannot be reverted, loss of information
         How to archive source packages? (same as Idea 0)
         How to handle partial update? (same as Idea 0)

Idea 2: Like idea 1 but with aptly. And then use aptly to manage
packages.
Critique 2: I am not that familiar with aptly, so please correct me.
    Pro: debootstrap process is done on every build.
         Better repo management tools.
    Con: Different apt repo urls are used.
         Need a whole mirror locally? (See Idea 3 and 4)
         Dependency on external tool.
         Possibly some roadblocks since aptly isn't really designed for
         our use case.

Idea 3: Create a whole repo mirror with aptly or similar and strip
unused packages later.
Critique 3:
    Pro: debootstrap process is done on every build.
         Better repo management tools.
    Con: Need a whole mirror locally.
           For me that is a no-go as well, it should only be downloaded
           what is necessary for a build, nothing more.
         Dependency on external tool.
         Adding new packages later is a double step: adding in aptly,
         then to isar.
         Possibly some roadblocks since aptly isn't really designed for
         our use case.

Idea 4: Create a whole repo mirror with aptly or similar and import
used package into a new repo.
Critique 4:
    Pro: debootstrap process is done on every build.
         Better repo management tools.
    Con: Different apt repo urls are used.
         Need a whole mirror locally?
           That might be unnecessary. Per the aptly documentation it
           could be possible to create a mirror with a package filter
           that only allows the used packages. Then this is similar to
           idea 2.
         Dependency on external tool.
         Possibly some roadblocks since aptly isn't really designed for
         our use case.

Idea 5: Implementing a 'caching proxy' feature in aptly.
Critique 5:
    Pro: debootstrap process is done on every build.
         Better repo management tools.
    Con: Dependency on external tool.
         Needs some implementation in aptly.

Idea 6: Implementing a caching proxy feature in isar.
Critique 6: That was my initial idea way back.
    Pro: debootstrap process is done on every build.
    Con: Needs a lot of python scripting and code maintenance in isar.
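
For the 'apt-get source' step mentioned under Idea 0, a minimal sketch
(the package name is only an example; it assumes deb-src entries in the
sources.list of the rootfs):

    # fetch the source package matching the currently indexed version,
    # without unpacking or building it
    apt-get source --download-only hello
    # the resulting .dsc/.orig/.debian tarballs could then be archived
    # next to the binary package cache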

If I missed or misrepresented an idea, please don't hesitate to correct
me or add them.

Ideas 2 to 4 are just slight variations of one another. Those are just
the different ways I could imagine using aptly for our purposes.

Because of the contra arguments 'whole local mirror' and 'different apt
repo urls are used' I would go for 0 and 5.

Idea 6 I discarded after some experimentation. Writing an async http
proxy with only std-lib python is a pain. However, writing blocking code
with thread pools might be easier.

So I think I rambled enough now... Sorry for that.

Cheers for anyone left reading to this point,
Claudius


[1] What I mean by this is that aptly operates as some kind of lazily
fetching http/ftp/rsync/... repo. Any request for an unavailable file is
served by downloading it from upstream and then storing it on the build
machine. I don't mean that aptly should necessarily be an http proxy, it
could also just be a web server.

> 
> Kind regards,
> Maxim.
> 
> On 05/25/2018 10:10 AM, Claudius Heine wrote:
> > Hi Henning,
> > 
> > On 05/24/2018 06:00 PM, Henning Schild wrote:
> > > Am Wed, 23 May 2018 08:32:03 +0200
> > > schrieb "[ext] claudius.heine.ext@siemens.com"
> > > <claudius.heine.ext@siemens.com>:
> > > 
> > > > From: Claudius Heine <ch@denx.de>
> > > > 
> > > > Hi,
> > > > 
> > > > this patchset contains a implementation of my proposed solution
> > > > for
> > > > reproducible builds.
> > > > 
> > > > I am currenlty not quite sure if that is the right approach,
> > > > but it is
> > > > the simplest I can think of currently.
> > > 
> > > I did not look at the patches yet. And because it sounds so
> > > simple my
> > > first reaction is that it can not be complete.> One thing we will
> > > need for sure is the sources that lead to the
> > > packages we built ourselfs, otherwise we can not rebuild them
> > > later on.
> > > And that seems to be a tricky part, not covered by stealing the
> > > cache.
> > 
> > You are right, this solution is not complete and Rom was not build
> > on
> > one day. My goal was to improve the situation just one small step
> > and
> > then build on top of it.
> > 
> > > Maybe stealing the DLDIR of bitbake as well?
> > > 
> > > > As already described in my proposal, this patchset does the
> > > > following:
> > > > 
> > > >    1. Takes care that the package cache in the isar-bootstrap
> > > > root file
> > > >       system contains all the packages used for this
> > > > distro/architecture. 2. A tarball is created after the package
> > > > cache
> > > > contains all the packages needed by the image.
> > > 
> > > Are you sure that "apt-get clean" is the only reason for cache
> > > eviction? What will happen if i install a ton of packages, not
> > > that apt
> > > will want to safe space at some point.
> > 
> > Yes, I might be useful to set the apt.conf to disable all
> > autocleaning
> > options.
> > But normally apt removes packages from cache only if they are no
> > longer
> > downloadable and since the local index of the upstream repos are
> > not
> > updated it shouldn't detect if they are no longer downloadable and
> > therefore not remove them. Disabling this completely is still the
> > better
> > option.
> > 
> > Claudius
> > 
> > > 
> > > Henning
> > > 
> > > >    3. This tarball can be used as the basis of subsequent
> > > > builds by
> > > >       setting a bitbake variable.
> > > > 
> > > > This is just a first draft of this feature, maybe we can
> > > > further
> > > > improve some steps and maybe there are better ideas to improve
> > > > the
> > > > usability.
> > > > 
> > > > Cheers,
> > > > Claudius
> > > > 
> > > > Claudius Heine (3):
> > > >    meta/isar-bootstrap-helper+dpkg.bbclass: bind mount
> > > >      /var/cache/apt/archives
> > > >    meta/classes/image: added isar_bootstrap_tarball task
> > > >    meta/isar-bootstrap: add 'do_restore_from_tarball' task
> > > > 
> > > >   meta/classes/dpkg.bbclass                     |  5 ++++
> > > >   meta/classes/image.bbclass                    | 10 +++++++
> > > >   meta/classes/isar-bootstrap-helper.bbclass    |  9 ++++++-
> > > >   .../isar-bootstrap/isar-bootstrap.bb          | 27
> > > > ++++++++++++++++++- 4 files changed, 49 insertions(+), 2
> > > > deletions(-)
> > > > 
> 
> 
-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

            PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153
                              Keyserver: hkp://pool.sks-keyservers.net

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-05-25 17:04         ` Claudius Heine
@ 2018-06-04 11:37           ` Baurzhan Ismagulov
  2018-06-04 16:05             ` Claudius Heine
  0 siblings, 1 reply; 33+ messages in thread
From: Baurzhan Ismagulov @ 2018-06-04 11:37 UTC (permalink / raw)
  To: isar-users

Hello Claudius,

On Fri, May 25, 2018 at 07:04:53PM +0200, Claudius Heine wrote:
> - Idea 0: Store tarball of debootstrap output with filled apt cache and use
>   that to restore isar-bootstrap.
> - Idea 1: Generate a repository from the cache and use that for the next
>   debootstrap run.
> - Idea 2: Like idea 1 but with aptly. And then use aptly to manage packages.
> - Idea 3: Create a whole repo mirror with aptly or similar and strip unused
>   packages later.
> - Idea 4: Create a whole repo mirror with aptly or similar and import used
>   package into a new repo.
> - Idea 5: Implementing a 'caching proxy' feature in aptly.
> - Idea 6: Implementing a caching proxy feature in isar.

Thanks for summarizing, this makes it easier to communicate.


Some general points first:

* I'm ok with a partial implementation that goes in the right direction.

* I'd really like to see user docs, also in RFC, because UX is a part of the
  design. It shows what use cases the change covers and how it does that.


Regarding the implementation, I think idea 1 is the right way to go. Today, we
operate with pure Debian inputs -- packages and metadata -- to build our
outputs. Debian inputs are what we should store.


> Because of the contra arguments 'whole local mirror' and 'different apt
> repo urls are used' I would got for 0 and 5.

Idea 1 is very similar to your current implementation and is achievable with
dpkg-scanpackages and debootstrapping.

I'm not proposing the whole mirror, just the packages you debootstrap +
dpkg-scanpackages.
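
Just to illustrate what I mean, a minimal sketch (paths are examples
only):

    # index a directory holding the .debs we used
    cd /path/to/local-pkgs
    dpkg-scanpackages . /dev/null | gzip -9c > Packages.gz
    # apt can use this as a flat repo, e.g.
    #   deb [trusted=yes] file:///path/to/local-pkgs ./
    # debootstrap itself expects a dists/ layout, which could be
    # generated from the same directory with apt-ftparchive or reprepro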

Our actual problem is:

1. Getting the list of packages we need.

2. Fetching and managing them locally.

Proxying is a quick approach to avoid solving the problem rather than
addressing it. Also, it wouldn't support all Debian's fetch methods.


> Critique 1: Similar to my 'simple solution' but adds the creation of an
> additional repository to it. -> higher complexity
>     Pro: debootstrap process is done on every build.
>     Con: Different apt repo urls are used.
>            For me that is a no-go, because that means the configuration
>            is different between the initial and subsequent builds.

IIUC, this is also the case with your current implementation. You build without
or with ISAR_BOOTSTRAP_TARBALL. This could be changed to building with or
without e.g. ISAR_BOOTSTRAP_SOURCE containing a complete sources.list line.


>          How to add new packages later? (maybe like partial update?)

With the tarball, you suggest deleting and starting from scratch for now. For
the first step, I'd suggest to limit the usage to that. That is possible with
idea 1, too.

In the future, we'd need some tool. FWIW, I'm currently not aware of a tool
that does both (1) and (2) above or is sufficiently suitable for that. So, I
think we should work with Debian to get introspection on debootstrap and
apt-get and work on the tool for (2). Cooperating with some project would be
nice, but isn't a requirement for me.


>          How to handle multiple repos?
>            => map all repos from initial run to the local one.

Currently, you suggest to use multiple tarballs. With idea 1, you could provide
multiple directories.

FWIW, Alex's implementation [1] did (1) and (2) in a Debian way in a single
repo, without duplication.


>               And then what? => cannot be reverted, loss of information

It doesn't have to be reverted. Maintaining that manually would be
time-consuming, but that is what people are forced to do today anyway. The
feature would ease that burden till partial mirror management is implemented.


So, I'd be happy to review patches with:

1. Package directory as output, dpkg-scanpackages, and debootstrap on every
   build.

2. User docs giving an example of initial usage, versioning, and one update
   (e.g., deleting everything and starting from scratch). I agree that we don't
   impose versioning tools on our users, but we should demonstrate one simple
   working setup, even if with binary artifacts in git.


References:

1. https://groups.google.com/forum/#!topic/isar-users/QQUsVmSaAGk


With kind regards,
Baurzhan.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Idea for implementing reproducible builds
  2018-05-23  8:22   ` Claudius Heine
  2018-05-23 11:34     ` Claudius Heine
@ 2018-06-04 11:48     ` Baurzhan Ismagulov
  1 sibling, 0 replies; 33+ messages in thread
From: Baurzhan Ismagulov @ 2018-06-04 11:48 UTC (permalink / raw)
  To: isar-users

On Wed, May 23, 2018 at 10:22:10AM +0200, Claudius Heine wrote:
> The tarfile has to be versioned outside of the repo, since there is a
> 1-to-many relationship between the source repo commit and the tarball.
> 
> For instance openssl updates would not necessarly mean a new change to the
> repo, just a new build.

This is why I'd like to see user docs to understand your intended use case.
What should be the names of the tarballs (just one simple example of how it
could work)? How would you tell the repo to use the new tarball?


> >U3.1. debian-mirror exists. Update all packages from upstream into
> >       debian-mirror.
> 
> Why is that needed?

For updating debian-mirror completely. 


> You could just delete the debian-mirror and then it is recreated with the
> current upstream anyway.

That answers my question and is ok with me for the first step.


> >U3.4. Remove packages not used in any previous commit.
> 
> I am currently not sure what you mean by that. Why would there be packages
> that aren't used in any previous commits?

Bad wording, I meant just "remove unused packages".


> I look into this and came up with some difficulties when using an
> alternative debian-mirror repo that is generated from the used packages:
> 
>     1. You need to change the apt repo urls. Yes multiple ones since we
>        support multi-repos in isar. How are we handling this? Are we
>        throwing stuff from different repos togehter? Or are we creating
>        multiple locale repos for every used repo and then set them back
>        later? Both solutions can cause (un)expected problems.
>        How are we dealing with updates from upstream then?

As answered in the other mail, you could proceed just as now, with separate
package dirs. Managing a single dir would be better. Package dirs solve those
problems as well as tarballs.


>     2. How are we installing additional packages that are currently not
>        part of the debian-mirror? If its just a different repo those
>        packages would not be part of the package index, so those
>        packages would not be available.

Which additional packages do you have in mind? If hello and friends, the
current installation way should not be changed in this step. If you mean
updating debian-mirror, for now e.g. just delete and start from scratch, as
with tarballs, till we have something that is better.


>        If its a complete mirror of the repos, then it contains many packages
>        that aren't needed.

No, not complete mirrors, just the packages we need from Debian.


With kind regards,
Baurzhan.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-04 11:37           ` Baurzhan Ismagulov
@ 2018-06-04 16:05             ` Claudius Heine
  2018-06-05 10:42               ` Claudius Heine
  0 siblings, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-06-04 16:05 UTC (permalink / raw)
  To: isar-users

Hi Baurzhan,

On 2018-06-04 13:37, Baurzhan Ismagulov wrote:
> Hello Claudius,
> 
> On Fri, May 25, 2018 at 07:04:53PM +0200, Claudius Heine wrote:
>> - Idea 0: Store tarball of debootstrap output with filled apt cache and use
>>    that to restore isar-bootstrap.
>> - Idea 1: Generate a repository from the cache and use that for the next
>>    debootstrap run.
>> - Idea 2: Like idea 1 but with aptly. And then use aptly to manage packages.
>> - Idea 3: Create a whole repo mirror with aptly or similar and strip unused
>>    packages later.
>> - Idea 4: Create a whole repo mirror with aptly or similar and import used
>>    package into a new repo.
>> - Idea 5: Implementing a 'caching proxy' feature in aptly.
>> - Idea 6: Implementing a caching proxy feature in isar.
> 
> Thanks for summarizing, this makes it easier to communicate.
> 
> 
> Some general points first:
> 
> * I'm ok with a partial implementation that goes in the right direction.
> 
> * I'd really like to see user docs, also in RFC, because UX is a part of the
>    design. It shows what use cases the change covers and how it does that.

For me the most detailed documentation for developers is in the commit 
message, cover letter, code and general discussion on the ML. From 
this, the developers that review those patches can see how they work and 
how they affect the UX. There should be enough in this to understand what 
a patch and patchset provides.
If it doesn't, then I would ask the patch creator to go into further 
detail there.

Other documentation is mostly necessary for new users or people that 
want to catch up or look up something without the need to search for the 
right commit message IMO. Requiring that for RFC patches is a bit much 
and slows down the development.

> Regarding the implementation, I think idea 1 is the right way to go. Today, we
> operate with pure Debian inputs -- packages and metadata -- to build our
> outputs. Debian inputs are what we should store.
> 
> 
>> Because of the contra arguments 'whole local mirror' and 'different apt
>> repo urls are used' I would got for 0 and 5.
> 
> Idea 1 is very similar to your current implementation and is achievable with
> dpkg-scanpackages and debootstrapping.
> 
> I'm not proposing the whole mirror, just the packages you debootstrap +
> dpkg-scanpackages.
> 
> Our actual problem is:
> 
> 1. Getting the list of packages we need.
> 
> 2. Fetching and managing them locally.
> 
> Proxying is a quick approach to avoid solving the problem rather than
> addressing it.

I wouldn't call it quick or avoiding the issue. You have to implement a 
proxy first, and that takes time and resources, and since you are 
solving reproducibility you are addressing the problem.

> Also, it wouldn't support all Debian's fetch methods.

Is supporting other fetch methods really important? I would say that 
supporting only http/https would be enough. FTP is deprecated (at least 
ftp.debian.org disabled FTP AFAIK). OK, rsync might be nice, but that's 
not available in company networks anyway. As for local repos and optical 
media, I don't see the reason for them.

Is there a fetch method you would miss particularly?

>> Critique 1: Similar to my 'simple solution' but adds the creation of an
>> additional repository to it. -> higher complexity
>>      Pro: debootstrap process is done on every build.
>>      Con: Different apt repo urls are used.
>>             For me that is a no-go, because that means the configuration
>>             is different between the initial and subsequent builds.
> 
> IIUC, this is also the case with your current implementation. You build without
> or with ISAR_BOOTSTRAP_TARBALL. This could be changed to building with or
> without e.g. ISAR_BOOTSTRAP_SOURCE containing a complete sources.list line.

There is a difference: in one case the root file system is modified, in 
the other it isn't.

In my implementation only some steps are skipped and the tarball is 
extracted instead, and that's it.

Idea 1 results in a different apt source configuration and hence a 
different apt index. Maybe different apt preferences etc. Packages are 
fetched from a different source. There are a lot more variables involved 
in this. That is what I meant by 'configuration is different': not some 
variables in bitbake but a different root file system.

>>           How to add new packages later? (maybe like partial update?)
> 
> With the tarball, you suggest deleting and starting from scratch for now.

I don't think I suggested that.
With idea 0 you can just add some upstream packages to the list; those 
need to still be available on the upstream sources, since the index will 
not be updated. If they aren't available, then you can add those packages 
to the cache. It has to be the package in the version of the current apt 
index, however, since the apt index is like a package-less snapshot of 
the whole consistent Debian system.

With idea 1 you don't really have such an index of which package versions 
belong together, so you have to trust the metadata of each package to 
specify the right version ranges.

> For
> the first step, I'd suggest to limit the usage to that. That is possible with
> idea 1, too.

With idea 1 you could add packages to the local repository like you 
would overwrite old packages on a partial update. That was the idea I 
meant here.

> 
> In the future, we'd need some tool. FWIW, I'm currently not aware of a tool
> that does both (1) and (2) above or is sufficiently suitable for that. So, I
> think we should work with Debian to get introspection on debootstrap and
> apt-get and work on the tool for (2). Cooperating with some project would be
> nice, but isn't a requirement for me.

For 1 on debootstrap, you could just:

     apt-cache depends --recurse -i apt ...

change the options and apt configuration to mirror the desired distro and 
arch, clean up the output a bit, and then you have a list.

For 2 you can (and we currently do) use apt-get install --download-only, 
or apt-get install --print-uris and fetch the files yourself.

Maybe with some grep finagling you could even get the source repository 
for this.
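
A rough sketch of those two steps (the package name is only an example,
and the exact options would need tuning):

    # 1. recursively resolve the important dependencies of a package
    apt-cache depends --recurse --important apt \
        | grep '^[a-z0-9]' | sort -u > package-list
    # 2. fetch the matching .debs without installing them
    #    (alternatively: apt-get install --download-only, or
    #    apt-get install --print-uris plus our own fetcher)
    xargs -a package-list apt-get download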

> 
> 
>>           How to handle multiple repos?
>>             => map all repos from initial run to the local one.
> 
> Currently, you suggest to use multiple tarballs.

No. Where do you get my suggestions from? Not me apparently ;)

You don't need multiple tarballs for multiple Debian repos.
That works just out of the box.

> With idea 1, you could provide
> multiple directories.

The mapping is what interests me here. For instance, you have most 
packages from Debian jessie, some packages from Debian stretch, some 
from an Ubuntu or Linux Mint repo, docker from the upstream Docker 
Debian repository and maybe some others. How are you taking care that 
there are no conflicts? That each repo you use has a 1:1 mapping to one 
repo with multiple directories?

Maybe try to create directories by hashing the source URI? Or some 
string replacements? How are you dealing with mirrors of those repos?

> FWIW, Alex's implementation [1] did (1) and (2) in a Debian way in a single
> repo, without duplication.

I didn't review those patches since I was N/A this month. Is there a 
followup in the works?

Also 'Debian way' is misleading, since we would not have this discussion 
if there was a Debian way to solve all our problems. But since there 
isn't, we have to build our own way here. We could try to minimize the 
work by using as much as possible of what is already built by the Debian 
project.

Also using bitbake instead of sbuild, debian-installer and friends is 
pretty much per design not the Debian way ;)

>>                And then what? => cannot be reverted, loss of information
> 
> It doesn't have to be reverted. Maintaining that manually would be
> time-consuming, but that is what people are forced to do today anyway. The
> feature would ease that burden till partial mirror management is implemented.

If we are going that way, maybe we should take a look at apt-move.

Maybe we should restructure the build process a bit?

1. debootstrap uses the upstream URI (if the local cache URI does not 
exist yet) to build a rootfs
2. Set the local cache pin priority >1000 in order to prefer any packages 
from there (see the preferences sketch right after this list)
3. On each recipe, image and buildchroot step, fill the local cache repo 
with upstream binary and source packages; isar-generated ones still land 
in isar-apt.
    Done with apt-move and apt-cache depends etc.
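
The pinning in step 2 could look roughly like this (the $ROOTFS path,
the file name and the origin match are placeholders; the exact pin
depends on how the local cache repo is set up):

    # prefer (and, if needed, downgrade to) whatever the local cache
    # repository provides; 'origin ""' matches local file:// repos
    cat > "$ROOTFS/etc/apt/preferences.d/isar-local-cache" <<'EOF'
    Package: *
    Pin: origin ""
    Pin-Priority: 1001
    EOF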

Maybe integrate apt-move in some additional image tasks for those other 
features like updating or adding packages.

Maybe create this mirror inside the buildchroot? This way we could avoid 
host dependencies and contamination from the start. Any other ideas how 
to handle this comfortably?

I will try to post a small graphic about this soon.

Cheers,
Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-04 16:05             ` Claudius Heine
@ 2018-06-05 10:42               ` Claudius Heine
  2018-06-06  9:17                 ` Claudius Heine
  2018-06-07  8:08                 ` Maxim Yu. Osipov
  0 siblings, 2 replies; 33+ messages in thread
From: Claudius Heine @ 2018-06-05 10:42 UTC (permalink / raw)
  To: isar-users; +Cc: Silvano Cirujano Cuesta

[-- Attachment #1: Type: text/plain, Size: 996 bytes --]

Hi,

On 2018-06-04 18:05, [ext] Claudius Heine wrote:
> I will try to post a small graphic about this soon.

I attached the design Jan and I came up with yesterday.

Hopefully it's understandable enough even with all those arrows pointing 
around. I tried to vary the arrow lines and heads to signify 
dependencies/execution order, use of repositories and deploying 
packages to the repositories.

The main point about this is that it will work with apt preferences 
repository pinning and that we will try to move all installed packages 
to the local cache after every step.

I highlighted the new components in the diagram in green and the changed 
components in light green.

If this is the way we want to go then I would try to get an RFC patchset 
started.

Cheers,
Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

[-- Attachment #2: reproducible-build-design-planuml.svg --]
[-- Type: image/svg+xml, Size: 33591 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-05 10:42               ` Claudius Heine
@ 2018-06-06  9:17                 ` Claudius Heine
  2018-06-06 14:20                   ` Claudius Heine
  2018-06-07  8:08                 ` Maxim Yu. Osipov
  1 sibling, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-06-06  9:17 UTC (permalink / raw)
  To: isar-users; +Cc: Silvano Cirujano Cuesta

Hi,

On 2018-06-05 12:42, [ext] Claudius Heine wrote:
> Hi,
> 
> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>> I will try to post a small graphic about this soon.
> 
> I attached the design Jan and me came up with yesterday.
> 
> Hopefully its understandable enough even with all those arrows pointing 
> around. I tried to vary the arrow lines and heads to signify 
> dependencies/execution order, using of repositories and deploying 
> packages to the repositories.
> 
> The main point about this is that it will work with apt preferences 
> repository pinning and that we will try to move all installed packages 
> to the local cache after every step.
> 
> I highlighted the new components in the diagram as green and the changed 
> components as light green.
> 
> If this is the way we want to go then I would try to get a RFC patchset 
> started.

Unfortunately 'apt-move' has some bugs that make usage with debootstrap 
near impossible without some patches.

The required patches to solve this issue were posted to the Debian 
mailing list 5 years ago, so I guess they will not be merged in the near 
future [1]. This means apt-move is pretty much dead.

So we have to find a different solution or implement it ourselves.

Claudius

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=662003
-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-06  9:17                 ` Claudius Heine
@ 2018-06-06 14:20                   ` Claudius Heine
  2018-06-07  8:50                     ` Baurzhan Ismagulov
  0 siblings, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-06-06 14:20 UTC (permalink / raw)
  To: isar-users; +Cc: Silvano Cirujano Cuesta

Hi,

On 2018-06-06 11:17, [ext] Claudius Heine wrote:
> Hi,
> 
> On 2018-06-05 12:42, [ext] Claudius Heine wrote:
>> Hi,
>>
>> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>>> I will try to post a small graphic about this soon.
>>
>> I attached the design Jan and me came up with yesterday.
>>
>> Hopefully its understandable enough even with all those arrows 
>> pointing around. I tried to vary the arrow lines and heads to signify 
>> dependencies/execution order, using of repositories and deploying 
>> packages to the repositories.
>>
>> The main point about this is that it will work with apt preferences 
>> repository pinning and that we will try to move all installed packages 
>> to the local cache after every step.
>>
>> I highlighted the new components in the diagram as green and the 
>> changed components as light green.
>>
>> If this is the way we want to go then I would try to get a RFC 
>> patchset started.
> 
> Unfortunately 'apt-move' has some bugs that make usage with debootstrap 
> near impossible without some patches.
> 
> The required patches to solve this issue were posted to the debian 
> mailinglist 5 years ago, so I guess that will not be merged in the near 
> future [1]. This means apt-move is pretty much dead.
> 
> So we have to find a different solution or need to implement it ourselves.

I did some further research into the different solutions for creating 
apt repositories:

  0. apt-move
     - Would fit our usage very well, but this is effectively dead and
       cannot be used, because of the unfixed bugs and some missing
       features.
  1. reprepro
     - We are using it in isar already, but development there seems to be
       a bit stale [1].
     - Fetching the corresponding source packages would need to be
       implemented by us.
  2. apt-ftparchive or dpkg-scanpackages
     - Can be used to generate Packages and Release files, but does not
       care about creating the pool directory structure
     - We could implement the missing functionality as well as fetching
       the corresponding source packages ourselves.
  3. mini_dinstall
     - Implementation of a repository management daemon supporting the
       official maintenance cycle.
     - Submitting packages to be added to the repository is done with
       .changes files that are generated by building source packages.
       We would have to generate those files ourselves. But I think
       that building around this might get awkward.
  4. aptly
     - Implemented in Go. I am a bit reluctant to use Go projects,
       because my experiences have shown that Go applications have the
       tendency to not care much about reproducibility and stable
       maintenance. (Importing code directly from github urls without
       even enforcing commitids???)
  5. pulp
     - Focused on RPM and Puppet, has DEB support with the Note:
       "WARNING: There may be bugs."

I looked at other products as well, but most have been unmaintained for a 
long time or don't solve our issues (mrepo, local-apt-repository).

If we don't find a solution that requires no implementation work on our 
side, then I would prefer implementing the missing functionality in 
Python instead of shell, in the hope that it becomes easier to maintain.

Claudius

[1] https://tracker.debian.org/pkg/reprepro


-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-05 10:42               ` Claudius Heine
  2018-06-06  9:17                 ` Claudius Heine
@ 2018-06-07  8:08                 ` Maxim Yu. Osipov
  2018-06-11  8:45                   ` Claudius Heine
  1 sibling, 1 reply; 33+ messages in thread
From: Maxim Yu. Osipov @ 2018-06-07  8:08 UTC (permalink / raw)
  To: Claudius Heine, isar-users; +Cc: Silvano Cirujano Cuesta

Hi Claudius,

As far as I understood, 'apt-move' doesn't fit your requirements.

Nevertheless, a diagram illustrating the approach you propose would be 
very helpful.

As for the diagram you attached to the email below, honestly, it's rather 
difficult to understand - too many arrows :(. It would be worth 
additionally describing the steps in text (and putting corresponding 
numbers on the picture).

Kind regards,
Maxim.

On 06/05/2018 12:42 PM, Claudius Heine wrote:
> Hi,
> 
> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>> I will try to post a small graphic about this soon.
> 
> I attached the design Jan and me came up with yesterday.
> 
> Hopefully its understandable enough even with all those arrows pointing 
> around. I tried to vary the arrow lines and heads to signify 
> dependencies/execution order, using of repositories and deploying 
> packages to the repositories.
> 
> The main point about this is that it will work with apt preferences 
> repository pinning and that we will try to move all installed packages 
> to the local cache after every step.
> 
> I highlighted the new components in the diagram as green and the changed 
> components as light green.
> 
> If this is the way we want to go then I would try to get a RFC patchset 
> started.
> 
> Cheers,
> Claudius
> 


-- 
Maxim Osipov
ilbers GmbH
Maria-Merian-Str. 8
85521 Ottobrunn
Germany
+49 (151) 6517 6917
mosipov@ilbers.de
http://ilbers.de/
Commercial register Munich, HRB 214197
General Manager: Baurzhan Ismagulov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-06 14:20                   ` Claudius Heine
@ 2018-06-07  8:50                     ` Baurzhan Ismagulov
  0 siblings, 0 replies; 33+ messages in thread
From: Baurzhan Ismagulov @ 2018-06-07  8:50 UTC (permalink / raw)
  To: isar-users

Hello Claudius,

On Wed, Jun 06, 2018 at 04:20:56PM +0200, Claudius Heine wrote:
>  0. apt-move
>  1. reprepro
>  2. apt-ftparchive or dpkg-scanpackages
>  3. mini_dinstall
>  4. aptly
>  5. pulp
> 
> I looked at other products as well, but most are just unmaintained for a
> long time or doesn't solve our issues (mrepo, local-apt-repository).
> 
> If we don't find a solution that does not require any implementation
> ourselves, then I would prefer implementing the missing functionality in
> python instead of shell, with the hope that it becomes easier to maintain.

Thanks for looking at the tools and summarizing, that's very valuable.

Regarding the tool, I'd suggest to continue using reprepro. It does its job
well. If it works, don't fix it.
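
Just as an illustration, a minimal reprepro setup could look like this
(codename, architectures and paths are only examples):

    mkdir -p repo/conf
    cat > repo/conf/distributions <<'EOF'
    Codename: isar-local
    Architectures: amd64 armhf
    Components: main
    EOF
    # add the downloaded upstream .debs to the local repository
    reprepro -b repo includedeb isar-local /path/to/*.deb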

Regarding the diagram, I'd also like to see some step numbers and a brief
description of those.

With kind regards,
Baurzhan.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-07  8:08                 ` Maxim Yu. Osipov
@ 2018-06-11  8:45                   ` Claudius Heine
  2018-06-11 13:51                     ` Claudius Heine
  0 siblings, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-06-11  8:45 UTC (permalink / raw)
  To: Maxim Yu. Osipov, isar-users; +Cc: Silvano Cirujano Cuesta

Hi Maxim,

On 2018-06-07 10:08, Maxim Yu. Osipov wrote:
> Hi Claudius,
> 
> As far as I understood, 'apt-move' doesn't fit your requirements.

The documented functionality of apt-move would fit the requirements, but 
since it's no longer maintained and has bugs that make it incompatible 
with debootstrap, it cannot be used.

> Nevertheless, diagram illustrating the approach you propose would be be 
> very helpful.
> 
> As for diagram you attached to email below, honestly, it's rather 
> difficult to understand it - too many arrows :(. It's worth to 
> additionally describe the steps in text (and put corresponding numbers 
> on the picture).

Yes, I know it's a bit overwhelming with all those arrows. It was much 
worse before, and this is the result of much simplification.

But either way, since we cannot use apt-move I have to investigate what 
is possible with reprepro and change the diagram accordingly.

Claudius

> 
> Kind regards,
> Maxim.
> 
> On 06/05/2018 12:42 PM, Claudius Heine wrote:
>> Hi,
>>
>> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>>> I will try to post a small graphic about this soon.
>>
>> I attached the design Jan and me came up with yesterday.
>>
>> Hopefully its understandable enough even with all those arrows 
>> pointing around. I tried to vary the arrow lines and heads to signify 
>> dependencies/execution order, using of repositories and deploying 
>> packages to the repositories.
>>
>> The main point about this is that it will work with apt preferences 
>> repository pinning and that we will try to move all installed packages 
>> to the local cache after every step.
>>
>> I highlighted the new components in the diagram as green and the 
>> changed components as light green.
>>
>> If this is the way we want to go then I would try to get a RFC 
>> patchset started.
>>
>> Cheers,
>> Claudius
>>
> 
> 

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-11  8:45                   ` Claudius Heine
@ 2018-06-11 13:51                     ` Claudius Heine
  2018-06-14  8:50                       ` Claudius Heine
  0 siblings, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-06-11 13:51 UTC (permalink / raw)
  To: Maxim Yu. Osipov, isar-users; +Cc: Silvano Cirujano Cuesta

[-- Attachment #1: Type: text/plain, Size: 4575 bytes --]

Hi,

On 2018-06-11 10:45, [ext] Claudius Heine wrote:
> Hi Maxim,
> 
> On 2018-06-07 10:08, Maxim Yu. Osipov wrote:
>> Hi Claudius,
>>
>> As far as I understood, 'apt-move' doesn't fit your requirements.
> 
> The documented functionality of apt-move would fit the requirements, but 
> since its no longer maintained and has bugs that makes it incompatible 
> with debootstrap, it cannot be used.
> 
>> Nevertheless, diagram illustrating the approach you propose would be 
>> be very helpful.
>>
>> As for diagram you attached to email below, honestly, it's rather 
>> difficult to understand it - too many arrows :(. It's worth to 
>> additionally describe the steps in text (and put corresponding numbers 
>> on the picture).
> 
> Yes I know its a bit overwhelming with all those arrows. It was much 
> worse before and this is the result of much simplifications.
> 
> But either way, since we cannot use apt-move I have to investigate what 
> is possible with reprepro and change the diagram accordingly.

reprepro does provide many nice features that might be interesting to 
use, for instance the 'gensnapshot' command. This caused me to further 
redesign this approach, as the attached diagram shows.

I know this kind of diagram does not follow the UML meaning 100%, but I 
could not find a diagram type that really fits what I want to show here.

This time I try to explain a bit more about what those arrows mean and 
how to read this diagram.

Dashed lines symbolize dependencies between steps or 'packages' (in this 
case I mean 'isar-bootstrap', 'buildchroot', 'recipes' and 'image'). So 
if they are followed upwards from the 'debootstrap download' in the 
'isar-bootstrap' package, then this is the execution order. When the 
'package import' component is reached, the next 'package' in the 
dependency graph is started. This is 'buildchroot' or 'image'. Normally 
that is the buildchroot; only if there are no recipes around that need a 
buildchroot could it be skipped. That means the last step to be executed 
is the 'finished' one.

The arrows going out from those database symbols mean that packages are 
used from there. Those arrows have the same arrow head as the dependency 
arrows. In this diagram I made the lines of the arrows that are going 
out from 'local partial upstream mirror' a bit thicker to better 
differentiate them from the other ones.

The arrows that are going to the 'to local mirror', 'to isar repository' 
or 'create local snapshot' nodes have a slightly different arrow head. 
These arrows mean that there are Debian packages being added to repos.

The basic idea is that after every step that installs or upgrades 
packages in the rootfs, those packages are added to the local partial 
mirror. This can be done by first doing a download-only step, adding the 
packages to the repo, and then doing an install from the repo or the 
cache.

I also changed how the repo is built. The current idea is to have one 
repository with 2 components: one component for all upstream packages 
and one for all isar-built packages. When the snapshot is generated in 
the end, it will contain both components.
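
With reprepro, that final snapshot could be as simple as (names are
placeholders):

    # freeze the current state of the distribution under a snapshot name
    reprepro -b repo gensnapshot isar-local build-20180611
    # the result shows up under dists/isar-local/snapshots/build-20180611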

Maybe we should start adding the build time to all files in the deploy 
directory. This way we could add this to the name of the snapshot as 
well, so the association between those is made clear.

What do you think?

best regards,
Claudius

> 
> Claudius
> 
>>
>> Kind regards,
>> Maxim.
>>
>> On 06/05/2018 12:42 PM, Claudius Heine wrote:
>>> Hi,
>>>
>>> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>>>> I will try to post a small graphic about this soon.
>>>
>>> I attached the design Jan and me came up with yesterday.
>>>
>>> Hopefully its understandable enough even with all those arrows 
>>> pointing around. I tried to vary the arrow lines and heads to signify 
>>> dependencies/execution order, using of repositories and deploying 
>>> packages to the repositories.
>>>
>>> The main point about this is that it will work with apt preferences 
>>> repository pinning and that we will try to move all installed 
>>> packages to the local cache after every step.
>>>
>>> I highlighted the new components in the diagram as green and the 
>>> changed components as light green.
>>>
>>> If this is the way we want to go then I would try to get a RFC 
>>> patchset started.
>>>
>>> Cheers,
>>> Claudius
>>>
>>
>>
> 

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

[-- Attachment #2: reproducible-build-design-planuml.svg --]
[-- Type: image/svg+xml, Size: 46994 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-11 13:51                     ` Claudius Heine
@ 2018-06-14  8:50                       ` Claudius Heine
  2018-06-20  4:20                         ` Maxim Yu. Osipov
  0 siblings, 1 reply; 33+ messages in thread
From: Claudius Heine @ 2018-06-14  8:50 UTC (permalink / raw)
  To: Maxim Yu. Osipov, isar-users; +Cc: Silvano Cirujano Cuesta

Hi,

On 2018-06-11 15:51, [ext] Claudius Heine wrote:
> Hi,
> 
> On 2018-06-11 10:45, [ext] Claudius Heine wrote:
>> Hi Maxim,
>>
>> On 2018-06-07 10:08, Maxim Yu. Osipov wrote:
>>> Hi Claudius,
>>>
>>> As far as I understood, 'apt-move' doesn't fit your requirements.
>>
>> The documented functionality of apt-move would fit the requirements, 
>> but since its no longer maintained and has bugs that makes it 
>> incompatible with debootstrap, it cannot be used.
>>
>>> Nevertheless, diagram illustrating the approach you propose would be 
>>> be very helpful.
>>>
>>> As for diagram you attached to email below, honestly, it's rather 
>>> difficult to understand it - too many arrows :(. It's worth to 
>>> additionally describe the steps in text (and put corresponding 
>>> numbers on the picture).
>>
>> Yes I know its a bit overwhelming with all those arrows. It was much 
>> worse before and this is the result of much simplifications.
>>
>> But either way, since we cannot use apt-move I have to investigate 
>> what is possible with reprepro and change the diagram accordingly.
> 
> reprepro does provide many nice features that might be interesting to 
> use. For instance the 'gensnapshot' command. This caused me to further 
> redesign this approach as the diagram attached shows.
> 
> I know this kind of diagram does not 100% the UML meaning, but I could 
> not find a diagram type that really fits to what I want to show here.
> 
> This time I try to explain a bit more about what those arrows mean and 
> how to read this diagram.
> 
> Dashed lines symbolizes dependencies between steps or 'packages' (in 
> this case I mean those 'isar-bootstrap', 'buildchroot', 'recipes' and 
> 'image'. So if they are followed from the 'debootstrap download' in the 
> 'isar-bootstrap' package upwards, then this is the execution order. When 
> the 'package import' component is reached the next 'package' in the 
> dependency graph are started. This is 'buildchroot' or 'image'. Normally 
> that is the buildchroot. Only if there are no recieps around that need a 
> buildchroot it could be skipped. That means the last step to be executed 
> is the 'finished' one.
> 
> The arrows going out from those database things, mean that packages are 
> used from their. Those arrows have the same arrow head as the dependency 
> arrows. In this diagram I made the lines of the arrows that are going 
> out from 'local partial upstream mirror' a bit thicker to better 
> differentiate them from the other ones.
> 
> The arrows that are going to 'to local mirror', 'to isar repository or 
> 'create local snapshot' nodes have a slightly different arrow head. 
> These arrows mean that there are debian packages that are added to repos.
> 
> The basic idea it that after every step that installs or upgrades 
> packages to the rootfs those packages are added to the local partial 
> mirror. This can be done by first doing a download-only step adding it 
> to the repo and then doing an install from the repo or the cache.
> 
> I also changed how the repo is build. The current idea it to have one 
> repository with 2 components. One component for all upstream packages 
> and one for all isar built packages. When the snapshot is generated in 
> the end it will create one containing both components.
> 
> Maybe we should start adding the build time to all files in the deploy 
> directory. This way we could add this to the name of the snapshot as 
> well, so the association between those is made clear.
> 
> What do you think?

This solution has one issue I can currently think of. It doesn't solve this:

 >>> U3.4. Remove packages not used in any previous commit.
 >>
 >> I am currently not sure what you mean by that. Why would there be 
packages
 >> that aren't used in any previous commits?
 >
 > Bad wording, I meant just "remove unused packages".

We could solve that by always creating a fresh repository and adding the 
repository from the old build as the primary source for the current one.

However, this is getting even more complex and I might need more 
arrows... :/

It would be simpler if we didn't need to combine this with updatability 
of selected packages and automated fetching of new packages.

Cheers,
Claudius

> 
> best regards,
> Claudius
> 
>>
>> Claudius
>>
>>>
>>> Kind regards,
>>> Maxim.
>>>
>>> On 06/05/2018 12:42 PM, Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>>>>> I will try to post a small graphic about this soon.
>>>>
>>>> I attached the design Jan and I came up with yesterday.
>>>>
>>>> Hopefully it's understandable enough even with all those arrows
>>>> pointing around. I tried to vary the arrow lines and heads to
>>>> signify dependencies/execution order, use of repositories and
>>>> deploying packages to the repositories.
>>>>
>>>> The main point about this is that it will work with apt preferences 
>>>> repository pinning and that we will try to move all installed 
>>>> packages to the local cache after every step.
>>>>
>>>> I highlighted the new components in the diagram as green and the 
>>>> changed components as light green.
>>>>
>>>> If this is the way we want to go then I would try to get a RFC 
>>>> patchset started.
>>>>
>>>> Cheers,
>>>> Claudius
>>>>
>>>
>>>
>>
> 

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-14  8:50                       ` Claudius Heine
@ 2018-06-20  4:20                         ` Maxim Yu. Osipov
  2018-06-20  8:12                           ` Claudius Heine
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Yu. Osipov @ 2018-06-20  4:20 UTC (permalink / raw)
  To: Claudius Heine, isar-users; +Cc: Silvano Cirujano Cuesta

Hi Claudius,

On 06/14/2018 10:50 AM, Claudius Heine wrote:
> Hi,
> 
> On 2018-06-11 15:51, [ext] Claudius Heine wrote:
>> Hi,
>>
>> On 2018-06-11 10:45, [ext] Claudius Heine wrote:
>>> Hi Maxim,
>>>
>>> On 2018-06-07 10:08, Maxim Yu. Osipov wrote:
>>>> Hi Claudius,
>>>>
>>>> As far as I understood, 'apt-move' doesn't fit your requirements.
>>>
>>> The documented functionality of apt-move would fit the requirements,
>>> but since it's no longer maintained and has bugs that make it
>>> incompatible with debootstrap, it cannot be used.
>>>
>>>> Nevertheless, a diagram illustrating the approach you propose would
>>>> be very helpful.
>>>>
>>>> As for the diagram you attached to the email below, honestly, it's
>>>> rather difficult to understand - too many arrows :(. It would be
>>>> worth additionally describing the steps in text (and putting
>>>> corresponding numbers on the picture).
>>>
>>> Yes, I know it's a bit overwhelming with all those arrows. It was
>>> much worse before, and this is the result of many simplifications.
>>>
>>> But either way, since we cannot use apt-move I have to investigate 
>>> what is possible with reprepro and change the diagram accordingly.
>>
>> reprepro does provide many nice features that might be interesting to 
>> use. For instance the 'gensnapshot' command. This caused me to further 
>> redesign this approach as the diagram attached shows.
>>
>> I know this kind of diagram does not follow UML semantics 100%, but
>> I could not find a diagram type that really fits what I want to show
>> here.
>>
>> This time I'll try to explain a bit more about what those arrows mean
>> and how to read this diagram.
>>
>> Dashed lines symbolize dependencies between steps or 'packages' (in
>> this case I mean 'isar-bootstrap', 'buildchroot', 'recipes' and
>> 'image'). If they are followed from the 'debootstrap download' in the
>> 'isar-bootstrap' package upwards, this is the execution order. When
>> the 'package import' component is reached, the next 'package' in the
>> dependency graph is started. This is 'buildchroot' or 'image'.
>> Normally that is the buildchroot; only if there are no recipes around
>> that need a buildchroot could it be skipped. That means the last step
>> to be executed is the 'finished' one.
>>
>> The arrows going out from those database symbols mean that packages
>> are used from there. Those arrows have the same arrow head as the
>> dependency arrows. In this diagram I made the lines of the arrows
>> going out from 'local partial upstream mirror' a bit thicker to
>> better differentiate them from the other ones.
>>
>> The arrows that go to the 'to local mirror', 'to isar repository' or
>> 'create local snapshot' nodes have a slightly different arrow head.
>> These arrows mean that Debian packages are added to the repos.
>>
>> The basic idea is that after every step that installs or upgrades
>> packages in the rootfs, those packages are added to the local partial
>> mirror. This can be done by first doing a download-only step, adding
>> the downloaded packages to the repo, and then doing an install from
>> the repo or the cache.
>>
>> I also changed how the repo is built. The current idea is to have one
>> repository with 2 components: one for all upstream packages and one
>> for all isar-built packages. When the snapshot is generated at the
>> end, it will contain both components.
>>
>> Maybe we should start adding the build time to all files in the deploy 
>> directory. This way we could add this to the name of the snapshot as 
>> well, so the association between those is made clear.
>>
>> What do you think?
> 
> This solution has one issue I can currently think of. It doesn't solve 
> this:
> 
>  >>> U3.4. Remove packages not used in any previous commit.
>  >>
>  >> I am currently not sure what you mean by that. Why would there be 
> packages
>  >> that aren't used in any previous commits?
>  >
>  > Bad wording, I meant just "remove unused packages".
> 
> We could solve that by always creating a fresh repository and adding
> the repository from the old build as the primary source for the
> current one.
> 
> However, this is getting even more complex and I might need more 
> arrows... :/
> 
> It would be simple if we didn't need to combine this with updatability 
> of selected packages and automated fetching of new packages.

I agree with you - we may drop this use case, as it is a kind of
overkill which makes the design more complex.

I need some clarification on your diagram regarding the box 'apt
upgrade'. The comment on 'apt upgrade' states "prefers local mirror
over upstream".

Do I understand correctly that as soon as we have created our local
partial mirror, it will only be updated as a side effect of installing
build dependencies (buildchroot or package), or if a package listed in
the image's IMAGE_PREINSTALL is not in the local repo?

So we don't call 'apt upgrade' over upstream apt repos anymore, right?

It would be nice, for understanding how local mirroring works, if you
described the case where we need a package which is not present in our
local mirror (just imagine that we stick to our local mirror for a long
time, then add a new package to IMAGE_PREINSTALL which is not in our
local mirror, and this package depends on updated versions of upstream
packages).

Kind regards,
Maxim.

> Cheers,
> Claudius
> 
>>
>> best regards,
>> Claudius
>>
>>>
>>> Claudius
>>>
>>>>
>>>> Kind regards,
>>>> Maxim.
>>>>
>>>> On 06/05/2018 12:42 PM, Claudius Heine wrote:
>>>>> Hi,
>>>>>
>>>>> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>>>>>> I will try to post a small graphic about this soon.
>>>>>
>>>>> I attached the design Jan and I came up with yesterday.
>>>>>
>>>>> Hopefully it's understandable enough even with all those arrows
>>>>> pointing around. I tried to vary the arrow lines and heads to
>>>>> signify dependencies/execution order, use of repositories and
>>>>> deploying packages to the repositories.
>>>>>
>>>>> The main point about this is that it will work with apt preferences 
>>>>> repository pinning and that we will try to move all installed 
>>>>> packages to the local cache after every step.
>>>>>
>>>>> I highlighted the new components in the diagram as green and the 
>>>>> changed components as light green.
>>>>>
>>>>> If this is the way we want to go then I would try to get a RFC 
>>>>> patchset started.
>>>>>
>>>>> Cheers,
>>>>> Claudius
>>>>>
>>>>
>>>>
>>>
>>
> 


-- 
Maxim Osipov
ilbers GmbH
Maria-Merian-Str. 8
85521 Ottobrunn
Germany
+49 (151) 6517 6917
mosipov@ilbers.de
http://ilbers.de/
Commercial register Munich, HRB 214197
General Manager: Baurzhan Ismagulov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/3] Reproducible build
  2018-06-20  4:20                         ` Maxim Yu. Osipov
@ 2018-06-20  8:12                           ` Claudius Heine
  0 siblings, 0 replies; 33+ messages in thread
From: Claudius Heine @ 2018-06-20  8:12 UTC (permalink / raw)
  To: Maxim Yu. Osipov, isar-users; +Cc: Silvano Cirujano Cuesta

Hi Maxim,

On 2018-06-20 06:20, Maxim Yu. Osipov wrote:
> Hi Claudius,
> 
> On 06/14/2018 10:50 AM, Claudius Heine wrote:
>> Hi,
>>
>> On 2018-06-11 15:51, [ext] Claudius Heine wrote:
>>> Hi,
>>>
>>> On 2018-06-11 10:45, [ext] Claudius Heine wrote:
>>>> Hi Maxim,
>>>>
>>>> On 2018-06-07 10:08, Maxim Yu. Osipov wrote:
>>>>> Hi Claudius,
>>>>>
>>>>> As far as I understood, 'apt-move' doesn't fit your requirements.
>>>>
>>>> The documented functionality of apt-move would fit the requirements,
>>>> but since it's no longer maintained and has bugs that make it
>>>> incompatible with debootstrap, it cannot be used.
>>>>
>>>>> Nevertheless, a diagram illustrating the approach you propose
>>>>> would be very helpful.
>>>>>
>>>>> As for the diagram you attached to the email below, honestly, it's
>>>>> rather difficult to understand - too many arrows :(. It would be
>>>>> worth additionally describing the steps in text (and putting
>>>>> corresponding numbers on the picture).
>>>>
>>>> Yes, I know it's a bit overwhelming with all those arrows. It was
>>>> much worse before, and this is the result of many simplifications.
>>>>
>>>> But either way, since we cannot use apt-move I have to investigate 
>>>> what is possible with reprepro and change the diagram accordingly.
>>>
>>> reprepro does provide many nice features that might be interesting to 
>>> use. For instance the 'gensnapshot' command. This caused me to 
>>> further redesign this approach as the diagram attached shows.
>>>
>>> I know this kind of diagram does not follow UML semantics 100%, but
>>> I could not find a diagram type that really fits what I want to show
>>> here.
>>>
>>> This time I'll try to explain a bit more about what those arrows
>>> mean and how to read this diagram.
>>>
>>> Dashed lines symbolize dependencies between steps or 'packages' (in
>>> this case I mean 'isar-bootstrap', 'buildchroot', 'recipes' and
>>> 'image'). If they are followed from the 'debootstrap download' in
>>> the 'isar-bootstrap' package upwards, this is the execution order.
>>> When the 'package import' component is reached, the next 'package'
>>> in the dependency graph is started. This is 'buildchroot' or
>>> 'image'. Normally that is the buildchroot; only if there are no
>>> recipes around that need a buildchroot could it be skipped. That
>>> means the last step to be executed is the 'finished' one.
>>>
>>> The arrows going out from those database symbols mean that packages
>>> are used from there. Those arrows have the same arrow head as the
>>> dependency arrows. In this diagram I made the lines of the arrows
>>> going out from 'local partial upstream mirror' a bit thicker to
>>> better differentiate them from the other ones.
>>>
>>> The arrows that go to the 'to local mirror', 'to isar repository'
>>> or 'create local snapshot' nodes have a slightly different arrow
>>> head. These arrows mean that Debian packages are added to the repos.
>>>
>>> The basic idea is that after every step that installs or upgrades
>>> packages in the rootfs, those packages are added to the local
>>> partial mirror. This can be done by first doing a download-only
>>> step, adding the downloaded packages to the repo, and then doing an
>>> install from the repo or the cache.
>>>
>>> I also changed how the repo is built. The current idea is to have
>>> one repository with 2 components: one for all upstream packages and
>>> one for all isar-built packages. When the snapshot is generated at
>>> the end, it will contain both components.
>>>
>>> Maybe we should start adding the build time to all files in the 
>>> deploy directory. This way we could add this to the name of the 
>>> snapshot as well, so the association between those is made clear.
>>>
>>> What do you think?
>>
>> This solution has one issue I can currently think of. It doesn't solve 
>> this:
>>
>>  >>> U3.4. Remove packages not used in any previous commit.
>>  >>
>>  >> I am currently not sure what you mean by that. Why would there be 
>> packages
>>  >> that aren't used in any previous commits?
>>  >
>>  > Bad wording, I meant just "remove unused packages".
>>
>> We could solve that by always creating a fresh repository and adding
>> the repository from the old build as the primary source for the
>> current one.
>>
>> However, this is getting even more complex and I might need more 
>> arrows... :/
>>
>> It would be simple if we didn't need to combine this with updatability 
>> of selected packages and automated fetching of new packages.
> 
> I agree with you - we may drop this use case, as it is a kind of
> overkill which makes the design more complex.
> 
> I need some clarification on your diagram regarding the box 'apt
> upgrade'. The comment on 'apt upgrade' states "prefers local mirror
> over upstream".
> 
> Do I understand correctly that as soon as we have created our local
> partial mirror, it will only be updated as a side effect of installing
> build dependencies (buildchroot or package), or if a package listed in
> the image's IMAGE_PREINSTALL is not in the local repo?
> 
> So we don't call 'apt upgrade' over upstream apt repos anymore, right?

The basic idea is that the local partial mirror is given a priority of
1001 via apt preferences. That way, an unrestricted `apt update` and
`apt upgrade` can be called at any time during the build. `apt upgrade`
would prefer installing any package from the local partial mirror over
any upstream distribution, even if upstream has a more recent version
of that package.
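
Just to illustrate (how the local repo is identified - by origin,
codename or label - is not decided yet, so the name here is only a
placeholder), the pin could roughly look like this in apt preferences:

   Package: *
   Pin: release n=isar-local-mirror
   Pin-Priority: 1001

Since the priority is above 1000, apt picks the version from the local
partial mirror even if upstream carries a newer one (it would even
downgrade), which is exactly the "prefers local mirror over upstream"
behaviour from the diagram.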

Only packages that are not available in the local partial mirror would
be fetched from upstream, since apt has to get them from somewhere.
Afterwards, though, such a new package is added to the local partial
mirror, so any later build would fetch the same version of the package
from the local partial mirror instead of from upstream.
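
The mechanics could roughly look like this (only a sketch of the idea,
ignoring the chroot/bind-mount handling; repo path, codename and
package names are placeholders):

   # fetch the needed packages into the apt cache without installing
   apt-get install --download-only <packages>

   # import the downloaded .debs into the local partial mirror
   for deb in /var/cache/apt/archives/*.deb ; do
       reprepro -b /build/local-mirror includedeb isar-local "$deb"
   done

   # install; with the pinning above this resolves against the local
   # partial mirror / the cache
   apt-get install <packages>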

> It would be nice, for understanding how local mirroring works, if you
> described the case where we need a package which is not present in our
> local mirror (just imagine that we stick to our local mirror for a
> long time, then add a new package to IMAGE_PREINSTALL which is not in
> our local mirror, and this package depends on updated versions of
> upstream packages).

Ok, I'll try to describe some scenarios that might help to understand
what I imagine:

1. Adding a package to an old build

   - Add the package to the IMAGE_PREINSTALL list
   - Start build
     - Everything until the image recipe runs the same, since all
       packages that are used by isar-bootstrap and buildchroot are the
       same and can be fetched from the local partial mirror.
       The only difference is that the image contains an updated index
       of the upstream repositories.
     - In the image recipe a new package is installed that is not
       available in the local partial mirror. That package is then
       fetched from the upstream mirror and installed.
       - That could cause a conflict with the old versions of the rest
         of the system and hopefully lead to an error, requiring a
         manual step: adding an old but compatible version of the
         package to the local partial mirror by hand.
         I don't think this kind of situation can reasonably be
         prevented or automated by isar.
     - The newly installed package is added to the local partial
       mirror, making it available for subsequent builds.

2. Removing a package from an old build

   This requires some changes to the current system. A new 'local partial
   mirror repo' needs to be created at every build and filled with the
   packages needed for that specific build. This way no unused packages
   find their way into the repo.
   The old local partial mirror repo will be used as an immutable package
   repository with the 1001 priority.
   This approach borrows from concepts like copy-on-write and functional
   programming.

   - Remove the package from the IMAGE_PREINSTALL list
   - Start build
     - Everything is done like in the previous build, while using the
       local partial mirror repository from the previous build as a base.
       The removed package is not installed into the image or added to
       the fresh local partial mirror repo.

3. Updating a package from an old build

    The goal here is to get an updated version of a package into the
    image and, by extension, into the local partial mirror repo.
    How to solve this depends on where the package originates.

    If it's a specific version the developer wants, it might be better
    to download that package and add it to isar via a recipe. Since the
    apt-isar repo, containing the packages 'built' by isar, should have
    a higher priority than the local partial mirror repository, that
    specific version would be preferred over any version from upstream
    or the local partial mirror repo.
    To support this, we might need to implement some helper scripts
    outside of the bitbake build, similar to the `bitbake-layers` script.

    If the developer generally wants to use *some* version of the package
    that is currently used by upstream, pinning rules could be used for
    one run and then disabled again, just to make sure that those
    packages are now in the local partial mirror repo.

    The other solution for both of those is to modify the local partial
    mirror outside of the bitbake build. This is the way I showed in the
    diagram as well. I am a bit hesitant to do that though, since it
    modifies something that is used for reproducible builds outside of
    the build.

    Maybe if we have the changes I described in point 2, we could just
    add those packages to the newly created local partial mirror cache,
    but then this needs to be part of the build as well. This would
    result in these priorities:

      1. local partial mirror repo from previous build: 1001
      2. local partial mirror repo from current build:  1002
      3. isar apt repo for current isar built packages: 1003

    So what I meant was adding the updated packages to repo 2 before the
    build is started. And I am hesitant about adding packages to repo 1,
    because that modifies stuff from a previous build. (See the rough
    apt preferences sketch below, at the end of these scenarios, for how
    those priorities could look.)

    If updating certain packages from upstream should be done within the
    build, then it could be done between setting the sources.list entry
    and the apt preferences. This way those packages will be fetched
    from upstream if they are more up-to-date there and then added to
    the local partial mirror cache.

    In any case, though, this could lead to a 'local partial mirror
    repo' that uses a mix of many different upstream repo snapshots,
    where it can be obscure as to *why* specific versions are used.
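
Just to make the priorities above concrete, the apt preferences could
roughly look like this (again, how the repos are identified - by
codename, origin or label - is still open, so these names are only
placeholders):

   Package: *
   Pin: release n=isar-local-mirror-prev
   Pin-Priority: 1001

   Package: *
   Pin: release n=isar-local-mirror-cur
   Pin-Priority: 1002

   Package: *
   Pin: release n=isar-apt
   Pin-Priority: 1003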

The first 2 points behave similarly when adding/removing build or runtime
dependencies; then they might just apply to the buildchroot as well.

regards,
Claudius

> 
> Kind regards,
> Maxim.
> 
>> Cheers,
>> Claudius
>>
>>>
>>> best regards,
>>> Claudius
>>>
>>>>
>>>> Claudius
>>>>
>>>>>
>>>>> Kind regards,
>>>>> Maxim.
>>>>>
>>>>> On 06/05/2018 12:42 PM, Claudius Heine wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2018-06-04 18:05, [ext] Claudius Heine wrote:
>>>>>>> I will try to post a small graphic about this soon.
>>>>>>
>>>>>> I attached the design Jan and I came up with yesterday.
>>>>>>
>>>>>> Hopefully it's understandable enough even with all those arrows
>>>>>> pointing around. I tried to vary the arrow lines and heads to
>>>>>> signify dependencies/execution order, use of repositories and
>>>>>> deploying packages to the repositories.
>>>>>>
>>>>>> The main point about this is that it will work with apt 
>>>>>> preferences repository pinning and that we will try to move all 
>>>>>> installed packages to the local cache after every step.
>>>>>>
>>>>>> I highlighted the new components in the diagram as green and the 
>>>>>> changed components as light green.
>>>>>>
>>>>>> If this is the way we want to go then I would try to get a RFC 
>>>>>> patchset started.
>>>>>>
>>>>>> Cheers,
>>>>>> Claudius
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 
> 

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2018-06-20  8:12 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22 11:55 Idea for implementing reproducible builds Claudius Heine
2018-05-22 13:47 ` Andreas Reichel
2018-05-22 14:24   ` Claudius Heine
2018-05-22 22:32 ` Baurzhan Ismagulov
2018-05-23  8:22   ` Claudius Heine
2018-05-23 11:34     ` Claudius Heine
2018-06-04 11:48     ` Baurzhan Ismagulov
2018-05-23  6:32 ` [RFC PATCH 0/3] Reproducible build claudius.heine.ext
2018-05-23  6:32   ` [RFC PATCH 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives claudius.heine.ext
2018-05-23  6:32   ` [RFC PATCH 2/3] meta/classes/image: added isar_bootstrap_tarball task claudius.heine.ext
2018-05-23  6:32   ` [RFC PATCH 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task claudius.heine.ext
2018-05-23 14:30   ` [RFC PATCH 0/3] Reproducible build Maxim Yu. Osipov
2018-05-23 15:20     ` Claudius Heine
2018-05-24 16:00   ` Henning Schild
2018-05-25  8:10     ` Claudius Heine
2018-05-25 11:57       ` Maxim Yu. Osipov
2018-05-25 17:04         ` Claudius Heine
2018-06-04 11:37           ` Baurzhan Ismagulov
2018-06-04 16:05             ` Claudius Heine
2018-06-05 10:42               ` Claudius Heine
2018-06-06  9:17                 ` Claudius Heine
2018-06-06 14:20                   ` Claudius Heine
2018-06-07  8:50                     ` Baurzhan Ismagulov
2018-06-07  8:08                 ` Maxim Yu. Osipov
2018-06-11  8:45                   ` Claudius Heine
2018-06-11 13:51                     ` Claudius Heine
2018-06-14  8:50                       ` Claudius Heine
2018-06-20  4:20                         ` Maxim Yu. Osipov
2018-06-20  8:12                           ` Claudius Heine
2018-05-23 13:26 ` [RFC PATCH v2 " claudius.heine.ext
2018-05-23 13:26 ` [RFC PATCH v2 1/3] meta/isar-bootstrap-helper+dpkg.bbclass: bind mount /var/cache/apt/archives claudius.heine.ext
2018-05-23 13:26 ` [RFC PATCH v2 2/3] meta/classes/image: added isar_bootstrap_tarball task claudius.heine.ext
2018-05-23 13:26 ` [RFC PATCH v2 3/3] meta/isar-bootstrap: add 'do_restore_from_tarball' task claudius.heine.ext

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox