public inbox for isar-users@googlegroups.com
From: Claudius Heine <claudius.heine.ext@siemens.com>
To: isar-users@googlegroups.com
Subject: Re: [RFC PATCH 0/3] Reproducible build
Date: Mon, 4 Jun 2018 18:05:41 +0200
Message-ID: <3431e051-cdc2-1ba6-8d8f-c426679c6954@siemens.com>
In-Reply-To: <20180604113736.GD5657@yssyq.radix50.net>

Hi Baurzhan,

On 2018-06-04 13:37, Baurzhan Ismagulov wrote:
> Hello Claudius,
> 
> On Fri, May 25, 2018 at 07:04:53PM +0200, Claudius Heine wrote:
>> - Idea 0: Store tarball of debootstrap output with filled apt cache and use
>>    that to restore isar-bootstrap.
>> - Idea 1: Generate a repository from the cache and use that for the next
>>    debootstrap run.
>> - Idea 2: Like idea 1 but with aptly. And then use aptly to manage packages.
>> - Idea 3: Create a whole repo mirror with aptly or similar and strip unused
>>    packages later.
>> - Idea 4: Create a whole repo mirror with aptly or similar and import used
>>    package into a new repo.
>> - Idea 5: Implementing a 'caching proxy' feature in aptly.
>> - Idea 6: Implementing a caching proxy feature in isar.
> 
> Thanks for summarizing, this makes it easier to communicate.
> 
> 
> Some general points first:
> 
> * I'm ok with a partial implementation that goes in the right direction.
> 
> * I'd really like to see user docs, also in RFC, because UX is a part of the
>    design. It shows what use cases the change covers and how it does that.

For me, the most detailed documentation for developers is in the commit 
message, the cover letter, the code and the general discussion on the ML. 
From these, the developers who review the patches can see how they work 
and how they affect the UX. There should be enough there to understand 
what a patch or patchset provides.
If there isn't, I would ask the patch author to go into further detail 
in one of those places.

Other documentation is mostly necessary for new users, or for people who 
want to catch up or look something up without having to search for the 
right commit message, IMO. Requiring that for RFC patches is a bit much 
and slows down development.

> Regarding the implementation, I think idea 1 is the right way to go. Today, we
> operate with pure Debian inputs -- packages and metadata -- to build our
> outputs. Debian inputs are what we should store.
> 
> 
>> Because of the contra arguments 'whole local mirror' and 'different apt
>> repo urls are used' I would go for 0 and 5.
> 
> Idea 1 is very similar to your current implementation and is achievable with
> dpkg-scanpackages and debootstrapping.
> 
> I'm not proposing the whole mirror, just the packages you debootstrap +
> dpkg-scanpackages.
> 
> Our actual problem is:
> 
> 1. Getting the list of packages we need.
> 
> 2. Fetching and managing them locally.
> 
> Proxying is a quick approach to avoid solving the problem rather than
> addressing it.

I wouldn't call it quick, or a way of avoiding the issue. You have to 
implement a proxy first, and that takes time and resources. And since 
it solves reproducibility, it does address the problem.

> Also, it wouldn't support all Debian's fetch methods.

Is supporting other fetch methods really important? I would say that 
supporting only http/https would be enough. FTP is deprecated (at least 
ftp.debian.org disabled FTP, AFAIK). OK, rsync might be nice, but that's 
not available in company networks anyway. As for local repos and optical 
media, I don't see the reason for them.

Is there a fetch method you would miss particularly?

>> Critique 1: Similar to my 'simple solution' but adds the creation of an
>> additional repository to it. -> higher complexity
>>      Pro: debootstrap process is done on every build.
>>      Con: Different apt repo urls are used.
>>             For me that is a no-go, because that means the configuration
>>             is different between the initial and subsequent builds.
> 
> IIUC, this is also the case with your current implementation. You build without
> or with ISAR_BOOTSTRAP_TARBALL. This could be changed to building with or
> without e.g. ISAR_BOOTSTRAP_SOURCE containing a complete sources.list line.

There is a difference: in one case the root file system is modified, in 
the other it isn't.

In my implementation only some steps are skipped and the tarball is 
extracted instead, and that's it.

Idea 1 results in a different apt source configuration and therefore in 
a different apt index, maybe different apt preferences, etc. Packages are 
fetched from a different source. There are a lot more variables involved 
in this. That is what I meant by 'configuration is different': not some 
variables in bitbake, but a different root file system.

>>           How to add new packages later? (maybe like partial update?)
> 
> With the tarball, you suggest deleting and starting from scratch for now.

I don't think I suggested that.
With idea 0 you can just add some upstream packages to the list. Those 
still need to be available from the upstream sources, since the index 
will not be updated. If they aren't available, you can add those packages 
to the cache. It has to be the package in the version of the current apt 
index, however, since the apt index is like a package-less snapshot of 
the whole consistent Debian system.

With idea 1 you don't really have such an index of which package versions 
belong together, so you have to trust the metadata of each package to 
specify the right version ranges.

> For
> the first step, I'd suggest to limit the usage to that. That is possible with
> idea 1, too.

With idea 1 you could add packages to the local repository like you 
would overwrite old packages on a partial update. That was the idea I 
meant here.

> 
> In the future, we'd need some tool. FWIW, I'm currently not aware of a tool
> that does both (1) and (2) above or is sufficiently suitable for that. So, I
> think we should work with Debian to get introspection on debootstrap and
> apt-get and work on the tool for (2). Cooperating with some project would be
> nice, but isn't a requirement for me.

For 1 on debootstrap, you could just:

     apt-cache depends --recurse -i apt ...

Change the options and apt configuration to match the desired distro and 
arch, clean up the output a bit, and then you have a list.
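
The "clean up the output a bit" step could look roughly like the sketch 
below. The function names are hypothetical. It assumes that apt-cache 
prints package names at column 0, dependency lines indented, and virtual 
packages in angle brackets. Selecting distro and arch through the apt 
configuration is not shown.

```shell
# Hypothetical helper: reduce `apt-cache depends --recurse` output (read
# from stdin) to a sorted, de-duplicated list of real package names.
# Assumption: package names start at column 0 with [a-z0-9], dependency
# lines are indented, and virtual packages appear as <name>.
clean_depends_output() {
    grep '^[a-z0-9]' | LC_ALL=C sort -u
}

# Hypothetical wrapper around the command from the mail; "$@" is the list
# of root packages (e.g. apt).
list_bootstrap_packages() {
    apt-cache depends --recurse -i "$@" | clean_depends_output
}
```

Calling `list_bootstrap_packages apt` would then print one package name 
per line.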

For 2 you can (and we currently do) use apt-get install --download-only, 
or apt-get install --print-uris and fetch the packages yourself.

Maybe with some grep finagling you could even get the source repository 
for this.
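
A sketch of that grep (here: sed) finagling, with hypothetical function 
names. It assumes that each package line printed by --print-uris has the 
form 'URI' filename size checksum, and that the repository uses the usual 
pool/ layout.

```shell
# Hypothetical helper: print only the URIs from `apt-get --print-uris`
# output read on stdin. Lines that do not start with a quoted URI (for
# example status messages) are dropped.
extract_uris() {
    sed -n "s/^'\([^']*\)'.*/\1/p"
}

# Hypothetical helper: print the distinct repository base URIs by cutting
# each URI at "/pool/". Assumption: the usual pool/ layout is in use.
extract_repo_bases() {
    extract_uris | sed -n 's|/pool/.*||p' | LC_ALL=C sort -u
}
```

The URI list can then be handed to any downloader, and the base list shows 
which source repositories the packages came from.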

> 
> 
>>           How to handle multiple repos?
>>             => map all repos from initial run to the local one.
> 
> Currently, you suggest to use multiple tarballs.

No. Where do you get my suggestions from? Not me apparently ;)

You don't need multiple tarballs for multiple Debian repos.
That just works out of the box.

> With idea 1, you could provide
> multiple directories.

The mapping is what interests me here. For instance, you have most 
packages from Debian jessie, some packages from Debian stretch, some 
from an Ubuntu or Linux Mint repo, Docker from the upstream Docker 
repository for Debian, and maybe some others. How do you make sure that 
there are no conflicts, and that each repo you use has a 1:1 mapping to 
one of the multiple directories?

Maybe create the directories by hashing the source URI? Or by some 
string replacements? How do you deal with mirrors of those repos?
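
To make the hashing variant concrete, here is a minimal sketch. The 
function name is hypothetical and sha256sum from coreutils is assumed to 
be available. Mirrors of the same repository have different URIs and 
would still end up in different directories, so this does not answer the 
mirror question.

```shell
# Hypothetical helper: derive a stable directory name for a source
# repository from its URI. Assumes sha256sum (coreutils) is installed.
repo_dir_for_uri() {
    # Drop trailing slashes so "http://host/debian" and
    # "http://host/debian/" map to the same directory.
    uri=$(printf '%s' "$1" | sed 's|/*$||')
    # Use the first 16 hex digits of the SHA-256 of the URI as the name.
    printf '%s' "$uri" | sha256sum | cut -c1-16
}
```

A string replacement instead of a hash would keep the directory names 
readable, at the cost of having to escape characters such as '/' and ':'.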

> FWIW, Alex's implementation [1] did (1) and (2) in a Debian way in a single
> repo, without duplication.

I didn't review those patches since I was N/A this month. Is there a 
followup in the works?

Also, 'Debian way' is misleading, since we would not be having this 
discussion if there was a Debian way to solve all our problems. But since 
there isn't, we have to build our own way here. We could try to minimize 
the work by using as much as possible of what the Debian project has 
already built.

Also, using bitbake instead of sbuild, debian-installer and friends is 
pretty much by design not the Debian way ;)

>>                And then what? => cannot be reverted, loss of information
> 
> It doesn't have to be reverted. Maintaining that manually would be
> time-consuming, but that is what people are forced to do today anyway. The
> feature would ease that burden till partial mirror management is implemented.

If we are going that way, maybe we should take a look at apt-move.

Maybe we should restructure the build process a bit?

1. debootstrap uses the upstream URI to build an rfs if the local cache 
   URI does not exist.
2. Set the pin priority of the local cache to >1000 in order to prefer 
   any packages from there.
3. For each recipe, image and buildchroot, fill the local cache repo with 
   the upstream bin and src packages; Isar-generated packages still land 
   in isar-apt.
   This is done with apt-move, apt-cache depends, etc.
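
Steps 1 and 2 could be sketched as follows. Both function names are 
hypothetical, and so is the assumption that a usable local cache repo can 
be recognised by its dists/ directory. The pin record uses origin "", 
which apt_preferences(5) describes as matching the local site. That would 
also match other local repos such as isar-apt, so a real implementation 
would probably need a narrower match.

```shell
# Hypothetical helper for step 1: print the URI debootstrap should use.
# Assumption: a usable local cache repo has a dists/ directory.
select_bootstrap_uri() {
    local_repo=$1
    upstream_uri=$2
    if [ -d "$local_repo/dists" ]; then
        printf 'file://%s\n' "$local_repo"
    else
        printf '%s\n' "$upstream_uri"
    fi
}

# Hypothetical helper for step 2: write an apt preferences snippet that
# pins everything from the local site above 1000.
write_local_pin() {
    prefs_file=$1
    cat > "$prefs_file" <<'EOF'
Package: *
Pin: origin ""
Pin-Priority: 1001
EOF
}
```

On the first build the local repo does not exist yet, so the upstream URI 
is selected. On later builds the file:// URI is used instead.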

Maybe integrate apt-move in some additional image tasks for those other 
features like updating or adding packages.

Maybe create this mirror inside the buildchroot? This way we could avoid 
host dependencies and contamination from the start. Any other ideas how 
to handle this comfortably?

I will try to post a small graphic about this soon.

Cheers,
Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
