From: Claudius Heine <ch@denx.de>
To: "[ext] Christian Storm" <christian.storm@siemens.com>,
	 isar-users@googlegroups.com
Subject: Re: Reproducibility of builds
Date: Fri, 17 Nov 2017 19:14:55 +0100	[thread overview]
Message-ID: <1510942495.3306.107.camel@denx.de> (raw)
In-Reply-To: <20171117165358.zyl7jjsu3rxutyod@MD1KR9XC.ww002.siemens.net>


Hi,

On Fri, 2017-11-17 at 17:53 +0100, [ext] Christian Storm wrote:
> > > since I'm very interested in this feature, I'd like to resume this
> > > discussion and eventually come to an agreed-upon proposal on how
> > > to implement it. So, without further ado, here are my thoughts on
> > > the subject:
> > > 
> > > Regardless of the concrete technical implementation, I guess we
> > > can agree on the need for a local cache/repository/store in which
> > > the Debian binary packages plus their sources have to be stored,
> > > since one may not rely on the availability of those files online
> > > for eternity.
> > > 
> > > The files in this cache/repository/store are the union of the
> > > Debian binary packages installed in the resulting image plus their
> > > sources, as well as those installed in the buildchroot plus their
> > > sources. The latter is required to be able to rebuild Debian
> > > packages built from source with the same compiler version,
> > > libraries, -dev packages, etc.
> > > 
> > > Having the cache/repository/store at hand, there should be a
> > > mechanism to prime Isar with it, i.e., Isar should only and
> > > exclusively use Debian binary packages and sources from this
> > > cache/repository/store. This is, again, irrespective of the
> > > technical implementation, be it via a repository cache or other
> > > means like git, a proxy server, or whatever else.
> > > 
> > > Granted, if one changes, e.g., IMAGE_INSTALL_append, the build
> > > fails, but it does so rightfully, as the set of packages is
> > > modified, resulting in a new version/epoch (= set of Debian
> > > packages plus their sources). So, there should be a convenient
> > > "interface" provided by Isar to maintain the
> > > cache/repository/store. For example, one may want to have
> > > different versions/epochs that correspond to particular versions
> > > (git sha) of the Isar layer. Or one wants to later add a Debian
> > > package plus its source (which is automatically fetched),
> > > resulting in a new version/epoch, etc.
> > > 
> > > The remaining question is how to fill the cache/repository/store.
> > > In order to have a consistent version/epoch (= set of Debian
> > > packages plus their sources), there should not be duplicate
> > > packages in it, i.e., the same Debian package with different
> > > versions. This can currently happen because there is a "window of
> > > vulnerability": multistrap is run twice, once for
> > > isar-image-base.bb and once for buildchroot.bb. In between those
> > > two runs, the Debian mirror used could get updated, resulting in a
> > > different version of a Debian package being installed in the
> > > buildchroot than in the resulting image.
> > > This is an inherent problem of relying on the Debian way of
> > > distributing packages, as one cannot a priori control which
> > > particular package versions one gets: in contrast to, e.g., Yocto,
> > > where the particular package versions are specified in the
> > > recipes, this does not hold for Isar, as the particular package
> > > versions are defined by the Debian mirror used; hence, the
> > > particular package versions are "injected".
> > > So, what is required to reduce the "window of vulnerability" and
> > > to have a consistent cache/repository/store for a particular
> > > version/epoch is a snapshot-type download of the required
> > > packages. For this, of course, one needs to know the concrete set
> > > of packages. This list could be delivered by a "package trace"
> > > Isar run, since not only multistrap installs packages but
> > > sprinkled apt-get install commands do as well. Thereafter, knowing
> > > the list, the snapshot-type download can happen, hopefully
> > > resulting in a consistent cache/repository/store.
> > > 
> > > 
> > > So, what do you think?
> > 
> > I agree with your formulation of the problem here.
> > 
> > Simple tracing of installed packages will have the problem you
> > described: it is possible that different versions of a package are
> > installed into the buildchroot and into the image. So this trace
> > needs to be cleaned up, and then, based on that, the whole process
> > has to be started again to create a consistent package list between
> > buildchroot and image. This doubles the build time in the trivial
> > implementation.
> 
> Sure, there's no free lunch here :)
> I'd rather strive for a good solution and avoid trivial
> implementations, to make lunch as close to free as it gets, to stay
> with the metaphor.
> 
> 
> > With my suggestion of using a caching proxy, this could be solved 
> > without any additional overhead.
> 
> Could be the case; what are the drawbacks?

More complexity and more code to implement and maintain. Download
speed might also suffer.

> What proxy do you propose to use?

I was at first going with my own standalone proxy implementation in
pure stdlib Python, so that it could be completely integrated into
Isar. I had a very simple solution ready rather quickly, but it was
only synchronous and as such could handle only one connection at a
time. Instead of just throwing more threads at it, I wanted to go the
asyncio route. Sadly, the Python stdlib does not provide an HTTP
implementation for asyncio, so I wasn't sure how to proceed from there
(take on an aiohttp dependency, or write a minimal HTTP implementation
of my own).
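
For illustration, here is a minimal sketch of what such a synchronous,
stdlib-only caching proxy could look like. This is not my actual code;
the cache directory, the hash-based URL-to-file mapping, and the port
are made-up placeholders, and it handles plain-HTTP GET only:

    # Sketch of a synchronous caching HTTP proxy using only the Python
    # stdlib. Hypothetical; CACHE_DIR and the cache layout are
    # assumptions for illustration, not actual Isar code.
    import hashlib
    import os
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CACHE_DIR = "/tmp/isar-apt-cache"  # e.g. somewhere under DL_DIR

    class CachingProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # For proxied requests the request line carries the
            # absolute URL, so self.path is "http://deb.debian.org/...".
            key = hashlib.sha256(self.path.encode()).hexdigest()
            cached = os.path.join(CACHE_DIR, key)
            if not os.path.exists(cached):
                # Cache miss: fetch upstream and store the body.
                with urllib.request.urlopen(self.path) as resp:
                    body = resp.read()
                os.makedirs(CACHE_DIR, exist_ok=True)
                with open(cached, "wb") as f:
                    f.write(body)
            with open(cached, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # HTTPServer serves one request at a time -- exactly the
        # limitation described above.
        HTTPServer(("127.0.0.1", 8080),
                   CachingProxyHandler).serve_forever()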

The other idea is to just use a ready-made apt caching proxy like
apt-cacher-ng. But here I am unsure whether it is flexible enough for
our case. Starting it multiple times in parallel, with different ports
for different caches and with only user privileges, might be possible,
but I suspect that separating the pool and the dists folders (pool
should go to DL_DIR while dists is part of TMP_DIR) could be more
difficult.
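
As a rough sketch of the parallel-instances part (assuming
apt-cacher-ng's support for overriding configuration variables such as
Port and CacheDir on the command line; all paths and ports below are
invented):

    # Hypothetical: spawn per-build apt-cacher-ng instances on
    # distinct ports with separate cache directories, as an
    # unprivileged user.
    import subprocess

    def start_acng(port, cache_dir, log_dir):
        return subprocess.Popen([
            "apt-cacher-ng",
            "ForeGround=1",             # stay attached so we can stop it
            "Port=%d" % port,
            "CacheDir=%s" % cache_dir,  # holds both pool/ and dists/ --
                                        # splitting them across DL_DIR
                                        # and TMP_DIR is exactly the
                                        # open question above
            "LogDir=%s" % log_dir,
        ])

    proxy = start_acng(3142, "/build/downloads/acng",
                       "/build/tmp/acng-logs")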

> Maybe I missed something about the proxy suggestion. Could you
> please elaborate on this?

As for the integration, the basic idea was that for tagged bitbake
tasks the proxy is started and the *_PROXY environment variables are
set. This should be doable with some modifications to base.bbclass and
some external Python scripts.
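
To give an idea (everything here is invented: the helper name, the
script name, and the notion of calling this from a task prefunc in
base.bbclass -- none of it exists in Isar yet):

    # Hypothetical helper, meant to be called from a bbclass prefunc
    # of tasks tagged as network-facing. It starts the caching proxy
    # and exports the conventional proxy variables so apt/multistrap
    # inside the task go through it.
    import os
    import subprocess

    def start_proxy_for_task(port=8080):
        proc = subprocess.Popen(["python3", "isar-cache-proxy.py",
                                 str(port)])
        proxy = "http://127.0.0.1:%d" % port
        for var in ("http_proxy", "HTTP_PROXY",
                    "https_proxy", "HTTPS_PROXY"):
            os.environ[var] = proxy
        return proc  # a matching postfunc would terminate it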

> 
> 
> > I do have other ideas for doing this, but those would restructure
> > most of Isar.
> 
> Well, at least speaking for myself, I'd like to hear those, as I
> consider this feature to be essential. Choice in solutions is always
> good :)
> 

One idea I had when I first investigated Isar was to try to be as
OE-compatible as possible. Using this idea would solve reproducible
builds as well:

Basically, implement debootstrap with bitbake recipes that are created
virtually at runtime by downloading and parsing the
'dists/*/*/*/Packages.gz' files.

I suppose it should be possible to fetch the Packages file at an early
parsing step of a bitbake build, if it is not already present, and to
fill the bitbake data store with recipe definitions that fetch those
binary deb packages, carry the appropriate dependencies, and install
them into the root file system.
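
To illustrate just the parsing half (stdlib only; the mirror URL and
suite are placeholders, and turning the stanzas into actual bitbake
recipe definitions is left out entirely):

    # Sketch: download a Debian Packages.gz and parse its stanzas into
    # dicts from which virtual recipes could be generated. Placeholder
    # URL; error handling omitted.
    import gzip
    import urllib.request

    MIRROR = "http://deb.debian.org/debian"  # placeholder
    INDEX = MIRROR + "/dists/stretch/main/binary-amd64/Packages.gz"

    def parse_packages(url):
        with urllib.request.urlopen(url) as resp:
            text = gzip.decompress(resp.read()).decode("utf-8",
                                                       "replace")
        # Stanzas are separated by blank lines; continuation lines
        # (e.g. long Description fields) start with whitespace.
        for stanza in text.split("\n\n"):
            fields = {}
            for line in stanza.splitlines():
                if not line or line[:1].isspace():
                    continue
                key, _, value = line.partition(":")
                fields[key] = value.strip()
            if "Package" in fields:
                # Package, Version, Depends and Filename are what a
                # generated recipe would need (SRC_URI from Filename).
                yield fields

    for pkg in parse_packages(INDEX):
        print(pkg["Package"], pkg["Version"], pkg.get("Filename"))
        break  # just demonstrate the first stanza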

However, this idea is still in the brainstorming phase.

Since that would involve a very big redesign, I don't think it's
currently feasible.

Cheers,
Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

            PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153
                              Keyserver: hkp://pool.sks-keyservers.net
