Subject: Re: [RFC PATCH 0/3] Reproducible build
To: isar-users@googlegroups.com
From: Claudius Heine
Date: Mon, 4 Jun 2018 18:05:41 +0200
Message-ID: <3431e051-cdc2-1ba6-8d8f-c426679c6954@siemens.com>
In-Reply-To: <20180604113736.GD5657@yssyq.radix50.net>
References: <3467a5ec-182e-8c9a-cd19-7ad898323be7@siemens.com>
 <20180523063206.29180-1-claudius.heine.ext@siemens.com>
 <20180524180027.09b7b880@md1pvb1c.ad001.siemens.net>
 <3a6032fee718de6cf44fff4e8051a8c7a89a6471.camel@denx.de>
 <20180604113736.GD5657@yssyq.radix50.net>

Hi Baurzhan,

On 2018-06-04 13:37, Baurzhan Ismagulov wrote:
> Hello Claudius,
>
> On Fri, May 25, 2018 at 07:04:53PM +0200, Claudius Heine wrote:
>> - Idea 0: Store tarball of debootstrap output with filled apt cache and use
>>   that to restore isar-bootstrap.
>> - Idea 1: Generate a repository from the cache and use that for the next
>>   debootstrap run.
>> - Idea 2: Like idea 1 but with aptly. And then use aptly to manage packages.
>> - Idea 3: Create a whole repo mirror with aptly or similar and strip unused
>>   packages later.
>> - Idea 4: Create a whole repo mirror with aptly or similar and import used
>>   package into a new repo.
>> - Idea 5: Implementing a 'caching proxy' feature in aptly.
>> - Idea 6: Implementing a caching proxy feature in isar.
>
> Thanks for summarizing, this makes it easier to communicate.
>
>
> Some general points first:
>
> * I'm ok with a partial implementation that goes in the right direction.
>
> * I'd really like to see user docs, also in RFC, because UX is a part of the
>   design. It shows what use cases the change covers and how it does that.

For me the most detailed documentation for developers is in the commit
message, the cover letter, the code and the general discussion on the ML. From
these, the developers who review the patches can see how they work and how
they affect the UX. There should be enough there to understand what a patch or
patchset provides. If there isn't, I would ask the patch author to go into
further detail somewhere there.

Other documentation is mostly necessary for new users or people who want to
catch up or look something up without having to search for the right commit
message, IMO. Requiring that for RFC patches is a bit much and slows down
development.

> Regarding the implementation, I think idea 1 is the right way to go. Today, we
> operate with pure Debian inputs -- packages and metadata -- to build our
> outputs. Debian inputs are what we should store.
>
>
>> Because of the contra arguments 'whole local mirror' and 'different apt
>> repo urls are used' I would got for 0 and 5.
>
> Idea 1 is very similar to your current implementation and is achievable with
> dpkg-scanpackages and debootstrapping.
>
> I'm not proposing the whole mirror, just the packages you debootstrap +
> dpkg-scanpackages.
>
> Our actual problem is:
>
> 1. Getting the list of packages we need.
>
> 2. Fetching and managing them locally.
>
> Proxying is a quick approach to avoid solving the problem rather than
> addressing it.

I wouldn't call it quick, or a way of avoiding the issue. You first have to
implement a proxy, which takes time and resources, and since you are solving
reproducibility you are addressing the problem.

> Also, it wouldn't support all Debian's fetch methods.

Is supporting other fetch methods really important? I would say that
supporting only http/https would be enough. FTP is deprecated (at least
ftp.debian.org disabled FTP, AFAIK). OK, rsync might be nice, but that's not
available in company networks anyway. As for local repos and optical media, I
don't see the reason for them. Is there a fetch method you would particularly
miss?

>> Critique 1: Similar to my 'simple solution' but adds the creation of an
>>   additional repository to it. -> higher complexity
>> Pro: debootstrap process is done on every build.
>> Con: Different apt repo urls are used.
>>      For me that is a no-go, because that means the configuration
>>      is different between the initial and subsequent builds.
>
> IIUC, this is also the case with your current implementation. You build without
> or with ISAR_BOOTSTRAP_TARBALL. This could be changed to building with or
> without e.g. ISAR_BOOTSTRAP_SOURCE containing a complete sources.list line.

There is a difference: in one case the root file system is modified, in the
other it isn't. In my implementation only some steps are skipped and the
tarball is extracted instead, and that's it. Idea 1 results in a different apt
source configuration and therefore in a different apt index, maybe different
apt preferences, etc. Packages are fetched from a different source. There are
a lot more variables involved in this.
That is what I meant with 'configuration is different': not some variables in
bitbake, but a different root file system.

>> How to add new packages later? (maybe like partial update?)
>
> With the tarball, you suggest deleting and starting from scratch for now.

I don't think I suggested that. With idea 0 you can just add some upstream
packages to the list. Those still need to be available from the upstream
sources, since the index will not be updated. If they aren't available, you
can add those packages to the cache. It has to be the package in the version
listed in the current apt index, however, since the apt index is like a
package-less snapshot of the whole consistent Debian system.

With idea 1 you don't really have such an index of which package versions
belong together, so you have to trust the metadata of each package to specify
the right version ranges.

> For
> the first step, I'd suggest to limit the usage to that. That is possible with
> idea 1, too.

With idea 1 you could add packages to the local repository like you would
overwrite old packages on a partial update. That was the idea I meant here.

>
> In the future, we'd need some tool. FWIW, I'm currently not aware of a tool
> that does both (1) and (2) above or is sufficiently suitable for that. So, I
> think we should work with Debian to get introspection on debootstrap and
> apt-get and work on the tool for (2). Cooperating with some project would be
> nice, but isn't a requirement for me.

For 1 on debootstrap, you could just:

    apt-cache depends --recurse -i apt ...

Change the options and apt configuration to mirror the desired distro and
arch, clean up the output a bit, and then you have a list.

For 2 you can (and we currently do) use

    apt-get install --download-only

or

    apt-get install --print-uris

and fetch them yourself. Maybe with some grep finagling you could even get the
source repository from this.

>
>
>> How to handle multiple repos?
>> => map all repos from initial run to the local one.
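
Coming back to (2) for a moment: below is a rough sketch of the 'grep
finagling' on the --print-uris output, written in Python only so that it is
easy to try out. It assumes that each package line looks like
"'URI' filename size HASH:digest" and that repositories use the usual
'/pool/' layout. The names PackageUri, parse_print_uris and source_repo are
made up for this mail, and the sample line (including its checksum) is
invented, not taken from a real run.

```python
import re
from typing import List, NamedTuple


class PackageUri(NamedTuple):
    """One package line from 'apt-get install --print-uris' (hypothetical type)."""
    uri: str
    filename: str
    size: int
    checksum: str


# Assumed line format: 'URI' filename size HASHTYPE:hexdigest
_LINE = re.compile(
    r"^'(?P<uri>[^']+)'\s+(?P<filename>\S+)\s+(?P<size>\d+)\s*(?P<checksum>\S*)$"
)


def parse_print_uris(output: str) -> List[PackageUri]:
    """Return the package entries, skipping apt's informational lines."""
    entries = []
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # e.g. "Reading package lists..."
        entries.append(
            PackageUri(
                uri=match.group("uri"),
                filename=match.group("filename"),
                size=int(match.group("size")),
                checksum=match.group("checksum"),
            )
        )
    return entries


def source_repo(uri: str) -> str:
    """Guess the repository base URL, assuming a '/pool/' directory layout."""
    base, sep, _ = uri.partition("/pool/")
    return base if sep else uri


# Invented sample output, for illustration only.
SAMPLE = """Reading package lists...
'http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-1+b1_amd64.deb' hello_2.10-1+b1_amd64.deb 56132 MD5Sum:0123456789abcdef0123456789abcdef
"""

if __name__ == "__main__":
    for entry in parse_print_uris(SAMPLE):
        print(source_repo(entry.uri), entry.filename, entry.size)
```

The per-package repository base from source_repo() is the kind of information
that would be needed to keep track of which upstream repo a cached package
came from. Whether that is sufficient for the multi-repo case is exactly the
open question.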
>
> Currently, you suggest to use multiple tarballs.

No. Where do you get my suggestions from? Not me apparently ;)
You don't need multiple tarballs for multiple Debian repos. That just works
out of the box.

> With idea 1, you could provide
> multiple directories.

The mapping is what interests me here. For instance, you have most packages
from Debian jessie, some packages from Debian stretch, some from an Ubuntu or
Linux Mint repo, docker from the upstream Docker Debian repository and maybe
some others. How do you make sure that there are no conflicts? That each repo
you use has a 1:1 mapping to one repo with multiple directories? Maybe create
the directories by hashing the source URI? Or by some string replacements? How
do you deal with mirrors of those repos?

> FWIW, Alex's implementation [1] did (1) and (2) in a Debian way in a single
> repo, without duplication.

I didn't review those patches since I was N/A this month. Is there a follow-up
in the works?

Also, 'Debian way' is misleading, since we would not be having this discussion
if there were a Debian way to solve all our problems. Since there isn't, we
have to build our own way here. We could try to minimize the work by using as
much as possible of what the Debian project has already built. Also, using
bitbake instead of sbuild, debian-installer and friends is pretty much by
design not the Debian way ;)

>> And then what? => cannot be reverted, loss of information
>
> It doesn't have to be reverted. Maintaining that manually would be
> time-consuming, but that is what people are forced to do today anyway. The
> feature would ease that burden till partial mirror management is implemented.

If we are going that way, maybe we should take a look at apt-move.

Maybe we should restructure the build process a bit?

1. debootstrap uses the upstream URI to build a rfs if the local cache URI
   does not exist.
2. Set the local cache pin priority to >1000 in order to prefer any packages
   from there.
3. For each recipe, image and buildchroot, fill the local cache repo with the
   upstream bin and src packages; isar-generated packages still land in
   isar-apt.

Done with apt-move, apt-cache depends, etc.

Maybe integrate apt-move into some additional image tasks for those other
features like updating or adding packages.

Maybe create this mirror inside the buildchroot? This way we could avoid host
dependencies and contamination from the start.

Any other ideas on how to handle this comfortably?

I will try to post a small graphic about this soon.

Cheers,
Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de