Subject: Re: Reproducibility of builds
From: Claudius Heine
To: Alexander Smirnov, isar-users@googlegroups.com
Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
Date: Wed, 6 Sep 2017 15:39:48 +0200

Hi,

On 09/05/2017 01:54 PM, [ext] Claudius Heine wrote:
> Hi,
>
> On 09/05/2017 12:05 PM, Alexander Smirnov wrote:
>>
>>
>> On 08/28/2017 02:27 PM, Claudius Heine wrote:
>>> Hi,
>>>
>>> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> On 08/03/2017 10:13 AM, Claudius Heine wrote:
>>>>> Hi,
>>>>>
>>>>> am I right that Isar supports, or should support, reproducible root
>>>>> file system builds?
>>>>>
>>>>> If I understand correctly, when multistrap is called, it always
>>>>> fetches the latest version of all packages from the Debian
>>>>> repository mirrors. Am I mistaken, or is this feature still on the
>>>>> roadmap?
>>>>>
>>>>> If that is on the roadmap, how are you thinking of solving this issue?
>>>>>
>>>>> The OpenEmbedded way would be to separate the fetch and 'install'
>>>>> steps and first download all packages into the DL_DIR and then use
>>>>> them from there. Maybe we could create this pipeline:
>>>>>
>>>>> dpkg-binary Recipe:
>>>>>
>>>>> fetch deb file into downloads -> insert into local repository
>>>>>
>>>>> dpkg-source Recipe:
>>>>>
>>>>> fetch sources into downloads -> build packages -> insert into local
>>>>> repository
>>>>>
>>>>> image Recipe:
>>>>>
>>>>> fetch all required packages into downloads -> insert all of them
>>>>> into the local repository -> create root fs using only the local
>>>>> repository
>>>>>
>>>>> Multistrap provides a '--source-dir DIR' parameter that stores all
>>>>> installed packages into a directory. So if we used that as a
>>>>> fetcher, we would create a temporary rootfs just to get all
>>>>> required packages for the project.
>>>>>
>>>>> Are there other possible solutions for this?
>>>>
>>>> The problem with this solution is that it's not possible to create
>>>> multiple images with different sets of packages that share the
>>>> version of all the common packages.
>>>>
>>>> An alternative solution is to employ a repository cacher that caches
>>>> the 'Packages.gz' of the first request. This way it would also be
>>>> faster than running multistrap one additional time just to fetch all
>>>> required packages.
>>>>
>>>> Maybe apt-cacher-ng or something similar can be used for this.
>>>> However, I am currently not sure how this can be integrated into the
>>>> current build process. Some ideas?
>>>> Maybe implement a simple repo caching proxy that is integrated
>>>> into isar?
>>>>
>>>> The repository cacher is likely a daemon that runs in parallel to
>>>> multistrap and fetches everything that multistrap requests into the
>>>> DL_DIR. Maybe provide a 'clean_package_cache' task that deletes the
>>>> cached 'Packages.gz', causing the next root fs build to use new
>>>> package versions.
>>>>
>>>> I would really like to hear some feedback on this.
>>>
>>> In our meeting today, it was discussed that we should collect all
>>> requirements for this feature and discuss possible implementation
>>> ideas based on those requirements.
>>>
>>> Here are some requirements from my side:
>>>
>>>   1 If multiple different images with some common set of packages are
>>>     built with one bitbake call, then all images should contain
>>>     exactly the same version of every package that they have in
>>>     common with any of the other images.
>>>
>>>   2 The resulting image should only depend on the build environment
>>>     and isar metadata, not on the point in time it is built.
>>>     This means that if the environment, including the downloads
>>>     directory, is complete (for instance by an earlier build of the
>>>     image), every following build of this image recipe should result
>>>     in exactly the same packages installed on this image.
>>>
>>>   3 Binary and source packages should be part of the archival
>>>     process.
>>>     Source packages are useful in case some package needs to be
>>>     patched at a later date. Binary packages are useful because
>>>     building them from source packages is currently not 100%
>>>     reproducible in Debian upstream. [1]
>>>
>>>   4 For development, it should be possible to easily reset the
>>>     environment, triggering an upgrade of the packages on the next
>>>     image build.
>>>
>>>   5 Deployable in CI environments. What those are exactly should be
>>>     further discussed.
>>>     Here are some:
>>>
>>>     5.1 Possibility to use a download cache that is not bound to
>>>         only one product/image/environment
>>>
>>>     5.2 More than one build at the same time in one environment
>>>         should be possible
>>>
>>>   6 Efficiency: The reproducibility feature should be as time and
>>>     resource efficient as possible. E.g. the process should only
>>>     fetch and store the required files.
>>>
>>>   7 Outputs a description file with the name and version of every
>>>     package deployed/used in the image/environment.

   8 Use this description and/or an archive file to restore the
     environment state in a fresh directory so that the same image
     can be recreated.

>>>
>>> To 5: Since I don't have much experience with CI systems,
>>> requirements mentioned here might not be correct.
>>>
>>> Any comment or requirement additions are welcome.
>>
>> Thank you for the requirements, they describe your use case quite
>> well. Unfortunately, ATM I don't know all the capabilities of
>> multistrap/debootstrap, so I cannot propose too much.
>>
>> In general, I think there could be the following solutions:
>>
>>   - Create a local apt cache with specified package versions.
>>   - Patch multistrap to add the capability to specify package
>>     versions.
>>   - Add a hook to the multistrap hooks (for example, in
>>     configscript.sh) that will re-install the desired package
>>     versions via apt-get.
>
> My solution is a bit different and does not require patching
> multistrap; it should also work with other bootstrapping mechanisms.
> (AFAIK it should be possible to change the bootstrapping mechanism at
> a later date, since the multistrap project is dead.)
>
> I started implementing an HTTP proxy in Python that caches all
> requests of '/pool/' and '/dists/' URIs in separate directories.
> 'dists' is part of the build environment, while 'pool' contains all
> the packages and should be part of the download directory.
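A minimal sketch of the '/dists/' and '/pool/' split described above. This is not the actual proxy code; it only maps a request URL to a cache location. The names cache_path_for, DISTS_CACHE and POOL_CACHE, the directory layout and the mirror URL in the usage note are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical locations; the real layout is not specified in this thread.
DISTS_CACHE = PurePosixPath("build/apt-cache/dists")    # build environment
POOL_CACHE = PurePosixPath("downloads/apt-cache/pool")  # download directory


def cache_path_for(url):
    """Return the cache file for a repository URL, or None if it is not cached."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if ".." in segments:
        return None  # refuse paths that could escape the cache directory
    for marker, root in (("dists", DISTS_CACHE), ("pool", POOL_CACHE)):
        if marker in segments:
            idx = segments.index(marker)
            # Keep host and repository prefix so several mirrors can coexist.
            prefix = [parts.netloc] + segments[:idx]
            return root.joinpath(*prefix, *segments[idx + 1:])
    return None
```

With these assumptions, cache_path_for("http://deb.debian.org/debian/dists/stretch/Release") ends up below the build environment, while any URL containing '/pool/' ends up below the download directory. Everything else is passed through uncached.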
>
> I am currently not actively working on this proxy because of other
> tasks with higher priority, but I can give you access to it if you like.
>
> On my TODO list for this is:
>
>   - Port to asyncio (with a simple HTTP implementation)
>     This proxy is currently single-threaded and can only handle one
>     connection at a time. Porting to asyncio is possible, but since the
>     Python standard library does not provide an HTTP implementation
>     based on asyncio, a small HTTP implementation on top of it has to
>     be written as well.
>   - Integrate into bitbake/isar as a scripts/lib and a bbclass
>     To ease early development I implemented this proxy outside of
>     bitbake, but with the idea to integrate it into bitbake at a later
>     date. It should be easily doable to integrate this into bitbake via
>     two tasks: one that starts the proxy and one that shuts it down.
>     Maybe add a shutdown via a bitbake event as well, so that it will be
>     shut down regardless of the tasks handled. Or do it completely via
>     bitbake events.
>
> The current proxy limits repositories to the HTTP protocol. Maybe it's
> possible to have HTTPS proxies as well, but in that case it's
> necessary to break the SSL chain.

The next point on my list would be the save and restore functionality.
This would be necessary to reproduce a build with a fresh build
environment.

There are a couple of ways to do this. Here are some that are currently
on my mind:

  * Just create a tarball of the 'dists' and 'pool' directories,
    archive that and import it into the respective directories in the
    fresh environment.
    This might not be resource efficient, because the pool could
    contain packages that are not used in the image.

  * Log requested files in the proxy and use this list afterwards to
    create an archive that can be used to recreate the proxy
    directories.
    This cannot be done in an image recipe, but has to be done just
    before bitbake is finished.
    The reason is that the archive should contain not only the packages
    that are used in one image, but all the packages that are used in
    one bitbake build run.

  * Use the 'source directory' feature of multistrap to create a
    directory containing all used packages for an image and use these
    packages to create an independent repository. This repo is then
    used as the "upstream repo" in later builds.
    If multistrap is no longer used, extract all these packages from
    the apt cache in the created root file system to emulate this
    multistrap feature.

And some other variations of those three ideas.

I currently have no concrete idea how to archive the source packages.
Since the mapping of binary and source packages is not bijective, it's
not trivial, and dpkg & apt need to be used to fetch them from the
repositories.

Cheers,
Claudius

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
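
A minimal sketch of the second save/restore idea above (log the requested files, then archive only those), assuming the proxy can hand over the logged paths relative to its cache root. The function name archive_requested and its parameters are hypothetical, and how this would be hooked into the end of a bitbake run is left open here.

```python
import tarfile
from pathlib import Path


def archive_requested(cache_root, requested, archive_path):
    """Pack only the logged files below cache_root into a gzip tarball.

    'requested' holds paths relative to cache_root, as logged by the
    (hypothetical) proxy. Returns the list of paths actually stored.
    """
    cache_root = Path(cache_root)
    stored = []
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Sorting and de-duplicating keeps the member order stable.
        for rel in sorted(set(requested)):
            src = cache_root / rel
            if src.is_file():  # skip entries that were logged but not cached
                tar.add(str(src), arcname=rel)
                stored.append(rel)
    return stored
```

Unpacking such an archive into an empty cache root would recreate the proxy directories with exactly the files that were requested during the build run, which avoids archiving unused packages from the pool.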