* Reproducibility of builds
@ 2017-08-03  8:13 Claudius Heine
  2017-08-21 11:23 ` Claudius Heine
  ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Claudius Heine @ 2017-08-03  8:13 UTC (permalink / raw)
  To: isar-users

Hi,

am I right that Isar supports, or should support, reproducible root
file system builds?

If I understand correctly, when multistrap is called, it always fetches
the latest version of all packages from the Debian repository mirrors.
Am I mistaken, or is this feature still on the roadmap?

If it is on the roadmap, how are you thinking of solving this issue?

The OpenEmbedded way would be to separate the fetch and 'install' steps
and first download all packages into the DL_DIR, then use them from
there. Maybe we could create this pipeline:

dpkg-binary recipe:

  fetch deb file into downloads -> insert into local repository

dpkg-source recipe:

  fetch sources into downloads -> build packages -> insert into local
  repository

image recipe:

  fetch all required packages into downloads -> insert all of them into
  the local repository -> create root fs using only the local repository

Multistrap provides a '--source-dir DIR' parameter that stores all
installed packages into a directory. So if we used that as a fetcher,
we would create a temporary rootfs just to get all required packages
for the project.

Are there other possible solutions for this?

Cheers,
Claudius

^ permalink raw reply	[flat|nested] 22+ messages in thread
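The "insert into local repository" step of the pipeline above could be sketched as follows. This is a minimal, hypothetical illustration (the function names and layout are not from Isar): it copies a fetched .deb from the downloads directory into a Debian-style `pool/` layout, so that later builds can be fed exclusively from this local repository.

```python
# Hypothetical sketch of the "insert into local repository" step:
# copy a fetched .deb from DL_DIR into a Debian-style pool layout.
import shutil
from pathlib import Path

def pool_prefix(package: str) -> str:
    """Debian pool convention: 'lib*' packages sort under 'libX',
    everything else under its first letter."""
    return package[:4] if package.startswith("lib") else package[:1]

def insert_into_repo(deb: Path, repo: Path) -> Path:
    """Copy a fetched .deb into pool/<prefix>/<package>/ inside repo."""
    package = deb.name.split("_", 1)[0]        # name_version_arch.deb
    dest = repo / "pool" / pool_prefix(package) / package
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / deb.name
    if not target.exists():                    # keep the insert idempotent
        shutil.copy2(deb, target)
    return target
```

A real implementation would additionally regenerate the repository's Packages index (e.g. with dpkg-scanpackages) after each insert.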
* Re: Reproducibility of builds
  2017-08-03  8:13 Reproducibility of builds Claudius Heine
@ 2017-08-21 11:23 ` Claudius Heine
  2017-08-28 11:27   ` Claudius Heine
  2017-09-18 15:05   ` Baurzhan Ismagulov
  2017-11-14 16:04 ` Christian Storm
  2 siblings, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-08-21 11:23 UTC (permalink / raw)
  To: isar-users; +Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild

Hi,

On 08/03/2017 10:13 AM, Claudius Heine wrote:
> Hi,
>
> am I right that Isar supports, or should support, reproducible root
> file system builds?
>
> If I understand correctly, when multistrap is called, it always
> fetches the latest version of all packages from the Debian repository
> mirrors. Am I mistaken, or is this feature still on the roadmap?
>
> If it is on the roadmap, how are you thinking of solving this issue?
>
> The OpenEmbedded way would be to separate the fetch and 'install'
> steps and first download all packages into the DL_DIR, then use them
> from there. Maybe we could create this pipeline:
>
> dpkg-binary recipe:
>
>   fetch deb file into downloads -> insert into local repository
>
> dpkg-source recipe:
>
>   fetch sources into downloads -> build packages -> insert into local
>   repository
>
> image recipe:
>
>   fetch all required packages into downloads -> insert all of them
>   into the local repository -> create root fs using only the local
>   repository
>
> Multistrap provides a '--source-dir DIR' parameter that stores all
> installed packages into a directory. So if we used that as a fetcher,
> we would create a temporary rootfs just to get all required packages
> for the project.
>
> Are there other possible solutions for this?

The problem with this solution is that it's not possible to create
multiple images with different sets of packages that share the versions
of all their common packages.

An alternative solution is to employ a repository cacher that caches
the 'Packages.gz' of the first request. This way it would also be
faster than running multistrap one additional time just to fetch all
required packages.

Maybe apt-cacher-ng or something similar can be used for this. However,
I am currently not sure how this can be integrated into the current
build process. Some ideas? Maybe implementing a simple repo caching
proxy that is integrated into Isar?

The repository cacher would likely be a daemon running in parallel to
multistrap that fetches everything multistrap requests into the DL_DIR.
Maybe provide a 'clean_package_cache' task that deletes the cached
'Packages.gz', causing the next root fs build to use new package
versions.

I would really like to hear some feedback on this.

Cheers,
Claudius

^ permalink raw reply	[flat|nested] 22+ messages in thread
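The caching-proxy idea above splits requests into two classes: volatile index files under '/dists/' (which 'clean_package_cache' would wipe to pick up new versions) and immutable packages under '/pool/' (which can live in a shared DL_DIR). A hedged sketch of that classification rule, with illustrative names:

```python
# Sketch of the proposed split: '/dists/' metadata goes into a
# per-build cache (deletable by 'clean_package_cache'), '/pool/'
# packages go into the shared DL_DIR. Names are illustrative, not
# from Isar.
from pathlib import Path
from urllib.parse import urlparse

def cache_location(url: str, env_cache: Path, dl_dir: Path) -> Path:
    path = urlparse(url).path.lstrip("/")
    if "/dists/" in f"/{path}":
        return env_cache / path    # Packages.gz, Release, ...
    if "/pool/" in f"/{path}":
        return dl_dir / path       # .deb files, shared between builds
    raise ValueError(f"not a cacheable repository path: {url}")
```

Deleting `env_cache` then resets the pinned package index without discarding the downloaded .deb files.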
* Re: Reproducibility of builds 2017-08-21 11:23 ` Claudius Heine @ 2017-08-28 11:27 ` Claudius Heine 2017-09-05 10:05 ` Alexander Smirnov 0 siblings, 1 reply; 22+ messages in thread From: Claudius Heine @ 2017-08-28 11:27 UTC (permalink / raw) To: isar-users; +Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild Hi, On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote: > Hi, > > On 08/03/2017 10:13 AM, Claudius Heine wrote: >> Hi, >> >> am I right that Isar supports or should support reproducible root file >> system build? >> >> If I understand correctly, when multistrap is called, it fetches >> always the latest version of all packages from the debian repository >> mirrors. Am I mistaken or is this feature still on the roadmap? >> >> I that is on the roadmap, how are you thinking of solving this issue? >> >> The openembedded way would be to seperate the fetch and 'install' step >> and first download all packages into the DL_DIR and then use them from >> there. Maybe we could create this pipeline: >> >> dpkg-binary Recipe: >> >> fetch deb file into downloads -> insert into local repository >> >> dpkg-source Recipe: >> >> fetch sources into downloads -> build packages -> insert into local >> repository >> >> image Recipe: >> >> fetch all required packages into downloads -> insert all of them into >> the local repository -> create root fs using only the local repository >> >> Multistrap provides a '--source-dir DIR' parameter, that stores all >> installed packages into a directory. So if we would use that as a >> fetcher, then we would create a temporary rootfs just to get all >> required packages for the project. >> >> Are there other possible solutions for this? > > The problem with this solution is that its not possible to create > multiple images with different sets of packages that share the version > of the all the common packages. > > An alternative solution is to employ a repository cacher that caches the > 'Packages.gz' of the first request. 
> This way it would also be faster than running multistrap one
> additional time just to fetch all required packages.
>
> Maybe apt-cacher-ng or something similar can be used for this.
> However, I am currently not sure how this can be integrated into the
> current build process. Some ideas? Maybe implementing a simple repo
> caching proxy that is integrated into Isar?
>
> The repository cacher would likely be a daemon running in parallel to
> multistrap that fetches everything multistrap requests into the
> DL_DIR. Maybe provide a 'clean_package_cache' task that deletes the
> cached 'Packages.gz', causing the next root fs build to use new
> package versions.
>
> I would really like to hear some feedback on this.

In our meeting today, it was discussed that we should collect all
requirements for this feature and discuss possible implementation ideas
based on those requirements.

Here are some requirements from my side:

1 If multiple different images with some common set of packages are
  built with one bitbake call, then all images should contain exactly
  the same version of every package they have in common with any of the
  other images.

2 The resulting image should only depend on the build environment and
  Isar metadata, not on the point in time it is built. This means that
  if the environment, including the downloads directory, is complete
  (for instance from an earlier build of the image), every following
  build of this image recipe should result in exactly the same packages
  installed on this image.

3 Binary and source packages should be part of the archival process.
  Source packages are useful in case some package needs to be patched
  at a later date. Binary packages are useful because building them
  from source packages is currently not 100% reproducible in Debian
  upstream. [1]

4 For development, it should be possible to easily reset the
  environment, triggering an upgrade of the packages on the next image
  build.

5 Deployable in CI environments. What exactly this entails should be
  further discussed. Here are some points:

  5.1 Possibility to use a download cache that is not bound to only
      one product/image/environment

  5.2 More than one build at the same time in one environment should
      be possible

6 Efficiency: the reproducibility feature should be as time and
  resource efficient as possible. E.g. the process should only fetch
  and store the required files.

7 Outputs a description file with the name and version of every
  package deployed/used in the image/environment.

To 5: Since I don't have much experience with CI systems, the
requirements mentioned here might not be correct.

Any comments or requirement additions are welcome.

Cheers,
Claudius

[1] https://tests.reproducible-builds.org/debian/reproducible.html

--
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

^ permalink raw reply	[flat|nested] 22+ messages in thread
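Requirement 7 above (a description file listing the name and version of every deployed package) could be met with a small post-processing step. A hedged sketch, assuming the package list comes in `dpkg-query`-style "name<TAB>version" lines (the input format is an assumption; any per-image package listing would do):

```python
# Sketch for requirement 7: turn a "name<TAB>version" package listing
# into a sorted, line-oriented manifest suitable for archiving next to
# the image. The input format is assumed, not prescribed by Isar.
def write_manifest(dpkg_output: str) -> str:
    entries = []
    for line in dpkg_output.strip().splitlines():
        name, version = line.split("\t")
        entries.append(f"{name} {version}")
    return "\n".join(sorted(entries)) + "\n"
```

Sorting makes the manifest stable across builds, so two builds can be compared with a plain diff.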
* Re: Reproducibility of builds 2017-08-28 11:27 ` Claudius Heine @ 2017-09-05 10:05 ` Alexander Smirnov 2017-09-05 10:38 ` Jan Kiszka 2017-09-05 11:54 ` Claudius Heine 0 siblings, 2 replies; 22+ messages in thread From: Alexander Smirnov @ 2017-09-05 10:05 UTC (permalink / raw) To: Claudius Heine, isar-users Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild On 08/28/2017 02:27 PM, Claudius Heine wrote: > Hi, > > On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote: >> Hi, >> >> On 08/03/2017 10:13 AM, Claudius Heine wrote: >>> Hi, >>> >>> am I right that Isar supports or should support reproducible root >>> file system build? >>> >>> If I understand correctly, when multistrap is called, it fetches >>> always the latest version of all packages from the debian repository >>> mirrors. Am I mistaken or is this feature still on the roadmap? >>> >>> I that is on the roadmap, how are you thinking of solving this issue? >>> >>> The openembedded way would be to seperate the fetch and 'install' >>> step and first download all packages into the DL_DIR and then use >>> them from there. Maybe we could create this pipeline: >>> >>> dpkg-binary Recipe: >>> >>> fetch deb file into downloads -> insert into local repository >>> >>> dpkg-source Recipe: >>> >>> fetch sources into downloads -> build packages -> insert into local >>> repository >>> >>> image Recipe: >>> >>> fetch all required packages into downloads -> insert all of them into >>> the local repository -> create root fs using only the local repository >>> >>> Multistrap provides a '--source-dir DIR' parameter, that stores all >>> installed packages into a directory. So if we would use that as a >>> fetcher, then we would create a temporary rootfs just to get all >>> required packages for the project. >>> >>> Are there other possible solutions for this? 
>> >> The problem with this solution is that its not possible to create >> multiple images with different sets of packages that share the version >> of the all the common packages. >> >> An alternative solution is to employ a repository cacher that caches >> the 'Packages.gz' of the first request. This way it would also be >> faster then running multistrap one additional time just to fetch all >> required packages. >> >> Maybe apt-cacher-ng or something similar can be used for this. >> However I am currently not sure how this can be integrated into the >> current build process. Some ideas? Maybe implementing a simple repo >> caching proxy that is integrated into isar? >> >> The repository cacher is likely a daemon running in parallel to >> multistrap and fetches everything to the DL_DIR that is requested by >> it. Maybe provide a 'clean_package_cache' task, that deletes the >> cached 'Packages.gz', causing the next root fs build to use new >> package versions. >> >> I would really like to hear some feedback on this. > > In our meeting today, it was discussed that we should collect all > requirements for this feature and discuss possible implementation ideas > based on those requirements. > > Here are some requirements from my side: > > 1 If multiple different images with some common set of packages are > build with one bitbake call, then all images should contain > exactly the same version of every package that it has in common > with any of the other images. > > 2 The resulting image should only depend on the build environment > and isar metadata, not on the point in time it is build. > This means if the environment, including the downloads directory, > is complete (for instance by an earlier build of the image), every > following build of this image recipe should result in exactly the > same packages installed on this image. > > 3 Binary and source packages should be part of the archival process. 
> Source packages are useful in case some package needs to be patched
> at a later date. Binary packages are useful because building them
> from source packages is currently not 100% reproducible in Debian
> upstream. [1]
>
> 4 For development, it should be possible to easily reset the
>   environment, triggering an upgrade of the packages on the next
>   image build.
>
> 5 Deployable in CI environments. What exactly this entails should be
>   further discussed. Here are some points:
>
>   5.1 Possibility to use a download cache that is not bound to only
>       one product/image/environment
>
>   5.2 More than one build at the same time in one environment should
>       be possible
>
> 6 Efficiency: the reproducibility feature should be as time and
>   resource efficient as possible. E.g. the process should only fetch
>   and store the required files.
>
> 7 Outputs a description file with the name and version of every
>   package deployed/used in the image/environment.
>
> To 5: Since I don't have much experience with CI systems, the
> requirements mentioned here might not be correct.
>
> Any comments or requirement additions are welcome.

Thank you for the requirements, they describe your use case quite well.
Unfortunately, at the moment I don't know all the capabilities of
multistrap/debootstrap, so I cannot propose much.

In general, I think there could be the following solutions:

- Create a local apt cache with specified package versions.
- Patch multistrap to add the capability to specify package versions.
- Add a hook to the multistrap hooks (for example, in configscript.sh)
  that will re-install desired package versions via apt-get.

--
With best regards,
Alexander Smirnov

ilbers GmbH
Baierbrunner Str. 28c
D-81379 Munich
+49 (89) 122 67 24-0
http://ilbers.de/
Commercial register Munich, HRB 214197
General manager: Baurzhan Ismagulov

^ permalink raw reply	[flat|nested] 22+ messages in thread
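The first and third of the proposals above both come down to pinning each package to a recorded version so that a later apt-get run inside a multistrap hook reproduces the same selection. A hedged illustration of how such a pin file could be generated as apt preferences stanzas (the package data is hypothetical; this is not an existing Isar mechanism):

```python
# Hedged illustration: render recorded package versions as an apt
# preferences file. Pin-Priority 1001 also allows downgrades, so a
# later 'apt-get install' converges on exactly these versions.
def apt_preferences(versions: dict) -> str:
    stanzas = []
    for name in sorted(versions):
        stanzas.append(
            f"Package: {name}\n"
            f"Pin: version {versions[name]}\n"
            f"Pin-Priority: 1001\n"
        )
    return "\n".join(stanzas)
```

The resulting text would be written to /etc/apt/preferences.d/ inside the target rootfs before any hook runs apt-get.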
* Re: Reproducibility of builds 2017-09-05 10:05 ` Alexander Smirnov @ 2017-09-05 10:38 ` Jan Kiszka 2017-09-05 11:50 ` Alexander Smirnov 2017-09-05 11:54 ` Claudius Heine 1 sibling, 1 reply; 22+ messages in thread From: Jan Kiszka @ 2017-09-05 10:38 UTC (permalink / raw) To: Alexander Smirnov, Claudius Heine, isar-users Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild On 2017-09-05 12:05, Alexander Smirnov wrote: > > > On 08/28/2017 02:27 PM, Claudius Heine wrote: >> Hi, >> >> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote: >>> Hi, >>> >>> On 08/03/2017 10:13 AM, Claudius Heine wrote: >>>> Hi, >>>> >>>> am I right that Isar supports or should support reproducible root >>>> file system build? >>>> >>>> If I understand correctly, when multistrap is called, it fetches >>>> always the latest version of all packages from the debian repository >>>> mirrors. Am I mistaken or is this feature still on the roadmap? >>>> >>>> I that is on the roadmap, how are you thinking of solving this issue? >>>> >>>> The openembedded way would be to seperate the fetch and 'install' >>>> step and first download all packages into the DL_DIR and then use >>>> them from there. Maybe we could create this pipeline: >>>> >>>> dpkg-binary Recipe: >>>> >>>> fetch deb file into downloads -> insert into local repository >>>> >>>> dpkg-source Recipe: >>>> >>>> fetch sources into downloads -> build packages -> insert into local >>>> repository >>>> >>>> image Recipe: >>>> >>>> fetch all required packages into downloads -> insert all of them >>>> into the local repository -> create root fs using only the local >>>> repository >>>> >>>> Multistrap provides a '--source-dir DIR' parameter, that stores all >>>> installed packages into a directory. So if we would use that as a >>>> fetcher, then we would create a temporary rootfs just to get all >>>> required packages for the project. >>>> >>>> Are there other possible solutions for this? 
>>> >>> The problem with this solution is that its not possible to create >>> multiple images with different sets of packages that share the >>> version of the all the common packages. >>> >>> An alternative solution is to employ a repository cacher that caches >>> the 'Packages.gz' of the first request. This way it would also be >>> faster then running multistrap one additional time just to fetch all >>> required packages. >>> >>> Maybe apt-cacher-ng or something similar can be used for this. >>> However I am currently not sure how this can be integrated into the >>> current build process. Some ideas? Maybe implementing a simple repo >>> caching proxy that is integrated into isar? >>> >>> The repository cacher is likely a daemon running in parallel to >>> multistrap and fetches everything to the DL_DIR that is requested by >>> it. Maybe provide a 'clean_package_cache' task, that deletes the >>> cached 'Packages.gz', causing the next root fs build to use new >>> package versions. >>> >>> I would really like to hear some feedback on this. >> >> In our meeting today, it was discussed that we should collect all >> requirements for this feature and discuss possible implementation >> ideas based on those requirements. >> >> Here are some requirements from my side: >> >> 1 If multiple different images with some common set of packages are >> build with one bitbake call, then all images should contain >> exactly the same version of every package that it has in common >> with any of the other images. >> >> 2 The resulting image should only depend on the build environment >> and isar metadata, not on the point in time it is build. >> This means if the environment, including the downloads directory, >> is complete (for instance by an earlier build of the image), every >> following build of this image recipe should result in exactly the >> same packages installed on this image. >> >> 3 Binary and source packages should be part of the archival process. 
>> Source packages are useful in case some package needs to be patched
>> at a later date. [...]
>>
>> Any comments or requirement additions are welcome.
>
> Thank you for the requirements, they describe your use case quite
> well. Unfortunately, at the moment I don't know all the capabilities
> of multistrap/debootstrap, so I cannot propose much.

Then I guess that needs to be explored further before we can decide
which path to go. Who could contribute to this?

Jan

> In general, I think there could be the following solutions:
>
> - Create a local apt cache with specified package versions.
> - Patch multistrap to add the capability to specify package versions.
> - Add a hook to the multistrap hooks (for example, in configscript.sh)
>   that will re-install desired package versions via apt-get.

--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-09-05 10:38 ` Jan Kiszka @ 2017-09-05 11:50 ` Alexander Smirnov 0 siblings, 0 replies; 22+ messages in thread From: Alexander Smirnov @ 2017-09-05 11:50 UTC (permalink / raw) To: Jan Kiszka, Claudius Heine, isar-users Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild On 09/05/2017 01:38 PM, Jan Kiszka wrote: > On 2017-09-05 12:05, Alexander Smirnov wrote: >> >> >> On 08/28/2017 02:27 PM, Claudius Heine wrote: >>> Hi, >>> >>> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote: >>>> Hi, >>>> >>>> On 08/03/2017 10:13 AM, Claudius Heine wrote: >>>>> Hi, >>>>> >>>>> am I right that Isar supports or should support reproducible root >>>>> file system build? >>>>> >>>>> If I understand correctly, when multistrap is called, it fetches >>>>> always the latest version of all packages from the debian repository >>>>> mirrors. Am I mistaken or is this feature still on the roadmap? >>>>> >>>>> I that is on the roadmap, how are you thinking of solving this issue? >>>>> >>>>> The openembedded way would be to seperate the fetch and 'install' >>>>> step and first download all packages into the DL_DIR and then use >>>>> them from there. Maybe we could create this pipeline: >>>>> >>>>> dpkg-binary Recipe: >>>>> >>>>> fetch deb file into downloads -> insert into local repository >>>>> >>>>> dpkg-source Recipe: >>>>> >>>>> fetch sources into downloads -> build packages -> insert into local >>>>> repository >>>>> >>>>> image Recipe: >>>>> >>>>> fetch all required packages into downloads -> insert all of them >>>>> into the local repository -> create root fs using only the local >>>>> repository >>>>> >>>>> Multistrap provides a '--source-dir DIR' parameter, that stores all >>>>> installed packages into a directory. So if we would use that as a >>>>> fetcher, then we would create a temporary rootfs just to get all >>>>> required packages for the project. >>>>> >>>>> Are there other possible solutions for this? 
>>>> >>>> The problem with this solution is that its not possible to create >>>> multiple images with different sets of packages that share the >>>> version of the all the common packages. >>>> >>>> An alternative solution is to employ a repository cacher that caches >>>> the 'Packages.gz' of the first request. This way it would also be >>>> faster then running multistrap one additional time just to fetch all >>>> required packages. >>>> >>>> Maybe apt-cacher-ng or something similar can be used for this. >>>> However I am currently not sure how this can be integrated into the >>>> current build process. Some ideas? Maybe implementing a simple repo >>>> caching proxy that is integrated into isar? >>>> >>>> The repository cacher is likely a daemon running in parallel to >>>> multistrap and fetches everything to the DL_DIR that is requested by >>>> it. Maybe provide a 'clean_package_cache' task, that deletes the >>>> cached 'Packages.gz', causing the next root fs build to use new >>>> package versions. >>>> >>>> I would really like to hear some feedback on this. >>> >>> In our meeting today, it was discussed that we should collect all >>> requirements for this feature and discuss possible implementation >>> ideas based on those requirements. >>> >>> Here are some requirements from my side: >>> >>> 1 If multiple different images with some common set of packages are >>> build with one bitbake call, then all images should contain >>> exactly the same version of every package that it has in common >>> with any of the other images. >>> >>> 2 The resulting image should only depend on the build environment >>> and isar metadata, not on the point in time it is build. >>> This means if the environment, including the downloads directory, >>> is complete (for instance by an earlier build of the image), every >>> following build of this image recipe should result in exactly the >>> same packages installed on this image. 
>>> 3 Binary and source packages should be part of the archival
>>>   process. Source packages are useful in case some package needs to
>>>   be patched at a later date. [...]
>>>
>>> Any comments or requirement additions are welcome.
>>
>> Thank you for the requirements, they describe your use case quite
>> well. Unfortunately, at the moment I don't know all the capabilities
>> of multistrap/debootstrap, so I cannot propose much.
>
> Then I guess that needs to be explored further before we can decide
> which path to go. Who could contribute to this?

I think whoever investigates the options should also implement this.
It's a complete feature, so I can't handle it right now due to my load
with other features.

Alex

>> In general, I think there could be the following solutions:
>>
>> - Create a local apt cache with specified package versions.
>> - Patch multistrap to add the capability to specify package versions.
>> - Add a hook to the multistrap hooks (for example, in
>>   configscript.sh) that will re-install desired package versions via
>>   apt-get.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-09-05 10:05 ` Alexander Smirnov 2017-09-05 10:38 ` Jan Kiszka @ 2017-09-05 11:54 ` Claudius Heine 2017-09-06 13:39 ` Claudius Heine 1 sibling, 1 reply; 22+ messages in thread From: Claudius Heine @ 2017-09-05 11:54 UTC (permalink / raw) To: Alexander Smirnov, isar-users Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild Hi, On 09/05/2017 12:05 PM, Alexander Smirnov wrote: > > > On 08/28/2017 02:27 PM, Claudius Heine wrote: >> Hi, >> >> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote: >>> Hi, >>> >>> On 08/03/2017 10:13 AM, Claudius Heine wrote: >>>> Hi, >>>> >>>> am I right that Isar supports or should support reproducible root >>>> file system build? >>>> >>>> If I understand correctly, when multistrap is called, it fetches >>>> always the latest version of all packages from the debian repository >>>> mirrors. Am I mistaken or is this feature still on the roadmap? >>>> >>>> I that is on the roadmap, how are you thinking of solving this issue? >>>> >>>> The openembedded way would be to seperate the fetch and 'install' >>>> step and first download all packages into the DL_DIR and then use >>>> them from there. Maybe we could create this pipeline: >>>> >>>> dpkg-binary Recipe: >>>> >>>> fetch deb file into downloads -> insert into local repository >>>> >>>> dpkg-source Recipe: >>>> >>>> fetch sources into downloads -> build packages -> insert into local >>>> repository >>>> >>>> image Recipe: >>>> >>>> fetch all required packages into downloads -> insert all of them >>>> into the local repository -> create root fs using only the local >>>> repository >>>> >>>> Multistrap provides a '--source-dir DIR' parameter, that stores all >>>> installed packages into a directory. So if we would use that as a >>>> fetcher, then we would create a temporary rootfs just to get all >>>> required packages for the project. >>>> >>>> Are there other possible solutions for this? 
>>> >>> The problem with this solution is that its not possible to create >>> multiple images with different sets of packages that share the >>> version of the all the common packages. >>> >>> An alternative solution is to employ a repository cacher that caches >>> the 'Packages.gz' of the first request. This way it would also be >>> faster then running multistrap one additional time just to fetch all >>> required packages. >>> >>> Maybe apt-cacher-ng or something similar can be used for this. >>> However I am currently not sure how this can be integrated into the >>> current build process. Some ideas? Maybe implementing a simple repo >>> caching proxy that is integrated into isar? >>> >>> The repository cacher is likely a daemon running in parallel to >>> multistrap and fetches everything to the DL_DIR that is requested by >>> it. Maybe provide a 'clean_package_cache' task, that deletes the >>> cached 'Packages.gz', causing the next root fs build to use new >>> package versions. >>> >>> I would really like to hear some feedback on this. >> >> In our meeting today, it was discussed that we should collect all >> requirements for this feature and discuss possible implementation >> ideas based on those requirements. >> >> Here are some requirements from my side: >> >> 1 If multiple different images with some common set of packages are >> build with one bitbake call, then all images should contain >> exactly the same version of every package that it has in common >> with any of the other images. >> >> 2 The resulting image should only depend on the build environment >> and isar metadata, not on the point in time it is build. >> This means if the environment, including the downloads directory, >> is complete (for instance by an earlier build of the image), every >> following build of this image recipe should result in exactly the >> same packages installed on this image. >> >> 3 Binary and source packages should be part of the archival process. 
>> Source packages are useful in case some package needs to be patched
>> at a later date. [...]
>>
>> Any comments or requirement additions are welcome.
>
> Thank you for the requirements, they describe your use case quite
> well. Unfortunately, at the moment I don't know all the capabilities
> of multistrap/debootstrap, so I cannot propose much.
>
> In general, I think there could be the following solutions:
>
> - Create a local apt cache with specified package versions.
> - Patch multistrap to add the capability to specify package versions.
> - Add a hook to the multistrap hooks (for example, in configscript.sh)
>   that will re-install desired package versions via apt-get.

My solution is a bit different and does not require patching
multistrap; it should also work with other bootstrapping mechanisms.
(AFAIK it should be possible to change the bootstrapping mechanism at a
later date, since the multistrap project is dead.)
I started implementing an HTTP proxy in Python that caches all requests for '/pool/' and '/dists/' URIs in separate directories. 'dists' is part of the build environment, while 'pool' contains all the packages and should be part of the download directory. I am currently not actively working on this proxy because of other tasks with higher priority, but I can give you access to it if you like. On my TODO list for this is: - Port to asyncio (with a simple HTTP implementation). The proxy is currently single-threaded and can only handle one connection at a time. Porting to asyncio is possible, but since the Python standard library does not provide an HTTP implementation based on asyncio, a small HTTP implementation has to be written on top of it as well. - Integrate into bitbake/isar as a scripts/lib and a bbclass. To ease early development I implemented this proxy outside of bitbake, but with the idea of integrating it into bitbake at a later date. This should be easily doable via two tasks: one that starts the proxy and one that shuts it down. Maybe also add a shutdown via a bitbake event, so that the proxy is shut down regardless of which tasks were handled. Or do it completely via bitbake events. The current proxy limits repositories to the HTTP protocol. HTTPS repositories might be possible as well, but there it would be necessary to break the SSL chain. Claudius -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de ^ permalink raw reply [flat|nested] 22+ messages in thread
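The routing logic of the proxy described above (requests for '/pool/' go into the shared download directory, requests for '/dists/' into per-build state) could be sketched as follows. This is an illustrative sketch only; the function name and return values are hypothetical and not taken from the actual implementation:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def cache_target(uri):
    """Map a repository request URI to a cache location.

    '/pool/'  holds the package files themselves; they would go into the
              shared download directory (DL_DIR) and can be reused
              across builds.
    '/dists/' holds the index files (Packages.gz etc.) that pin the
              package versions of one build environment.
    Anything else is passed through uncached.
    """
    parts = PurePosixPath(urlsplit(uri).path).parts
    if "pool" in parts:
        return "pool"
    if "dists" in parts:
        return "dists"
    return None
```

A real proxy would additionally serve a file from the cache directory when it is already present and only forward the request upstream on a miss.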
* Re: Reproducibility of builds 2017-09-05 11:54 ` Claudius Heine @ 2017-09-06 13:39 ` Claudius Heine 0 siblings, 0 replies; 22+ messages in thread From: Claudius Heine @ 2017-09-06 13:39 UTC (permalink / raw) To: Alexander Smirnov, isar-users Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild Hi, On 09/05/2017 01:54 PM, [ext] Claudius Heine wrote: > Hi, > > On 09/05/2017 12:05 PM, Alexander Smirnov wrote: >> >> >> On 08/28/2017 02:27 PM, Claudius Heine wrote: >>> Hi, >>> >>> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote: >>>> Hi, >>>> >>>> On 08/03/2017 10:13 AM, Claudius Heine wrote: >>>>> Hi, >>>>> >>>>> am I right that Isar supports or should support reproducible root >>>>> file system build? >>>>> >>>>> If I understand correctly, when multistrap is called, it fetches >>>>> always the latest version of all packages from the debian >>>>> repository mirrors. Am I mistaken or is this feature still on the >>>>> roadmap? >>>>> >>>>> I that is on the roadmap, how are you thinking of solving this issue? >>>>> >>>>> The openembedded way would be to seperate the fetch and 'install' >>>>> step and first download all packages into the DL_DIR and then use >>>>> them from there. Maybe we could create this pipeline: >>>>> >>>>> dpkg-binary Recipe: >>>>> >>>>> fetch deb file into downloads -> insert into local repository >>>>> >>>>> dpkg-source Recipe: >>>>> >>>>> fetch sources into downloads -> build packages -> insert into local >>>>> repository >>>>> >>>>> image Recipe: >>>>> >>>>> fetch all required packages into downloads -> insert all of them >>>>> into the local repository -> create root fs using only the local >>>>> repository >>>>> >>>>> Multistrap provides a '--source-dir DIR' parameter, that stores all >>>>> installed packages into a directory. So if we would use that as a >>>>> fetcher, then we would create a temporary rootfs just to get all >>>>> required packages for the project. >>>>> >>>>> Are there other possible solutions for this? 
>>>> >>>> The problem with this solution is that its not possible to create >>>> multiple images with different sets of packages that share the >>>> version of the all the common packages. >>>> >>>> An alternative solution is to employ a repository cacher that caches >>>> the 'Packages.gz' of the first request. This way it would also be >>>> faster then running multistrap one additional time just to fetch all >>>> required packages. >>>> >>>> Maybe apt-cacher-ng or something similar can be used for this. >>>> However I am currently not sure how this can be integrated into the >>>> current build process. Some ideas? Maybe implementing a simple repo >>>> caching proxy that is integrated into isar? >>>> >>>> The repository cacher is likely a daemon running in parallel to >>>> multistrap and fetches everything to the DL_DIR that is requested by >>>> it. Maybe provide a 'clean_package_cache' task, that deletes the >>>> cached 'Packages.gz', causing the next root fs build to use new >>>> package versions. >>>> >>>> I would really like to hear some feedback on this. >>> >>> In our meeting today, it was discussed that we should collect all >>> requirements for this feature and discuss possible implementation >>> ideas based on those requirements. >>> >>> Here are some requirements from my side: >>> >>> 1 If multiple different images with some common set of packages are >>> build with one bitbake call, then all images should contain >>> exactly the same version of every package that it has in common >>> with any of the other images. >>> >>> 2 The resulting image should only depend on the build environment >>> and isar metadata, not on the point in time it is build. >>> This means if the environment, including the downloads directory, >>> is complete (for instance by an earlier build of the image), >>> every >>> following build of this image recipe should result in exactly the >>> same packages installed on this image. 
>>> >>> 3 Binary and source packages should be part of the archival >>> process. >>> Source packages are useful in case some package needs to be >>> patched at a later date. Binary packages are useful, because >>> building them from source packages is currently not 100% >>> reproducible in Debian upstream. [1] >>> >>> 4 For development, it should be possible to easily reset the >>> environment, triggering an upgrade of the packages on the next >>> image build. >>> >>> 5 Deployable in CI environments. What those are exactly should be >>> further discussed. Here are some: >>> >>> 5.1 Possibility to use a download cache, that is not bound to >>> only >>> one product/image/environment >>> >>> 5.2 More than one build at the same time in one environment >>> should >>> be possible >>> >>> 6 Efficiency: The reproducibility feature should be time and >>> resource efficient as possible. E.g. Process should only fetch >>> and >>> store the required files. >>> >>> 7 Outputs a description file with the name and version of every >>> package deployed/used in the image/environment. 8 Use this description and/or an archive file to restore the environment state on a fresh directory so that the same image can be recreated. >>> >>> To 5: Since I don't have much experience with CI systems, >>> requirements mentioned here might not be correct. >>> >>> Any comment or requirement additions are welcome. >> >> Thank you for the requirements, they quite good describe your usecase. >> Unfortunately, ATM I don't know all the capabilities of >> multistrap/debootstrap, so could not propose too much. >> >> In general, I think there could be following solutions: >> >> - Create local apt cache with specified packages versions. >> - Patch multistrap to add capabilities to specify package versions. >> - Add hook to multistrap hooks (for example, in configscript.sh), >> that will re-install desired package versions via apt-get. 
> > My solution is a bit different and does not require patching multistrap, > should also work with other bootstraping mechanism. (AFAIK it should be > possible to change the bootstraping mechanism at a later date, since the > multistrap project is dead.) > > I started implementing a http proxy in python, that caches all requests > of '/pool/' and '/dists/' uris in seperate directories. 'dists' is part > of the build environment while 'pool' contains all the packages and > should be part of the download directory. > > I am currently not actively working on this proxy, because of other > tasks with higher priority, but I can give you access to it if you like. > > On my TODO list for this is: > > - Port to asyncio (with a simple http implementation) > This proxy is currently single threaded and can only handle one > connection at a time. Porting to asyncio is possible, but since the > python standard library does not provide a http implementation based > on asyncio a small http implementation based on this has to be > implemented as well. > - Integrate into bitbake/isar as a scripts/lib and a bbclass > To ease early development I implemented this proxy outside of > bitbake, but with the idea to integrate it into bitbake at a later > date. It should be easily doable to integrate this into bitbake via > two tasks. One that starts the proxy, and one that shuts it down. > Maybe add a shutdown via a bitbake event as well, so that it will be > shut down regardless of the tasks handled. Or do it completely via > bitbake events. > > The current proxy limits repositories to the http protocol. But maybe > its possible to have https proxies as well, but there its necessary to > break the ssl chain. The next point of my list would be the save and restore functionality. This would be necessary to reproduce a build with a fresh build environment. There are a couple of ways to do this. 
Here are some that are currently on my mind: * Just create a tarball of the 'dists' and 'pool' directories, archive that, and import it into the respective directories in the fresh environment. This might not be resource efficient, because the pool could contain packages that are not used in the image. * Log requested files in the proxy and use this list afterwards to create an archive that can be used to recreate the proxy directories. This cannot be done in an image recipe, but has to be done just before bitbake finishes, because the archive should contain not only the packages that are used in one image, but all the packages that are used in one bitbake build run. * Use the 'source directory' feature of multistrap to create a directory containing all used packages for an image and use these packages to create an independent repository. This repo is then used as the "upstream repo" in later builds. If multistrap is no longer used, extract all these packages from the apt cache in the created root file system to emulate this multistrap feature. And some other variations of those three ideas. I currently have no concrete idea how to archive the source packages yet. Since the mapping between binary and source packages is not bijective, it's not trivial, and dpkg & apt need to be used to fetch them from the repositories. Cheers, Claudius -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de ^ permalink raw reply [flat|nested] 22+ messages in thread
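The second idea above (logging requested files in the proxy and archiving them at the end of the bitbake run) could be sketched like this. `plan_archive` is a hypothetical helper, not part of any existing tool:

```python
def plan_archive(pool_files, request_log):
    """Split the proxy's request log into files that can be archived and
    files missing from the pool directory.

    A missing file means the cache is incomplete and a rebuild from the
    archive would not be reproducible, so an archiving task should fail
    when the second list is non-empty.
    """
    available = set(pool_files)
    requested = set(request_log)
    to_archive = sorted(requested & available)
    missing = sorted(requested - available)
    return to_archive, missing
```

The actual archiving step would then put only the files in `to_archive` into a tarball, avoiding the resource inefficiency of archiving the whole pool.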
* Re: Reproducibility of builds 2017-08-03 8:13 Reproducibility of builds Claudius Heine 2017-08-21 11:23 ` Claudius Heine @ 2017-09-18 15:05 ` Baurzhan Ismagulov 2017-09-19 8:55 ` Claudius Heine 2017-11-14 16:04 ` Christian Storm 2 siblings, 1 reply; 22+ messages in thread From: Baurzhan Ismagulov @ 2017-09-18 15:05 UTC (permalink / raw) To: isar-users Hello Claudius, thanks much for sharing the concept and the requirements! Let's check whether I understand your concept correctly. Assume we have minimal stretch binaries + hello source. Your concept provides for: 1. Start bitbake, which: 1. Downloads debs to be installed and adds them into a local apt repo. 2. Fetches hello sources, builds them, and copies the deb to the local apt repo. 3. Bootstraps Debian and hello binary debs from the local apt repo. Please correct me if I got anything wrong. I like this workflow. This is what is used in Isar's predecessor. There, it is implemented manually using debmirror. The reason why Isar installs Debian packages from apt repos in the Internet is to give first-time users a setup working OOTB. If they start developing a product, they are expected to create their own apt repos. See e.g. http://events.linuxfoundation.org/sites/events/files/slides/isar-elce-2016_1.pdf, slides 19 ("Debian apt" repo) and 26 ("Create repos for all components: Debian..."). That is why I was originally thinking about a tool that would support this manual workflow. After pondering on your proposal, I think it makes sense. I'd like to see the following features: * The functionality is implemented in a standalone tool usable manually or from bitbake. * The functionality is implemented based on dry-run output of {deboot,multi,...}strap. * The feature can be turned off in Isar's local configuration. * The tool supports initial mirroring as well as an update. This should also be controllable in Isar's local config. What I don't like is the implementation via an HTTP proxy.
IMHO, it's too indirect for the task (why bother with dynamic proxying if the list of packages is defined statically in a given apt repo). It supports only one of apt's six fetch methods (such as https, file, ssh, etc., see sources.list(5); more could be defined in the future or in special environments). The implementation is going to be complex, since it needs to distinguish between different build process chains in the same environment (two bitbakes running in a single docker). It should be trivial to get a list of packages from multistrap. The same functionality is available in debootstrap, when we move to it. Mirroring could be done by an existing or a new tool. The latter may be a step to identify requirements and get experience with the workflow before integrating the functionality into the former (possibly upon feedback from the Debian community). Archiving of the apt repo is a CM issue outside of Isar. For reproducing older versions, it should be managed in an SCM (e.g., git). Synchronization between the right product and apt repo revisions is also outside Isar and could be solved e.g. with kas. Or, one goes hard-core and commits apt stuff into the product repo. In the future, we might come up with a better solution for archiving and version pinning; at this stage I'd like to utilize existing Debian means first before going further. The details of the pinning concept would be affected by the bitbake debian/control backend implementation. Similarly, at this stage I don't address advanced issues like sharing modified and non-modified apt repos, which could be implemented by a KISS jessie/Packages and myjessie/Packages with the shared pool. If we have many of them in practice (which I doubt), we could still return to the issue. Some comments below. On Thu, Aug 03, 2017 at 10:13:12AM +0200, Claudius Heine wrote: > am I right that Isar supports or should support reproducible root file > system build? Yes, this is possible outside of Isar.
We wish that Isar makes that easier. > If I understand correctly, when multistrap is called, it fetches always the > latest version of all packages from the debian repository mirrors. Am I > mistaken or is this feature still on the roadmap? In the sense I interpret your wording, yes, multistrap always fetches the latest version of all packages from the Debian repo. That said, for a given repo, there is only one version of every package, defined in the Packages file. It is the latest for that repo. Given a URI in the Internet, multistrap always fetches its "latest" (and only) Packages file and installs the "latest" (and only) package versions. With kind regards, Baurzhan. ^ permalink raw reply [flat|nested] 22+ messages in thread
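The point above, that a given Packages file defines exactly one version per package name, can be illustrated by parsing such an index into a name-to-version map. This is a simplified sketch with an embedded sample index; real code should use the deb822 parser from python-debian rather than this naive stanza split:

```python
SAMPLE_INDEX = """\
Package: hello
Version: 2.10-1
Architecture: amd64

Package: bash
Version: 4.4-5
Architecture: amd64
"""


def parse_packages_index(text):
    """Parse an apt 'Packages' index into {package name: version}.

    Stanzas are separated by blank lines; within one snapshot of a repo
    each package name occurs once, so the mapping is unambiguous.
    """
    versions = {}
    for stanza in text.strip().split("\n\n"):
        fields = dict(
            line.split(": ", 1)
            for line in stanza.splitlines()
            if ": " in line and not line.startswith(" ")
        )
        if "Package" in fields:
            versions[fields["Package"]] = fields.get("Version")
    return versions
```

Caching this one file, as the proxy proposal does, is therefore what pins all package versions for a build.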
* Re: Reproducibility of builds 2017-09-18 15:05 ` Baurzhan Ismagulov @ 2017-09-19 8:55 ` Claudius Heine 0 siblings, 0 replies; 22+ messages in thread From: Claudius Heine @ 2017-09-19 8:55 UTC (permalink / raw) To: isar-users Hi Baurzhan, On 09/18/2017 05:05 PM, Baurzhan Ismagulov wrote: > What I don't like is the implementation via a http proxy. IMHO, it's too > indirect for the task (why bother with dynamic proxying if the list of packages > is defined statically in a given apt repo). Only if someone bothers to create a separate Debian mirror repository for every product. That uses far more resources. It would be much easier to have a global package cache and a project-local package index for it. IMO that would only be possible with a caching repo proxy. > It supports only one of apt's six > fetch methods (such as https, file, ssh, etc., see sources.list(5), more could > be defined in the future or in special environments). At first, yes. I started with an HTTP proxy because it is the easiest to implement. It's always possible to add functionality to the proxy to support other fetch methods if necessary. But IMO that's not really that important. > The implementation is > going to be complex, since it needs to distinguish between different build > process chains in the same environment (two bitbakes running in a single > docker). Why? We have more than one port available, so we can run more than one proxy simultaneously, one per build. My current implementation just chooses a free port and makes it available to the calling process. > It should be trivial to get a list of packages from multistrap. The same > functionality is available in debootstrap, when we move to it. The problem is we still need to use apt in the buildchroot to install additional build dependencies for each recipe. Those are not part of what multistrap/debootstrap lists out. But since they would go through the HTTP proxy, they would be part of the static package cache.
> Mirroring could > be done by an existing or a new tool. The latter may be a step to identify > requirements and get experience with the workflow before integrating the > functionality into the former (possibly upon feedback from Debian community). As I said, I don't see the sense in creating a full Debian mirror for every project. And partial mirrors are difficult to create, because multistrap/debootstrap (in the case of the buildchroot) doesn't know about every package that is added to the image. > > Archiving of the apt repo is a CM issue outside of Isar. For reproducing older > versions, it should be managed in an SCM (e.g., git). That should be possible. Just archive the package index in a git repo and the packages in a git-lfs repo. > Synchronization between > the right product and apt repo revisions is also outside Isar and could be > solved e.g. with kas. I never said that it is. But Isar is responsible for providing ways to import/export some kind of package list into a build. > Or, one goes hard-core and commits apt stuff into the > product repo. That might depend on your 'product' definition, but for me a product is not an image. So products can have varying package versions, while images obviously don't. Committing them together with products therefore makes no sense to me. But committing them together with the final image, with a reference to the used refspec of the product repository, makes more sense. > In the future, we might come with a better solution for archiving > and version pinning; at this stage I'd like to utilize existing Debian means > first before going further. The details of the pinning concept would be > affected by bitbake debian/control backend implementation. I said nothing about pinning, because IMO package updates etc. should still be possible on the target if wanted. But we should be able to recreate images at least from a package list. So apt package pinning is just a different solution for a different problem.
If you mean pinning just in the bootstrapping phase, then yes, that would be nice. But I don't know how that can solve the buildchroot problem. Also, since the package index contains just one version of each package, I don't see how it would be possible to pin packages to an older version at this stage, because those would no longer be available in the index and *bootstrap would not know where to fetch them. AFAIK Debian currently has no convenient means for solving these issues. I used apt-cacher-ng when I worked with elbe, but setting that up for every project separately is a big hassle. I want an easy solution where this stuff is done inside the normal bitbake process, so that not every developer has to wire up her own process of building root file systems. Because if they have to build their own, most developers don't care about it, or it becomes impossible to recreate images because a couple of unknown software packages, in some unknown version and with unknown configuration, are necessary for it. So it's important to use normal upstream package mirrors and have a process in place inside bitbake that cares about these issues transparently. IMO it's important not to have too many options and unneeded complexity. So reproducibility should be the default, and everyone is free to update/clear the package index manually via a single bitbake task. Cheers, Claudius -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-08-03 8:13 Reproducibility of builds Claudius Heine 2017-08-21 11:23 ` Claudius Heine 2017-09-18 15:05 ` Baurzhan Ismagulov @ 2017-11-14 16:04 ` Christian Storm 2017-11-14 16:22 ` Claudius Heine 2 siblings, 1 reply; 22+ messages in thread From: Christian Storm @ 2017-11-14 16:04 UTC (permalink / raw) To: isar-users Hi, since I'm very interested in this feature, I'd like to resume this discussion and to eventually come to an agreed upon proposal on how to implement it. So, without further ado, here are my thoughts on the subject: Regardless of the concrete technical implementation, I guess we can agree on the need for a local cache/repository/store in which the Debian binary packages plus their sources have to be stored since one may not rely on the availability of those files online for eternity. These files in this cache/repository/store are the union of the Debian binary packages installed in the resulting image plus their sources as well as those installed in the buildchroot plus their sources. The latter is required to be able to rebuild Debian packages built from source with the same compiler version, libraries, -dev packages, etc. pp. Having the cache/repository/store at hand, there should be a mechanism to prime Isar with it, i.e., Isar should only and exclusively use Debian binary packages and sources from this cache/repository/store. This is again, irrespective of the technical implementation, be it via a repository cache or other means like git, a proxy server or whatsoever. Granted, if one changes, e.g, IMAGE_INSTALL_append, the build fails but does so rightfully as the set of packages is modified, resulting in a new version/epoch (=set of Debian packages plus their sources). So, there should be a convenient "interface" provided by Isar to maintain the cache/repository/store. For example, one may want to have different versions/epochs that may correspond to particular versions (git sha) of the Isar layer. 
Or one wants to later add a Debian package plus its source (which is automatically fetched), resulting in a new version/epoch etc. The remaining question is how to fill the cache/repository/store. In order to have a consistent version/epoch (=set of Debian packages plus their sources), there should not be duplicate packages in it, i.e., the same Debian package but with different versions. This could currently happen because there is a "window of vulnerability": multistrap is run twice, once for isar-image-base.bb and once for buildchroot.bb. In between those two runs, the Debian mirror used could get updated, resulting in a different version of the Debian package being installed in buildchroot than in the resulting image. This is an inherent problem of relying on the Debian way of distributing packages as one cannot a priori control what particular package versions one gets: In contrast to, e.g., Yocto where the particular package versions are specified in the recipes, this does not hold for Isar as the particular package versions are defined by the Debian mirror used, hence, one gets "injected" the particular package versions. So, what's required to reduce the "window of vulnerability" and to have a consistent cache/repository/store for a particular version/epoch is to make a snapshot-type download of the required packages. For this, of course, one needs to know the concrete set of packages. This list could be delivered by a "package trace" Isar run since not only multistrap does install packages but sprinkled apt-get install commands do as well. Thereafter, knowing the list, the snapshot-type download can happen, hopefully resulting in a consistent cache/repository/store. So, what do you think? Besten Gruß, Christian -- Dr. Christian Storm Siemens AG, Corporate Technology, CT RDA ITP SES-DE Otto-Hahn-Ring 6, 81739 München, Germany ^ permalink raw reply [flat|nested] 22+ messages in thread
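The "window of vulnerability" described above, the mirror updating between the two multistrap runs, would show up as version skew between buildchroot and image. A consistency check over the two package lists might look like the following sketch, assuming both lists are available as name-to-version maps (`version_skew` is a hypothetical helper):

```python
def version_skew(buildchroot_pkgs, image_pkgs):
    """Return packages installed in both environments but with different
    versions.

    A non-empty result means the mirror changed between the two
    bootstrap runs, so the cache/repository/store would contain
    duplicate packages and the build is not internally consistent.
    """
    common = buildchroot_pkgs.keys() & image_pkgs.keys()
    return {
        name: (buildchroot_pkgs[name], image_pkgs[name])
        for name in common
        if buildchroot_pkgs[name] != image_pkgs[name]
    }
```

Such a check could run after the "package trace" pass and trigger a re-download when skew is detected.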
* Re: Reproducibility of builds 2017-11-14 16:04 ` Christian Storm @ 2017-11-14 16:22 ` Claudius Heine 2017-11-17 16:53 ` [ext] Christian Storm 0 siblings, 1 reply; 22+ messages in thread From: Claudius Heine @ 2017-11-14 16:22 UTC (permalink / raw) To: [ext] Christian Storm; +Cc: isar-users Hi Christian, On 11/14/2017 05:04 PM, [ext] Christian Storm wrote: > Hi, > > since I'm very interested in this feature, I'd like to resume this > discussion and to eventually come to an agreed upon proposal on how > to implement it. So, without further ado, here are my thoughts on > the subject: > > Regardless of the concrete technical implementation, I guess we can > agree on the need for a local cache/repository/store in which the Debian > binary packages plus their sources have to be stored since one may not > rely on the availability of those files online for eternity. > > These files in this cache/repository/store are the union of the Debian > binary packages installed in the resulting image plus their sources as > well as those installed in the buildchroot plus their sources. > The latter is required to be able to rebuild Debian packages built from > source with the same compiler version, libraries, -dev packages, etc. pp. > > Having the cache/repository/store at hand, there should be a mechanism > to prime Isar with it, i.e., Isar should only and exclusively use Debian > binary packages and sources from this cache/repository/store. > This is again, irrespective of the technical implementation, be it via > a repository cache or other means like git, a proxy server or whatsoever. > > Granted, if one changes, e.g, IMAGE_INSTALL_append, the build fails but > does so rightfully as the set of packages is modified, resulting in a > new version/epoch (=set of Debian packages plus their sources). So, > there should be a convenient "interface" provided by Isar to maintain > the cache/repository/store. 
For example, one may want to have different > versions/epochs that may correspond to particular versions (git sha) of > the Isar layer. Or one wants to later add a Debian package plus its > source (which is automatically fetched), resulting in a new > version/epoch etc. > > The remaining question is how to fill the cache/repository/store. In > order to have a consistent version/epoch (=set of Debian packages plus > their sources), there should not be duplicate packages in it, i.e., the > same Debian package but with different versions. > This could currently happen because there is a "window of vulnerability": > multistrap is run twice, once for isar-image-base.bb and once for > buildchroot.bb. In between those two runs, the Debian mirror used could > get updated, resulting in a different version of the Debian package > being installed in buildchroot than in the resulting image. > This is an inherent problem of relying on the Debian way of distributing > packages as one cannot a priori control what particular package versions > one gets: In contrast to, e.g., Yocto where the particular package > versions are specified in the recipes, this does not hold for Isar as > the particular package versions are defined by the Debian mirror used, > hence, one gets "injected" the particular package versions. > So, what's required to reduce the "window of vulnerability" and to have > a consistent cache/repository/store for a particular version/epoch is to > make a snapshot-type download of the required packages. For this, of > course, one needs to know the concrete set of packages. This list could > be delivered by a "package trace" Isar run since not only multistrap > does install packages but sprinkled apt-get install commands do as well. > Thereafter, knowing the list, the snapshot-type download can happen, > hopefully resulting in a consistent cache/repository/store. > > > So, what do you think? I agree with your formulation of the problem here. 
Simple tracing of installed packages will have the problem you described: it's possible that different versions of a package are installed into buildchroot and image. So this trace needs to be cleaned up, and then, based on that, the whole process has to be started again to create a consistent package list between buildchroot and image. This doubles the build time in the trivial implementation. With my suggestion of using a caching proxy, this could be solved without any additional overhead. I do have other ideas for doing this, but they would restructure most of Isar. Cheers, Claudius -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-11-14 16:22 ` Claudius Heine @ 2017-11-17 16:53 ` [ext] Christian Storm 2017-11-17 18:14 ` Claudius Heine 0 siblings, 1 reply; 22+ messages in thread From: [ext] Christian Storm @ 2017-11-17 16:53 UTC (permalink / raw) To: isar-users > > since I'm very interested in this feature, I'd like to resume this > > discussion and to eventually come to an agreed upon proposal on how > > to implement it. So, without further ado, here are my thoughts on > > the subject: > > > > Regardless of the concrete technical implementation, I guess we can > > agree on the need for a local cache/repository/store in which the Debian > > binary packages plus their sources have to be stored since one may not > > rely on the availability of those files online for eternity. > > > > These files in this cache/repository/store are the union of the Debian > > binary packages installed in the resulting image plus their sources as > > well as those installed in the buildchroot plus their sources. > > The latter is required to be able to rebuild Debian packages built from > > source with the same compiler version, libraries, -dev packages, etc. pp. > > > > Having the cache/repository/store at hand, there should be a mechanism > > to prime Isar with it, i.e., Isar should only and exclusively use Debian > > binary packages and sources from this cache/repository/store. > > This is again, irrespective of the technical implementation, be it via > > a repository cache or other means like git, a proxy server or whatsoever. > > > > Granted, if one changes, e.g, IMAGE_INSTALL_append, the build fails but > > does so rightfully as the set of packages is modified, resulting in a > > new version/epoch (=set of Debian packages plus their sources). So, > > there should be a convenient "interface" provided by Isar to maintain > > the cache/repository/store. 
For example, one may want to have different > > versions/epochs that may correspond to particular versions (git sha) of > > the Isar layer. Or one wants to later add a Debian package plus its > > source (which is automatically fetched), resulting in a new > > version/epoch etc. > > > > The remaining question is how to fill the cache/repository/store. In > > order to have a consistent version/epoch (=set of Debian packages plus > > their sources), there should not be duplicate packages in it, i.e., the > > same Debian package but with different versions. > > This could currently happen because there is a "window of vulnerability": > > multistrap is run twice, once for isar-image-base.bb and once for > > buildchroot.bb. In between those two runs, the Debian mirror used could > > get updated, resulting in a different version of the Debian package > > being installed in buildchroot than in the resulting image. > > This is an inherent problem of relying on the Debian way of distributing > > packages as one cannot a priori control what particular package versions > > one gets: In contrast to, e.g., Yocto where the particular package > > versions are specified in the recipes, this does not hold for Isar as > > the particular package versions are defined by the Debian mirror used, > > hence, one gets "injected" the particular package versions. > > So, what's required to reduce the "window of vulnerability" and to have > > a consistent cache/repository/store for a particular version/epoch is to > > make a snapshot-type download of the required packages. For this, of > > course, one needs to know the concrete set of packages. This list could > > be delivered by a "package trace" Isar run since not only multistrap > > does install packages but sprinkled apt-get install commands do as well. > > Thereafter, knowing the list, the snapshot-type download can happen, > > hopefully resulting in a consistent cache/repository/store. > > > > > > So, what do you think? 
> > I agree with your formulation of the problem here. > > Simple tracing of installed packages will have the problem you > described, that its possible that different versions of a package are > installed into buildchroot and image. So this trace needs to be cleaned > up and then based on that the whole process has to be started again to > create a consistent package list between buildchroot and image. This > doubles the build time in the trivial implementation. Sure, there's no free lunch here :) I'd rather strive for a good solution and avoid trivial implementations to make lunch as close to free as it gets, to stay in the picture. > With my suggestion of using a caching proxy, this could be solved > without any additional overhead. Could be the case, what are the drawbacks? What proxy do you propose to use? Maybe I missed something on the proxy suggestion... Could you please elaborate on this? > I do have other ideas to do this, but that would restructure most of isar. Well, at least speaking for myself, I'd like to hear those as I consider this feature to be essential. Choice in solutions is always good :) Kind regards, Christian -- Dr. Christian Storm Siemens AG, Corporate Technology, CT RDA ITP SES-DE Otto-Hahn-Ring 6, 81739 München, Germany ^ permalink raw reply [flat|nested] 22+ messages in thread
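The version skew between the two multistrap runs that both sides describe could be detected from two package traces. A hypothetical sketch — the trace format and function name are illustrative, not something Isar provides:

```python
# Hypothetical sketch: detect the "window of vulnerability" by comparing
# the package trace of the buildchroot run with that of the image run.
def find_version_conflicts(trace_a, trace_b):
    """Return packages present in both traces but with differing versions."""
    a = dict(trace_a)
    return {pkg: (a[pkg], ver)
            for pkg, ver in trace_b
            if pkg in a and a[pkg] != ver}

# e.g. as collected from dpkg logs of the two multistrap runs:
buildchroot_trace = [("gcc", "6.3.0-4"), ("libc6", "2.24-11")]
image_trace = [("libc6", "2.24-12"), ("busybox", "1:1.22.0-19+b3")]
print(find_version_conflicts(buildchroot_trace, image_trace))
# → {'libc6': ('2.24-11', '2.24-12')}
```

A non-empty result would mean the mirror changed between the runs and the trace has to be cleaned up before a second, consistent run.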
* Re: Reproducibility of builds 2017-11-17 16:53 ` [ext] Christian Storm @ 2017-11-17 18:14 ` Claudius Heine 2017-11-20 8:33 ` [ext] Christian Storm 0 siblings, 1 reply; 22+ messages in thread From: Claudius Heine @ 2017-11-17 18:14 UTC (permalink / raw) To: [ext] Christian Storm, isar-users [-- Attachment #1: Type: text/plain, Size: 7829 bytes --] Hi, On Fri, 2017-11-17 at 17:53 +0100, [ext] Christian Storm wrote: > > > since I'm very interested in this feature, I'd like to resume > > > this > > > discussion and to eventually come to an agreed upon proposal on > > > how > > > to implement it. So, without further ado, here are my thoughts on > > > the subject: > > > > > > Regardless of the concrete technical implementation, I guess we > > > can > > > agree on the need for a local cache/repository/store in which the > > > Debian > > > binary packages plus their sources have to be stored since one > > > may not > > > rely on the availability of those files online for eternity. > > > > > > These files in this cache/repository/store are the union of the > > > Debian > > > binary packages installed in the resulting image plus their > > > sources as > > > well as those installed in the buildchroot plus their sources. > > > The latter is required to be able to rebuild Debian packages > > > built from > > > source with the same compiler version, libraries, -dev packages, > > > etc. pp. > > > > > > Having the cache/repository/store at hand, there should be a > > > mechanism > > > to prime Isar with it, i.e., Isar should only and exclusively use > > > Debian > > > binary packages and sources from this cache/repository/store. > > > This is again, irrespective of the technical implementation, be > > > it via > > > a repository cache or other means like git, a proxy server or > > > whatsoever. 
> > > > > > Granted, if one changes, e.g, IMAGE_INSTALL_append, the build > > > fails but > > > does so rightfully as the set of packages is modified, resulting > > > in a > > > new version/epoch (=set of Debian packages plus their sources). > > > So, > > > there should be a convenient "interface" provided by Isar to > > > maintain > > > the cache/repository/store. For example, one may want to have > > > different > > > versions/epochs that may correspond to particular versions (git > > > sha) of > > > the Isar layer. Or one wants to later add a Debian package plus > > > its > > > source (which is automatically fetched), resulting in a new > > > version/epoch etc. > > > > > > The remaining question is how to fill the cache/repository/store. > > > In > > > order to have a consistent version/epoch (=set of Debian packages > > > plus > > > their sources), there should not be duplicate packages in it, > > > i.e., the > > > same Debian package but with different versions. > > > This could currently happen because there is a "window of > > > vulnerability": > > > multistrap is run twice, once for isar-image-base.bb and once for > > > buildchroot.bb. In between those two runs, the Debian mirror used > > > could > > > get updated, resulting in a different version of the Debian > > > package > > > being installed in buildchroot than in the resulting image. > > > This is an inherent problem of relying on the Debian way of > > > distributing > > > packages as one cannot a priori control what particular package > > > versions > > > one gets: In contrast to, e.g., Yocto where the particular > > > package > > > versions are specified in the recipes, this does not hold for > > > Isar as > > > the particular package versions are defined by the Debian mirror > > > used, > > > hence, one gets "injected" the particular package versions. 
> > > So, what's required to reduce the "window of vulnerability" and > > > to have > > > a consistent cache/repository/store for a particular > > > version/epoch is to > > > make a snapshot-type download of the required packages. For this, > > > of > > > course, one needs to know the concrete set of packages. This list > > > could > > > be delivered by a "package trace" Isar run since not only > > > multistrap > > > does install packages but sprinkled apt-get install commands do > > > as well. > > > Thereafter, knowing the list, the snapshot-type download can > > > happen, > > > hopefully resulting in a consistent cache/repository/store. > > > > > > > > > So, what do you think? > > > > I agree with your formulation of the problem here. > > > > Simple tracing of installed packages will have the problem you > > described, that its possible that different versions of a package > > are > > installed into buildchroot and image. So this trace needs to be > > cleaned > > up and then based on that the whole process has to be started again > > to > > create a consistent package list between buildchroot and image. > > This > > doubles the build time in the trivial implementation. > > Sure, there's no free lunch here :) > I'd rather strive for a good solution and avoid trivial > implementations > to make lunch as close to free as it gets, to stay in the picture. > > > > With my suggestion of using a caching proxy, this could be solved > > without any additional overhead. > > Could be the case, what are the drawbacks? More complexity and stuff to implement. Also maybe download speed. > What proxy do you propose to > use? I was at first going with my own standalone proxy implementation in pure stdlib python, so that it could be completely integrated into isar. I had a very simple solution ready rather quickly, but it was only synchronous and as such could only handle one connection at a time. Instead of just throwing more threads at it, I wanted to go the asyncio route. 
Sadly the python stdlib does not provide an HTTP implementation for asyncio. It wasn't clear to me how to proceed from there (aiohttp dependency or a minimal own HTTP implementation). The other idea is to just use a ready-made apt caching proxy like apt-cacher-ng. But here I am unsure if it's flexible enough to use in our case. Starting it multiple times in parallel with different ports for different caches and only user privileges might be possible, but I suspect that separating the pool and the dists folder (pool should go to DL_DIR while dists is part of the TMP_DIR) could be more difficult. > Maybe I missed something on the proxy suggestion.. Could you > please elaborate on this? As for the integration, the basic idea was that for tagged bitbake tasks the proxy is started and sets the *_PROXY environment variables. This should be doable with some mods to the base.bbclass and some external python scripts. > > > > I do have other ideas to do this, but that would restructure most > > of isar. > > Well, at least speaking for myself, I'd like to hear those as I > consider > this feature to be essential. Choice in solutions is always good :) > One idea that I got when I first investigated isar was trying to be OE compatible as much as possible. Using this idea would solve reproducible builds as well: Basically implementing debootstrap with bitbake recipes that are created virtually at runtime by downloading and parsing the 'dists/*/*/*/Packages.gz' file. I suppose it should be possible to fetch the Packages file at an early parsing step in a bitbake build, if it's not already present, and fill the bitbake data store with recipe definitions that fetch those binary deb packages, have the appropriate dependencies and install them into the root file system. However, this idea is still in the brainstorming phase. Since that would involve a very big redesign I don't think it's feasible currently.
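The synchronous, stdlib-only proxy described above could be sketched roughly like this. Everything here is an assumption about the unpublished proof of concept: one connection at a time, GET only, no cache expiry, and a URL-hashed cache layout standing in for DL_DIR:

```python
# Minimal sketch of a synchronous stdlib-only caching proxy (illustrative).
import hashlib
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CACHE_DIR = "/tmp/isar-apt-cache"  # stand-in for something below DL_DIR
os.makedirs(CACHE_DIR, exist_ok=True)

def cache_path(url):
    # Pool files (*.deb) are immutable, so the URL is a stable cache key;
    # index files (Packages.gz, Release) would need per-epoch handling.
    return os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest())

class CachingProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # For a proxy-style request, self.path is the absolute URL.
        path = cache_path(self.path)
        if not os.path.exists(path):
            with urllib.request.urlopen(self.path) as upstream, \
                 open(path + ".part", "wb") as out:
                out.write(upstream.read())
            os.replace(path + ".part", path)  # atomic: no half-written hits
        with open(path, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve (blocking):
#   HTTPServer(("localhost", 3142), CachingProxyHandler).serve_forever()
# with apt pointed at it via the http_proxy environment variable.
```

Since `HTTPServer` handles one request at a time by default, this also illustrates the single-connection limitation mentioned above; `socketserver.ThreadingMixIn` would be the "throw more threads at it" alternative to asyncio.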
Cheers, Claudius -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153 Keyserver: hkp://pool.sks-keyservers.net [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-11-17 18:14 ` Claudius Heine @ 2017-11-20 8:33 ` [ext] Christian Storm 2017-11-20 9:16 ` Claudius Heine 0 siblings, 1 reply; 22+ messages in thread From: [ext] Christian Storm @ 2017-11-20 8:33 UTC (permalink / raw) To: isar-users > > [...] > > > With my suggestion of using a caching proxy, this could be solved > > > without any additional overhead. > > > > Could be the case, what are the drawbacks? > > More complexity and stuff to implement. Also maybe download speed. > > > What proxy do you propose to use? > > I was at first going with my own standalone proxy implementation in > pure stdlib python, so that it could be completely integrated into > isar. Why not hooking this into the fetcher(s) so that it's integrated rather than a standalone thing? As a bonus, you'll have full control on this from the Isar core/code. I think the main invention here is the code that does the consistent version/epoch guarantee anyway... > I had a very simple solution ready rather quickly, but it was > only synchronous and as such could only handle one connection at a > time. Instead of just throwing more threads at it, I wanted to go the > asyncio route. Sadly the python stdlib does not provide a http > implementation for asyncio. I wasn't clear how to proceed from here > further (aiohttp dependency or minimal own http implementation). Ah, OK. Wouldn't this account for premature optimization? :) > The other idea is to just use a ready made apt caching proxy like apt- > cache-ng. But here I am unsure if its flexible enough to use in our > case. Starting it multiple times in parallel with different ports for > different caches and only user privileges might be possible but I > suspect that seperating the pool and the dists folder (pool should go > to DL_DIR while dists is part of the TMP_DIR) could be more difficult. 
I would consider on the bonus side for this that we don't have to develop/maintain a custom solution, given that it suits our purposes of course... > > Maybe I missed something on the proxy suggestion.. Could you > > please elaborate on this? > > As for the integration the basic idea was that for taged bitbake tasks > the proxy is started and sets the *_PROXY environment variables. This > should be doable with some mods to the base.bbclass and some external > python scripts. > > > > > > > > I do have other ideas to do this, but that would restructure most > > > of isar. > > > > Well, at least speaking for myself, I'd like to hear those as I > > consider > > this feature to be essential. Choice in solutions is always good :) > > > > One idea that I got when I first investigated isar, was trying to be oe > compatible as much as possible. So using this idea would solve the > reproducable builds as well: > > Basically implementing debootstrap with bitbake recipes that are > created virtually on runtime by downloading and parsing the > 'dists/*/*/*/Packages.gz' file. Those virtual recipes then will have to be serialized as they contain the version number of the package, right? > I suppose it should be possible to fetch the Packages file at an early > parsing step in a bitbake build, if its not already preset, and fill > the bitbake data store with recipe definitions that fetch those binary > deb packages, have the appropriate dependencies and install them into > the root file system. Yes, or do a 'download-only' step prior to building as it's available on Yocto. > However, this idea is still in the brain storming phase. > > Since that would involve a very big redesign I don't think its feasible > currently. Sounds interesting, at least for me... Kind regards, Christian -- Dr. Christian Storm Siemens AG, Corporate Technology, CT RDA ITP SES-DE Otto-Hahn-Ring 6, 81739 München, Germany ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-11-20 8:33 ` [ext] Christian Storm @ 2017-11-20 9:16 ` Claudius Heine 2017-11-29 18:53 ` Alexander Smirnov 0 siblings, 1 reply; 22+ messages in thread From: Claudius Heine @ 2017-11-20 9:16 UTC (permalink / raw) To: isar-users [-- Attachment #1.1: Type: text/plain, Size: 5778 bytes --] Hi Christian, On 20.11.2017 09:33, [ext] Christian Storm wrote: >>> [...] >>>> With my suggestion of using a caching proxy, this could be solved >>>> without any additional overhead. >>> >>> Could be the case, what are the drawbacks? >> >> More complexity and stuff to implement. Also maybe download speed. >> >>> What proxy do you propose to use? >> >> I was at first going with my own standalone proxy implementation in >> pure stdlib python, so that it could be completely integrated into >> isar. > > Why not hooking this into the fetcher(s) so that it's integrated rather > than a standalone thing? The bitbake fetcher is not the only step that downloads stuff in isar. There is also multistrap and possible 'apt-get install' calls within a chroot environment. I was going to integrate it into isar at some point, but first I wanted to have a working proof of concept without bitbake in between to be easily testable. Then integrate it tightly into isar later. > As a bonus, you'll have full control on this > from the Isar core/code. I think the main invention here is the code > that does the consistent version/epoch guarantee anyway... Hmm... My hope is that this will be solved by itself, by splitting 'dists' and 'pool'. > > >> I had a very simple solution ready rather quickly, but it was >> only synchronous and as such could only handle one connection at a >> time. Instead of just throwing more threads at it, I wanted to go the >> asyncio route. Sadly the python stdlib does not provide a http >> implementation for asyncio. I wasn't clear how to proceed from here >> further (aiohttp dependency or minimal own http implementation). > > Ah, OK. 
Wouldn't this account for premature optimization? :) Handling more than one connection in parallel should be possible IMO. Going from one to two is harder than from two to n (n>2). So I was lucky, in a sense, to discover at that early point in implementation that this is harder to do than expected. >> The other idea is to just use a ready made apt caching proxy like apt- >> cache-ng. But here I am unsure if its flexible enough to use in our >> case. Starting it multiple times in parallel with different ports for >> different caches and only user privileges might be possible but I >> suspect that seperating the pool and the dists folder (pool should go >> to DL_DIR while dists is part of the TMP_DIR) could be more difficult. > > I would consider on the bonus side for this that we don't have to > develop/maintain a custom solution, given that it suits our purposes of > course... Agree. But if it only 'sort of' suits our purpose, we might need to write wrapper code around its shortcomings and maintain that. >>> Maybe I missed something on the proxy suggestion.. Could you >>> please elaborate on this? >> >> As for the integration the basic idea was that for taged bitbake tasks >> the proxy is started and sets the *_PROXY environment variables. This >> should be doable with some mods to the base.bbclass and some external >> python scripts. >> >>> >>> >>>> I do have other ideas to do this, but that would restructure most >>>> of isar. >>> >>> Well, at least speaking for myself, I'd like to hear those as I >>> consider >>> this feature to be essential. Choice in solutions is always good :) >>> >> >> One idea that I got when I first investigated isar, was trying to be oe >> compatible as much as possible. So using this idea would solve the >> reproducable builds as well: >> >> Basically implementing debootstrap with bitbake recipes that are >> created virtually on runtime by downloading and parsing the >> 'dists/*/*/*/Packages.gz' file.
> > Those virtual recipes then will have to be serialized as they contain > the version number of the package, right? I'm not sure if I understand your point correctly. I don't think the recipes need to be written down as a file somewhere. We might have to take a look at the parsing part of bitbake, where the recipe data store is filled, i.e. where the deserialization happens from '*.bb' to entries in the data store. Here we just take one or more Debian package lists with some additional information, like the repo URL, and fill the data store with generated recipes. >> I suppose it should be possible to fetch the Packages file at an early >> parsing step in a bitbake build, if its not already preset, and fill >> the bitbake data store with recipe definitions that fetch those binary >> deb packages, have the appropriate dependencies and install them into >> the root file system. > > Yes, or do a 'download-only' step prior to building as it's available > on Yocto. Not sure if that is possible. Task execution is done after all those recipes are parsed and dependencies are resolved. To add virtual packages ourselves we need to do that before any task is triggered. So fetching the 'Packages.gz' file needs to happen very early, outside of what recipes normally do. I suspect that this is possible by using bitbake event handlers [1]. >> However, this idea is still in the brain storming phase. >> >> Since that would involve a very big redesign I don't think its feasible >> currently. > > Sounds interesting, at least for me... Thanks.
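The virtual-recipe idea above boils down to parsing the stanza-per-package Packages index into per-package entries. A minimal sketch — the field handling is simplified and the recipe-dict shape is an illustration, not actual bitbake data store API:

```python
# Parse an apt Packages index (a Packages.gz would be decompressed with
# gzip first) into one dict per binary package stanza.
def parse_packages(text):
    entries = []
    for stanza in text.strip().split("\n\n"):
        fields = {}
        key = None
        for line in stanza.splitlines():
            if line.startswith((" ", "\t")) and key:
                fields[key] += "\n" + line.strip()   # continuation line
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            entries.append(fields)
    return entries

def virtual_recipe(entry, mirror):
    """Map one stanza to the data a generated fetch recipe would need."""
    return {
        "PN": entry["Package"],
        "PV": entry["Version"],
        "SRC_URI": mirror.rstrip("/") + "/" + entry["Filename"],
        "DEPENDS": entry.get("Depends", ""),
        "SHA256": entry.get("SHA256", ""),
    }

sample = """\
Package: busybox
Version: 1:1.22.0-19+b3
Depends: libc6 (>= 2.11)
Filename: pool/main/b/busybox/busybox_1.22.0-19+b3_amd64.deb
SHA256: deadbeef
"""
recipes = [virtual_recipe(e, "http://deb.debian.org/debian")
           for e in parse_packages(sample)]
print(recipes[0]["PN"], recipes[0]["PV"])
# → busybox 1:1.22.0-19+b3
```

An event handler running at parse time could then feed each such dict into the data store as a generated recipe, with `DEPENDS` driving the task dependency graph.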
Claudius [1] https://www.yoctoproject.org/docs/latest/bitbake-user-manual/bitbake-user-manual.html#events -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153 Keyserver: hkp://pool.sks-keyservers.net [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-11-20 9:16 ` Claudius Heine @ 2017-11-29 18:53 ` Alexander Smirnov 2017-11-29 19:02 ` Jan Kiszka 2017-11-30 9:31 ` Claudius Heine 0 siblings, 2 replies; 22+ messages in thread From: Alexander Smirnov @ 2017-11-29 18:53 UTC (permalink / raw) To: isar-users Hi everybody, I've started working on this topic and here I'd like to share my vision. At the moment I've implemented a simple PoC in my branch 'asmirnov/build_rep'. What it does: 1. There is a new recipe: base-apt. It provides a task which: - Fetches packages from the origin Debian apt to a local folder using debootstrap. - Puts these packages via 'reprepro' into a local repository called 'base-apt'. 2. Buildchroot uses 'base-apt' to generate rootfs. 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs. What are the key benefits of this approach: 1. Download session for upstream packages is performed in a single step. 2. You could use your local 'versioned' apt repository instead of downloading origin packages. 3. Having a local apt repository managed by 'reprepro' provides us the possibility to implement version pinning. Reprepro provides lots of things like: - Get package name. - Get package version. - Remove specific package from repo. - Add single package to repo. So in general, if we know which package version we want, we need to get the binary with this version and put it into 'base-apt'. Which issues I see at the moment: 1. The key issue for me is the list of packages for 'base-apt'. So before the 'base-apt' task is executed, we should prepare the full list of packages that will be used by: - buildchroot (BUILDCHROOT_PREINSTALL). - packages to build (their build deps). - image (IMAGE_PREINSTALL). So I have an idea how to implement this via special tasks, will push a patch for RFC, but if you have your own proposals, I'll be happy to discuss them! Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
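The reprepro step of the base-apt flow described above would amount to one `includedeb` invocation per downloaded .deb. A sketch that only builds the command lines (nothing is executed here); the repo path and codename are illustrative, not necessarily what the PoC branch does:

```python
# Construct reprepro invocations to populate a local base-apt repository.
def reprepro_cmds(repo_dir, codename, deb_files):
    """One `reprepro includedeb` command per binary package."""
    return [["reprepro", "-b", repo_dir, "includedeb", codename, deb]
            for deb in deb_files]

debs = ["/build/downloads/busybox_1.22.0-19+b3_amd64.deb"]
cmds = reprepro_cmds("/build/base-apt", "stretch", debs)
print(cmds[0])
# Each command would then be run via subprocess.run(cmd, check=True);
# reprepro maintains the pool/ and dists/ layout and the indices itself.
```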
* Re: Reproducibility of builds 2017-11-29 18:53 ` Alexander Smirnov @ 2017-11-29 19:02 ` Jan Kiszka 2017-11-30 8:04 ` Alexander Smirnov 2017-11-30 9:31 ` Claudius Heine 1 sibling, 1 reply; 22+ messages in thread From: Jan Kiszka @ 2017-11-29 19:02 UTC (permalink / raw) To: Alexander Smirnov, isar-users On 2017-11-29 19:53, Alexander Smirnov wrote: > Hi everybody, > > I've started working on this topic and here I'd like to share my vision. > At the moment I've implemented simple PoC in my branch > 'asmirnov/build_rep'. > > What it does: > > 1. There is new recipe: base-apt. It provides task which: > > - Fetches packages from origin Debian apt to local folder using > deboostrap. > - Put these packages via 'reprepro' to local repository called 'base-apt'. > > 2. Buildchroot uses 'base-apt' to generate rootfs. > > 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs. > > > > What are the key benefits of this approach: > > 1. Download session for upstream packages is performed in a single step. > > 2. You could use your local 'versioned' apt repository instead of > downloading origin packages. > > 3. Having local apt repository managed by 'reprepro' provides us > possibility to implement version pinning. Reprepro provides lots of > things like: > - Get package name. > - Get package version. > - Remove specific package from repo. > - Add single package to repo. > > So in general, if we have know which package version we want to have, we > need to get binary with this version and put it to 'base-apt'. > But this encodes the versions of the packages to be used implicitly into their unique presence inside some local apt repo, no? I would prefer a solution that stores the packages list with versions as well and only uses that list, when provided, independent of the repo content. That way we can throw all downloaded packages back into a single archive repo. 
Having one repo per project version will quickly explode storage-wise (or you need extra deduplication mechanisms). That said, I'm fine with getting there in several steps, and this can be a valid first one. Jan > > > Which issues I see at the moment: > > 1. The key issue for me the list of packages for 'base-apt'. So before > 'base-apt' task is executed, we should prepare full list of packages > that will be used by: > - buildchroot (BUILDCHROOT_PREINSTALL). > - packages to build (their build deps). > - image (IMAGE_PREINSTALL). > > So I have an idea how to implement this via special tasks, will push > patch for RFC, but if you have your own proposals, I'll be happy to > discuss them! > > Alex > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-11-29 19:02 ` Jan Kiszka @ 2017-11-30 8:04 ` Alexander Smirnov 2017-11-30 14:48 ` Jan Kiszka 0 siblings, 1 reply; 22+ messages in thread From: Alexander Smirnov @ 2017-11-30 8:04 UTC (permalink / raw) To: Jan Kiszka, isar-users Hi Jan, On 11/29/2017 10:02 PM, Jan Kiszka wrote: > On 2017-11-29 19:53, Alexander Smirnov wrote: >> Hi everybody, >> >> I've started working on this topic and here I'd like to share my vision. >> At the moment I've implemented simple PoC in my branch >> 'asmirnov/build_rep'. >> >> What it does: >> >> 1. There is new recipe: base-apt. It provides task which: >> >> - Fetches packages from origin Debian apt to local folder using >> deboostrap. >> - Put these packages via 'reprepro' to local repository called 'base-apt'. >> >> 2. Buildchroot uses 'base-apt' to generate rootfs. >> >> 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs. >> >> >> >> What are the key benefits of this approach: >> >> 1. Download session for upstream packages is performed in a single step. >> >> 2. You could use your local 'versioned' apt repository instead of >> downloading origin packages. >> >> 3. Having local apt repository managed by 'reprepro' provides us >> possibility to implement version pinning. Reprepro provides lots of >> things like: >> - Get package name. >> - Get package version. >> - Remove specific package from repo. >> - Add single package to repo. >> >> So in general, if we have know which package version we want to have, we >> need to get binary with this version and put it to 'base-apt'. >> > > But this encodes the versions of the packages to be used implicitly into > their unique presence inside some local apt repo, no? > > I would prefer a solution that stores the packages list with versions as > well and only uses that list, when provided, independent of the repo > content. That way we can throw all downloaded packages back into a > single archive repo. 
Have one repo per project version will quickly > explode storage-wise (or you need extra deduplication mechanisms). > > That said, I'm fine with getting there in several steps, and this can be > a valid first one. > I got it. Here I only mean that there are some tools that could help us in implementing specific logic. At the moment I don't have the final vision, but hope it will appear during experiments with this PoC. My main wish is to avoid manual hacks with Debian artifacts and use generic tools as much as possible. Alex > Jan > >> >> >> Which issues I see at the moment: >> >> 1. The key issue for me the list of packages for 'base-apt'. So before >> 'base-apt' task is executed, we should prepare full list of packages >> that will be used by: >> - buildchroot (BUILDCHROOT_PREINSTALL). >> - packages to build (their build deps). >> - image (IMAGE_PREINSTALL). >> >> So I have an idea how to implement this via special tasks, will push >> patch for RFC, but if you have your own proposals, I'll be happy to >> discuss them! >> >> Alex >> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-11-30 8:04 ` Alexander Smirnov @ 2017-11-30 14:48 ` Jan Kiszka 0 siblings, 0 replies; 22+ messages in thread From: Jan Kiszka @ 2017-11-30 14:48 UTC (permalink / raw) To: Alexander Smirnov, isar-users On 2017-11-30 09:04, Alexander Smirnov wrote: > Hi Jan, > > On 11/29/2017 10:02 PM, Jan Kiszka wrote: >> On 2017-11-29 19:53, Alexander Smirnov wrote: >>> Hi everybody, >>> >>> I've started working on this topic and here I'd like to share my vision. >>> At the moment I've implemented simple PoC in my branch >>> 'asmirnov/build_rep'. >>> >>> What it does: >>> >>> 1. There is new recipe: base-apt. It provides task which: >>> >>> - Fetches packages from origin Debian apt to local folder using >>> deboostrap. >>> - Put these packages via 'reprepro' to local repository called >>> 'base-apt'. >>> >>> 2. Buildchroot uses 'base-apt' to generate rootfs. >>> >>> 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs. >>> >>> >>> >>> What are the key benefits of this approach: >>> >>> 1. Download session for upstream packages is performed in a single step. >>> >>> 2. You could use your local 'versioned' apt repository instead of >>> downloading origin packages. >>> >>> 3. Having local apt repository managed by 'reprepro' provides us >>> possibility to implement version pinning. Reprepro provides lots of >>> things like: >>> - Get package name. >>> - Get package version. >>> - Remove specific package from repo. >>> - Add single package to repo. >>> >>> So in general, if we have know which package version we want to have, we >>> need to get binary with this version and put it to 'base-apt'. >>> >> >> But this encodes the versions of the packages to be used implicitly into >> their unique presence inside some local apt repo, no? >> >> I would prefer a solution that stores the packages list with versions as >> well and only uses that list, when provided, independent of the repo >> content. 
That way we can throw all downloaded packages back into a >> single archive repo. Have one repo per project version will quickly >> explode storage-wise (or you need extra deduplication mechanisms). >> >> That said, I'm fine with getting there in several steps, and this can be >> a valid first one. >> > > I got it. > > Here I only mean that there are some tools that could help us in > implementing specific logic. At the moment I don't have the final > vision, but hope it will appear during experiments with this PoC. > > My main wish is to avoid manual hacks with Debian artifacts and use > generic tools as much as possible. > I do agree. Jan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds 2017-11-29 18:53 ` Alexander Smirnov 2017-11-29 19:02 ` Jan Kiszka @ 2017-11-30 9:31 ` Claudius Heine 2017-12-06 16:21 ` Alexander Smirnov 1 sibling, 1 reply; 22+ messages in thread From: Claudius Heine @ 2017-11-30 9:31 UTC (permalink / raw) To: Alexander Smirnov, isar-users [-- Attachment #1.1: Type: text/plain, Size: 2856 bytes --] Hi Alex, On 11/29/2017 07:53 PM, Alexander Smirnov wrote: > Hi everybody, > > I've started working on this topic and here I'd like to share my vision. > At the moment I've implemented simple PoC in my branch > 'asmirnov/build_rep'. > > What it does: > > 1. There is new recipe: base-apt. It provides task which: > > - Fetches packages from origin Debian apt to local folder using > deboostrap. > - Put these packages via 'reprepro' to local repository called 'base-apt'. > > 2. Buildchroot uses 'base-apt' to generate rootfs. > > 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs. > > > > What are the key benefits of this approach: > > 1. Download session for upstream packages is performed in a single step. > > 2. You could use your local 'versioned' apt repository instead of > downloading origin packages. > > 3. Having local apt repository managed by 'reprepro' provides us > possibility to implement version pinning. Reprepro provides lots of > things like: > - Get package name. > - Get package version. > - Remove specific package from repo. > - Add single package to repo. > > So in general, if we have know which package version we want to have, we > need to get binary with this version and put it to 'base-apt'. > > > > Which issues I see at the moment: > > 1. The key issue for me the list of packages for 'base-apt'. So before > 'base-apt' task is executed, we should prepare full list of packages > that will be used by: > - buildchroot (BUILDCHROOT_PREINSTALL). > - packages to build (their build deps). > - image (IMAGE_PREINSTALL). 
Maybe try to keep this flexible, because it should also be possible, for example, to generate lxc images that are deployed to the final target in the same isar run. Also, as Jan said, deduplicate packages: maybe try to fetch those packages into the DL_DIR first, so that rebuilding is possible without internet access, with no tmp_dir, just a populated DL_DIR and a package+version list of sorts. Then have a task that copies those packages into a repo within the tmp_dir and installs from there. This way the DL_DIR would only contain one instance of every package and the repo in the tmp_dir only has a copy (or maybe even just a symlink). Archiving the DL_DIR would in this case be enough to build different sets of images. If you can solve this, then this solution looks promising. Cheers, Claudius -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153 Keyserver: hkp://pool.sks-keyservers.net [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
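The DL_DIR deduplication suggested above — one copy of every .deb in DL_DIR, symlinked into the per-build repo pool in TMP_DIR — could be sketched like this; the layout and names are illustrative:

```python
# Populate a build's repo pool with symlinks into DL_DIR so each package
# is stored exactly once and the DL_DIR alone suffices for rebuilds.
import os
import tempfile

def link_into_pool(dl_dir, pool_dir, package_files):
    """Symlink each downloaded .deb from DL_DIR into the build's pool."""
    os.makedirs(pool_dir, exist_ok=True)
    for name in package_files:
        dst = os.path.join(pool_dir, name)
        if not os.path.lexists(dst):   # idempotent across reruns
            os.symlink(os.path.join(dl_dir, name), dst)
    return sorted(os.listdir(pool_dir))

# Demonstrate with throwaway directories standing in for DL_DIR/TMP_DIR:
dl_dir = tempfile.mkdtemp(prefix="dl-")
open(os.path.join(dl_dir, "libc6_2.24-11_amd64.deb"), "w").close()
pool = os.path.join(tempfile.mkdtemp(prefix="tmp-"), "pool")
print(link_into_pool(dl_dir, pool, ["libc6_2.24-11_amd64.deb"]))
# → ['libc6_2.24-11_amd64.deb']
```

The repo indices (dists/) would still be generated per build in TMP_DIR, so different image sets can share one archived DL_DIR.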
* Re: Reproducibility of builds 2017-11-30 9:31 ` Claudius Heine @ 2017-12-06 16:21 ` Alexander Smirnov 0 siblings, 0 replies; 22+ messages in thread From: Alexander Smirnov @ 2017-12-06 16:21 UTC (permalink / raw) To: Storm, Christian; +Cc: isar-users Hi Christian, [...] I've pushed my branch 'build_rep' which does the following: 1. Prepare list of packages being used by buildchroot, images and packages to be built; 2. Create local apt with the packages listed above. 3. Generate buildchroot and image from local apt. So it uses only one 'downloading' session. Could you please test this branch with your custom meta? Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2017-12-06 16:21 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-03 8:13 Reproducibility of builds Claudius Heine 2017-08-21 11:23 ` Claudius Heine 2017-08-28 11:27 ` Claudius Heine 2017-09-05 10:05 ` Alexander Smirnov 2017-09-05 10:38 ` Jan Kiszka 2017-09-05 11:50 ` Alexander Smirnov 2017-09-05 11:54 ` Claudius Heine 2017-09-06 13:39 ` Claudius Heine 2017-09-18 15:05 ` Baurzhan Ismagulov 2017-09-19 8:55 ` Claudius Heine 2017-11-14 16:04 ` Christian Storm 2017-11-14 16:22 ` Claudius Heine 2017-11-17 16:53 ` [ext] Christian Storm 2017-11-17 18:14 ` Claudius Heine 2017-11-20 8:33 ` [ext] Christian Storm 2017-11-20 9:16 ` Claudius Heine 2017-11-29 18:53 ` Alexander Smirnov 2017-11-29 19:02 ` Jan Kiszka 2017-11-30 8:04 ` Alexander Smirnov 2017-11-30 14:48 ` Jan Kiszka 2017-11-30 9:31 ` Claudius Heine 2017-12-06 16:21 ` Alexander Smirnov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox