* Reproducibility of builds
@ 2017-08-03 8:13 Claudius Heine
2017-08-21 11:23 ` Claudius Heine
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Claudius Heine @ 2017-08-03 8:13 UTC (permalink / raw)
To: isar-users
Hi,
am I right that Isar supports or should support reproducible root file
system builds?
If I understand correctly, when multistrap is called, it always fetches
the latest version of all packages from the debian repository mirrors.
Am I mistaken or is this feature still on the roadmap?
If that is on the roadmap, how are you thinking of solving this issue?
The openembedded way would be to separate the fetch and 'install' step
and first download all packages into the DL_DIR and then use them from
there. Maybe we could create this pipeline:
dpkg-binary Recipe:
fetch deb file into downloads -> insert into local repository
dpkg-source Recipe:
fetch sources into downloads -> build packages -> insert into local
repository
image Recipe:
fetch all required packages into downloads -> insert all of them into
the local repository -> create root fs using only the local repository
Multistrap provides a '--source-dir DIR' parameter that stores all
installed packages in a directory. So if we used that as a fetcher, we
would create a temporary rootfs just to get all the packages required
for the project.
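The image recipe could then point multistrap exclusively at the local
repository. A hypothetical config fragment to illustrate the idea (the
section name, path, and file:// source are assumptions, not existing
Isar code; the General/section keys themselves are standard multistrap
syntax):

```ini
# multistrap.conf fragment (sketch): build the rootfs only from the
# local repository filled by the fetch steps above
[General]
bootstrap=Local
aptsources=Local

[Local]
packages=apt
source=file:///build/repo
suite=isar
```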
Are there other possible solutions for this?
Cheers,
Claudius
* Re: Reproducibility of builds
2017-08-03 8:13 Reproducibility of builds Claudius Heine
@ 2017-08-21 11:23 ` Claudius Heine
2017-08-28 11:27 ` Claudius Heine
2017-09-18 15:05 ` Baurzhan Ismagulov
2017-11-14 16:04 ` Christian Storm
2 siblings, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-08-21 11:23 UTC (permalink / raw)
To: isar-users; +Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
Hi,
On 08/03/2017 10:13 AM, Claudius Heine wrote:
> Hi,
>
> am I right that Isar supports or should support reproducible root file
> system builds?
>
> If I understand correctly, when multistrap is called, it always fetches
> the latest version of all packages from the debian repository mirrors.
> Am I mistaken or is this feature still on the roadmap?
>
> If that is on the roadmap, how are you thinking of solving this issue?
>
> The openembedded way would be to separate the fetch and 'install' step
> and first download all packages into the DL_DIR and then use them from
> there. Maybe we could create this pipeline:
>
> dpkg-binary Recipe:
>
> fetch deb file into downloads -> insert into local repository
>
> dpkg-source Recipe:
>
> fetch sources into downloads -> build packages -> insert into local
> repository
>
> image Recipe:
>
> fetch all required packages into downloads -> insert all of them into
> the local repository -> create root fs using only the local repository
>
> Multistrap provides a '--source-dir DIR' parameter, that stores all
> installed packages into a directory. So if we would use that as a
> fetcher, then we would create a temporary rootfs just to get all
> required packages for the project.
>
> Are there other possible solutions for this?
The problem with this solution is that it's not possible to create
multiple images with different sets of packages that share the versions
of all the common packages.
An alternative solution is to employ a repository cacher that caches the
'Packages.gz' of the first request. This way it would also be faster
than running multistrap one additional time just to fetch all required
packages.
Maybe apt-cacher-ng or something similar can be used for this.
However I am currently not sure how this can be integrated into the
current build process. Some ideas? Maybe implementing a simple repo
caching proxy that is integrated into isar?
The repository cacher would likely be a daemon running in parallel to
multistrap, fetching everything multistrap requests into the DL_DIR.
Maybe provide a 'clean_package_cache' task, that deletes the cached
'Packages.gz', causing the next root fs build to use new package versions.
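For illustration, apt-cacher-ng maps mirrors into its own URL namespace
(default port 3142), so routing multistrap through it would only need
rewritten source lines, roughly like this (hostname and suite are
examples):

```ini
# multistrap source pointed at the cacher instead of the mirror (sketch)
source=http://localhost:3142/deb.debian.org/debian
suite=stretch
```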
I would really like to hear some feedback on this.
Cheers,
Claudius
* Re: Reproducibility of builds
2017-08-21 11:23 ` Claudius Heine
@ 2017-08-28 11:27 ` Claudius Heine
2017-09-05 10:05 ` Alexander Smirnov
0 siblings, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-08-28 11:27 UTC (permalink / raw)
To: isar-users; +Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
Hi,
On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote:
> Hi,
>
> On 08/03/2017 10:13 AM, Claudius Heine wrote:
>> Hi,
>>
>> am I right that Isar supports or should support reproducible root file
>> system builds?
>>
>> If I understand correctly, when multistrap is called, it always
>> fetches the latest version of all packages from the debian repository
>> mirrors. Am I mistaken or is this feature still on the roadmap?
>>
>> If that is on the roadmap, how are you thinking of solving this issue?
>>
>> The openembedded way would be to separate the fetch and 'install' step
>> and first download all packages into the DL_DIR and then use them from
>> there. Maybe we could create this pipeline:
>>
>> dpkg-binary Recipe:
>>
>> fetch deb file into downloads -> insert into local repository
>>
>> dpkg-source Recipe:
>>
>> fetch sources into downloads -> build packages -> insert into local
>> repository
>>
>> image Recipe:
>>
>> fetch all required packages into downloads -> insert all of them into
>> the local repository -> create root fs using only the local repository
>>
>> Multistrap provides a '--source-dir DIR' parameter, that stores all
>> installed packages into a directory. So if we would use that as a
>> fetcher, then we would create a temporary rootfs just to get all
>> required packages for the project.
>>
>> Are there other possible solutions for this?
>
> The problem with this solution is that it's not possible to create
> multiple images with different sets of packages that share the versions
> of all the common packages.
>
> An alternative solution is to employ a repository cacher that caches the
> 'Packages.gz' of the first request. This way it would also be faster
> than running multistrap one additional time just to fetch all required
> packages.
>
> Maybe apt-cacher-ng or something similar can be used for this.
> However I am currently not sure how this can be integrated into the
> current build process. Some ideas? Maybe implementing a simple repo
> caching proxy that is integrated into isar?
>
> The repository cacher is likely a daemon running in parallel to
> multistrap and fetches everything to the DL_DIR that is requested by it.
> Maybe provide a 'clean_package_cache' task, that deletes the cached
> 'Packages.gz', causing the next root fs build to use new package versions.
>
> I would really like to hear some feedback on this.
In our meeting today, it was discussed that we should collect all
requirements for this feature and discuss possible implementation ideas
based on those requirements.
Here are some requirements from my side:
1 If multiple different images with some common set of packages are
built with one bitbake call, then all images should contain
exactly the same version of every package they have in common
with any of the other images.
2 The resulting image should only depend on the build environment
and isar metadata, not on the point in time it is built.
This means if the environment, including the downloads directory,
is complete (for instance by an earlier build of the image), every
following build of this image recipe should result in exactly the
same packages installed on this image.
3 Binary and source packages should be part of the archival process.
Source packages are useful in case some package needs to be
patched at a later date. Binary packages are useful, because
building them from source packages is currently not 100%
reproducible in Debian upstream. [1]
4 For development, it should be possible to easily reset the
environment, triggering an upgrade of the packages on the next
image build.
5 Deployable in CI environments. What those are exactly should be
further discussed. Here are some:
5.1 Possibility to use a download cache, that is not bound to only
one product/image/environment
5.2 More than one build at the same time in one environment should
be possible
6 Efficiency: The reproducibility feature should be as time- and
resource-efficient as possible. E.g. the process should only fetch
and store the required files.
7 Output a description file with the name and version of every
package deployed/used in the image/environment.
To 5: Since I don't have much experience with CI systems, requirements
mentioned here might not be correct.
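Requirement 7 could be covered by reading the image's dpkg status
database. A minimal sketch (assuming the usual
<rootfs>/var/lib/dpkg/status layout; not existing Isar code):

```python
# Sketch: emit "name version" for every package recorded in a dpkg
# status file (normally <rootfs>/var/lib/dpkg/status).
def manifest(status_text):
    """Yield 'name version' lines from dpkg status file contents."""
    name = version = None
    for line in status_text.splitlines() + [""]:
        if line.startswith("Package: "):
            name = line[len("Package: "):]
        elif line.startswith("Version: "):
            version = line[len("Version: "):]
        elif line == "":            # a blank line ends a stanza
            if name and version:
                yield name + " " + version
            name = version = None
```

Running it over the rootfs after do_rootfs and writing the result next
to the image would give the description file asked for in requirement 7.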
Any comments or requirement additions are welcome.
Cheers,
Claudius
[1] https://tests.reproducible-builds.org/debian/reproducible.html
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
* Re: Reproducibility of builds
2017-08-28 11:27 ` Claudius Heine
@ 2017-09-05 10:05 ` Alexander Smirnov
2017-09-05 10:38 ` Jan Kiszka
2017-09-05 11:54 ` Claudius Heine
0 siblings, 2 replies; 22+ messages in thread
From: Alexander Smirnov @ 2017-09-05 10:05 UTC (permalink / raw)
To: Claudius Heine, isar-users
Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
On 08/28/2017 02:27 PM, Claudius Heine wrote:
> Hi,
>
> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote:
>> Hi,
>>
>> On 08/03/2017 10:13 AM, Claudius Heine wrote:
>>> Hi,
>>>
>>> am I right that Isar supports or should support reproducible root
>>> file system builds?
>>>
>>> If I understand correctly, when multistrap is called, it always
>>> fetches the latest version of all packages from the debian repository
>>> mirrors. Am I mistaken or is this feature still on the roadmap?
>>>
>>> If that is on the roadmap, how are you thinking of solving this issue?
>>>
>>> The openembedded way would be to separate the fetch and 'install'
>>> step and first download all packages into the DL_DIR and then use
>>> them from there. Maybe we could create this pipeline:
>>>
>>> dpkg-binary Recipe:
>>>
>>> fetch deb file into downloads -> insert into local repository
>>>
>>> dpkg-source Recipe:
>>>
>>> fetch sources into downloads -> build packages -> insert into local
>>> repository
>>>
>>> image Recipe:
>>>
>>> fetch all required packages into downloads -> insert all of them into
>>> the local repository -> create root fs using only the local repository
>>>
>>> Multistrap provides a '--source-dir DIR' parameter, that stores all
>>> installed packages into a directory. So if we would use that as a
>>> fetcher, then we would create a temporary rootfs just to get all
>>> required packages for the project.
>>>
>>> Are there other possible solutions for this?
>>
>> The problem with this solution is that it's not possible to create
>> multiple images with different sets of packages that share the
>> versions of all the common packages.
>>
>> An alternative solution is to employ a repository cacher that caches
>> the 'Packages.gz' of the first request. This way it would also be
>> faster than running multistrap one additional time just to fetch all
>> required packages.
>>
>> Maybe apt-cacher-ng or something similar can be used for this.
>> However I am currently not sure how this can be integrated into the
>> current build process. Some ideas? Maybe implementing a simple repo
>> caching proxy that is integrated into isar?
>>
>> The repository cacher is likely a daemon running in parallel to
>> multistrap and fetches everything to the DL_DIR that is requested by
>> it. Maybe provide a 'clean_package_cache' task, that deletes the
>> cached 'Packages.gz', causing the next root fs build to use new
>> package versions.
>>
>> I would really like to hear some feedback on this.
>
> In our meeting today, it was discussed that we should collect all
> requirements for this feature and discuss possible implementation ideas
> based on those requirements.
>
> Here are some requirements from my side:
>
> 1 If multiple different images with some common set of packages are
> built with one bitbake call, then all images should contain
> exactly the same version of every package they have in common
> with any of the other images.
>
> 2 The resulting image should only depend on the build environment
> and isar metadata, not on the point in time it is built.
> This means if the environment, including the downloads directory,
> is complete (for instance by an earlier build of the image), every
> following build of this image recipe should result in exactly the
> same packages installed on this image.
>
> 3 Binary and source packages should be part of the archival process.
> Source packages are useful in case some package needs to be
> patched at a later date. Binary packages are useful, because
> building them from source packages is currently not 100%
> reproducible in Debian upstream. [1]
>
> 4 For development, it should be possible to easily reset the
> environment, triggering an upgrade of the packages on the next
> image build.
>
> 5 Deployable in CI environments. What those are exactly should be
> further discussed. Here are some:
>
> 5.1 Possibility to use a download cache, that is not bound to only
> one product/image/environment
>
> 5.2 More than one build at the same time in one environment should
> be possible
>
> 6 Efficiency: The reproducibility feature should be as time- and
> resource-efficient as possible. E.g. the process should only fetch
> and store the required files.
>
> 7 Outputs a description file with the name and version of every
> package deployed/used in the image/environment.
>
> To 5: Since I don't have much experience with CI systems, requirements
> mentioned here might not be correct.
>
> Any comment or requirement additions are welcome.
Thank you for the requirements; they describe your use case quite well.
Unfortunately, ATM I don't know all the capabilities of
multistrap/debootstrap, so I cannot propose too much.
In general, I think there could be the following solutions:
- Create a local apt cache with the specified package versions.
- Patch multistrap to add the capability to specify package versions.
- Add a hook to the multistrap hooks (for example, in configscript.sh)
that re-installs the desired package versions via apt-get.
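The first two options boil down to version pinning; with plain apt,
holding packages at fixed versions could look like the following
preferences fragment (the package name and version are just examples):

```ini
# /etc/apt/preferences.d/isar-pins (example values)
Package: busybox
Pin: version 1:1.22.0-19+b3
Pin-Priority: 1001
```

A priority above 1000 makes apt install that exact version even if it
means a downgrade; the configscript.sh hook variant would instead run
something like 'apt-get install -y busybox=1:1.22.0-19+b3'.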
--
With best regards,
Alexander Smirnov
ilbers GmbH
Baierbrunner Str. 28c
D-81379 Munich
+49 (89) 122 67 24-0
http://ilbers.de/
Commercial register Munich, HRB 214197
General manager: Baurzhan Ismagulov
* Re: Reproducibility of builds
2017-09-05 10:05 ` Alexander Smirnov
@ 2017-09-05 10:38 ` Jan Kiszka
2017-09-05 11:50 ` Alexander Smirnov
2017-09-05 11:54 ` Claudius Heine
1 sibling, 1 reply; 22+ messages in thread
From: Jan Kiszka @ 2017-09-05 10:38 UTC (permalink / raw)
To: Alexander Smirnov, Claudius Heine, isar-users
Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
On 2017-09-05 12:05, Alexander Smirnov wrote:
>
>
> On 08/28/2017 02:27 PM, Claudius Heine wrote:
>> Hi,
>>
>> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote:
>>> Hi,
>>>
>>> On 08/03/2017 10:13 AM, Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> am I right that Isar supports or should support reproducible root
>>>> file system builds?
>>>>
>>>> If I understand correctly, when multistrap is called, it always
>>>> fetches the latest version of all packages from the debian repository
>>>> mirrors. Am I mistaken or is this feature still on the roadmap?
>>>>
>>>> If that is on the roadmap, how are you thinking of solving this issue?
>>>>
>>>> The openembedded way would be to separate the fetch and 'install'
>>>> step and first download all packages into the DL_DIR and then use
>>>> them from there. Maybe we could create this pipeline:
>>>>
>>>> dpkg-binary Recipe:
>>>>
>>>> fetch deb file into downloads -> insert into local repository
>>>>
>>>> dpkg-source Recipe:
>>>>
>>>> fetch sources into downloads -> build packages -> insert into local
>>>> repository
>>>>
>>>> image Recipe:
>>>>
>>>> fetch all required packages into downloads -> insert all of them
>>>> into the local repository -> create root fs using only the local
>>>> repository
>>>>
>>>> Multistrap provides a '--source-dir DIR' parameter, that stores all
>>>> installed packages into a directory. So if we would use that as a
>>>> fetcher, then we would create a temporary rootfs just to get all
>>>> required packages for the project.
>>>>
>>>> Are there other possible solutions for this?
>>>
>>> The problem with this solution is that it's not possible to create
>>> multiple images with different sets of packages that share the
>>> versions of all the common packages.
>>>
>>> An alternative solution is to employ a repository cacher that caches
>>> the 'Packages.gz' of the first request. This way it would also be
>>> faster than running multistrap one additional time just to fetch all
>>> required packages.
>>>
>>> Maybe apt-cacher-ng or something similar can be used for this.
>>> However I am currently not sure how this can be integrated into the
>>> current build process. Some ideas? Maybe implementing a simple repo
>>> caching proxy that is integrated into isar?
>>>
>>> The repository cacher is likely a daemon running in parallel to
>>> multistrap and fetches everything to the DL_DIR that is requested by
>>> it. Maybe provide a 'clean_package_cache' task, that deletes the
>>> cached 'Packages.gz', causing the next root fs build to use new
>>> package versions.
>>>
>>> I would really like to hear some feedback on this.
>>
>> In our meeting today, it was discussed that we should collect all
>> requirements for this feature and discuss possible implementation
>> ideas based on those requirements.
>>
>> Here are some requirements from my side:
>>
>> 1 If multiple different images with some common set of packages are
>> built with one bitbake call, then all images should contain
>> exactly the same version of every package they have in common
>> with any of the other images.
>>
>> 2 The resulting image should only depend on the build environment
>> and isar metadata, not on the point in time it is built.
>> This means if the environment, including the downloads directory,
>> is complete (for instance by an earlier build of the image), every
>> following build of this image recipe should result in exactly the
>> same packages installed on this image.
>>
>> 3 Binary and source packages should be part of the archival process.
>> Source packages are useful in case some package needs to be
>> patched at a later date. Binary packages are useful, because
>> building them from source packages is currently not 100%
>> reproducible in Debian upstream. [1]
>>
>> 4 For development, it should be possible to easily reset the
>> environment, triggering an upgrade of the packages on the next
>> image build.
>>
>> 5 Deployable in CI environments. What those are exactly should be
>> further discussed. Here are some:
>>
>> 5.1 Possibility to use a download cache, that is not bound to only
>> one product/image/environment
>>
>> 5.2 More than one build at the same time in one environment should
>> be possible
>>
>> 6 Efficiency: The reproducibility feature should be as time- and
>> resource-efficient as possible. E.g. the process should only fetch
>> and store the required files.
>>
>> 7 Outputs a description file with the name and version of every
>> package deployed/used in the image/environment.
>>
>> To 5: Since I don't have much experience with CI systems, requirements
>> mentioned here might not be correct.
>>
>> Any comment or requirement additions are welcome.
>
>> Thank you for the requirements; they describe your use case quite well.
>> Unfortunately, ATM I don't know all the capabilities of
>> multistrap/debootstrap, so I cannot propose too much.
Then I guess that needs to be explored further before we can decide
which path to take. Who could contribute to this?
Jan
>
> In general, I think there could be following solutions:
>
> - Create a local apt cache with the specified package versions.
> - Patch multistrap to add capabilities to specify package versions.
> - Add hook to multistrap hooks (for example, in configscript.sh), that
> will re-install desired package versions via apt-get.
>
--
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux
* Re: Reproducibility of builds
2017-09-05 10:38 ` Jan Kiszka
@ 2017-09-05 11:50 ` Alexander Smirnov
0 siblings, 0 replies; 22+ messages in thread
From: Alexander Smirnov @ 2017-09-05 11:50 UTC (permalink / raw)
To: Jan Kiszka, Claudius Heine, isar-users
Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
On 09/05/2017 01:38 PM, Jan Kiszka wrote:
> On 2017-09-05 12:05, Alexander Smirnov wrote:
>>
>>
>> On 08/28/2017 02:27 PM, Claudius Heine wrote:
>>> Hi,
>>>
>>> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> On 08/03/2017 10:13 AM, Claudius Heine wrote:
>>>>> Hi,
>>>>>
>>>>> am I right that Isar supports or should support reproducible root
>>>>> file system builds?
>>>>>
>>>>> If I understand correctly, when multistrap is called, it always
>>>>> fetches the latest version of all packages from the debian repository
>>>>> mirrors. Am I mistaken or is this feature still on the roadmap?
>>>>>
>>>>> If that is on the roadmap, how are you thinking of solving this issue?
>>>>>
>>>>> The openembedded way would be to separate the fetch and 'install'
>>>>> step and first download all packages into the DL_DIR and then use
>>>>> them from there. Maybe we could create this pipeline:
>>>>>
>>>>> dpkg-binary Recipe:
>>>>>
>>>>> fetch deb file into downloads -> insert into local repository
>>>>>
>>>>> dpkg-source Recipe:
>>>>>
>>>>> fetch sources into downloads -> build packages -> insert into local
>>>>> repository
>>>>>
>>>>> image Recipe:
>>>>>
>>>>> fetch all required packages into downloads -> insert all of them
>>>>> into the local repository -> create root fs using only the local
>>>>> repository
>>>>>
>>>>> Multistrap provides a '--source-dir DIR' parameter, that stores all
>>>>> installed packages into a directory. So if we would use that as a
>>>>> fetcher, then we would create a temporary rootfs just to get all
>>>>> required packages for the project.
>>>>>
>>>>> Are there other possible solutions for this?
>>>>
>>>> The problem with this solution is that it's not possible to create
>>>> multiple images with different sets of packages that share the
>>>> versions of all the common packages.
>>>>
>>>> An alternative solution is to employ a repository cacher that caches
>>>> the 'Packages.gz' of the first request. This way it would also be
>>>> faster than running multistrap one additional time just to fetch all
>>>> required packages.
>>>>
>>>> Maybe apt-cacher-ng or something similar can be used for this.
>>>> However I am currently not sure how this can be integrated into the
>>>> current build process. Some ideas? Maybe implementing a simple repo
>>>> caching proxy that is integrated into isar?
>>>>
>>>> The repository cacher is likely a daemon running in parallel to
>>>> multistrap and fetches everything to the DL_DIR that is requested by
>>>> it. Maybe provide a 'clean_package_cache' task, that deletes the
>>>> cached 'Packages.gz', causing the next root fs build to use new
>>>> package versions.
>>>>
>>>> I would really like to hear some feedback on this.
>>>
>>> In our meeting today, it was discussed that we should collect all
>>> requirements for this feature and discuss possible implementation
>>> ideas based on those requirements.
>>>
>>> Here are some requirements from my side:
>>>
>>> 1 If multiple different images with some common set of packages are
>>> built with one bitbake call, then all images should contain
>>> exactly the same version of every package they have in common
>>> with any of the other images.
>>>
>>> 2 The resulting image should only depend on the build environment
>>> and isar metadata, not on the point in time it is built.
>>> This means if the environment, including the downloads directory,
>>> is complete (for instance by an earlier build of the image), every
>>> following build of this image recipe should result in exactly the
>>> same packages installed on this image.
>>>
>>> 3 Binary and source packages should be part of the archival process.
>>> Source packages are useful in case some package needs to be
>>> patched at a later date. Binary packages are useful, because
>>> building them from source packages is currently not 100%
>>> reproducible in Debian upstream. [1]
>>>
>>> 4 For development, it should be possible to easily reset the
>>> environment, triggering an upgrade of the packages on the next
>>> image build.
>>>
>>> 5 Deployable in CI environments. What those are exactly should be
>>> further discussed. Here are some:
>>>
>>> 5.1 Possibility to use a download cache, that is not bound to only
>>> one product/image/environment
>>>
>>> 5.2 More than one build at the same time in one environment should
>>> be possible
>>>
>>> 6 Efficiency: The reproducibility feature should be as time- and
>>> resource-efficient as possible. E.g. the process should only fetch
>>> and store the required files.
>>>
>>> 7 Outputs a description file with the name and version of every
>>> package deployed/used in the image/environment.
>>>
>>> To 5: Since I don't have much experience with CI systems, requirements
>>> mentioned here might not be correct.
>>>
>>> Any comment or requirement additions are welcome.
>>
>> Thank you for the requirements; they describe your use case quite well.
>> Unfortunately, ATM I don't know all the capabilities of
>> multistrap/debootstrap, so I cannot propose too much.
>
> Then I guess that needs to be explored further before we can decide
> which path to take. Who could contribute to this?
I think the one who investigates the opportunities should also
implement this. It's a complete feature, so I can't handle it right now
due to the load from other features.
Alex
>
>>
>> In general, I think there could be following solutions:
>>
>> - Create a local apt cache with the specified package versions.
>> - Patch multistrap to add capabilities to specify package versions.
>> - Add hook to multistrap hooks (for example, in configscript.sh), that
>> will re-install desired package versions via apt-get.
>>
>
* Re: Reproducibility of builds
2017-09-05 10:05 ` Alexander Smirnov
2017-09-05 10:38 ` Jan Kiszka
@ 2017-09-05 11:54 ` Claudius Heine
2017-09-06 13:39 ` Claudius Heine
1 sibling, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-09-05 11:54 UTC (permalink / raw)
To: Alexander Smirnov, isar-users
Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
Hi,
On 09/05/2017 12:05 PM, Alexander Smirnov wrote:
>
>
> On 08/28/2017 02:27 PM, Claudius Heine wrote:
>> Hi,
>>
>> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote:
>>> Hi,
>>>
>>> On 08/03/2017 10:13 AM, Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> am I right that Isar supports or should support reproducible root
>>>> file system builds?
>>>>
>>>> If I understand correctly, when multistrap is called, it always
>>>> fetches the latest version of all packages from the debian repository
>>>> mirrors. Am I mistaken or is this feature still on the roadmap?
>>>>
>>>> If that is on the roadmap, how are you thinking of solving this issue?
>>>>
>>>> The openembedded way would be to separate the fetch and 'install'
>>>> step and first download all packages into the DL_DIR and then use
>>>> them from there. Maybe we could create this pipeline:
>>>>
>>>> dpkg-binary Recipe:
>>>>
>>>> fetch deb file into downloads -> insert into local repository
>>>>
>>>> dpkg-source Recipe:
>>>>
>>>> fetch sources into downloads -> build packages -> insert into local
>>>> repository
>>>>
>>>> image Recipe:
>>>>
>>>> fetch all required packages into downloads -> insert all of them
>>>> into the local repository -> create root fs using only the local
>>>> repository
>>>>
>>>> Multistrap provides a '--source-dir DIR' parameter, that stores all
>>>> installed packages into a directory. So if we would use that as a
>>>> fetcher, then we would create a temporary rootfs just to get all
>>>> required packages for the project.
>>>>
>>>> Are there other possible solutions for this?
>>>
>>> The problem with this solution is that it's not possible to create
>>> multiple images with different sets of packages that share the
>>> versions of all the common packages.
>>>
>>> An alternative solution is to employ a repository cacher that caches
>>> the 'Packages.gz' of the first request. This way it would also be
>>> faster than running multistrap one additional time just to fetch all
>>> required packages.
>>>
>>> Maybe apt-cacher-ng or something similar can be used for this.
>>> However I am currently not sure how this can be integrated into the
>>> current build process. Some ideas? Maybe implementing a simple repo
>>> caching proxy that is integrated into isar?
>>>
>>> The repository cacher is likely a daemon running in parallel to
>>> multistrap and fetches everything to the DL_DIR that is requested by
>>> it. Maybe provide a 'clean_package_cache' task, that deletes the
>>> cached 'Packages.gz', causing the next root fs build to use new
>>> package versions.
>>>
>>> I would really like to hear some feedback on this.
>>
>> In our meeting today, it was discussed that we should collect all
>> requirements for this feature and discuss possible implementation
>> ideas based on those requirements.
>>
>> Here are some requirements from my side:
>>
>> 1 If multiple different images with some common set of packages are
>> built with one bitbake call, then all images should contain
>> exactly the same version of every package they have in common
>> with any of the other images.
>>
>> 2 The resulting image should only depend on the build environment
>> and isar metadata, not on the point in time it is built.
>> This means if the environment, including the downloads directory,
>> is complete (for instance by an earlier build of the image), every
>> following build of this image recipe should result in exactly the
>> same packages installed on this image.
>>
>> 3 Binary and source packages should be part of the archival process.
>> Source packages are useful in case some package needs to be
>> patched at a later date. Binary packages are useful, because
>> building them from source packages is currently not 100%
>> reproducible in Debian upstream. [1]
>>
>> 4 For development, it should be possible to easily reset the
>> environment, triggering an upgrade of the packages on the next
>> image build.
>>
>> 5 Deployable in CI environments. What those are exactly should be
>> further discussed. Here are some:
>>
>> 5.1 Possibility to use a download cache, that is not bound to only
>> one product/image/environment
>>
>> 5.2 More than one build at the same time in one environment should
>> be possible
>>
>> 6 Efficiency: The reproducibility feature should be as time- and
>> resource-efficient as possible. E.g. the process should only fetch
>> and store the required files.
>>
>> 7 Outputs a description file with the name and version of every
>> package deployed/used in the image/environment.
>>
>> To 5: Since I don't have much experience with CI systems, requirements
>> mentioned here might not be correct.
>>
>> Any comment or requirement additions are welcome.
>
> Thank you for the requirements; they describe your use case quite well.
> Unfortunately, ATM I don't know all the capabilities of
> multistrap/debootstrap, so I cannot propose too much.
>
> In general, I think there could be following solutions:
>
> - Create a local apt cache with the specified package versions.
> - Patch multistrap to add capabilities to specify package versions.
> - Add a hook to multistrap's hooks (for example, in configscript.sh) that
> will re-install desired package versions via apt-get.
My solution is a bit different: it does not require patching multistrap
and should also work with other bootstrapping mechanisms. (AFAIK it should
be possible to change the bootstrapping mechanism at a later date, since
the multistrap project is dead.)
I started implementing an HTTP proxy in Python that caches all requests
for '/pool/' and '/dists/' URIs in separate directories. 'dists' is part
of the build environment, while 'pool' contains all the packages and
should be part of the download directory.
I am currently not actively working on this proxy because of other
higher-priority tasks, but I can give you access to it if you like.
On my TODO list for this is:
- Port to asyncio (with a simple HTTP implementation)
The proxy is currently single-threaded and can only handle one
connection at a time. Porting to asyncio is possible, but since the
Python standard library does not provide an HTTP implementation based
on asyncio, a small HTTP implementation would have to be written on
top of it as well.
- Integrate into bitbake/isar as a scripts/lib and a bbclass
To ease early development I implemented this proxy outside of
bitbake, but with the idea to integrate it into bitbake at a later
date. It should be easily doable to integrate this into bitbake via
two tasks. One that starts the proxy, and one that shuts it down.
Maybe add a shutdown via a bitbake event as well, so that it will be
shut down regardless of the tasks handled. Or do it completely via
bitbake events.
The current proxy limits repositories to the HTTP protocol. It may be
possible to support HTTPS as well, but that would require breaking the
SSL chain.
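A rough sketch of that caching scheme (directory names and the helper are illustrative, not the actual implementation; a real proxy would additionally fetch from the upstream mirror on a cache miss):

```python
import os
import posixpath

# Illustrative cache locations: 'dists' belongs to the build
# environment, 'pool' to the shared download directory.
DISTS_CACHE = "cache/dists"
POOL_CACHE = "cache/pool"

def cache_path(uri_path):
    """Map a requested URI to its local cache file, or None if the
    request is not cachable."""
    # Normalize first so '..' in a request cannot escape the cache.
    clean = posixpath.normpath(uri_path.lstrip("/"))
    if clean.startswith("dists/"):
        return os.path.join(DISTS_CACHE, clean[len("dists/"):])
    if clean.startswith("pool/"):
        return os.path.join(POOL_CACHE, clean[len("pool/"):])
    return None
```

On each request, the proxy would serve the file at cache_path() if it exists, otherwise download it from the upstream mirror into that location first.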
Claudius
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-09-05 11:54 ` Claudius Heine
@ 2017-09-06 13:39 ` Claudius Heine
0 siblings, 0 replies; 22+ messages in thread
From: Claudius Heine @ 2017-09-06 13:39 UTC (permalink / raw)
To: Alexander Smirnov, isar-users
Cc: Alexander Smirnov, Baurzhan Ismagulov, Henning Schild
Hi,
On 09/05/2017 01:54 PM, [ext] Claudius Heine wrote:
> Hi,
>
> On 09/05/2017 12:05 PM, Alexander Smirnov wrote:
>>
>>
>> On 08/28/2017 02:27 PM, Claudius Heine wrote:
>>> Hi,
>>>
>>> On 08/21/2017 01:23 PM, [ext] Claudius Heine wrote:
>>>> Hi,
>>>>
>>>> On 08/03/2017 10:13 AM, Claudius Heine wrote:
>>>>> Hi,
>>>>>
>>>>> am I right that Isar supports or should support reproducible root
>>>>> file system build?
>>>>>
>>>>> If I understand correctly, when multistrap is called, it fetches
>>>>> always the latest version of all packages from the debian
>>>>> repository mirrors. Am I mistaken or is this feature still on the
>>>>> roadmap?
>>>>>
>>>>> If that is on the roadmap, how are you thinking of solving this issue?
>>>>>
>>>>> The openembedded way would be to separate the fetch and 'install'
>>>>> step and first download all packages into the DL_DIR and then use
>>>>> them from there. Maybe we could create this pipeline:
>>>>>
>>>>> dpkg-binary Recipe:
>>>>>
>>>>> fetch deb file into downloads -> insert into local repository
>>>>>
>>>>> dpkg-source Recipe:
>>>>>
>>>>> fetch sources into downloads -> build packages -> insert into local
>>>>> repository
>>>>>
>>>>> image Recipe:
>>>>>
>>>>> fetch all required packages into downloads -> insert all of them
>>>>> into the local repository -> create root fs using only the local
>>>>> repository
>>>>>
>>>>> Multistrap provides a '--source-dir DIR' parameter, that stores all
>>>>> installed packages into a directory. So if we would use that as a
>>>>> fetcher, then we would create a temporary rootfs just to get all
>>>>> required packages for the project.
>>>>>
>>>>> Are there other possible solutions for this?
>>>>
>>>> The problem with this solution is that it's not possible to create
>>>> multiple images with different sets of packages that share the
>>>> versions of all the common packages.
>>>>
>>>> An alternative solution is to employ a repository cacher that caches
>>>> the 'Packages.gz' of the first request. This way it would also be
>>>> faster than running multistrap one additional time just to fetch all
>>>> required packages.
>>>>
>>>> Maybe apt-cacher-ng or something similar can be used for this.
>>>> However I am currently not sure how this can be integrated into the
>>>> current build process. Some ideas? Maybe implementing a simple repo
>>>> caching proxy that is integrated into isar?
>>>>
>>>> The repository cacher is likely a daemon running in parallel to
>>>> multistrap and fetches everything to the DL_DIR that is requested by
>>>> it. Maybe provide a 'clean_package_cache' task, that deletes the
>>>> cached 'Packages.gz', causing the next root fs build to use new
>>>> package versions.
>>>>
>>>> I would really like to hear some feedback on this.
>>>
>>> In our meeting today, it was discussed that we should collect all
>>> requirements for this feature and discuss possible implementation
>>> ideas based on those requirements.
>>>
>>> Here are some requirements from my side:
>>>
>>> 1 If multiple different images with some common set of packages are
>>> build with one bitbake call, then all images should contain
>>> exactly the same version of every package that it has in common
>>> with any of the other images.
>>>
>>> 2 The resulting image should only depend on the build environment
>>> and isar metadata, not on the point in time it is built.
>>> This means if the environment, including the downloads directory,
>>> is complete (for instance by an earlier build of the image), every
>>> following build of this image recipe should result in exactly the
>>> same packages installed on this image.
>>>
>>> 3 Binary and source packages should be part of the archival
>>> process.
>>> Source packages are useful in case some package needs to be
>>> patched at a later date. Binary packages are useful, because
>>> building them from source packages is currently not 100%
>>> reproducible in Debian upstream. [1]
>>>
>>> 4 For development, it should be possible to easily reset the
>>> environment, triggering an upgrade of the packages on the next
>>> image build.
>>>
>>> 5 Deployable in CI environments. What those are exactly should be
>>> further discussed. Here are some:
>>>
>>> 5.1 Possibility to use a download cache, that is not bound to only
>>> one product/image/environment
>>>
>>> 5.2 More than one build at the same time in one environment should
>>> be possible
>>>
>>> 6 Efficiency: The reproducibility feature should be as time and
>>> resource efficient as possible. E.g., the process should only fetch
>>> and store the required files.
>>>
>>> 7 Outputs a description file with the name and version of every
>>> package deployed/used in the image/environment.
8 Use this description and/or an archive file to restore the
environment state in a fresh directory so that the same image can
be recreated.
>>>
>>> To 5: Since I don't have much experience with CI systems,
>>> requirements mentioned here might not be correct.
>>>
>>> Any comment or requirement additions are welcome.
>>
>> Thank you for the requirements; they describe your use case quite well.
>> Unfortunately, ATM I don't know all the capabilities of
>> multistrap/debootstrap, so I cannot propose too much.
>>
>> In general, I think there could be the following solutions:
>>
>> - Create local apt cache with specified packages versions.
>> - Patch multistrap to add capabilities to specify package versions.
>> - Add a hook to multistrap's hooks (for example, in configscript.sh)
>> that will re-install desired package versions via apt-get.
>
> My solution is a bit different: it does not require patching multistrap
> and should also work with other bootstrapping mechanisms. (AFAIK it should
> be possible to change the bootstrapping mechanism at a later date, since
> the multistrap project is dead.)
>
> I started implementing an HTTP proxy in Python that caches all requests
> for '/pool/' and '/dists/' URIs in separate directories. 'dists' is part
> of the build environment, while 'pool' contains all the packages and
> should be part of the download directory.
>
> I am currently not actively working on this proxy, because of other
> tasks with higher priority, but I can give you access to it if you like.
>
> On my TODO list for this is:
>
> - Port to asyncio (with a simple HTTP implementation)
> The proxy is currently single-threaded and can only handle one
> connection at a time. Porting to asyncio is possible, but since the
> Python standard library does not provide an HTTP implementation based
> on asyncio, a small HTTP implementation would have to be written on
> top of it as well.
> - Integrate into bitbake/isar as a scripts/lib and a bbclass
> To ease early development I implemented this proxy outside of
> bitbake, but with the idea to integrate it into bitbake at a later
> date. It should be easily doable to integrate this into bitbake via
> two tasks. One that starts the proxy, and one that shuts it down.
> Maybe add a shutdown via a bitbake event as well, so that it will be
> shut down regardless of the tasks handled. Or do it completely via
> bitbake events.
>
> The current proxy limits repositories to the HTTP protocol. It may be
> possible to support HTTPS as well, but that would require breaking the
> SSL chain.
The next point on my list would be the save and restore functionality.
This is necessary to reproduce a build in a fresh build environment.
There are a couple of ways to do this. Here are some that are currently
on my mind:
* Just create a tarball of the 'dists' and 'pool' directory, archive
that and import it into the respective directories in the fresh
environment. This might not be resource efficient, because the pool
could contain packages that are not used in the image.
* Log requested files in the proxy and use this list afterwards to
create an archive that can be used to recreate the proxy
directories. This cannot be done in an image recipe, but has to be
done just before bitbake finishes, because the archive should
contain not only the packages that are used in one image, but all
the packages that are used in one bitbake build run.
* Use the 'source directory' feature of multistrap to create a
directory containing all used packages for an image and use these
packages to create an independent repository. This repo is then used
as the "upstream repo" in later builds.
If multistrap is no longer used, extract all these packages from the
apt-cache in the created root file system to emulate this multistrap
feature.
And some other variations of those three ideas.
I currently have no concrete idea how to archive the source packages
yet. Since the mapping between binary and source packages is not
bijective, it's not trivial, and dpkg & apt need to be used to fetch
them from the repositories.
Cheers,
Claudius
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-08-03 8:13 Reproducibility of builds Claudius Heine
2017-08-21 11:23 ` Claudius Heine
@ 2017-09-18 15:05 ` Baurzhan Ismagulov
2017-09-19 8:55 ` Claudius Heine
2017-11-14 16:04 ` Christian Storm
2 siblings, 1 reply; 22+ messages in thread
From: Baurzhan Ismagulov @ 2017-09-18 15:05 UTC (permalink / raw)
To: isar-users
Hello Claudius,
thanks much for sharing the concept and the requirements!
Let's check whether I understand your concept correctly. Assume we have minimal
stretch binaries + hello source. Your concept provides for:
1. Start bitbake, which:
1. Downloads debs to be installed and adds them into a local apt repo.
2. Fetches hello sources, builds them, and copies the deb to the local apt
repo.
3. Bootstraps Debian and installs the hello binary deb from the local apt repo.
Please correct me if I got anything wrong.
I like this workflow. This is what is used in Isar's predecessor. There, it is
implemented manually using debmirror.
The reason why Isar installs Debian packages from apt repos on the Internet is
to give first-time users a setup that works OOTB. If they start developing a product,
they are expected to create their own apt repos. See e.g.
http://events.linuxfoundation.org/sites/events/files/slides/isar-elce-2016_1.pdf,
slides 19 ("Debian apt" repo) and 26 ("Create repos for all components:
Debian...").
That is why I was originally thinking about a tool that would support this
manual workflow. After pondering on your proposal, I think it makes sense. I'd
like to see the following features:
* The functionality is implemented in a standalone tool usable manually or from
bitbake.
* The functionality is implemented based on dry-run output of
{deboot,multi,...}strap.
* The feature can be turned off in Isar's local configuration.
* The tool supports initial mirroring as well as an update. This should also be
controllable in Isar's local config.
What I don't like is the implementation via an HTTP proxy. IMHO, it's too
indirect for the task (why bother with dynamic proxying if the list of packages
is defined statically in a given apt repo). It supports only one of apt's six
fetch methods (such as https, file, ssh, etc., see sources.list(5), more could
be defined in the future or in special environments). The implementation is
going to be complex, since it needs to distinguish between different build
process chains in the same environment (two bitbakes running in a single
docker).
It should be trivial to get a list of packages from multistrap. The same
functionality is available in debootstrap, when we move to it. Mirroring could
be done by an existing or a new tool. The latter may be a step to identify
requirements and get experience with the workflow before integrating the
functionality into the former (possibly upon feedback from Debian community).
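For instance, debootstrap's '--print-debs' option prints the names of the packages it would install; turning such dry-run output into a fetch list for a mirroring tool is then a one-liner (sketch):

```python
def parse_dry_run(output):
    """Deduplicate and sort a whitespace-separated package list,
    e.g. the output of 'debootstrap --print-debs'."""
    return sorted(set(output.split()))
```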
Archiving of the apt repo is a CM issue outside of Isar. For reproducing older
versions, it should be managed in an SCM (e.g., git). Synchronization between
the right product and apt repo revisions is also outside Isar and could be
solved e.g. with kas. Or, one goes hard-core and commits apt stuff into the
product repo. In the future, we might come up with a better solution for archiving
and version pinning; at this stage I'd like to utilize existing Debian means
first before going further. The details of the pinning concept would be
affected by bitbake debian/control backend implementation.
Similarly, at this stage I don't address advanced issues like sharing modified
and non-modified apt repos, which could be implemented by a KISS
jessie/Packages and myjessie/Packages with the shared pool. If we have many of
them in practice (which I doubt), we could still return to the issue.
Some comments below.
On Thu, Aug 03, 2017 at 10:13:12AM +0200, Claudius Heine wrote:
> am I right that Isar supports or should support reproducible root file
> system build?
Yes, this is possible outside of Isar. We wish that Isar makes that easier.
> If I understand correctly, when multistrap is called, it fetches always the
> latest version of all packages from the debian repository mirrors. Am I
> mistaken or is this feature still on the roadmap?
In the sense I interpret your wording, yes, multistrap always fetches the
latest version of all packages from the Debian repo.
That said, for a given repo, there is only one version of every package,
defined in the Packages file. It is the latest for that repo. Given a URI in
Internet, multistrap always fetches its "latest" (and the only) Packages and
installs the "latest" (and the only) package versions.
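For illustration, a repo's Packages index carries exactly one stanza, and thus one version, per binary package (illustrative excerpt):

```text
Package: hello
Version: 2.10-1
Architecture: amd64
Depends: libc6 (>= 2.14)
Filename: pool/main/h/hello/hello_2.10-1_amd64.deb
```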
With kind regards,
Baurzhan.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-09-18 15:05 ` Baurzhan Ismagulov
@ 2017-09-19 8:55 ` Claudius Heine
0 siblings, 0 replies; 22+ messages in thread
From: Claudius Heine @ 2017-09-19 8:55 UTC (permalink / raw)
To: isar-users
Hi Baurzhan,
On 09/18/2017 05:05 PM, Baurzhan Ismagulov wrote:
> What I don't like is the implementation via a http proxy. IMHO, it's too
> indirect for the task (why bother with dynamic proxying if the list of packages
> is defined statically in a given apt repo).
Only if someone bothers to create a separate Debian mirror repository
for every product, which uses many more resources. It would be much easier
to have a global package cache and a project-local package index for it.
IMO that would only be possible with a caching repo proxy.
> It supports only one of apt's six
> fetch methods (such as https, file, ssh, etc., see sources.list(5), more could
> be defined in the future or in special environments).
At first, yes. I started with an HTTP proxy because it is the easiest to
implement. It's always possible to add functionality to the proxy to
support other fetch methods if necessary. But IMO that is not really that
important.
> The implementation is
> going to be complex, since it needs to distinguish between different build
> process chains in the same environment (two bitbakes running in a single
> docker).
Why? We have more than one port available, so we can run more than one
proxy simultaneously, one for each build. My current implementation just
chooses a free port and makes it available to the calling process.
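The free-port selection presumably works along these lines (a sketch of the usual bind-to-port-0 trick; note the small race between closing the probe socket and the proxy re-binding the port):

```python
import socket

def pick_free_port():
    """Let the kernel pick an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```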
> It should be trivial to get a list of packages from multistrap. The same
> functionality is available in debootstrap, when we move to it.
The problem is that we still need to use apt in the buildchroot to install
additional build dependencies for each recipe. Those are not part of
what multistrap/debootstrap lists. But since they would go through the
HTTP proxy, they would be part of the static package cache.
> Mirroring could
> be done by an existing or a new tool. The latter may be a step to identify
> requirements and get experience with the workflow before integrating the
> functionality into the former (possibly upon feedback from Debian community).
As I said, I don't see the sense in creating a full Debian mirror for
every project. And partial mirrors are difficult to create, because
multistrap/debootstrap (in the case of the buildchroot) doesn't know about
every package that is added to the image.
>
> Archiving of the apt repo is a CM issue outside of Isar. For reproducing older
> versions, it should be managed in an SCM (e.g., git).
That should be possible. Just archive the package index in a git repo
and the packages in a git lfs repo.
> Synchronization between
> the right product and apt repo revisions is also outside Isar and could be
> solved e.g. with kas.
I never said that it is. But isar is responsible for providing ways to
import/export some kind of package list into a build.
> Or, one goes hard-core and commits apt stuff into the
> product repo.
That might depend on your 'product' definition, but for me a product is
not an image. Products can have varying package versions, while images
obviously don't. So committing them together with products makes no
sense to me. But committing them together with the final image, with a
reference to the used refspec of the product repository, makes more sense.
> In the future, we might come with a better solution for archiving
> and version pinning; at this stage I'd like to utilize existing Debian means
> first before going further. The details of the pinning concept would be
> affected by bitbake debian/control backend implementation.
I said nothing about pinning, because IMO package updates etc. should
still be possible on the target if wanted. But we should be able to
recreate images at least from a package list. So apt package pinning is
just a different solution for a different problem. If you mean pinning
just in the bootstrapping phase, then yes, that would be nice. But I
don't know how that can solve the buildchroot problem. Also, since the
package index contains just one version of each package, I don't see how
it would be possible to pin them to an older version at this stage,
because those would no longer be available in the index and *bootstrap
would not know where to fetch them.
AFAIK Debian currently has no convenient means for solving these issues yet.
I used apt-cacher-ng when I worked with elbe, but setting that up for
every project separately is a big hassle. I want a easy solutions where
this stuff is done inside of the normal bitbake process, where not every
developer has to wire up her own process of building root file systems.
Because if they have to build their own most developers don't care about
it or it becomes impossible to recreate images because a couple of
unknown software packages in some unknown version and with unknown
configuration are necessary for it.
So it's important to use normal upstream package mirrors and have a
process in place inside bitbake that takes care of these issues
transparently. IMO it's important not to have too many options and
unneeded complexity. Reproducibility should be the default, and everyone
is free to update/clear the package index manually via a single bitbake task.
Cheers,
Claudius
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-08-03 8:13 Reproducibility of builds Claudius Heine
2017-08-21 11:23 ` Claudius Heine
2017-09-18 15:05 ` Baurzhan Ismagulov
@ 2017-11-14 16:04 ` Christian Storm
2017-11-14 16:22 ` Claudius Heine
2 siblings, 1 reply; 22+ messages in thread
From: Christian Storm @ 2017-11-14 16:04 UTC (permalink / raw)
To: isar-users
Hi,
since I'm very interested in this feature, I'd like to resume this
discussion and to eventually come to an agreed upon proposal on how
to implement it. So, without further ado, here are my thoughts on
the subject:
Regardless of the concrete technical implementation, I guess we can
agree on the need for a local cache/repository/store in which the Debian
binary packages plus their sources have to be stored since one may not
rely on the availability of those files online for eternity.
These files in this cache/repository/store are the union of the Debian
binary packages installed in the resulting image plus their sources as
well as those installed in the buildchroot plus their sources.
The latter is required to be able to rebuild Debian packages built from
source with the same compiler version, libraries, -dev packages, etc. pp.
Having the cache/repository/store at hand, there should be a mechanism
to prime Isar with it, i.e., Isar should only and exclusively use Debian
binary packages and sources from this cache/repository/store.
This is again, irrespective of the technical implementation, be it via
a repository cache or other means like git, a proxy server or whatsoever.
Granted, if one changes, e.g., IMAGE_INSTALL_append, the build fails but
does so rightfully as the set of packages is modified, resulting in a
new version/epoch (=set of Debian packages plus their sources). So,
there should be a convenient "interface" provided by Isar to maintain
the cache/repository/store. For example, one may want to have different
versions/epochs that may correspond to particular versions (git sha) of
the Isar layer. Or one wants to later add a Debian package plus its
source (which is automatically fetched), resulting in a new
version/epoch etc.
The remaining question is how to fill the cache/repository/store. In
order to have a consistent version/epoch (=set of Debian packages plus
their sources), there should not be duplicate packages in it, i.e., the
same Debian package but with different versions.
This could currently happen because there is a "window of vulnerability":
multistrap is run twice, once for isar-image-base.bb and once for
buildchroot.bb. In between those two runs, the Debian mirror used could
get updated, resulting in a different version of the Debian package
being installed in buildchroot than in the resulting image.
This is an inherent problem of relying on the Debian way of distributing
packages as one cannot a priori control what particular package versions
one gets: In contrast to, e.g., Yocto where the particular package
versions are specified in the recipes, this does not hold for Isar as
the particular package versions are defined by the Debian mirror used,
hence, one gets "injected" the particular package versions.
So, what's required to reduce the "window of vulnerability" and to have
a consistent cache/repository/store for a particular version/epoch is to
make a snapshot-type download of the required packages. For this, of
course, one needs to know the concrete set of packages. This list could
be delivered by a "package trace" Isar run since not only multistrap
does install packages but sprinkled apt-get install commands do as well.
Thereafter, knowing the list, the snapshot-type download can happen,
hopefully resulting in a consistent cache/repository/store.
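Sketch of the snapshot-type download step: given the name-to-version list from such a "package trace" run, the exact versions could be requested with apt-get's 'pkg=version' syntax (illustrative helper; assumes the traced versions are still available in the configured repo):

```python
def download_commands(trace):
    """Build 'apt-get download pkg=version' commands from a package
    trace (a dict mapping package name to the traced version)."""
    return ["apt-get download %s=%s" % (name, version)
            for name, version in sorted(trace.items())]
```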
So, what do you think?
Besten Gruß,
Christian
--
Dr. Christian Storm
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Otto-Hahn-Ring 6, 81739 München, Germany
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-14 16:04 ` Christian Storm
@ 2017-11-14 16:22 ` Claudius Heine
2017-11-17 16:53 ` [ext] Christian Storm
0 siblings, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-11-14 16:22 UTC (permalink / raw)
To: [ext] Christian Storm; +Cc: isar-users
Hi Christian,
On 11/14/2017 05:04 PM, [ext] Christian Storm wrote:
> Hi,
>
> since I'm very interested in this feature, I'd like to resume this
> discussion and to eventually come to an agreed upon proposal on how
> to implement it. So, without further ado, here are my thoughts on
> the subject:
>
> Regardless of the concrete technical implementation, I guess we can
> agree on the need for a local cache/repository/store in which the Debian
> binary packages plus their sources have to be stored since one may not
> rely on the availability of those files online for eternity.
>
> These files in this cache/repository/store are the union of the Debian
> binary packages installed in the resulting image plus their sources as
> well as those installed in the buildchroot plus their sources.
> The latter is required to be able to rebuild Debian packages built from
> source with the same compiler version, libraries, -dev packages, etc. pp.
>
> Having the cache/repository/store at hand, there should be a mechanism
> to prime Isar with it, i.e., Isar should only and exclusively use Debian
> binary packages and sources from this cache/repository/store.
> This is again, irrespective of the technical implementation, be it via
> a repository cache or other means like git, a proxy server or whatsoever.
>
> Granted, if one changes, e.g., IMAGE_INSTALL_append, the build fails but
> does so rightfully as the set of packages is modified, resulting in a
> new version/epoch (=set of Debian packages plus their sources). So,
> there should be a convenient "interface" provided by Isar to maintain
> the cache/repository/store. For example, one may want to have different
> versions/epochs that may correspond to particular versions (git sha) of
> the Isar layer. Or one wants to later add a Debian package plus its
> source (which is automatically fetched), resulting in a new
> version/epoch etc.
>
> The remaining question is how to fill the cache/repository/store. In
> order to have a consistent version/epoch (=set of Debian packages plus
> their sources), there should not be duplicate packages in it, i.e., the
> same Debian package but with different versions.
> This could currently happen because there is a "window of vulnerability":
> multistrap is run twice, once for isar-image-base.bb and once for
> buildchroot.bb. In between those two runs, the Debian mirror used could
> get updated, resulting in a different version of the Debian package
> being installed in buildchroot than in the resulting image.
> This is an inherent problem of relying on the Debian way of distributing
> packages as one cannot a priori control what particular package versions
> one gets: In contrast to, e.g., Yocto where the particular package
> versions are specified in the recipes, this does not hold for Isar as
> the particular package versions are defined by the Debian mirror used,
> hence, one gets "injected" the particular package versions.
> So, what's required to reduce the "window of vulnerability" and to have
> a consistent cache/repository/store for a particular version/epoch is to
> make a snapshot-type download of the required packages. For this, of
> course, one needs to know the concrete set of packages. This list could
> be delivered by a "package trace" Isar run since not only multistrap
> does install packages but sprinkled apt-get install commands do as well.
> Thereafter, knowing the list, the snapshot-type download can happen,
> hopefully resulting in a consistent cache/repository/store.
>
>
> So, what do you think?
I agree with your formulation of the problem here.
Simple tracing of installed packages will have the problem you
described: it's possible that different versions of a package are
installed into the buildchroot and the image. So this trace needs to be
cleaned up, and then, based on that, the whole process has to be started
again to create a consistent package list between buildchroot and image.
This doubles the build time in a trivial implementation.
With my suggestion of using a caching proxy, this could be solved
without any additional overhead.
I do have other ideas for doing this, but they would restructure most of isar.
Cheers,
Claudius
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-14 16:22 ` Claudius Heine
@ 2017-11-17 16:53 ` [ext] Christian Storm
2017-11-17 18:14 ` Claudius Heine
0 siblings, 1 reply; 22+ messages in thread
From: [ext] Christian Storm @ 2017-11-17 16:53 UTC (permalink / raw)
To: isar-users
> > since I'm very interested in this feature, I'd like to resume this
> > discussion and to eventually come to an agreed upon proposal on how
> > to implement it. So, without further ado, here are my thoughts on
> > the subject:
> >
> > Regardless of the concrete technical implementation, I guess we can
> > agree on the need for a local cache/repository/store in which the Debian
> > binary packages plus their sources have to be stored since one may not
> > rely on the availability of those files online for eternity.
> >
> > These files in this cache/repository/store are the union of the Debian
> > binary packages installed in the resulting image plus their sources as
> > well as those installed in the buildchroot plus their sources.
> > The latter is required to be able to rebuild Debian packages built from
> > source with the same compiler version, libraries, -dev packages, etc. pp.
> >
> > Having the cache/repository/store at hand, there should be a mechanism
> > to prime Isar with it, i.e., Isar should only and exclusively use Debian
> > binary packages and sources from this cache/repository/store.
> > This is again, irrespective of the technical implementation, be it via
> > a repository cache or other means like git, a proxy server or whatsoever.
> >
> > Granted, if one changes, e.g., IMAGE_INSTALL_append, the build fails but
> > does so rightfully as the set of packages is modified, resulting in a
> > new version/epoch (=set of Debian packages plus their sources). So,
> > there should be a convenient "interface" provided by Isar to maintain
> > the cache/repository/store. For example, one may want to have different
> > versions/epochs that may correspond to particular versions (git sha) of
> > the Isar layer. Or one wants to later add a Debian package plus its
> > source (which is automatically fetched), resulting in a new
> > version/epoch etc.
> >
> > The remaining question is how to fill the cache/repository/store. In
> > order to have a consistent version/epoch (=set of Debian packages plus
> > their sources), there should not be duplicate packages in it, i.e., the
> > same Debian package but with different versions.
> > This could currently happen because there is a "window of vulnerability":
> > multistrap is run twice, once for isar-image-base.bb and once for
> > buildchroot.bb. In between those two runs, the Debian mirror used could
> > get updated, resulting in a different version of the Debian package
> > being installed in buildchroot than in the resulting image.
> > This is an inherent problem of relying on the Debian way of distributing
> > packages as one cannot a priori control what particular package versions
> > one gets: In contrast to, e.g., Yocto where the particular package
> > versions are specified in the recipes, this does not hold for Isar as
> > the particular package versions are defined by the Debian mirror used,
> > hence, one gets "injected" the particular package versions.
> > So, what's required to reduce the "window of vulnerability" and to have
> > a consistent cache/repository/store for a particular version/epoch is to
> > make a snapshot-type download of the required packages. For this, of
> > course, one needs to know the concrete set of packages. This list could
> > be delivered by a "package trace" Isar run since not only multistrap
> > does install packages but sprinkled apt-get install commands do as well.
> > Thereafter, knowing the list, the snapshot-type download can happen,
> > hopefully resulting in a consistent cache/repository/store.
> >
> >
> > So, what do you think?
>
> I agree with your formulation of the problem here.
>
> Simple tracing of installed packages will have the problem you
> described, that its possible that different versions of a package are
> installed into buildchroot and image. So this trace needs to be cleaned
> up and then based on that the whole process has to be started again to
> create a consistent package list between buildchroot and image. This
> doubles the build time in the trivial implementation.
Sure, there's no free lunch here :)
I'd rather strive for a good solution and avoid trivial implementations,
to make lunch as close to free as it gets, to stay with the metaphor.
> With my suggestion of using a caching proxy, this could be solved
> without any additional overhead.
Could be the case, what are the drawbacks? What proxy do you propose to
use? Maybe I missed something on the proxy suggestion.. Could you
please elaborate on this?
> I do have other ideas to do this, but that would restructure most of isar.
Well, at least speaking for myself, I'd like to hear those as I consider
this feature to be essential. Choice in solutions is always good :)
Kind regards,
Christian
--
Dr. Christian Storm
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Otto-Hahn-Ring 6, 81739 München, Germany
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-17 16:53 ` [ext] Christian Storm
@ 2017-11-17 18:14 ` Claudius Heine
2017-11-20 8:33 ` [ext] Christian Storm
0 siblings, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-11-17 18:14 UTC (permalink / raw)
To: [ext] Christian Storm, isar-users
Hi,
On Fri, 2017-11-17 at 17:53 +0100, [ext] Christian Storm wrote:
> > > since I'm very interested in this feature, I'd like to resume
> > > this
> > > discussion and to eventually come to an agreed upon proposal on
> > > how
> > > to implement it. So, without further ado, here are my thoughts on
> > > the subject:
> > >
> > > Regardless of the concrete technical implementation, I guess we
> > > can
> > > agree on the need for a local cache/repository/store in which the
> > > Debian
> > > binary packages plus their sources have to be stored since one
> > > may not
> > > rely on the availability of those files online for eternity.
> > >
> > > These files in this cache/repository/store are the union of the
> > > Debian
> > > binary packages installed in the resulting image plus their
> > > sources as
> > > well as those installed in the buildchroot plus their sources.
> > > The latter is required to be able to rebuild Debian packages
> > > built from
> > > source with the same compiler version, libraries, -dev packages,
> > > etc. pp.
> > >
> > > Having the cache/repository/store at hand, there should be a
> > > mechanism
> > > to prime Isar with it, i.e., Isar should only and exclusively use
> > > Debian
> > > binary packages and sources from this cache/repository/store.
> > > This is again, irrespective of the technical implementation, be
> > > it via
> > > a repository cache or other means like git, a proxy server or
> > > whatsoever.
> > >
> > > Granted, if one changes, e.g, IMAGE_INSTALL_append, the build
> > > fails but
> > > does so rightfully as the set of packages is modified, resulting
> > > in a
> > > new version/epoch (=set of Debian packages plus their sources).
> > > So,
> > > there should be a convenient "interface" provided by Isar to
> > > maintain
> > > the cache/repository/store. For example, one may want to have
> > > different
> > > versions/epochs that may correspond to particular versions (git
> > > sha) of
> > > the Isar layer. Or one wants to later add a Debian package plus
> > > its
> > > source (which is automatically fetched), resulting in a new
> > > version/epoch etc.
> > >
> > > The remaining question is how to fill the cache/repository/store.
> > > In
> > > order to have a consistent version/epoch (=set of Debian packages
> > > plus
> > > their sources), there should not be duplicate packages in it,
> > > i.e., the
> > > same Debian package but with different versions.
> > > This could currently happen because there is a "window of
> > > vulnerability":
> > > multistrap is run twice, once for isar-image-base.bb and once for
> > > buildchroot.bb. In between those two runs, the Debian mirror used
> > > could
> > > get updated, resulting in a different version of the Debian
> > > package
> > > being installed in buildchroot than in the resulting image.
> > > This is an inherent problem of relying on the Debian way of
> > > distributing
> > > packages as one cannot a priori control what particular package
> > > versions
> > > one gets: In contrast to, e.g., Yocto where the particular
> > > package
> > > versions are specified in the recipes, this does not hold for
> > > Isar as
> > > the particular package versions are defined by the Debian mirror
> > > used,
> > > hence, one gets "injected" the particular package versions.
> > > So, what's required to reduce the "window of vulnerability" and
> > > to have
> > > a consistent cache/repository/store for a particular
> > > version/epoch is to
> > > make a snapshot-type download of the required packages. For this,
> > > of
> > > course, one needs to know the concrete set of packages. This list
> > > could
> > > be delivered by a "package trace" Isar run since not only
> > > multistrap
> > > does install packages but sprinkled apt-get install commands do
> > > as well.
> > > Thereafter, knowing the list, the snapshot-type download can
> > > happen,
> > > hopefully resulting in a consistent cache/repository/store.
> > >
> > >
> > > So, what do you think?
> >
> > I agree with your formulation of the problem here.
> >
> > Simple tracing of installed packages will have the problem you
> > described, that its possible that different versions of a package
> > are
> > installed into buildchroot and image. So this trace needs to be
> > cleaned
> > up and then based on that the whole process has to be started again
> > to
> > create a consistent package list between buildchroot and image.
> > This
> > doubles the build time in the trivial implementation.
>
> Sure, there's no free lunch here :)
> I'd rather strive for a good solution and avoid trivial
> implementations
> to make lunch as close to free as it gets, to stay in the picture.
>
>
> > With my suggestion of using a caching proxy, this could be solved
> > without any additional overhead.
>
> Could be the case, what are the drawbacks?
More complexity and more code to implement and maintain. Download speed might also suffer.
> What proxy do you propose to
> use?
At first I was going to write my own standalone proxy implementation in
pure-stdlib Python, so that it could be completely integrated into
Isar. I had a very simple solution ready rather quickly, but it was
only synchronous and as such could only handle one connection at a
time. Instead of just throwing more threads at it, I wanted to go the
asyncio route. Sadly, the Python stdlib does not provide an HTTP
implementation for asyncio, and I wasn't sure how to proceed from there
(add an aiohttp dependency, or write a minimal HTTP implementation of our own).
The other idea is to just use a ready-made apt caching proxy like
apt-cacher-ng. But here I am unsure whether it is flexible enough for our
case. Starting it multiple times in parallel, with different ports for
different caches and with only user privileges, might be possible, but I
suspect that separating the pool and the dists folders (pool should go
to DL_DIR while dists is part of the TMP_DIR) could be more difficult.
> Maybe I missed something on the proxy suggestion.. Could you
> please elaborate on this?
As for the integration, the basic idea was that for tagged bitbake tasks
the proxy is started and the *_PROXY environment variables are set. This
should be doable with some modifications to base.bbclass plus some external
Python scripts.
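For illustration (a sketch, not from the thread): such a task wrapper might look roughly like this. It assumes apt-cacher-ng and its `Option=value` command-line overrides, and DL_DIR/TMP_DIR here are placeholder paths, not actual Isar variables.

```shell
#!/bin/sh
# Sketch only: start a per-build apt-cacher-ng, route downloads through it,
# and clean it up afterwards. DL_DIR/TMP_DIR are placeholder paths.
DL_DIR=${DL_DIR:-$PWD/downloads}
TMP_DIR=${TMP_DIR:-$PWD/tmp}
PORT=3142

mkdir -p "$DL_DIR/apt-cache" "$TMP_DIR/apt-cacher-ng"

# apt-cacher-ng accepts configuration overrides as Option=value arguments;
# ForeGround=1 keeps it attached so this wrapper can reap it on exit.
apt-cacher-ng ForeGround=1 Port="$PORT" \
    CacheDir="$DL_DIR/apt-cache" LogDir="$TMP_DIR/apt-cacher-ng" &
PROXY_PID=$!
trap 'kill "$PROXY_PID" 2>/dev/null || true' EXIT

# Every downloader run inside the task now goes through the cache.
export http_proxy="http://127.0.0.1:$PORT"
export https_proxy="$http_proxy"

# ... run multistrap / apt-get / the actual task body here ...
```

Whether apt-cacher-ng tolerates several such instances with disjoint cache directories, and whether its pool can really be split from the index data as discussed above, would still need to be verified.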
>
>
> > I do have other ideas to do this, but that would restructure most
> > of isar.
>
> Well, at least speaking for myself, I'd like to hear those as I
> consider
> this feature to be essential. Choice in solutions is always good :)
>
One idea I had when I first investigated Isar was to stay as
OE-compatible as possible. Following that idea would solve reproducible
builds as well:
Basically, implement debootstrap with bitbake recipes that are
created virtually at runtime by downloading and parsing the
'dists/*/*/*/Packages.gz' file.
I suppose it should be possible to fetch the Packages file at an early
parsing step of a bitbake build, if it is not already present, and fill
the bitbake data store with recipe definitions that fetch those binary
deb packages, have the appropriate dependencies and install them into
the root file system.
However, this idea is still in the brainstorming phase.
Since it would involve a very big redesign, I don't think it is feasible
currently.
Cheers,
Claudius
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153
Keyserver: hkp://pool.sks-keyservers.net
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-17 18:14 ` Claudius Heine
@ 2017-11-20 8:33 ` [ext] Christian Storm
2017-11-20 9:16 ` Claudius Heine
0 siblings, 1 reply; 22+ messages in thread
From: [ext] Christian Storm @ 2017-11-20 8:33 UTC (permalink / raw)
To: isar-users
> > [...]
> > > With my suggestion of using a caching proxy, this could be solved
> > > without any additional overhead.
> >
> > Could be the case, what are the drawbacks?
>
> More complexity and stuff to implement. Also maybe download speed.
>
> > What proxy do you propose to use?
>
> I was at first going with my own standalone proxy implementation in
> pure stdlib python, so that it could be completely integrated into
> isar.
Why not hook this into the fetcher(s) so that it's integrated rather
than being a standalone thing? As a bonus, you'd have full control over it
from the Isar core/code. I think the main invention here is the code
that provides the consistent version/epoch guarantee anyway...
> I had a very simple solution ready rather quickly, but it was
> only synchronous and as such could only handle one connection at a
> time. Instead of just throwing more threads at it, I wanted to go the
> asyncio route. Sadly the python stdlib does not provide a http
> implementation for asyncio. I wasn't clear how to proceed from here
> further (aiohttp dependency or minimal own http implementation).
Ah, OK. Wouldn't this amount to premature optimization? :)
> The other idea is to just use a ready made apt caching proxy like apt-
> cache-ng. But here I am unsure if its flexible enough to use in our
> case. Starting it multiple times in parallel with different ports for
> different caches and only user privileges might be possible but I
> suspect that seperating the pool and the dists folder (pool should go
> to DL_DIR while dists is part of the TMP_DIR) could be more difficult.
On the bonus side, I would count that we don't have to
develop/maintain a custom solution, provided it suits our purposes, of
course...
> > Maybe I missed something on the proxy suggestion.. Could you
> > please elaborate on this?
>
> As for the integration the basic idea was that for taged bitbake tasks
> the proxy is started and sets the *_PROXY environment variables. This
> should be doable with some mods to the base.bbclass and some external
> python scripts.
>
> >
> >
> > > I do have other ideas to do this, but that would restructure most
> > > of isar.
> >
> > Well, at least speaking for myself, I'd like to hear those as I
> > consider
> > this feature to be essential. Choice in solutions is always good :)
> >
>
> One idea that I got when I first investigated isar, was trying to be oe
> compatible as much as possible. So using this idea would solve the
> reproducable builds as well:
>
> Basically implementing debootstrap with bitbake recipes that are
> created virtually on runtime by downloading and parsing the
> 'dists/*/*/*/Packages.gz' file.
Those virtual recipes then will have to be serialized as they contain
the version number of the package, right?
> I suppose it should be possible to fetch the Packages file at an early
> parsing step in a bitbake build, if its not already preset, and fill
> the bitbake data store with recipe definitions that fetch those binary
> deb packages, have the appropriate dependencies and install them into
> the root file system.
Yes, or do a 'download-only' step prior to building as it's available
on Yocto.
> However, this idea is still in the brain storming phase.
>
> Since that would involve a very big redesign I don't think its feasible
> currently.
Sounds interesting, at least for me...
Kind regards,
Christian
--
Dr. Christian Storm
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Otto-Hahn-Ring 6, 81739 München, Germany
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-20 8:33 ` [ext] Christian Storm
@ 2017-11-20 9:16 ` Claudius Heine
2017-11-29 18:53 ` Alexander Smirnov
0 siblings, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-11-20 9:16 UTC (permalink / raw)
To: isar-users
Hi Christian,
On 20.11.2017 09:33, [ext] Christian Storm wrote:
>>> [...]
>>>> With my suggestion of using a caching proxy, this could be solved
>>>> without any additional overhead.
>>>
>>> Could be the case, what are the drawbacks?
>>
>> More complexity and stuff to implement. Also maybe download speed.
>>
>>> What proxy do you propose to use?
>>
>> I was at first going with my own standalone proxy implementation in
>> pure stdlib python, so that it could be completely integrated into
>> isar.
>
> Why not hooking this into the fetcher(s) so that it's integrated rather
> than a standalone thing?
The bitbake fetcher is not the only step that downloads stuff in Isar.
There are also multistrap and possibly 'apt-get install' calls within a
chroot environment.
I was going to integrate it into Isar at some point, but first I wanted
to have a working proof of concept without bitbake in between, so that it
would be easily testable. Tight integration into Isar could follow later.
> As a bonus, you'll have full control on this
> from the Isar core/code. I think the main invention here is the code
> that does the consistent version/epoch guarantee anyway...
Hmm... My hope is that this will solve itself, by splitting
'dists' and 'pool'.
>
>
>> I had a very simple solution ready rather quickly, but it was
>> only synchronous and as such could only handle one connection at a
>> time. Instead of just throwing more threads at it, I wanted to go the
>> asyncio route. Sadly the python stdlib does not provide a http
>> implementation for asyncio. I wasn't clear how to proceed from here
>> further (aiohttp dependency or minimal own http implementation).
>
> Ah, OK. Wouldn't this account for premature optimization? :)
Handling more than one connection in parallel should be possible IMO.
Going from one connection to two is harder than from two to n (n>2). So I
was lucky, in a sense, to discover at that early point in the
implementation that this is harder to do than expected.
>> The other idea is to just use a ready made apt caching proxy like apt-
>> cache-ng. But here I am unsure if its flexible enough to use in our
>> case. Starting it multiple times in parallel with different ports for
>> different caches and only user privileges might be possible but I
>> suspect that seperating the pool and the dists folder (pool should go
>> to DL_DIR while dists is part of the TMP_DIR) could be more difficult.
>
> I would consider on the bonus side for this that we don't have to
> develop/maintain a custom solution, given that it suits our purposes of
> course...
Agreed. But if it only 'sort of' suits our purpose, we might need to
write wrapper code around its shortcomings and maintain that.
>>> Maybe I missed something on the proxy suggestion.. Could you
>>> please elaborate on this?
>>
>> As for the integration the basic idea was that for taged bitbake tasks
>> the proxy is started and sets the *_PROXY environment variables. This
>> should be doable with some mods to the base.bbclass and some external
>> python scripts.
>>
>>>
>>>
>>>> I do have other ideas to do this, but that would restructure most
>>>> of isar.
>>>
>>> Well, at least speaking for myself, I'd like to hear those as I
>>> consider
>>> this feature to be essential. Choice in solutions is always good :)
>>>
>>
>> One idea that I got when I first investigated isar, was trying to be oe
>> compatible as much as possible. So using this idea would solve the
>> reproducable builds as well:
>>
>> Basically implementing debootstrap with bitbake recipes that are
>> created virtually on runtime by downloading and parsing the
>> 'dists/*/*/*/Packages.gz' file.
>
> Those virtual recipes then will have to be serialized as they contain
> the version number of the package, right?
I'm not sure I understand your point correctly. I don't think the
recipes need to be written down as files somewhere. We might have to
take a look at the parsing part of bitbake, where the recipe data store
is filled, i.e., where the deserialization from '*.bb' to data store
entries happens. There we would just take one or more Debian package lists
with some additional information, like the repo URL, and fill the data
store with generated recipes.
>> I suppose it should be possible to fetch the Packages file at an early
>> parsing step in a bitbake build, if its not already preset, and fill
>> the bitbake data store with recipe definitions that fetch those binary
>> deb packages, have the appropriate dependencies and install them into
>> the root file system.
>
> Yes, or do a 'download-only' step prior to building as it's available
> on Yocto.
Not sure if that is possible. Task execution happens after all the
recipes are parsed and dependencies are resolved. To add virtual
recipes ourselves, we need to do that before any task is triggered. So
fetching the 'Packages.gz' file needs to happen very early, outside of
what recipes normally do.
I suspect this is possible by using bitbake event handlers [1].
>> However, this idea is still in the brain storming phase.
>>
>> Since that would involve a very big redesign I don't think its feasible
>> currently.
>
> Sounds interesting, at least for me...
Thanks.
Claudius
[1]
https://www.yoctoproject.org/docs/latest/bitbake-user-manual/bitbake-user-manual.html#events
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153
Keyserver: hkp://pool.sks-keyservers.net
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-20 9:16 ` Claudius Heine
@ 2017-11-29 18:53 ` Alexander Smirnov
2017-11-29 19:02 ` Jan Kiszka
2017-11-30 9:31 ` Claudius Heine
0 siblings, 2 replies; 22+ messages in thread
From: Alexander Smirnov @ 2017-11-29 18:53 UTC (permalink / raw)
To: isar-users
Hi everybody,
I've started working on this topic, and here I'd like to share my vision.
At the moment I've implemented a simple PoC in my branch 'asmirnov/build_rep'.
What it does:
1. There is a new recipe: base-apt. It provides a task which:
- Fetches packages from the origin Debian apt repo to a local folder using
debootstrap.
- Puts these packages via 'reprepro' into a local repository called 'base-apt'.
2. Buildchroot uses 'base-apt' to generate its rootfs.
3. The Isar image uses the 'base-apt' and 'isar' repos to generate its rootfs.
What are the key benefits of this approach:
1. The download session for upstream packages is performed in a single step.
2. You can use your local 'versioned' apt repository instead of
downloading origin packages.
3. Having a local apt repository managed by 'reprepro' gives us the
possibility to implement version pinning. Reprepro provides lots of
operations like:
- Get package name.
- Get package version.
- Remove a specific package from the repo.
- Add a single package to the repo.
So in general, if we know which package version we want, we need to get
the binary with this version and put it into 'base-apt'.
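For illustration (a sketch, not part of the PoC): the reprepro operations listed above map to commands like the following, where the repo path, codename, and example package are made up.

```shell
# Sketch: minimal 'base-apt' repo managed by reprepro. Paths, the codename
# and the example package file are hypothetical.
BASE_APT=./base-apt
SUITE=stretch

# reprepro is driven by a conf/distributions file describing the repo.
mkdir -p "$BASE_APT/conf"
cat > "$BASE_APT/conf/distributions" <<EOF
Codename: $SUITE
Architectures: amd64 source
Components: main
EOF

# Add a single downloaded binary package to the repo ...
reprepro -b "$BASE_APT" includedeb "$SUITE" downloads/hello_2.10-1_amd64.deb
# ... query the name/version currently pinned ...
reprepro -b "$BASE_APT" list "$SUITE" hello
# ... or remove it before inserting a different version.
reprepro -b "$BASE_APT" remove "$SUITE" hello
```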
Which issues I see at the moment:
1. The key issue for me is the list of packages for 'base-apt'. Before
the 'base-apt' task is executed, we should prepare the full list of
packages that will be used by:
- buildchroot (BUILDCHROOT_PREINSTALL).
- packages to build (their build deps).
- the image (IMAGE_PREINSTALL).
I have an idea how to implement this via special tasks and will push a
patch for RFC, but if you have your own proposals, I'll be happy to
discuss them!
Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-29 18:53 ` Alexander Smirnov
@ 2017-11-29 19:02 ` Jan Kiszka
2017-11-30 8:04 ` Alexander Smirnov
2017-11-30 9:31 ` Claudius Heine
1 sibling, 1 reply; 22+ messages in thread
From: Jan Kiszka @ 2017-11-29 19:02 UTC (permalink / raw)
To: Alexander Smirnov, isar-users
On 2017-11-29 19:53, Alexander Smirnov wrote:
> Hi everybody,
>
> I've started working on this topic and here I'd like to share my vision.
> At the moment I've implemented simple PoC in my branch
> 'asmirnov/build_rep'.
>
> What it does:
>
> 1. There is new recipe: base-apt. It provides task which:
>
> - Fetches packages from origin Debian apt to local folder using
> deboostrap.
> - Put these packages via 'reprepro' to local repository called 'base-apt'.
>
> 2. Buildchroot uses 'base-apt' to generate rootfs.
>
> 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs.
>
>
>
> What are the key benefits of this approach:
>
> 1. Download session for upstream packages is performed in a single step.
>
> 2. You could use your local 'versioned' apt repository instead of
> downloading origin packages.
>
> 3. Having local apt repository managed by 'reprepro' provides us
> possibility to implement version pinning. Reprepro provides lots of
> things like:
> - Get package name.
> - Get package version.
> - Remove specific package from repo.
> - Add single package to repo.
>
> So in general, if we have know which package version we want to have, we
> need to get binary with this version and put it to 'base-apt'.
>
But this encodes the versions of the packages to be used implicitly into
their unique presence inside some local apt repo, no?
I would prefer a solution that also stores the package list with versions
and, when provided, uses only that list, independent of the repo
content. That way we can throw all downloaded packages back into a
single archive repo. Having one repo per project version will quickly
explode storage-wise (or you need extra deduplication mechanisms).
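Such a version list could be captured and replayed roughly as follows (a sketch; the paths and manifest format are assumptions, not an agreed design):

```shell
# Sketch: record "name=version" for every package in a built rootfs, so a
# rebuild can request exactly those versions from the archive repo.
ROOTFS=./tmp/rootfs
MANIFEST=./isar-manifest.txt

mkdir -p "$ROOTFS/var/lib/dpkg"    # stand-in rootfs for this sketch
: > "$ROOTFS/var/lib/dpkg/status"

# dpkg-query can read a foreign dpkg database via --admindir.
: > "$MANIFEST"
dpkg-query --admindir="$ROOTFS/var/lib/dpkg" -W \
    -f '${Package}=${Version}\n' >> "$MANIFEST"

# On a rebuild, request exactly the recorded versions; apt fails loudly if
# the archive repo no longer carries one of them:
#   xargs -a "$MANIFEST" chroot "$ROOTFS" apt-get install -y
```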
That said, I'm fine with getting there in several steps, and this can be
a valid first one.
Jan
>
>
> Which issues I see at the moment:
>
> 1. The key issue for me the list of packages for 'base-apt'. So before
> 'base-apt' task is executed, we should prepare full list of packages
> that will be used by:
> - buildchroot (BUILDCHROOT_PREINSTALL).
> - packages to build (their build deps).
> - image (IMAGE_PREINSTALL).
>
> So I have an idea how to implement this via special tasks, will push
> patch for RFC, but if you have your own proposals, I'll be happy to
> discuss them!
>
> Alex
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-29 19:02 ` Jan Kiszka
@ 2017-11-30 8:04 ` Alexander Smirnov
2017-11-30 14:48 ` Jan Kiszka
0 siblings, 1 reply; 22+ messages in thread
From: Alexander Smirnov @ 2017-11-30 8:04 UTC (permalink / raw)
To: Jan Kiszka, isar-users
Hi Jan,
On 11/29/2017 10:02 PM, Jan Kiszka wrote:
> On 2017-11-29 19:53, Alexander Smirnov wrote:
>> Hi everybody,
>>
>> I've started working on this topic and here I'd like to share my vision.
>> At the moment I've implemented simple PoC in my branch
>> 'asmirnov/build_rep'.
>>
>> What it does:
>>
>> 1. There is new recipe: base-apt. It provides task which:
>>
>> - Fetches packages from origin Debian apt to local folder using
>> deboostrap.
>> - Put these packages via 'reprepro' to local repository called 'base-apt'.
>>
>> 2. Buildchroot uses 'base-apt' to generate rootfs.
>>
>> 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs.
>>
>>
>>
>> What are the key benefits of this approach:
>>
>> 1. Download session for upstream packages is performed in a single step.
>>
>> 2. You could use your local 'versioned' apt repository instead of
>> downloading origin packages.
>>
>> 3. Having local apt repository managed by 'reprepro' provides us
>> possibility to implement version pinning. Reprepro provides lots of
>> things like:
>> - Get package name.
>> - Get package version.
>> - Remove specific package from repo.
>> - Add single package to repo.
>>
>> So in general, if we have know which package version we want to have, we
>> need to get binary with this version and put it to 'base-apt'.
>>
>
> But this encodes the versions of the packages to be used implicitly into
> their unique presence inside some local apt repo, no?
>
> I would prefer a solution that stores the packages list with versions as
> well and only uses that list, when provided, independent of the repo
> content. That way we can throw all downloaded packages back into a
> single archive repo. Have one repo per project version will quickly
> explode storage-wise (or you need extra deduplication mechanisms).
>
> That said, I'm fine with getting there in several steps, and this can be
> a valid first one.
>
I got it.
Here I only meant that there are some tools that could help us implement
the specific logic. At the moment I don't have the final vision, but I
hope it will emerge during the experiments with this PoC.
My main wish is to avoid manual hacks on Debian artifacts and to use
generic tools as much as possible.
Alex
> Jan
>
>>
>>
>> Which issues I see at the moment:
>>
>> 1. The key issue for me the list of packages for 'base-apt'. So before
>> 'base-apt' task is executed, we should prepare full list of packages
>> that will be used by:
>> - buildchroot (BUILDCHROOT_PREINSTALL).
>> - packages to build (their build deps).
>> - image (IMAGE_PREINSTALL).
>>
>> So I have an idea how to implement this via special tasks, will push
>> patch for RFC, but if you have your own proposals, I'll be happy to
>> discuss them!
>>
>> Alex
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-29 18:53 ` Alexander Smirnov
2017-11-29 19:02 ` Jan Kiszka
@ 2017-11-30 9:31 ` Claudius Heine
2017-12-06 16:21 ` Alexander Smirnov
1 sibling, 1 reply; 22+ messages in thread
From: Claudius Heine @ 2017-11-30 9:31 UTC (permalink / raw)
To: Alexander Smirnov, isar-users
Hi Alex,
On 11/29/2017 07:53 PM, Alexander Smirnov wrote:
> Hi everybody,
>
> I've started working on this topic and here I'd like to share my vision.
> At the moment I've implemented simple PoC in my branch
> 'asmirnov/build_rep'.
>
> What it does:
>
> 1. There is new recipe: base-apt. It provides task which:
>
> - Fetches packages from origin Debian apt to local folder using
> deboostrap.
> - Put these packages via 'reprepro' to local repository called 'base-apt'.
>
> 2. Buildchroot uses 'base-apt' to generate rootfs.
>
> 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs.
>
>
>
> What are the key benefits of this approach:
>
> 1. Download session for upstream packages is performed in a single step.
>
> 2. You could use your local 'versioned' apt repository instead of
> downloading origin packages.
>
> 3. Having local apt repository managed by 'reprepro' provides us
> possibility to implement version pinning. Reprepro provides lots of
> things like:
> - Get package name.
> - Get package version.
> - Remove specific package from repo.
> - Add single package to repo.
>
> So in general, if we have know which package version we want to have, we
> need to get binary with this version and put it to 'base-apt'.
>
>
>
> Which issues I see at the moment:
>
> 1. The key issue for me the list of packages for 'base-apt'. So before
> 'base-apt' task is executed, we should prepare full list of packages
> that will be used by:
> - buildchroot (BUILDCHROOT_PREINSTALL).
> - packages to build (their build deps).
> - image (IMAGE_PREINSTALL).
Maybe try to do this flexibly, because it should, for example, also be
possible to generate lxc images that are deployed to the final target in
the same Isar run.
Also, as Jan said, deduplicate packages: maybe fetch those packages into
the DL_DIR first, so that rebuilding is possible without internet access,
given no tmp_dir, a populated DL_DIR and a package+version list of sorts.
Then have a task that copies those packages into a repo within the
tmp_dir and installs from there. This way the DL_DIR would contain only
one instance of every package, and the repo in the tmp_dir only a copy
(or maybe even just a symlink).
Archiving the DL_DIR would in this case be enough to build different
sets of images.
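The DL_DIR/tmp_dir split described above could be sketched as follows (directory names are placeholders; a real implementation would live in a bitbake task):

```shell
# Sketch: DL_DIR keeps exactly one copy of each .deb; the per-build repo
# pool inside tmp_dir only receives hard links to them.
DL_DIR=${DL_DIR:-./downloads}
TMP_REPO=${TMP_REPO:-./tmp/base-apt/pool}

mkdir -p "$DL_DIR" "$TMP_REPO"
: > "$DL_DIR/hello_2.10-1_amd64.deb"   # stand-in for a fetched package

# Populate the build-local pool without duplicating the archive on disk.
for pkg in "$DL_DIR"/*.deb; do
    ln -f "$pkg" "$TMP_REPO/$(basename "$pkg")"
done
```

Archiving DL_DIR then suffices for offline rebuilds, while each tmp_dir repo stays cheap to recreate.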
If you can solve this, then this solution looks promising.
Cheers,
Claudius
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de
PGP key: 6FF2 E59F 00C6 BC28 31D8 64C1 1173 CB19 9808 B153
Keyserver: hkp://pool.sks-keyservers.net
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-30 8:04 ` Alexander Smirnov
@ 2017-11-30 14:48 ` Jan Kiszka
0 siblings, 0 replies; 22+ messages in thread
From: Jan Kiszka @ 2017-11-30 14:48 UTC (permalink / raw)
To: Alexander Smirnov, isar-users
On 2017-11-30 09:04, Alexander Smirnov wrote:
> Hi Jan,
>
> On 11/29/2017 10:02 PM, Jan Kiszka wrote:
>> On 2017-11-29 19:53, Alexander Smirnov wrote:
>>> Hi everybody,
>>>
>>> I've started working on this topic and here I'd like to share my vision.
>>> At the moment I've implemented simple PoC in my branch
>>> 'asmirnov/build_rep'.
>>>
>>> What it does:
>>>
>>> 1. There is new recipe: base-apt. It provides task which:
>>>
>>> - Fetches packages from the origin Debian apt to a local folder using
>>> debootstrap.
>>> - Puts these packages via 'reprepro' into a local repository called
>>> 'base-apt'.
>>>
>>> 2. Buildchroot uses 'base-apt' to generate rootfs.
>>>
>>> 3. Isar image uses 'base-apt' and 'isar' repos to generate rootfs.
>>>
>>>
>>>
>>> What are the key benefits of this approach:
>>>
>>> 1. Download session for upstream packages is performed in a single step.
>>>
>>> 2. You could use your local 'versioned' apt repository instead of
>>> downloading origin packages.
>>>
>>> 3. Having a local apt repository managed by 'reprepro' gives us the
>>> possibility to implement version pinning. Reprepro provides operations
>>> like:
>>> - Get package name.
>>> - Get package version.
>>> - Remove specific package from repo.
>>> - Add single package to repo.
>>>
>>> So in general, if we know which package version we want to have, we
>>> need to get the binary with this version and put it into 'base-apt'.
>>>
>>
>> But this encodes the versions of the packages to be used implicitly into
>> their unique presence inside some local apt repo, no?
>>
>> I would prefer a solution that stores the package list with versions as
>> well and only uses that list, when provided, independent of the repo
>> content. That way we can throw all downloaded packages back into a
>> single archive repo. Having one repo per project version will quickly
>> explode storage-wise (or you need extra deduplication mechanisms).
>>
>> That said, I'm fine with getting there in several steps, and this can be
>> a valid first one.
>>
>
> I got it.
>
> Here I only mean that there are some tools that could help us in
> implementing specific logic. At the moment I don't have the final
> vision, but hope it will appear during experiments with this PoC.
>
> My main wish is to avoid manual hacks with Debian artifacts and use
> generic tools as much as possible.
>
I do agree.
Jan
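The versioned package list Jan asks for could, for example, be rendered
into an apt_preferences(5) pinning fragment so apt itself enforces the
recorded versions. This is a sketch under that assumption, not code from
the 'build_rep' branch; the function name and the name-to-version mapping
are hypothetical:

```python
def pin_preferences(versions):
    """Render an apt_preferences(5) fragment pinning each package to
    its recorded version.

    versions maps package name -> version string. Pin-Priority 1001
    (> 1000) makes apt select that version even if it would be a
    downgrade.
    """
    stanzas = []
    for name, version in sorted(versions.items()):
        stanzas.append(
            f"Package: {name}\nPin: version {version}\nPin-Priority: 1001"
        )
    return "\n\n".join(stanzas) + "\n"
```

Dropping the rendered text into /etc/apt/preferences.d/ inside the
buildchroot would then decouple the version selection from whatever
happens to be present in the archive repo.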
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Reproducibility of builds
2017-11-30 9:31 ` Claudius Heine
@ 2017-12-06 16:21 ` Alexander Smirnov
0 siblings, 0 replies; 22+ messages in thread
From: Alexander Smirnov @ 2017-12-06 16:21 UTC (permalink / raw)
To: Storm, Christian; +Cc: isar-users
Hi Christian,
[...]
I've pushed my branch 'build_rep' which does the following:
1. Prepare the list of packages used by the buildchroot, the images, and
the packages to be built;
2. Create a local apt repository with the packages listed above.
3. Generate the buildchroot and image from the local apt repository.
So it uses only one 'downloading' session.
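For illustration, the combined list from step 1 boils down to a
deduplicated union of the three package sources mentioned earlier in the
thread (BUILDCHROOT_PREINSTALL, the build dependencies, and
IMAGE_PREINSTALL). The sketch below is conceptual, not the actual branch
code:

```python
def base_apt_packages(buildchroot_preinstall, build_deps, image_preinstall):
    """Merge the three package sources into one deduplicated list,
    so base-apt can be filled in a single download session."""
    return sorted(
        set(buildchroot_preinstall) | set(build_deps) | set(image_preinstall)
    )
```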
Could you please test this branch with your custom meta?
Alex
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2017-12-06 16:21 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-03 8:13 Reproducibility of builds Claudius Heine
2017-08-21 11:23 ` Claudius Heine
2017-08-28 11:27 ` Claudius Heine
2017-09-05 10:05 ` Alexander Smirnov
2017-09-05 10:38 ` Jan Kiszka
2017-09-05 11:50 ` Alexander Smirnov
2017-09-05 11:54 ` Claudius Heine
2017-09-06 13:39 ` Claudius Heine
2017-09-18 15:05 ` Baurzhan Ismagulov
2017-09-19 8:55 ` Claudius Heine
2017-11-14 16:04 ` Christian Storm
2017-11-14 16:22 ` Claudius Heine
2017-11-17 16:53 ` [ext] Christian Storm
2017-11-17 18:14 ` Claudius Heine
2017-11-20 8:33 ` [ext] Christian Storm
2017-11-20 9:16 ` Claudius Heine
2017-11-29 18:53 ` Alexander Smirnov
2017-11-29 19:02 ` Jan Kiszka
2017-11-30 8:04 ` Alexander Smirnov
2017-11-30 14:48 ` Jan Kiszka
2017-11-30 9:31 ` Claudius Heine
2017-12-06 16:21 ` Alexander Smirnov