* [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
@ 2022-02-22 10:34 vijai kumar
  2022-02-22 14:31 ` Henning Schild
  0 siblings, 1 reply; 9+ messages in thread

From: vijai kumar @ 2022-02-22 10:34 UTC (permalink / raw)
To: isar-users, Henning Schild, Baurzhan Ismagulov; +Cc: Jan Kiszka

Problem:
--------
We could have several CI jobs running in parallel on different nodes. One
might want to consolidate the debs/deb-srcs of all these builds and build
a single base-apt from them.

What's possible:
---------------
With the current state of ISAR, the following is possible:

1. Run all the jobs in parallel in separate CI runners.
2. Collect all the debs and deb-srcs from those builds and push them to a
   common file server.
3. Download the debs and deb-srcs and create a repo out of them in the
   final CI step.
4. Upload the base-apt to the server.

This has some disadvantages: we need to move all that data (debs/deb-srcs)
around, which increases time and cost.

What's needed:
--------------
The idea is to have simple metadata that repo generation tools can use to
recreate the repo.

Why the manifest cannot be used:
--------------------------------
The image manifest does not serve this particular need. Its shortcomings
are:

1. It has no details about removed packages (e.g. localepurge).
2. A manifest of the buildchroot would not have details about the package
   dependencies/imager installs at the time of generation (i.e.
   postprocess).

Some ideas:
-----------
There were a couple of ideas:

1. Use an external script to create a manifest of the downloads/{deb,
   deb-src} folders, and try to download the packages using that manifest
   and an appropriate sources list in the final runner.
2. Use "apt --print-uris" + "debootstrap --keep-debootstrap-dir" to create
   metadata with the complete URL of each package. Later, wget can be used
   to download them from the web.
We are wondering if we could discuss and derive a solution for this here
in ISAR itself, instead of opting for local scripts in downstream layers.

Thanks,
Vijai Kumar K
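[Editorial note: idea 2 above leans on apt's --print-uris output. As a
rough, untested illustration of what such a manifest extraction could
look like: the function name is hypothetical, and the field layout
("'URI' filename size checksum" per line) is assumed from apt-get's
documented --print-uris behavior.]

```python
# Sketch: turn `apt-get install --print-uris <pkgs>` output into a
# manifest of (url, filename, size, checksum) tuples that a later CI
# stage could feed to wget. Hypothetical helper, not part of ISAR.
import shlex

def parse_print_uris(output):
    """Parse apt's --print-uris lines; skip informational output."""
    manifest = []
    for line in output.splitlines():
        parts = shlex.split(line)  # also strips the quotes around the URL
        if len(parts) != 4:
            continue  # "Reading package lists..." and similar noise
        url, filename, size, checksum = parts
        manifest.append((url, filename, int(size), checksum))
    return manifest
```

Each manifest line could then be fetched with `wget -O <filename> <url>`
in the final runner, without shipping the debs themselves between runners.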
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
  2022-02-22 10:34 [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds vijai kumar
@ 2022-02-22 14:31 ` Henning Schild
  2022-02-24 13:20   ` vijai kumar
  0 siblings, 1 reply; 9+ messages in thread

From: Henning Schild @ 2022-02-22 14:31 UTC (permalink / raw)
To: vijai kumar; +Cc: isar-users, Baurzhan Ismagulov, Jan Kiszka

Hey Vijai,

On Tue, 22 Feb 2022 16:04:36 +0530, vijai kumar
<vijaikumar.kanagarajan@gmail.com> wrote:

> Problem:
> --------
> We could have several CI jobs that are running in parallel in
> different nodes. One might want to consolidate and build a base-apt
> from the debs/deb-srcs of all these builds.

Can you go into more detail? I do not yet get the problem.

It seems like you want to save compute time by sharing pre-built
artifacts via some common storage. The sstate cache can do that very
well: we are using shared folders for on-prem runners, S3 for AWS, and
sstate mirrors for populating "new empty runners", for "partial result
delivery" from failed jobs, and to sync on-prem with S3.

isar is a tool to build images, not distros or repos or packages. While
it can do all of that, using it for such things can get tricky, and isar
was not designed for such cases. Meaning "base-apt" is not meant to be
your cache to build many images from ... it is meant to be the cache for
exactly one ... and sharing can cause problems.

sstate would detect false sharing, say a package recipe for some reason
uses a machine-conf variable. multiconfig or base-apt sharing would make
you run into that bug, while sstate would likely not.

So if it is about build time, I suggest you have a look at sstate and
the not-yet-upstreamed Python helper scripts for sharing/eviction; I can
point you to them in case you do not find them yourself.

Henning

> [...]
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
  2022-02-22 14:31 ` Henning Schild
@ 2022-02-24 13:20   ` vijai kumar
  2022-02-24 15:42     ` Henning Schild
  0 siblings, 1 reply; 9+ messages in thread

From: vijai kumar @ 2022-02-24 13:20 UTC (permalink / raw)
To: Henning Schild; +Cc: isar-users, Baurzhan Ismagulov, Jan Kiszka

Hi Henning,

On Tue, Feb 22, 2022 at 8:01 PM Henning Schild
<henning.schild@siemens.com> wrote:
>
> Hey Vijai,
>
> On Tue, 22 Feb 2022 16:04:36 +0530, vijai kumar
> <vijaikumar.kanagarajan@gmail.com> wrote:
>
> > Problem:
> > --------
> > We could have several CI jobs that are running in parallel in
> > different nodes. One might want to consolidate and build a base-apt
> > from the debs/deb-srcs of all these builds.
>
> Can you go into more detail? I do not yet get the problem.

runner 1 (Germany) -> building DE0-Nano
runner 2 (India)   -> building qemuarm
runner 3 (US)      -> building qemuamd64

All these builds run on different servers. If we wanted to create a
single base-apt from all these servers, we would need to copy their
debs/deb-srcs/base-apt over to a common server and then create a
consolidated repo. This involves moving that data around.

The problem can be avoided if all these builds produce a single piece of
metadata with details of all the packages each build used: basically, a
manifest of the build. This manifest can later be used to recreate the
repo, which can then be hosted for these jobs.

Having metadata and recreating the repo is one way; there might be other
ways as well. That is where we thought about the --print-uris option of
apt. It gives you the complete URL of each package, which we can then
download using wget: a manifest containing all the packages ever used by
the build, with their complete URLs. It could easily be used for several
purposes, such as clearing input, repo regeneration, etc.

I don't think sstate can help here. I might be wrong though.
Thanks,
Vijai Kumar K

> [...]
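[Editorial note: the per-runner manifests proposed above would have to be
merged into one package list before the consolidated repo can be built.
A hypothetical sketch of that merge step, assuming each runner emits
(name, version, arch, url) tuples; none of these names exist in ISAR.]

```python
def merge_manifests(*manifests):
    """Union several per-runner package manifests, deduplicating entries.

    Each manifest is an iterable of (name, version, arch, url) tuples;
    the same package built on two runners collapses into one entry.
    """
    merged = {}
    for manifest in manifests:
        for name, version, arch, url in manifest:
            key = (name, version, arch)
            prev = merged.setdefault(key, url)
            if prev != url:
                # Same package seen at two URLs: either works, but this
                # hints at inconsistent sources.list setups across runners.
                print(f"note: {name}={version} ({arch}) seen at two URLs")
    return sorted((n, v, a, u) for (n, v, a), u in merged.items())
```

The merged list is what the final CI stage would download from, instead
of moving the debs themselves between runners.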
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
  2022-02-24 13:20 ` vijai kumar
@ 2022-02-24 15:42   ` Henning Schild
  2022-02-25 17:27     ` Jan Kiszka
  0 siblings, 1 reply; 9+ messages in thread

From: Henning Schild @ 2022-02-24 15:42 UTC (permalink / raw)
To: vijai kumar; +Cc: isar-users, Baurzhan Ismagulov, Jan Kiszka

On Thu, 24 Feb 2022 18:50:50 +0530, vijai kumar
<vijaikumar.kanagarajan@gmail.com> wrote:

> Hi Henning,
>
> [...]
>
> runner 1 (Germany) -> building DE0-Nano
> runner 2 (India)   -> building qemuarm
> runner 3 (US)      -> building qemuamd64
>
> All these builds are running in different servers.
> If we wanted to create a single base-apt from all these servers, then
> we need to copy over their debs/deb-srcs/base-apt to a common server
> and then create a consolidated repo.

But why would you want to do that? I mean, I get why you would want to
store everything in the same location, but not why it should be one
repo. Maybe to save some space on sources and arch-all packages ... but
hey, there are ways of deduplicating on the filesystem or block level.
You are just risking a weird local "all" package not being so "all"
after all ... false sharing.

> This involves moving around this data.

Yes, if it is one central storage place. No matter whether it is one
"repo" or many "repos" in e.g. folders.

> The problem can be avoided if we have a single metadata produced by
> all these builds which would have details of all the packages the
> build used.
> Basically a manifest of the build. This manifest can be later used to
> recreate the repo which can be hosted later on for these jobs.

We have a manifest for "image content" which is already fed into
clearing. It is a bill of materials and nothing else; it cannot be used
to rebuild.

Even if you had all the metadata, you need to store sources and binaries
somewhere reliable; whether that is central or distributed is another
story. Pointers to anything on the internet (including all Debian repos)
will at some point stop working. So if "exact rebuilding" in a "far
away future" is what you want, mirroring is what you will need.

Partial mirroring based on base-apt, even with sources, will be shaky
and you will find yourself digging in snapshots again. But it will work.

In the worst case you will not want an "exact rebuild" but a
"fix-backported rebuild", which means you will need all build-deps
mirrored ... recursively. In fact, any "package relationship", maybe
even a Conflicts, might become rebuild-relevant. A partial mirror will
not cut it; rather take a full one, so you do not need to care which
bits to ignore and do not risk forgetting anything. The ideal way would
be to eventually liberate snapshots from its throttling; the short-term
way is to spend some bucks on some buckets (S3).

> Having metadata and recreating repo is one way. There might be other
> ways as well.

I am afraid you likely cannot recreate if you do not keep everything
yourself or at a place you trust (snapshots?). There have been several
threads on that topic already, including how one could help make
snapshot work for debootstrap and co., coming from reproducible builds
and Qubes OS [1] [2]. If you dig deeper you will find many people
offering help and funding, but for some reason things still seem
"stuck". On top, we could maybe see if we can establish something like
snapshots in Siemens. But I guess outside and open to anyone will be
much better.
[1] https://groups.google.com/g/isar-users/c/X9B5chyEWpc/m/nVXwZuIRBAAJ
[2] https://www.qubes-os.org/news/2021/10/08/reproducible-builds-for-debian-a-big-step-forward/

> That is where we thought about the --print-uris option of apt. It
> basically gives you the complete URL to the package which we can
> download using wget.
> A manifest containing all the packages ever used by the build with its
> complete url. It could easily be used for several purposes, like as
> clearing input, repo regeneration etc.

Maybe we can find valid reasons to extend the manifests. But URLs to
packages seem almost redundant: knowing the package names and versions
and all sources.list entries, one can generate these URLs for any
mirror; picking just one of many mirrors would be limiting.

And maybe there are valid reasons for having manifests even for
buildchroots. But the problem here is that they change all the time
while we still use one buildchroot. We see packages being added as
build-deps all the time, but also removed when build-deps conflict.

> I don't think sstate can help here. I might be wrong though.

I guess sstate will not help. It is even more storage needs, and more
storage sync needs between runners if you want to share.

Henning

> [...]
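[Editorial note: Henning's point that package URLs are derivable from
names, versions, and sources.list entries can be sketched. This
hypothetical helper assumes the standard Debian pool layout
(pool/<component>/<prefix>/<source>/) and that the source package name
of each binary is known, since the pool is organized by source.]

```python
def pool_url(mirror, component, source, package, version, arch):
    """Build a Debian pool URL for a binary package (illustrative only).

    Pool prefix rule: first letter of the source name, or "lib" plus the
    next letter for source packages starting with "lib".
    """
    prefix = "lib" + source[3] if source.startswith("lib") else source[0]
    filever = version.split(":", 1)[-1]  # epochs never appear in filenames
    return (f"{mirror}/pool/{component}/{prefix}/{source}/"
            f"{package}_{filever}_{arch}.deb")
```

With this, a manifest of (package, version, arch) entries plus any
mirror base URL is enough to regenerate download URLs, which is why
storing one fixed URL per package would be limiting.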
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
  2022-02-24 15:42 ` Henning Schild
@ 2022-02-25 17:27   ` Jan Kiszka
  2022-03-03 13:45     ` vijai kumar
  0 siblings, 1 reply; 9+ messages in thread

From: Jan Kiszka @ 2022-02-25 17:27 UTC (permalink / raw)
To: Henning Schild, vijai kumar; +Cc: isar-users, Baurzhan Ismagulov

On 24.02.22 16:42, Henning Schild wrote:
> [...]
>
>> runner 1 (Germany) -> building DE0-Nano
>> runner 2 (India)   -> building qemuarm
>> runner 3 (US)      -> building qemuamd64
>>
>> All these builds are running in different servers.
>> If we wanted to create a single base-apt from all these servers, then
>> we need to copy over their debs/deb-srcs/base-apt to a common server
>> and then create a consolidated repo.
>
> But why would you want to do that? I mean, I get why you would want to
> store everything in the same location, but not why it should be one
> repo. Maybe to save some space on sources and arch-all packages ...
> but hey, there are ways of deduplicating on filesystem or block level.
> You are just risking a weird local "all" package not being so "all"
> after all ... false sharing.

We want to auto-build a single, "offline"-capable repo from the BoM
accumulated from those builds of all possible targets. And that in a way
that does not require pushing large artifacts between the build stages,
ideally only those BoM lists.

> [...]
>
> Even if you had all the metadata, you need to store sources and
> binaries somewhere reliable; whether that is central or distributed is
> another story. Pointers to anything on the internet (including all
> Debian repos) will at some point stop working. So if "exact
> rebuilding" in a "far away future" is what you want, mirroring is what
> you will need.

Exactly: this mirror is supposed to be generated, and that shortly after
the individual builds succeed (in a common pipeline stage). That can
fail, as any build can fail, if a referenced version picked up during
bootstrap got dropped while building an image.

> Partial mirroring based on base-apt, even with sources, will be shaky
> and you will find yourself digging in snapshots again. But it will
> work.

Yes, it works for us (you should know ;)).

Jan

--
Siemens AG, Technology
Competence Center Embedded Linux
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
  2022-02-25 17:27 ` Jan Kiszka
@ 2022-03-03 13:45   ` vijai kumar
  2022-03-04 10:03     ` Baurzhan Ismagulov
  0 siblings, 1 reply; 9+ messages in thread

From: vijai kumar @ 2022-03-03 13:45 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Henning Schild, isar-users, Baurzhan Ismagulov

On Fri, Feb 25, 2022 at 10:57 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:
>
> [...]
>
> We want to auto-build a single, "offline"-capable repo from the BoM
> accumulated from those builds of all possible targets. And that in a
> way that does not require pushing large artifacts between the build
> stages, ideally only those BoM lists.

If we are in agreement, then we can think about how to achieve this.
There are changes coming in soon, so the implementation should take
that into consideration.

I am not sure if the caching part is being reworked. If so, having an
idea of the design would definitely help; maybe the ISAR maintainers
can clarify this.

Thanks,
Vijai Kumar K

> [...]
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
  2022-03-03 13:45 ` vijai kumar
@ 2022-03-04 10:03   ` Baurzhan Ismagulov
  2022-03-07  7:23     ` vijai kumar
  0 siblings, 1 reply; 9+ messages in thread

From: Baurzhan Ismagulov @ 2022-03-04 10:03 UTC (permalink / raw)
To: isar-users

On Thu, Mar 03, 2022 at 07:15:40PM +0530, vijai kumar wrote:
> If we are in agreement then we can think about how to achieve this.
> There are changes coming in soon, so the implementation should take
> that into consideration.
>
> I am not sure if the caching part is reworked. If so having an idea on
> the design would definitely help;

Thanks, Vijai, for the discussion. In short, we have already started
further base-apt improvements, for a number of reasons, e.g.:

* Strict usage of base-apt for debootstrap and build-dep, to ensure
  base-apt correctness in any build.

* Pluggability of debootstrap, which is necessary for multistrapping,
  sudo removal, and maintainability.

* We need to know which PN-PV is satisfiable from which location
  (base-apt, isar-apt, bitbake) in order to use Debian Build-Depends in
  bitbake. python-apt provides the necessary functionality.

After we have the above, more use cases become possible, e.g. storing
and reusing built packages in per-layer apt repos.

We also want to have parallel building. For us, it comes more from the
CI side, as we have 3 h for the fast and 10 h for the full testsuite on
the latest inexpensive hardware. The first step would be to parallelize
the testcases, storing intermediate results in a shared location. The
second step would be extending that to individual bitbake tasks. Maybe
icecc would be good enough to cover either or both; we have to test.

Regarding your implementation proposal, I think that could be done.
However, I'd like to better understand the motivation first. Is it e.g.
creating a canonical repo for a given project? That would be easier to
implement on top of the above.

Regarding downloading time: we had tested full local Debian mirrors and
didn't see any performance improvement in CI jobs. We haven't dug
deeper; maybe we have some parallelization killers in Isar.

Regarding a central repo for remote building sites: in my experience,
it is very slow; our customers end up installing local replication
servers.

We aim at full Debian support, be it packages, repos, or images. Debian,
being a binary server / desktop distribution and not a source-based
development kit, has a number of inflexibilities such as sudo,
versioning, rules, etc.; we would like to work towards more developer
friendliness here. Bitbake and Yocto contribute much here, and we would
like to find a good working solution.

That is why we welcome this use case and would like to work on it after
understanding the details. Jan told me you already had some
implementations for this. You also mention time and costs. Could you
please share the concept behind the work so far, and which time and
costs you mean? Then we could proceed step by step while having the big
picture in mind.

With kind regards,
Baurzhan.
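[Editorial note: the "which PN-PV is satisfiable from which location"
check mentioned above would in practice use python-apt against real repo
indexes. As a toy sketch of just the lookup order, with all names
hypothetical:]

```python
def find_provider(name, version, repos):
    """Return the first repo (in priority order) whose index lists the
    exact name/version pair; None means bitbake has to build it.

    `repos` is a priority-ordered list of (repo_name, index) pairs,
    where each index maps a package name to a set of versions.
    """
    for repo_name, index in repos:
        if version in index.get(name, ()):
            return repo_name
    return None
```

A real implementation would also need Debian version comparison and
architecture handling, which python-apt provides; this only shows the
base-apt -> isar-apt -> build fallback idea.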
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds 2022-03-04 10:03 ` Baurzhan Ismagulov @ 2022-03-07 7:23 ` vijai kumar 2022-03-15 11:45 ` Baurzhan Ismagulov 0 siblings, 1 reply; 9+ messages in thread From: vijai kumar @ 2022-03-07 7:23 UTC (permalink / raw) To: isar-users On Fri, Mar 4, 2022 at 3:33 PM Baurzhan Ismagulov <ibr@radix50.net> wrote: > > On Thu, Mar 03, 2022 at 07:15:40PM +0530, vijai kumar wrote: > > If we are in agreement then we can think about how to achieve this. > > There are changes coming in soon, so the implementation should take > > that into consideration. > > > > I am not sure if the caching part is reworked. If so having an idea on > > the design would definitely help; > > Thanks Vijai for the discussion. In short, we've already started further > base-apt improvement due to a number of reasons, e.g.: If there is already a branch for this activity with some initial implementations, can you please point to it? > > * Strict usage of base-apt for debootstrap and build-dep to ensure base-apt > correctness in any build. > > * Pluggability of debootstrap, which is necessary for multistrapping, sudo > removal, and maintainability. > > * We need to know which PN-PV is satisfiable from which location (base-apt, > isar-apt, bitbake) in order to use Debian Build-Depends in bitbake. > > python-apt provides the necessary functionality. After we have the above, more > necessary use cases become possible. E.g., storing and reusing built packages > in per-layer apt repos. > > We also want to have parallel building. For us, it comes more from the CI side, > as we have 3 h for fast and 10 h for full testsuite on the latest inexpensive > hardware. The first step would be to parallelize the testcases with storing of > intermediate results in a shared location. The second step would be extending > that to individual bitbake tasks. Maybe icecc would be good enough to cover > either or both, we have to test. 
> > Regarding your implementation proposal, I think that could be done. However, > I'd like to better understand the motivation first. Is it e.g. creating a > canonical repo for a given project? That would be easier to implement on top of > the above. It is for recreating a repo for a given project from some kind of manifest. This way we could avoid pushing the repos between multiple CI runners. The current goal is to create a single repo from multiple projects. We might have multiple projects running in parallel on different CI runners; the idea is to create a single repo from all those builds without the need to push data around. So, some kind of project manifest. These manifests can then be used to create a single repo, instead of copying over all the debs to a single location and triggering creation of base-apt. > > Regarding downloading time -- we had tested full local Debian mirrors and > didn't see any performance improvement of CI jobs. We haven't dug deeper, maybe > we have some parallelization killers in Isar. > > Regarding the central repo for remote building sites -- in my experience, it is > very slow, our customers end up installing local replication servers. > > We aim at full Debian support, be it packages, repos, or images. Debian, being > a binary server / desktop distribution and not a source-based development kit, > has a number of inflexibilities such as sudo, versioning, rules, etc.; we would > like to work towards more developer friendliness here. Bitbake and Yocto > contribute much here, and we would like to find a good working solution. > > That is why we welcome this use case and would like to work on that after > understanding the details. Jan told me you already had some implementations for > this. You also mention time and costs. Could you please share the concept > behind the work so far, and which time and costs you mean? Then we could > proceed step by step while having the big picture in mind. We thought about a few options, 1.
Gather the download URLs for all the packages we download. This could be our metadata. 2. Combine all manifests (buildchroot, isar-bootstrap & image rootfs) at the end of the build to recreate a master list of packages used in the build. Option 2 has some disadvantages. We have to probe buildchroot after the image build to get a complete package list. Even then it does not capture the packages that are removed. Option 1 seems like a solution. It would have to be injected as part of the build, like how we injected downloading of debs. The advantage it brings is that we don't necessarily need the apt sources information to recreate the repo. A simple wget would do. There is also a risk of URLs becoming obsolete. There could be better solutions; maybe our new way of creating base-apt might help in creating metadata in a cleaner way. Thanks, Vijai Kumar K > > With kind regards, > Baurzhan. > > -- > You received this message because you are subscribed to the Google Groups "isar-users" group. > To unsubscribe from this group and stop receiving emails from it, send an email to isar-users+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/isar-users/YiHj1KTffbhLxPl5%40ilbers.de. ^ permalink raw reply [flat|nested] 9+ messages in thread
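The merge step behind option 1 can be sketched in a few lines: each CI runner records the URLs of the packages it downloaded, and a final step merges those manifests into one deduplicated list that a plain wget loop can fetch. This is only an illustration of the idea discussed above, not Isar code; the manifest contents and mirror URLs are made up, and deduplication keys on the .deb filename (name_version_arch), which identifies a package regardless of which mirror served it.

```python
# Sketch: merge per-runner URL manifests into one deduplicated
# download list (option 1 above). URLs are illustrative only.
from urllib.parse import urlparse
import posixpath

def merge_manifests(manifests):
    """Merge lists of package URLs, deduplicating by .deb filename.

    A .deb filename (name_version_arch.deb) identifies the package,
    so the same package fetched by several runners is kept once,
    even if different runners used different mirrors.
    """
    seen = {}
    for urls in manifests:
        for url in urls:
            fname = posixpath.basename(urlparse(url).path)
            seen.setdefault(fname, url)  # first URL seen for a file wins
    return sorted(seen.values())

if __name__ == "__main__":
    runner_a = [
        "http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-2_amd64.deb",
        "http://deb.debian.org/debian/pool/main/b/busybox/busybox_1.30.1-6_amd64.deb",
    ]
    runner_b = [
        # same package via a different mirror -- must not be duplicated
        "http://ftp.de.debian.org/debian/pool/main/h/hello/hello_2.10-2_amd64.deb",
        "http://deb.debian.org/debian/pool/main/z/zstd/zstd_1.4.8+dfsg-2.1_amd64.deb",
    ]
    for url in merge_manifests([runner_a, runner_b]):
        print(url)  # this list could be fed to `wget -i -` to rebuild the pool
```

As noted above, such a list avoids shipping the debs themselves between runners, at the cost of the URLs possibly going stale on the mirror side.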
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds 2022-03-07 7:23 ` vijai kumar @ 2022-03-15 11:45 ` Baurzhan Ismagulov 0 siblings, 0 replies; 9+ messages in thread From: Baurzhan Ismagulov @ 2022-03-15 11:45 UTC (permalink / raw) To: isar-users On Mon, Mar 07, 2022 at 12:53:26PM +0530, vijai kumar wrote: > If there is already a branch for this activity with some initial > implementations, can you please point to it? v2: https://groups.google.com/g/isar-users/c/65lRtw4EU_8/m/_O2hIRPBAgAJ v3 WIP: https://github.com/ilbers/isar/tree/baseapt_v3/10 > It is for recreating a repo for a given project from some kind of > manifest. This way we could > avoid pushing the repos between multiple CI runners. > > The current goal is to create a single repo from multiple projects. We > might have multiple projects running parallel in different CI runners, > the idea is to create a single repo from all those builds without the > need to push data around. So, some kind of project manifest. > > These manifests then can be used to create a single repo. Instead of > copying over all the debs to a single location and trigger creation of > base-apt. Ok, so it's about creating One Canonical Base-Apt (for a project / department / business unit / company). > We thought about a few options, > 1. Gather the download urls for all the packages we download. This > could be our metadata. > 2. Club all manifests at the end of build. (Buildchroot, > isar-bootstrap & image rootfs) to recreate a master list of packages > used in the build. > > 2 has some disadvantages. We have to probe buildchroot after image > build to get a complete package list. Even then it does not capture the > packages that are removed. > 1 seems like a solution, It would have to be injected as part of the > build, like how we injected downloading debs. The advantage it brings > is that we don't necessarily need the apt sources information to > recreate the repo. A simple wget would do.
> There is also a risk of > urls becoming obsolete. > > There could be better solutions, maybe our new way of creating > base-apt might help in creating metadata in a cleaner way. I agree that post-build collection has some limitations -- that was the motivation for us to make a small step towards more Debian-like repo management. The patchset implements approach #1. We use python-apt to determine what we need upfront. The debootstrap part works in v2. The build-deps part is about to be finished in v3. We download immediately, but updating to output only is easy -- that is in fact one of our requirements. We don't address package removal and recursive fetching (rebuilding the whole base-apt exclusively from local files) in this step. Implementing https://wiki.debian.org/HelmutGrohne/rebootstrap with Isar would be cool (and we need at least parts of that logic for certain use cases), but we have to see whether we need other stuff like Build-Depends support before that. With kind regards, Baurzhan. ^ permalink raw reply [flat|nested] 9+ messages in thread
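The "determine what we need upfront" step that the patchset does with python-apt rests on the Debian repository format: each stanza of a Packages index carries a Filename field relative to the mirror root, so download URIs can be computed before anything is fetched. A minimal stdlib-only approximation follows; python-apt itself is not used here, the parser ignores multi-line fields for brevity, and the mirror URL and index text are illustrative.

```python
# Sketch: compute package download URIs from a Debian Packages index,
# approximating the upfront resolution python-apt provides.
# Field names (Package, Filename) come from the Debian repo format;
# the mirror URL and index contents below are examples only.

def parse_packages_index(text):
    """Parse an uncompressed Packages file into a list of dicts,
    one per stanza (continuation lines are skipped for brevity)."""
    stanzas, cur = [], {}
    for line in text.splitlines():
        if not line.strip():
            if cur:
                stanzas.append(cur)
                cur = {}
        elif ":" in line and not line.startswith((" ", "\t")):
            key, _, val = line.partition(":")
            cur[key] = val.strip()
    if cur:
        stanzas.append(cur)
    return stanzas

def package_uris(mirror, index_text, wanted):
    """Return mirror-rooted URIs for the wanted package names."""
    uris = {}
    for stanza in parse_packages_index(index_text):
        if stanza.get("Package") in wanted:
            uris[stanza["Package"]] = mirror.rstrip("/") + "/" + stanza["Filename"]
    return uris

INDEX = """\
Package: hello
Version: 2.10-2
Architecture: amd64
Filename: pool/main/h/hello/hello_2.10-2_amd64.deb

Package: zstd
Version: 1.4.8+dfsg-2.1
Architecture: amd64
Filename: pool/main/z/zstd/zstd_1.4.8+dfsg-2.1_amd64.deb
"""

print(package_uris("http://deb.debian.org/debian", INDEX, {"hello"}))
```

A URI list produced this way could be emitted as build output instead of being downloaded immediately, which matches the "updating to output only" remark above.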
end of thread, other threads:[~2022-03-15 11:45 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-22 10:34 [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds vijai kumar 2022-02-22 14:31 ` Henning Schild 2022-02-24 13:20 ` vijai kumar 2022-02-24 15:42 ` Henning Schild 2022-02-25 17:27 ` Jan Kiszka 2022-03-03 13:45 ` vijai kumar 2022-03-04 10:03 ` Baurzhan Ismagulov 2022-03-07 7:23 ` vijai kumar 2022-03-15 11:45 ` Baurzhan Ismagulov