* [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
@ 2022-02-22 10:34 vijai kumar
2022-02-22 14:31 ` Henning Schild
From: vijai kumar @ 2022-02-22 10:34 UTC (permalink / raw)
To: isar-users, Henning Schild, Baurzhan Ismagulov; +Cc: Jan Kiszka
Problem:
--------
We could have several CI jobs running in parallel on different nodes.
One might want to consolidate and build a base-apt from the
debs/deb-srcs of all these builds.
What's possible:
---------------
With the current state of ISAR, the following is possible.
1. Run all the jobs in parallel in separate CI runners
2. Collect all the debs and deb-srcs from those builds and push to a
common file server.
3. Download the debs and deb-srcs and create a repo out of them in the
final CI step.
4. Upload the base-apt to the server.
This has some disadvantages: we need to move all that data
(debs/deb-srcs), which increases time and cost.
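A minimal sketch of the final CI step above, assuming reprepro is available on the consolidation runner (the repo name, codename, architectures, and the collected/ layout are made-up placeholders, not something ISAR provides):

```shell
#!/bin/sh
# Sketch: build a single repo from debs collected from all runners.
set -e

REPO=base-apt
mkdir -p "$REPO/conf"

# Minimal reprepro repo definition; adjust Codename/Architectures to
# the distro actually being built.
cat > "$REPO/conf/distributions" <<'EOF'
Codename: bullseye
Architectures: amd64 armhf source
Components: main
EOF

# Feed every collected .deb into the repo (commented out here because
# it needs reprepro installed and real packages in collected/):
# for deb in collected/*.deb; do
#     reprepro -b "$REPO" includedeb bullseye "$deb"
# done
```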
What's needed:
--------------
The idea is to have simple metadata that can be used by repo
generation tools to recreate the repo.
Why manifest cannot be used:
----------------------------
The manifest does not serve this particular need. Below are the
shortcomings of the image manifest:
1. It does not have details about removed packages (e.g. localepurge)
2. The buildchroot manifest would not have details about the package
dependencies/imager installs at the time of generation (i.e.
postprocess)
Some ideas:
-----------
There were a couple of ideas:
1. Use an external script to create a manifest of the
downloads/{deb, debsrc} folder and try to download the packages using
that manifest and an appropriate sources.list in the final runner.
2. Use "apt --print-uris" + "debootstrap --keep-debootstrap-dir" to
create metadata with the complete URL of each package. Later, wget can
be used to download those from the web.
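A rough sketch of the second idea, turning "apt-get --print-uris" output into a download manifest. The sample line below is illustrative, not taken from a real build; the real invocation (commented out) needs an apt environment:

```shell
#!/bin/sh
# Real usage would be something like (not run here):
#   apt-get install --print-uris -qq <packages> > uris.txt

# --print-uris emits lines of the form: 'URL' filename size checksum
sample="'http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-2_amd64.deb' hello_2.10-2_amd64.deb 56132 SHA256:deadbeef"

# Extract the quoted URL (field 2 when splitting on single quotes).
url_of() { printf '%s\n' "$1" | cut -d"'" -f2; }

url_of "$sample" > manifest.urls
cat manifest.urls

# Later, on the consolidation runner:
#   wget -i manifest.urls -P downloads/deb/
```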
We are wondering if we could discuss and derive a solution for this
here in ISAR itself instead of opting for some local scripts in
downstream layers.
Thanks,
Vijai Kumar K
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-02-22 10:34 [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds vijai kumar
@ 2022-02-22 14:31 ` Henning Schild
2022-02-24 13:20 ` vijai kumar
From: Henning Schild @ 2022-02-22 14:31 UTC (permalink / raw)
To: vijai kumar; +Cc: isar-users, Baurzhan Ismagulov, Jan Kiszka
Hey Vijai,
Am Tue, 22 Feb 2022 16:04:36 +0530
schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
> Problem:
> --------
> We could have several CI jobs that are running in parallel in
> different nodes. One might want to consolidate and build a base-apt
> from the debs/deb-srcs of all these builds.
Can you go into more detail? I do not yet get the problem.
It seems like you want to save compute time by sharing pre-built
artifacts via some common storage. The sstate can do that very well, we
are using shared folders for on-prem runners, s3 for AWS and sstate
mirrors for population of "new empty runners" and "partial result
delivery" of failed jobs and to sync on-prem with s3.
isar is a tool to build images, not distros or repos or packages. While
it can do all of that, using it for such things can get tricky, and isar
was not designed for such cases. Meaning "base-apt" is not meant to be
your cache to build many images from ... it is meant to be the cache
for exactly one ... and sharing can cause problems.
sstate would detect false sharing, say when a package recipe for some
reason uses a machine-conf variable. Multiconfig or base-apt sharing
would make you run into that bug, while sstate would likely not.
So if it is about build time, I suggest you have a look at sstate and
the not-yet-upstreamed Python helper scripts for sharing/eviction,
which I can point you to in case you do not find them yourself.
Henning
> What's possible:
> ---------------
> With the current state of ISAR, the below is possible.
>
> 1. Run all the jobs in parallel in separate CI runners
> 2. Collect all the debs and deb-srcs from those builds and push to a
> common file server.
> 3. Download the debs and deb-srcs and create a repo out of it in the
> final CI step,
> 4. Upload the base-apt to the server.
>
> This has some disadvantages, we need to move all those
> data(deb/debsrcs), this increases time and cost.
>
> What's needed:
> --------------
> The idea is to have a simple meta-data that can be used by repo
> generation tools to recreate the repo.
>
> Why manifest cannot be used:
> ----------------------------
> Manifest does not serve this particular need. Below are the
> shortcomings of image manifest,
> 1. Does not have details about removed packages(eg localepurge)
> 2. Manifest of buildchroot would not have details about the package
> dependencies/imager installs at the time of generation(i.e.
> postprocess)
>
> Some ideas:
> -----------
> There were a couple of ideas,
> 1. To use an external script to create a manifest of the
> downloads/{deb, debsrc} folder and try to download the packages using
> that manifest and appropriate sourceslist in the final runner.
> 2. To use "apt --print-uris" + "debootstrap --keep-debootstrap-dir" to
> create a metadata with complete url to the package. Later wget can be
> used to download those from the web.
>
> We are wondering if we could discuss and derive a solution for this
> here in ISAR itself instead of opting for some local scripts in
> downstream layers.
>
> Thanks,
> Vijai Kumar K
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-02-22 14:31 ` Henning Schild
@ 2022-02-24 13:20 ` vijai kumar
2022-02-24 15:42 ` Henning Schild
From: vijai kumar @ 2022-02-24 13:20 UTC (permalink / raw)
To: Henning Schild; +Cc: isar-users, Baurzhan Ismagulov, Jan Kiszka
Hi Henning,
On Tue, Feb 22, 2022 at 8:01 PM Henning Schild
<henning.schild@siemens.com> wrote:
>
> Hey Vijai,
>
> Am Tue, 22 Feb 2022 16:04:36 +0530
> schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
>
> > Problem:
> > --------
> > We could have several CI jobs that are running in parallel in
> > different nodes. One might want to consolidate and build a base-apt
> > from the debs/deb-srcs of all these builds.
>
> Can you go into more detail. I do not yet get the problem.
runner 1 (Germany) -> building de0-nano
runner 2 (India) -> building qemuarm
runner 3 (US) -> building qemuamd64
All these builds are running on different servers.
If we want to create a single base-apt from all these servers, then we
need to copy their debs/deb-srcs/base-apt over to a common server and
then create a consolidated repo.
This involves moving this data around.
The problem can be avoided if we have a single piece of metadata,
produced by all these builds, with details of all the packages each
build used. Basically, a manifest of the build. This manifest can later
be used to recreate the repo, which can then be hosted for these jobs.
Having metadata and recreating the repo is one way; there might be
other ways as well.
That is where we thought about the --print-uris option of apt. It
basically gives you the complete URL of the package, which we can
download using wget.
A manifest containing all the packages ever used by the build, with
their complete URLs, could easily be used for several purposes, such as
clearing input, repo regeneration, etc.
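A sketch of consolidating such per-runner URL manifests into one list (the manifests/ layout and file names here are made up for illustration):

```shell
#!/bin/sh
# Fake two per-runner manifests; in practice each CI runner would
# upload one of these small files instead of its debs.
mkdir -p manifests
printf '%s\n' \
    http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-2_amd64.deb \
    > manifests/runner1-qemuarm.urls
printf '%s\n' \
    http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-2_amd64.deb \
    http://deb.debian.org/debian/pool/main/b/bash/bash_5.1-2_amd64.deb \
    > manifests/runner2-qemuamd64.urls

# The same package pulled by several runners must end up in the repo
# only once, so deduplicate while merging.
sort -u manifests/*.urls > consolidated.urls
wc -l < consolidated.urls   # 2 unique packages

# Then: wget -i consolidated.urls and create the repo from the downloads.
```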
I don't think sstate can help here. I might be wrong though.
Thanks,
Vijai Kumar K
>
> It seems like you want to save compute time by sharing pre-built
> artifacts via some common storage. The sstate can do that very well, we
> are using shared folders for on-prem runners, s3 for AWS and sstate
> mirrors for population of "new empty runners" and "partial result
> delivery" of failed jobs and to sync on-prem with s3.
>
> isar is a tool to build images, not distros or repos or packages. While
> it can do all of that using it for such things can get tricky and isar
> was not designed for such cases. Meaning "base-apt" is not meant to be
> your cache to build many images from ... it is meant to be the cache
> for exactly one ... and sharing can cause problems.
>
> sstate would detect false sharing, say a package recipe for some reason
> uses a machine-conf variable. multiconfig or base-apt sharing would
> make you run into that bug, while sstate would likely not.
>
> So if it is about build time i suggest you have a look at sstate and the
> not yet upstreamed python helper scripts for sharing/eviction i can
> point you to in case you do not find it yourself.
>
> Henning
>
> > What's possible:
> > ---------------
> > With the current state of ISAR, the below is possible.
> >
> > 1. Run all the jobs in parallel in separate CI runners
> > 2. Collect all the debs and deb-srcs from those builds and push to a
> > common file server.
> > 3. Download the debs and deb-srcs and create a repo out of it in the
> > final CI step,
> > 4. Upload the base-apt to the server.
> >
> > This has some disadvantages, we need to move all those
> > data(deb/debsrcs), this increases time and cost.
> >
> > What's needed:
> > --------------
> > The idea is to have a simple meta-data that can be used by repo
> > generation tools to recreate the repo.
> >
> > Why manifest cannot be used:
> > ----------------------------
> > Manifest does not serve this particular need. Below are the
> > shortcomings of image manifest,
> > 1. Does not have details about removed packages(eg localepurge)
> > 2. Manifest of buildchroot would not have details about the package
> > dependencies/imager installs at the time of generation(i.e.
> > postprocess)
> >
> > Some ideas:
> > -----------
> > There were a couple of ideas,
> > 1. To use an external script to create a manifest of the
> > downloads/{deb, debsrc} folder and try to download the packages using
> > that manifest and appropriate sourceslist in the final runner.
> > 2. To use "apt --print-uris" + "debootstrap --keep-debootstrap-dir" to
> > create a metadata with complete url to the package. Later wget can be
> > used to download those from the web.
> >
> > We are wondering if we could discuss and derive a solution for this
> > here in ISAR itself instead of opting for some local scripts in
> > downstream layers.
> >
> > Thanks,
> > Vijai Kumar K
>
> --
> You received this message because you are subscribed to the Google Groups "isar-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to isar-users+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/isar-users/20220222153136.08432cb3%40md1za8fc.ad001.siemens.net.
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-02-24 13:20 ` vijai kumar
@ 2022-02-24 15:42 ` Henning Schild
2022-02-25 17:27 ` Jan Kiszka
From: Henning Schild @ 2022-02-24 15:42 UTC (permalink / raw)
To: vijai kumar; +Cc: isar-users, Baurzhan Ismagulov, Jan Kiszka
Am Thu, 24 Feb 2022 18:50:50 +0530
schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
> Hi Henning,
>
> On Tue, Feb 22, 2022 at 8:01 PM Henning Schild
> <henning.schild@siemens.com> wrote:
> >
> > Hey Vijai,
> >
> > Am Tue, 22 Feb 2022 16:04:36 +0530
> > schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
> >
> > > Problem:
> > > --------
> > > We could have several CI jobs that are running in parallel in
> > > different nodes. One might want to consolidate and build a
> > > base-apt from the debs/deb-srcs of all these builds.
> >
> > Can you go into more detail. I do not yet get the problem.
>
> runner 1(Germany) -> Building de0 nano
> runner 2(India) -> Building qemuarm
> runner 3(US) -> Building qemuamd64
>
>
> All these builds are running in different servers.
> If we wanted to create a single base-apt from all these servers, then
> we need to copy over their deb/debsrcs/base-apt to a common server and
> then
> create a consolidated repo.
But why would you want to do that? I mean, I get why you would want to
store everything in the same location, but not why it should be one
repo. Maybe to save some space on sources and arch "all" packages ...
but there are ways of deduplicating on the filesystem or block level.
You are just risking a weird local "all" package not being so "all"
after all ... false sharing.
> This involves moving around this data.
Yes, if it is one central storage place. No matter if it is one "repo"
or many "repos", e.g. in folders.
> The problem can be avoided if we have a single metadata produced by
> all these builds which would have details of all the packages the
> build used.
> Basically a manifest of the build. This manifest can be later used to
> recreate the repo which can be hosted later on for these jobs.
We have a manifest for "image content" which is already fed into
clearing; it is a bill of materials and nothing else, and it cannot
be used to rebuild.
Even if you had all the metadata, you would need to store sources and
binaries somewhere reliable; whether that is central or distributed is
another story.
Pointers to anything on the internet (including all debian repos) will
at some point stop working. So if "exact rebuilding" in a "far away
future" is what you want, mirroring is what you will need.
Partial mirroring based on base-apt even with sources will be shaky and
you will find yourself digging in snapshots again. But it will work.
In the worst case you will not want an "exact rebuild" but a "fix
backported rebuild", which means you will need all build-deps mirrored
... recursively. In fact any "package relationship", maybe even a
Conflicts, might become rebuild-relevant.
A partial mirror will not cut it; rather take a full one, so you do not
need to care about which bits to ignore and do not risk forgetting
anything.
The ideal way would be to eventually liberate snapshots from their
throttling; the short-term way is to spend some bucks on some buckets
(S3).
> Having metadata and recreating repo is one way. There might be other
> ways as well.
I am afraid you likely cannot recreate it if you do not keep everything
yourself or at a place you trust (snapshots?).
There have been several threads on that topic already, including how
one could help make snapshot work for debootstrap and co., coming from
reproducible builds and qubes-os [1] [2].
If you dig deeper you will find many people offering help and funding,
but for some reason things still seem "stuck".
On top, we could maybe see if we can establish something like snapshots
within Siemens. But I guess outside and open to anyone would be much
better.
[1] https://groups.google.com/g/isar-users/c/X9B5chyEWpc/m/nVXwZuIRBAAJ
[2]
https://www.qubes-os.org/news/2021/10/08/reproducible-builds-for-debian-a-big-step-forward/
> That is where we thought about the --print-uris option of apt. It
> basically gives you the complete URL to the package which we can
> download using wget.
> A manifest containing all the packages ever used by the build with its
> complete url. It could easily be used for several purposes, like as
> clearing input,
> repo regeneration etc.
Maybe we can find valid reasons to extend the manifests. But URLs to
packages seem almost redundant: knowing the package names and versions
and all sources.list entries, one can generate these URLs for any
mirror; picking just one of many mirrors would be limiting.
And maybe there are valid reasons for having manifests even for
buildchroots. But the problem here is that they change all the time
while we still use one buildchroot. We see packages being added as
build deps all the time, but also removed when build deps conflict.
> I don't think sstate can help here. I might be wrong though.
I guess sstate will not help. It would mean even more storage and more
storage-sync needs between runners if you want to share.
Henning
> Thanks,
> Vijai Kumar K
>
> >
> > It seems like you want to save compute time by sharing pre-built
> > artifacts via some common storage. The sstate can do that very
> > well, we are using shared folders for on-prem runners, s3 for AWS
> > and sstate mirrors for population of "new empty runners" and
> > "partial result delivery" of failed jobs and to sync on-prem with
> > s3.
> >
> > isar is a tool to build images, not distros or repos or packages.
> > While it can do all of that using it for such things can get tricky
> > and isar was not designed for such cases. Meaning "base-apt" is not
> > meant to be your cache to build many images from ... it is meant to
> > be the cache for exactly one ... and sharing can cause problems.
> >
> > sstate would detect false sharing, say a package recipe for some
> > reason uses a machine-conf variable. multiconfig or base-apt
> > sharing would make you run into that bug, while sstate would likely
> > not.
> >
> > So if it is about build time i suggest you have a look at sstate
> > and the not yet upstreamed python helper scripts for
> > sharing/eviction i can point you to in case you do not find it
> > yourself.
> >
> > Henning
> >
> > > What's possible:
> > > ---------------
> > > With the current state of ISAR, the below is possible.
> > >
> > > 1. Run all the jobs in parallel in separate CI runners
> > > 2. Collect all the debs and deb-srcs from those builds and push
> > > to a common file server.
> > > 3. Download the debs and deb-srcs and create a repo out of it in
> > > the final CI step,
> > > 4. Upload the base-apt to the server.
> > >
> > > This has some disadvantages, we need to move all those
> > > data(deb/debsrcs), this increases time and cost.
> > >
> > > What's needed:
> > > --------------
> > > The idea is to have a simple meta-data that can be used by repo
> > > generation tools to recreate the repo.
> > >
> > > Why manifest cannot be used:
> > > ----------------------------
> > > Manifest does not serve this particular need. Below are the
> > > shortcomings of image manifest,
> > > 1. Does not have details about removed packages(eg localepurge)
> > > 2. Manifest of buildchroot would not have details about the
> > > package dependencies/imager installs at the time of
> > > generation(i.e. postprocess)
> > >
> > > Some ideas:
> > > -----------
> > > There were a couple of ideas,
> > > 1. To use an external script to create a manifest of the
> > > downloads/{deb, debsrc} folder and try to download the packages
> > > using that manifest and appropriate sourceslist in the final
> > > runner. 2. To use "apt --print-uris" + "debootstrap
> > > --keep-debootstrap-dir" to create a metadata with complete url to
> > > the package. Later wget can be used to download those from the
> > > web.
> > >
> > > We are wondering if we could discuss and derive a solution for
> > > this here in ISAR itself instead of opting for some local scripts
> > > in downstream layers.
> > >
> > > Thanks,
> > > Vijai Kumar K
> >
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-02-24 15:42 ` Henning Schild
@ 2022-02-25 17:27 ` Jan Kiszka
2022-03-03 13:45 ` vijai kumar
From: Jan Kiszka @ 2022-02-25 17:27 UTC (permalink / raw)
To: Henning Schild, vijai kumar; +Cc: isar-users, Baurzhan Ismagulov
On 24.02.22 16:42, Henning Schild wrote:
> Am Thu, 24 Feb 2022 18:50:50 +0530
> schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
>
>> Hi Henning,
>>
>> On Tue, Feb 22, 2022 at 8:01 PM Henning Schild
>> <henning.schild@siemens.com> wrote:
>>>
>>> Hey Vijai,
>>>
>>> Am Tue, 22 Feb 2022 16:04:36 +0530
>>> schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
>>>
>>>> Problem:
>>>> --------
>>>> We could have several CI jobs that are running in parallel in
>>>> different nodes. One might want to consolidate and build a
>>>> base-apt from the debs/deb-srcs of all these builds.
>>>
>>> Can you go into more detail. I do not yet get the problem.
>>
>> runner 1(Germany) -> Building de0 nano
>> runner 2(India) -> Building qemuarm
>> runner 3(US) -> Building qemuamd64
>>
>>
>> All these builds are running in different servers.
>> If we wanted to create a single base-apt from all these servers, then
>> we need to copy over their deb/debsrcs/base-apt to a common server and
>> then
>> create a consolidated repo.
>
> But why would you want to do that? I mean i get why you would want to
> store all in the same location, but not why it should be one repo.
> Maybe to save some space on sources and arch all .. but hey there are
> ways of deduplcating on filesystem or block level.
> You are just risking a weird local "all" package not being so "all"
> after all ... false sharing.
We want to auto-build a single, "offline"-capable repo from the BoMs
accumulated from those builds of all possible targets. And that in a
way that does not require pushing large artifacts between the build
stages, ideally only those BoM lists.
>
>> This involves moving around this data.
>
> Yes, if it one central storage place. No matter if it is one "repo" or
> many "repos" in i.e. folders.
>
>> The problem can be avoided if we have a single metadata produced by
>> all these builds which would have details of all the packages the
>> build used.
>> Basically a manifest of the build. This manifest can be later used to
>> recreate the repo which can be hosted later on for these jobs.
>
> We have a manifest for "image content" which already is fed into
> clearing, it is a bill of materials an nothing else, it can not
> be used to rebuild.
> Even if you had all metadata you need to store sources and binaries
> somewhere reliable, whether that is central or distributed is another
> story.
> Pointers to anything on the internet (including all debian repos) will
> at some point stop working. So if "exact rebuilding" in a "far away
> future" is what you want, mirroring is what you will need.
Exactly, this mirror is supposed to be generated, shortly after the
individual builds succeed (in a common pipeline stage). That can fail,
as any build can, if a referenced version picked up during bootstrap
got dropped while building an image.
> Partial mirroring based on base-apt even with sources will be shaky and
> you will find yourself digging in snapshots again. But it will work.
Yes, it works for us (you should know ;)).
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-02-25 17:27 ` Jan Kiszka
@ 2022-03-03 13:45 ` vijai kumar
2022-03-04 10:03 ` Baurzhan Ismagulov
From: vijai kumar @ 2022-03-03 13:45 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Henning Schild, isar-users, Baurzhan Ismagulov
On Fri, Feb 25, 2022 at 10:57 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:
>
> On 24.02.22 16:42, Henning Schild wrote:
> > Am Thu, 24 Feb 2022 18:50:50 +0530
> > schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
> >
> >> Hi Henning,
> >>
> >> On Tue, Feb 22, 2022 at 8:01 PM Henning Schild
> >> <henning.schild@siemens.com> wrote:
> >>>
> >>> Hey Vijai,
> >>>
> >>> Am Tue, 22 Feb 2022 16:04:36 +0530
> >>> schrieb vijai kumar <vijaikumar.kanagarajan@gmail.com>:
> >>>
> >>>> Problem:
> >>>> --------
> >>>> We could have several CI jobs that are running in parallel in
> >>>> different nodes. One might want to consolidate and build a
> >>>> base-apt from the debs/deb-srcs of all these builds.
> >>>
> >>> Can you go into more detail. I do not yet get the problem.
> >>
> >> runner 1(Germany) -> Building de0 nano
> >> runner 2(India) -> Building qemuarm
> >> runner 3(US) -> Building qemuamd64
> >>
> >>
> >> All these builds are running in different servers.
> >> If we wanted to create a single base-apt from all these servers, then
> >> we need to copy over their deb/debsrcs/base-apt to a common server and
> >> then
> >> create a consolidated repo.
> >
> > But why would you want to do that? I mean i get why you would want to
> > store all in the same location, but not why it should be one repo.
> > Maybe to save some space on sources and arch all .. but hey there are
> > ways of deduplcating on filesystem or block level.
> > You are just risking a weird local "all" package not being so "all"
> > after all ... false sharing.
>
> We want to auto-build a single, "offline" capable repo from the BoM
> accumulated from those builds of all possible targets. And that in a way
> that does not require pushing large artifacts between the build stages,
> ideally only those BoM lists.
If we are in agreement, then we can think about how to achieve this.
There are changes coming in soon, so the implementation should take
that into consideration.
I am not sure if the caching part is being reworked. If so, having an
idea of the design would definitely help; maybe the ISAR maintainers
can clarify this.
Thanks,
Vijai Kumar K
>
> >
> >> This involves moving around this data.
> >
> > Yes, if it one central storage place. No matter if it is one "repo" or
> > many "repos" in i.e. folders.
> >
> >> The problem can be avoided if we have a single metadata produced by
> >> all these builds which would have details of all the packages the
> >> build used.
> >> Basically a manifest of the build. This manifest can be later used to
> >> recreate the repo which can be hosted later on for these jobs.
> >
> > We have a manifest for "image content" which already is fed into
> > clearing, it is a bill of materials an nothing else, it can not
> > be used to rebuild.
> > Even if you had all metadata you need to store sources and binaries
> > somewhere reliable, whether that is central or distributed is another
> > story.
> > Pointers to anything on the internet (including all debian repos) will
> > at some point stop working. So if "exact rebuilding" in a "far away
> > future" is what you want, mirroring is what you will need.
>
> Exactly, this mirror is supposed to be generated, and that shortly after
> the individual builds succeeded (in a common pipeline stage). That can
> fail as any build can fail if a referenced version picked up during
> bootstrap got dropped while building an image.
>
> > Partial mirroring based on base-apt even with sources will be shaky and
> > you will find yourself digging in snapshots again. But it will work.
>
> Yes, it works for us (you should know ;)).
>
> Jan
>
> --
> Siemens AG, Technology
> Competence Center Embedded Linux
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-03-03 13:45 ` vijai kumar
@ 2022-03-04 10:03 ` Baurzhan Ismagulov
2022-03-07 7:23 ` vijai kumar
From: Baurzhan Ismagulov @ 2022-03-04 10:03 UTC (permalink / raw)
To: isar-users
On Thu, Mar 03, 2022 at 07:15:40PM +0530, vijai kumar wrote:
> If we are in agreement then we can think about how to achieve this.
> There are changes coming in soon, so the implementation should take
> that into consideration.
>
> I am not sure if the caching part is reworked. If so having an idea on
> the design would definitely help;
Thanks, Vijai, for the discussion. In short, we've already started
further base-apt improvements for a number of reasons, e.g.:
* Strict usage of base-apt for debootstrap and build-dep to ensure base-apt
correctness in any build.
* Pluggability of debootstrap, which is necessary for multistrapping, sudo
removal, and maintainability.
* We need to know which PN-PV is satisfiable from which location (base-apt,
isar-apt, bitbake) in order to use Debian Build-Depends in bitbake.
python-apt provides the necessary functionality. Once we have the
above, further necessary use cases become possible, e.g. storing and
reusing built packages in per-layer apt repos.
We also want to have parallel building. For us, it comes more from the CI side,
as we have 3 h for fast and 10 h for full testsuite on the latest inexpensive
hardware. The first step would be to parallelize the testcases with storing of
intermediate results in a shared location. The second step would be extending
that to individual bitbake tasks. Maybe icecc would be good enough to
cover either or both; we have to test.
Regarding your implementation proposal, I think that could be done. However,
I'd like to better understand the motivation first. Is it e.g. creating a
canonical repo for a given project? That would be easier to implement on top of
the above.
Regarding downloading time -- we had tested full local Debian mirrors
and didn't see any performance improvement in CI jobs. We haven't dug
deeper; maybe we have some parallelization killers in Isar.
Regarding the central repo for remote building sites -- in my
experience, it is very slow; our customers end up installing local
replication servers.
We aim at full Debian support, be it packages, repos, or images. Debian, being
a binary server / desktop distribution and not a source-based development kit,
has a number of inflexibilities such as sudo, versioning, rules, etc.; we would
like to work towards more developer friendliness here. Bitbake and Yocto
contribute much here, and we would like to find a good working solution.
That is why we welcome this use case and would like to work on that after
understanding the details. Jan told me you already had some implementations for
this. You also mention time and costs. Could you please share the concept
behind the work so far, and which time and costs you mean? Then we could
proceed step by step while having the big picture in mind.
With kind regards,
Baurzhan.
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-03-04 10:03 ` Baurzhan Ismagulov
@ 2022-03-07 7:23 ` vijai kumar
2022-03-15 11:45 ` Baurzhan Ismagulov
From: vijai kumar @ 2022-03-07 7:23 UTC (permalink / raw)
To: isar-users
On Fri, Mar 4, 2022 at 3:33 PM Baurzhan Ismagulov <ibr@radix50.net> wrote:
>
> On Thu, Mar 03, 2022 at 07:15:40PM +0530, vijai kumar wrote:
> > If we are in agreement then we can think about how to achieve this.
> > There are changes coming in soon, so the implementation should take
> > that into consideration.
> >
> > I am not sure if the caching part is reworked. If so having an idea on
> > the design would definitely help;
>
> Thanks Vijai for the discussion. In short, we've already started further
> base-apt improvement due to a number of reasons, e.g.:
If there is already a branch for this activity with some initial
implementations, can you please point to it?
>
> * Strict usage of base-apt for debootstrap and build-dep to ensure base-apt
> correctness in any build.
>
> * Pluggability of debootstrap, which is necessary for multistrapping, sudo
> removal, and maintainability.
>
> * We need to know which PN-PV is satisfiable from which location (base-apt,
> isar-apt, bitbake) in order to use Debian Build-Depends in bitbake.
>
> python-apt provides the necessary functionality. After we have the above, more
> necessary use cases become possible. E.g., storing and reusing built packages
> in per-layer apt repos.
>
> We also want to have parallel building. For us, it comes more from the CI side,
> as we have 3 h for fast and 10 h for full testsuite on the latest inexpensive
> hardware. The first step would be to parallelize the testcases with storing of
> intermediate results in a shared location. The second step would be extending
> that to individual bitbake tasks. Maybe icecc would be good enough to cover
> either or both, we have to test.
>
> Regarding your implementation proposal, I think that could be done. However,
> I'd like to better understand the motivation first. Is it e.g. creating a
> canonical repo for a given project? That would be easier to implement on top of
> the above.
It is for recreating a repo for a given project from some kind of
manifest. This way we could avoid pushing repos between multiple CI
runners.
The current goal is to create a single repo from multiple projects. We
might have multiple projects running in parallel in different CI runners;
the idea is to create a single repo from all those builds without the
need to push data around. So, some kind of project manifest.
These manifests can then be used to create a single repo, instead of
copying all the debs over to a single location and triggering creation
of base-apt.
>
> Regarding downloading time -- we had tested full local Debian mirrors and
> didn't see any performance improvement of CI jobs. We haven't dug deeper, maybe
> we have some parallelization killers in Isar.
>
> Regarding the central repo for remote building sites -- in my experience, it is
> very slow, our customers end up installing local replication servers.
>
> We aim at full Debian support, be it packages, repos, or images. Debian, being
> a binary server / desktop distribution and not a source-based development kit,
> has a number of inflexibilities such as sudo, versioning, rules, etc.; we would
> like to work towards more developer friendliness here. Bitbake and Yocto
> contribute much here, and we would like to find a good working solution.
>
> That is why we welcome this use case and would like to work on that after
> understanding the details. Jan told me you already had some implementations for
> this. You also mention time and costs. Could you please share the concept
> behind the work so far, and which time and costs you mean? Then we could
> proceed step by step while having the big picture in mind.
We thought about a few options:
1. Gather the download URLs for all the packages we download. This
could be our metadata.
2. Combine all manifests (buildchroot, isar-bootstrap & image rootfs)
at the end of the build to recreate a master list of packages used in
the build.
Option 2 has some disadvantages. We have to probe the buildchroot after
the image build to get a complete package list, and even then it does
not capture packages that are removed.
Option 1 looks like a solution. It would have to be injected as part of
the build, similar to how we injected the downloading of debs. The
advantage is that we do not necessarily need the apt sources
information to recreate the repo; a simple wget would do. There is,
however, a risk of URLs becoming obsolete.
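For illustration, `apt-get --print-uris` (mentioned earlier in the thread) emits one line per package, roughly of the form `'URL' filename size checksum`. A small parser, hypothetical and not part of Isar, could turn that output into the URL manifest of option 1; the exact field layout should be verified against the apt version in use:

```python
import re

# One --print-uris line is assumed to look like:
# 'http://deb.debian.org/.../hello_2.10-2_amd64.deb' hello_2.10-2_amd64.deb 56132 SHA256:abc...
_LINE_RE = re.compile(
    r"^'(?P<url>[^']+)'\s+(?P<filename>\S+)\s+(?P<size>\d+)\s+(?P<checksum>\S+)$"
)

def parse_print_uris(output):
    """Extract (url, filename, size, checksum) tuples from apt-get --print-uris output.

    Non-matching lines (apt status messages etc.) are silently skipped.
    """
    entries = []
    for line in output.splitlines():
        m = _LINE_RE.match(line.strip())
        if m:
            entries.append((m["url"], m["filename"], int(m["size"]), m["checksum"]))
    return entries
```

The URL column of the parsed entries would be exactly the per-build manifest; the size and checksum fields could additionally be used to verify downloads later.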
There could be better solutions, maybe our new way of creating
base-apt might help in creating metadata in a cleaner way.
Thanks,
Vijai Kumar K
>
> With kind regards,
> Baurzhan.
>
* Re: [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds
2022-03-07 7:23 ` vijai kumar
@ 2022-03-15 11:45 ` Baurzhan Ismagulov
0 siblings, 0 replies; 9+ messages in thread
From: Baurzhan Ismagulov @ 2022-03-15 11:45 UTC (permalink / raw)
To: isar-users
On Mon, Mar 07, 2022 at 12:53:26PM +0530, vijai kumar wrote:
> If there is already a branch for this activity with some initial
> implementations, can you please point to it?
v2: https://groups.google.com/g/isar-users/c/65lRtw4EU_8/m/_O2hIRPBAgAJ
v3 WIP: https://github.com/ilbers/isar/tree/baseapt_v3/10
> It is for recreating a repo for a given project from some kind of
> manifest. This way we could
> avoid pushing the repos between multiple CI runners.
>
> The current goal is to create a single repo from multiple projects. We
> might have multiple projects running parallel in different CI runners,
> the idea is to create a single repo from all those builds without the
> need to push data around. So, some kind of project manifest.
>
> These manifests then can be used to create a single repo. Instead of
> copying over all the debs to a single location and trigger creation of
> base-apt.
Ok, so it's about creating One Canonical Base-Apt (for a project / department /
business unit / company).
> We thought about a few options,
> 1. Gather the download urls for all the packages we download. This
> could be our metadata.
> 2. Club all manifests at the end of build. (Buildchroot,
> isar-bootstrap & image rootfs) to recreate a master list of packages
> used in the build.
>
> 2 has some disadvantages. We have to probe buildchroot after image
> build to get a complete package list. Even then it does not capture the
> packages that are removed.
> 1 seems like a solution, It would have to be injected as part of the
> build, like how we injected downloading debs. The advantage it brings
> is that we don't necessarily need the apt sources information to
> recreate the repo. A simple wget would do. There is also a risk of
> urls becoming obsolete.
>
> There could be better solutions, maybe our new way of creating
> base-apt might help in creating metadata in a cleaner way.
I agree that post-build collection has some limitations -- that was the
motivation for us to make a small step towards a more Debian-like repo
management. The patchset implements approach #1: we use python-apt to
determine upfront what we need. The debootstrap part works in v2; the
build-deps part is about to be finished in v3. We download immediately,
but updating to output-only is easy -- that is in fact one of our
requirements.
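The output-only mode mentioned above could look roughly like a manifest-driven fetch step with a dry-run switch. A hypothetical standard-library sketch, not the actual patchset code:

```python
import os
import urllib.request

def fetch_manifest(urls, dest_dir, dry_run=True):
    """Plan or perform downloads for a URL manifest.

    Returns the list of (url, destination) pairs. When dry_run is True
    (output-only mode), nothing is downloaded; the plan is just reported.
    Assumes plain pool URLs whose last path component is the .deb name.
    """
    actions = []
    for url in urls:
        dest = os.path.join(dest_dir, os.path.basename(url))
        actions.append((url, dest))
        if not dry_run:
            # Pool URLs can go stale, as noted earlier in the thread,
            # so a real implementation would need error handling here.
            urllib.request.urlretrieve(url, dest)
    return actions
```

With dry_run enabled, the same code path yields the metadata for a later, separate download step instead of fetching immediately.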
We don't address package removal and recursive fetching (rebuilding the whole
base-apt exclusively from local files) in this step. Implementing
https://wiki.debian.org/HelmutGrohne/rebootstrap with Isar would be cool (and
we need at least parts of that logic for certain use cases), but we have to see
whether we need other stuff like Build-Depends support before that.
With kind regards,
Baurzhan.
end of thread, other threads:[~2022-03-15 11:45 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-22 10:34 [Discussion]: Metadata to consolidate and rebuild base-apt from distributed CI builds vijai kumar
2022-02-22 14:31 ` Henning Schild
2022-02-24 13:20 ` vijai kumar
2022-02-24 15:42 ` Henning Schild
2022-02-25 17:27 ` Jan Kiszka
2022-03-03 13:45 ` vijai kumar
2022-03-04 10:03 ` Baurzhan Ismagulov
2022-03-07 7:23 ` vijai kumar
2022-03-15 11:45 ` Baurzhan Ismagulov