* expand-on-first-boot and surprising umounts
@ 2022-10-21 7:48 Raphael Lisicki
2022-10-21 8:45 ` Bezdeka, Florian
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Raphael Lisicki @ 2022-10-21 7:48 UTC (permalink / raw)
To: isar-users
Hi everybody,
I am using a debian bullseye based system and use expand-on-first-boot
to expand the last partition. It is not the root file system but an
extra ext4 partition to be mounted under /data. The mounting happens
after expand-on-first boot has succeeded.
Sometimes, on some builds more often than others, /data gets (attempted
to be) umounted immediately after being mounted and subsequent units
will fail.
Removing expand-on-first-boot resolves the issue, so does adding
"ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
not sure if this is only a sophisticated way of solving a race condition
with "sleep".
My gut feeling is that after expand-on-first boot finishes, the kernel
still processes block device events, which systemd gets after /data has
already been mounted and then cause it to be umounted, as systemd was
already picky with umounting stuff in the past [1]. Unfortunately I have
no idea how to test this hypothesis.
Did anyone already experience something like this? The racy-ness of the
issue makes creating a minimal reproduction hard.
[1] https://github.com/systemd/systemd/issues/9869
best regards
raphael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki
@ 2022-10-21 8:45 ` Bezdeka, Florian
2022-10-21 9:36 ` Raphael Lisicki
2022-10-21 11:15 ` Henning Schild
2022-10-21 15:41 ` Roberto A. Foglietta
2 siblings, 1 reply; 12+ messages in thread
From: Bezdeka, Florian @ 2022-10-21 8:45 UTC (permalink / raw)
To: Lisicki, Raphael, isar-users
On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote:
> Hi everybody,
>
> I am using a debian bullseye based system and use expand-on-first-boot
> to expand the last partition. It is not the root file system but an
> extra ext4 partition to be mounted under /data. The mounting happens
> after expand-on-first boot has succeeded.
>
> Sometimes, on some builds more often than others, /data gets (attempted
> to be) umounted immediately after being mounted and subsequent units
> will fail.
Hm... Nothing happens on build time. Everything takes place on the
"first" boot (until expand-on-first-boot disables itself). "more often"
is also confusing me because expand-on-first-boot should run exactly
once.
>
> Removing expand-on-first-boot resolves the issue, so does adding
> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
> not sure if this is only a sophisticated way of solving a race condition
> with "sleep".
>
> My gut feeling is that after expand-on-first boot finishes, the kernel
> still processes block device events, which systemd gets after /data has
> already been mounted and then cause it to be umounted, as systemd was
> already picky with umounting stuff in the past [1]. Unfortunately I have
> no idea how to test this hypothesis.
Nothing should be mounted (except the rootfs) until expand-on-first
completed. See below.
>
> Did anyone already experience something like this? The racy-ness of the
> issue makes creating a minimal reproduction hard.
Nope, have never seen that, but that doesn't mean that there is no
race.
We have
After=systemd-remount-fs.service
Before=local-fs-pre.target shutdown.target
ConditionPathIsReadWrite=/etc
inside the expand-on-first-boot.service file.
So we should be done before systemd tries to mount your /data
(according to Before=) and start after systemd did remounting in case
mount options (ro, rw, ...) have to be adjusted.
I guess you should check your startup order (systemd-analyze might
help) and report back if we missed something.
>
>
>
> [1] https://github.com/systemd/systemd/issues/9869
>
>
>
> best regards
> raphael
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 8:45 ` Bezdeka, Florian
@ 2022-10-21 9:36 ` Raphael Lisicki
2022-10-21 9:41 ` Bezdeka, Florian
0 siblings, 1 reply; 12+ messages in thread
From: Raphael Lisicki @ 2022-10-21 9:36 UTC (permalink / raw)
To: Bezdeka, Florian (T CED SES-DE), isar-users
On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote:
> On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote:
>> Hi everybody,
>>
>> I am using a debian bullseye based system and use expand-on-first-boot
>> to expand the last partition. It is not the root file system but an
>> extra ext4 partition to be mounted under /data. The mounting happens
>> after expand-on-first boot has succeeded.
>>
>> Sometimes, on some builds more often than others, /data gets (attempted
>> to be) umounted immediately after being mounted and subsequent units
>> will fail.
>
> Hm... Nothing happens on build time. Everything takes place on the
> "first" boot (until expand-on-first-boot disables itself). "more often"
> is also confusing me because expand-on-first-boot should run exactly
> once.
Yes, it runs on first boot, but I can use the very same original image
multiple times (after restoring it). And I can also try the same with
other builds (doing a first boot) and some builds seem to be more often
affected than others. My guess from this is that the exact
alignment/size/padding/moon phase of some things in the image also plays
a role.
>
>>
>> Removing expand-on-first-boot resolves the issue, so does adding
>> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
>> not sure if this is only a sophisticated way of solving a race condition
>> with "sleep".
>>
>> My gut feeling is that after expand-on-first boot finishes, the kernel
>> still processes block device events, which systemd gets after /data has
>> already been mounted and then cause it to be umounted, as systemd was
>> already picky with umounting stuff in the past [1]. Unfortunately I have
>> no idea how to test this hypothesis.
>
> Nothing should be mounted (except the rootfs) until expand-on-first
> completed. See below.
>
>>
>> Did anyone already experience something like this? The racy-ness of the
>> issue makes creating a minimal reproduction hard.
>
> Nope, have never seen that, but that doesn't mean that there is no
> race.
>
> We have
>
> After=systemd-remount-fs.service
> Before=local-fs-pre.target shutdown.target
> ConditionPathIsReadWrite=/etc
>
> inside the expand-on-first-boot.service file.
>
> So we should be done before systemd tries to mount your /data
> (according to Before=) and start after systemd did remounting in case
> mount options (ro, rw, ...) have to be adjusted.
>
> I guess you should check your startup order (systemd-analyze might
> help) and report back if we missed something.
systemd-analyze shows exactly what you described: after
expand-on-first-boot has finished, /data gets mounted (and immedeately
umounted again).
best regards
raphael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 9:36 ` Raphael Lisicki
@ 2022-10-21 9:41 ` Bezdeka, Florian
2022-10-21 13:05 ` Jan Kiszka
2022-10-21 16:24 ` Raphael Lisicki
0 siblings, 2 replies; 12+ messages in thread
From: Bezdeka, Florian @ 2022-10-21 9:41 UTC (permalink / raw)
To: Lisicki, Raphael, isar-users
On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote:
>
> On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote:
> > On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote:
> > > Hi everybody,
> > >
> > > I am using a debian bullseye based system and use expand-on-first-boot
> > > to expand the last partition. It is not the root file system but an
> > > extra ext4 partition to be mounted under /data. The mounting happens
> > > after expand-on-first boot has succeeded.
> > >
> > > Sometimes, on some builds more often than others, /data gets (attempted
> > > to be) umounted immediately after being mounted and subsequent units
> > > will fail.
> >
> > Hm... Nothing happens on build time. Everything takes place on the
> > "first" boot (until expand-on-first-boot disables itself). "more often"
> > is also confusing me because expand-on-first-boot should run exactly
> > once.
>
> Yes, it runs on first boot, but I can use the very same original image
> multiple times (after restoring it). And I can also try the same with
> other builds (doing a first boot) and some builds seem to be more often
> affected than others. My guess from this is that the exact
> alignment/size/padding/moon phase of some things in the image also plays
> a role.
>
>
> >
> > >
> > > Removing expand-on-first-boot resolves the issue, so does adding
> > > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
> > > not sure if this is only a sophisticated way of solving a race condition
> > > with "sleep".
> > >
> > > My gut feeling is that after expand-on-first boot finishes, the kernel
> > > still processes block device events, which systemd gets after /data has
> > > already been mounted and then cause it to be umounted, as systemd was
> > > already picky with umounting stuff in the past [1]. Unfortunately I have
> > > no idea how to test this hypothesis.
> >
> > Nothing should be mounted (except the rootfs) until expand-on-first
> > completed. See below.
> >
> > >
> > > Did anyone already experience something like this? The racy-ness of the
> > > issue makes creating a minimal reproduction hard.
> >
> > Nope, have never seen that, but that doesn't mean that there is no
> > race.
> >
> > We have
> >
> > After=systemd-remount-fs.service
> > Before=local-fs-pre.target shutdown.target
> > ConditionPathIsReadWrite=/etc
> >
> > inside the expand-on-first-boot.service file.
> >
> > So we should be done before systemd tries to mount your /data
> > (according to Before=) and start after systemd did remounting in case
> > mount options (ro, rw, ...) have to be adjusted.
> >
> > I guess you should check your startup order (systemd-analyze might
> > help) and report back if we missed something.
>
> systemd-analyze shows exactly what you described: after
> expand-on-first-boot has finished, /data gets mounted (and immedeately
> umounted again).
OK, so you have to figure out why this happens. Hopefully there are
some logs available...
What file system is on your /data partition? What is the initial size
and the full-blown size?
>
>
>
> best regards
> raphael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki
2022-10-21 8:45 ` Bezdeka, Florian
@ 2022-10-21 11:15 ` Henning Schild
2022-10-21 15:41 ` Roberto A. Foglietta
2 siblings, 0 replies; 12+ messages in thread
From: Henning Schild @ 2022-10-21 11:15 UTC (permalink / raw)
To: Raphael Lisicki; +Cc: isar-users
Hi,
we have patches to use systemd-growfs instead of our manual way. They
might just solve your problem. So please give them a try.
[PATCH v5 0/1] expand-on-first-boot: Switch from resize2fs to
systemd-growfs
All that was missing last time was a way to deal with stretch.
And we really should test all that in CI.
Henning
Am Fri, 21 Oct 2022 09:48:14 +0200
schrieb Raphael Lisicki <raphael.lisicki@siemens.com>:
> Hi everybody,
>
> I am using a debian bullseye based system and use
> expand-on-first-boot to expand the last partition. It is not the root
> file system but an extra ext4 partition to be mounted under /data.
> The mounting happens after expand-on-first boot has succeeded.
>
> Sometimes, on some builds more often than others, /data gets
> (attempted to be) umounted immediately after being mounted and
> subsequent units will fail.
>
> Removing expand-on-first-boot resolves the issue, so does adding
> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I
> am not sure if this is only a sophisticated way of solving a race
> condition with "sleep".
>
> My gut feeling is that after expand-on-first boot finishes, the kernel
> still processes block device events, which systemd gets after /data
> has already been mounted and then cause it to be umounted, as systemd
> was already picky with umounting stuff in the past [1]. Unfortunately
> I have no idea how to test this hypothesis.
>
> Did anyone already experience something like this? The racy-ness of
> the issue makes creating a minimal reproduction hard.
>
>
>
> [1] https://github.com/systemd/systemd/issues/9869
>
>
>
> best regards
> raphael
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 9:41 ` Bezdeka, Florian
@ 2022-10-21 13:05 ` Jan Kiszka
2022-10-21 16:38 ` Jan Kiszka
2022-10-21 16:24 ` Raphael Lisicki
1 sibling, 1 reply; 12+ messages in thread
From: Jan Kiszka @ 2022-10-21 13:05 UTC (permalink / raw)
To: Bezdeka, Florian, Lisicki, Raphael, isar-users, Quirin Gylstorff
On 21.10.22 11:41, Bezdeka, Florian wrote:
> On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote:
>>
>> On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote:
>>> On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote:
>>>> Hi everybody,
>>>>
>>>> I am using a debian bullseye based system and use expand-on-first-boot
>>>> to expand the last partition. It is not the root file system but an
>>>> extra ext4 partition to be mounted under /data. The mounting happens
>>>> after expand-on-first boot has succeeded.
>>>>
>>>> Sometimes, on some builds more often than others, /data gets (attempted
>>>> to be) umounted immediately after being mounted and subsequent units
>>>> will fail.
>>>
>>> Hm... Nothing happens on build time. Everything takes place on the
>>> "first" boot (until expand-on-first-boot disables itself). "more often"
>>> is also confusing me because expand-on-first-boot should run exactly
>>> once.
>>
>> Yes, it runs on first boot, but I can use the very same original image
>> multiple times (after restoring it). And I can also try the same with
>> other builds (doing a first boot) and some builds seem to be more often
>> affected than others. My guess from this is that the exact
>> alignment/size/padding/moon phase of some things in the image also plays
>> a role.
>>
>>
>>>
>>>>
>>>> Removing expand-on-first-boot resolves the issue, so does adding
>>>> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
>>>> not sure if this is only a sophisticated way of solving a race condition
>>>> with "sleep".
>>>>
>>>> My gut feeling is that after expand-on-first boot finishes, the kernel
>>>> still processes block device events, which systemd gets after /data has
>>>> already been mounted and then cause it to be umounted, as systemd was
>>>> already picky with umounting stuff in the past [1]. Unfortunately I have
>>>> no idea how to test this hypothesis.
>>>
>>> Nothing should be mounted (except the rootfs) until expand-on-first
>>> completed. See below.
>>>
>>>>
>>>> Did anyone already experience something like this? The racy-ness of the
>>>> issue makes creating a minimal reproduction hard.
>>>
>>> Nope, have never seen that, but that doesn't mean that there is no
>>> race.
>>>
>>> We have
>>>
>>> After=systemd-remount-fs.service
>>> Before=local-fs-pre.target shutdown.target
>>> ConditionPathIsReadWrite=/etc
>>>
>>> inside the expand-on-first-boot.service file.
>>>
>>> So we should be done before systemd tries to mount your /data
>>> (according to Before=) and start after systemd did remounting in case
>>> mount options (ro, rw, ...) have to be adjusted.
>>>
>>> I guess you should check your startup order (systemd-analyze might
>>> help) and report back if we missed something.
>>
>> systemd-analyze shows exactly what you described: after
>> expand-on-first-boot has finished, /data gets mounted (and immedeately
>> umounted again).
>
> OK, so you have to figure out why this happens. Hopefully there are
> some logs available...
>
> What file system is on your /data partition? What is the initial size
> and the full-blown size?
>
This pattern rings a bell, I've seen something like this on the IOT2050
as well. I'm still looking for the hack that worked around it, maybe
Quirin recalls that more quickly. And I think that hack was obsoleted
when we moved to r/o-rootfs and overlays...
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki
2022-10-21 8:45 ` Bezdeka, Florian
2022-10-21 11:15 ` Henning Schild
@ 2022-10-21 15:41 ` Roberto A. Foglietta
2022-10-21 16:15 ` Bezdeka, Florian
2 siblings, 1 reply; 12+ messages in thread
From: Roberto A. Foglietta @ 2022-10-21 15:41 UTC (permalink / raw)
To: Raphael Lisicki; +Cc: isar-users
Il giorno ven 21 ott 2022 alle ore 09:48 Raphael Lisicki
<raphael.lisicki@siemens.com> ha scritto:
>
> Hi everybody,
>
> I am using a debian bullseye based system and use expand-on-first-boot
> to expand the last partition. It is not the root file system but an
> extra ext4 partition to be mounted under /data. The mounting happens
> after expand-on-first boot has succeeded.
>
> Sometimes, on some builds more often than others, /data gets (attempted
> to be) umounted immediately after being mounted and subsequent units
> will fail.
>
> Removing expand-on-first-boot resolves the issue, so does adding
> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
> not sure if this is only a sophisticated way of solving a race condition
> with "sleep".
Hi Raphael,
this trick should resolve the problem (untested, after the first boot
nothing happens anymore)
echo 'echo "exit 0" >$0'
>>isar/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
It is equivalent to apply this patch and obviously it is just a work around
diff --git a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
index 759ff8b..09f5cd4 100755
--- a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
+++ b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
@@ -62,3 +62,4 @@ partx -u "${LAST_PART}"
export EXT2FS_NO_MTAB_OK=1
resize2fs "${LAST_PART}"
+echo "exit 0" >$0
Best regards, R-
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 15:41 ` Roberto A. Foglietta
@ 2022-10-21 16:15 ` Bezdeka, Florian
2022-10-21 17:35 ` Roberto A. Foglietta
0 siblings, 1 reply; 12+ messages in thread
From: Bezdeka, Florian @ 2022-10-21 16:15 UTC (permalink / raw)
To: roberto.foglietta, Lisicki, Raphael; +Cc: isar-users
On Fri, 2022-10-21 at 17:41 +0200, Roberto A. Foglietta wrote:
> Il giorno ven 21 ott 2022 alle ore 09:48 Raphael Lisicki
> <raphael.lisicki@siemens.com> ha scritto:
> >
> > Hi everybody,
> >
> > I am using a debian bullseye based system and use expand-on-first-boot
> > to expand the last partition. It is not the root file system but an
> > extra ext4 partition to be mounted under /data. The mounting happens
> > after expand-on-first boot has succeeded.
> >
> > Sometimes, on some builds more often than others, /data gets (attempted
> > to be) umounted immediately after being mounted and subsequent units
> > will fail.
> >
> > Removing expand-on-first-boot resolves the issue, so does adding
> > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
> > not sure if this is only a sophisticated way of solving a race condition
> > with "sleep".
>
> Hi Raphael,
>
> this trick should resolve the problem (untested, after the first boot
> nothing happens anymore)
>
> echo 'echo "exit 0" >$0'
> > > isar/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
>
> It is equivalent to apply this patch and obviously it is just a work around
>
> diff --git a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> index 759ff8b..09f5cd4 100755
> --- a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> +++ b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> @@ -62,3 +62,4 @@ partx -u "${LAST_PART}"
> export EXT2FS_NO_MTAB_OK=1
>
> resize2fs "${LAST_PART}"
> +echo "exit 0" >$0
Why should we destroy the expand-last-partition.sh script? We disable
the systemd service after the first run...
And how should that fix the race that Raphael has reported?
>
> Best regards, R-
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 9:41 ` Bezdeka, Florian
2022-10-21 13:05 ` Jan Kiszka
@ 2022-10-21 16:24 ` Raphael Lisicki
1 sibling, 0 replies; 12+ messages in thread
From: Raphael Lisicki @ 2022-10-21 16:24 UTC (permalink / raw)
To: Bezdeka, Florian (T CED SES-DE), isar-users
On 21.10.22 11:41, Bezdeka, Florian (T CED SES-DE) wrote:
> On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote:
>>
>> On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote:
>>> On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote:
>>>> Hi everybody,
>>>>
>>>> I am using a debian bullseye based system and use expand-on-first-boot
>>>> to expand the last partition. It is not the root file system but an
>>>> extra ext4 partition to be mounted under /data. The mounting happens
>>>> after expand-on-first boot has succeeded.
>>>>
>>>> Sometimes, on some builds more often than others, /data gets (attempted
>>>> to be) umounted immediately after being mounted and subsequent units
>>>> will fail.
>>>
>>> Hm... Nothing happens on build time. Everything takes place on the
>>> "first" boot (until expand-on-first-boot disables itself). "more often"
>>> is also confusing me because expand-on-first-boot should run exactly
>>> once.
>>
>> Yes, it runs on first boot, but I can use the very same original image
>> multiple times (after restoring it). And I can also try the same with
>> other builds (doing a first boot) and some builds seem to be more often
>> affected than others. My guess from this is that the exact
>> alignment/size/padding/moon phase of some things in the image also plays
>> a role.
>>
>>
>>>
>>>>
>>>> Removing expand-on-first-boot resolves the issue, so does adding
>>>> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
>>>> not sure if this is only a sophisticated way of solving a race condition
>>>> with "sleep".
>>>>
>>>> My gut feeling is that after expand-on-first boot finishes, the kernel
>>>> still processes block device events, which systemd gets after /data has
>>>> already been mounted and then cause it to be umounted, as systemd was
>>>> already picky with umounting stuff in the past [1]. Unfortunately I have
>>>> no idea how to test this hypothesis.
>>>
>>> Nothing should be mounted (except the rootfs) until expand-on-first
>>> completed. See below.
>>>
>>>>
>>>> Did anyone already experience something like this? The racy-ness of the
>>>> issue makes creating a minimal reproduction hard.
>>>
>>> Nope, have never seen that, but that doesn't mean that there is no
>>> race.
>>>
>>> We have
>>>
>>> After=systemd-remount-fs.service
>>> Before=local-fs-pre.target shutdown.target
>>> ConditionPathIsReadWrite=/etc
>>>
>>> inside the expand-on-first-boot.service file.
>>>
>>> So we should be done before systemd tries to mount your /data
>>> (according to Before=) and start after systemd did remounting in case
>>> mount options (ro, rw, ...) have to be adjusted.
>>>
>>> I guess you should check your startup order (systemd-analyze might
>>> help) and report back if we missed something.
>>
>> systemd-analyze shows exactly what you described: after
>> expand-on-first-boot has finished, /data gets mounted (and immedeately
>> umounted again).
>
> OK, so you have to figure out why this happens. Hopefully there are
> some logs available...
Yes, trying to do so, but nothing really helpful to see, if only there
was a way to check why a unit got disabled.
The most suspicious things I can find - taken from a boot without the
problem:
Oct 20 11:18:33 localhost systemctl[329]: Removed
/etc/systemd/system/sysinit.target.wants/expand-on-first-boot.service.
Oct 20 11:18:33 localhost systemd[1]: Reloading.
[...]
Oct 20 11:18:33 localhost systemd[1]: Found device QEMU_HARDDISK data.
Oct 20 11:18:34 localhost systemd[1]: Mounted /data.
Maybe that systemd reloading might have something to do with it. I will
need to re-check with a log when the error occurred, I only have a log
at hand where the unloading failed (even weirder behaviour, but I would
forget about that one for now).
I will repeat it and re-check in the environment where the problem
happens, when I have access again. Running the same image on another
system with qemu does not allow the reproduction of the issue (already
played a little bit with number of cores).
>
> What file system is on your /data partition? What is the initial size
> and the full-blown size?
I am running in a virtual machine and did not bother to actually
increase the image. It is an ext4 filesystem blown from 5324812 to a
whopping 5324813 sectors, and resize2fs does not even do anything about
it: "The filesystem is already 665601 (4k) blocks long. Nothing to do!"
best regards
raphael
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 13:05 ` Jan Kiszka
@ 2022-10-21 16:38 ` Jan Kiszka
2022-10-27 14:36 ` Raphael Lisicki
0 siblings, 1 reply; 12+ messages in thread
From: Jan Kiszka @ 2022-10-21 16:38 UTC (permalink / raw)
To: Bezdeka, Florian, Lisicki, Raphael, isar-users, Quirin Gylstorff
On 21.10.22 15:05, Jan Kiszka wrote:
> On 21.10.22 11:41, Bezdeka, Florian wrote:
>> On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote:
>>>
>>> On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote:
>>>> On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote:
>>>>> Hi everybody,
>>>>>
>>>>> I am using a debian bullseye based system and use expand-on-first-boot
>>>>> to expand the last partition. It is not the root file system but an
>>>>> extra ext4 partition to be mounted under /data. The mounting happens
>>>>> after expand-on-first boot has succeeded.
>>>>>
>>>>> Sometimes, on some builds more often than others, /data gets (attempted
>>>>> to be) umounted immediately after being mounted and subsequent units
>>>>> will fail.
>>>>
>>>> Hm... Nothing happens on build time. Everything takes place on the
>>>> "first" boot (until expand-on-first-boot disables itself). "more often"
>>>> is also confusing me because expand-on-first-boot should run exactly
>>>> once.
>>>
>>> Yes, it runs on first boot, but I can use the very same original image
>>> multiple times (after restoring it). And I can also try the same with
>>> other builds (doing a first boot) and some builds seem to be more often
>>> affected than others. My guess from this is that the exact
>>> alignment/size/padding/moon phase of some things in the image also plays
>>> a role.
>>>
>>>
>>>>
>>>>>
>>>>> Removing expand-on-first-boot resolves the issue, so does adding
>>>>> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
>>>>> not sure if this is only a sophisticated way of solving a race condition
>>>>> with "sleep".
>>>>>
>>>>> My gut feeling is that after expand-on-first boot finishes, the kernel
>>>>> still processes block device events, which systemd gets after /data has
>>>>> already been mounted and then cause it to be umounted, as systemd was
>>>>> already picky with umounting stuff in the past [1]. Unfortunately I have
>>>>> no idea how to test this hypothesis.
>>>>
>>>> Nothing should be mounted (except the rootfs) until expand-on-first
>>>> completed. See below.
>>>>
>>>>>
>>>>> Did anyone already experience something like this? The racy-ness of the
>>>>> issue makes creating a minimal reproduction hard.
>>>>
>>>> Nope, have never seen that, but that doesn't mean that there is no
>>>> race.
>>>>
>>>> We have
>>>>
>>>> After=systemd-remount-fs.service
>>>> Before=local-fs-pre.target shutdown.target
>>>> ConditionPathIsReadWrite=/etc
>>>>
>>>> inside the expand-on-first-boot.service file.
>>>>
>>>> So we should be done before systemd tries to mount your /data
>>>> (according to Before=) and start after systemd did remounting in case
>>>> mount options (ro, rw, ...) have to be adjusted.
>>>>
>>>> I guess you should check your startup order (systemd-analyze might
>>>> help) and report back if we missed something.
>>>
>>> systemd-analyze shows exactly what you described: after
>>> expand-on-first-boot has finished, /data gets mounted (and immedeately
>>> umounted again).
>>
>> OK, so you have to figure out why this happens. Hopefully there are
>> some logs available...
>>
>> What file system is on your /data partition? What is the initial size
>> and the full-blown size?
>>
>
> This pattern rings a bell, I've seen something like this on the IOT2050
> as well. I'm still looking for the hack that worked around it, maybe
> Quirin recalls that more quickly. And I think that hack was obsoleted
> when we moved to r/o-rootfs and overlays...
>
I didn't find the workaround, but we also played with udev settle and
various delays.
I recall our theory that a delayed device removal/replug event from the
kernel after the partition table manipulation of expand-on-first-boot
may confuse systemd to unmount but not remount the data partition again.
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 16:15 ` Bezdeka, Florian
@ 2022-10-21 17:35 ` Roberto A. Foglietta
0 siblings, 0 replies; 12+ messages in thread
From: Roberto A. Foglietta @ 2022-10-21 17:35 UTC (permalink / raw)
To: Bezdeka, Florian; +Cc: Lisicki, Raphael, isar-users
Il giorno ven 21 ott 2022 alle ore 18:15 Bezdeka, Florian
<florian.bezdeka@siemens.com> ha scritto:
>
> On Fri, 2022-10-21 at 17:41 +0200, Roberto A. Foglietta wrote:
> > Il giorno ven 21 ott 2022 alle ore 09:48 Raphael Lisicki
> > <raphael.lisicki@siemens.com> ha scritto:
> > >
> > > Hi everybody,
> > >
> > > I am using a debian bullseye based system and use expand-on-first-boot
> > > to expand the last partition. It is not the root file system but an
> > > extra ext4 partition to be mounted under /data. The mounting happens
> > > after expand-on-first boot has succeeded.
> > >
> > > Sometimes, on some builds more often than others, /data gets (attempted
> > > to be) umounted immediately after being mounted and subsequent units
> > > will fail.
> > >
> > > Removing expand-on-first-boot resolves the issue, so does adding
> > > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am
> > > not sure if this is only a sophisticated way of solving a race condition
> > > with "sleep".
> >
> > Hi Raphael,
> >
> > this trick should resolve the problem (untested, after the first boot
> > nothing happens anymore)
> >
> > echo 'echo "exit 0" >$0'
> > > > isar/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> >
> > It is equivalent to apply this patch and obviously it is just a work around
> >
> > diff --git a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> > b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> > index 759ff8b..09f5cd4 100755
> > --- a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> > +++ b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh
> > @@ -62,3 +62,4 @@ partx -u "${LAST_PART}"
> > export EXT2FS_NO_MTAB_OK=1
> >
> > resize2fs "${LAST_PART}"
> > +echo "exit 0" >$0
>
> Why should we destroy the expand-last-partition.sh script?
Hi Florian,
because Raphael wrote: "Removing expand-on-first-boot resolves the issue"
> We disable
> the systemd service after the first run...
Then why removing (so destroying) the script would fix the problem?
[Service]
Type=oneshot
ExecStart=/usr/share/expand-on-first-boot/expand-last-partition.sh
ExecStartPost=-/bin/systemctl disable expand-on-first-boot.service
I am not an expert of systemd but I have the sensation that
ExecStartPost will run ONLY IF ExecStart script returns true. It is a
"oneshot" service - and in my mind - one successful shot. Where "in my
mind" means if I were written systemd.
Another reason is: because systemd could have a bug in some {past,
present, future} version that affects the disabling. So, having a
double way to obtain the same result is not a work-around but a
redundant safe path for a mission critical system that cannot fail.
However, the workaround may fail because the script runs with "set
-e". So, every failure in the script will prevent it from arriving at
the last command.
So, if the workaround solves the problem then the problem is a bug in
systemd that does not disable the service (probable) or something that
re-enables it (unlikely).
Instead, if the workaround does not work - and it is easy to check
because after the first boot the script still exists in its original
form - then "set -e" did its job and "set -x" + logging in a file
would help for debugging.
In brief: the workaround solves otherwise helps us to define the
perimeter of the unexpected behaviour.
Best regards, R-
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts
2022-10-21 16:38 ` Jan Kiszka
@ 2022-10-27 14:36 ` Raphael Lisicki
0 siblings, 0 replies; 12+ messages in thread
From: Raphael Lisicki @ 2022-10-27 14:36 UTC (permalink / raw)
To: Jan Kiszka, Bezdeka, Florian, isar-users, Quirin Gylstorff
On 21.10.22 18:38, Jan Kiszka wrote:
>
> I recall our theory that a delayed device removal/replug event from the
> kernel after the partition table manipulation of expand-on-first-boot
> may confuse systemd to unmount but not remount the data partition again.
>
> Jan
>
I have enabled debug for systemd and udevd, gladly, this did not fix the
race condition but bombarded me with logs and confirms that suspicion:
for /data:
Oct 27 10:04:09 localhost systemd-udevd[289]: sda7: Device (SEQNUM=1809, ACTION=add) processed
Oct 27 10:04:10 localhost systemd-udevd[273]: sda7: Device (SEQNUM=2082, ACTION=remove) processed
Oct 27 10:04:10 localhost systemd-udevd[273]: sda7: Device (SEQNUM=2083, ACTION=add) processed
Oct 27 10:04:10 localhost systemd-udevd[288]: sda7: Device (SEQNUM=2095, ACTION=change) processed
while for / (sda1):
Oct 27 10:04:09 localhost systemd-udevd[279]: sda1: Device (SEQNUM=1803, ACTION=add) processed
Oct 27 10:04:10 localhost systemd-udevd[279]: sda1: Device (SEQNUM=2089, ACTION=change) processed
Unfortunantely, the reason why the remova action is produced is not
completely obvios, but the nearest hit is:
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: Inotify event: 8 for /dev/sda // IN_CLOSE_WRITE
...
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda3
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda4
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda5
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda6
Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda7
Oct 27 10:04:09 localhost systemd-udevd[272]: sda7: Inotify event: 8000 for /dev/sda7 // IN_IGNORED ?
Oct 27 10:04:09 localhost systemd-udevd[272]: sda7: Removing watch
What would be the correct solution here, as the IN_IGNORED means "Watch
was removed explicitly (inotify_rm_watch(2)) or automatically (file was
deleted, or filesystem was unmounted)." So looks like systemd behavior looks
fine, and the only option would be to cancel contradicting events, which would
be the next race condition.
Sounds to me like the only fix is to make sure that all event-queues are processed
before proceeding with mounting stuff, after expand-on-first-boot, right?
best regards
raphael
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-10-27 14:36 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki
2022-10-21 8:45 ` Bezdeka, Florian
2022-10-21 9:36 ` Raphael Lisicki
2022-10-21 9:41 ` Bezdeka, Florian
2022-10-21 13:05 ` Jan Kiszka
2022-10-21 16:38 ` Jan Kiszka
2022-10-27 14:36 ` Raphael Lisicki
2022-10-21 16:24 ` Raphael Lisicki
2022-10-21 11:15 ` Henning Schild
2022-10-21 15:41 ` Roberto A. Foglietta
2022-10-21 16:15 ` Bezdeka, Florian
2022-10-21 17:35 ` Roberto A. Foglietta
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox