* expand-on-first-boot and surprising umounts @ 2022-10-21 7:48 Raphael Lisicki 2022-10-21 8:45 ` Bezdeka, Florian ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Raphael Lisicki @ 2022-10-21 7:48 UTC (permalink / raw) To: isar-users Hi everybody, I am using a debian bullseye based system and use expand-on-first-boot to expand the last partition. It is not the root file system but an extra ext4 partition to be mounted under /data. The mounting happens after expand-on-first boot has succeeded. Sometimes, on some builds more often than others, /data gets (attempted to be) umounted immediately after being mounted and subsequent units will fail. Removing expand-on-first-boot resolves the issue, so does adding "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am not sure if this is only a sophisticated way of solving a race condition with "sleep". My gut feeling is that after expand-on-first boot finishes, the kernel still processes block device events, which systemd gets after /data has already been mounted and then cause it to be umounted, as systemd was already picky with umounting stuff in the past [1]. Unfortunately I have no idea how to test this hypothesis. Did anyone already experience something like this? The racy-ness of the issue makes creating a minimal reproduction hard. [1] https://github.com/systemd/systemd/issues/9869 best regards raphael ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki @ 2022-10-21 8:45 ` Bezdeka, Florian 2022-10-21 9:36 ` Raphael Lisicki 2022-10-21 11:15 ` Henning Schild 2022-10-21 15:41 ` Roberto A. Foglietta 2 siblings, 1 reply; 12+ messages in thread From: Bezdeka, Florian @ 2022-10-21 8:45 UTC (permalink / raw) To: Lisicki, Raphael, isar-users On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote: > Hi everybody, > > I am using a debian bullseye based system and use expand-on-first-boot > to expand the last partition. It is not the root file system but an > extra ext4 partition to be mounted under /data. The mounting happens > after expand-on-first boot has succeeded. > > Sometimes, on some builds more often than others, /data gets (attempted > to be) umounted immediately after being mounted and subsequent units > will fail. Hm... Nothing happens on build time. Everything takes place on the "first" boot (until expand-on-first-boot disables itself). "more often" is also confusing me because expand-on-first-boot should run exactly once. > > Removing expand-on-first-boot resolves the issue, so does adding > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am > not sure if this is only a sophisticated way of solving a race condition > with "sleep". > > My gut feeling is that after expand-on-first boot finishes, the kernel > still processes block device events, which systemd gets after /data has > already been mounted and then cause it to be umounted, as systemd was > already picky with umounting stuff in the past [1]. Unfortunately I have > no idea how to test this hypothesis. Nothing should be mounted (except the rootfs) until expand-on-first completed. See below. > > Did anyone already experience something like this? The racy-ness of the > issue makes creating a minimal reproduction hard. Nope, have never seen that, but that doesn't mean that there is no race. We have After=systemd-remount-fs.service Before=local-fs-pre.target shutdown.target ConditionPathIsReadWrite=/etc inside the expand-on-first-boot.service file. So we should be done before systemd tries to mount your /data (according to Before=) and start after systemd did remounting in case mount options (ro, rw, ...) have to be adjusted. I guess you should check your startup order (systemd-analyze might help) and report back if we missed something. > > > > [1] https://github.com/systemd/systemd/issues/9869 > > > > best regards > raphael > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 8:45 ` Bezdeka, Florian @ 2022-10-21 9:36 ` Raphael Lisicki 2022-10-21 9:41 ` Bezdeka, Florian 0 siblings, 1 reply; 12+ messages in thread From: Raphael Lisicki @ 2022-10-21 9:36 UTC (permalink / raw) To: Bezdeka, Florian (T CED SES-DE), isar-users On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote: > On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote: >> Hi everybody, >> >> I am using a debian bullseye based system and use expand-on-first-boot >> to expand the last partition. It is not the root file system but an >> extra ext4 partition to be mounted under /data. The mounting happens >> after expand-on-first boot has succeeded. >> >> Sometimes, on some builds more often than others, /data gets (attempted >> to be) umounted immediately after being mounted and subsequent units >> will fail. > > Hm... Nothing happens on build time. Everything takes place on the > "first" boot (until expand-on-first-boot disables itself). "more often" > is also confusing me because expand-on-first-boot should run exactly > once. Yes, it runs on first boot, but I can use the very same original image multiple times (after restoring it). And I can also try the same with other builds (doing a first boot) and some builds seem to be more often affected than others. My guess from this is that the exact alignment/size/padding/moon phase of some things in the image also plays a role. > >> >> Removing expand-on-first-boot resolves the issue, so does adding >> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am >> not sure if this is only a sophisticated way of solving a race condition >> with "sleep". >> >> My gut feeling is that after expand-on-first boot finishes, the kernel >> still processes block device events, which systemd gets after /data has >> already been mounted and then cause it to be umounted, as systemd was >> already picky with umounting stuff in the past [1]. Unfortunately I have >> no idea how to test this hypothesis. > > Nothing should be mounted (except the rootfs) until expand-on-first > completed. See below. > >> >> Did anyone already experience something like this? The racy-ness of the >> issue makes creating a minimal reproduction hard. > > Nope, have never seen that, but that doesn't mean that there is no > race. > > We have > > After=systemd-remount-fs.service > Before=local-fs-pre.target shutdown.target > ConditionPathIsReadWrite=/etc > > inside the expand-on-first-boot.service file. > > So we should be done before systemd tries to mount your /data > (according to Before=) and start after systemd did remounting in case > mount options (ro, rw, ...) have to be adjusted. > > I guess you should check your startup order (systemd-analyze might > help) and report back if we missed something. systemd-analyze shows exactly what you described: after expand-on-first-boot has finished, /data gets mounted (and immedeately umounted again). best regards raphael ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 9:36 ` Raphael Lisicki @ 2022-10-21 9:41 ` Bezdeka, Florian 2022-10-21 13:05 ` Jan Kiszka 2022-10-21 16:24 ` Raphael Lisicki 0 siblings, 2 replies; 12+ messages in thread From: Bezdeka, Florian @ 2022-10-21 9:41 UTC (permalink / raw) To: Lisicki, Raphael, isar-users On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote: > > On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote: > > On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote: > > > Hi everybody, > > > > > > I am using a debian bullseye based system and use expand-on-first-boot > > > to expand the last partition. It is not the root file system but an > > > extra ext4 partition to be mounted under /data. The mounting happens > > > after expand-on-first boot has succeeded. > > > > > > Sometimes, on some builds more often than others, /data gets (attempted > > > to be) umounted immediately after being mounted and subsequent units > > > will fail. > > > > Hm... Nothing happens on build time. Everything takes place on the > > "first" boot (until expand-on-first-boot disables itself). "more often" > > is also confusing me because expand-on-first-boot should run exactly > > once. > > Yes, it runs on first boot, but I can use the very same original image > multiple times (after restoring it). And I can also try the same with > other builds (doing a first boot) and some builds seem to be more often > affected than others. My guess from this is that the exact > alignment/size/padding/moon phase of some things in the image also plays > a role. > > > > > > > > > > Removing expand-on-first-boot resolves the issue, so does adding > > > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am > > > not sure if this is only a sophisticated way of solving a race condition > > > with "sleep". > > > > > > My gut feeling is that after expand-on-first boot finishes, the kernel > > > still processes block device events, which systemd gets after /data has > > > already been mounted and then cause it to be umounted, as systemd was > > > already picky with umounting stuff in the past [1]. Unfortunately I have > > > no idea how to test this hypothesis. > > > > Nothing should be mounted (except the rootfs) until expand-on-first > > completed. See below. > > > > > > > > Did anyone already experience something like this? The racy-ness of the > > > issue makes creating a minimal reproduction hard. > > > > Nope, have never seen that, but that doesn't mean that there is no > > race. > > > > We have > > > > After=systemd-remount-fs.service > > Before=local-fs-pre.target shutdown.target > > ConditionPathIsReadWrite=/etc > > > > inside the expand-on-first-boot.service file. > > > > So we should be done before systemd tries to mount your /data > > (according to Before=) and start after systemd did remounting in case > > mount options (ro, rw, ...) have to be adjusted. > > > > I guess you should check your startup order (systemd-analyze might > > help) and report back if we missed something. > > systemd-analyze shows exactly what you described: after > expand-on-first-boot has finished, /data gets mounted (and immedeately > umounted again). OK, so you have to figure out why this happens. Hopefully there are some logs available... What file system is on your /data partition? What is the initial size and the full-blown size? > > > > best regards > raphael ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 9:41 ` Bezdeka, Florian @ 2022-10-21 13:05 ` Jan Kiszka 2022-10-21 16:38 ` Jan Kiszka 2022-10-21 16:24 ` Raphael Lisicki 1 sibling, 1 reply; 12+ messages in thread From: Jan Kiszka @ 2022-10-21 13:05 UTC (permalink / raw) To: Bezdeka, Florian, Lisicki, Raphael, isar-users, Quirin Gylstorff On 21.10.22 11:41, Bezdeka, Florian wrote: > On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote: >> >> On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote: >>> On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote: >>>> Hi everybody, >>>> >>>> I am using a debian bullseye based system and use expand-on-first-boot >>>> to expand the last partition. It is not the root file system but an >>>> extra ext4 partition to be mounted under /data. The mounting happens >>>> after expand-on-first boot has succeeded. >>>> >>>> Sometimes, on some builds more often than others, /data gets (attempted >>>> to be) umounted immediately after being mounted and subsequent units >>>> will fail. >>> >>> Hm... Nothing happens on build time. Everything takes place on the >>> "first" boot (until expand-on-first-boot disables itself). "more often" >>> is also confusing me because expand-on-first-boot should run exactly >>> once. >> >> Yes, it runs on first boot, but I can use the very same original image >> multiple times (after restoring it). And I can also try the same with >> other builds (doing a first boot) and some builds seem to be more often >> affected than others. My guess from this is that the exact >> alignment/size/padding/moon phase of some things in the image also plays >> a role. >> >> >>> >>>> >>>> Removing expand-on-first-boot resolves the issue, so does adding >>>> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am >>>> not sure if this is only a sophisticated way of solving a race condition >>>> with "sleep". >>>> >>>> My gut feeling is that after expand-on-first boot finishes, the kernel >>>> still processes block device events, which systemd gets after /data has >>>> already been mounted and then cause it to be umounted, as systemd was >>>> already picky with umounting stuff in the past [1]. Unfortunately I have >>>> no idea how to test this hypothesis. >>> >>> Nothing should be mounted (except the rootfs) until expand-on-first >>> completed. See below. >>> >>>> >>>> Did anyone already experience something like this? The racy-ness of the >>>> issue makes creating a minimal reproduction hard. >>> >>> Nope, have never seen that, but that doesn't mean that there is no >>> race. >>> >>> We have >>> >>> After=systemd-remount-fs.service >>> Before=local-fs-pre.target shutdown.target >>> ConditionPathIsReadWrite=/etc >>> >>> inside the expand-on-first-boot.service file. >>> >>> So we should be done before systemd tries to mount your /data >>> (according to Before=) and start after systemd did remounting in case >>> mount options (ro, rw, ...) have to be adjusted. >>> >>> I guess you should check your startup order (systemd-analyze might >>> help) and report back if we missed something. >> >> systemd-analyze shows exactly what you described: after >> expand-on-first-boot has finished, /data gets mounted (and immedeately >> umounted again). > > OK, so you have to figure out why this happens. Hopefully there are > some logs available... > > What file system is on your /data partition? What is the initial size > and the full-blown size? > This pattern rings a bell, I've seen something like this on the IOT2050 as well. I'm still looking for the hack that worked around it, maybe Quirin recalls that more quickly. And I think that hack was obsoleted when we moved to r/o-rootfs and overlays... Jan -- Siemens AG, Technology Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 13:05 ` Jan Kiszka @ 2022-10-21 16:38 ` Jan Kiszka 2022-10-27 14:36 ` Raphael Lisicki 0 siblings, 1 reply; 12+ messages in thread From: Jan Kiszka @ 2022-10-21 16:38 UTC (permalink / raw) To: Bezdeka, Florian, Lisicki, Raphael, isar-users, Quirin Gylstorff On 21.10.22 15:05, Jan Kiszka wrote: > On 21.10.22 11:41, Bezdeka, Florian wrote: >> On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote: >>> >>> On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote: >>>> On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote: >>>>> Hi everybody, >>>>> >>>>> I am using a debian bullseye based system and use expand-on-first-boot >>>>> to expand the last partition. It is not the root file system but an >>>>> extra ext4 partition to be mounted under /data. The mounting happens >>>>> after expand-on-first boot has succeeded. >>>>> >>>>> Sometimes, on some builds more often than others, /data gets (attempted >>>>> to be) umounted immediately after being mounted and subsequent units >>>>> will fail. >>>> >>>> Hm... Nothing happens on build time. Everything takes place on the >>>> "first" boot (until expand-on-first-boot disables itself). "more often" >>>> is also confusing me because expand-on-first-boot should run exactly >>>> once. >>> >>> Yes, it runs on first boot, but I can use the very same original image >>> multiple times (after restoring it). And I can also try the same with >>> other builds (doing a first boot) and some builds seem to be more often >>> affected than others. My guess from this is that the exact >>> alignment/size/padding/moon phase of some things in the image also plays >>> a role. >>> >>> >>>> >>>>> >>>>> Removing expand-on-first-boot resolves the issue, so does adding >>>>> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am >>>>> not sure if this is only a sophisticated way of solving a race condition >>>>> with "sleep". >>>>> >>>>> My gut feeling is that after expand-on-first boot finishes, the kernel >>>>> still processes block device events, which systemd gets after /data has >>>>> already been mounted and then cause it to be umounted, as systemd was >>>>> already picky with umounting stuff in the past [1]. Unfortunately I have >>>>> no idea how to test this hypothesis. >>>> >>>> Nothing should be mounted (except the rootfs) until expand-on-first >>>> completed. See below. >>>> >>>>> >>>>> Did anyone already experience something like this? The racy-ness of the >>>>> issue makes creating a minimal reproduction hard. >>>> >>>> Nope, have never seen that, but that doesn't mean that there is no >>>> race. >>>> >>>> We have >>>> >>>> After=systemd-remount-fs.service >>>> Before=local-fs-pre.target shutdown.target >>>> ConditionPathIsReadWrite=/etc >>>> >>>> inside the expand-on-first-boot.service file. >>>> >>>> So we should be done before systemd tries to mount your /data >>>> (according to Before=) and start after systemd did remounting in case >>>> mount options (ro, rw, ...) have to be adjusted. >>>> >>>> I guess you should check your startup order (systemd-analyze might >>>> help) and report back if we missed something. >>> >>> systemd-analyze shows exactly what you described: after >>> expand-on-first-boot has finished, /data gets mounted (and immedeately >>> umounted again). >> >> OK, so you have to figure out why this happens. Hopefully there are >> some logs available... >> >> What file system is on your /data partition? What is the initial size >> and the full-blown size? >> > > This pattern rings a bell, I've seen something like this on the IOT2050 > as well. I'm still looking for the hack that worked around it, maybe > Quirin recalls that more quickly. And I think that hack was obsoleted > when we moved to r/o-rootfs and overlays... > I didn't find the workaround, but we also played with udev settle and various delays. I recall our theory that a delayed device removal/replug event from the kernel after the partition table manipulation of expand-on-first-boot may confuse systemd to unmount but not remount the data partition again. Jan -- Siemens AG, Technology Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 16:38 ` Jan Kiszka @ 2022-10-27 14:36 ` Raphael Lisicki 0 siblings, 0 replies; 12+ messages in thread From: Raphael Lisicki @ 2022-10-27 14:36 UTC (permalink / raw) To: Jan Kiszka, Bezdeka, Florian, isar-users, Quirin Gylstorff On 21.10.22 18:38, Jan Kiszka wrote: > > I recall our theory that a delayed device removal/replug event from the > kernel after the partition table manipulation of expand-on-first-boot > may confuse systemd to unmount but not remount the data partition again. > > Jan > I have enabled debug for systemd and udevd, gladly, this did not fix the race condition but bombarded me with logs and confirms that suspicion: for /data: Oct 27 10:04:09 localhost systemd-udevd[289]: sda7: Device (SEQNUM=1809, ACTION=add) processed Oct 27 10:04:10 localhost systemd-udevd[273]: sda7: Device (SEQNUM=2082, ACTION=remove) processed Oct 27 10:04:10 localhost systemd-udevd[273]: sda7: Device (SEQNUM=2083, ACTION=add) processed Oct 27 10:04:10 localhost systemd-udevd[288]: sda7: Device (SEQNUM=2095, ACTION=change) processed while for / (sda1): Oct 27 10:04:09 localhost systemd-udevd[279]: sda1: Device (SEQNUM=1803, ACTION=add) processed Oct 27 10:04:10 localhost systemd-udevd[279]: sda1: Device (SEQNUM=2089, ACTION=change) processed Unfortunantely, the reason why the remova action is produced is not completely obvios, but the nearest hit is: Oct 27 10:04:09 localhost systemd-udevd[272]: sda: Inotify event: 8 for /dev/sda // IN_CLOSE_WRITE ... Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda1 Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda2 Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda3 Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda4 Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda5 Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda6 Oct 27 10:04:09 localhost systemd-udevd[272]: sda: device is closed, synthesising 'change' on /sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/sda7 Oct 27 10:04:09 localhost systemd-udevd[272]: sda7: Inotify event: 8000 for /dev/sda7 // IN_IGNORED ? Oct 27 10:04:09 localhost systemd-udevd[272]: sda7: Removing watch What would be the correct solution here, as the IN_IGNORED means "Watch was removed explicitly (inotify_rm_watch(2)) or automatically (file was deleted, or filesystem was unmounted)." So looks like systemd behavior looks fine, and the only option would be to cancel contradicting events, which would be the next race condition. Sounds to me like the only fix is to make sure that all event-queues are processed before proceeding with mounting stuff, after expand-on-first-boot, right? best regards raphael ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 9:41 ` Bezdeka, Florian 2022-10-21 13:05 ` Jan Kiszka @ 2022-10-21 16:24 ` Raphael Lisicki 1 sibling, 0 replies; 12+ messages in thread From: Raphael Lisicki @ 2022-10-21 16:24 UTC (permalink / raw) To: Bezdeka, Florian (T CED SES-DE), isar-users On 21.10.22 11:41, Bezdeka, Florian (T CED SES-DE) wrote: > On Fri, 2022-10-21 at 11:36 +0200, Raphael Lisicki wrote: >> >> On 21.10.22 10:45, Bezdeka, Florian (T CED SES-DE) wrote: >>> On Fri, 2022-10-21 at 09:48 +0200, Raphael Lisicki wrote: >>>> Hi everybody, >>>> >>>> I am using a debian bullseye based system and use expand-on-first-boot >>>> to expand the last partition. It is not the root file system but an >>>> extra ext4 partition to be mounted under /data. The mounting happens >>>> after expand-on-first boot has succeeded. >>>> >>>> Sometimes, on some builds more often than others, /data gets (attempted >>>> to be) umounted immediately after being mounted and subsequent units >>>> will fail. >>> >>> Hm... Nothing happens on build time. Everything takes place on the >>> "first" boot (until expand-on-first-boot disables itself). "more often" >>> is also confusing me because expand-on-first-boot should run exactly >>> once. >> >> Yes, it runs on first boot, but I can use the very same original image >> multiple times (after restoring it). And I can also try the same with >> other builds (doing a first boot) and some builds seem to be more often >> affected than others. My guess from this is that the exact >> alignment/size/padding/moon phase of some things in the image also plays >> a role. >> >> >>> >>>> >>>> Removing expand-on-first-boot resolves the issue, so does adding >>>> "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am >>>> not sure if this is only a sophisticated way of solving a race condition >>>> with "sleep". >>>> >>>> My gut feeling is that after expand-on-first boot finishes, the kernel >>>> still processes block device events, which systemd gets after /data has >>>> already been mounted and then cause it to be umounted, as systemd was >>>> already picky with umounting stuff in the past [1]. Unfortunately I have >>>> no idea how to test this hypothesis. >>> >>> Nothing should be mounted (except the rootfs) until expand-on-first >>> completed. See below. >>> >>>> >>>> Did anyone already experience something like this? The racy-ness of the >>>> issue makes creating a minimal reproduction hard. >>> >>> Nope, have never seen that, but that doesn't mean that there is no >>> race. >>> >>> We have >>> >>> After=systemd-remount-fs.service >>> Before=local-fs-pre.target shutdown.target >>> ConditionPathIsReadWrite=/etc >>> >>> inside the expand-on-first-boot.service file. >>> >>> So we should be done before systemd tries to mount your /data >>> (according to Before=) and start after systemd did remounting in case >>> mount options (ro, rw, ...) have to be adjusted. >>> >>> I guess you should check your startup order (systemd-analyze might >>> help) and report back if we missed something. >> >> systemd-analyze shows exactly what you described: after >> expand-on-first-boot has finished, /data gets mounted (and immedeately >> umounted again). > > OK, so you have to figure out why this happens. Hopefully there are > some logs available... Yes, trying to do so, but nothing really helpful to see, if only there was a way to check why a unit got disabled. The most suspicious things I can find - taken from a boot without the problem: Oct 20 11:18:33 localhost systemctl[329]: Removed /etc/systemd/system/sysinit.target.wants/expand-on-first-boot.service. Oct 20 11:18:33 localhost systemd[1]: Reloading. [...] Oct 20 11:18:33 localhost systemd[1]: Found device QEMU_HARDDISK data. Oct 20 11:18:34 localhost systemd[1]: Mounted /data. Maybe that systemd reloading might have something to do with it. I will need to re-check with a log when the error occurred, I only have a log at hand where the unloading failed (even weirder behaviour, but I would forget about that one for now). I will repeat it and re-check in the environment where the problem happens, when I have access again. Running the same image on another system with qemu does not allow the reproduction of the issue (already played a little bit with number of cores). > > What file system is on your /data partition? What is the initial size > and the full-blown size? I am running in a virtual machine and did not bother to actually increase the image. It is an ext4 filesystem blown from 5324812 to a whopping 5324813 sectors, and resize2fs does not even do anything about it: "The filesystem is already 665601 (4k) blocks long. Nothing to do!" best regards raphael ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki 2022-10-21 8:45 ` Bezdeka, Florian @ 2022-10-21 11:15 ` Henning Schild 2022-10-21 15:41 ` Roberto A. Foglietta 2 siblings, 0 replies; 12+ messages in thread From: Henning Schild @ 2022-10-21 11:15 UTC (permalink / raw) To: Raphael Lisicki; +Cc: isar-users Hi, we have patches to use systemd-growfs instead of our manual way. They might just solve your problem. So please give them a try. [PATCH v5 0/1] expand-on-first-boot: Switch from resize2fs to systemd-growfs All that was missing last time was a way to deal with stretch. And we really should test all that in CI. Henning Am Fri, 21 Oct 2022 09:48:14 +0200 schrieb Raphael Lisicki <raphael.lisicki@siemens.com>: > Hi everybody, > > I am using a debian bullseye based system and use > expand-on-first-boot to expand the last partition. It is not the root > file system but an extra ext4 partition to be mounted under /data. > The mounting happens after expand-on-first boot has succeeded. > > Sometimes, on some builds more often than others, /data gets > (attempted to be) umounted immediately after being mounted and > subsequent units will fail. > > Removing expand-on-first-boot resolves the issue, so does adding > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I > am not sure if this is only a sophisticated way of solving a race > condition with "sleep". > > My gut feeling is that after expand-on-first boot finishes, the kernel > still processes block device events, which systemd gets after /data > has already been mounted and then cause it to be umounted, as systemd > was already picky with umounting stuff in the past [1]. Unfortunately > I have no idea how to test this hypothesis. > > Did anyone already experience something like this? The racy-ness of > the issue makes creating a minimal reproduction hard. > > > > [1] https://github.com/systemd/systemd/issues/9869 > > > > best regards > raphael > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki 2022-10-21 8:45 ` Bezdeka, Florian 2022-10-21 11:15 ` Henning Schild @ 2022-10-21 15:41 ` Roberto A. Foglietta 2022-10-21 16:15 ` Bezdeka, Florian 2 siblings, 1 reply; 12+ messages in thread From: Roberto A. Foglietta @ 2022-10-21 15:41 UTC (permalink / raw) To: Raphael Lisicki; +Cc: isar-users Il giorno ven 21 ott 2022 alle ore 09:48 Raphael Lisicki <raphael.lisicki@siemens.com> ha scritto: > > Hi everybody, > > I am using a debian bullseye based system and use expand-on-first-boot > to expand the last partition. It is not the root file system but an > extra ext4 partition to be mounted under /data. The mounting happens > after expand-on-first boot has succeeded. > > Sometimes, on some builds more often than others, /data gets (attempted > to be) umounted immediately after being mounted and subsequent units > will fail. > > Removing expand-on-first-boot resolves the issue, so does adding > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am > not sure if this is only a sophisticated way of solving a race condition > with "sleep". Hi Raphael, this trick should resolve the problem (untested, after the first boot nothing happens anymore) echo 'echo "exit 0" >$0' >>isar/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh It is equivalent to apply this patch and obviously it is just a work around diff --git a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh index 759ff8b..09f5cd4 100755 --- a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh +++ b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh @@ -62,3 +62,4 @@ partx -u "${LAST_PART}" export EXT2FS_NO_MTAB_OK=1 resize2fs "${LAST_PART}" +echo "exit 0" >$0 Best regards, R- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 15:41 ` Roberto A. Foglietta @ 2022-10-21 16:15 ` Bezdeka, Florian 2022-10-21 17:35 ` Roberto A. Foglietta 0 siblings, 1 reply; 12+ messages in thread From: Bezdeka, Florian @ 2022-10-21 16:15 UTC (permalink / raw) To: roberto.foglietta, Lisicki, Raphael; +Cc: isar-users On Fri, 2022-10-21 at 17:41 +0200, Roberto A. Foglietta wrote: > Il giorno ven 21 ott 2022 alle ore 09:48 Raphael Lisicki > <raphael.lisicki@siemens.com> ha scritto: > > > > Hi everybody, > > > > I am using a debian bullseye based system and use expand-on-first-boot > > to expand the last partition. It is not the root file system but an > > extra ext4 partition to be mounted under /data. The mounting happens > > after expand-on-first boot has succeeded. > > > > Sometimes, on some builds more often than others, /data gets (attempted > > to be) umounted immediately after being mounted and subsequent units > > will fail. > > > > Removing expand-on-first-boot resolves the issue, so does adding > > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am > > not sure if this is only a sophisticated way of solving a race condition > > with "sleep". > > Hi Raphael, > > this trick should resolve the problem (untested, after the first boot > nothing happens anymore) > > echo 'echo "exit 0" >$0' > > > isar/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > > It is equivalent to apply this patch and obviously it is just a work around > > diff --git a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > index 759ff8b..09f5cd4 100755 > --- a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > +++ b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > @@ -62,3 +62,4 @@ partx -u "${LAST_PART}" > export EXT2FS_NO_MTAB_OK=1 > > resize2fs "${LAST_PART}" > +echo "exit 0" >$0 Why should we destroy the expand-last-partition.sh script? We disable the systemd service after the first run... And how should that fix the race that Raphael has reported? > > Best regards, R- > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: expand-on-first-boot and surprising umounts 2022-10-21 16:15 ` Bezdeka, Florian @ 2022-10-21 17:35 ` Roberto A. Foglietta 0 siblings, 0 replies; 12+ messages in thread From: Roberto A. Foglietta @ 2022-10-21 17:35 UTC (permalink / raw) To: Bezdeka, Florian; +Cc: Lisicki, Raphael, isar-users Il giorno ven 21 ott 2022 alle ore 18:15 Bezdeka, Florian <florian.bezdeka@siemens.com> ha scritto: > > On Fri, 2022-10-21 at 17:41 +0200, Roberto A. Foglietta wrote: > > Il giorno ven 21 ott 2022 alle ore 09:48 Raphael Lisicki > > <raphael.lisicki@siemens.com> ha scritto: > > > > > > Hi everybody, > > > > > > I am using a debian bullseye based system and use expand-on-first-boot > > > to expand the last partition. It is not the root file system but an > > > extra ext4 partition to be mounted under /data. The mounting happens > > > after expand-on-first boot has succeeded. > > > > > > Sometimes, on some builds more often than others, /data gets (attempted > > > to be) umounted immediately after being mounted and subsequent units > > > will fail. > > > > > > Removing expand-on-first-boot resolves the issue, so does adding > > > "ExecStartPost=/usr/bin/udevadm settle" to expand-on-first-boot but I am > > > not sure if this is only a sophisticated way of solving a race condition > > > with "sleep". > > > > Hi Raphael, > > > > this trick should resolve the problem (untested, after the first boot > > nothing happens anymore) > > > > echo 'echo "exit 0" >$0' > > > > isar/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > > > > It is equivalent to apply this patch and obviously it is just a work around > > > > diff --git a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > > b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > > index 759ff8b..09f5cd4 100755 > > --- a/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > > +++ b/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh > > @@ -62,3 +62,4 @@ partx -u "${LAST_PART}" > > export EXT2FS_NO_MTAB_OK=1 > > > > resize2fs "${LAST_PART}" > > +echo "exit 0" >$0 > > Why should we destroy the expand-last-partition.sh script? Hi Florian, because Raphael wrote: "Removing expand-on-first-boot resolves the issue" > We disable > the systemd service after the first run... Then why removing (so destroying) the script would fix the problem? [Service] Type=oneshot ExecStart=/usr/share/expand-on-first-boot/expand-last-partition.sh ExecStartPost=-/bin/systemctl disable expand-on-first-boot.service I am not an expert of systemd but I have the sensation that ExecStartPost will run ONLY IF ExecStart script returns true. It is a "oneshot" service - and in my mind - one successful shot. Where "in my mind" means if I were written systemd. Another reason is: because systemd could have a bug in some {past, present, future} version that affects the disabling. So, having a double way to obtain the same result is not a work-around but a redundant safe path for a mission critical system that cannot fail. However, the workaround may fail because the script runs with "set -e". So, every failure in the script will prevent it from arriving at the last command. So, if the workaround solves the problem then the problem is a bug in systemd that does not disable the service (probable) or something that re-enables it (unlikely). Instead, if the workaround does not work - and it is easy to check because after the first boot the script still exists in its original form - then "set -e" did its job and "set -x" + logging in a file would help for debugging. In brief: the workaround solves otherwise helps us to define the perimeter of the unexpected behaviour. Best regards, R- ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-10-27 14:36 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-10-21 7:48 expand-on-first-boot and surprising umounts Raphael Lisicki 2022-10-21 8:45 ` Bezdeka, Florian 2022-10-21 9:36 ` Raphael Lisicki 2022-10-21 9:41 ` Bezdeka, Florian 2022-10-21 13:05 ` Jan Kiszka 2022-10-21 16:38 ` Jan Kiszka 2022-10-27 14:36 ` Raphael Lisicki 2022-10-21 16:24 ` Raphael Lisicki 2022-10-21 11:15 ` Henning Schild 2022-10-21 15:41 ` Roberto A. Foglietta 2022-10-21 16:15 ` Bezdeka, Florian 2022-10-21 17:35 ` Roberto A. Foglietta
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox