public inbox for isar-users@googlegroups.com
 help / color / mirror / Atom feed
From: "'Heinisch, Alexander' via isar-users" <isar-users@googlegroups.com>
To: "ubely@ilbers.de" <ubely@ilbers.de>,
	"isar-users@googlegroups.com" <isar-users@googlegroups.com>,
	"Kiszka, Jan" <jan.kiszka@siemens.com>
Cc: "quirin.gylstorff@siemens.com" <quirin.gylstorff@siemens.com>,
	"Shekar, Kasturi" <kasturi.shekar@siemens.com>
Subject: Re: [PATCH 03/17] meta-isar: deploy-image: Change reboot logic
Date: Fri, 14 Mar 2025 14:25:55 +0000	[thread overview]
Message-ID: <be6af9287630689ae711b50cc247ccf1d4fcd498.camel@siemens.com> (raw)
In-Reply-To: <f56884186356c514f66cbea793b521640be7ee1a.camel@ilbers.de>

On Fri, 2025-03-14 at 09:25 +0300, Uladzimir Bely wrote:
> On Fri, 2025-03-14 at 06:54 +0100, Jan Kiszka wrote:
> > On 13.03.25 14:10, Uladzimir Bely wrote:
> > > On Tue, 2024-07-02 at 15:38 +0200, 'Jan Kiszka' via isar-users
> > > wrote:
> > > > From: Jan Kiszka <jan.kiszka@siemens.com>
> > > > 
> > > > Pull the reboot out of the script. This allows for cleaner
> > > > integration
> > > > with different calling environment, may they be a systemd unit,
> > > > an
> > > > initramfs script or simply a shell for testing purposes.
> > > > 
> > > > And if the script exits with an error, wait a minute before
> > > > rebooting
> > > > the system, rather than just trying to re-execute it. This
> > > > permits to
> > > > inspect potential error as well.
> > > > 
> > > > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > > > ---
> > > >  .../recipes-installer/deploy-image/files/deploy-image-
> > > > wic.sh   
> > > > > 2
> > > > +-
> > > >  .../recipes-installer/deploy-
> > > > image/files/install.override.conf 
> > > > > 2
> > > > +-
> > > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/meta-isar/recipes-installer/deploy-
> > > > image/files/deploy-
> > > > image-wic.sh b/meta-isar/recipes-installer/deploy-
> > > > image/files/deploy-
> > > > image-wic.sh
> > > > index 8043aff1..12c1eea2 100644
> > > > --- a/meta-isar/recipes-installer/deploy-image/files/deploy-
> > > > image-
> > > > wic.sh
> > > > +++ b/meta-isar/recipes-installer/deploy-image/files/deploy-
> > > > image-
> > > > wic.sh
> > > > @@ -105,4 +105,4 @@ fi
> > > >  umount "$installdata"
> > > >  sync
> > > >  dialog --title "Reboot" --msgbox "Installation is successful.
> > > > System
> > > > will be rebooted. Please remove the USB stick." 7 60
> > > > -reboot
> > > > +exit 0
> > > > diff --git a/meta-isar/recipes-installer/deploy-
> > > > image/files/install.override.conf b/meta-isar/recipes-
> > > > installer/deploy-image/files/install.override.conf
> > > > index 73874caa..357d8662 100644
> > > > --- a/meta-isar/recipes-installer/deploy-
> > > > image/files/install.override.conf
> > > > +++ b/meta-isar/recipes-installer/deploy-
> > > > image/files/install.override.conf
> > > > @@ -1,5 +1,5 @@
> > > >  [Service]
> > > >  ExecStart=
> > > > -ExecStart=/usr/bin/deploy-image-wic.sh
> > > > +ExecStart=/bin/sh -c "deploy-image-wic.sh || (echo 'Rebooting
> > > > in
> > > > 60
> > > > s'; sleep 60); reboot"
> > > 
> > > Hello Jan.
> > > 
> > > This change seems to cause a sort of race issue.
> > > 
> > > Few days ago "[PATCH v3 0/2] Cover installer image with tests"
> > > was
> > > merged. Tests were OK (and "next" is still OK) on two jenkins and
> > > one
> > > gitlab instances. But on the third jenkins instance, they were
> > > failing.
> > > 
> > > Boot log:
> > > 
> > > ```
> > > Got config:
> > >   installer_unattended=true
> > >   installer_image_uri=/install/isar-image-ci-debian-bookworm-
> > > qemuamd64.wic.zst
> > >   installer_target_dev=/dev/sda
> > >   installer_target_overwrite=OVERWRITE
> > > bmaptool: ERROR: An error occurred, here is the traceback:
> > > Traceback (most recent call last):
> > >   File "/usr/lib/python3/dist-packages/bmaptools/CLI.py", line
> > > 116,
> > > in
> > > open_block_device
> > >     descriptor = os.open(path, os.O_WRONLY | os.O_EXCL)
> > > 
> > > bmaptool: ERROR: cannot open block device '/dev/sda' in exclusive
> > > mode:
> > > [Errno 16] Device or resource busy: '/dev/sda'
> > > 
> > > Rebooting in 60 s
> > > ```
> > > 
> > > But either machine is not booted, or there are some "ext4" errors
> > > happen.
> > > 
> > > I couldn't understand few things:
> > >  - why this happens (unable to get exclusive access to block
> > > device)$
> > >  - why device is written even if there is such error in logs$
> > >  - why sometimes we don't see this error in logs$
> > >  - why this doesn't happen when "unattended mode" is off.


It does not happen when "unattended mode" is off, because the race is
interrupted by the script waiting for user input at the very beginning.

This is a race condition between multiple instances of the install
script executed in parallel, dependent on how many gettys are active.
Since the number of gettys and further, which getty get's connected in
the end, via terminal, via serial, ... is kind of hard to predict, imo
the only option to resolve that race is by adding a dedicated
(singleton) service which is executed on startup.

I am currently preparing a patch set where a fix for that is included,
similar to this:

```
[Unit]
Description=Target-bootstrapper service

[Service]
User=root
ExecStart=/bin/sh -c "/usr/bin/target-bootstrapper.sh || (echo
'Rebooting in 60 s'; sleep 60); reboot"

StandardOutput=journal+console
TTYPath=/dev/console

[Install]
WantedBy=multi-user.target
``

BR Alexander

> > > 
> > > The reason is that we have these two "getty" instances overriden
> > > by
> > > custom one having the race. While "serial-getty@ttyS0.service.d"
> > > does
> > > the job, "getty@tty1.service.d" produces this error visible in
> > > logs
> > > and
> > > reboot machine in 60 seconds.
> > > 
> > > In some cases 60sec is enough for first service to finish writing
> > > to
> > > the disc, but on slower machines we result in incomplete copy and
> > > broken system.
> > > 
> > > Before this patch, when "reboot" calls were in the script,
> > > "failed"
> > > instance simply exited and didn't try to reboot the board.
> > > 
> > > As a quick fix, I would reverted this patch.
> > 
> > That won't be a fix, just papering over.
> > 
> > The race is apparently around the target device becoming ready for
> > bmaptool, or our script racing with some other service accessing
> > the
> > device for a while. I suspect the service is lacking some
> > dependency
> > here. So, let's identify the race partner first: How to reproduce
> > locally?
> > 
> > Jan
> > 
> 
> No, my finding is that getty@tty1.service.d conflicts with
> serial-getty@ttyS0.service.d one. Every of them tries to run bmaptool
> when unattended mode is active.
> 
> meta-isar/recipes-installer/deploy-image-service/deploy-image-
> service.bb:
> 
> ```
> do_install() {
>   install -m 0600 ${WORKDIR}/install.override.conf
> ${D}/usr/lib/systemd/system/getty@tty1.service.d/override.conf
>   install -m 0600 ${WORKDIR}/install.override.conf
> ${D}/usr/lib/systemd/system/serial-getty@ttyS0.service.d/override.con
> f
> }
> ```
> 
> It can be easy reproduced if we decrease timeout from 60s to e.g.
> 10s,
> to simulate slower system:
> 
> ```
> isar $ sed -i -e 's/60/10/g' meta-isar/recipes-installer/deploy-
> image-
> service/files/install.override.conf 
> isar $ ./kas/kas-container shell kas/isar.yaml 
> builder@b21a175d3b78:/build$ rm -rf conf/
> builder@b21a175d3b78:/build$ /repo/scripts/ci_build.sh -T installer
> ```
> 
> Failed instance of overriden getty reboots the system while the
> target
> is being populated with the not failed instance.
> 
> Reboot hangs at something like
> 
> ```
> [    9.323801] JBD2: no valid journal superblock found
> [    9.325117] EXT4-fs (sda2): error loading journal
> ```
> 
> 
> > > 
> > > Also, I think this behaviour may impact on bmap-tools "psplash"
> > > patches
> > > currently we have on maillist. I would expect that user won't see
> > > the
> > > progress bar on screen because it's somewhere on serial console.
> > > 
> > > 
> > > >  StandardInput=tty
> > > >  StandardOutput=tty
> > > > -- 
> > > > 2.43.0
> > > > 
> > > 
> > 
> 
> -- 
> Best regards,
> Uladzimir.
> 
> 
> 

-- 
You received this message because you are subscribed to the Google Groups "isar-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isar-users+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/isar-users/be6af9287630689ae711b50cc247ccf1d4fcd498.camel%40siemens.com.

  reply	other threads:[~2025-03-14 14:26 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-02 13:38 [PATCH 00/17] Reworks, fixes and unattended mode for image installer Jan Kiszka
2024-07-02 13:38 ` [PATCH 01/17] Kconfig: Rework installer image submenu Jan Kiszka
2024-07-02 13:38 ` [PATCH 02/17] installer: Do not show systemd boot menu Jan Kiszka
2024-07-02 13:38 ` [PATCH 03/17] meta-isar: deploy-image: Change reboot logic Jan Kiszka
2025-03-13 13:10   ` Uladzimir Bely
2025-03-14  5:54     ` 'Jan Kiszka' via isar-users
2025-03-14  6:25       ` Uladzimir Bely
2025-03-14 14:25         ` 'Heinisch, Alexander' via isar-users [this message]
2025-03-14 15:04           ` Uladzimir Bely
2024-07-02 13:38 ` [PATCH 04/17] meta-isar: deploy-image: Drop umount attempt after installation Jan Kiszka
2024-07-02 13:38 ` [PATCH 05/17] meta-isar: deploy-image: Fix bmap support Jan Kiszka
2024-07-02 13:38 ` [PATCH 06/17] meta-isar: deploy-image: Improve root mountpoint discovery Jan Kiszka
2024-07-02 13:38 ` [PATCH 07/17] meta-isar: deploy-image: Make TARGET_DEVICE a complete path Jan Kiszka
2024-07-02 13:38 ` [PATCH 08/17] meta-isar: deploy-image: Drop pointless --clear options from dialog Jan Kiszka
2024-07-02 13:38 ` [PATCH 09/17] meta-isar: deploy-image: Allow to cancel installation Jan Kiszka
2024-07-02 13:38 ` [PATCH 10/17] meta-isar: deploy-image: Warn if overwriting a non-empty disk Jan Kiszka
2024-07-03 15:53   ` 'MOESSBAUER, Felix' via isar-users
2024-07-03 15:55     ` 'Jan Kiszka' via isar-users
2024-07-19  5:33       ` Uladzimir Bely
2024-07-19  5:44         ` 'Jan Kiszka' via isar-users
2024-07-02 13:38 ` [PATCH 11/17] meta-isar: deploy-image: Fix and enhance image selection dialog Jan Kiszka
2024-07-02 13:38 ` [PATCH 12/17] meta-isar: deploy-image: Improve target device list dialog Jan Kiszka
2024-07-02 13:38 ` [PATCH 13/17] meta-isar: deploy-image: Polish some dialogs Jan Kiszka
2024-07-02 13:38 ` [PATCH 14/17] meta-isar: deploy-image: Re-indent Jan Kiszka
2024-07-02 13:38 ` [PATCH 15/17] meta-isar: deploy-image: Prepare for auto-installation mode Jan Kiszka
2024-07-02 13:38 ` [PATCH 16/17] meta-isar: deploy-image: Introduce " Jan Kiszka
2024-07-03 15:56   ` 'MOESSBAUER, Felix' via isar-users
2024-07-03 16:08     ` 'Jan Kiszka' via isar-users
2024-07-02 13:38 ` [PATCH 17/17] meta-isar: deploy-image: Polish recipe Jan Kiszka
2024-07-23  7:40 ` [PATCH 00/17] Reworks, fixes and unattended mode for image installer Uladzimir Bely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=be6af9287630689ae711b50cc247ccf1d4fcd498.camel@siemens.com \
    --to=isar-users@googlegroups.com \
    --cc=alexander.heinisch@siemens.com \
    --cc=jan.kiszka@siemens.com \
    --cc=kasturi.shekar@siemens.com \
    --cc=quirin.gylstorff@siemens.com \
    --cc=ubely@ilbers.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox