From: "Moessbauer, Felix" <felix.moessbauer@siemens.com>
To: "Schild, Henning" <henning.schild@siemens.com>
Cc: "Bovensiepen, Daniel (bovi)" <daniel.bovensiepen@siemens.com>,
"isar-users@googlegroups.com" <isar-users@googlegroups.com>,
"Kiszka, Jan" <jan.kiszka@siemens.com>,
"venkata.pyla@toshiba-tsip.com" <venkata.pyla@toshiba-tsip.com>
Subject: Re: [PATCH 03/11] rootfs postprocess: clean python cache
Date: Wed, 11 Jan 2023 13:18:26 +0000
Message-ID: <360bbce523ed35f7687788f8a6cb946bdfa447c3.camel@siemens.com>
In-Reply-To: <20230111134756.77c9564a@md1za8fc.ad001.siemens.net>
On Wed, 2023-01-11 at 13:47 +0100, Henning Schild wrote:
> On Wed, 11 Jan 2023 09:23:01 +0100, "Moessbauer, Felix (T CED INW-CN)"
> <felix.moessbauer@siemens.com> wrote:
>
> > On Wed, 2023-01-11 at 09:06 +0100, Henning Schild wrote:
> > > On Wed, 11 Jan 2023 04:11:32 +0000, Felix Moessbauer
> > > <felix.moessbauer@siemens.com> wrote:
> > >
> > > > When calling python scripts, python automatically creates cache
> > > > files to speed up future invocations of the same sources. This
> > > > often happens in postinst scripts that run directly in the image
> > > > chroot. The created debian packages do not ship these files, as
> > > > the debhelper scripts remove them before installing.
> > > >
> > > > For the rootfs part, we have to do this manually so that the
> > > > files are not included in the final image either. This patch
> > > > implements this logic in a custom cleanup postprocess step. As
> > > > there might be situations where shipping a subset of the caches
> > > > is desirable (e.g. read-only rootfs images), we add support to
> > > > control this logic using ROOTFS_FEATURES.
> > > >
> > > > Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
> > > > ---
> > > > meta/classes/image.bbclass | 2 +-
> > > > meta/classes/rootfs.bbclass | 6 ++++++
> > > > 2 files changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
> > > > index 519a2e5..b86a428 100644
> > > > --- a/meta/classes/image.bbclass
> > > > +++ b/meta/classes/image.bbclass
> > > > @@ -80,7 +80,7 @@ image_do_mounts() {
> > > >  }
> > > >
> > > >  ROOTFSDIR = "${IMAGE_ROOTFS}"
> > > > -ROOTFS_FEATURES += "clean-package-cache generate-manifest export-dpkg-status clean-log-files clean-debconf-cache"
> > > > +ROOTFS_FEATURES += "clean-package-cache clean-pycache generate-manifest export-dpkg-status clean-log-files clean-debconf-cache"
> > > >  ROOTFS_PACKAGES += "${IMAGE_PREINSTALL} ${IMAGE_INSTALL}"
> > > >  ROOTFS_MANIFEST_DEPLOY_DIR ?= "${DEPLOY_DIR_IMAGE}"
> > > >  ROOTFS_DPKGSTATUS_DEPLOY_DIR ?= "${DEPLOY_DIR_IMAGE}"
> > > > diff --git a/meta/classes/rootfs.bbclass b/meta/classes/rootfs.bbclass
> > > > index 786682d..325e7ae 100644
> > > > --- a/meta/classes/rootfs.bbclass
> > > > +++ b/meta/classes/rootfs.bbclass
> > > > @@ -252,6 +252,12 @@ rootfs_postprocess_clean_debconf_cache() {
> > > >      sudo rm -rf "${ROOTFSDIR}/var/cache/debconf/"*
> > > >  }
> > > >
> > > > +ROOTFS_POSTPROCESS_COMMAND += "${@bb.utils.contains('ROOTFS_FEATURES', 'clean-pycache', 'rootfs_postprocess_clean_pycache', '', d)}"
> > > > +rootfs_postprocess_clean_pycache() {
> > > > +    sudo find ${ROOTFSDIR}/usr -type f -name '*.pyc' -delete -print
> > > > +    sudo find ${ROOTFSDIR}/usr -type d -name '__pycache__' -delete -print
> > > > +}
> > >
> > > Are we sure that this can never be valid content of any package?
> > > I suggest we double check with dpkg.
> >
> > I already checked this. Shipping the __pycache__ folder is a
> > lintian error [1], shipping any .pyc files is a lintian warning [2].
> >
> > Adding bbwarn here does not make sense either, as we cannot
> > distinguish between pycache entries from a broken package and ones
> > created by postinst scripts. Anyway, .pyc files are just cache files
> > and these should not be part of any package or image.
>
> Can we not ask dpkg -S for every file before we delete it? Removing
> files owned by a package would likely be wrong. No matter what you might
This does not scale. We are talking about potentially thousands of .pyc
files (e.g. for tensorflow or pytorch).
> think of the quality of such a package and how many debian rules you
> cite. We have these kinds of packages, coming from funny vendors and
> maybe also from weird recipes.
I know. Still, the python code will very likely break if only the .pyc
files are on the system, because their validity depends on many
conditions that differ between the buildchroot and the target. If any
of these conditions is not met, the cache is regenerated from the .py
file. Let us please not try to create overly complex solutions
for use-cases that are broken / invalid in the first place.
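On the ownership question raised above: a minimal sketch of how the check could be done in one pass instead of one dpkg -S call per file. It assumes the standard dpkg database layout, where each installed package records its paths in var/lib/dpkg/info/<package>.list inside the rootfs. The function name list_unowned_pyc is hypothetical and not part of Isar, and the sketch does not resolve merged-/usr symlinks, so paths recorded under /lib would not match files found under /usr/lib.

```shell
# Print every .pyc file under <rootfs>/usr that no dpkg .list file claims.
# Hypothetical helper for illustration; not part of the patch.
list_unowned_pyc() {
    rootfs="$1"
    owned=$(mktemp)
    found=$(mktemp)

    # Collect all paths recorded by dpkg for the installed packages.
    cat "$rootfs"/var/lib/dpkg/info/*.list 2>/dev/null \
        | LC_ALL=C sort -u > "$owned"

    # Collect all .pyc files, written as absolute paths inside the rootfs.
    (cd "$rootfs" && find usr -type f -name '*.pyc') \
        | sed 's|^|/|' | LC_ALL=C sort -u > "$found"

    # Keep only the lines that appear in "found" but not in "owned".
    LC_ALL=C comm -13 "$owned" "$found"

    rm -f "$owned" "$found"
}
```

Called as list_unowned_pyc "${ROOTFSDIR}", this reads the database once and compares two sorted lists, so the cost no longer grows with one dpkg invocation per cache file.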
In short: I am strictly in favor of removing these files. I even thought
about running this cleanup command unconditionally.
I would also appreciate it if we did not delay the whole reproducibility
story just because there might be some exotic and invalid use-cases
that break.
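For illustration, a standalone sketch of the cleanup step from the patch that can be tried on a scratch directory. The function name clean_pycache is hypothetical, and sudo is left out on the assumption that the test tree is owned by the calling user; the patch itself needs sudo because the rootfs is root-owned. The sketch adds -empty to the directory pass: find can only delete empty directories, so a __pycache__ directory that still holds a file other than a .pyc is skipped here rather than causing a failed delete.

```shell
# Remove Python bytecode caches below <rootfs>/usr.
# Hypothetical helper for illustration; mirrors the two find calls of the patch.
clean_pycache() {
    rootfs="$1"

    # First pass: delete the bytecode files themselves.
    find "$rootfs/usr" -type f -name '*.pyc' -delete

    # Second pass: delete the cache directories that are now empty.
    # -depth makes find visit directory contents before the directory.
    find "$rootfs/usr" -depth -type d -name '__pycache__' -empty -delete
}
```

After clean_pycache "$rootfs", the .py sources are untouched and only __pycache__ directories that contain unrelated files remain.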
Felix
>
> I am not worried about packages coming from debian and built with
> debian tooling.
>
> Henning
>
> > In case a user really wants to ship .pyc files, he can still disable
> > this rootfs feature. But the debian ruleset should be our baseline,
> > not some erroneous behavior that somebody could implement.
> >
> > [1] https://lintian.debian.org/tags/package-installs-python-pycache-dir
> > [2] https://lintian.debian.org/tags/source-contains-prebuilt-python-object
> >
> > Felix
> >
> > >
> > > Henning
> > >
> > > >  ROOTFS_POSTPROCESS_COMMAND += "${@bb.utils.contains('ROOTFS_FEATURES', 'generate-manifest', 'rootfs_generate_manifest', '', d)}"
> > > >  rootfs_generate_manifest () {
> > > >      mkdir -p ${ROOTFS_MANIFEST_DEPLOY_DIR}
> > >
> >
>