From: Jan Kiszka <jan.kiszka@siemens.com>
To: "Moessbauer, Felix (T CED INW-CN)" <felix.moessbauer@siemens.com>,
"Schild, Henning (T CED SES-DE)" <henning.schild@siemens.com>
Cc: "Bovensiepen, Daniel (bovi) (T CED INW-CN)" <daniel.bovensiepen@siemens.com>,
"isar-users@googlegroups.com" <isar-users@googlegroups.com>,
"venkata.pyla@toshiba-tsip.com" <venkata.pyla@toshiba-tsip.com>
Subject: Re: [PATCH 03/11] rootfs postprocess: clean python cache
Date: Wed, 11 Jan 2023 14:23:56 +0100
Message-ID: <1a33ebd0-5b4a-87d6-01a9-4b0c41cb3e8e@siemens.com>
In-Reply-To: <360bbce523ed35f7687788f8a6cb946bdfa447c3.camel@siemens.com>
On 11.01.23 14:18, Moessbauer, Felix (T CED INW-CN) wrote:
> On Wed, 2023-01-11 at 13:47 +0100, Henning Schild wrote:
>> On Wed, 11 Jan 2023 09:23:01 +0100,
>> "Moessbauer, Felix (T CED INW-CN)" <felix.moessbauer@siemens.com>
>> wrote:
>>
>>> On Wed, 2023-01-11 at 09:06 +0100, Henning Schild wrote:
>>>> On Wed, 11 Jan 2023 04:11:32 +0000,
>>>> Felix Moessbauer <felix.moessbauer@siemens.com> wrote:
>>>>
>>>>> When calling Python scripts, Python automatically creates cache
>>>>> files to speed up future invocations of the same sources. This
>>>>> often happens in postinst scripts, which run directly in the image
>>>>> chroot. The created Debian packages do not ship these files, as the
>>>>> debhelper scripts remove them before installing.
>>>>>
>>>>> For the rootfs part, we have to remove them manually so that they
>>>>> are not included in the final image either. This patch implements
>>>>> this logic in a custom cleanup postprocess step. As there might be
>>>>> situations where shipping a subset of the caches is desirable (e.g.
>>>>> read-only rootfs images), we add support for controlling this logic
>>>>> via ROOTFS_FEATURES.
>>>>>
>>>>> Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
>>>>> ---
>>>>> meta/classes/image.bbclass | 2 +-
>>>>> meta/classes/rootfs.bbclass | 6 ++++++
>>>>> 2 files changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
>>>>> index 519a2e5..b86a428 100644
>>>>> --- a/meta/classes/image.bbclass
>>>>> +++ b/meta/classes/image.bbclass
>>>>> @@ -80,7 +80,7 @@ image_do_mounts() {
>>>>>  }
>>>>>
>>>>>  ROOTFSDIR = "${IMAGE_ROOTFS}"
>>>>> -ROOTFS_FEATURES += "clean-package-cache generate-manifest export-dpkg-status clean-log-files clean-debconf-cache"
>>>>> +ROOTFS_FEATURES += "clean-package-cache clean-pycache generate-manifest export-dpkg-status clean-log-files clean-debconf-cache"
>>>>>  ROOTFS_PACKAGES += "${IMAGE_PREINSTALL} ${IMAGE_INSTALL}"
>>>>>  ROOTFS_MANIFEST_DEPLOY_DIR ?= "${DEPLOY_DIR_IMAGE}"
>>>>>  ROOTFS_DPKGSTATUS_DEPLOY_DIR ?= "${DEPLOY_DIR_IMAGE}"
>>>>> diff --git a/meta/classes/rootfs.bbclass b/meta/classes/rootfs.bbclass
>>>>> index 786682d..325e7ae 100644
>>>>> --- a/meta/classes/rootfs.bbclass
>>>>> +++ b/meta/classes/rootfs.bbclass
>>>>> @@ -252,6 +252,12 @@ rootfs_postprocess_clean_debconf_cache() {
>>>>>      sudo rm -rf "${ROOTFSDIR}/var/cache/debconf/"*
>>>>>  }
>>>>>
>>>>> +ROOTFS_POSTPROCESS_COMMAND += "${@bb.utils.contains('ROOTFS_FEATURES', 'clean-pycache', 'rootfs_postprocess_clean_pycache', '', d)}"
>>>>> +rootfs_postprocess_clean_pycache() {
>>>>> +    sudo find ${ROOTFSDIR}/usr -type f -name '*.pyc' -delete -print
>>>>> +    sudo find ${ROOTFSDIR}/usr -type d -name '__pycache__' -delete -print
>>>>> +}
>>>>
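To try the cleanup outside a build, the two find passes of rootfs_postprocess_clean_pycache can be exercised as a standalone script. This is a minimal sketch, assuming GNU (or BSD) find with -delete support and a hypothetical scratch tree standing in for ${ROOTFSDIR}; sudo is dropped because the scratch tree belongs to the caller.

```shell
#!/bin/sh
# Standalone sketch of the two find passes used by
# rootfs_postprocess_clean_pycache; sudo is omitted because the scratch
# tree below is owned by the current user.

clean_pycache() {
    rootfs=$1
    # Pass 1: delete compiled bytecode files below /usr.
    find "$rootfs/usr" -type f -name '*.pyc' -delete -print
    # Pass 2: delete the cache directories. find -delete only removes
    # empty directories, so this pass relies on pass 1 having run first.
    find "$rootfs/usr" -type d -name '__pycache__' -delete -print
}

# Hypothetical scratch tree standing in for ${ROOTFSDIR}.
ROOTFSDIR=$(mktemp -d)
pkgdir="$ROOTFSDIR/usr/lib/python3/dist-packages/demo"
mkdir -p "$pkgdir/__pycache__"
touch "$pkgdir/__init__.py" "$pkgdir/__pycache__/__init__.cpython-311.pyc"

clean_pycache "$ROOTFSDIR"
```

The order of the passes matters: running the directory pass first would make find report an error for every __pycache__ directory that still contains files.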
>>>> Are we sure that this can never be valid content of any package? I
>>>> suggest we double-check with dpkg.
>>>
>>> I already checked this. Shipping the __pycache__ folder is a lintian
>>> error [1], shipping any .pyc files is a lintian warning [2].
>>>
>>> Adding a bbwarn here does not make sense either, as we cannot
>>> distinguish between pycache entries from a broken package and ones
>>> created by postinst scripts. Anyway, .pyc files are just cache files
>>> and should not be part of any package or image.
>>
>> Can we not ask dpkg -S for every file before we delete it? Removing
>> files owned by a package would likely be wrong. No matter what you
>> might
>
> This does not scale. We are talking about potentially thousands of pyc
> files (e.g. for tensorflow or pytorch).
>
>> think of the quality of such a package and how many Debian rules you
>> cite, we have these kinds of packages, coming from funny vendors and
>> maybe also from weird recipes.
>
> I know. Anyway, the Python code will very likely break if only the
> .pyc files are on the system, as these files depend on many conditions
> that differ between the buildchroot and the target. If any of these
> conditions is not met, the cache is regenerated from the .py file. Let
> us please not try to create overly complex solutions for use-cases
> that are broken / invalid in the first place.
>
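The point that a missing cache is harmless can be checked directly: removing __pycache__ only means the next import recompiles the module from its .py source. A minimal sketch, assuming python3 is on PATH; the module name demo is made up for illustration.

```shell
#!/bin/sh
# Sketch: a removed __pycache__ entry is simply rebuilt on the next import.
# The module name "demo" is hypothetical.

# Make sure bytecode caching is active and writes next to the source.
unset PYTHONDONTWRITEBYTECODE PYTHONPYCACHEPREFIX

work=$(mktemp -d)
printf 'VALUE = 42\n' > "$work/demo.py"

# First import compiles demo.py and writes __pycache__/demo.<tag>.pyc.
PYTHONPATH="$work" python3 -c 'import demo'
ls "$work/__pycache__"

# Drop the cache, as the clean-pycache step would do in the rootfs ...
rm -rf "$work/__pycache__"

# ... and import again: the cache is regenerated from demo.py.
PYTHONPATH="$work" python3 -c 'import demo; print(demo.VALUE)'
ls "$work/__pycache__"
```

The same recompilation happens when a cached file does not fit its environment: the file name carries the interpreter tag, and the header stores a magic number plus, by default, the source's modification time and size, so a mismatch makes Python ignore the stale cache and rebuild it.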
> In short: I'm strictly against keeping these files. I even thought
> about always running this cleanup command unconditionally.
>
> I would also appreciate it if we did not delay the whole
> reproducibility story just because there might be some exotic and
> invalid use-cases that break.
Yes, the focus should be on clean Debian packages first. If broken
downstream users stumble over this too often, we can still take measures.
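If such measures ever become necessary, the scaling concern raised against a per-file dpkg -S query could be sidestepped by reading dpkg's per-package file lists (/var/lib/dpkg/info/*.list) once and comparing them against the cache files actually present. This is a hypothetical sketch of that idea, not part of the patch; it assumes the standard dpkg database layout inside the rootfs, ignores diversions and file names containing newlines, and uses a made-up package foo for the demo.

```shell
#!/bin/sh
# Hypothetical alternative to running `dpkg -S` once per file: read every
# package's file list a single time and delete only *.pyc files that no
# installed package claims. Not part of the patch under discussion.

clean_unowned_pyc() {
    rootfs=$1
    tmp=$(mktemp -d)

    # Paths owned by installed packages, as recorded by dpkg (absolute,
    # relative to the rootfs). If no .list file exists, this stays empty.
    cat "$rootfs"/var/lib/dpkg/info/*.list 2>/dev/null | LC_ALL=C sort -u > "$tmp/owned"

    # Cache files actually present below /usr, in the same absolute form.
    (cd "$rootfs" && find usr -type f -name '*.pyc') | sed 's|^|/|' | LC_ALL=C sort -u > "$tmp/present"

    # Lines only in "present" are caches no package ships, e.g. ones
    # written by postinst scripts. comm needs identically sorted input.
    LC_ALL=C comm -23 "$tmp/present" "$tmp/owned" | while IFS= read -r f; do
        rm -f -- "$rootfs$f"
        echo "removed $f"
    done
    rm -rf "$tmp"
}

# Demo tree with a made-up package "foo" that ships one .pyc on purpose.
demo=$(mktemp -d)
cache="/usr/lib/python3/dist-packages/foo/__pycache__"
mkdir -p "$demo/var/lib/dpkg/info" "$demo$cache"
touch "$demo$cache/owned.cpython-311.pyc" "$demo$cache/generated.cpython-311.pyc"
echo "$cache/owned.cpython-311.pyc" > "$demo/var/lib/dpkg/info/foo.list"

clean_unowned_pyc "$demo"
```

This replaces thousands of dpkg invocations with one pass over the file lists. It only handles the .pyc files; the __pycache__ directories themselves are left in place.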
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
Thread overview: 28+ messages
2023-01-11 4:11 [PATCH 00/11] Make rootfs build reproducible Felix Moessbauer
2023-01-11 4:11 ` [PATCH 01/11] fix rebuild of rootfs_finalize task Felix Moessbauer
2023-01-11 4:11 ` [PATCH 02/11] image.bbclass: fix non-reproducible file time-stamps inside rootfs Felix Moessbauer
2023-01-11 4:11 ` [PATCH 03/11] rootfs postprocess: clean python cache Felix Moessbauer
2023-01-11 8:06 ` Henning Schild
2023-01-11 8:23 ` Moessbauer, Felix
2023-01-11 12:47 ` Henning Schild
2023-01-11 13:18 ` Moessbauer, Felix
2023-01-11 13:23 ` Jan Kiszka [this message]
2023-01-11 4:11 ` [PATCH 04/11] remove non-portable ldconfig aux-cache Felix Moessbauer
2023-01-11 8:19 ` Henning Schild
2023-01-11 8:31 ` Moessbauer, Felix
2023-01-11 12:52 ` Henning Schild
2023-01-11 4:11 ` [PATCH 05/11] generate deterministic clear-text password hash Felix Moessbauer
2023-01-11 8:21 ` Henning Schild
2023-01-11 4:11 ` [PATCH 06/11] update debian initramfs in deterministic mode Felix Moessbauer
2023-01-11 8:23 ` Henning Schild
2023-01-11 8:39 ` Moessbauer, Felix
2023-01-11 12:55 ` Henning Schild
2023-01-11 4:11 ` [PATCH 07/11] create custom " Felix Moessbauer
2023-01-11 4:11 ` [PATCH 08/11] make deb_add_changelog idempotent Felix Moessbauer
2023-01-11 4:11 ` [PATCH 09/11] deb_add_changelog: set timestamp to valid epoch Felix Moessbauer
2023-01-11 4:11 ` [PATCH 10/11] deb_add_changelog: use SOURCE_DATE_EPOCH Felix Moessbauer
2023-01-11 8:49 ` Henning Schild
2023-01-11 9:06 ` Moessbauer, Felix
2023-01-11 4:11 ` [PATCH 11/11] make custom linux-image bit-by-bit reproducible Felix Moessbauer
2023-01-11 6:51 ` [PATCH 00/11] Make rootfs build reproducible Jan Kiszka
2023-01-11 9:04 ` Venkata.Pyla