public inbox for isar-users@googlegroups.com
From: Jan Kiszka <jan.kiszka@siemens.com>
To: "Moessbauer, Felix (T CED INW-CN)" <felix.moessbauer@siemens.com>,
	"Schild, Henning (T CED SES-DE)" <henning.schild@siemens.com>
Cc: "Bovensiepen,
	Daniel (bovi) (T CED INW-CN)" <daniel.bovensiepen@siemens.com>,
	"isar-users@googlegroups.com" <isar-users@googlegroups.com>,
	"venkata.pyla@toshiba-tsip.com" <venkata.pyla@toshiba-tsip.com>
Subject: Re: [PATCH 03/11] rootfs postprocess: clean python cache
Date: Wed, 11 Jan 2023 14:23:56 +0100
Message-ID: <1a33ebd0-5b4a-87d6-01a9-4b0c41cb3e8e@siemens.com>
In-Reply-To: <360bbce523ed35f7687788f8a6cb946bdfa447c3.camel@siemens.com>

On 11.01.23 14:18, Moessbauer, Felix (T CED INW-CN) wrote:
> On Wed, 2023-01-11 at 13:47 +0100, Henning Schild wrote:
>> On Wed, 11 Jan 2023 09:23:01 +0100
>> "Moessbauer, Felix (T CED INW-CN)"
>> <felix.moessbauer@siemens.com> wrote:
>>
>>> On Wed, 2023-01-11 at 09:06 +0100, Henning Schild wrote:
>>>> On Wed, 11 Jan 2023 04:11:32 +0000
>>>> Felix Moessbauer <felix.moessbauer@siemens.com> wrote:
>>>>
>>>>> When calling Python scripts, Python automatically creates cache
>>>>> files to speed up future invocations of the same sources. This
>>>>> often happens in postinst scripts that run directly in the image
>>>>> chroot. The created Debian packages do not ship these files, as
>>>>> the debhelper scripts remove them before installing.
>>>>>
>>>>> For the rootfs part, we have to do this manually so that they are
>>>>> also not included in the final image. This patch implements this
>>>>> logic in a custom cleanup postprocess step. As there might be
>>>>> situations where shipping a subset of the caches is desirable
>>>>> (e.g. read-only rootfs images), we add support to control this
>>>>> logic using ROOTFS_FEATURES.
>>>>>
>>>>> Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
>>>>> ---
>>>>>  meta/classes/image.bbclass  | 2 +-
>>>>>  meta/classes/rootfs.bbclass | 6 ++++++
>>>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/meta/classes/image.bbclass b/meta/classes/image.bbclass
>>>>> index 519a2e5..b86a428 100644
>>>>> --- a/meta/classes/image.bbclass
>>>>> +++ b/meta/classes/image.bbclass
>>>>> @@ -80,7 +80,7 @@ image_do_mounts() {
>>>>>  }
>>>>>
>>>>>  ROOTFSDIR = "${IMAGE_ROOTFS}"
>>>>> -ROOTFS_FEATURES += "clean-package-cache generate-manifest export-dpkg-status clean-log-files clean-debconf-cache"
>>>>> +ROOTFS_FEATURES += "clean-package-cache clean-pycache generate-manifest export-dpkg-status clean-log-files clean-debconf-cache"
>>>>>  ROOTFS_PACKAGES += "${IMAGE_PREINSTALL} ${IMAGE_INSTALL}"
>>>>>  ROOTFS_MANIFEST_DEPLOY_DIR ?= "${DEPLOY_DIR_IMAGE}"
>>>>>  ROOTFS_DPKGSTATUS_DEPLOY_DIR ?= "${DEPLOY_DIR_IMAGE}"
>>>>> diff --git a/meta/classes/rootfs.bbclass b/meta/classes/rootfs.bbclass
>>>>> index 786682d..325e7ae 100644
>>>>> --- a/meta/classes/rootfs.bbclass
>>>>> +++ b/meta/classes/rootfs.bbclass
>>>>> @@ -252,6 +252,12 @@ rootfs_postprocess_clean_debconf_cache() {
>>>>>      sudo rm -rf "${ROOTFSDIR}/var/cache/debconf/"*
>>>>>  }
>>>>>
>>>>> +ROOTFS_POSTPROCESS_COMMAND += "${@bb.utils.contains('ROOTFS_FEATURES', 'clean-pycache', 'rootfs_postprocess_clean_pycache', '', d)}"
>>>>> +rootfs_postprocess_clean_pycache() {
>>>>> +    sudo find ${ROOTFSDIR}/usr -type f -name '*.pyc' -delete -print
>>>>> +    sudo find ${ROOTFSDIR}/usr -type d -name '__pycache__' -delete -print
>>>>> +}
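
For readers who want to try the effect of the cleanup step outside of a build, here is a minimal Python sketch of the same idea. It is not the code from the patch (that is the shell function quoted above); the function name clean_pycache is made up for illustration. One deliberate difference: it removes a __pycache__ directory together with anything left inside it, whereas find -delete refuses to remove a directory that is not empty.

```python
import os
import shutil
from pathlib import Path


def clean_pycache(root: Path) -> list[Path]:
    """Remove *.pyc files and __pycache__ directories below root.

    Hypothetical helper for illustration; returns the removed paths
    (sorted) so the caller can log them, similar to find's -print.
    """
    removed = []
    # topdown=True lets us prune __pycache__ entries before descending.
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        for name in list(dirnames):
            if name != "__pycache__":
                continue
            target = Path(dirpath) / name
            if target.is_symlink():
                target.unlink()        # never follow a symlink out of the tree
            else:
                shutil.rmtree(target)  # also drops non-.pyc leftovers inside
            dirnames.remove(name)      # do not walk into what was just removed
            removed.append(target)
        for name in filenames:
            if name.endswith(".pyc"):
                target = Path(dirpath) / name
                target.unlink()
                removed.append(target)
    return sorted(removed)
```

Calling clean_pycache(Path(rootfs) / "usr") would mirror the scope of the two find calls, assuming the caller has write access to the tree (the patch uses sudo for that).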
>>>>
>>>> Are we sure that this can never be valid content of any package?
>>>> I suggest we double check with dpkg.
>>>
>>> I already checked this. Shipping the __pycache__ folder is a
>>> lintian error [1], shipping any .pyc files is a lintian warning [2].
>>>
>>> Adding bbwarn here does not make sense either, as we cannot
>>> distinguish between pycache entries from a broken package and ones
>>> created by postinst scripts. Anyway, pyc files are just cache files
>>> and should not be part of any package or image.
>>
>> Can we not ask dpkg -S for every file before we delete it? Removing
>> files owned by package would likely be wrong. No matter what you
>> might
> 
> This does not scale. We are talking about potentially thousands of pyc
> files (e.g. for tensorflow or pytorch).
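
To make the cost of an ownership check concrete: one dpkg -S call per file means one process per .pyc. A rough sketch of a single-pass alternative follows, assuming the rootfs keeps the usual dpkg file lists under var/lib/dpkg/info/*.list (one absolute path per line). The function names are hypothetical, nothing like this is part of the patch, and diversions or merged-/usr path aliases are not handled.

```python
from pathlib import Path


def load_owned_paths(rootfs: Path) -> set[str]:
    """Collect every path listed in <rootfs>/var/lib/dpkg/info/*.list."""
    owned = set()
    info_dir = rootfs / "var/lib/dpkg/info"
    for list_file in info_dir.glob("*.list"):
        # File names in the lists are not guaranteed to be valid UTF-8.
        with list_file.open(encoding="utf-8", errors="surrogateescape") as fh:
            owned.update(line.rstrip("\n") for line in fh)
    return owned


def split_by_ownership(rootfs: Path, subdir: str = "usr"):
    """Return (owned, unowned) lists of *.pyc paths as seen inside the rootfs."""
    owned_paths = load_owned_paths(rootfs)  # read once, then O(1) lookups
    owned, unowned = [], []
    for pyc in sorted((rootfs / subdir).rglob("*.pyc")):
        inside = "/" + pyc.relative_to(rootfs).as_posix()
        (owned if inside in owned_paths else unowned).append(inside)
    return owned, unowned
```

Whether package-owned .pyc files deserve special treatment at all is the question being argued here; the sketch only shows that the lookup itself need not spawn a process per file.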
> 
>> think of the quality of such a package and how many debian rules you
>> cite. We have these kinds of packages, coming from funny vendors and
>> maybe also from weird recipes.
> 
> I know. Anyways, the python code will very likely break in case only
> the .pyc files are on the system, as these files depend on many
> conditions which are different in the buildchroot and on the target.
> In case any of the conditions is not met, it will be re-generated from
> the .py file. Let us please not try to create overly complex solutions
> for use-cases that are broken / invalid in the first place.
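
One of the conditions mentioned above can be inspected directly. In the default timestamp-based mode (Python 3.7 and later, PEP 552), a .pyc starts with a 16-byte header that records the interpreter's magic number plus the modification time and size of the source file. The sketch below reads that header; the helper names are made up for illustration, and it only demonstrates the header contents, not the full set of checks the import system performs.

```python
import importlib.util
import os
import py_compile
import struct
from pathlib import Path


def compile_with_timestamp(source, target):
    """Byte-compile source to target using timestamp-based invalidation."""
    return py_compile.compile(
        os.fspath(source),
        cfile=os.fspath(target),
        doraise=True,
        # Set explicitly: the default switches to hash-based pycs when
        # SOURCE_DATE_EPOCH is present in the environment.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )


def read_pyc_header(pyc_path):
    """Decode the 16-byte .pyc header (layout as described in PEP 552)."""
    data = Path(pyc_path).read_bytes()[:16]
    magic = data[:4]                          # interpreter-version specific
    (flags,) = struct.unpack("<I", data[4:8])
    if flags & 0x1:                           # hash-based pyc
        return {"magic": magic, "hash_based": True, "source_hash": data[8:16]}
    mtime, size = struct.unpack("<II", data[8:16])
    return {"magic": magic, "hash_based": False,
            "source_mtime": mtime, "source_size": size}
```

Comparing the "magic" field with importlib.util.MAGIC_NUMBER shows whether a given .pyc was produced by the running interpreter version. The embedded source mtime is also build-time data, which is relevant to the reproducibility goal of this series.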
> 
> In short: I'm strictly against not removing these files. I even thought
> about always running this cleanup command unconditionally.
> 
> I would also appreciate if we do not delay the whole reproducibility
> story just because there might be some exotic and invalid use-cases
> that break.

Yes, focus should be on clean Debian packages first. If broken
downstream stumbles too often, we can still take measures.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


Thread overview: 28+ messages
2023-01-11  4:11 [PATCH 00/11] Make rootfs build reproducible Felix Moessbauer
2023-01-11  4:11 ` [PATCH 01/11] fix rebuild of rootfs_finalize task Felix Moessbauer
2023-01-11  4:11 ` [PATCH 02/11] image.bbclass: fix non-reproducible file time-stamps inside rootfs Felix Moessbauer
2023-01-11  4:11 ` [PATCH 03/11] rootfs postprocess: clean python cache Felix Moessbauer
2023-01-11  8:06   ` Henning Schild
2023-01-11  8:23     ` Moessbauer, Felix
2023-01-11 12:47       ` Henning Schild
2023-01-11 13:18         ` Moessbauer, Felix
2023-01-11 13:23           ` Jan Kiszka [this message]
2023-01-11  4:11 ` [PATCH 04/11] remove non-portable ldconfig aux-cache Felix Moessbauer
2023-01-11  8:19   ` Henning Schild
2023-01-11  8:31     ` Moessbauer, Felix
2023-01-11 12:52       ` Henning Schild
2023-01-11  4:11 ` [PATCH 05/11] generate deterministic clear-text password hash Felix Moessbauer
2023-01-11  8:21   ` Henning Schild
2023-01-11  4:11 ` [PATCH 06/11] update debian initramfs in deterministic mode Felix Moessbauer
2023-01-11  8:23   ` Henning Schild
2023-01-11  8:39     ` Moessbauer, Felix
2023-01-11 12:55       ` Henning Schild
2023-01-11  4:11 ` [PATCH 07/11] create custom " Felix Moessbauer
2023-01-11  4:11 ` [PATCH 08/11] make deb_add_changelog idempotent Felix Moessbauer
2023-01-11  4:11 ` [PATCH 09/11] deb_add_changelog: set timestamp to valid epoch Felix Moessbauer
2023-01-11  4:11 ` [PATCH 10/11] deb_add_changelog: use SOURCE_DATE_EPOCH Felix Moessbauer
2023-01-11  8:49   ` Henning Schild
2023-01-11  9:06     ` Moessbauer, Felix
2023-01-11  4:11 ` [PATCH 11/11] make custom linux-image bit-by-bit reproducible Felix Moessbauer
2023-01-11  6:51 ` [PATCH 00/11] Make rootfs build reproducible Jan Kiszka
2023-01-11  9:04 ` Venkata.Pyla
