Date: Thu, 14 Apr 2022 09:36:34 +0200
From: Henning Schild
To: Adriaan Schmidt
Subject: Re: [PATCH 1/2] scripts: add isar-sstate
Message-ID: <20220414093634.5959663c@md1za8fc.ad001.siemens.net>
In-Reply-To: <20220413063534.799526-2-adriaan.schmidt@siemens.com>
References: <20220413063534.799526-1-adriaan.schmidt@siemens.com> <20220413063534.799526-2-adriaan.schmidt@siemens.com>

On Wed, 13 Apr 2022 08:35:33 +0200, Adriaan Schmidt wrote:

> This adds a maintenance helper script to work with remote/shared
> sstate caches.

Is that script in fact an isar-thing or rather a bitbake thing? To me
it all sounds like the whole bitbake community could really use that.

We could carry it in isar but also think bigger, hitting more people to
help us maintain and improve it.

regards,
Henning

> Signed-off-by: Adriaan Schmidt
> ---
>  scripts/isar-sstate | 743 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 743 insertions(+)
>  create mode 100755 scripts/isar-sstate
>
> diff --git a/scripts/isar-sstate b/scripts/isar-sstate
> new file mode 100755
> index 00000000..b1e2c1ec
> --- /dev/null
> +++ b/scripts/isar-sstate
> @@ -0,0 +1,743 @@
> +#!/usr/bin/env python3
> +"""
> +This software is part of Isar
> +Copyright (c) Siemens AG, 2022
> +
> +# isar-sstate: Helper for management of shared sstate caches
> +
> +Isar uses the sstate cache feature of bitbake to cache the output of certain
> +build tasks, potentially speeding up builds significantly. This script is
> +meant to help managing shared sstate caches, speeding up builds using cache
> +artifacts created elsewhere. There are two main ways of accessing a shared
> +sstate cache:
> +  - Point `SSTATE_DIR` to a persistent location that is used by multiple
> +    builds. bitbake will read artifacts from there, and also immediately
> +    store generated cache artifacts in this location. This speeds up local
> +    builds, and if `SSTATE_DIR` is located on a shared filesystem, it can
> +    also benefit others.
> +  - Point `SSTATE_DIR` to a local directory (e.g., simply use the default
> +    value `${TOPDIR}/sstate-cache`), and additionally set `SSTATE_MIRRORS`
> +    to a remote sstate cache. bitbake will use artifacts from both locations,
> +    but will write newly created artifacts only to the local folder
> +    `SSTATE_DIR`. To share them, you need to explicitly upload them to
> +    the shared location, which is what isar-sstate is for.
> +
> +isar-sstate implements four commands (upload, clean, info, analyze),
> +and supports three remote backends (filesystem, http/webdav, AWS S3).
> +
> +## Commands
> +
> +### upload
> +
> +The `upload` command pushes the contents of a local sstate cache to the
> +remote location, uploading all files that don't already exist on the remote.
> +
> +### clean
> +
> +The `clean` command deletes old artifacts from the remote cache. It takes two
> +arguments, `--max-age` and `--max-sig-age`, each of which must be a number,
> +followed by one of `w`, `d`, `h`, `m`, or `s` (for weeks, days, hours, minutes,
> +seconds, respectively).
> +
> +`--max-age` specifies up to which age artifacts should be kept in the cache.
> +Anything older will be removed. Note that this only applies to the `.tgz` files
> +containing the actual cached items, not the `.siginfo` files containing the
> +cache metadata (signatures and hashes).
> +To permit analysis of caching details using the `analyze` command, the siginfo
> +files can be kept longer, as indicated by `--max-sig-age`. If not set explicitly,
> +this defaults to `max_age`, and any explicitly given value can't be smaller
> +than `max_age`.
> +
> +### info
> +
> +The `info` command scans the remote cache and displays some basic statistics.
> +The argument `--verbose` increases the amount of information displayed.
> +
> +### analyze
> +
> +The `analyze` command iterates over all artifacts in the local sstate cache,
> +and compares them to the contents of the remote cache. If an item is not
> +present in the remote cache, the signature of the local item is compared
> +to all potential matches in the remote cache, identified by matching
> +architecture, recipe (`PN`), and task. This analysis has the same output
> +format as `bitbake-diffsigs`.
> +
> +## Backends
> +
> +### Filesystem backend
> +
> +This uses a filesystem location as the remote cache. In case you can access
> +your remote cache this way, you could also have bitbake write to the cache
> +directly, by setting `SSTATE_DIR`. However, using `isar-sstate` gives
> +you a uniform interface, and lets you use the same code/CI scripts across
> +heterogeneous setups. Also, it gives you the `analyze` command.
> +
> +### http backend
> +
> +A http server with webdav extension can be used as remote cache.
> +Apache can easily be configured to function as a remote sstate cache, e.g.:
> +```
> +<VirtualHost *:80>
> +    Alias /sstate/ /path/to/sstate/location/
> +    <Location /sstate/>
> +        Dav on
> +        Options Indexes
> +        Require all granted
> +    </Location>
> +</VirtualHost>
> +```
> +In addition you need to load Apache's dav module:
> +```
> +a2enmod dav
> +```
> +
> +To use the http backend, you need to install the Python webdavclient library.
> +On Debian you would:
> +```
> +apt-get install python3-webdavclient
> +```
> +
> +### S3 backend
> +
> +An AWS S3 bucket can be used as remote cache. You need to ensure that AWS
> +credentials are present (e.g., in your AWS config file or as environment
> +variables).
> +
> +To use the S3 backend you need to install the Python botocore library.
> +On Debian you would:
> +```
> +apt-get install python3-botocore
> +```
> +"""
> +
> +import argparse
> +from collections import namedtuple
> +import datetime
> +import os
> +import re
> +import shutil
> +import sys
> +from tempfile import NamedTemporaryFile
> +import time
> +
> +sys.path.insert(0, os.path.join(os.path.dirname(os.path.realpath(__file__)), '..', 'bitbake', 'lib'))
> +analysis_supported = True
> +from bb.siggen import compare_sigfiles
> +
> +# runtime detection of supported targets
> +webdav_supported = True
> +try:
> +    import webdav3.client
> +    import webdav3.exceptions
> +except ModuleNotFoundError:
> +    webdav_supported = False
> +
> +s3_supported = True
> +try:
> +    import botocore.exceptions
> +    import botocore.session
> +except ModuleNotFoundError:
> +    s3_supported = False
> +
> +SstateCacheEntry = namedtuple(
> +    'SstateCacheEntry', 'hash path arch pn task suffix islink age size'.split())
> +
> +# The filename of sstate items is defined in Isar:
> +# SSTATE_PKGSPEC = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:"
> +#                  "${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
> +
> +# This regex extracts relevant fields:
> +SstateRegex = re.compile(r'sstate:(?P<pn>[^:]*):[^:]*:[^:]*:[^:]*:'
> +                         r'(?P<arch>[^:]*):[^:]*:(?P<hash>[0-9a-f]*)_'
> +                         r'(?P<task>[^\.]*)\.(?P<suffix>.*)')
> +
> +
> +class SstateTargetBase(object):
> +    def __init__(self, path):
> +        """Constructor
> +
> +        :param path: URI of the remote (without leading 'protocol://')
> +        """
> +        pass
> +
> +    def __repr__(self):
> +        """Format remote for printing
> +
> +        :returns: URI string, including 'protocol://'
> +        """
> +        pass
> +
> +    def exists(self, path=''):
> +        """Check if a remote path exists
> +
> +        :param path: path (file or directory) to check
> +        :returns: True if path exists, False otherwise
> +        """
> +        pass
> +
> +    def create(self):
> +        """Try to create the remote
> +
> +        :returns: True if remote could be created, False otherwise
> +        """
> +        pass
> +
> +    def mkdir(self, path):
> +        """Create a directory on the remote
> +
> +        :param path: path to create
> +        :returns: True on success, False on failure
> +        """
> +        pass
> +
> +    def upload(self, path, filename):
> +        """Uploads a local file to the remote
> +
> +        :param path: remote path to upload to
> +        :param filename: local file to upload
> +        """
> +        pass
> +
> +    def delete(self, path):
> +        """Delete remote file and remove potential empty directories
> +
> +        :param path: remote file to delete
> +        """
> +        pass
> +
> +    def list_all(self):
> +        """List all sstate files in the remote
> +
> +        :returns: list of SstateCacheEntry objects
> +        """
> +        pass
> +
> +    def download(self, path):
> +        """Prepare to temporarily access a remote file for reading
> +
> +        This is meant to provide access to siginfo files during analysis. Files
> +        must not be modified, and should be released using release() once they
> +        are no longer used.
> + > + :param path: remote path > + :returns: local path to file > + """ > + pass > + > + def release(self, download_path): > + """Release a temporary file > + > + :param doenload_path: local file > + """ > + pass > + > + > +class SstateFileTarget(SstateTargetBase): > + def __init__(self, path): > + if path.startswith('file://'): > + path = path[len('file://'):] > + self.path = path > + self.basepath = os.path.abspath(path) > + > + def __repr__(self): > + return f"file://{self.path}" > + > + def exists(self, path=''): > + return os.path.exists(os.path.join(self.basepath, path)) > + > + def create(self): > + return self.mkdir('') > + > + def mkdir(self, path): > + try: > + os.makedirs(os.path.join(self.basepath, path), > exist_ok=True) > + except OSError: > + return False > + return True > + > + def upload(self, path, filename): > + shutil.copy(filename, os.path.join(self.basepath, path)) > + > + def delete(self, path): > + try: > + os.remove(os.path.join(self.basepath, path)) > + except FileNotFoundError: > + pass > + dirs = path.split('/')[:-1] > + for d in [dirs[:i] for i in range(len(dirs), 0, -1)]: > + try: > + os.rmdir(os.path.join(self.basepath, '/'.join(d))) > + except FileNotFoundError: > + pass > + except OSError: # directory is not empty > + break > + > + def list_all(self): > + all_files = [] > + now = time.time() > + for subdir, dirs, files in os.walk(self.basepath): > + reldir = subdir[(len(self.basepath)+1):] > + for f in files: > + m = SstateRegex.match(f) > + if m is not None: > + islink = os.path.islink(os.path.join(subdir, f)) > + age = int(now - > os.path.getmtime(os.path.join(subdir, f))) > + all_files.append(SstateCacheEntry( > + path=os.path.join(reldir, f), > + size=os.path.getsize(os.path.join(subdir, > f)), > + islink=islink, > + age=age, > + **(m.groupdict()))) > + return all_files > + > + def download(self, path): > + # we don't actually download, but instead just pass the > local path > + if not self.exists(path): > + return None > + 
return os.path.join(self.basepath, path) > + > + def release(self, download_path): > + # as we didn't download, there is nothing to clean up > + pass > + > + > +class SstateDavTarget(SstateTargetBase): > + def __init__(self, url): > + if not webdav_supported: > + print("ERROR: No webdav support. Please install the > webdav3 Python module.") > + print("INFO: on Debian: 'apt-get install > python3-webdavclient'") > + sys.exit(1) > + m = re.match('^([^:]+://[^/]+)/(.*)', url) > + if not m: > + print(f"Cannot parse target path: {url}") > + sys.exit(1) > + self.host = m.group(1) > + self.basepath = m.group(2) > + if not self.basepath.endswith('/'): > + self.basepath += '/' > + self.dav = webdav3.client.Client({'webdav_hostname': > self.host}) > + self.tmpfiles = [] > + > + def __repr__(self): > + return f"{self.host}/{self.basepath}" > + > + def exists(self, path=''): > + return self.dav.check(self.basepath + path) > + > + def create(self): > + return self.mkdir('') > + > + def mkdir(self, path): > + dirs = (self.basepath + path).split('/') > + > + for i in range(len(dirs)): > + d = '/'.join(dirs[:(i+1)]) + '/' > + if not self.dav.check(d): > + if not self.dav.mkdir(d): > + return False > + return True > + > + def upload(self, path, filename): > + return self.dav.upload_sync(remote_path=self.basepath + > path, local_path=filename) + > + def delete(self, path): > + self.dav.clean(self.basepath + path) > + dirs = path.split('/')[1:-1] > + for d in [dirs[:i] for i in range(len(dirs), 0, -1)]: > + items = self.dav.list(self.basepath + '/'.join(d), > get_info=True) > + if len(items) > 0: > + # collection is not empty > + break > + self.dav.clean(self.basepath + '/'.join(d)) > + > + def list_all(self): > + now = time.time() > + > + def recurse_dir(path): > + files = [] > + for item in self.dav.list(path, get_info=True): > + if item['isdir'] and not item['path'] == path: > + files.extend(recurse_dir(item['path'])) > + elif not item['isdir']: > + m = 
SstateRegex.match(item['path'][len(path):]) > + if m is not None: > + modified = time.mktime( > + datetime.datetime.strptime( > + item['created'], > + '%Y-%m-%dT%H:%M:%SZ').timetuple()) > + age = int(now - modified) > + files.append(SstateCacheEntry( > + path=item['path'][len(self.basepath):], > + size=int(item['size']), > + islink=False, > + age=age, > + **(m.groupdict()))) > + return files > + return recurse_dir(self.basepath) > + > + def download(self, path): > + # download to a temporary file > + tmp = NamedTemporaryFile(prefix='isar-sstate-', delete=False) > + tmp.close() > + try: > + self.dav.download_sync(remote_path=self.basepath + path, > local_path=tmp.name) > + except webdav3.exceptions.RemoteResourceNotFound: > + return None > + self.tmpfiles.append(tmp.name) > + return tmp.name > + > + def release(self, download_path): > + # remove the temporary download > + if download_path is not None and download_path in > self.tmpfiles: > + os.remove(download_path) > + self.tmpfiles = [f for f in self.tmpfiles if not f == > download_path] + > + > +class SstateS3Target(SstateTargetBase): > + def __init__(self, path): > + if not s3_supported: > + print("ERROR: No S3 support. 
Please install the botocore > Python module.") > + print("INFO: on Debian: 'apt-get install > python3-botocore'") > + sys.exit(1) > + session = botocore.session.get_session() > + self.s3 = session.create_client('s3') > + if path.startswith('s3://'): > + path = path[len('s3://'):] > + m = re.match('^([^/]+)(?:/(.+)?)?$', path) > + self.bucket = m.group(1) > + if m.group(2): > + self.basepath = m.group(2) > + if not self.basepath.endswith('/'): > + self.basepath += '/' > + else: > + self.basepath = '' > + self.tmpfiles = [] > + > + def __repr__(self): > + return f"s3://{self.bucket}/{self.basepath}" > + > + def exists(self, path=''): > + if path == '': > + # check if the bucket exists > + try: > + self.s3.head_bucket(Bucket=self.bucket) > + except botocore.exceptions.ClientError as e: > + print(e) > + print(e.response['Error']['Message']) > + return False > + return True > + try: > + self.s3.head_object(Bucket=self.bucket, > Key=self.basepath + path) > + except botocore.exceptions.ClientError as e: > + if e.response['ResponseMetadata']['HTTPStatusCode'] != > 404: > + print(e) > + print(e.response['Error']['Message']) > + return False > + return True > + > + def create(self): > + return self.exists() > + > + def mkdir(self, path): > + # in S3, folders are implicit and don't need to be created > + return True > + > + def upload(self, path, filename): > + try: > + self.s3.put_object(Body=open(filename, 'rb'), > Bucket=self.bucket, Key=self.basepath + path) > + except botocore.exceptions.ClientError as e: > + print(e) > + print(e.response['Error']['Message']) > + > + def delete(self, path): > + try: > + self.s3.delete_object(Bucket=self.bucket, > Key=self.basepath + path) > + except botocore.exceptions.ClientError as e: > + print(e) > + print(e.response['Error']['Message']) > + > + def list_all(self): > + now = time.time() > + > + def recurse_dir(path): > + files = [] > + try: > + result = self.s3.list_objects(Bucket=self.bucket, > Prefix=path, Delimiter='/') > + except 
botocore.exceptions.ClientError as e: > + print(e) > + print(e.response['Error']['Message']) > + return [] > + for f in result.get('Contents', []): > + m = SstateRegex.match(f['Key'][len(path):]) > + if m is not None: > + modified = > time.mktime(f['LastModified'].timetuple()) > + age = int(now - modified) > + files.append(SstateCacheEntry( > + path=f['Key'][len(self.basepath):], > + size=f['Size'], > + islink=False, > + age=age, > + **(m.groupdict()))) > + for p in result.get('CommonPrefixes', []): > + files.extend(recurse_dir(p['Prefix'])) > + return files > + return recurse_dir(self.basepath) > + > + def download(self, path): > + # download to a temporary file > + tmp = NamedTemporaryFile(prefix='isar-sstate-', delete=False) > + try: > + result = self.s3.get_object(Bucket=self.bucket, > Key=self.basepath + path) > + except botocore.exceptions.ClientError: > + return None > + tmp.write(result['Body'].read()) > + tmp.close() > + self.tmpfiles.append(tmp.name) > + return tmp.name > + > + def release(self, download_path): > + # remove the temporary download > + if download_path is not None and download_path in > self.tmpfiles: > + os.remove(download_path) > + self.tmpfiles = [f for f in self.tmpfiles if not f == > download_path] + > + > +def arguments(): > + parser = argparse.ArgumentParser() > + parser.add_argument( > + 'command', type=str, metavar='command', > + choices='info upload clean analyze'.split(), > + help="command to execute (info, upload, clean, analyze)") > + parser.add_argument( > + 'source', type=str, nargs='?', > + help="local sstate dir (for uploads or analysis)") > + parser.add_argument( > + 'target', type=str, > + help="remote sstate location (a file://, http://, or s3:// > URI)") > + parser.add_argument( > + '-v', '--verbose', default=False, action='store_true') > + parser.add_argument( > + '--max-age', type=str, default='1d', > + help="clean tgz files older than MAX_AGE (a number followed > by w|d|h|m|s)") > + parser.add_argument( > + 
'--max-sig-age', type=str, default=None, > + help="clean siginfo files older than MAX_SIG_AGE (defaults > to MAX_AGE)") + > + args = parser.parse_args() > + if args.command in 'upload analyze'.split() and args.source is > None: > + print(f"ERROR: '{args.command}' needs a source and target") > + sys.exit(1) > + elif args.command in 'info clean'.split() and args.source is not > None: > + print(f"ERROR: '{args.command}' must not have a source (only > a target)") > + sys.exit(1) > + return args > + > + > +def sstate_upload(source, target, verbose, **kwargs): > + if not os.path.isdir(source): > + print(f"WARNING: source {source} does not exist. Not > uploading.") > + return 0 > + > + if not target.exists() and not target.create(): > + print(f"ERROR: target {target} does not exist and could not > be created.") > + return -1 > + > + print(f"INFO: uploading {source} to {target}") > + os.chdir(source) > + upload, exists = [], [] > + for subdir, dirs, files in os.walk('.'): > + target_dirs = subdir.split('/')[1:] > + for f in files: > + file_path = (('/'.join(target_dirs) + '/') if > len(target_dirs) > 0 else '') + f > + if target.exists(file_path): > + if verbose: > + print(f"[EXISTS] {file_path}") > + exists.append(file_path) > + else: > + upload.append((file_path, target_dirs)) > + upload_gb = (sum([os.path.getsize(f[0]) for f in upload]) / > 1024.0 / 1024.0 / 1024.0) > + print(f"INFO: uploading {len(upload)} files ({upload_gb:.02f} > GB)") > + print(f"INFO: {len(exists)} files already present on target") > + for file_path, target_dirs in upload: > + if verbose: > + print(f"[UPLOAD] {file_path}") > + target.mkdir('/'.join(target_dirs)) > + target.upload(file_path, file_path) > + return 0 > + > + > +def sstate_clean(target, max_age, max_sig_age, verbose, **kwargs): > + def convert_to_seconds(x): > + seconds_per_unit = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400, > 'w': 604800} > + m = re.match(r'^(\d+)(w|d|h|m|s)?', x) > + if m is None: > + print(f"ERROR: cannot parse MAX_AGE 
'{max_age}', needs to be a number followed by w|d|h|m|s")
> +            sys.exit(-1)
> +        if (unit := m.group(2)) is None:
> +            print("WARNING: MAX_AGE without unit, assuming 'days'")
> +            unit = 'd'
> +        return int(m.group(1)) * seconds_per_unit[unit]
> +
> +    max_age_seconds = convert_to_seconds(max_age)
> +    if max_sig_age is None:
> +        max_sig_age_seconds = max_age_seconds
> +    else:
> +        max_sig_age_seconds = max(max_age_seconds, convert_to_seconds(max_sig_age))
> +
> +    if not target.exists():
> +        print(f"INFO: cannot access target {target}. Nothing to clean.")
> +        return 0
> +
> +    print(f"INFO: scanning {target}")
> +    all_files = target.list_all()
> +    links = [f for f in all_files if f.islink]
> +    if links:
> +        print(f"NOTE: we have links: {links}")
> +    tgz_files = [f for f in all_files if f.suffix == 'tgz']
> +    siginfo_files = [f for f in all_files if f.suffix == 'tgz.siginfo']
> +    del_tgz_files = [f for f in tgz_files if f.age >= max_age_seconds]
> +    del_tgz_hashes = [f.hash for f in del_tgz_files]
> +    del_siginfo_files = [f for f in siginfo_files if
> +                         f.age >= max_sig_age_seconds or f.hash in del_tgz_hashes]
> +    print(f"INFO: found {len(tgz_files)} tgz files, {len(del_tgz_files)} of which are older than {max_age}")
> +    print(f"INFO: found {len(siginfo_files)} siginfo files, {len(del_siginfo_files)} of which "
> +          f"correspond to tgz files or are older than {max_sig_age}")
> +
> +    for f in del_tgz_files + del_siginfo_files:
> +        if verbose:
> +            print(f"[DELETE] {f.path}")
> +        target.delete(f.path)
> +    freed_gb = sum([x.size for x in del_tgz_files + del_siginfo_files]) / 1024.0 / 1024.0 / 1024.0
> +    print(f"INFO: freed {freed_gb:.02f} GB")
> +    return 0
> +
> +
> +def sstate_info(target, verbose, **kwargs):
> +    if not target.exists():
> +        print(f"INFO: cannot access target {target}. No info to show.")
> +        return 0
> +
> +    print(f"INFO: scanning {target}")
> +    all_files = target.list_all()
> +    size_gb = sum([x.size for x in all_files]) / 1024.0 / 1024.0 / 1024.0
> +    print(f"INFO: found {len(all_files)} files ({size_gb:0.2f} GB)")
> +
> +    if not verbose:
> +        return 0
> +
> +    archs = list(set([f.arch for f in all_files]))
> +    print(f"INFO: found the following archs: {archs}")
> +
> +    key_task = {'deb': 'dpkg_build',
> +                'rootfs': 'rootfs_install',
> +                'bootstrap': 'bootstrap'}
> +    recipes = {k: [] for k in key_task.keys()}
> +    others = []
> +    for pn in set([f.pn for f in all_files]):
> +        tasks = set([f.task for f in all_files if f.pn == pn])
> +        ks = [k for k, v in key_task.items() if v in tasks]
> +        if len(ks) == 1:
> +            recipes[ks[0]].append(pn)
> +        elif len(ks) == 0:
> +            others.append(pn)
> +        else:
> +            print(f"WARNING: {pn} could be any of {ks}")
> +    for k, entries in recipes.items():
> +        print(f"Cache hits for {k}:")
> +        for pn in entries:
> +            hits = [f for f in all_files if f.pn == pn and f.task == key_task[k] and f.suffix == 'tgz']
> +            print(f" - {pn}: {len(hits)} hits")
> +    print("Other cache hits:")
> +    for pn in others:
> +        print(f" - {pn}")
> +    return 0
> +
> +
> +def sstate_analyze(source, target, **kwargs):
> +    if not os.path.isdir(source):
> +        print(f"ERROR: source {source} does not exist. Nothing to analyze.")
> +        return -1
> +    if not target.exists():
> +        print(f"ERROR: target {target} does not exist. Nothing to analyze.")
> +        return -1
> +
> +    source = SstateFileTarget(source)
> +    local_sigs = {s.hash: s for s in source.list_all() if s.suffix.endswith('.siginfo')}
> +    remote_sigs = {s.hash: s for s in target.list_all() if s.suffix.endswith('.siginfo')}
> +
> +    key_tasks = 'dpkg_build rootfs_install bootstrap'.split()
> +
> +    check = [k for k, v in local_sigs.items() if v.task in key_tasks]
> +    for local_hash in check:
> +        s = local_sigs[local_hash]
> +        print(f"\033[1;33m==== checking local item {s.arch}:{s.pn}:{s.task} ({s.hash[:8]}) ====\033[0m")
> +        if local_hash in remote_sigs:
> +            print(" -> found hit in remote cache")
> +            continue
> +        remote_matches = [k for k, v in remote_sigs.items() if s.arch == v.arch and s.pn == v.pn and s.task == v.task]
> +        if len(remote_matches) == 0:
> +            print(" -> found no hit, and no potential remote matches")
> +        else:
> +            print(f" -> found no hit, but {len(remote_matches)} potential remote matches")
> +        for r in remote_matches:
> +            t = remote_sigs[r]
> +            print(f"\033[0;33m**** comparing to {r[:8]} ****\033[0m")
> +
> +            def recursecb(key, remote_hash, local_hash):
> +                recout = []
> +                if remote_hash in remote_sigs.keys():
> +                    remote_file = target.download(remote_sigs[remote_hash].path)
> +                elif remote_hash in local_sigs.keys():
> +                    recout.append(f"found remote hash in local signatures ({key})!?! (please implement that case!)")
> +                    return recout
> +                else:
> +                    recout.append(f"could not find remote signature {remote_hash[:8]} for job {key}")
> +                    return recout
> +                if local_hash in local_sigs.keys():
> +                    local_file = source.download(local_sigs[local_hash].path)
> +                elif local_hash in remote_sigs.keys():
> +                    local_file = target.download(remote_sigs[local_hash].path)
> +                else:
> +                    recout.append(f"could not find local signature {local_hash[:8]} for job {key}")
> +                    return recout
> +                if local_file is None or remote_file is None:
> +                    out = ["Aborting analysis because siginfo files disappeared unexpectedly"]
> +                else:
> +                    out = compare_sigfiles(remote_file, local_file, recursecb, color=True)
> +                if local_hash in local_sigs.keys():
> +                    source.release(local_file)
> +                else:
> +                    target.release(local_file)
> +                target.release(remote_file)
> +                for change in out:
> +                    recout.extend([' ' + line for line in change.splitlines()])
> +                return recout
> +
> +            local_file = source.download(s.path)
> +            remote_file = target.download(t.path)
> +            out = compare_sigfiles(remote_file, local_file, recursecb, color=True)
> +            source.release(local_file)
> +            target.release(remote_file)
> +            # shorten hashes from 64 to 8 characters for better readability
> +            out = [re.sub(r'([0-9a-f]{8})[0-9a-f]{56}', r'\1', line) for line in out]
> +            print('\n'.join(out))
> +
> +
> +def main():
> +    args = arguments()
> +
> +    if args.target.startswith('http://'):
> +        target = SstateDavTarget(args.target)
> +    elif args.target.startswith('s3://'):
> +        target = SstateS3Target(args.target)
> +    elif args.target.startswith('file://'):
> +        target = SstateFileTarget(args.target)
> +    else:  # no protocol given, assume file://
> +        target = SstateFileTarget(args.target)
> +
> +    args.target = target
> +    return globals()[f'sstate_{args.command}'](**vars(args))
> +
> +
> +if __name__ == '__main__':
> +    sys.exit(main())
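
The expiry rule in the quoted sstate_clean() can be tried out in isolation. What follows is a minimal sketch under stated assumptions, not code from the patch: `CacheFile` and `select_for_deletion` are hypothetical names made up for illustration, and the file objects only carry the attributes the quoted code reads (`path`, `hash`, `suffix`, `age`, `size`). It shows that a siginfo file is selected either because it is older than the signature limit or because its archive is being removed, and that the signature limit is never shorter than the archive limit.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects returned by target.list_all();
# only the attributes used by the quoted sstate_clean() are modelled.
CacheFile = namedtuple("CacheFile", "path hash suffix age size")


def select_for_deletion(files, max_age_seconds, max_sig_age_seconds=None):
    """Return (tgz, siginfo) lists of files selected for deletion."""
    if max_sig_age_seconds is None:
        max_sig_age_seconds = max_age_seconds
    else:
        # The signature limit is clamped so it is never below the archive limit.
        max_sig_age_seconds = max(max_age_seconds, max_sig_age_seconds)

    tgz_files = [f for f in files if f.suffix == "tgz"]
    siginfo_files = [f for f in files if f.suffix == "tgz.siginfo"]

    del_tgz = [f for f in tgz_files if f.age >= max_age_seconds]
    del_hashes = {f.hash for f in del_tgz}
    # A siginfo file goes if it is too old or if its archive goes.
    del_sigs = [f for f in siginfo_files
                if f.age >= max_sig_age_seconds or f.hash in del_hashes]
    return del_tgz, del_sigs


DAY = 24 * 60 * 60
files = [
    CacheFile("a.tgz", "aaaa", "tgz", 10 * DAY, 100),
    CacheFile("a.tgz.siginfo", "aaaa", "tgz.siginfo", 10 * DAY, 1),
    CacheFile("b.tgz", "bbbb", "tgz", 2 * DAY, 100),
    CacheFile("b.tgz.siginfo", "bbbb", "tgz.siginfo", 2 * DAY, 1),
    CacheFile("c.tgz.siginfo", "cccc", "tgz.siginfo", 20 * DAY, 1),
]

# Archives expire after 7 days, signatures after 14 days.
del_tgz, del_sigs = select_for_deletion(files, 7 * DAY, 14 * DAY)
print([f.path for f in del_tgz])   # ['a.tgz']
print([f.path for f in del_sigs])  # ['a.tgz.siginfo', 'c.tgz.siginfo']
```

Here a.tgz.siginfo is only 10 days old, below the 14-day signature limit, but it is still selected because a.tgz itself expires. The sketch uses a set for the hash lookup where the quoted code uses a list; the result is the same.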