From: "'MOESSBAUER, Felix' via isar-users" <isar-users@googlegroups.com>
To: "Schaffner, Tobias" <tobias.schaffner@siemens.com>,
"isar-users@googlegroups.com" <isar-users@googlegroups.com>
Cc: "Schmidt, Adriaan" <adriaan.schmidt@siemens.com>
Subject: Re: [PATCH] isar-sstate: reupload utilized files older than max-age
Date: Thu, 25 Jul 2024 08:20:54 +0000
Message-ID: <5d7ca4b5bb51f618506e8a79dee3cabfe0b660fe.camel@siemens.com>
In-Reply-To: <4902c475-e195-45ef-9d20-05496eebb33d@siemens.com>
On Thu, 2024-07-25 at 09:18 +0200, Tobias Schaffner wrote:
> On 24.07.24 14:38, Moessbauer, Felix (T CED OES-DE) wrote:
> > On Tue, 2024-07-23 at 14:27 +0200, Tobias Schaffner wrote:
> > > Currently, the Isar-sstate script deletes all files older than
> > > max-age during a clean call, regardless of whether they are still
> > > in use. Given that S3 buckets do not offer a means to update
> > > timestamps other than through a reupload, this commit introduces
> > > a change to reupload all files utilized by the current build if
> > > they are older than max-age during an isar-sstate upload call.
> >
> > Hi, I'm wondering if it is sufficient to just re-upload the
> > signature, but not the file itself. Otherwise we "punish" good
> > caching by a lot of traffic between the build servers and S3.
>
> It depends on your CI structure, but this does not necessarily
> introduce more traffic. The idea is to do a re-upload when the file
> would otherwise be cleaned.
Ok, got it. So the new logic behaves similarly to an LRU cache.
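To make sure I read it right, here is a toy model of that behaviour
(pure illustration, not isar-sstate code; the names are made up):

import time

now = time.time()
max_age = 24 * 3600  # 1d, as with --max-age

# Hypothetical cache: artifact name -> last-upload timestamp.
cache = {
    'used-but-old': now - 2 * 24 * 3600,    # still needed, about to expire
    'fresh': now - 3600,                    # recently uploaded
    'unused-and-old': now - 2 * 24 * 3600,  # nobody needs this anymore
}

# upload phase: artifacts the current build uses are "re-uploaded"
# (timestamp refreshed) if they are older than max_age.
used_by_build = ['used-but-old', 'fresh']
for name in used_by_build:
    if now - cache[name] > max_age:
        cache[name] = now

# clean phase: only artifacts that no build touched age out.
cache = {n: t for n, t in cache.items() if now - t <= max_age}
assert sorted(cache) == ['fresh', 'used-but-old']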
>
> At the moment, a common pattern for isar-sstate usage is:
> clean -> download needed artifacts that are available -> rebuild
> artifact x that is still needed but was cleaned -> upload x
>
> This change allows you to:
> download needed artifacts that are available -> reupload x that
> would be cleaned -> clean
This sounds reasonable. We can apply this patch to a couple of our CI
systems right away to see how it behaves in practice.
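Side note for reviewers: the extracted helper is easy to sanity-check
standalone. The function below is copied from the patch; the
assertions are mine:

import re

def convert_duration_string_to_seconds(x):
    seconds_per_unit = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400, 'w': 604800}
    m = re.match(r'^(\d+)(w|d|h|m|s)?', x)
    if m is None:
        return None
    unit = m.group(2)
    if unit is None:
        print("WARNING: MAX_AGE without unit, assuming 'days'")
        unit = 'd'
    return int(m.group(1)) * seconds_per_unit[unit]

assert convert_duration_string_to_seconds('30m') == 1800
assert convert_duration_string_to_seconds('1w') == 7 * 86400
assert convert_duration_string_to_seconds('2') == 2 * 86400  # warns, assumes days
assert convert_duration_string_to_seconds('soon') is None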
Felix
>
> In both cases x will have to be uploaded.
>
> Best,
> Tobias
>
> > CC'ing Adriaan.
> >
> > Felix
> >
> > >
> > > Signed-off-by: Tobias Schaffner <tobias.schaffner@siemens.com>
> > > ---
> > > scripts/isar-sstate | 57 ++++++++++++++++++++++++++++++++++++++-------------------
> > > 1 file changed, 38 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/scripts/isar-sstate b/scripts/isar-sstate
> > > index 4ea38bc8..a60f50dd 100755
> > > --- a/scripts/isar-sstate
> > > +++ b/scripts/isar-sstate
> > > @@ -32,6 +32,11 @@ and supports three remote backends (filesystem, http/webdav, AWS S3).
> > > The `upload` command pushes the contents of a local sstate cache to the
> > > remote location, uploading all files that don't already exist on the remote.
> > >
> > > +`--max-age` specifies after which time artifacts in the cache should be
> > > +refreshed. Files older than this age will be reuploaded to update their
> > > +timestamps. This value should be chosen to be smaller than the clean
> > > +max-age to ensure that the artifacts are refreshed before they are cleaned.
> > > +
> > > ### clean
> > >
> > > The `clean` command deletes old artifacts from the remote cache. It takes two
> > > @@ -179,6 +184,17 @@ StampsRegex = re.compile(
> > >     r"(.*/)?(?P<arch>[^/]+)/(?P<pn>[^/]+)/([^/]+)\.do_(?P<task>[^/]+)\.(?P<suffix>sigdata)\.(?P<hash>[0-9a-f]{64})"
> > > )
> > >
> > > +def convert_duration_string_to_seconds(x):
> > > +    seconds_per_unit = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400, 'w': 604800}
> > > +    m = re.match(r'^(\d+)(w|d|h|m|s)?', x)
> > > +    if m is None:
> > > +        return None
> > > +    unit = m.group(2)
> > > +    if unit is None:
> > > +        print("WARNING: MAX_AGE without unit, assuming 'days'")
> > > +        unit = 'd'
> > > +    return int(m.group(1)) * seconds_per_unit[unit]
> > > +
> > > class SstateTargetBase(object):
> > > def __init__(self, path, cached=False):
> > > """Constructor
> > > @@ -598,7 +614,7 @@ def arguments():
> > >         '-v', '--verbose', default=False, action='store_true')
> > >     parser.add_argument(
> > >         '--max-age', type=str, default='1d',
> > > -        help="clean: remove archive files older than MAX_AGE (a number followed by w|d|h|m|s)")
> > > +        help="clean/upload: remove/reupload archive files older than MAX_AGE (a number followed by w|d|h|m|s)")
> > >     parser.add_argument(
> > >         '--max-sig-age', type=str, default=None,
> > >         help="clean: remove siginfo files older than MAX_SIG_AGE (defaults to MAX_AGE)")
> > > @@ -632,7 +648,7 @@ def arguments():
> > > return args
> > >
> > >
> > > -def sstate_upload(source, target, verbose, **kwargs):
> > > +def sstate_upload(source, target, verbose, max_age="1d", **kwargs):
> > >     if not os.path.isdir(source):
> > >         print(f"WARNING: source {source} does not exist. Not uploading.")
> > >         return 0
> > > @@ -640,23 +656,37 @@ def sstate_upload(source, target, verbose, **kwargs):
> > >         print(f"WARNING: target {target} does not exist and could not be created. Not uploading.")
> > >         return 0
> > >
> > > + print(f"INFO: scanning {target}")
> > > + all_files = target.list_all()
> > > +
> > > + def target_file_present(file_path):
> > > + for file in all_files:
> > > + if file.path == file_path:
> > > + return file
> > > +
> > > print(f"INFO: uploading {source} to {target}")
> > > os.chdir(source)
> > > - upload, exists = [], []
> > > + upload, exists, update = [], [], []
> > > for subdir, dirs, files in os.walk('.'):
> > > target_dirs = subdir.split('/')[1:]
> > > for f in files:
> > > file_path = (('/'.join(target_dirs) + '/') if
> > > len(target_dirs) > 0 else '') + f
> > > - if target.exists(file_path):
> > > + target_file = target_file_present(file_path)
> > > + if target_file:
> > > if verbose:
> > > print(f"[EXISTS] {file_path}")
> > > exists.append(file_path)
> > > + if target_file.age >
> > > convert_duration_string_to_seconds(max_age):
> > > + update.append((file_path, target_dirs))
> > > + if verbose:
> > > + print(f"[UPDATE] {file_path}")
> > > else:
> > > upload.append((file_path, target_dirs))
> > > - upload_gb = (sum([os.path.getsize(f[0]) for f in upload]) /
> > > 1024.0 / 1024.0 / 1024.0)
> > > + upload_gb = (sum([os.path.getsize(f[0]) for f in (upload +
> > > update)]) / 1024.0 / 1024.0 / 1024.0)
> > > print(f"INFO: uploading {len(upload)} files
> > > ({upload_gb:.02f}
> > > GB)")
> > > print(f"INFO: {len(exists)} files already present on
> > > target")
> > > - for file_path, target_dirs in upload:
> > > + print(f"INFO: {len(update)} files will be refreshed")
> > > + for file_path, target_dirs in upload + update:
> > > if verbose:
> > > print(f"[UPLOAD] {file_path}")
> > > target.mkdir('/'.join(target_dirs))
> > > @@ -665,24 +695,13 @@ def sstate_upload(source, target, verbose, **kwargs):
> > >
> > >
> > > def sstate_clean(target, max_age, max_sig_age, verbose, **kwargs):
> > > -    def convert_to_seconds(x):
> > > -        seconds_per_unit = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400, 'w': 604800}
> > > -        m = re.match(r'^(\d+)(w|d|h|m|s)?', x)
> > > -        if m is None:
> > > -            return None
> > > -        unit = m.group(2)
> > > -        if unit is None:
> > > -            print("WARNING: MAX_AGE without unit, assuming 'days'")
> > > -            unit = 'd'
> > > -        return int(m.group(1)) * seconds_per_unit[unit]
> > > -
> > > -    max_age_seconds = convert_to_seconds(max_age)
> > > +    max_age_seconds = convert_duration_string_to_seconds(max_age)
> > >     if max_age_seconds is None:
> > >         print(f"ERROR: cannot parse MAX_AGE '{max_age}', needs to be a number followed by w|d|h|m|s")
> > >         return 1
> > >     if max_sig_age is None:
> > >         max_sig_age = max_age
> > > -    max_sig_age_seconds = max(max_age_seconds, convert_to_seconds(max_sig_age))
> > > +    max_sig_age_seconds = max(max_age_seconds, convert_duration_string_to_seconds(max_sig_age))
> > >
> > >     if not target.exists():
> > >         print(f"WARNING: cannot access target {target}. Nothing to clean.")
> >
--
Siemens AG, Technology
Linux Expert Center