public inbox for isar-users@googlegroups.com
From: "'MOESSBAUER, Felix' via isar-users" <isar-users@googlegroups.com>
To: "Heinisch, Alexander" <alexander.heinisch@siemens.com>,
	"ubely@ilbers.de" <ubely@ilbers.de>,
	"isar-users@googlegroups.com" <isar-users@googlegroups.com>,
	"Kiszka, Jan" <jan.kiszka@siemens.com>
Subject: Re: [PATCH 0/3] Added support for apt caching
Date: Thu, 31 Oct 2024 16:26:49 +0000
Message-ID: <37f9067a2c3372a3d8c7a1402b9739869677bec9.camel@siemens.com>
In-Reply-To: <AM7PR10MB33207841E236025B621B725486552@AM7PR10MB3320.EURPRD10.PROD.OUTLOOK.COM>

On Thu, 2024-10-31 at 15:40 +0000, Heinisch, Alexander (FT RPD CED SES-AT) wrote:
> > Hi, this series is much needed to work with the still unreliable
> > snapshot mirrors.
> > 
> > @Alexander: Do you plan to send a v2?
> > 
> > At the same time I'm working on adding internal apt-cacher-ng
> > support to kas to let the build pass the initial bootstrapping.
> > 
> > Best regards,
> > Felix
> 
> Hi Felix
> 
> Thank you for coming back.
> 
> Even when using apt-cacher-ng, index files oftentimes got updated from
> snapshot.debian.org, which caused problems when our company was
> blacklisted again for some time.

I know, but this is partially also due to bugs in the apt-cacher-ng
implementation. Today it broke completely after an upstream change on
snapshot.d.o, which requires a backport of the fix in [1]. I just sent an
email to the snapshot ML requesting the backport.

[1] https://bugs-devel.debian.org/cgi-bin/bugreport.cgi?bug=1074404

> 
> Unfortunately, I didn't find the time to analyze why that was the case.
> I did a tcpdump during one of our builds, but it has been sitting
> unanalyzed for 2 weeks or so :-(
> 
> But I suspect that either the apt client sends a reload request or the
> expiry date returned from upstream is too short.

It could also simply be due to incorrect parsing of the expiry dates in
apt-cacher-ng. Recently there were a lot of fixes regarding time
parsing. Tricky to debug, though...
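
As a first step, one could compare the cache-relevant headers that
snapshot.d.o actually sends for an index file with what apt-cacher-ng
makes of them. A minimal sketch, assuming curl is available;
`show_cache_headers` is a hypothetical helper name and the URL is only a
placeholder:

```shell
# Hypothetical helper: reduce an HTTP response header dump (read from stdin)
# to the fields that drive cache expiry decisions.
show_cache_headers() {
    # Drop the CRs of CRLF line endings, then keep only expiry-related
    # headers (header names are case-insensitive, hence grep -i).
    tr -d '\r' | grep -iE '^(date|expires|cache-control|last-modified|age):'
}

# Example (needs network access; <timestamp> is a placeholder):
# curl -sI "http://snapshot.debian.org/archive/debian/<timestamp>/dists/bookworm/InRelease" \
#     | show_cache_headers
```

Running this a few times against the same snapshot URL would show whether
upstream hands out short or inconsistent expiry information.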

> While this could be relevant when fetching packages from "main" mirrors,
> it should not have much impact on snapshot mirrors.
>
> To mitigate that issue, we have since switched to squid as a proxy for
> snapshot.debian.org. Squid has an offline mode, which means that no matter
> what happens, cache entries once seen are never updated from upstream.
> As stated above, while this could have drastic impacts when using main
> mirrors, it shouldn't cause issues with snapshots, by definition.
> 
> Thus, I dropped apt-cacher-ng in our project in favour of squid.
> I also prepared documentation for this, but while preparing the patch, I
> was not sure whether it is worth a separate doc/ file or whether we should
> merge it into doc/offline.md. I was struggling with that decision since it
> does not really solve the offline case: it only caches packages already
> seen once, and it only covers apt, not other sources like git, ...

Actually, I'm more interested in having stable builds against
snapshot.d.o, not so much in 100% offline builds. The situation
upstream also got a bit better now that rate-limiting happens on an HTTP
basis instead of a TCP basis, so clients (including apt-cacher-ng and
squid) should be able to back off correctly. But I also did not check
whether the rate-limiting is implemented correctly, so that the client
knows when to retry...
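
For reference, this is roughly what correct backoff looks like on the
client side. A minimal sketch, assuming the server answers rate-limited
requests with HTTP 429 or 503 plus a `Retry-After` header in
delta-seconds form (the HTTP-date form is not handled here and falls back
to exponential backoff). `retry_delay` and `fetch_with_backoff` are
hypothetical helpers, not part of apt, apt-cacher-ng or squid:

```shell
# Hypothetical helper: seconds to wait before retry number $2, given a
# file $1 holding the response headers of the failed request.
retry_delay() {
    headers_file=$1
    attempt=$2
    # Extract the delta-seconds value of Retry-After (case-insensitive).
    ra=$(tr -d '\r' < "$headers_file" \
        | sed -n 's/^[Rr][Ee][Tt][Rr][Yy]-[Aa][Ff][Tt][Ee][Rr]:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' \
        | head -n 1)
    if [ -n "$ra" ]; then
        echo "$ra"
    else
        # No usable Retry-After: exponential backoff, 2^attempt seconds.
        echo $((1 << attempt))
    fi
}

# Hypothetical wrapper: download URL $1 to file $2, retrying on 429/503
# at most $3 times (default 5).
fetch_with_backoff() {
    url=$1
    out=$2
    max_attempts=${3:-5}
    hdrs=$(mktemp) || return 1
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        # -D dumps the response headers, -w prints only the status code
        # (000 if the transfer failed entirely).
        code=$(curl -sS -o "$out" -D "$hdrs" -w '%{http_code}' "$url")
        case $code in
            2??) rm -f "$hdrs"; return 0 ;;
            429|503) sleep "$(retry_delay "$hdrs" "$attempt")" ;;
            *) rm -f "$hdrs"; return 1 ;;
        esac
        attempt=$((attempt + 1))
    done
    rm -f "$hdrs"
    return 1
}
```

If snapshot.d.o does not send `Retry-After`, clients can only guess, which
would match the flaky behaviour we see.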

Anyway, we have a dilemma here: we need a stable baseline to build
against (both due to product requirements and for the SState cache).
But currently it is REALLY hard to get this working in CI builds.

> 
> What is your opinion?

Probably we need both, as long as it is not clear which solution is
stable in the long term.
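
Regarding the validation section of your doc patch: besides tailing the
log, a rough hit/miss summary can be derived from squid's access.log. A
minimal sketch, assuming squid's default native log format, in which the
fourth field is the result code (e.g. `TCP_MISS/200`);
`squid_cache_summary` is a hypothetical helper name:

```shell
# Hypothetical helper: count requests per squid result code in an
# access.log written in squid's default "native" format.
squid_cache_summary() {
    logfile=${1:-/var/log/squid/access.log}
    # Field 4 looks like TCP_MISS/200: count by the part before the slash,
    # then sort with a fixed collation for stable output.
    awk '{ split($4, rc, "/"); count[rc[1]]++ }
         END { for (k in count) print k, count[k] }' "$logfile" | LC_ALL=C sort
}
```

A persistently high share of `TCP_MISS` on repeated builds against the
same snapshot would then hint that the cache is not being used as intended.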

Felix

> 
> BR Alexander
> 
> PS: Appended the patch, I was referring to:
> 
> From cf64db474c2f2477633bfe3fd111156d2ac7495a Mon Sep 17 00:00:00 2001
> From: Alexander Heinisch <alexander.heinisch@siemens.com>
> Date: Thu, 24 Oct 2024 20:06:23 +0200
> Subject: [PATCH] doc: Added setup guide for squid as a caching proxy for
>  apt (snapshot) mirrors.
> 
> Signed-off-by: Alexander Heinisch <alexander.heinisch@siemens.com>
> ---
>  doc/apt-caching-proxy.md | 142 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 142 insertions(+)
>  create mode 100644 doc/apt-caching-proxy.md
> 
> diff --git a/doc/apt-caching-proxy.md b/doc/apt-caching-proxy.md
> new file mode 100644
> index 00000000..2a23a313
> --- /dev/null
> +++ b/doc/apt-caching-proxy.md
> @@ -0,0 +1,142 @@
> +# Setup Squid as APT Caching Proxy
> +
> +Limited download bandwidth is oftentimes an issue and increases build times drastically. Further, large corporate networks could get rate-limited by Debian mirrors, as many people, pipelines, etc. fetch huge amounts of packages from there.
> +
> +In such cases, a proxy caching the packages is quite useful, as it reduces download times and reduces pressure on Debian mirrors.
> +
> +## Install Squid Proxy
> +```
> +apt install squid
> +```
> +
> +## Configure Proxy for Caching (with APT in mind)
> +
> +1. /etc/squid/squid.conf
> +This file contains the main configuration for `squid`.
> +We configure it to listen on port `4242` and to cache all requests to sites listed in `/etc/squid/mirror-dstdomain.acl`. Further, to enable offline use cases (or use cases where your IP got temporarily blacklisted by `snapshot.debian.org` or similar), we set `offline_mode on`
> +so that already cached packages are not fetched again from upstream.
> +
> +> Note: While `offline_mode on` is totally fine for `snapshot.debian.org` when using a timestamp to pin your package archive version, it could cause unintended behaviour (most probably outdated packages) when used against a non-archive mirror.
> +
> +> Hint: If you are planning to work against non-archive mirrors and are not sure, it's recommended to set `offline_mode off` and possibly tweak the cache behaviour with a `refresh_pattern`.
> +
> +### /etc/squid/squid.conf:
> +```
> +# File: /etc/squid/squid.conf
> +
> +# default to a different port than stock squid
> +http_port 4242
> +
> +# user visible name
> +visible_hostname squid-apt-caching-proxy
> +
> +# do not fetch already cached packages from upstream
> +offline_mode on
> +
> +# allow large objects to be cached, some debs are huge
> +maximum_object_size 512 MB
> +
> +# increase available disk space for cache dir to 40G
> +cache_dir aufs /var/cache/squid 40000 16 256
> +
> +# logs
> +access_log /var/log/squid/access.log
> +cache_log /var/log/squid/cache.log
> +cache_store_log /var/log/squid/store.log
> +
> +# tweaks to speed things up
> +cache_mem 256 MB
> +maximum_object_size_in_memory 10240 KB
> +
> +# only allow ports we trust
> +acl Safe_ports port 80
> +acl Safe_ports port 443
> +
> +http_access deny !Safe_ports
> +
> +# Deny access to blacklisted sites
> +acl blockedpkgs urlpath_regex "/etc/squid/pkg-blacklist-regexp.acl"
> +http_access deny blockedpkgs
> +
> +# List of domains to cache
> +acl to_archive_mirrors dstdomain "/etc/squid/mirror-dstdomain.acl"
> +# don't cache domains not listed in the mirrors file
> +cache deny !to_archive_mirrors
> +
> +# Allow access to the proxy only from networks listed in allowed-networks-src.acl
> +acl allowed_networks src "/etc/squid/allowed-networks-src.acl"
> +http_access allow allowed_networks
> +
> +# And finally deny all other access to this proxy
> +http_access deny all
> +```
> +
> +### /etc/squid/mirror-dstdomain.acl:
> +```
> +# File: /etc/squid/mirror-dstdomain.acl
> +
> +snapshot.debian.org
> +```
> +
> +### /etc/squid/pkg-blacklist-regexp.acl:
> +```
> +# File: /etc/squid/pkg-blacklist-regexp.acl
> +# Empty for now
> +```
> +
> +### /etc/squid/allowed-networks-src.acl:
> +```
> +# File: /etc/squid/allowed-networks-src.acl
> +
> +# network sources that you want to allow access to the cache
> +
> +# private IPv4 networks (RFC 1918) and IPv4 loopback
> +10.0.0.0/8
> +172.16.0.0/12
> +192.168.0.0/16
> +127.0.0.1
> +
> +# IPv6 link-local and loopback addresses
> +fe80::/64
> +::1/128
> +
> +# IPv6 unique local addresses (RFC 4193)
> +fd00::/8
> +```
> +
> +Finally, restart squid to apply the configuration: `systemctl restart squid`
> +
> +## Use the Proxy in ISAR Build System
> +
> +To forward the proxy settings to apt inside the ISAR build system, just export `http_proxy`
> +as follows:
> +
> +```
> +export http_proxy=http://<proxy-server-ip>:4242
> +```
> +
> +> Hint: Consider also setting `https_proxy`.
> +
> +### Validation
> +
> +The first time you build your image, the cache will fetch all packages from upstream.
> +During that phase you will see log entries, like
> +
> +```
> +... TCP_MISS/200 1574478 GET http://snapshot.debian.org/file/7cfaf...
> +```
> +in `/var/log/squid/access.log`.
> +
> +From then on, for packages that are already cached, you should only see entries like
> +
> +```
> +... TCP_OFFLINE_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
> +... TCP_MEM_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
> +```
> +
> +> Note: When you add new packages to your image, these have to be fetched first, so you will encounter `TCP_MISS`es whenever you add packages you haven't fetched before. The same holds true when updating the snapshot timestamp (`ISAR_APT_SNAPSHOT_TIMESTAMP` or `ISAR_APT_SNAPSHOT_DATE`).
> +
> +> Hint: You can observe your cache misses using:
> +> ```
> +> tail -f /var/log/squid/access.log | grep -e TCP_MEM_HIT -e TCP_OFFLINE_HIT -v
> +> ```
> -- 
> 2.43.0
> 

-- 
Siemens AG, Technology
Linux Expert Center



Thread overview: 14+ messages
2024-09-27 19:06 alexander.heinisch via isar-users
2024-09-27 19:06 ` [PATCH 1/3] Added DISTRO_APT_SNAPSHOT_PREMIRROR_BASE to specify the base-url of the mirror used alexander.heinisch via isar-users
2024-10-01 15:18   ` 'Jan Kiszka' via isar-users
2024-09-27 19:06 ` [PATCH 2/3] Added Kconfig for cached snapshot mirror alexander.heinisch via isar-users
2024-09-27 19:06 ` [PATCH 3/3] Added doc to setup apt cache alexander.heinisch via isar-users
2024-10-08 20:12   ` 'Niedermayr, BENEDIKT' via isar-users
2024-10-01 13:47 ` [PATCH 0/3] Added support for apt caching 'MOESSBAUER, Felix' via isar-users
2024-10-08  5:20 ` Uladzimir Bely
2024-10-08  6:43   ` 'Heinisch, Alexander' via isar-users
2024-10-08 12:38     ` 'Jan Kiszka' via isar-users
2024-10-31 14:46       ` 'MOESSBAUER, Felix' via isar-users
2024-10-31 15:40         ` 'Heinisch, Alexander' via isar-users
2024-10-31 16:26           ` 'MOESSBAUER, Felix' via isar-users [this message]
2024-10-31 16:53             ` 'Heinisch, Alexander' via isar-users
