public inbox for isar-users@googlegroups.com
From: "'Heinisch, Alexander' via isar-users" <isar-users@googlegroups.com>
To: "MOESSBAUER, Felix" <felix.moessbauer@siemens.com>,
	"ubely@ilbers.de" <ubely@ilbers.de>,
	"isar-users@googlegroups.com" <isar-users@googlegroups.com>,
	"Kiszka, Jan" <jan.kiszka@siemens.com>
Subject: RE: [PATCH 0/3] Added support for apt caching
Date: Thu, 31 Oct 2024 15:40:02 +0000
Message-ID: <AM7PR10MB33207841E236025B621B725486552@AM7PR10MB3320.EURPRD10.PROD.OUTLOOK.COM>
In-Reply-To: <8604a9552135790de2df1a7fc05c31bc07075259.camel@siemens.com>

> Hi, this series is much needed to work with the still unreliable snapshot mirrors.
> 
> @Alexander: Do you plan to send a v2?
> 
> At the same time I'm working on adding internal apt-cacher-ng support to kas to let the build pass the initial bootstrapping.
> 
> Best regards,
> Felix

Hi Felix

Thank you for coming back.

Even when using apt-cacher-ng, index files were often still refreshed from
snapshot.debian.org, which caused problems when our company ended up on a
blacklist again for some time.

Unfortunately, I didn't find the time to analyze why that was the case.
I did a tcpdump during one of our builds, but didn't analyze it for 2 weeks or so :-(

But I suspect that either the apt client sends a reload request or the expiry
date returned from upstream is too short.
While this could be relevant when fetching packages from "main" mirrors, 
it should not have much impact on snapshot mirrors.
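
One way to narrow down the expiry suspicion is to look at the caching headers
that upstream returns for an index file. A minimal sketch, assuming `curl` is
available; `show_cache_headers` is a hypothetical helper name, and the snapshot
timestamp and suite in the commented example are only placeholders:

```shell
# Hypothetical helper: keep only the caching-related headers from an
# HTTP header dump read on stdin (header names are matched case-insensitively).
show_cache_headers() {
    tr -d '\r' | grep -i -E '^(cache-control|expires|last-modified|etag|age):'
}

# Example call (needs network access; timestamp and suite are placeholders):
# curl -sI http://snapshot.debian.org/archive/debian/20241001T000000Z/dists/bookworm/InRelease \
#     | show_cache_headers
```

A short `max-age` or an early `Expires` on index files would support the expiry
theory. A reload request from the client would instead show up as a
`Cache-Control` request header with `no-cache` or `max-age=0`, for example in a
tcpdump capture.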

To mitigate that issue, we have since switched to squid as a proxy for snapshot.debian.org.
Squid has an offline mode, which means that, no matter what happens, cache entries once
seen are never revalidated against upstream. As stated above, while this could have drastic
impacts when using main mirrors, it shouldn't cause issues on snapshots, by definition.
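
Whether anything still goes upstream can be read off squid's access log. A
minimal sketch, assuming the native access.log format in which the fourth field
is `result_code/status`; `summarize_squid_log` is a hypothetical helper name:

```shell
# Hypothetical helper: count access.log entries per squid result code
# (TCP_MISS, TCP_OFFLINE_HIT, ...). Assumes field 4 is "code/status".
summarize_squid_log() {
    awk '{ split($4, rc, "/"); count[rc[1]]++ }
         END { for (code in count) print code, count[code] }' "$1" | sort
}

# Example call:
# summarize_squid_log /var/log/squid/access.log
```

With a warm cache and offline mode enabled, repeated builds should only add hit
entries to such a summary, and the number of misses should stay constant.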

Thus, I dropped apt-cacher-ng in our project in favour of squid.
I also prepared documentation for this setup, but while preparing the patch, I
was not sure whether it is worth a separate doc/ file or whether we should merge
it into doc/offline.md. I was struggling with that decision since it does not
really solve the offline case: it only caches packages already seen once, and
it only covers apt, not other sources like git, ...

What is your opinion?

BR Alexander

PS: I appended the patch I was referring to:

From cf64db474c2f2477633bfe3fd111156d2ac7495a Mon Sep 17 00:00:00 2001
From: Alexander Heinisch <alexander.heinisch@siemens.com>
Date: Thu, 24 Oct 2024 20:06:23 +0200
Subject: [PATCH] doc: Added setup guide for squid as an caching proxy for apt
 (snapshot) mirrors.

Signed-off-by: Alexander Heinisch <alexander.heinisch@siemens.com>
---
 doc/apt-caching-proxy.md | 142 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 142 insertions(+)
 create mode 100644 doc/apt-caching-proxy.md

diff --git a/doc/apt-caching-proxy.md b/doc/apt-caching-proxy.md
new file mode 100644
index 00000000..2a23a313
--- /dev/null
+++ b/doc/apt-caching-proxy.md
@@ -0,0 +1,142 @@
+# Setup Squid as APT Caching Proxy
+
+Limited download bandwidth is often an issue and increases build times drastically. Further, large corporate networks may get rate limited by Debian mirrors, as many people, pipelines, and so on fetch huge amounts of packages from there.
+
+In such cases a proxy that caches the packages is quite useful, as it reduces download times and takes load off the Debian mirrors.
+
+## Install Squid Proxy
+```
+apt install squid
+```
+
+## Configure Proxy for Caching (with APT in mind)
+
+1. /etc/squid/squid.conf
+This file contains the main configuration for `squid`.
+We configure it to listen on port `4242` and cache all requests to sites listed in `/etc/squid/mirror-dstdomain.acl`. Further, to enable offline use cases (or use cases where your IP got temporarily blacklisted by `snapshot.debian.org` or similar), we set `offline_mode on`
+so that already cached packages are not fetched from upstream again.
+
+> Note: While `offline_mode on` is totally fine for `snapshot.debian.org` when using a timestamp to pin your package archive version, it could cause unintended behaviour (most probably outdated packages) when used against a non-archive mirror.
+
+> Hint: If you plan to work against non-archive mirrors and are not sure, it is recommended to set `offline_mode off` and probably tweak the cache behaviour with a `refresh_pattern`.
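+
+A minimal sketch of such a setup for a regular (non-archive) mirror, in the style of the rules used by `squid-deb-proxy`. The retention times and regular expressions are assumptions for illustration only and should be adapted and tested against your mirror:
+```
+# File: /etc/squid/squid.conf (excerpt, used instead of `offline_mode on`)
+offline_mode off
+
+# refresh_pattern <regex> <min> <percent> <max> [options], min/max in minutes
+# package files with a given name and version do not change: keep them for 90 days
+refresh_pattern \.(deb|udeb)$ 129600 100% 129600
+
+# always revalidate index files so that package lists stay current
+refresh_pattern /(Packages|Sources)(|\.gz|\.bz2|\.xz)$ 0 0% 0 refresh-ims
+refresh_pattern /(InRelease|Release|Release\.gpg)$ 0 0% 0 refresh-ims
+```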
+
+### /etc/squid/squid.conf:
+```
+# File: /etc/squid/squid.conf
+
+# default to a different port than stock squid
+http_port 4242
+
+# user visible name
+visible_hostname squid-apt-caching-proxy
+
+# do not fetch already cached packages from upstream
+offline_mode on
+
+# we need a big cache, some debs are huge
+maximum_object_size 512 MB
+
+# increase available disk space for cache dir to 40G
+cache_dir aufs /var/cache/squid 40000 16 256
+
+# logs
+access_log /var/log/squid/access.log
+cache_log /var/log/squid/cache.log
+cache_store_log /var/log/squid/store.log
+
+# tweaks to speed things up
+cache_mem 256 MB
+maximum_object_size_in_memory 10240 KB
+
+# only allow ports we trust
+acl Safe_ports port 80
+acl Safe_ports port 443
+
+http_access deny !Safe_ports
+
+# Deny access to blacklisted packages (matched against the URL path)
+acl blockedpkgs urlpath_regex "/etc/squid/pkg-blacklist-regexp.acl"
+http_access deny blockedpkgs
+
+# List of domains to cache
+acl to_archive_mirrors dstdomain "/etc/squid/mirror-dstdomain.acl"
+# don't cache domains not listed in the mirrors file
+cache deny !to_archive_mirrors
+
+# Allow access to the proxy only from networks listed in allowed-networks-src.acl
+acl allowed_networks src "/etc/squid/allowed-networks-src.acl"
+http_access allow allowed_networks
+
+# And finally deny all other access to this proxy
+http_access deny all
+```
+
+### /etc/squid/mirror-dstdomain.acl:
+```
+# File: /etc/squid/mirror-dstdomain.acl
+
+snapshot.debian.org
+```
+
+### /etc/squid/pkg-blacklist-regexp.acl:
+```
+# File: /etc/squid/pkg-blacklist-regexp.acl
+# Empty for now
+```
+
+### /etc/squid/allowed-networks-src.acl:
+```
+# File: /etc/squid/allowed-networks-src.acl
+
+# network sources that you want to allow access to the cache
+
+# private networks
+10.0.0.0/8
+172.16.0.0/12
+192.168.0.0/16
+127.0.0.1
+
+# IPv6 link-local and loopback addresses
+fe80::/64
+::1/128
+
+# IPv6 unique local addresses (ULA)
+fd00::/8
+```
+
+Restart squid to apply the configuration: `systemctl restart squid`
+
+## Use the Proxy in ISAR Build System
+
+To forward the proxy settings to apt inside the ISAR build system just export `http_proxy`
+as follows:
+
+```
+export http_proxy=http://<proxy-server-ip>:4242
+```
+
+> Hint: Consider also setting `https_proxy`.
+
+### Validation
+
+The first time you build your image the cache will fetch all packages from upstream.
+During that phase you will see log entries, like
+
+```
+... TCP_MISS/200 1574478 GET http://snapshot.debian.org/file/7cfaf...
+```
+in `/var/log/squid/access.log`.
+
+From then on, requests for already cached packages only produce entries like
+
+```
+... TCP_OFFLINE_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
+... TCP_MEM_HIT/200 1574480 GET http://snapshot.debian.org/file/7cfaf...
+```
+
+> Note: When you add new packages to your image, these have to be fetched first, so you will encounter `TCP_MISS`es whenever you add packages you haven't fetched before. The same holds true when changing the snapshot timestamp (`ISAR_APT_SNAPSHOT_TIMESTAMP` or `ISAR_APT_SNAPSHOT_DATE`).
+
+> Hint: You can observe your cache misses using:
+> ```
+> tail -f /var/log/squid/access.log | grep -e TCP_MEM_HIT -e TCP_OFFLINE_HIT -v
+> ```
-- 
2.43.0
