public inbox for isar-users@googlegroups.com
From: Srinuvasan Arjunan <srinuvasanasv@gmail.com>
To: isar-users <isar-users@googlegroups.com>
Subject: Re: [PATCH] deb-dl-dir: remove excessive calls to dpkg-deb in debsrc_download
Date: Mon, 10 Mar 2025 04:06:55 -0700 (PDT)
Message-ID: <671d33d8-1a0f-402d-8b6a-f8c56d6c3e30n@googlegroups.com>
In-Reply-To: <20250305131142.2717692-1-cedric.hombourger@siemens.com>





On Wednesday, March 5, 2025 at 6:42:05 PM UTC+5:30 Cedric Hombourger wrote:

Several calls to dpkg-deb are made for each single .deb file found in 
downloads to parse individual fields. This approach is terribly slow 
when a large amount of .deb files are found. Use apt-ftparchive to 
produce an index of packages that were found and a simple awk script 
to produce a (sorted) list of source package names and their versions. 
Also avoid using sed to remove Epoch from the version when we are 
trying to determine the name of the .dsc file: we instead use a simple 
POSIX parameter expansion to remove everything up to the first colon 

Signed-off-by: Cedric Hombourger <cedric.h...@siemens.com> 
--- 
meta/classes/deb-dl-dir.bbclass | 62 +++++++++++++++++++-------------- 
1 file changed, 35 insertions(+), 27 deletions(-) 

diff --git a/meta/classes/deb-dl-dir.bbclass b/meta/classes/deb-dl-dir.bbclass
index 7ebd057e..53ce4538 100644 
--- a/meta/classes/deb-dl-dir.bbclass 
+++ b/meta/classes/deb-dl-dir.bbclass 
@@ -5,23 +5,6 @@ 

inherit repository 

-is_not_part_of_current_build() { 
- local package="$( dpkg-deb --show --showformat '${Package}' "${1}" )" 
- local arch="$( dpkg-deb --show --showformat '${Architecture}' "${1}" )" 
- local version="$( dpkg-deb --show --showformat '${Version}' "${1}" )" 
- # Since we are parsing all the debs in DEBDIR, we can to some extend 
- # try to eliminate some debs that are not part of the current multiconfig 
- # build using the below method. 
- local output="$( grep -xhs ".* status installed ${package}:${arch} ${version}" \
- "${IMAGE_ROOTFS}"/var/log/dpkg.log \ 
- "${SCHROOT_HOST_DIR}"/var/log/dpkg.log \ 
- "${SCHROOT_TARGET_DIR}"/var/log/dpkg.log \ 
- "${SCHROOT_HOST_DIR}"/tmp/dpkg_common.log \ 
- "${SCHROOT_TARGET_DIR}"/tmp/dpkg_common.log | head -1 )" 
- 
- [ -z "${output}" ] 
-} 
- 
debsrc_do_mounts() { 
sudo -s <<EOSUDO 
set -e 
@@ -54,16 +37,41 @@ debsrc_download() { 
( flock 9 
set -e 
printenv | grep -q BB_VERBOSE_LOGS && set -x 
- find "${rootfs}/var/cache/apt/archives/" -maxdepth 1 -type f -iname '*\.deb' | while read package; do
- is_not_part_of_current_build "${package}" && continue
- local src="$( dpkg-deb --show --showformat '${source:Package}' "${package}" )"
- local version="$( dpkg-deb --show --showformat '${source:Version}' "${package}" )"
- local dscname="$(echo ${src}_${version} | sed -e 's/_[0-9]\+:/_/')"
- local dscfile=$(find "${DEBSRCDIR}"/"${rootfs_distro}" -name "${dscname}.dsc")
- [ -n "$dscfile" ] && continue 
- 
- sudo -E chroot --userspec=$( id -u ):$( id -g ) ${rootfs} \ 
- sh -c ' mkdir -p "/deb-src/${1}/${2}" && cd "/deb-src/${1}/${2}" && apt-get -y --download-only --only-source source "$2"="$3" ' download-src "${rootfs_distro}" "${src}" "${version}"
+ 
+ # Use apt-ftparchive to scan all .deb files found in the download directory
+ # and produce an index that we can "parse" with awk. This is much faster 
+ # than parsing each .deb file individually using dpkg-deb. Lines from the 
+ # index we need are: 
+ # 
+ # Package: <binary-name> 
+ # Version: <binary-version> 
+ # Source: <source-name> (<source-version>) 
+ # 
+ # If Source is omitted, then <source-name>=<binary-name> and 
+ # if <source-version> is not specified then it is <binary-version>. 
+ # The awk script handles these optional fields. It looks for Size: as a 
+ # trigger to print the source,version tuple
+ 
+ apt-ftparchive --md5=no --sha1=no --sha256=no --sha512=no \ 
+ -a "${DISTRO_ARCH}" packages \


  Hi Cedric,

  I applied this patch to address my deb-src caching issue [1]. With it, I am
now able to download deb-src for the bootstrap and image-related packages.
  The only missing part is the imager_install-related packages; I am going to
send patches for those based on your patch.

  However, I found one issue with armhf base-apt builds in ISAR: the help2man
and texinfo deb-src packages are missing.
  The index is generated with apt-ftparchive --md5=no --sha1=no --sha256=no
--sha512=no -a "${DISTRO_ARCH}", and DISTRO_ARCH is armhf in this case.
  The help2man and texinfo packages are only available for the amd64 arch
(possibly because of the ISAR_CROSS_COMPILE configuration), not armhf.
  The index therefore does not contain those packages, so we are not able to
download their source packages.

   I would suggest removing the -a "${DISTRO_ARCH}" option; the final list is
deduplicated by sort -u anyway.
   I validated the build without the -a option and it works as expected.

   [1]: https://groups.google.com/g/isar-users/c/8QstIaudyts

 What are your thoughts?
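 To illustrate why dropping the arch filter should not introduce duplicates, here is a minimal sketch that feeds the awk script from the patch a hand-written index instead of real apt-ftparchive output. The index text, the libfoo1/foo package names and the versions are hypothetical, and the field order simply follows the comment in the patch (Package, Version, Source); real output may order the fields differently. It does not invoke apt-ftparchive.

```shell
#!/bin/sh
# Hypothetical index excerpt covering two architectures (assumed layout,
# following the field order described in the patch comment).
index='Package: libfoo1
Version: 1:2.0-1+b1
Architecture: armhf
Source: foo (1:2.0-1)
Size: 100

Package: libfoo1
Version: 1:2.0-1+b1
Architecture: amd64
Source: foo (1:2.0-1)
Size: 120

Package: help2man
Version: 1.49.3
Architecture: amd64
Size: 200
'

# Same awk script as in the patch: remember the binary name and version,
# override them when a Source: field is present, print on Size:.
sources=$(printf '%s\n' "$index" \
    | awk '/^Package:/ { s=$2; }
           /^Version:/ { v=$2; next }
           /^Source:/  { s=$2; if ($3 ~ /^\(/) v=substr($3, 2, length($3)-2) }
           /^Size:/    { print s, v }' \
    | sort -u)

# One line per unique "source version" pair.
printf '%s\n' "$sources"
```

 In this sample the two libfoo1 entries collapse into a single foo line, while the amd64-only help2man entry is kept.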


+ "${rootfs}/var/cache/apt/archives" \ 
+ | awk '/^Package:/ { s=$2; } 
+ /^Version:/ { v=$2; next } 
+ /^Source:/ { s=$2; if ($3 ~ /^\(/) v=substr($3, 2, length($3)-2) } 
+ /^Size:/ { print s, v}' \ 
+ | sort -u \ 
+ | while read src version; do 
+ # Name of the .dsc file does not include Epoch, remove it before checking 
+ # if sources were already downloaded. Avoid using sed here to reduce the 
+ # number of processes being spawned by this function: we assume that the 
+ # version is correctly formatted and simply strip everything up to the 
+ # first colon 
+ dscname="${src}_${version#*:}.dsc" 
+ [ -f "${DEBSRCDIR}"/"${rootfs_distro}"/"${src}"/"${dscname}" ] || { 
+ # use apt-get source to download sources in DEBSRCDIR 
+ sudo -E chroot --userspec=$( id -u ):$( id -g ) ${rootfs} \ 
+ sh -c ' mkdir -p "/deb-src/${1}/${2}" && cd "/deb-src/${1}/${2}" && apt-get -y --download-only --only-source source "$2"="$3" ' download-src "${rootfs_distro}" "${src}" "${version}"
+ } 
done 
) 9>"${DEBSRCDIR}/${rootfs_distro}.lock" 

-- 
2.39.5 
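
The epoch handling in the hunk above can be tried in isolation. This is a minimal sketch, not part of the patch: the dsc_name helper and the sample package names and versions are hypothetical and exist only for illustration. It relies on the POSIX ${var#pattern} expansion, which removes the shortest matching prefix and leaves the value untouched when the pattern does not match.

```shell
#!/bin/sh
# dsc_name SRC VERSION (hypothetical helper)
# Build the .dsc file name for a source package, dropping the epoch
# ("N:" prefix) from the version without spawning sed.
dsc_name() {
    printf '%s_%s.dsc\n' "$1" "${2#*:}"
}

dsc_name foo 1:2.0-1        # version with an epoch
dsc_name help2man 1.49.3    # version without an epoch
```

As the comment in the patch notes, this assumes the version is correctly formatted, so that any colon present belongs to the epoch.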


Thread overview: 9+ messages
2025-03-05 13:11 'Cedric Hombourger' via isar-users
2025-03-05 13:57 ` 'Jan Kiszka' via isar-users
2025-03-05 15:08   ` 'cedric.hombourger@siemens.com' via isar-users
2025-03-05 17:22 ` 'Niedermayr, BENEDIKT' via isar-users
2025-03-05 17:24   ` 'Niedermayr, BENEDIKT' via isar-users
2025-03-10 11:06 ` Srinuvasan Arjunan [this message]
2025-03-22  6:15 ` [PATCH v2 0/1] " 'Cedric Hombourger' via isar-users
2025-03-22  6:15   ` [PATCH v2 1/1] " 'Cedric Hombourger' via isar-users
2025-03-27 10:34   ` [PATCH v2 0/1] " Uladzimir Bely
