Hints, Tips, and Tricks

This list is in no particular order; it's mostly here to help me remember how I did things. If you find it useful, please let me know, and I may add more of my secrets as I remember them.

Finding duplicate files

Problem: Duplicate files take up lots of space.

Solution: This (technically) one-liner script should work on any relatively modern flavor of Linux or Unix.


# fast find and remove duplicate files

# uniq has no option for a null field separator, so the separator has to be a
# character that could, in principle, appear in a file name; hex ff is very
# unlikely to occur but is not guaranteed to be absent
# find runs only once, and file size is the first elimination step, so far
# fewer md5sum calculations are needed; both save a lot of time

# 1.  find the sizes and names of all non-empty files (top-level dot files/directories and ./Mail are pruned)
# 2.  pad the sizes with leading zeros and use "$IFS" as a field separator (so uniq will work)
#     escape unusual characters in file name and path:
#     a. create associative array with each 8-bit character as an array index
#        i.  use character as value for characters that are safe unquoted ([0-9A-Za-z._~/-])
#        ii. use C style quoting ($'\xHH') as value for all other characters
#     b. splitting on an empty separator makes each character in name a separate array element
#     c. loop through each name array element and append the array value for that character
# 3.  reverse sort so largest files will be checksummed first and uniq will work
# 4.  eliminate files with unique sizes
# 5.  calculate the md5sum for each potentially duplicate file, reordering the
#     fields so the file size is first then md5sums (assuming that files with
#     the same md5sum will be the same size)
#     add tee to stderr between md5sum and cut to monitor progress; may not work if su'ed
# 6.  reverse sort so largest files will be at the beginning of the final script and uniq will work
# 7.  keep files with duplicate sizes and md5sums only
# 8.  format for a shell script to rm unwanted files (keep grouping for convenience)
#     with the file size and md5sum before the rm commands for reference
# 9.  write to rm-dupes.sh, using the current date/time in the file name

IFS=$'\xff' \
  && find . \( -not \( -path ./.\* -prune \
        -o -path ./Mail -prune \
      \) \) -type f -not -empty -printf "%s${IFS}%p\0" \
  | awk -F "$IFS" -v Q="'" 'BEGIN {RS="\0"
        for (i=0; i<=255; i++) {c = sprintf("%c", i)
          xa[c] = (c ~ /[0-9A-Za-z._~/-]/) ? c : sprintf("$%s\\\\x%02x%s", Q, i, Q)} }
       {fn = sprintf("%012d", $1) FS 
        fnl = split($2, fna, "")
        for (i=1; i<=fnl; i++) fn = fn xa[fna[i]]
        print fn}' \
  | sort -r \
  | uniq -w 12 -D \
  | while read FBYTES FNAME
      do echo "$FBYTES$IFS"$(eval 'md5sum '$FNAME|cut -b-32)"$IFS$FNAME"
    done \
  | sort -r \
  | uniq -w 45 --all-repeated=separate \
  | awk -F "$IFS" -v Q="'" 'BEGIN {b = 1
        print "#!/bin/bash" }
      /^$/ {b = 1}
      !/^$/ {if (b) printf "\n# %" Q "d # %s\n", $1, $2
        b = 0
        print "rm -fv " $3}' \
  > rm-dupes-$(date +%s).sh

# open the generated rm-dupes-*.sh in an editor and delete the rm commands for the files you want to keep, then run what is left
That's it!
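
To try the idea on a small scratch directory first, the size-then-checksum filtering can be sketched in a much shorter form. This is a simplified sketch, not a replacement for the script above: the function name find_dupes is made up for illustration, it assumes GNU find, uniq, and md5sum, and it assumes file names without whitespace or other special characters, so the escaping step is left out. It only lists duplicates and deletes nothing.

```shell
# find_dupes DIR: list files under DIR that share both size and md5sum.
# Assumption: file names contain no whitespace, so no escaping is needed.
find_dupes() {
  find "$1" -type f -not -empty -printf '%s %p\n' \
    | awk '{ printf "%012d %s\n", $1, $2 }' \
    | LC_ALL=C sort -r \
    | uniq -w 12 -D \
    | while read -r size name
      do
        # checksum only the files that survived the size filter
        printf '%s %s %s\n' "$size" "$(md5sum < "$name" | cut -b-32)" "$name"
      done \
    | LC_ALL=C sort -r \
    | uniq -w 45 --all-repeated=separate
}
```

Each output line holds the padded size, the md5sum, and the path, with a blank line between groups of identical files. The widths match the script above: 12 characters compare the size alone, and 45 (12 + 1 + 32) compare the size and md5sum together.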

Google Earth 7

Problem: Google Earth 7 doesn't run in Slackware64 because it requires LSB and has a bug which causes a crash when trying to access Persian fonts.

Solution: The following works in Slackware64-14.

  1. Google Earth doesn't come as a native Slackware package, so you have two choices: install the rpm directly with rpm --nodeps, or convert the rpm to a txz package and install that. The two-step route is a little easier; run it from the directory where the file was downloaded:
    rpm2txz google-earth-stable_current_x86_64.rpm
    installpkg google-earth-stable_current_x86_64.txz
  2. Create the following symlinks, and delete the Persian font config file:
    ln -sf /lib/ld-linux.so.2 /lib/ld-lsb.so.3
    ln -sf /lib64/ld-linux-x86-64.so.2 /lib64/ld-lsb-x86-64.so.3
    rm /etc/fonts/conf.d/65-fonts-persian.conf
That's it!
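
A quick way to confirm that the loader links from step 2 exist and resolve is sketched below. The helper name check_link is made up for illustration; it only reads the filesystem and changes nothing.

```shell
# check_link PATH: succeed if PATH is a symlink whose target exists.
check_link() {
  if [ -L "$1" ] && [ -e "$1" ]; then
    printf 'ok: %s -> %s\n' "$1" "$(readlink "$1")"
  else
    printf 'broken or missing: %s\n' "$1"
    return 1
  fi
}
```

After step 2, run it against both links:
    check_link /lib/ld-lsb.so.3
    check_link /lib64/ld-lsb-x86-64.so.3
The first will only report ok if /lib/ld-linux.so.2 is actually present on the system.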