Replace idiomatic set ops with canonical ones

comm(1) is the canonical tool for set operations on sorted files.
The original operations were idiomatic but less efficient.
Each of them ran a full sort, on top of the preceding sort
required to make the records of each input unique.

All replacement operations run in linear time on sorted input.
pull/553/head
Diomidis Spinellis 2018-04-03 10:36:09 +03:00
parent 666c7fee18
commit 1843c1eddc
1 changed file with 4 additions and 4 deletions

README.md Normal file → Executable file

@@ -351,11 +351,11 @@ mkdir empty && rsync -r --delete empty/ some-dir && rmdir some-dir
 A few examples of piecing together commands:
-- It is remarkably helpful sometimes that you can do set intersection, union, and difference of text files via `sort`/`uniq`. Suppose `a` and `b` are text files that are already uniqued. This is fast, and works on files of arbitrary size, up to many gigabytes. (Sort is not limited by memory, though you may need to use the `-T` option if `/tmp` is on a small root partition.) See also the note about `LC_ALL` above and `sort`'s `-u` option (left out for clarity below).
+- It is remarkably helpful sometimes that you can do set intersection, union, and difference of text files via `sort`/`comm`. Suppose `a` and `b` are sorted text files. This is fast, and works on files of arbitrary size, up to many gigabytes. (Sort is not limited by memory, though you may need to use the `-T` option if `/tmp` is on a small root partition.) See also the note about `LC_ALL` above and `sort`'s `-u` option (left out for clarity below).
 ```sh
-      sort a b | uniq > c        # c is a union b
-      sort a b | uniq -d > c     # c is a intersect b
-      sort a b b | uniq -u > c   # c is set difference a - b
+      sort -mu a b > c   # c is a union b
+      comm -12 a b > c   # c is a intersect b
+      comm -23 a b > c   # c is set difference a - b
 ```
 - Use `grep . *` to quickly examine the contents of all files in a directory (so each line is paired with the filename), or `head -100 *` (so each file has a heading). This can be useful for directories filled with config settings like those in `/sys`, `/proc`, `/etc`.
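
As a rough check of the replacement commands in the hunk above, the following sketch runs them on two tiny sample files and compares the results with the older `sort`/`uniq` forms. The sample data, the scratch directory, and the output file names (`union`, `intersection`, `difference`) are assumptions made for illustration and are not part of the commit. The sketch also assumes both inputs are sorted and free of duplicates, and it sets `LC_ALL=C` so that `sort` and `comm` agree on ordering.

```sh
# Illustrative only: sample data and file names are hypothetical.
export LC_ALL=C                 # byte-wise collation for sort and comm
dir=$(mktemp -d)                # scratch directory for the sample files

# Both inputs are already sorted and contain no duplicate lines.
printf '%s\n' apple banana cherry > "$dir/a"
printf '%s\n' banana cherry date  > "$dir/b"

# New forms from the hunk: each makes a single pass over sorted input.
sort -mu "$dir/a" "$dir/b" > "$dir/union"          # merge, drop duplicates
comm -12 "$dir/a" "$dir/b" > "$dir/intersection"   # lines in both files
comm -23 "$dir/a" "$dir/b" > "$dir/difference"     # lines only in a

# Old forms, kept here only to confirm that the results match.
sort "$dir/a" "$dir/b" | uniq | cmp -s - "$dir/union" && echo "union matches"
sort "$dir/a" "$dir/b" | uniq -d | cmp -s - "$dir/intersection" && echo "intersection matches"
sort "$dir/a" "$dir/b" "$dir/b" | uniq -u | cmp -s - "$dir/difference" && echo "difference matches"
```

With this sample data, `union` holds `apple`, `banana`, `cherry`, `date`; `intersection` holds `banana` and `cherry`; and `difference` holds `apple`. If the inputs are not sorted, `comm` results are unreliable, so sorting each file first (for example with `sort -u`) is a reasonable precaution.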