From 1777450d8239f2eb335197bba4ec17e930066b8c Mon Sep 17 00:00:00 2001
From: Joshua Levy
Date: Fri, 23 Feb 2018 16:28:09 -0800
Subject: [PATCH] Another couple uconv examples I find useful.

Always hard to remember and look up, so worth listing here.
---
 README.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 7e1824c..51b9ad6 100644
--- a/README.md
+++ b/README.md
@@ -278,9 +278,13 @@ mkdir empty && rsync -r --delete empty/ some-dir && rmdir some-dir
 
 - For binary diffs (delta compression), use `xdelta3`.
 
-- To convert text encodings, try `iconv`. Or `uconv` for more advanced use; it supports some advanced Unicode things. For example, this command lowercases and removes all accents (by expanding and dropping them):
+- To convert text encodings, try `iconv`. Or `uconv` for more advanced use; it supports some advanced Unicode things. For example:
 ```sh
-      uconv -f utf-8 -t utf-8 -x '::Any-Lower; ::Any-NFD; [:Nonspacing Mark:] >; ::Any-NFC; ' < input.txt > output.txt
+      # Displays hex codes or actual names of characters (useful for debugging):
+      uconv -f utf-8 -t utf-8 -x '::Any-Hex;' < input.txt
+      uconv -f utf-8 -t utf-8 -x '::Any-Name;' < input.txt
+      # Lowercase and removes all accents (by expanding and dropping them):
+      uconv -f utf-8 -t utf-8 -x '::Any-Lower; ::Any-NFD; [:Nonspacing Mark:] >; ::Any-NFC;' < input.txt > output.txt
 ```
 
 - To split files into pieces, see `split` (to split by size) and `csplit` (to split by a pattern).
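
For trying the added examples without preparing an `input.txt`, the sketch below feeds a short string through the same transliterator rules on stdin. It assumes `uconv` (shipped with ICU) is installed and that the terminal uses UTF-8. The `strip_accents` function name is hypothetical and not part of the patch.

```sh
# Hypothetical helper (not part of the patch): wraps the accent-stripping
# rules from the README so they can be reused in a pipeline.
strip_accents() {
  uconv -f utf-8 -t utf-8 -x '::Any-Lower; ::Any-NFD; [:Nonspacing Mark:] >; ::Any-NFC;'
}

# Lowercase, decompose (NFD), drop the combining marks, then recompose (NFC).
printf 'Crème Brûlée\n' | strip_accents   # prints: creme brulee

# Show the Unicode name of a single character (useful for debugging).
printf 'é' | uconv -f utf-8 -t utf-8 -x '::Any-Name;'
echo
```

Reading from stdin keeps the commands identical to the ones in the patch; only the `< input.txt` redirection is replaced by a pipe.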