unispect/README.md
2023-01-09 12:43:51 +01:00

90 lines
3.2 KiB
Markdown

# unispect
Like `uniname` but worse.
Made because nix doesn't have uniname and I couldn't find any way to install it and also it kinda sucked so like
## Example
```
$ echo "Δ, Й, ק ,م, ๗, あ, 叶, 葉, and 말" | ./unispect
U+0394 <ce94> Lu [Greek and Coptic] | GREEK CAPITAL LETTER DELTA
U+002C <2c> Po [Basic Latin] | COMMA
U+0020 <20> Zs [Basic Latin] | SPACE
U+0419 <d099> Lu [Cyrillic] | CYRILLIC CAPITAL LETTER SHORT I
U+002C <2c> Po [Basic Latin] | COMMA
U+0020 <20> Zs [Basic Latin] | SPACE
U+05E7 <d7a7> Lo [Hebrew] | HEBREW LETTER QOF
U+0020 <20> Zs [Basic Latin] | SPACE
U+002C <2c> Po [Basic Latin] | COMMA
U+0645 <d985> Lo [Arabic] | ARABIC LETTER MEEM
U+002C <2c> Po [Basic Latin] | COMMA
U+0020 <20> Zs [Basic Latin] | SPACE
U+0E57 <e0b997> Nd [Thai] | THAI DIGIT SEVEN
U+002C <2c> Po [Basic Latin] | COMMA
U+0020 <20> Zs [Basic Latin] | SPACE
U+3042 <e38182> Lo [Hiragana] | HIRAGANA LETTER A
U+002C <2c> Po [Basic Latin] | COMMA
U+0020 <20> Zs [Basic Latin] | SPACE
U+53F6 <e58fb6> Lo [CJK Unified Ideographs] | CJK UNIFIED IDEOGRAPH-53F6
U+002C <2c> Po [Basic Latin] | COMMA
U+0020 <20> Zs [Basic Latin] | SPACE
U+8449 <e89189> Lo [CJK Unified Ideographs] | CJK UNIFIED IDEOGRAPH-8449
U+002C <2c> Po [Basic Latin] | COMMA
U+0020 <20> Zs [Basic Latin] | SPACE
U+0061 <61> Ll [Basic Latin] | LATIN SMALL LETTER A
U+006E <6e> Ll [Basic Latin] | LATIN SMALL LETTER N
U+0064 <64> Ll [Basic Latin] | LATIN SMALL LETTER D
U+0020 <20> Zs [Basic Latin] | SPACE
U+B9D0 <eba790> Lo [Hangul Syllables] | HANGUL SYLLABLE MAL
U+002E <2e> Po [Basic Latin] | FULL STOP
U+0020 <20> Zs [Basic Latin] | SPACE
```
## Usage
```
./unispect --help
Usage: unispect [OPTIONS]
Options:
-f, --format <FORMAT>
What attributes to include, in order. Key:
p: (Code)point (eg. U+e621)
e: UTF-8 Encoding (eg. <e621>)
a: Age/version added (eg. 1.1.0)
c: Category (eg. Ll)
b: Block (eg. [Basic Latin])
C: Case
i: ignorable case
c: cased
f: changes when casefolded
m: changes when casemapped
l: changes when lowercased
u: changes when uppercased
t: changes when titlecased
L: lowercase
U: uppercase
[default: pecb]
-q, --no-name
Do not display the name of the character
-t, --tight
Don't pad each attribute to align them
-h, --help
Print help information (use `-h` for a summary)
-V, --version
Print version information
```
## TODO:
* Handle invalid UTF-8?
* Print the characters themselves? (Spacing will be hard to get right, combining characters must be handled, etc.)