A summary of open source projects that I recently worked on. Find more on sr.ht and GitHub.
If you have any questions about my projects, you can use my public inbox.
Natural language processing
- conllx-rs/conllx-utils: library and utilities for manipulating CoNLL-X data. Rust
- Dact: search tool for Alpino treebanks. C++
- dpar: transition-based dependency parser using neural nets. Rust
- finalfrontier: word embeddings with subword units. Rust
- finalfusion: word embedding package with support for subword units, quantization, and memory mapping. Rust
- sticker: part-of-speech tagger, topological field labeler, and dependency parser using bidirectional RNNs, dilated convolutions, or transformers. Rust
- go2vec: support for reading/using word embeddings in Go. Go
- Citar: Hidden Markov Model part-of-speech tagger. Go
- TinyEst: maximum entropy parameter estimator for rankers, with feature selection. C
- golinear: Go binding for liblinear. Go
- reductive: (optimized) product quantization. Rust
- ART: package and utilities for significance tests using (approximate) randomization.
- Dictomaton: Java library for dictionary automata, perfect hash automata, Levenshtein automata, and compact string mappings. Java
- Quzah: library for generating RGB color sets for categorical data. Maximizes perceptual distance using simulated annealing. Java