MeTA is a modern C++ data sciences toolkit featuring:

  • text tokenization, including deep semantic features like parse trees
  • inverted and forward indexes with compression and various caching strategies
  • a collection of ranking functions for searching the indexes
  • topic models
  • classification algorithms
  • graph algorithms
  • language models
  • CRF implementation (POS-tagging, shallow parsing)
  • wrappers for liblinear and libsvm (including libsvm dataset parsers)
  • UTF-8 support for analysis of text in various languages
  • multithreaded algorithms


Doxygen documentation can be found here.

Project setup

See the setup guide for installation instructions.


We have walkthroughs for the following parts of MeTA:


Contact us through our GitHub issues page if you’d like your application of MeTA featured on our site!