MeTA is a modern C++ data sciences toolkit featuring
- text tokenization, including deep semantic features like parse trees
- inverted and forward indexes with compression and various caching strategies
- a collection of ranking functions for searching the indexes
- topic models
- classification algorithms
- graph algorithms
- language models
- a conditional random field (CRF) implementation (POS tagging, shallow parsing)
- wrappers for liblinear and libsvm (including libsvm dataset parsers)
- UTF-8 support for analyzing text in various languages
- multithreaded algorithms
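
As a rough illustration of what an inverted index paired with a ranking function does, here is a minimal, self-contained C++ sketch. It is not MeTA's API: `TinyIndex`, `add_document`, and `search` are hypothetical names invented for this example. It tokenizes on whitespace only, stores postings uncompressed in memory, and scores with a plain TF-IDF sum rather than any of the toolkit's own ranking functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only: none of these names come from MeTA.
class TinyIndex {
public:
    using Hit = std::pair<std::size_t, double>;  // (document id, score)

    // Split the text on whitespace and record, for every term,
    // how often it occurs in this document. Returns the new document id.
    std::size_t add_document(const std::string& text) {
        const std::size_t doc_id = num_docs_++;
        std::istringstream in(text);
        std::string term;
        while (in >> term) {
            ++postings_[term][doc_id];
        }
        return doc_id;
    }

    // Score every document that contains at least one query term with a
    // simple TF-IDF sum, and return the hits sorted by descending score.
    std::vector<Hit> search(const std::string& query) const {
        std::map<std::size_t, double> scores;
        std::istringstream in(query);
        std::string term;
        while (in >> term) {
            const auto it = postings_.find(term);
            if (it == postings_.end()) {
                continue;  // term never indexed
            }
            const auto& plist = it->second;
            assert(!plist.empty());  // a stored term has at least one posting
            const double idf =
                std::log(1.0 + static_cast<double>(num_docs_) /
                                   static_cast<double>(plist.size()));
            for (const auto& posting : plist) {
                scores[posting.first] +=
                    static_cast<double>(posting.second) * idf;
            }
        }
        std::vector<Hit> ranked(scores.begin(), scores.end());
        std::stable_sort(ranked.begin(), ranked.end(),
                         [](const Hit& a, const Hit& b) {
                             return a.second > b.second;
                         });
        return ranked;
    }

private:
    // term -> (document id -> term frequency)
    std::map<std::string, std::map<std::size_t, std::size_t>> postings_;
    std::size_t num_docs_ = 0;
};
```

To try it, construct a `TinyIndex`, call `add_document` once per document, and pass a whitespace-separated query to `search`. Documents that share no term with the query do not appear in the result.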
Documentation
Doxygen documentation can be found here.
Project setup
See the setup guide for installation instructions.
Tutorials
We have walkthroughs for the following parts of MeTA:
Users
- The TIMAN Research Group from the UIUC Computer Science Department uses MeTA in their text mining research
- The Coursera course Text Retrieval and Search Engines uses MeTA in programming assignments available to thousands of students
- An upcoming textbook, Text Data Analysis and Management: A Practical Introduction to Text Mining and Information Retrieval, showcases the MeTA toolkit with exercises and demos
Contact us through our GitHub issues page if you’d like your application of MeTA listed on our site!