ModErn Text Analysis
META Enumerates Textual Applications
▼Nmeta | The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing |
►Nanalyzers | Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc.) |
►Nfilters | Contains filters that mutate existing token streams in a filter chain |
Calpha_filter | Filter that removes "non-letter" characters from tokens |
Cempty_sentence_filter | Filter that removes any empty sentences from the token stream |
Cenglish_normalizer | Filter that normalizes English language tokens |
Cicu_filter | Filter that applies an ICU transliteration to each token in the sequence |
Clength_filter | Filter that only retains tokens that are within a certain length range, inclusive |
Clist_filter | Filter that either removes or keeps tokens from a given list |
Clowercase_filter | Filter that converts all tokens to lowercase |
Cporter2_stemmer | Filter that stems words according to the porter2 stemmer algorithm |
Cptb_normalizer | A filter that normalizes text to match Penn Treebank conventions |
Csentence_boundary | Filter that adds sentence boundary tokens ("<s>" and "</s>") to streams of tokens |
►Ntokenizers | Contains tokenizers that start off a filter chain |
Ccharacter_tokenizer | Converts documents into streams of characters |
►Cicu_tokenizer | Converts documents into streams of tokens by following the Unicode standards for sentence and word segmentation |
Cimpl | Implementation class for the icu_tokenizer |
Cwhitespace_tokenizer | Converts documents into streams of whitespace delimited tokens |
►Canalyzer | A class that provides a framework to produce token counts from documents |
Canalyzer_exception | Basic exception for analyzer interactions |
Canalyzer_factory | Factory that is responsible for creating analyzers from configuration files |
Cbranch_featurizer | Tokenizes parse trees by extracting branching factor features |
Cdepth_featurizer | Tokenizes parse trees by extracting depth features |
Cfeaturizer_factory | Factory that is responsible for creating tree featurizers from configuration files |
Cfilter_factory | Factory that is responsible for creating filters during analyzer construction |
Clibsvm_analyzer | Libsvm_analyzer tokenizes documents that have been created from a line_corpus, where each line is in libsvm input format and stored in the document's content field |
Cmulti_analyzer | An analyzer that contains more than one analyzer |
Cngram_analyzer | Analyzes documents based on an ngram word model, where the value for n is supplied by the user |
Cngram_pos_analyzer | Analyzes documents based on part-of-speech tags instead of words |
Cngram_word_analyzer | Analyzes documents using their tokenized words |
Csemi_skeleton_featurizer | Tokenizes parse trees by keeping track of only a single node label and the underlying tree structure |
Cskeleton_featurizer | Tokenizes parse trees by only tokenizing the tree structure itself |
Csubtree_featurizer | Tokenizes parse trees by counting occurrences of subtrees in a document's parse tree |
Ctag_featurizer | Tokenizes parse trees by looking at labels of leaf and interior nodes |
►Ctoken_stream | Base class that represents a stream of tokens that have been extracted from a document |
Ctoken_stream_exception | Basic exception class for token stream interactions |
Ctree_analyzer | Base class tokenizing using parse tree features |
Ctree_featurizer | Base class for featurizers that convert trees into features in a document |
►Ncaching | Containers to be used for caching purposes |
Cdblru_cache | A double-barrel approach to an LRU cache |
Cgeneric_shard_cache | A simple sharding-based approach for increasing concurrency within a cache |
Clocking_map | A simple wrapper around a std::unordered_map that uses an internal mutex for synchronization safety |
Cno_evict_cache | An incredibly simple "cache" that simply keeps everything in memory |
►Csplay_cache | A splay_cache is a fixed-size splay tree for cache operations |
Cnode | One node in the splay tree contains pointers to children and the templated (key, value) pair |
Csplay_cache_exception | Basic exception for splay_cache interactions |
►Nclassify | Algorithms for feature selection, KNN search, and confusion matrices |
►Nkernel | Kernel functions for linear classifiers |
Cpolynomial | A polynomial kernel function for a linear classifier to adapt it to data that is not linearly separable |
Cradial_basis | A radial basis function kernel for linear classifiers to adapt them to data that is not linearly separable |
Csigmoid | A sigmoid kernel function for a linear classifier to adapt it to data that is not linearly separable |
►Nloss | Loss functions for SGD |
Chinge | The hinge loss for SGD algorithms |
Chuber | The Huber loss for SGD algorithms |
Cleast_squares | The least-squares loss function for SGD algorithms |
Clogistic | The logistic loss for SGD algorithms |
Closs_function | Base class for all loss functions that can be passed to the sgd classifier |
Closs_function_factory | Factory that is responsible for creating loss functions from strings |
Cmodified_huber | The modified Huber loss function for SGD algorithms |
Cperceptron | The perceptron loss function for SGD algorithms |
Csmooth_hinge | The smooth hinge loss function for SGD algorithms |
Csquared_hinge | The squared hinge loss function for SGD algorithms |
Cbinary_classifier | A classifier which classifies documents as "positive" or "negative" |
Cbinary_classifier_factory | Factory that is responsible for creating binary classifiers from configuration files |
Cclassifier | A classifier uses a document's feature space to identify which group it belongs to |
Cclassifier_factory | Factory that is responsible for creating classifiers from configuration files |
Cconfusion_matrix | Allows interpretation of classification errors |
Cdual_perceptron | Implements a perceptron classifier, but using the dual formulation of the problem |
►Cknn | Implements the k-Nearest Neighbor lazy learning classification algorithm |
Cknn_exception | Basic exception for knn interactions |
Clinear_model | A storage class for multiclass linear classifier models |
Clinear_model_exception | Exception thrown during interactions with linear_models |
Clogistic_regression | Multinomial logistic regression |
Cnaive_bayes | Implements the Naive Bayes classifier, a simplistic probabilistic classifier that uses Bayes' theorem with strong feature independence assumptions |
►Cnearest_centroid | Implements the nearest centroid classification algorithm |
Cnearest_centroid_exception | Basic exception for nearest_centroid interactions |
Cone_vs_all | Generalizes binary classifiers to operate over multiclass types using the one vs all method |
Cone_vs_one | Ensemble method adaptor for extending binary_classifiers to the multi-class classification case by using a one-vs-one strategy |
Csgd | Implements stochastic gradient descent for learning binary linear classifiers |
Csvm_wrapper | Wrapper class for the liblinear (http://www.csie.ntu.edu.tw/~cjlin/liblinear/) and libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) implementations of support vector machine classification |
Cwinnow | Implements the Winnow classifier, a simplistic linear classifier for linearly-separable data |
►Ncorpus | Various ways to convert corpus formats into META-readable documents |
►Ccorpus | Provides an interface for working with multiple corpus input formats |
Ccorpus_exception | Basic exception for corpus interactions |
Cdocument | Represents an indexable document |
Cfile_corpus | Creates document objects from individual files, each representing a single document |
Cgz_corpus | Fills document objects with content line-by-line from gzip-compressed input files |
Cline_corpus | Fills document objects with content line-by-line from an input file |
►Ngraph | Contains implementations of the graph data structure and algorithms that operate over them |
►Nalgorithms | |
Cgraph_algorithm_exception | Exception for errors in graph algorithms |
Cdefault_edge | |
Cdefault_node | |
►Cdirected_graph | A (currently) simple class to represent a directed graph in memory |
Cedge_iterator | |
Cnode_iterator | |
Cundirected_graph | |
Cdirected_graph_exception | Basic exception for directed_graph interactions |
Cgraph | |
Cgraph_exception | Basic exception for graph interactions |
►Cundirected_graph | A simple class to represent an undirected graph in memory |
Cdirected_graph | |
Cedge_iterator | |
Cnode_iterator | |
Cundirected_graph_exception | Basic exception for undirected_graph interactions |
►Nindex | Indexes to create efficient representations of data |
Cabsolute_discount | Implements the absolute discounting smoothing method |
Ccached_index | Decorator class for wrapping indexes with a cache |
Cchunk | Represents a portion of a disk_index's postings file |
►Cchunk_handler | An interface for writing and merging inverted chunks of postings_data for a disk_index |
Cchunk_handler_exception | Simple exception class for chunk_handler interactions |
Cproducer | The object that is fed postings_data by the index |
Cdirichlet_prior | Implements Bayesian smoothing with a Dirichlet prior |
►Cdisk_index | Holds generic data structures and functions that inverted_index and forward_index both use |
Cdisk_index_impl | The implementation of a disk_index |
►Cforward_index | The forward_index stores information on a corpus by doc_ids |
Cforward_index_exception | Basic exception for forward_index interactions |
Cimpl | Implementation of a forward_index |
►Cinverted_index | Stores information on a corpus indexed by term_ids |
Cimpl | Implementation of an inverted_index |
Cinverted_index_exception | Basic exception for inverted_index interactions |
►Cir_eval | Evaluates lists of ranked documents returned from a search engine; can give stats per-query (e.g |
Cir_eval_exception | Basic exception for ir_eval interactions |
Cjelinek_mercer | Implements the Jelinek-Mercer smoothed ranking model |
Clanguage_model_ranker | Scores documents according to one of three different smoothed language model scoring methods described in "A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval" by Zhai and Lafferty, 2001 |
Cokapi_bm25 | The Okapi BM25 scoring function |
Cpivoted_length | The pivoted document length normalization ranking function |
Cpostings_data | A class to represent the per-PrimaryKey data in an index's postings file |
Cranker | A ranker scores a query against all the documents in an inverted index, returning a list of documents sorted by relevance |
Cranker_factory | Factory that is responsible for creating rankers from configuration files |
Cscore_data | A score_data object contains information needed to evaluate a ranking function |
Cstring_list | A class designed for reading large lists of strings that have been persisted to disk |
Cstring_list_writer | A class for writing large lists of strings to disk with an associated index file for fast random access |
Cvocabulary_map | A read-only view of a B+-tree-like structure that stores the vocabulary for an index |
►Cvocabulary_map_writer | A class that writes the B+-tree-like data structure used for storing the term id mapping in an index |
Cvocabulary_map_writer_exception | An exception that can be thrown during the building of the tree |
►Nio | Compressed file readers and writers, configuration file readers, a simple parser, and memory-mapped file support |
►Nlibsvm_parser | Parser specifically for libsvm-formatted files |
Clibsvm_parser_exception | Exception class for this parser |
►Ccompressed_file_reader | Represents a file of unsigned integers compressed using gamma compression |
Ccompressed_file_reader_exception | Basic exception for compressed_file_reader interactions |
►Ccompressed_file_writer | Writes to a file of unsigned integers using gamma compression |
Ccompressed_file_writer_exception | Basic exception for compressed_file_writer interactions |
Cgzifstream | |
Cgzofstream | |
Cgzstreambuf | |
►Cmmap_file | Memory maps a text file read-only |
Cmmap_file_exception | Basic exception for mmap_file interactions |
Cparser | Parses a text file by reading it completely into memory, delimiting tokens by user request |
►Nlm | Contains implementations of statistical language models |
Clanguage_model | |
►Nlogging | Namespace which contains all of the logging interface classes |
►Clogger | Logger: Main logging class |
Clog_line | Log_line: Represents a single message to be written to all sinks |
Csink | Sink: A wrapper for a stream that a logger should write to |
►Nparallel | Implementation of a thread pool and a parallel for loop |
►Cthread_pool | Represents a collection of a fixed number of threads, which tasks can be added to |
Cconcrete_task | A concrete task is templated with a result type |
Ctask | A generic task object |
►Nparser | Contains functions that relate to phrase structure trees and parsing of natural language |
Cannotation_remover | A tree transformer that removes annotations (currently only Penn Treebank style) from trees |
►Cbinarizer | A tree transformer that converts any n-ary productions to binary productions using provided head annotations |
Cexception | |
Cconst_visitor | Abstract base class for visitors over parse trees that do not modify the underlying tree |
Cdebinarizer | A tree transformer that converts binarized trees back into n-ary trees |
Cempty_remover | A tree transformer that removes trace elements (like "-NONE-" in the Penn Treebank) as well as nodes with empty yields |
Cevalb | A re-implementation of (some of) the evalb metrics |
Chead_finder | A visitor that annotates the internal nodes of parse trees with their head constituents/lexicons |
Chead_rule | |
Cinternal_node | An internal node in a parse tree |
Cleaf_node | A leaf node (pre-terminal) in a parse tree |
Cleaf_node_finder | This is a visitor that finds all of the leaf nodes in a parse tree |
Cmulti_transformer | A template class for composing tree transformers |
Cnode | A single node in a parse tree for a sentence |
Cparse_tree | Represents the parse tree for a sentence |
Csequence_extractor | This is a visitor that converts a parse tree into a POS-tagged sequence |
►Csr_parser | A shift-reduce constituency parser |
Cexception | Exception thrown during parser actions |
Cstate_analyzer | Analyzer responsible for converting a parser state to a feature_vector |
Ctraining_batch | A training batch |
Ctraining_data | Training data for the parser |
Ctraining_options | Training options required for learning a parser model |
Cstate | Represents the current parser state of a shift-reduce parser |
►Ctransition | Represents a transition taken by the parser |
Cexception | Exception thrown during interactions with transitions |
►Ctransition_finder | This is a visitor that converts a parse tree into a list of transitions that a shift-reduce parser would have to take in order to generate it |
Cexception | |
►Ctransition_map | An invertible map that maps transitions to ids |
Cexception | Exception thrown from interactions with the transition_map |
Ctree_transformer | Abstract base class for tree transformers |
Cunary_chain_remover | Transforms trees by removing any unary X -> X rules |
Cvisitor | Abstract base class for visitors over parse trees that are allowed to modify the underlying tree |
►Nprinting | Contains functions that print to the terminal and provide progress bars |
Cprogress | Simple class for reporting progress of lengthy operations |
►Nsequence | Sequence representations and labeling models/algorithms |
►Ccrf | Linear-chain conditional random field for POS tagging and chunking applications |
Cparameters | Wrapper to represent the parameters used during learning |
Cscorer | Internal class that holds scoring information for sequences under the current model |
Ctagger | |
Cviterbi_scorer | Scorer for performing Viterbi-based tagging |
Cforward_trellis | Special trellis for the normalized forward algorithm |
►Cobservation | Represents an observation in a tagged sequence |
Cexception | Basic exception class for observation interactions |
►Cperceptron | A greedy averaged perceptron tagger |
Ctraining_options | Training options required for learning a tagger |
Csequence | Represents a tagged sequence of observations |
►Csequence_analyzer | Analyzer that operates over sequences, generating features based on a set of "observation functions" |
Cbasic_collector | Implementation-detail collector |
Ccollector | Interface class used for analyzing observations inside user-provided feature functions |
Cconst_collector | Const version of the collector |
Cdefault_collector | Non-const version of the collector |
Cexception | |
Ctrellis | Basic trellis for holding score data for the forward/backward algorithm |
Cviterbi_trellis | Special trellis for the Viterbi algorithm |
►Nstats | Probability distributions and other statistics functions |
►Cdirichlet | Represents a Dirichlet distribution |
Cparameters | |
Cmultinomial | Represents a multinomial/categorical distribution |
►Ntesting | Contains unit testing functions for the META toolkit |
Cannotation_checker | |
Cbinary_checker | |
Cfile_guard | Always makes sure a new file is created |
Cunit_test_exception | Exception class used to report errors in the unit test |
►Ntopics | Topic modeling functionality |
Clda_cvb | Lda_cvb: An implementation of LDA that uses collapsed variational Bayes for inference |
Clda_gibbs | An LDA topic model implemented using a collapsed Gibbs sampler |
Clda_model | An LDA topic model base class |
Clda_scvb | Lda_scvb: An implementation of LDA that uses stochastic collapsed variational Bayes for inference |
Cparallel_lda_gibbs | An LDA topic model implemented using the Approximate Distributed LDA algorithm |
►Nutf | Functions for converting to and from various character sets |
Cicu_handle | Internal class that ensures that ICU cleans up all of its "still-reachable" memory before program termination |
►Csegmenter | Class that encapsulates segmenting Unicode strings |
Cimpl | Implementation class for the segmenter |
Csegment | Represents a segment within a Unicode string |
►Ctransformer | Class that encapsulates transliteration of Unicode strings |
Cimpl | Implementation class for the transformer |
►Nutil | Shared resources and utilities |
Cbad_optional_access | Exception thrown when trying to obtain the value of a non-engaged optional |
►Cbasic_range | Implements a range that spans a loop's extension and termination conditions, most useful for iterating over a range of numbers with a range-based for loop |
Citerator_t | Iterator to traverse the generic range |
Ccomparable | A CRTP base class that allows for inheritance of all comparison operators, given that the derived class defines an operator<() |
Cdense_matrix | Simple wrapper class for representing a dense matrix laid out in row-major order (that is, its internal representation is a linear array of the rows) |
►Cdisk_vector | Disk_vector represents a large constant-size vector that does not necessarily fit in memory |
Cdisk_vector_exception | Basic exception for disk_vector |
Citerator | Provides iterator functionality for the disk_vector class |
►Cfactory | Generic factory that can be subclassed to create factories for specific types |
Cexception | Simple exception for factories |
Chash_wrapper | Helper class that allows the wrapped type to be hashed into standard library containers such as unordered_map or unordered_set |
Cidentifier | CRTP base template that denotes an identifier |
►Cinvertible_map | This data structure indexes by keys as well as values, allowing constant amortized lookup time by key or value |
Cinvertible_map_exception | Basic exception for invertible_map interactions |
CIterator | The invertible_map iterator is really just a wrapper for the forward (key -> value) unordered_map iterator |
Cmultilevel_clonable | Template class to facilitate polymorphic cloning |
►Cnullopt_t | A dummy type for representing a disengaged optional<T> |
Cinit | An empty object |
Cnumeric | Empty helper class to denote that something is numeric |
Cnumerical_identifier | A CRTP template base that adds numeric functionality to the identifier type |
Coptional | A class for representing optional values |
Coptional_dummy_t | A dummy type for optional storage |
Coptional_storage | A storage class for the optional<T> class |
►Cpersistent_stack | |
Cexception | |
Cnode | |
Cpimpl | Class to assist in simple pointer-to-implementation classes |
Csparse_vector | Represents a sparse vector, indexed by type Index and storing values of type Value |
Ctrivial_init_t | A tag for trivial initialization of optional storage |
▼Nstd | STL namespace |
Chash< meta::index::postings_data< PrimaryKey, SecondaryKey > > | Hash specialization for postings_data<PrimaryKey, SecondaryKey> |
Chash< meta::util::hash_wrapper< Wrapped > > | A partial specialization that allows for hashing of hash_wrapper types based on their base type |
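In the tokenizers and filters entries, a tokenizer starts a token stream and each filter then mutates it. The sketch below illustrates that idea. It assumes plain `std::vector<std::string>` streams in place of MeTA's `token_stream` interface. Every function name here (`whitespace_tokenize`, `lowercase`, `length_filter`, `add_sentence_boundaries`) is hypothetical and only loosely mirrors the behavior described for `whitespace_tokenizer`, `lowercase_filter`, `length_filter`, and `sentence_boundary`. The sketch also discards the whitespace itself.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a token stream; not MeTA's token_stream class.
using tokens = std::vector<std::string>;

// Tokenizer stage (cf. whitespace_tokenizer): split on runs of whitespace.
tokens whitespace_tokenize(const std::string& text) {
    std::istringstream in{text};
    tokens out;
    std::string tok;
    while (in >> tok)
        out.push_back(tok);
    return out;
}

// Filter stage (cf. lowercase_filter): lowercase every token in place.
tokens lowercase(tokens ts) {
    for (auto& t : ts)
        std::transform(t.begin(), t.end(), t.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
    return ts;
}

// Filter stage (cf. length_filter): keep tokens whose length lies in
// [min_len, max_len], inclusive on both ends.
tokens length_filter(const tokens& ts, std::size_t min_len, std::size_t max_len) {
    tokens out;
    for (const auto& t : ts)
        if (t.size() >= min_len && t.size() <= max_len)
            out.push_back(t);
    return out;
}

// Filter stage (cf. sentence_boundary): wrap the stream in "<s>" ... "</s>".
tokens add_sentence_boundaries(tokens ts) {
    ts.insert(ts.begin(), "<s>");
    ts.push_back("</s>");
    return ts;
}
```

Nesting the calls, as in `length_filter(lowercase(whitespace_tokenize(text)), 2, 5)`, feeds each stage's output into the next. This is the same ordering a filter chain imposes.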
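The `dblru_cache` entry names a double-barrel approach to an LRU cache. One common form of this technique keeps two maps:

- New and recently hit entries go into a primary map.
- When the primary map fills, it becomes the secondary map, and the old secondary is discarded wholesale.

The sketch below assumes that form. The class name `double_barrel_cache` and its methods are hypothetical, not MeTA's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <utility>

// Sketch of an LRU-approximating "double-barrel" cache (hypothetical class).
template <class Key, class Value>
class double_barrel_cache {
  public:
    explicit double_barrel_cache(std::size_t max_size) : max_size_{max_size} {
        assert(max_size_ > 0);
    }

    // Inserts into the primary map; rotates barrels once it is full.
    void insert(const Key& key, const Value& value) {
        primary_[key] = value;
        if (primary_.size() >= max_size_)
            rotate();
    }

    // Looks in primary, then secondary; a secondary hit is promoted back
    // into primary so recently used entries survive the next rotation.
    std::optional<Value> find(const Key& key) {
        if (auto it = primary_.find(key); it != primary_.end())
            return it->second;
        if (auto it = secondary_.find(key); it != secondary_.end()) {
            Value v = it->second; // copy before insert() may rotate the maps
            insert(key, v);
            return v;
        }
        return std::nullopt;
    }

  private:
    // Primary becomes secondary; the old secondary's entries are evicted.
    void rotate() {
        secondary_ = std::move(primary_);
        primary_.clear(); // a moved-from map is valid but unspecified
    }

    std::size_t max_size_;
    std::unordered_map<Key, Value> primary_;
    std::unordered_map<Key, Value> secondary_;
};
```

The design trades exact recency for cheap bookkeeping. Eviction happens in bulk at rotation time, so no per-access linked list has to be maintained.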
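The `sgd` and `loss::hinge` entries combine into stochastic gradient descent on the hinge loss max(0, 1 - y * w.x). The minimal sketch below works for labels in {-1, +1}. It makes these simplifying assumptions:

- feature vectors are dense;
- the learning rate is fixed;
- there is no regularization and no separate bias term (a constant feature stands in for the bias).

`sgd_hinge_epoch` and `predict` are hypothetical names, not MeTA's `sgd` API.

```cpp
#include <cstddef>
#include <vector>

// One pass of plain SGD on the hinge loss for labels y in {-1, +1}.
void sgd_hinge_epoch(std::vector<double>& w,
                     const std::vector<std::vector<double>>& xs,
                     const std::vector<int>& ys, double learning_rate) {
    for (std::size_t i = 0; i < xs.size(); ++i) {
        double margin = 0.0;
        for (std::size_t j = 0; j < w.size(); ++j)
            margin += w[j] * xs[i][j];
        margin *= ys[i];
        // The hinge subgradient is -y*x when the margin is below 1, else 0,
        // so only margin violations move the weights.
        if (margin < 1.0)
            for (std::size_t j = 0; j < w.size(); ++j)
                w[j] += learning_rate * ys[i] * xs[i][j];
    }
}

// Sign of the linear score; ties go to the positive class.
int predict(const std::vector<double>& w, const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j)
        s += w[j] * x[j];
    return s >= 0.0 ? 1 : -1;
}
```

Swapping in another entry from the `loss` namespace would change only the subgradient step inside the loop.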
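The `language_model_ranker` entry refers to three smoothing methods from Zhai and Lafferty (2001). These correspond to the `jelinek_mercer`, `dirichlet_prior`, and `absolute_discount` entries. The sketch below computes the smoothed document model p(w | d) under each method, taking raw counts as input. The function names are hypothetical. A query-likelihood ranker would sum the log of these probabilities over query terms rather than use them directly.

```cpp
#include <cmath>

// Inputs: tf = count of w in d, doc_len = |d|, p_coll = p(w | C).

// Dirichlet prior (Bayesian) smoothing with pseudo-count mu.
double dirichlet_prob(double tf, double doc_len, double p_coll, double mu) {
    return (tf + mu * p_coll) / (doc_len + mu);
}

// Jelinek-Mercer: linear interpolation, weight lambda on the collection model.
double jelinek_mercer_prob(double tf, double doc_len, double p_coll, double lambda) {
    return (1.0 - lambda) * (tf / doc_len) + lambda * p_coll;
}

// Absolute discounting: subtract delta from each seen count and give the
// freed mass, delta * unique_terms / |d|, to the collection model.
double absolute_discount_prob(double tf, double doc_len, double p_coll,
                              double delta, double unique_terms) {
    double seen = std::fmax(tf - delta, 0.0) / doc_len;
    return seen + delta * unique_terms / doc_len * p_coll;
}
```

Each method gives unseen terms nonzero probability while still summing to one over the vocabulary.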
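For the `okapi_bm25` entry, the sketch below computes a per-term BM25 contribution. It uses the Robertson-Sparck Jones IDF and the conventional defaults k1 = 1.2 and b = 0.75. Both the defaults and the omission of a query-term-frequency factor are assumptions; MeTA's ranker may weight terms differently. A document's score is the sum of this quantity over the query terms it contains.

```cpp
#include <cmath>
#include <cstddef>

// BM25 contribution of one query term to one document's score.
// k1 and b defaults are textbook values, assumed here.
double bm25_term_score(double tf, double doc_len, double avg_doc_len,
                       std::size_t num_docs, std::size_t doc_freq,
                       double k1 = 1.2, double b = 0.75) {
    const double n = static_cast<double>(num_docs);
    const double df = static_cast<double>(doc_freq);
    // Robertson-Sparck Jones IDF: negative for terms in over half the docs.
    const double idf = std::log((n - df + 0.5) / (df + 0.5));
    // Length normalization: longer-than-average documents are penalized.
    const double length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len);
    // Term frequency saturates: the ratio approaches (k1 + 1) as tf grows.
    return idf * (tf * (k1 + 1.0)) / (tf + length_norm);
}
```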
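The `compressed_file_reader` and `compressed_file_writer` entries use gamma compression for unsigned integers. The sketch below shows Elias gamma coding over an in-memory `std::vector<bool>` rather than a file; `gamma_encode` and `gamma_decode` are hypothetical names. The code is undefined for zero, so a stream that may contain zeros needs an offset such as n + 1 before encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the Elias gamma code of n (n >= 1): floor(log2 n) zero bits,
// then n in binary, most significant bit first.
void gamma_encode(std::uint64_t n, std::vector<bool>& bits) {
    assert(n >= 1); // gamma codes are defined only for positive integers
    int width = 0;
    while ((n >> width) > 1)
        ++width; // after the loop, width == floor(log2 n)
    for (int i = 0; i < width; ++i)
        bits.push_back(false);
    for (int i = width; i >= 0; --i)
        bits.push_back((n >> i) & 1u);
}

// Decode one gamma-coded integer starting at bits[pos]; advances pos past it.
std::uint64_t gamma_decode(const std::vector<bool>& bits, std::size_t& pos) {
    int zeros = 0;
    while (!bits[pos]) { // the zero run gives the number of extra bits
        ++zeros;
        ++pos;
    }
    std::uint64_t n = 0;
    for (int i = 0; i <= zeros; ++i)
        n = (n << 1) | (bits[pos++] ? 1u : 0u);
    return n;
}
```

A value n costs 2 * floor(log2 n) + 1 bits. This favors streams dominated by small numbers, such as gaps between sorted document ids.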
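The `viterbi_scorer` and `viterbi_trellis` entries refer to Viterbi decoding. The sketch below is a generic version over additive scores, such as log-probabilities or linear-model scores. `start`, `trans`, and `emit` are hypothetical inputs standing in for whatever scores a tagger's model produces. The trellis here is simply a matrix of best scores plus back-pointers, not MeTA's `viterbi_trellis` class.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using matrix = std::vector<std::vector<double>>;

// Highest-scoring tag sequence under additive scores.
// start[y]: score of starting in tag y; trans[a][b]: score of tag a -> b;
// emit[i][y]: score of tag y at position i.
std::vector<std::size_t> viterbi(const std::vector<double>& start,
                                 const matrix& trans, const matrix& emit) {
    const std::size_t n = emit.size(), k = start.size();
    if (n == 0)
        return {};
    matrix best(n, std::vector<double>(k));
    std::vector<std::vector<std::size_t>> back(n, std::vector<std::size_t>(k, 0));
    for (std::size_t y = 0; y < k; ++y)
        best[0][y] = start[y] + emit[0][y];
    // best[i][y] = max over prev of best[i-1][prev] + trans[prev][y] + emit[i][y]
    for (std::size_t i = 1; i < n; ++i)
        for (std::size_t y = 0; y < k; ++y) {
            best[i][y] = -std::numeric_limits<double>::infinity();
            for (std::size_t prev = 0; prev < k; ++prev) {
                double s = best[i - 1][prev] + trans[prev][y] + emit[i][y];
                if (s > best[i][y]) {
                    best[i][y] = s;
                    back[i][y] = prev;
                }
            }
        }
    // Choose the best final tag, then follow back-pointers to the start.
    std::vector<std::size_t> path(n);
    std::size_t last = 0;
    for (std::size_t y = 1; y < k; ++y)
        if (best[n - 1][y] > best[n - 1][last])
            last = y;
    path[n - 1] = last;
    for (std::size_t i = n - 1; i > 0; --i)
        path[i - 1] = back[i][path[i]];
    return path;
}
```

When transitions penalize switching tags, the dynamic program can accept a worse local choice to get a better whole-sequence score. A greedy per-position tagger cannot do this.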
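The `invertible_map` entry describes a structure indexed by both keys and values, with constant amortized lookup in each direction. The minimal sketch below keeps two `std::unordered_map`s as exact inverses. The class `bidirectional_map` and the `term_id_map` alias are hypothetical. Rejecting inserts that would break the one-to-one mapping is a design assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical two-way map built from a forward and a backward hash map.
template <class Key, class Value>
class bidirectional_map {
  public:
    // Inserts only if neither side is already present, so the two maps
    // stay exact inverses. Returns whether the pair was inserted.
    bool insert(const Key& key, const Value& value) {
        if (forward_.count(key) || backward_.count(value))
            return false;
        forward_.emplace(key, value);
        backward_.emplace(value, key);
        return true;
    }

    // Both lookups throw std::out_of_range (from unordered_map::at) on a miss.
    const Value& get_value(const Key& key) const { return forward_.at(key); }
    const Key& get_key(const Value& value) const { return backward_.at(value); }

    std::size_t size() const { return forward_.size(); }

  private:
    std::unordered_map<Key, Value> forward_;
    std::unordered_map<Value, Key> backward_;
};

// Illustrative use: mapping terms to numeric ids and back.
using term_id_map = bidirectional_map<std::string, std::size_t>;
```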
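For the `sparse_vector` entry, one common representation stores only nonzero entries as (index, value) pairs sorted by index. A dot product can then walk both vectors in a single merge pass. The sorted-pairs layout and the `sparse` and `sparse_dot` names are assumptions of this sketch, not a description of MeTA's class.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Nonzero entries as (index, value) pairs, assumed sorted by index.
using sparse = std::vector<std::pair<std::size_t, double>>;

// Merge-style dot product: O(nnz(a) + nnz(b)); only shared indices contribute.
double sparse_dot(const sparse& a, const sparse& b) {
    double sum = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) {
            ++i;
        } else if (b[j].first < a[i].first) {
            ++j;
        } else {
            sum += a[i].second * b[j].second;
            ++i;
            ++j;
        }
    }
    return sum;
}
```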