ModErn Text Analysis
META Enumerates Textual Applications
Indexes to create efficient representations of data.
Classes | |
class | absolute_discount |
Implements the absolute discounting smoothing method. | |
class | cached_index |
Decorator class for wrapping indexes with a cache. | |
class | chunk |
Represents a portion of a disk_index's postings file. | |
class | chunk_handler |
An interface for writing and merging inverted chunks of postings_data for a disk_index. | |
class | dirichlet_prior |
Implements Bayesian smoothing with a Dirichlet prior. | |
class | disk_index |
Holds generic data structures and functions that inverted_index and forward_index both use. | |
class | forward_index |
The forward_index stores information on a corpus by doc_ids. | |
class | inverted_index |
The inverted_index class stores information on a corpus indexed by term_ids. | |
class | ir_eval |
Evaluates lists of ranked documents returned from a search engine; can give stats per-query (e.g. More... | |
class | jelinek_mercer |
Implements the Jelinek-Mercer smoothed ranking model. | |
class | language_model_ranker |
Scores documents according to one of three different smoothed language model scoring methods described in "A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval" by Zhai and Lafferty, 2001. | |
class | okapi_bm25 |
The Okapi BM25 scoring function. | |
class | pivoted_length |
The pivoted document length normalization ranking function. | |
class | postings_data |
A class to represent the per-PrimaryKey data in an index's postings file. | |
class | ranker |
A ranker scores a query against all the documents in an inverted index, returning a list of documents sorted by relevance. | |
class | ranker_factory |
Factory that is responsible for creating rankers from configuration files. | |
struct | score_data |
A score_data object contains information needed to evaluate a ranking function. | |
class | string_list |
A class designed for reading large lists of strings that have been persisted to disk. | |
class | string_list_writer |
A class for writing large lists of strings to disk with an associated index file for fast random access. | |
class | vocabulary_map |
A read-only view of a B+-tree-like structure that stores the vocabulary for an index. | |
class | vocabulary_map_writer |
A class that writes the B+-tree-like data structure used for storing the term id mapping in an index. | |
Typedefs | |
using | dblru_inverted_index = cached_index< inverted_index, caching::default_dblru_cache > |
Inverted index using default DBLRU cache. | |
using | splay_inverted_index = cached_index< inverted_index, caching::splay_cache > |
Inverted index using splay cache. | |
using | memory_forward_index = cached_index< forward_index, caching::no_evict_cache > |
In-memory forward index. | |
using | dblru_forward_index = cached_index< forward_index, caching::default_dblru_cache > |
Forward index using default DBLRU cache. | |
using | splay_forward_index = cached_index< forward_index, caching::splay_cache > |
Forward index using splay cache. | |
Enumerations | |
enum | index_file { DOC_IDS_MAPPING = 0, DOC_IDS_MAPPING_INDEX, DOC_SIZES, DOC_LABELS, DOC_UNIQUETERMS, LABEL_IDS_MAPPING, POSTINGS, TERM_IDS_MAPPING, TERM_IDS_MAPPING_INVERSE } |
Collection of all the files that comprise a disk_index. | |
Functions | |
template<class Index , class... Args> | |
std::shared_ptr< Index > | make_index (const std::string &config_file, Args &&...args) |
Factory method for creating indexes. | |
template<class Index , template< class, class > class Cache, class... Args> | |
std::shared_ptr< cached_index< Index, Cache > > | make_index (const std::string &config_file, Args &&...args) |
Factory method for creating indexes that are cached. | |
template<class PrimaryKey , class SecondaryKey > | |
io::compressed_file_reader & | operator>> (io::compressed_file_reader &in, postings_data< PrimaryKey, SecondaryKey > &pd) |
Reads semi-compressed postings data from a compressed file. | |
template<> | |
io::compressed_file_reader & | operator>> (io::compressed_file_reader &in, postings_data< std::string, doc_id > &pd) |
Reads semi-compressed postings data from a compressed file. | |
template<class PrimaryKey , class SecondaryKey > | |
bool | operator== (const postings_data< PrimaryKey, SecondaryKey > &lhs, const postings_data< PrimaryKey, SecondaryKey > &rhs) |
template<> | |
std::unique_ptr< ranker > | make_ranker< absolute_discount > (const cpptoml::table &) |
Specialization of the factory method used to create absolute_discount rankers. | |
template<> | |
std::unique_ptr< ranker > | make_ranker< dirichlet_prior > (const cpptoml::table &) |
Specialization of the factory method used to create dirichlet_prior rankers. | |
template<> | |
std::unique_ptr< ranker > | make_ranker< jelinek_mercer > (const cpptoml::table &) |
Specialization of the factory method used to create jelinek_mercer rankers. | |
template<> | |
std::unique_ptr< ranker > | make_ranker< okapi_bm25 > (const cpptoml::table &) |
Specialization of the factory method used to create okapi_bm25 rankers. | |
template<> | |
std::unique_ptr< ranker > | make_ranker< pivoted_length > (const cpptoml::table &) |
Specialization of the factory method used to create pivoted_length rankers. | |
std::unique_ptr< ranker > | make_ranker (const cpptoml::table &) |
Convenience method for creating a ranker using the factory. | |
template<class Ranker > | |
void | register_ranker () |
Registration method for rankers. | |
template <class Index, class... Args>
std::shared_ptr<Index> meta::index::make_index(const std::string& config_file, Args&&... args)
Factory method for creating indexes.
inverted_index is a friend of the factory method used to create cached versions of it.
forward_index is a friend of the factory method used to create cached versions of it.
forward_index is a friend of the factory method used to create it.
Usage:
config_file | The path to the configuration file to be used to build the index |
args | Any additional arguments to forward to the constructor for the chosen index type (usually none) |
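The forwarding behaviour described by the parameters above can be illustrated with a minimal, self-contained sketch. It does not use MeTA itself: `toy_index`, `make_toy_index`, `demo_make_toy_index`, and the file name `config.toml` are hypothetical stand-ins chosen for illustration, and the real factory may do additional work (such as loading or building the index) that is not shown here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an index type; not part of MeTA.
class toy_index
{
  public:
    explicit toy_index(std::string config_file, int extra = 0)
        : config_file_(std::move(config_file)), extra_(extra)
    {
    }

    const std::string& config_file() const { return config_file_; }
    int extra() const { return extra_; }

  private:
    std::string config_file_;
    int extra_;
};

// Sketch of a factory with the same shape as make_index: the config path
// comes first, and any remaining arguments are perfectly forwarded to the
// constructor of the chosen index type.
template <class Index, class... Args>
std::shared_ptr<Index> make_toy_index(const std::string& config_file,
                                      Args&&... args)
{
    return std::make_shared<Index>(config_file, std::forward<Args>(args)...);
}

// Exercises the factory with and without extra constructor arguments and
// returns the forwarded extra value.
inline int demo_make_toy_index()
{
    auto plain = make_toy_index<toy_index>("config.toml");
    assert(plain->config_file() == "config.toml");
    assert(plain->extra() == 0);

    auto with_extra = make_toy_index<toy_index>("config.toml", 7);
    return with_extra->extra();
}
```

With the library itself, the call has the same shape as the signature above: the index type is the template argument and the configuration path is the first function argument.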
template <class Index, template <class, class> class Cache, class... Args>
std::shared_ptr<cached_index<Index, Cache>> meta::index::make_index(const std::string& config_file, Args&&... args)
Factory method for creating indexes that are cached.
inverted_index is a friend of the factory method used to create cached versions of it.
forward_index is a friend of the factory method used to create cached versions of it.
forward_index is a friend of the factory method used to create it.
Usage:
Other options will be forwarded to the constructor for the chosen cache class.
config_file | The path to the configuration file to be used to build the index |
args | Any additional arguments to forward to the constructor for the chosen cache class |
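As a rough illustration of how a cached factory of this shape can route its extra arguments to the cache rather than to the index, consider the sketch below. Everything in it is a hypothetical stand-in (`toy_index`, `toy_cache`, `toy_cached_index`, `make_toy_index`, `demo_cached_reads`, `config.toml`); it assumes the decorator derives from the index type and owns a cache instantiated from a two-parameter class template, which may differ from the library's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical index type; not part of MeTA.
class toy_index
{
  public:
    explicit toy_index(std::string config_file)
        : config_file_(std::move(config_file))
    {
    }

    // Pretend lookup that counts how often the "disk" is touched.
    int lookup(int key)
    {
        ++disk_reads_;
        return key * 2;
    }

    int disk_reads() const { return disk_reads_; }

  private:
    std::string config_file_;
    int disk_reads_ = 0;
};

// Hypothetical cache with the <Key, Value> shape the factory expects.
template <class Key, class Value>
class toy_cache
{
  public:
    explicit toy_cache(std::size_t max_size = 16) : max_size_(max_size) {}

    const Value* find(const Key& key) const
    {
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second;
    }

    void insert(const Key& key, const Value& value)
    {
        if (map_.size() < max_size_)
            map_.emplace(key, value);
    }

  private:
    std::size_t max_size_;
    std::unordered_map<Key, Value> map_;
};

// Decorator: derives from the index and consults the cache first.
// The config path goes to the index; all other arguments go to the cache.
template <class Index, template <class, class> class Cache>
class toy_cached_index : public Index
{
  public:
    template <class... Args>
    toy_cached_index(const std::string& config_file, Args&&... args)
        : Index(config_file), cache_(std::forward<Args>(args)...)
    {
    }

    int lookup(int key)
    {
        if (const int* hit = cache_.find(key))
            return *hit;
        int value = Index::lookup(key);
        cache_.insert(key, value);
        return value;
    }

  private:
    Cache<int, int> cache_;
};

template <class Index, template <class, class> class Cache, class... Args>
std::shared_ptr<toy_cached_index<Index, Cache>>
make_toy_index(const std::string& config_file, Args&&... args)
{
    return std::make_shared<toy_cached_index<Index, Cache>>(
        config_file, std::forward<Args>(args)...);
}

// Looks up the same key twice; only the first call should reach the index.
inline int demo_cached_reads()
{
    auto idx = make_toy_index<toy_index, toy_cache>("config.toml", 4);
    int first = idx->lookup(21);
    int second = idx->lookup(21);
    assert(first == 42 && second == 42);
    return idx->disk_reads();
}
```

In this sketch the `4` passed to the factory becomes the cache capacity, mirroring the statement that other options are forwarded to the constructor of the chosen cache class.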
template <class PrimaryKey, class SecondaryKey>
io::compressed_file_reader& meta::index::operator>>(io::compressed_file_reader& in, postings_data<PrimaryKey, SecondaryKey>& pd)
Reads semi-compressed postings data from a compressed file.
in | The stream to read from |
pd | The postings data object to write the stream info to |
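template <>
io::compressed_file_reader& meta::index::operator>>(io::compressed_file_reader& in, postings_data<std::string, doc_id>& pd) [inline]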
Reads semi-compressed postings data from a compressed file.
in | The stream to read from |
pd | The postings data object to write the stream info to |
template <class PrimaryKey, class SecondaryKey>
bool meta::index::operator==(const postings_data<PrimaryKey, SecondaryKey>& lhs, const postings_data<PrimaryKey, SecondaryKey>& rhs)
lhs | The first postings_data |
rhs | The postings_data to compare with |
std::unique_ptr<ranker> meta::index::make_ranker(const cpptoml::table& config)
Convenience method for creating a ranker using the factory.
Factory method for creating a ranker.
This should be specialized if your given ranker requires special construction behavior (e.g., reading parameters).
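A minimal sketch of what "specializing the factory method" can look like is given below. It is not MeTA code: `toy_config` stands in for `cpptoml::table`, and `toy_ranker`, `plain_ranker`, `smoothed_ranker`, `make_toy_ranker`, `demo_lambda`, the `"lambda"` key, and the fallback value `0.7` are all hypothetical names and values picked for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for cpptoml::table: a flat key/value map.
using toy_config = std::map<std::string, double>;

struct toy_ranker
{
    virtual ~toy_ranker() = default;
    virtual double parameter() const = 0;
};

// A ranker with no tunable parameters; the generic factory suffices.
struct plain_ranker : toy_ranker
{
    double parameter() const override { return 0.0; }
};

// A ranker whose constructor needs a value read from the configuration.
struct smoothed_ranker : toy_ranker
{
    explicit smoothed_ranker(double lambda) : lambda_(lambda) {}
    double parameter() const override { return lambda_; }
    double lambda_;
};

// Generic factory: default-constructs the ranker and ignores the config.
template <class Ranker>
std::unique_ptr<toy_ranker> make_toy_ranker(const toy_config&)
{
    return std::make_unique<Ranker>();
}

// Specialization: reads "lambda" from the config, falling back to an
// arbitrary default when the key is absent.
template <>
inline std::unique_ptr<toy_ranker>
make_toy_ranker<smoothed_ranker>(const toy_config& config)
{
    auto it = config.find("lambda");
    double lambda = it != config.end() ? it->second : 0.7;
    return std::make_unique<smoothed_ranker>(lambda);
}

// Builds both rankers from one config and returns the parameter that the
// specialization read.
inline double demo_lambda()
{
    toy_config config{{"lambda", 0.25}};
    auto plain = make_toy_ranker<plain_ranker>(config);
    auto smoothed = make_toy_ranker<smoothed_ranker>(config);
    assert(plain->parameter() == 0.0);
    return smoothed->parameter();
}
```

The generic template covers rankers that can be default-constructed; only rankers that need values from the configuration require their own specialization.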
template <class Ranker>
void meta::index::register_ranker()
Registration method for rankers.
Clients should use this method to register any new rankers they write.
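One common way to implement this kind of registration is a singleton factory that maps a string identifier to a creation function. The sketch below illustrates that general pattern only; `toy_ranker`, `toy_ranker_factory`, `register_toy_ranker`, `my_ranker`, `demo_registered_name`, and the assumption that each ranker exposes a static `id` string are hypothetical and are not taken from the library's implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct toy_ranker
{
    virtual ~toy_ranker() = default;
    virtual std::string name() const = 0;
};

// Hypothetical registry mapping a method name to a creation function.
class toy_ranker_factory
{
  public:
    using creator = std::function<std::unique_ptr<toy_ranker>()>;

    static toy_ranker_factory& get()
    {
        static toy_ranker_factory instance;
        return instance;
    }

    // Rejects duplicate identifiers so a ranker is registered only once.
    void add(const std::string& id, creator fn)
    {
        if (!creators_.emplace(id, std::move(fn)).second)
            throw std::runtime_error{"ranker already registered: " + id};
    }

    std::unique_ptr<toy_ranker> create(const std::string& id) const
    {
        auto it = creators_.find(id);
        if (it == creators_.end())
            throw std::runtime_error{"unknown ranker: " + id};
        return it->second();
    }

  private:
    std::map<std::string, creator> creators_;
};

// Registration helper: assumes each ranker exposes a static `id` string.
template <class Ranker>
void register_toy_ranker()
{
    toy_ranker_factory::get().add(Ranker::id, [] {
        return std::unique_ptr<toy_ranker>(new Ranker());
    });
}

// A client-defined ranker that opts in to the registry.
struct my_ranker : toy_ranker
{
    static const std::string id;
    std::string name() const override { return id; }
};
const std::string my_ranker::id = "my-ranker";

// Registers the ranker once, then creates it by name through the factory.
inline std::string demo_registered_name()
{
    register_toy_ranker<my_ranker>();
    auto ranker = toy_ranker_factory::get().create("my-ranker");
    assert(ranker != nullptr);
    return ranker->name();
}
```

In this sketch, registering the same identifier twice throws, so registration would typically happen once at program start-up before any ranker is created by name.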