ModErn Text Analysis
META Enumerates Textual Applications
meta::index Namespace Reference

Indexes to create efficient representations of data.

Classes

class  absolute_discount
 Implements the absolute discounting smoothing method.
 
class  cached_index
 Decorator class for wrapping indexes with a cache.
 
class  chunk
 Represents a portion of a disk_index's postings file.
 
class  chunk_handler
 An interface for writing and merging inverted chunks of postings_data for a disk_index.
 
class  dirichlet_prior
 Implements Bayesian smoothing with a Dirichlet prior.
 
class  disk_index
 Holds generic data structures and functions that inverted_index and forward_index both use.
 
class  forward_index
 The forward_index stores information on a corpus by doc_ids.
 
class  inverted_index
 The inverted_index class stores information on a corpus indexed by term_ids.
 
class  ir_eval
 Evaluates lists of ranked documents returned from a search engine; can give statistics per query.
 
class  jelinek_mercer
 Implements the Jelinek-Mercer smoothed ranking model.
 
class  language_model_ranker
 Scores documents according to one of three different smoothed language model scoring methods described in "A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval" by Zhai and Lafferty, 2001.
 
class  okapi_bm25
 The Okapi BM25 scoring function.
 
class  pivoted_length
 The pivoted document length normalization ranking function.
 
class  postings_data
 A class to represent the per-PrimaryKey data in an index's postings file.
 
class  ranker
 A ranker scores a query against all the documents in an inverted index, returning a list of documents sorted by relevance.
 
class  ranker_factory
 Factory that is responsible for creating rankers from configuration files.
 
struct  score_data
 A score_data object contains information needed to evaluate a ranking function.
 
class  string_list
 A class designed for reading large lists of strings that have been persisted to disk.
 
class  string_list_writer
 A class for writing large lists of strings to disk with an associated index file for fast random access.
 
class  vocabulary_map
 A read-only view of a B+-tree-like structure that stores the vocabulary for an index.
 
class  vocabulary_map_writer
 A class that writes the B+-tree-like data structure used for storing the term id mapping in an index.
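The three smoothing methods listed above (jelinek_mercer, dirichlet_prior, absolute_discount) each estimate a smoothed term probability p(w | d) as described by Zhai and Lafferty (2001). The sketch below computes those estimates for a single term in a single document. The term_stats struct and the *_prob function names are hypothetical illustrations, not MeTA's score_data fields or ranker API; a real language_model_ranker would go on to combine such probabilities into a query score.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-term statistics for one term w in one document d
// (illustrative names, not MeTA's score_data fields).
struct term_stats {
    double count_in_doc;      // c(w, d)
    double doc_length;        // |d|, total term occurrences in d
    double doc_unique_terms;  // |d|_u, number of distinct terms in d
    double collection_prob;   // p(w | C), estimated from the whole collection
};

// Jelinek-Mercer: linearly interpolate the document model with the
// collection model using a fixed weight lambda.
double jelinek_mercer_prob(const term_stats& s, double lambda) {
    assert(lambda > 0.0 && lambda < 1.0);
    return (1.0 - lambda) * (s.count_in_doc / s.doc_length)
           + lambda * s.collection_prob;
}

// Dirichlet prior: the collection model contributes mu pseudo-counts,
// so longer documents are smoothed less.
double dirichlet_prior_prob(const term_stats& s, double mu) {
    assert(mu > 0.0);
    return (s.count_in_doc + mu * s.collection_prob) / (s.doc_length + mu);
}

// Absolute discounting: subtract delta from each seen count and give the
// freed probability mass, delta * |d|_u / |d|, to the collection model.
double absolute_discount_prob(const term_stats& s, double delta) {
    assert(delta > 0.0 && delta < 1.0);
    double seen = std::max(s.count_in_doc - delta, 0.0) / s.doc_length;
    double alpha = delta * s.doc_unique_terms / s.doc_length;
    return seen + alpha * s.collection_prob;
}
```

All three give a nonzero probability to a term that does not occur in the document (count_in_doc == 0), which is the point of smoothing.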
 

Typedefs

using dblru_inverted_index = cached_index< inverted_index, caching::default_dblru_cache >
 Inverted index using default DBLRU cache.
 
using splay_inverted_index = cached_index< inverted_index, caching::splay_cache >
 Inverted index using splay cache.
 
using memory_forward_index = cached_index< forward_index, caching::no_evict_cache >
 In-memory forward index.
 
using dblru_forward_index = cached_index< forward_index, caching::default_dblru_cache >
 Forward index using default DBLRU cache.
 
using splay_forward_index = cached_index< forward_index, caching::splay_cache >
 Forward index using splay cache.
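The typedefs above all follow one pattern: cached_index takes an index type and a cache class template, and places the cache in front of the index's lookups. The following is a minimal sketch of that decorator idea under stated assumptions; toy_index, toy_no_evict_cache, toy_cached_index, and their member functions are hypothetical stand-ins, not MeTA's classes or its caching API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an on-disk index; each lookup counts as a disk read.
class toy_index {
  public:
    using key_type = std::uint64_t;
    using value_type = std::string;

    value_type search_primary(key_type id) const {
        ++disk_reads;
        return "postings for " + std::to_string(id);
    }

    mutable int disk_reads = 0;
};

// Hypothetical cache that never evicts (in the spirit of an in-memory index).
template <class Key, class Value>
class toy_no_evict_cache {
  public:
    bool find(const Key& key, Value& out) const {
        auto it = map_.find(key);
        if (it == map_.end())
            return false;
        out = it->second;
        return true;
    }

    void insert(const Key& key, const Value& value) {
        assert(map_.count(key) == 0);  // only called after a miss
        map_.emplace(key, value);
    }

  private:
    std::unordered_map<Key, Value> map_;
};

// Decorator: behaves like Index, but consults Cache<Key, Value> before the disk.
template <class Index, template <class, class> class Cache>
class toy_cached_index : public Index {
  public:
    typename Index::value_type search_primary(typename Index::key_type id) const {
        typename Index::value_type result;
        if (cache_.find(id, result))
            return result;
        result = Index::search_primary(id);
        cache_.insert(id, result);
        return result;
    }

  private:
    mutable Cache<typename Index::key_type, typename Index::value_type> cache_;
};

// Same shape as the typedefs above, e.g. memory_forward_index.
using toy_memory_index = toy_cached_index<toy_index, toy_no_evict_cache>;
```

Taking the cache as a template template parameter lets a single decorator serve every cache policy. Note that this sketch's mutable cache is not thread-safe.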
 

Enumerations

enum  index_file {
  DOC_IDS_MAPPING = 0, DOC_IDS_MAPPING_INDEX, DOC_SIZES, DOC_LABELS,
  DOC_UNIQUETERMS, LABEL_IDS_MAPPING, POSTINGS, TERM_IDS_MAPPING,
  TERM_IDS_MAPPING_INVERSE
}
 Collection of all the files that comprise a disk_index.
 

Functions

template<class Index , class... Args>
std::shared_ptr< Index > make_index (const std::string &config_file, Args &&...args)
 Factory method for creating indexes.
 
template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr< cached_index< Index, Cache > > make_index (const std::string &config_file, Args &&...args)
 Factory method for creating indexes that are cached.
 
template<class PrimaryKey , class SecondaryKey >
io::compressed_file_reader & operator>> (io::compressed_file_reader &in, postings_data< PrimaryKey, SecondaryKey > &pd)
 Reads semi-compressed postings data from a compressed file.
 
template<>
io::compressed_file_reader & operator>> (io::compressed_file_reader &in, postings_data< std::string, doc_id > &pd)
 Reads semi-compressed postings data from a compressed file.
 
template<class PrimaryKey , class SecondaryKey >
bool operator== (const postings_data< PrimaryKey, SecondaryKey > &lhs, const postings_data< PrimaryKey, SecondaryKey > &rhs)
 
template<>
std::unique_ptr< ranker > make_ranker< absolute_discount > (const cpptoml::table &)
 Specialization of the factory method used to create absolute_discount rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< dirichlet_prior > (const cpptoml::table &)
 Specialization of the factory method used to create dirichlet_prior rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< jelinek_mercer > (const cpptoml::table &)
 Specialization of the factory method used to create jelinek_mercer rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< okapi_bm25 > (const cpptoml::table &)
 Specialization of the factory method used to create okapi_bm25 rankers.
 
template<>
std::unique_ptr< ranker > make_ranker< pivoted_length > (const cpptoml::table &)
 Specialization of the factory method used to create pivoted_length rankers.
 
std::unique_ptr< ranker > make_ranker (const cpptoml::table &)
 Convenience method for creating a ranker using the factory.
 
template<class Ranker >
void register_ranker ()
 Registration method for rankers.
 

Detailed Description

Indexes to create efficient representations of data.

Function Documentation

template<class Index , class... Args>
std::shared_ptr<Index> meta::index::make_index ( const std::string &  config_file,
Args &&...  args 
)

Factory method for creating indexes.

inverted_index and forward_index are friends of the make_index factory methods (both this one and the one that creates cached versions), which lets the factories construct them.

Usage:

auto idx = index::make_index<derived_index_type>(config_path);
Parameters
config_file  The path to the configuration file to be used to build the index
args  Any additional arguments to forward to the constructor for the chosen index type (usually none)
Returns
A properly initialized index
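A minimal sketch of the friend-plus-factory pattern described above, assuming the index constructor is private so that clients must go through the factory, and that extra arguments are perfectly forwarded to that constructor. toy_inverted_index, toy_make_index, and the extra num_threads constructor argument are hypothetical; the real make_index also builds or loads the index files named in the configuration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Forward declaration so the class below can befriend the factory.
template <class Index, class... Args>
std::shared_ptr<Index> toy_make_index(const std::string& config_file, Args&&... args);

// Hypothetical index type (not MeTA's inverted_index).
class toy_inverted_index {
  public:
    const std::string& config() const { return config_; }
    int num_threads() const { return num_threads_; }

  private:
    // Private: clients cannot construct an index directly.
    explicit toy_inverted_index(std::string config_file, int num_threads = 1)
        : config_{std::move(config_file)}, num_threads_{num_threads} {}

    // Every specialization of the factory is a friend.
    template <class Index, class... Args>
    friend std::shared_ptr<Index> toy_make_index(const std::string&, Args&&...);

    std::string config_;
    int num_threads_;
};

template <class Index, class... Args>
std::shared_ptr<Index> toy_make_index(const std::string& config_file, Args&&... args) {
    assert(!config_file.empty());
    // std::make_shared cannot reach a private constructor, so this friend
    // calls new itself and hands ownership to the shared_ptr.
    std::shared_ptr<Index> idx{new Index(config_file, std::forward<Args>(args)...)};
    // A real factory would now load or build the on-disk index files.
    return idx;
}
```

Usage mirrors the documented call: `auto idx = toy_make_index<toy_inverted_index>("config.toml");`.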
template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr<cached_index<Index, Cache> > meta::index::make_index ( const std::string &  config_file,
Args &&...  args 
)

Factory method for creating indexes that are cached.


Usage:

auto idx =
    index::make_index<derived_index_type, cache_type>(config_path, other, options);

Any arguments after config_path (other and options in the example above) are forwarded to the constructor of the chosen cache class.

Parameters
config_file  The path to the configuration file to be used to build the index
args  Any additional arguments to forward to the constructor of the chosen cache class
Returns
A properly initialized, and automatically cached, index.
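A sketch of how the cached overload can split its arguments, assuming the configuration path goes to the index and every remaining argument goes to the cache constructor. toy_index, toy_capacity_cache (with its capacity argument), toy_cached_index, and toy_make_index are hypothetical names, not MeTA's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base index, built from a configuration path.
class toy_index {
  public:
    explicit toy_index(const std::string& config_file) : config_{config_file} {}
    const std::string& config() const { return config_; }

  private:
    std::string config_;
};

// Hypothetical cache whose constructor takes a capacity.
template <class Key, class Value>
class toy_capacity_cache {
  public:
    explicit toy_capacity_cache(std::size_t capacity) : capacity_{capacity} {
        assert(capacity > 0);
    }
    std::size_t capacity() const { return capacity_; }

  private:
    std::size_t capacity_;
};

template <class Index, template <class, class> class Cache>
class toy_cached_index : public Index {
  public:
    // Key and value types are fixed here for brevity.
    using cache_type = Cache<std::size_t, std::string>;

    // The path goes to Index; everything else goes to the cache.
    template <class... CacheArgs>
    explicit toy_cached_index(const std::string& config_file, CacheArgs&&... cache_args)
        : Index(config_file), cache_(std::forward<CacheArgs>(cache_args)...) {}

    const cache_type& cache() const { return cache_; }

  private:
    cache_type cache_;
};

template <class Index, template <class, class> class Cache, class... Args>
std::shared_ptr<toy_cached_index<Index, Cache>>
toy_make_index(const std::string& config_file, Args&&... args) {
    return std::make_shared<toy_cached_index<Index, Cache>>(
        config_file, std::forward<Args>(args)...);
}
```

The cache member is initialized with parentheses rather than braces so that forwarded arguments may still undergo ordinary implicit conversions (for example, int to std::size_t) without a narrowing error.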
template<class PrimaryKey , class SecondaryKey >
io::compressed_file_reader & meta::index::operator>> ( io::compressed_file_reader &  in,
postings_data< PrimaryKey, SecondaryKey > &  pd 
)

Reads semi-compressed postings data from a compressed file.

Parameters
in  The stream to read from
pd  The postings data object to write the stream info to
Returns
the input stream
template<>
io::compressed_file_reader& meta::index::operator>> ( io::compressed_file_reader &  in,
postings_data< std::string, doc_id > &  pd 
)
inline

Reads semi-compressed postings data from a compressed file.

Parameters
in  The stream to read from
pd  The postings data object to write the stream info to
Returns
the input stream
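The semi-compressed on-disk format is not described here, so the following sketch uses an assumed plain-text record to show only the general shape of such an operator>> overload: read the primary key and the number of entries, decode gap-encoded secondary keys, and return the stream. toy_postings, parse_postings, and the record layout are all hypothetical, not MeTA's postings_data or compressed_file_reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified postings record (not MeTA's postings_data).
struct toy_postings {
    std::string primary_key;                                      // e.g., a term
    std::vector<std::pair<std::uint64_t, std::uint64_t>> counts;  // (doc id, count)
};

// Reads the assumed record "key n gap1 count1 ... gapn countn", where each
// doc id is stored as the difference from the previous id (gap encoding).
std::istream& operator>>(std::istream& in, toy_postings& pd) {
    std::size_t n = 0;
    if (!(in >> pd.primary_key >> n))
        return in;
    pd.counts.clear();
    std::uint64_t doc_id = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t gap = 0;
        std::uint64_t count = 0;
        if (!(in >> gap >> count))
            return in;  // failbit is set; the caller can check the stream
        assert(i == 0 || gap > 0);  // ids are strictly increasing
        doc_id += gap;
        pd.counts.emplace_back(doc_id, count);
    }
    return in;  // returning the stream allows chaining: in >> a >> b
}

// Convenience wrapper: parse a single record from a string.
inline toy_postings parse_postings(const std::string& text) {
    std::istringstream in{text};
    toy_postings pd;
    in >> pd;
    return pd;
}
```

Storing gaps instead of absolute ids keeps the stored numbers small, which is what makes postings lists compress well.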
template<class PrimaryKey , class SecondaryKey >
bool meta::index::operator== ( const postings_data< PrimaryKey, SecondaryKey > &  lhs,
const postings_data< PrimaryKey, SecondaryKey > &  rhs 
)
Parameters
lhs  The first postings_data
rhs  The postings_data to compare with
Returns
whether lhs has the same PrimaryKey as rhs
std::unique_ptr< ranker > meta::index::make_ranker ( const cpptoml::table &  config)

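Convenience method for creating a ranker using the factory.

The templated factory method make_ranker<Ranker>, which creates one specific ranker type, should be specialized if your ranker requires special construction behavior (e.g., reading parameters). The specializations for absolute_discount, dirichlet_prior, jelinek_mercer, okapi_bm25, and pivoted_length listed above are examples.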
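A sketch of the specialization pattern, under two assumptions: a std::map stands in for cpptoml::table, and toy_bm25 with parameters k1 and b stands in for a ranker that reads its configuration. The primary template default-constructs the ranker and ignores the configuration; the full specialization reads parameters and falls back to illustrative defaults. All toy_* names and get_or are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for cpptoml::table (assumption): a flat map from keys to numbers.
using toy_table = std::map<std::string, double>;

class toy_ranker {
  public:
    virtual ~toy_ranker() = default;
};

// Hypothetical ranker with no parameters.
class toy_constant_ranker : public toy_ranker {};

// Hypothetical ranker with two tunable parameters.
class toy_bm25 : public toy_ranker {
  public:
    toy_bm25(double k1, double b) : k1_{k1}, b_{b} {}
    double k1() const { return k1_; }
    double b() const { return b_; }

  private:
    double k1_;
    double b_;
};

// Returns config[key] if present, otherwise the fallback value.
inline double get_or(const toy_table& config, const std::string& key, double fallback) {
    auto it = config.find(key);
    return it == config.end() ? fallback : it->second;
}

// Primary template: default-construct the ranker, ignoring the configuration.
template <class Ranker>
std::unique_ptr<toy_ranker> toy_make_ranker(const toy_table&) {
    return std::make_unique<Ranker>();
}

// Full specialization for a ranker that must read its parameters.
template <>
inline std::unique_ptr<toy_ranker> toy_make_ranker<toy_bm25>(const toy_table& config) {
    double k1 = get_or(config, "k1", 1.2);  // illustrative defaults
    double b = get_or(config, "b", 0.75);
    assert(b >= 0.0 && b <= 1.0);
    return std::make_unique<toy_bm25>(k1, b);
}
```

Without the specialization, toy_make_ranker<toy_bm25> would fail to compile, because toy_bm25 has no default constructor; the specialization is what supplies the construction behavior.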

template<class Ranker >
void meta::index::register_ranker ( )

Registration method for rankers.

Clients should use this method to register any new rankers they write.
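A minimal sketch of how a registration method and a ranker factory can fit together, assuming the factory is a singleton that maps each ranker's static id string to a creation function, and that the configuration names the ranker under a "method" key. All toy_* names, my_ranker, and the "method" key are hypothetical; consult ranker_factory for the actual interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for cpptoml::table (assumption): string keys to string values.
using toy_table = std::map<std::string, std::string>;

class toy_ranker {
  public:
    virtual ~toy_ranker() = default;
    virtual std::string name() const = 0;
};

// Hypothetical singleton registry mapping a ranker id to a creation function.
class toy_ranker_factory {
  public:
    using creator = std::function<std::unique_ptr<toy_ranker>(const toy_table&)>;

    static toy_ranker_factory& get() {
        static toy_ranker_factory instance;
        return instance;
    }

    void add(const std::string& id, creator fn) {
        assert(fn != nullptr);
        if (!creators_.emplace(id, std::move(fn)).second)
            throw std::invalid_argument{"ranker already registered: " + id};
    }

    std::unique_ptr<toy_ranker> create(const std::string& id, const toy_table& config) const {
        auto it = creators_.find(id);
        if (it == creators_.end())
            throw std::invalid_argument{"unknown ranker: " + id};
        return it->second(config);
    }

  private:
    toy_ranker_factory() = default;
    std::map<std::string, creator> creators_;
};

// Per-type factory; specialize it when a ranker needs parameters.
template <class Ranker>
std::unique_ptr<toy_ranker> toy_make_ranker(const toy_table&) {
    return std::make_unique<Ranker>();
}

// Registration: Ranker must expose a static id naming it in configurations.
template <class Ranker>
void toy_register_ranker() {
    toy_ranker_factory::get().add(Ranker::id, &toy_make_ranker<Ranker>);
}

// Convenience: build whichever ranker the "method" key names.
inline std::unique_ptr<toy_ranker> toy_create_ranker(const toy_table& config) {
    return toy_ranker_factory::get().create(config.at("method"), config);
}

// A client-written ranker.
class my_ranker : public toy_ranker {
  public:
    static constexpr const char* id = "my-ranker";
    std::string name() const override { return id; }
};
```

A client calls `toy_register_ranker<my_ranker>();` once at startup; afterwards any configuration whose "method" is "my-ranker" produces a my_ranker. Registering the same id twice throws rather than silently replacing the earlier entry.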