ModErn Text Analysis
META Enumerates Textual Applications
meta::index::inverted_index Class Reference

The inverted_index class stores information on a corpus indexed by term_ids.

#include <inverted_index.h>

Inheritance diagram for meta::index::inverted_index:
meta::index::disk_index

Classes

class  impl
 Implementation of an inverted_index.
 
class  inverted_index_exception
 Basic exception for inverted_index interactions.
 

Public Types

using primary_key_type = term_id
 
using secondary_key_type = doc_id
 
using postings_data_type = postings_data< term_id, doc_id >
 
using index_pdata_type = postings_data< std::string, doc_id >
 
using exception = inverted_index_exception
 

Public Member Functions

 inverted_index (inverted_index &&)
 Move constructs an inverted_index.
 
inverted_index & operator= (inverted_index &&)
 Move assigns an inverted_index.
 
 inverted_index (const inverted_index &)=delete
 inverted_index may not be copy-constructed.
 
inverted_index & operator= (const inverted_index &)=delete
 inverted_index may not be copy-assigned.
 
virtual ~inverted_index ()
 Default destructor.
 
void tokenize (corpus::document &doc)
 
virtual std::shared_ptr< postings_data_type > search_primary (term_id t_id) const
 
uint64_t doc_freq (term_id t_id) const
 
uint64_t term_freq (term_id t_id, doc_id d_id) const
 
uint64_t total_corpus_terms ()
 
uint64_t total_num_occurences (term_id t_id) const
 
double avg_doc_length ()
 
- Public Member Functions inherited from meta::index::disk_index
virtual ~disk_index ()=default
 Default destructor.
 
std::string index_name () const
 
uint64_t num_docs () const
 
std::string doc_name (doc_id d_id) const
 
std::string doc_path (doc_id d_id) const
 
std::vector< doc_id > docs () const
 
uint64_t doc_size (doc_id d_id) const
 
class_label label (doc_id d_id) const
 
label_id lbl_id (doc_id d_id) const
 
label_id id (class_label label) const
 
class_label class_label_from_id (label_id l_id) const
 
uint64_t num_labels () const
 
std::vector< class_label > class_labels () const
 
virtual uint64_t unique_terms (doc_id d_id) const
 
virtual uint64_t unique_terms () const
 
term_id get_term_id (const std::string &term)
 
std::string term_text (term_id t_id) const
 
 disk_index (disk_index &&)=default
 Move constructs a disk_index.
 
disk_index & operator= (disk_index &&)=default
 Move assigns a disk_index.
 

Protected Member Functions

 inverted_index (const cpptoml::table &config)
 
- Protected Member Functions inherited from meta::index::disk_index
 disk_index (const cpptoml::table &config, const std::string &name)
 Constructor.
 
 disk_index (const disk_index &)=delete
 disk_index may not be copy-constructed.
 
disk_index & operator= (const disk_index &)=delete
 disk_index may not be copy-assigned.
 

Private Member Functions

void create_index (const std::string &config_file)
 This function initializes the disk index; it is called by the make_index factory function.
 
void load_index ()
 This function loads a disk index from its filesystem representation.
 
bool valid () const
 

Private Attributes

util::pimpl< impl > inv_impl_
 Implementation of this index.
 

Friends

template<class Index , class... Args>
std::shared_ptr< Index > make_index (const std::string &, Args &&...)
 inverted_index is a friend of the factory method used to create it.
 
template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr< cached_index< Index, Cache > > make_index (const std::string &config_file, Args &&...args)
 inverted_index is a friend of the factory method used to create cached versions of it.
 

Additional Inherited Members

- Protected Attributes inherited from meta::index::disk_index
util::pimpl< disk_index_impl > impl_
 Implementation of this disk_index.
 

Detailed Description

The inverted_index class stores information on a corpus indexed by term_ids.

Each term_id key is associated with a per-document frequency (by doc_id).

It is assumed all this information will not fit in memory, so a large postings file containing the (term_id -> each doc_id) information is saved on disk. A lexicon (or "dictionary") contains pointers into the large postings file. It is assumed that the lexicon will fit in memory.
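The lexicon-plus-postings layout described above can be illustrated with a minimal in-memory sketch. Everything here is hypothetical and is not MeTA's implementation: `toy_inverted_index`, `posting`, and `lexicon_entry` are made-up names, the `term_id` and `doc_id` aliases are plain integer stand-ins for the library's types, and a `std::vector` stands in for the on-disk postings file.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the library's term_id and doc_id types.
using term_id = std::uint64_t;
using doc_id = std::uint64_t;

// One posting: a document and how often the term occurs in it.
struct posting {
    doc_id doc;
    std::uint64_t count;
};

// Lexicon entry: where a term's postings start in the flat postings
// array, and how many postings belong to that term.
struct lexicon_entry {
    std::size_t offset;
    std::size_t length;
};

class toy_inverted_index {
  public:
    // Builds the index from nested counts: term -> (doc -> count).
    explicit toy_inverted_index(
        const std::map<term_id, std::map<doc_id, std::uint64_t>>& counts)
    {
        for (const auto& term : counts) {
            lexicon_[term.first] = {postings_.size(), term.second.size()};
            for (const auto& entry : term.second)
                postings_.push_back({entry.first, entry.second});
        }
    }

    // Looks the term up in the small lexicon, then reads only that
    // term's slice of the large postings array.
    std::vector<posting> search(term_id t) const
    {
        auto it = lexicon_.find(t);
        if (it == lexicon_.end())
            return std::vector<posting>();
        auto first = postings_.begin()
                     + static_cast<std::ptrdiff_t>(it->second.offset);
        auto last = first + static_cast<std::ptrdiff_t>(it->second.length);
        return std::vector<posting>(first, last);
    }

  private:
    std::unordered_map<term_id, lexicon_entry> lexicon_; // assumed to fit in memory
    std::vector<posting> postings_; // stands in for the on-disk postings file
};
```

In this sketch a lookup touches the lexicon once and then a contiguous range of postings, which is the access pattern that makes keeping only the lexicon in memory workable.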

Constructor & Destructor Documentation

meta::index::inverted_index::inverted_index ( const cpptoml::table &  config)
protected
Parameters
config: The table that specifies how to create the index.

Member Function Documentation

void meta::index::inverted_index::tokenize ( corpus::document &  doc)
Parameters
doc: The document to tokenize
auto meta::index::inverted_index::search_primary ( term_id  t_id) const
virtual
Parameters
t_id: The term_id to search for
Returns
the postings data for a given term_id
uint64_t meta::index::inverted_index::doc_freq ( term_id  t_id) const
Parameters
t_id: The term to search for
Returns
the document frequency of a term (number of documents it appears in)
uint64_t meta::index::inverted_index::term_freq ( term_id  t_id,
doc_id  d_id 
) const
Parameters
t_id: The term_id to search for
d_id: The doc_id to search for
uint64_t meta::index::inverted_index::total_corpus_terms ( )
Returns
the total number of terms in this index
uint64_t meta::index::inverted_index::total_num_occurences ( term_id  t_id) const
Parameters
t_id: The specified term
Returns
the number of times the given term appears in the corpus
double meta::index::inverted_index::avg_doc_length ( )
Returns
the average document length in this index
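The statistics documented above can be sketched over a toy in-memory postings map. This is an illustration only, not the library's code: `toy_postings` is a hypothetical type, the free functions merely borrow the member names (including the original spelling of `total_num_occurences`), and the sketch assumes that `total_corpus_terms` counts every term occurrence and that document length is measured in term occurrences.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-ins for the library's term_id and doc_id types.
using term_id = std::uint64_t;
using doc_id = std::uint64_t;

// Toy postings: term -> (doc -> number of occurrences in that doc).
using toy_postings = std::map<term_id, std::map<doc_id, std::uint64_t>>;

// Number of documents the term appears in.
std::uint64_t doc_freq(const toy_postings& idx, term_id t)
{
    auto it = idx.find(t);
    return it == idx.end() ? 0 : it->second.size();
}

// Number of times the term appears in one document.
std::uint64_t term_freq(const toy_postings& idx, term_id t, doc_id d)
{
    auto it = idx.find(t);
    if (it == idx.end())
        return 0;
    auto jt = it->second.find(d);
    return jt == it->second.end() ? 0 : jt->second;
}

// Number of times the term appears across the whole corpus.
std::uint64_t total_num_occurences(const toy_postings& idx, term_id t)
{
    std::uint64_t total = 0;
    auto it = idx.find(t);
    if (it != idx.end())
        for (const auto& entry : it->second)
            total += entry.second;
    return total;
}

// Assumption: counts every occurrence of every term.
std::uint64_t total_corpus_terms(const toy_postings& idx)
{
    std::uint64_t total = 0;
    for (const auto& term : idx)
        for (const auto& entry : term.second)
            total += entry.second;
    return total;
}

// Assumption: total occurrences divided by the number of distinct documents.
double avg_doc_length(const toy_postings& idx)
{
    std::set<doc_id> docs;
    for (const auto& term : idx)
        for (const auto& entry : term.second)
            docs.insert(entry.first);
    if (docs.empty())
        return 0.0;
    return static_cast<double>(total_corpus_terms(idx))
           / static_cast<double>(docs.size());
}
```

Note that documents with no indexed terms are invisible to this toy `avg_doc_length`; a real index would presumably track document counts and sizes separately.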
void meta::index::inverted_index::create_index ( const std::string &  config_file)
private

This function initializes the disk index; it is called by the make_index factory function.

Parameters
config_file: The configuration to be used
bool meta::index::inverted_index::valid ( ) const
private
Returns
whether this index contains all necessary files

Friends And Related Function Documentation

template<class Index , template< class, class > class Cache, class... Args>
std::shared_ptr<cached_index<Index, Cache> > make_index ( const std::string &  config_file,
Args &&...  args 
)
friend

inverted_index is a friend of the factory method used to create cached versions of it.

inverted_index is a friend of the factory method used to create it.

Usage:

auto idx = index::make_index<derived_index_type>(config_path);
Parameters
config_file: The path to the configuration file to be used to build the index
args: any additional arguments to forward to the constructor for the chosen index type (usually none)
Returns
A properly initialized index

Usage:

auto idx =
    index::make_index<derived_index_type,
                      cache_type>(config_path, other, options);

Other options will be forwarded to the constructor for the chosen cache class.

Parameters
config_file: the path to the configuration file to be used to build the index.
args: any additional arguments to forward to the constructor for the cache class chosen
Returns
A properly initialized, and automatically cached, index.
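The pairing of a protected constructor with a befriended factory function, as documented for this class, can be sketched in isolation. The names `toy_index` and `make_toy_index` are hypothetical, and the body is only a guess at the general pattern, not the library's actual `make_index`; in particular, the real factory also creates or loads the on-disk index, which is omitted here.

```cpp
#include <memory>
#include <string>
#include <utility>

// Forward declaration of the hypothetical factory so it can be befriended.
template <class Index, class... Args>
std::shared_ptr<Index> make_toy_index(const std::string& config_file,
                                      Args&&... args);

class toy_index {
  public:
    const std::string& config() const { return config_; }

  protected:
    // Not reachable by ordinary callers; only friends and subclasses.
    explicit toy_index(std::string config) : config_(std::move(config)) {}

  private:
    // The factory is the one sanctioned way to construct the index.
    template <class Index, class... Args>
    friend std::shared_ptr<Index> make_toy_index(const std::string&,
                                                 Args&&...);

    std::string config_;
};

template <class Index, class... Args>
std::shared_ptr<Index> make_toy_index(const std::string& config_file,
                                      Args&&... args)
{
    // std::make_shared is not a friend and cannot call the protected
    // constructor, so this sketch uses new directly.
    return std::shared_ptr<Index>(
        new Index(config_file, std::forward<Args>(args)...));
}
```

With this shape, `make_toy_index<toy_index>("config.toml")` compiles, while constructing a `toy_index` directly from outside the class does not, which funnels all construction through the factory.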

The documentation for this class was generated from the following files: