ModErn Text Analysis
META Enumerates Textual Applications
The inverted_index class stores information on a corpus indexed by term_ids.
#include <inverted_index.h>
Classes | |
class | impl |
Implementation of an inverted_index. | |
class | inverted_index_exception |
Basic exception for inverted_index interactions. | |
Public Types | |
using | primary_key_type = term_id |
using | secondary_key_type = doc_id |
using | postings_data_type = postings_data< term_id, doc_id > |
using | index_pdata_type = postings_data< std::string, doc_id > |
using | exception = inverted_index_exception |
Public Member Functions | |
inverted_index (inverted_index &&) | |
Move constructs an inverted_index. | |
inverted_index & | operator= (inverted_index &&) |
Move assigns an inverted_index. | |
inverted_index (const inverted_index &)=delete | |
inverted_index may not be copy-constructed. | |
inverted_index & | operator= (const inverted_index &)=delete |
inverted_index may not be copy-assigned. | |
virtual | ~inverted_index () |
Default destructor. | |
void | tokenize (corpus::document &doc) |
virtual std::shared_ptr< postings_data_type > | search_primary (term_id t_id) const |
uint64_t | doc_freq (term_id t_id) const |
uint64_t | term_freq (term_id t_id, doc_id d_id) const |
uint64_t | total_corpus_terms () |
uint64_t | total_num_occurences (term_id t_id) const |
double | avg_doc_length () |
Public Member Functions inherited from meta::index::disk_index | |
virtual | ~disk_index ()=default |
Default destructor. | |
std::string | index_name () const |
uint64_t | num_docs () const |
std::string | doc_name (doc_id d_id) const |
std::string | doc_path (doc_id d_id) const |
std::vector< doc_id > | docs () const |
uint64_t | doc_size (doc_id d_id) const |
class_label | label (doc_id d_id) const |
label_id | lbl_id (doc_id d_id) const |
label_id | id (class_label label) const |
class_label | class_label_from_id (label_id l_id) const |
uint64_t | num_labels () const |
std::vector< class_label > | class_labels () const |
virtual uint64_t | unique_terms (doc_id d_id) const |
virtual uint64_t | unique_terms () const |
term_id | get_term_id (const std::string &term) |
std::string | term_text (term_id t_id) const |
disk_index (disk_index &&)=default | |
Move constructs a disk_index. | |
disk_index & | operator= (disk_index &&)=default |
Move assigns a disk_index. | |
Protected Member Functions | |
inverted_index (const cpptoml::table &config) | |
Protected Member Functions inherited from meta::index::disk_index | |
disk_index (const cpptoml::table &config, const std::string &name) | |
Constructor. | |
disk_index (const disk_index &)=delete | |
disk_index may not be copy-constructed. | |
disk_index & | operator= (const disk_index &)=delete |
disk_index may not be copy-assigned. | |
Private Member Functions | |
void | create_index (const std::string &config_file) |
This function initializes the disk index; it is called by the make_index factory function. | |
void | load_index () |
This function loads a disk index from its filesystem representation. | |
bool | valid () const |
Private Attributes | |
util::pimpl< impl > | inv_impl_ |
Implementation of this index. | |
Friends | |
template<class Index , class... Args> | |
std::shared_ptr< Index > | make_index (const std::string &, Args &&...) |
inverted_index is a friend of the factory method used to create it. | |
template<class Index , template< class, class > class Cache, class... Args> | |
std::shared_ptr< cached_index< Index, Cache > > | make_index (const std::string &config_file, Args &&...args) |
inverted_index is a friend of the factory method used to create cached versions of it. | |
Additional Inherited Members | |
Protected Attributes inherited from meta::index::disk_index | |
util::pimpl< disk_index_impl > | impl_ |
Implementation of this disk_index. | |
The inverted_index class stores information on a corpus indexed by term_ids.
Each term_id key maps to the frequency of that term in each document, keyed by doc_id.
The full postings information is assumed not to fit in memory, so a large postings file containing the (term_id -> each doc_id) information is saved on disk. A lexicon (or "dictionary") holds pointers into that postings file and is itself assumed to fit in memory.
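The split between a small lexicon and a large postings file can be illustrated with a minimal, self-contained sketch. This is not MeTA's on-disk format: `toy_index`, `build`, and `lookup` are hypothetical names, a `std::stringstream` stands in for the postings file, and postings are stored as plain text. The sketch only shows that the lexicon keeps one offset per term while the per-document counts live in the larger stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ios>
#include <map>
#include <sstream>
#include <utility>
#include <vector>

using term_id = std::uint64_t;
using doc_id = std::uint64_t;
using posting = std::pair<doc_id, std::uint64_t>; // (document, count)

// Hypothetical toy index: the stringstream plays the role of the on-disk
// postings file; the map plays the role of the in-memory lexicon.
struct toy_index
{
    std::stringstream postings_file;
    std::map<term_id, std::streamoff> lexicon; // term -> offset into "file"
};

// Writes one line per term ("n d1 c1 d2 c2 ...") and records where it starts.
inline void build(toy_index& idx,
                  const std::map<term_id, std::vector<posting>>& data)
{
    for (const auto& entry : data)
    {
        idx.lexicon[entry.first]
            = static_cast<std::streamoff>(idx.postings_file.tellp());
        idx.postings_file << entry.second.size();
        for (const auto& p : entry.second)
            idx.postings_file << ' ' << p.first << ' ' << p.second;
        idx.postings_file << '\n';
    }
}

// Seeks to the term's offset and reads back only that term's postings.
inline std::vector<posting> lookup(toy_index& idx, term_id t_id)
{
    std::vector<posting> result;
    auto it = idx.lexicon.find(t_id);
    if (it == idx.lexicon.end())
        return result; // unknown term: empty postings list

    idx.postings_file.clear(); // drop any state left by earlier reads
    idx.postings_file.seekg(it->second, std::ios::beg);

    std::size_t n = 0;
    idx.postings_file >> n;
    for (std::size_t i = 0; i < n; ++i)
    {
        posting p;
        idx.postings_file >> p.first >> p.second;
        result.push_back(p);
    }
    assert(!idx.postings_file.fail()); // the toy file is well-formed
    return result;
}
```

A lookup touches only the lexicon entry and the bytes for that single term, which is why the lexicon alone needs to stay resident.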
meta::index::inverted_index::inverted_index | ( | const cpptoml::table & | config | ) |
protected |
config | The table that specifies how to create the index. |
void meta::index::inverted_index::tokenize | ( | corpus::document & | doc | ) |
doc | The document to tokenize |
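std::shared_ptr< postings_data_type > meta::index::inverted_index::search_primary | ( | term_id | t_id | ) | const |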
virtual |
t_id | The term_id to search for |
uint64_t meta::index::inverted_index::doc_freq | ( | term_id | t_id | ) | const |
t_id | The term to search for |
uint64_t meta::index::inverted_index::term_freq | ( | term_id | t_id, |
doc_id | d_id | ||
) | const |
t_id | The term_id to search for |
d_id | The doc_id to search for |
uint64_t meta::index::inverted_index::total_corpus_terms | ( | ) |
uint64_t meta::index::inverted_index::total_num_occurences | ( | term_id | t_id | ) | const |
t_id | The specified term |
double meta::index::inverted_index::avg_doc_length | ( | ) |
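The return values of the statistics members are not described on this page, so the following sketch shows how such quantities are conventionally related in information retrieval. It is an assumption, not MeTA's implementation: `toy_postings` is a hypothetical in-memory map, the free functions merely borrow the member names (including the `total_num_occurences` spelling), and the average length is computed as total term occurrences divided by the number of documents that appear in at least one postings list.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using term_id = std::uint64_t;
using doc_id = std::uint64_t;

// Hypothetical in-memory stand-in: term -> (document -> count).
using toy_postings = std::map<term_id, std::map<doc_id, std::uint64_t>>;

// Number of documents that contain the term.
inline std::uint64_t doc_freq(const toy_postings& idx, term_id t_id)
{
    auto it = idx.find(t_id);
    return it == idx.end() ? 0 : it->second.size();
}

// Number of times the term occurs in one document.
inline std::uint64_t term_freq(const toy_postings& idx, term_id t_id,
                               doc_id d_id)
{
    auto t = idx.find(t_id);
    if (t == idx.end())
        return 0;
    auto d = t->second.find(d_id);
    return d == t->second.end() ? 0 : d->second;
}

// Occurrences of the term summed over every document.
inline std::uint64_t total_num_occurences(const toy_postings& idx,
                                          term_id t_id)
{
    std::uint64_t sum = 0;
    auto it = idx.find(t_id);
    if (it != idx.end())
        for (const auto& d : it->second)
            sum += d.second;
    return sum;
}

// Occurrences of all terms in the whole corpus.
inline std::uint64_t total_corpus_terms(const toy_postings& idx)
{
    std::uint64_t sum = 0;
    for (const auto& t : idx)
        for (const auto& d : t.second)
            sum += d.second;
    return sum;
}

// Assumed definition: corpus size divided by the number of documents seen.
inline double avg_doc_length(const toy_postings& idx)
{
    std::set<doc_id> docs;
    for (const auto& t : idx)
        for (const auto& d : t.second)
            docs.insert(d.first);
    assert(!docs.empty() && "average is undefined for an empty index");
    return static_cast<double>(total_corpus_terms(idx)) / docs.size();
}
```

In the sketch, `doc_freq` and `total_num_occurences` read a single postings list, whereas `total_corpus_terms` and `avg_doc_length` scan every list; a real index would likely precompute or cache the corpus-wide figures.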
void meta::index::inverted_index::create_index | ( | const std::string & | config_file | ) |
private |
This function initializes the disk index; it is called by the make_index factory function.
config_file | The configuration to be used |
|
private |
|
friend |
inverted_index is a friend of the factory method used to create cached versions of it.
inverted_index is a friend of the factory method used to create it.
Usage:
config_file | The path to the configuration file to be used to build the index |
args | any additional arguments to forward to the constructor for the chosen index type (usually none) |
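Why the factory has to be a friend can be shown with a minimal, self-contained sketch of the pattern: a non-public constructor and a private initialization step that only the factory may reach. The names `toy_index` and `make_toy_index` are hypothetical and do not exist in MeTA, and the factory body is an assumption about the general shape of such a function rather than a copy of `make_index`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical factory, declared first so the class can befriend it.
template <class Index, class... Args>
std::shared_ptr<Index> make_toy_index(const std::string& config_file,
                                      Args&&... args);

class toy_index
{
  public:
    toy_index(const toy_index&) = delete;
    toy_index& operator=(const toy_index&) = delete;

    const std::string& config() const { return config_; }
    bool created() const { return created_; }

  protected:
    // Not callable by ordinary client code.
    explicit toy_index(std::string config) : config_(std::move(config)) {}

  private:
    // Stand-in for the private initialization step run by the factory.
    void create_index() { created_ = true; }

    template <class Index, class... Args>
    friend std::shared_ptr<Index> make_toy_index(const std::string&,
                                                 Args&&...);

    std::string config_;
    bool created_ = false;
};

static_assert(!std::is_copy_constructible<toy_index>::value,
              "toy_index is move-only, like the documented class");

template <class Index, class... Args>
std::shared_ptr<Index> make_toy_index(const std::string& config_file,
                                      Args&&... args)
{
    assert(!config_file.empty());
    // std::make_shared cannot reach the protected constructor, but this
    // friend function can, so the object is created with plain new.
    std::shared_ptr<Index> idx{
        new Index(config_file, std::forward<Args>(args)...)};
    idx->create_index(); // private member, reachable through friendship
    return idx;
}
```

With this arrangement a caller writes `make_toy_index<toy_index>("config.toml")` and always receives a fully initialized object, since there is no public way to construct one halfway.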
The cached overload of make_index additionally takes the cache class as a template argument.
Usage:
Other options will be forwarded to the constructor for the chosen cache class.
config_file | the path to the configuration file to be used to build the index. |
args | any additional arguments to forward to the constructor for the cache class chosen |
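The `template< class, class > class Cache` parameter and the forwarding of extra arguments to the cache constructor can be sketched as follows. Everything here is hypothetical: `toy_cache`, `toy_index`, `toy_cached_index`, and `make_cached_toy_index` are illustrative names, and the assumption that the cached wrapper derives from the index and overrides the virtual `search_primary` is a guess at the design, not a statement about `cached_index`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical <Key, Value> cache; it never evicts.
template <class Key, class Value>
class toy_cache
{
  public:
    explicit toy_cache(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }
    std::size_t capacity() const { return capacity_; }
    const Value* find(const Key& k) const
    {
        auto it = map_.find(k);
        return it == map_.end() ? nullptr : &it->second;
    }
    void insert(const Key& k, Value v) { map_[k] = std::move(v); }

  private:
    std::size_t capacity_;
    std::unordered_map<Key, Value> map_;
};

// Hypothetical index that counts how often the slow path is taken.
class toy_index
{
  public:
    explicit toy_index(const std::string&) {}
    virtual ~toy_index() = default;
    virtual std::string search_primary(int t_id) const
    {
        ++lookups;
        return "postings for " + std::to_string(t_id);
    }
    mutable int lookups = 0;
};

// Wrapper that consults the cache before falling back to the index.
template <class Index, template <class, class> class Cache>
class toy_cached_index : public Index
{
  public:
    template <class... Args>
    toy_cached_index(const std::string& config_file, Args&&... args)
        : Index(config_file), cache_(std::forward<Args>(args)...)
    {
    }

    std::string search_primary(int t_id) const override
    {
        if (const std::string* hit = cache_.find(t_id))
            return *hit;
        std::string result = Index::search_primary(t_id);
        cache_.insert(t_id, result);
        return result;
    }

    const Cache<int, std::string>& cache() const { return cache_; }

  private:
    mutable Cache<int, std::string> cache_;
};

// Extra arguments go to the cache constructor, not to the index.
template <class Index, template <class, class> class Cache, class... Args>
std::shared_ptr<toy_cached_index<Index, Cache>>
make_cached_toy_index(const std::string& config_file, Args&&... args)
{
    return std::make_shared<toy_cached_index<Index, Cache>>(
        config_file, std::forward<Args>(args)...);
}
```

A call such as `make_cached_toy_index<toy_index, toy_cache>("config.toml", 100)` passes `100` through to `toy_cache`, mirroring the statement that other options are forwarded to the constructor for the chosen cache class.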