ModErn Text Analysis
META Enumerates Textual Applications
The inverted_index class stores information on a corpus indexed by term_ids.
#include <inverted_index.h>
Classes | |
| class | impl |
| Implementation of an inverted_index. | |
| class | inverted_index_exception |
| Basic exception for inverted_index interactions. | |
Public Types | |
| using | primary_key_type = term_id |
| using | secondary_key_type = doc_id |
| using | postings_data_type = postings_data< term_id, doc_id > |
| using | index_pdata_type = postings_data< std::string, doc_id > |
| using | exception = inverted_index_exception |
Public Member Functions | |
| inverted_index (inverted_index &&) | |
| Move constructs an inverted_index. | |
| inverted_index & | operator= (inverted_index &&) |
| Move assigns an inverted_index. | |
| inverted_index (const inverted_index &)=delete | |
| inverted_index may not be copy-constructed. | |
| inverted_index & | operator= (const inverted_index &)=delete |
| inverted_index may not be copy-assigned. | |
| virtual | ~inverted_index () |
| Default destructor. | |
| void | tokenize (corpus::document &doc) |
| virtual std::shared_ptr< postings_data_type > | search_primary (term_id t_id) const |
| uint64_t | doc_freq (term_id t_id) const |
| uint64_t | term_freq (term_id t_id, doc_id d_id) const |
| uint64_t | total_corpus_terms () |
| uint64_t | total_num_occurences (term_id t_id) const |
| double | avg_doc_length () |
Public Member Functions inherited from meta::index::disk_index | |
| virtual | ~disk_index ()=default |
| Default destructor. | |
| std::string | index_name () const |
| uint64_t | num_docs () const |
| std::string | doc_name (doc_id d_id) const |
| std::string | doc_path (doc_id d_id) const |
| std::vector< doc_id > | docs () const |
| uint64_t | doc_size (doc_id d_id) const |
| class_label | label (doc_id d_id) const |
| label_id | lbl_id (doc_id d_id) const |
| label_id | id (class_label label) const |
| class_label | class_label_from_id (label_id l_id) const |
| uint64_t | num_labels () const |
| std::vector< class_label > | class_labels () const |
| virtual uint64_t | unique_terms (doc_id d_id) const |
| virtual uint64_t | unique_terms () const |
| term_id | get_term_id (const std::string &term) |
| std::string | term_text (term_id t_id) const |
| disk_index (disk_index &&)=default | |
| Move constructs a disk_index. | |
| disk_index & | operator= (disk_index &&)=default |
| Move assigns a disk_index. | |
Protected Member Functions | |
| inverted_index (const cpptoml::table &config) | |
Protected Member Functions inherited from meta::index::disk_index | |
| disk_index (const cpptoml::table &config, const std::string &name) | |
| Constructor. | |
| disk_index (const disk_index &)=delete | |
| disk_index may not be copy-constructed. | |
| disk_index & | operator= (const disk_index &)=delete |
| disk_index may not be copy-assigned. | |
Private Member Functions | |
| void | create_index (const std::string &config_file) |
| This function initializes the disk index; it is called by the make_index factory function. | |
| void | load_index () |
| This function loads a disk index from its filesystem representation. | |
| bool | valid () const |
Private Attributes | |
| util::pimpl< impl > | inv_impl_ |
| Implementation of this index. | |
Friends | |
| template<class Index , class... Args> | |
| std::shared_ptr< Index > | make_index (const std::string &, Args &&...) |
| inverted_index is a friend of the factory method used to create it. | |
| template<class Index , template< class, class > class Cache, class... Args> | |
| std::shared_ptr< cached_index< Index, Cache > > | make_index (const std::string &config_file, Args &&...args) |
| inverted_index is a friend of the factory method used to create cached versions of it. | |
Additional Inherited Members | |
Protected Attributes inherited from meta::index::disk_index | |
| util::pimpl< disk_index_impl > | impl_ |
| Implementation of this disk_index. | |
The inverted_index class stores information on a corpus indexed by term_ids.
Each term_id key is associated with a per-document frequency (by doc_id).
The full index is assumed not to fit in memory, so a large postings file holding the (term_id -> each doc_id) information is saved on disk. A lexicon (or "dictionary") contains pointers into the postings file and is assumed to fit in memory.
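The lexicon/postings split described above can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: `toy_inverted_index` and `posting` are hypothetical names, a flat in-memory vector stands in for the on-disk postings file, and the lexicon stores an (offset, length) pair per term.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's identifier types.
using term_id = std::uint64_t;
using doc_id = std::uint64_t;

// One (doc_id, count) entry in a term's postings list.
struct posting
{
    doc_id doc;
    std::uint64_t count;
};

// Toy two-part index: postings_ plays the role of the large on-disk file,
// lexicon_ is the small in-memory table of pointers into it.
class toy_inverted_index
{
  public:
    // Builds the index from per-term counts: term -> (doc -> count).
    explicit toy_inverted_index(
        const std::map<term_id, std::map<doc_id, std::uint64_t>>& counts)
    {
        for (const auto& term : counts)
        {
            const std::size_t offset = postings_.size();
            for (const auto& doc : term.second)
                postings_.push_back({doc.first, doc.second});
            lexicon_[term.first] = {offset, term.second.size()};
        }
    }

    // Returns the postings for a term; empty if the term is not indexed.
    std::vector<posting> search_primary(term_id t) const
    {
        auto it = lexicon_.find(t);
        if (it == lexicon_.end())
            return {};
        const std::size_t offset = it->second.first;
        const std::size_t length = it->second.second;
        assert(offset + length <= postings_.size());
        auto first = postings_.begin() + static_cast<std::ptrdiff_t>(offset);
        return std::vector<posting>(first,
                                    first + static_cast<std::ptrdiff_t>(length));
    }

  private:
    std::vector<posting> postings_; // stand-in for the postings file
    std::unordered_map<term_id, std::pair<std::size_t, std::size_t>> lexicon_;
};
```

Only the lexicon is consulted to locate a term; the postings themselves are read as one contiguous range, which is what makes the layout suitable for a file on disk.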
| meta::index::inverted_index::inverted_index | ( | const cpptoml::table & | config | ) | protected |
| config | The table that specifies how to create the index. |
| void meta::index::inverted_index::tokenize | ( | corpus::document & | doc | ) |
| doc | The document to tokenize |
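| std::shared_ptr< postings_data_type > meta::index::inverted_index::search_primary | ( | term_id | t_id | ) | const | virtual |
| t_id | The term_id to search for |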
| uint64_t meta::index::inverted_index::doc_freq | ( | term_id | t_id | ) | const |
| t_id | The term to search for |
| uint64_t meta::index::inverted_index::term_freq | ( | term_id | t_id, | doc_id | d_id | ) | const |
| t_id | The term_id to search for |
| d_id | The doc_id to search for |
| uint64_t meta::index::inverted_index::total_corpus_terms | ( | ) |
| uint64_t meta::index::inverted_index::total_num_occurences | ( | term_id | t_id | ) | const |
| t_id | The specified term |
| double meta::index::inverted_index::avg_doc_length | ( | ) |
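The page does not spell out what each statistic returns, so the sketch below assumes the conventional information-retrieval meanings suggested by the member names. It uses free functions over a hypothetical in-memory `postings_table` (term -> doc -> count) rather than the real class, and it counts only documents that contain at least one indexed term when averaging.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using term_id = std::uint64_t;
using doc_id = std::uint64_t;

// Hypothetical in-memory table: term -> (doc -> occurrences of term in doc).
using postings_table = std::map<term_id, std::map<doc_id, std::uint64_t>>;

// Assumed meaning: number of distinct documents containing the term.
std::uint64_t doc_freq(const postings_table& idx, term_id t)
{
    auto it = idx.find(t);
    return it == idx.end() ? 0 : it->second.size();
}

// Assumed meaning: number of times the term occurs in one document.
std::uint64_t term_freq(const postings_table& idx, term_id t, doc_id d)
{
    auto term = idx.find(t);
    if (term == idx.end())
        return 0;
    auto doc = term->second.find(d);
    return doc == term->second.end() ? 0 : doc->second;
}

// Assumed meaning: occurrences of the term summed over all documents.
// (The spelling follows the documented member name.)
std::uint64_t total_num_occurences(const postings_table& idx, term_id t)
{
    std::uint64_t total = 0;
    auto it = idx.find(t);
    if (it != idx.end())
        for (const auto& entry : it->second)
            total += entry.second;
    return total;
}

// Assumed meaning: occurrences of every term summed over the whole corpus.
std::uint64_t total_corpus_terms(const postings_table& idx)
{
    std::uint64_t total = 0;
    for (const auto& term : idx)
        for (const auto& entry : term.second)
            total += entry.second;
    return total;
}

// Assumed meaning: total corpus terms divided by the number of documents.
double avg_doc_length(const postings_table& idx)
{
    std::set<doc_id> docs;
    for (const auto& term : idx)
        for (const auto& entry : term.second)
            docs.insert(entry.first);
    if (docs.empty())
        return 0.0;
    return static_cast<double>(total_corpus_terms(idx))
           / static_cast<double>(docs.size());
}
```

Under these assumptions, doc_freq is the length of a term's postings list, while total_num_occurences is the sum of the counts stored in it.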
| void meta::index::inverted_index::create_index | ( | const std::string & | config_file | ) | private |
This function initializes the disk index; it is called by the make_index factory function.
| config_file | The configuration to be used |
| template<class Index , class... Args> |
| std::shared_ptr< Index > make_index | ( | const std::string & | config_file, | Args &&... | args | ) | friend |
inverted_index is a friend of the factory method used to create it.
Usage: make_index< Index >(config_file, args...)
| config_file | The path to the configuration file to be used to build the index |
| args | any additional arguments to forward to the constructor for the chosen index type (usually none) |
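The friend-factory arrangement documented here can be sketched in isolation. The names `toy_index` and `make_toy_index` are hypothetical and this is not the library's code; it only shows why the friend declaration is needed when the constructor is protected and the initialization step is private.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical index type whose construction is restricted to the factory.
class toy_index
{
  public:
    const std::string& config_file() const { return config_file_; }
    bool loaded() const { return loaded_; }

    // The factory is a friend so it can reach the protected constructor
    // and the private create_index() step.
    template <class Index, class... Args>
    friend std::shared_ptr<Index> make_toy_index(const std::string&,
                                                 Args&&...);

  protected:
    explicit toy_index(const std::string& config_file)
        : config_file_(config_file)
    {
    }

  private:
    // Stand-in for building or loading the on-disk representation.
    void create_index() { loaded_ = true; }

    std::string config_file_;
    bool loaded_ = false;
};

// Factory: constructs the index, initializes it, and hands back a
// shared_ptr. Extra arguments are forwarded to the index constructor.
template <class Index, class... Args>
std::shared_ptr<Index> make_toy_index(const std::string& config_file,
                                      Args&&... args)
{
    // std::make_shared is not a friend and cannot call the protected
    // constructor, so the object is allocated with new here.
    std::shared_ptr<Index> idx(
        new Index(config_file, std::forward<Args>(args)...));
    assert(idx != nullptr);
    idx->create_index();
    return idx;
}
```

A caller would write `make_toy_index<toy_index>("config.toml")`. Direct construction does not compile outside the class, so every instance goes through the initialization step.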
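| template<class Index , template< class, class > class Cache, class... Args> |
| std::shared_ptr< cached_index< Index, Cache > > make_index | ( | const std::string & | config_file, | Args &&... | args | ) | friend |
inverted_index is a friend of the factory method used to create cached versions of it.
Usage: make_index< Index, Cache >(config_file, args...)
Other options will be forwarded to the constructor for the chosen cache class.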
| config_file | The path to the configuration file to be used to build the index |
| args | any additional arguments to forward to the constructor for the cache class chosen |
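The cached factory takes the cache as a template template parameter and forwards the remaining arguments to the cache's constructor. The sketch below shows one way such a wrapper could be put together. `slow_index`, `bounded_cache`, `toy_cached_index` and `make_cached_index` are all hypothetical, and the clear-when-full eviction is a deliberate simplification rather than any policy the library is known to use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical uncached index: every lookup counts as one "disk read".
class slow_index
{
  public:
    explicit slow_index(const std::string&) {}
    virtual ~slow_index() = default;

    virtual std::uint64_t lookup(std::uint64_t key) const
    {
        ++disk_reads;
        return key * 2;
    }

    mutable std::uint64_t disk_reads = 0;
};

// Hypothetical cache with exactly two type parameters, matching the
// template< class, class > class Cache shape in the documented signature.
template <class Key, class Value>
class bounded_cache
{
  public:
    explicit bounded_cache(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Returns a pointer to the cached value, or nullptr on a miss.
    const Value* find(const Key& k) const
    {
        auto it = map_.find(k);
        return it == map_.end() ? nullptr : &it->second;
    }

    void insert(const Key& k, const Value& v)
    {
        if (map_.size() >= capacity_)
            map_.clear(); // crude eviction: drop everything when full
        map_[k] = v;
    }

  private:
    std::size_t capacity_;
    std::unordered_map<Key, Value> map_;
};

// Wrapper that answers repeated lookups from the cache.
template <class Index, template <class, class> class Cache>
class toy_cached_index : public Index
{
  public:
    // Arguments after config_file go to the cache constructor.
    template <class... Args>
    toy_cached_index(const std::string& config_file, Args&&... args)
        : Index(config_file), cache_(std::forward<Args>(args)...)
    {
    }

    std::uint64_t lookup(std::uint64_t key) const override
    {
        if (const auto* hit = cache_.find(key))
            return *hit;
        auto value = Index::lookup(key);
        cache_.insert(key, value);
        return value;
    }

  private:
    mutable Cache<std::uint64_t, std::uint64_t> cache_;
};

template <class Index, template <class, class> class Cache, class... Args>
std::shared_ptr<toy_cached_index<Index, Cache>>
make_cached_index(const std::string& config_file, Args&&... args)
{
    return std::make_shared<toy_cached_index<Index, Cache>>(
        config_file, std::forward<Args>(args)...);
}
```

A call such as `make_cached_index<slow_index, bounded_cache>("config.toml", 2)` would build an index whose cache holds at most two entries; the trailing `2` is the forwarded cache-constructor argument.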