ModErn Text Analysis
META Enumerates Textual Applications
Implementation of an inverted_index.
Public Member Functions
| | impl (inverted_index *parent, const cpptoml::table &config) |
| | Constructs an inverted_index impl. |
| void | tokenize_docs (corpus::corpus *docs, chunk_handler< inverted_index > &handler) |
| void | create_lexicon (const std::string &postings_file, const std::string &lexicon_file) |
| | Creates the lexicon file (or "dictionary"), which contains pointers into the large postings file. |
| void | compress (const std::string &filename, uint64_t num_unique_terms) |
| | Compresses the large postings file. |
Public Attributes
| std::unique_ptr< analyzers::analyzer > | analyzer_ |
| | The analyzer used to tokenize documents. |
| util::optional< util::disk_vector< uint64_t > > | term_bit_locations_ |
| | PrimaryKey -> postings location. |
| uint64_t | total_corpus_terms_ |
| | The total number of term occurrences in the entire corpus. |
Private Attributes
| inverted_index * | idx_ |
| | Pointer to the inverted_index that this impl implements. |
meta::index::inverted_index::impl::impl (inverted_index *parent, const cpptoml::table &config)
Constructs an inverted_index impl.
| parent | The parent of this impl |
| config | The config group |
void meta::index::inverted_index::impl::tokenize_docs (corpus::corpus *docs, chunk_handler< inverted_index > &handler)
| docs | The documents to be tokenized |
| handler | The chunk handler for this index |
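As a rough illustration of what tokenizing a set of documents into per-document term counts can look like, the sketch below uses only standard-library types. It is not MeTA's implementation: `count_terms`, the whitespace tokenizer, and the `term_counts` map are hypothetical stand-ins for the analyzer and the chunk handler, whose behavior is not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: term -> number of occurrences in one document.
using term_counts = std::map<std::string, uint64_t>;

// Splits each document on whitespace (an assumed, simplistic tokenizer)
// and returns one term_counts map per document. total_terms receives the
// number of term occurrences summed over all documents.
std::vector<term_counts> count_terms(const std::vector<std::string>& docs,
                                     uint64_t& total_terms)
{
    std::vector<term_counts> result;
    result.reserve(docs.size());
    total_terms = 0;
    for (const auto& doc : docs)
    {
        term_counts counts;
        std::istringstream stream{doc};
        std::string token;
        while (stream >> token)
        {
            ++counts[token];
            ++total_terms;
        }
        result.push_back(std::move(counts));
    }
    assert(result.size() == docs.size());
    return result;
}
```

In this sketch, the running total plays the role that total_corpus_terms_ plays in the class: a count of every term occurrence seen while tokenizing.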
void meta::index::inverted_index::impl::create_lexicon (const std::string &postings_file, const std::string &lexicon_file)
Creates the lexicon file (or "dictionary"), which contains pointers into the large postings file.
| postings_file | |
| lexicon_file |
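The idea of a lexicon that points into a postings file can be sketched as follows. This is a minimal illustration under stated assumptions, not the on-disk format MeTA uses: `postings_list`, `write_postings`, and `read_postings` are hypothetical names, the postings are stored uncompressed as raw 64-bit values, and the "pointers" are byte offsets held in an in-memory vector.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical postings record: the ids of documents containing a term.
using postings_list = std::vector<uint64_t>;

// Writes each list as a 64-bit length followed by its doc ids, and returns
// the lexicon: the byte offset where each term's list begins. Term ids are
// the indices into the returned vector.
std::vector<uint64_t> write_postings(const std::vector<postings_list>& lists,
                                     std::ostream& postings_out)
{
    std::vector<uint64_t> lexicon;
    uint64_t offset = 0;
    for (const auto& list : lists)
    {
        lexicon.push_back(offset);
        const uint64_t size = list.size();
        postings_out.write(reinterpret_cast<const char*>(&size), sizeof(size));
        postings_out.write(reinterpret_cast<const char*>(list.data()),
                           static_cast<std::streamsize>(size * sizeof(uint64_t)));
        offset += (size + 1) * sizeof(uint64_t);
    }
    return lexicon;
}

// Reads one term's postings by seeking straight to its lexicon offset.
postings_list read_postings(std::istream& postings_in,
                            const std::vector<uint64_t>& lexicon,
                            uint64_t term_id)
{
    assert(term_id < lexicon.size());
    postings_in.seekg(static_cast<std::streamoff>(lexicon[term_id]));
    uint64_t size = 0;
    postings_in.read(reinterpret_cast<char*>(&size), sizeof(size));
    postings_list list(size);
    postings_in.read(reinterpret_cast<char*>(list.data()),
                     static_cast<std::streamsize>(size * sizeof(uint64_t)));
    return list;
}
```

The point of the design is that a lookup touches only the small lexicon plus one seek into the postings data, rather than scanning the whole postings file.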
util::optional< util::disk_vector< uint64_t > > meta::index::inverted_index::impl::term_bit_locations_
PrimaryKey -> postings location.
Each index corresponds to a PrimaryKey (uint64_t).
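A lookup through an optional, index-addressed table of this shape might look like the sketch below. It assumes C++17 and substitutes `std::optional` and `std::vector` for `util::optional` and `util::disk_vector`, whose interfaces are not shown here; `bit_locations` and `find_location` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for util::optional<util::disk_vector<uint64_t>>: a table that
// may not have been loaded yet. Element i is the postings location for the
// term whose id is i.
using bit_locations = std::optional<std::vector<uint64_t>>;

// Returns the postings location for term_id, or std::nullopt when the
// table is not loaded or the id is out of range.
std::optional<uint64_t> find_location(const bit_locations& locations,
                                      uint64_t term_id)
{
    if (!locations || term_id >= locations->size())
        return std::nullopt;
    return (*locations)[term_id];
}
```

Because the key is itself the vector index, no separate key column needs to be stored; the table is just a dense array of locations.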