ModErn Text Analysis
META Enumerates Textual Applications
Implementation of an inverted_index.
Public Member Functions
  impl (inverted_index *parent, const cpptoml::table &config)
      Constructs an inverted_index impl.
  void tokenize_docs (corpus::corpus *docs, chunk_handler< inverted_index > &handler)
      Tokenizes the documents in a corpus and passes the results to the index's chunk handler.
  void create_lexicon (const std::string &postings_file, const std::string &lexicon_file)
      Creates the lexicon file (or "dictionary"), which has pointers into the large postings file.
  void compress (const std::string &filename, uint64_t num_unique_terms)
      Compresses the large postings file.
Public Attributes
  std::unique_ptr< analyzers::analyzer > analyzer_
      The analyzer used to tokenize documents.
  util::optional< util::disk_vector< uint64_t > > term_bit_locations_
      PrimaryKey -> postings location.
  uint64_t total_corpus_terms_
      The total number of term occurrences in the entire corpus.
Private Attributes
  inverted_index * idx_
      Pointer to the inverted_index this is an implementation of.
meta::index::inverted_index::impl::impl (inverted_index *parent, const cpptoml::table &config)
Constructs an inverted_index impl.
Parameters
  parent   The parent of this impl.
  config   The config group.
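`impl` follows the pointer-to-implementation (pimpl) idiom: the implementation is constructed from its parent and a config group, and keeps a non-owning back-pointer (`idx_`) to that parent. The sketch below shows that shape under stated assumptions and is not MeTA's code. It assumes, as is usual for pimpl, that the parent owns its `impl` through a `std::unique_ptr`; `config_table` (a plain `std::map`) is a hypothetical stand-in for `cpptoml::table`, and the `"prefix"` key, `index_name()` and `impl_points_back()` are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for cpptoml::table: a flat key -> value map.
using config_table = std::map<std::string, std::string>;

class inverted_index
{
  public:
    explicit inverted_index(const config_table& config);
    ~inverted_index(); // defined below, once impl is a complete type

    std::string index_name() const; // hypothetical accessor
    bool impl_points_back() const;  // hypothetical check on the back-pointer

  private:
    class impl;                      // declared here, defined below
    std::unique_ptr<impl> inv_impl_; // assumed: the parent owns its impl
};

class inverted_index::impl
{
  public:
    impl(inverted_index* parent, const config_table& config)
        : idx_{parent}, name_{config.at("prefix")} // "prefix" is a made-up key
    {
    }

    inverted_index* parent() const { return idx_; }
    const std::string& name() const { return name_; }

  private:
    inverted_index* idx_; // non-owning back-pointer, like the documented idx_
    std::string name_;
};

inverted_index::inverted_index(const config_table& config)
    : inv_impl_{std::make_unique<impl>(this, config)}
{
}

inverted_index::~inverted_index() = default;

std::string inverted_index::index_name() const { return inv_impl_->name(); }

bool inverted_index::impl_points_back() const
{
    return inv_impl_->parent() == this;
}
```

Defining `~inverted_index()` after `impl` is complete is what lets `std::unique_ptr<impl>` hold a type that is only forward-declared in the class definition.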
void meta::index::inverted_index::impl::tokenize_docs (corpus::corpus *docs, chunk_handler< inverted_index > &handler)
Tokenizes the documents in a corpus and passes the results to the index's chunk handler.
Parameters
  docs      The documents to be tokenized.
  handler   The chunk handler for this index.
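`tokenize_docs` turns each document into terms and hands them to a `chunk_handler`. The sketch below assumes, since this page does not say so, that the handler buffers postings and writes them out as sorted chunks once a size limit is reached. Whitespace splitting stands in for `analyzer_`, and `chunk_buffer`, `posting` and the `capacity` limit are hypothetical names. The function also tallies term occurrences, the quantity that `total_corpus_terms_` records.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

// One (term, document, count) triple destined for the postings file.
struct posting
{
    uint64_t term_id;
    uint64_t doc_id;
    uint64_t count;
};

// Hypothetical stand-in for chunk_handler: buffers postings and, when the
// buffer reaches `capacity`, sorts it by (term, doc) and keeps it as a chunk.
class chunk_buffer
{
  public:
    explicit chunk_buffer(std::size_t capacity) : capacity_{capacity} {}

    void add(const posting& p)
    {
        buffer_.push_back(p);
        if (buffer_.size() >= capacity_)
            flush();
    }

    void flush()
    {
        if (buffer_.empty())
            return;
        std::sort(buffer_.begin(), buffer_.end(),
                  [](const posting& a, const posting& b) {
                      return std::tie(a.term_id, a.doc_id)
                             < std::tie(b.term_id, b.doc_id);
                  });
        chunks_.push_back(std::move(buffer_));
        buffer_.clear(); // moved-from vector: reset to a known empty state
    }

    const std::vector<std::vector<posting>>& chunks() const { return chunks_; }

  private:
    std::size_t capacity_;
    std::vector<posting> buffer_;
    std::vector<std::vector<posting>> chunks_;
};

// Whitespace tokenization stands in for the configured analyzer_.
// Returns the total number of term occurrences seen.
uint64_t tokenize_docs(const std::vector<std::string>& docs,
                       std::unordered_map<std::string, uint64_t>& term_ids,
                       chunk_buffer& handler)
{
    uint64_t total_terms = 0;
    for (uint64_t doc_id = 0; doc_id < docs.size(); ++doc_id)
    {
        std::unordered_map<uint64_t, uint64_t> counts; // term id -> count
        std::istringstream in{docs[doc_id]};
        std::string token;
        while (in >> token)
        {
            // New terms get the next id in first-seen order.
            auto it = term_ids.emplace(token, term_ids.size()).first;
            ++counts[it->second];
            ++total_terms;
        }
        for (const auto& [term_id, count] : counts)
            handler.add({term_id, doc_id, count});
    }
    handler.flush(); // write out any partially filled final chunk
    return total_terms;
}
```

Sorting each chunk by (term, document) means the chunks can later be merged into one postings file in a single pass, which is the usual reason to buffer this way.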
void meta::index::inverted_index::impl::create_lexicon (const std::string &postings_file, const std::string &lexicon_file)
Creates the lexicon file (or "dictionary"), which has pointers into the large postings file.
Parameters
  postings_file   Name of the large postings file that the lexicon points into.
  lexicon_file    Name of the lexicon file to create.
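A lexicon lets a lookup jump straight to a term's postings instead of scanning the large postings file. The sketch below builds one in a single sequential pass, assuming a hypothetical uncompressed layout (for each term id in order: a count `n`, then `n` raw `(doc_id, count)` pairs of `uint64_t`). That layout is for illustration only and is not MeTA's file format. Here the lexicon is simply a vector of byte offsets indexed by term id, and an in-memory `std::stringstream` stands in for both files.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical postings list: (doc_id, count) pairs for one term.
using postings_list = std::vector<std::pair<uint64_t, uint64_t>>;

void write_u64(std::ostream& out, uint64_t v)
{
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}

bool read_u64(std::istream& in, uint64_t& v)
{
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&v), sizeof v));
}

// Writes every term's list in term-id order: n, then n (doc_id, count) pairs.
void write_postings(std::ostream& out, const std::vector<postings_list>& lists)
{
    for (const auto& list : lists)
    {
        write_u64(out, list.size());
        for (const auto& [doc_id, count] : list)
        {
            write_u64(out, doc_id);
            write_u64(out, count);
        }
    }
}

// Records where each term's record begins, then skips over its pairs.
// lexicon[t] is the byte offset of term t's record in the postings file.
std::vector<uint64_t> create_lexicon(std::istream& postings)
{
    std::vector<uint64_t> lexicon;
    uint64_t n = 0;
    while (true)
    {
        auto offset = static_cast<uint64_t>(
            static_cast<std::streamoff>(postings.tellg()));
        if (!read_u64(postings, n))
            break; // end of the postings file
        lexicon.push_back(offset);
        postings.seekg(static_cast<std::streamoff>(2 * n * sizeof(uint64_t)),
                       std::ios::cur);
    }
    return lexicon;
}
```

Because records are stored in term-id order, term `t`'s postings start at `lexicon[t]`, so a lookup costs one seek into the postings file.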
util::optional<util::disk_vector<uint64_t> > meta::index::inverted_index::impl::term_bit_locations_
PrimaryKey -> postings location.
The vector is indexed by PrimaryKey (a uint64_t); the entry at each index is that key's location in the postings file.
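The member name suggests that the compressed postings are addressed by bit offset rather than byte offset. This page does not say which codec `compress` uses; the sketch below assumes one common choice, storing each term's sorted document ids as gaps and Elias-gamma coding them. It records the starting bit of each term's list in a vector indexed by term id, which plays the role of `term_bit_locations_`. `bit_buffer`, `compress_postings` and `postings_for` are hypothetical in-memory stand-ins, not MeTA functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// In-memory bit buffer standing in for the compressed postings file.
class bit_buffer
{
  public:
    void write_bit(bool b) { bits_.push_back(b); }
    bool read_bit(std::size_t pos) const { return bits_[pos]; }
    std::size_t size() const { return bits_.size(); }

  private:
    std::vector<bool> bits_;
};

// Elias-gamma code for x >= 1: floor(log2 x) zeros, then x in binary.
void write_gamma(bit_buffer& out, uint64_t x)
{
    int nbits = 0;
    for (uint64_t v = x; v > 1; v >>= 1)
        ++nbits;
    for (int i = 0; i < nbits; ++i)
        out.write_bit(false);
    for (int i = nbits; i >= 0; --i)
        out.write_bit((x >> i) & 1);
}

uint64_t read_gamma(const bit_buffer& in, std::size_t& pos)
{
    int nbits = 0;
    while (!in.read_bit(pos++))
        ++nbits;
    uint64_t x = 1; // the leading 1 bit was just consumed
    for (int i = 0; i < nbits; ++i)
        x = (x << 1) | static_cast<uint64_t>(in.read_bit(pos++));
    return x;
}

// Encodes each term's ascending doc ids as gamma-coded gaps and returns,
// per term id, the bit offset where that term's postings start.
std::vector<uint64_t> compress_postings(
    const std::vector<std::vector<uint64_t>>& doc_lists, bit_buffer& out)
{
    std::vector<uint64_t> bit_locations;
    for (const auto& docs : doc_lists)
    {
        bit_locations.push_back(out.size());
        write_gamma(out, docs.size() + 1); // +1: gamma needs x >= 1
        uint64_t prev = 0;
        for (uint64_t doc : docs)
        {
            write_gamma(out, doc - prev + 1); // +1: the first gap may be 0
            prev = doc;
        }
    }
    return bit_locations;
}

// Decodes one term's doc ids by jumping straight to its bit location.
std::vector<uint64_t> postings_for(const bit_buffer& in,
                                   const std::vector<uint64_t>& bit_locations,
                                   uint64_t term_id)
{
    std::size_t pos = bit_locations.at(term_id);
    uint64_t n = read_gamma(in, pos) - 1;
    std::vector<uint64_t> docs;
    uint64_t prev = 0;
    for (uint64_t i = 0; i < n; ++i)
    {
        prev += read_gamma(in, pos) - 1;
        docs.push_back(prev);
    }
    return docs;
}
```

Gaps between sorted ids are small numbers, and a gamma code spends 2*floor(log2 x) + 1 bits on x, so terms' lists rarely end on a byte boundary; that is why such a file is naturally indexed by bit position.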