ModErn Text Analysis
META Enumerates Textual Applications
meta::index::inverted_index::impl Class Reference

Implementation of an inverted_index.

Public Member Functions

 impl (inverted_index *parent, const cpptoml::table &config)
 Constructs an inverted_index impl.
 
void tokenize_docs (corpus::corpus *docs, chunk_handler< inverted_index > &handler)
 
void create_lexicon (const std::string &postings_file, const std::string &lexicon_file)
 Creates the lexicon file (or "dictionary") which has pointers into the large postings file.
 
void compress (const std::string &filename, uint64_t num_unique_terms)
 Compresses the large postings file.
 

Public Attributes

std::unique_ptr< analyzers::analyzer > analyzer_
 The analyzer used to tokenize documents.
 
util::optional< util::disk_vector< uint64_t > > term_bit_locations_
 PrimaryKey -> postings location.
 
uint64_t total_corpus_terms_
 The total number of term occurrences in the entire corpus.
 

Private Attributes

inverted_index * idx_
 Pointer to the inverted_index this is an implementation of.
 

Detailed Description

Implementation of an inverted_index.
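
The arrangement described here, a private implementation class that keeps a pointer back to the index it implements, resembles the pimpl idiom. The following is a minimal sketch under that assumption, not the library's actual code: `toy_index`, `add_terms`, `total_terms`, and `parent` are hypothetical names introduced for illustration, while `idx_` and `total_corpus_terms_` mirror the members listed on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for inverted_index; not the real MeTA class.
class toy_index
{
  public:
    toy_index();
    ~toy_index(); // defined below, once impl is a complete type

    void add_terms(uint64_t count);
    uint64_t total_terms() const;

  private:
    class impl; // only forward-declared in the "header" part
    std::unique_ptr<impl> impl_;
};

// The implementation class holds the state and a back-pointer to its owner.
class toy_index::impl
{
  public:
    explicit impl(toy_index* parent) : idx_{parent}
    {
        assert(parent != nullptr);
    }

    toy_index* parent() const
    {
        return idx_;
    }

    // Total number of term occurrences seen so far.
    uint64_t total_corpus_terms_ = 0;

  private:
    // Pointer to the index this object is an implementation of.
    toy_index* idx_;
};

toy_index::toy_index() : impl_{new impl{this}}
{
}

toy_index::~toy_index() = default;

void toy_index::add_terms(uint64_t count)
{
    impl_->total_corpus_terms_ += count;
}

uint64_t toy_index::total_terms() const
{
    return impl_->total_corpus_terms_;
}
```

Keeping the state in `impl` means the public class only exposes an opaque pointer, so changes to the implementation members do not force clients of the public header to recompile.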

Constructor & Destructor Documentation

meta::index::inverted_index::impl::impl ( inverted_index *  parent,
const cpptoml::table &  config 
)

Constructs an inverted_index impl.

Parameters
parent  The parent of this impl
config  The config group

Member Function Documentation

void meta::index::inverted_index::impl::tokenize_docs ( corpus::corpus *  docs,
chunk_handler< inverted_index > &  handler 
)

Parameters
docs     The documents to be tokenized
handler  The chunk handler for this index

Returns
the number of chunks created

void meta::index::inverted_index::impl::create_lexicon ( const std::string &  postings_file,
const std::string &  lexicon_file 
)

Creates the lexicon file (or "dictionary") which has pointers into the large postings file.

Parameters
postings_file
lexicon_file
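
As an illustration of a lexicon that holds pointers into a postings file, the sketch below records the byte offset at which each term's postings record begins and then uses that offset to seek directly to one record. The on-disk format is an assumption made for this example only (one text record per line, line i belonging to term id i, `'\n'` line endings); the real postings and lexicon formats are not described on this page. `build_lexicon` and `read_postings` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed toy format: one postings record per line, where line i holds the
// postings for term id i. Returns the starting byte offset of every record.
std::vector<uint64_t> build_lexicon(std::istream& postings)
{
    std::vector<uint64_t> offsets;
    std::string line;
    uint64_t pos = 0;
    while (std::getline(postings, line))
    {
        offsets.push_back(pos);
        pos += line.size() + 1; // +1 for the '\n' that getline consumed
    }
    return offsets;
}

// Seeks straight to the record for term_id instead of scanning the stream.
std::string read_postings(std::istream& postings,
                          const std::vector<uint64_t>& lexicon,
                          uint64_t term_id)
{
    assert(term_id < lexicon.size());
    postings.clear(); // reset eof/fail bits left over from a previous scan
    postings.seekg(static_cast<std::streamoff>(lexicon[term_id]));
    std::string line;
    std::getline(postings, line);
    return line;
}
```

With a `std::istringstream` holding three records, `build_lexicon` yields one offset per record, and `read_postings(in, lexicon, 1)` returns the second record without reading the first.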

Member Data Documentation

util::optional<util::disk_vector<uint64_t> > meta::index::inverted_index::impl::term_bit_locations_

PrimaryKey -> postings location.

Each index corresponds to a PrimaryKey (uint64_t).
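
A small sketch of how such a table might be consulted: the vector is indexed by term id, and the stored value is treated as a position in the postings file. The member name suggests the positions are measured in bits, but that is an assumption here, as are the `bit_location` struct and `locate` function, which are hypothetical. A plain `std::vector` stands in for `util::disk_vector`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: a bit position split into a byte offset and
// a bit index (0-7) within that byte.
struct bit_location
{
    uint64_t byte;
    uint8_t bit;
};

// Looks up the stored position for term_id and, assuming it counts bits,
// converts it into a byte offset plus a bit index.
bit_location locate(const std::vector<uint64_t>& term_bit_locations,
                    uint64_t term_id)
{
    assert(term_id < term_bit_locations.size());
    const uint64_t bits = term_bit_locations[term_id];
    return {bits / 8, static_cast<uint8_t>(bits % 8)};
}
```

Because the term id is the vector index, a lookup is a single array access with no searching.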


The documentation for this class was generated from the following file: