The implementation of a disk_index.
#include <disk_index_impl.h>
static const std::vector< const char * > files
Filenames used in the index.
void meta::index::disk_index::disk_index_impl::initialize_metadata(uint64_t num_docs = 0)
Initializes the following metadata maps: doc_sizes_, labels_, unique_terms_.
- Parameters
-
num_docs | The number of documents stored in the index |
void meta::index::disk_index::disk_index_impl::load_doc_sizes(uint64_t num_docs = 0)
Loads the doc sizes.
- Parameters
-
num_docs | The number of documents stored in the index |
void meta::index::disk_index::disk_index_impl::load_labels(uint64_t num_docs = 0)
Loads the doc labels.
- Parameters
-
num_docs | The number of documents stored in the index |
void meta::index::disk_index::disk_index_impl::load_unique_terms(uint64_t num_docs = 0)
Loads the unique terms per document.
- Parameters
-
num_docs | The number of documents stored in the index |
string_list_writer meta::index::disk_index::disk_index_impl::make_doc_id_writer(uint64_t num_docs) const
void meta::index::disk_index::disk_index_impl::set_label(doc_id id, const class_label & label)
Sets the label for a document.
- Parameters
-
id | The document id |
label | The new label |
void meta::index::disk_index::disk_index_impl::set_length(doc_id id, uint64_t length)
Sets the size of a document.
- Parameters
-
id | The document id |
length | The number of terms that will appear in the document |
void meta::index::disk_index::disk_index_impl::set_unique_terms(doc_id id, uint64_t terms)
Sets the number of unique terms for a document.
- Parameters
-
id | The document id |
terms | The number of unique terms that will appear in the document |
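The setters above can be pictured as writes into arrays indexed by doc_id. The following is a minimal sketch under that assumption, not the library's implementation: `doc_metadata`, its constructor, and the two read accessors are hypothetical names, `doc_id` is replaced by a plain integer alias, and plain in-memory vectors stand in for whatever storage the real class uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using doc_id = std::uint64_t; // stand-in for the real doc_id type

// Hypothetical container holding per-document metadata in memory.
class doc_metadata
{
  public:
    // Sizes both arrays so that every doc_id in [0, num_docs) is valid.
    explicit doc_metadata(std::uint64_t num_docs = 0)
        : doc_sizes_(num_docs, 0), unique_terms_(num_docs, 0)
    {
    }

    // Records the total number of terms in document `id`.
    void set_length(doc_id id, std::uint64_t length)
    {
        assert(id < doc_sizes_.size());
        doc_sizes_[id] = length;
    }

    // Records the number of distinct terms in document `id`.
    void set_unique_terms(doc_id id, std::uint64_t terms)
    {
        assert(id < unique_terms_.size());
        unique_terms_[id] = terms;
    }

    // Hypothetical read accessors, added only to make the sketch testable.
    std::uint64_t doc_size(doc_id id) const { return doc_sizes_.at(id); }
    std::uint64_t unique_terms(doc_id id) const { return unique_terms_.at(id); }

  private:
    std::vector<std::uint64_t> doc_sizes_;
    std::vector<std::uint64_t> unique_terms_;
};
```

In this sketch the arrays are sized once up front, which mirrors the role of the num_docs parameter taken by initialize_metadata.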
const io::mmap_file & meta::index::disk_index::disk_index_impl::postings() const
- Returns
- the mmap file for the postings.
uint64_t meta::index::disk_index::disk_index_impl::total_unique_terms() const
- Returns
- the total number of unique terms in the index.
label_id meta::index::disk_index::disk_index_impl::doc_label_id(doc_id id) const
- Returns
- the label id for a given document.
- Parameters
-
id | The document id |
std::vector< class_label > meta::index::disk_index::disk_index_impl::class_labels() const
- Returns
- the possible class labels for this index
label_id meta::index::disk_index::disk_index_impl::get_label_id(const class_label & lbl)
private
- Parameters
-
lbl | the string class label to find the id for |
- Returns
- the label_id of a class_label, creating a new one if necessary
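The "creating a new one if necessary" behavior is a get-or-create lookup. A minimal sketch of that pattern follows; it is an illustration, not the library's code. The `label_table` class is hypothetical, `class_label` and `label_id` are replaced by plain aliases, and starting ids at 0 is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using class_label = std::string;  // stand-in for the real class_label type
using label_id = std::uint32_t;   // stand-in for the real label_id type

// Hypothetical table that hands out one id per distinct label.
class label_table
{
  public:
    // Returns the id for lbl, assigning the next free id if lbl is new.
    label_id get_label_id(const class_label& lbl)
    {
        auto it = ids_.find(lbl);
        if (it != ids_.end())
            return it->second;
        auto id = static_cast<label_id>(labels_.size());
        ids_.emplace(lbl, id);
        labels_.push_back(lbl);
        return id;
    }

    // All labels seen so far, in the order their ids were assigned.
    const std::vector<class_label>& class_labels() const { return labels_; }

  private:
    std::unordered_map<class_label, label_id> ids_;
    std::vector<class_label> labels_;
};
```

Because the lookup may insert, the function cannot be const, which matches the non-const signature documented above.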
const std::vector< const char * > meta::index::disk_index::disk_index_impl::files
static
Initial value:
= {"/docids.mapping", "/docids.mapping_index", "/docsizes.counts",
"/docs.labels", "/docs.uniqueterms", "/labelids.mapping",
"/postings.index", "/termids.mapping", "/termids.mapping.inverse"}
Filenames used in the index.
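Each entry begins with a slash, which suggests the names are appended to the index directory to form full paths. The sketch below works under that assumption; `index_paths` is a hypothetical helper, not a documented member, and the list is copied from the initial value shown above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same suffixes as the documented initial value of `files`.
static const std::vector<const char*> files
    = {"/docids.mapping", "/docids.mapping_index", "/docsizes.counts",
       "/docs.labels", "/docs.uniqueterms", "/labelids.mapping",
       "/postings.index", "/termids.mapping", "/termids.mapping.inverse"};

// Hypothetical helper: joins an index directory with every suffix to
// produce the full path of each file.
std::vector<std::string> index_paths(const std::string& index_dir)
{
    assert(!index_dir.empty());
    std::vector<std::string> paths;
    paths.reserve(files.size());
    for (const char* suffix : files)
        paths.push_back(index_dir + suffix);
    return paths;
}
```

A list like this can also be used to check whether an index directory is complete before attempting to load it.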
doc_id -> document path mapping.
Each index corresponds to a doc_id (uint64_t).
doc_id -> document length mapping.
Each index corresponds to a doc_id (uint64_t).
Maps which class a document belongs to (if any).
Each index corresponds to a doc_id (uint64_t).
Holds how many unique terms there are per-document.
This is sort of like an inverse IDF. For a forward_index, this field is certainly redundant, though it can save querying the postings file. Each index corresponds to a doc_id (uint64_t).
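To make the stored quantity concrete: the unique-term count of a document is the number of distinct terms it contains, regardless of how often each occurs. The sketch below computes that value from a tokenized document; `count_unique_terms` is a hypothetical helper for illustration, and representing terms as strings is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Counts the distinct terms in a tokenized document.
std::uint64_t count_unique_terms(const std::vector<std::string>& tokens)
{
    assert(tokens.size() < UINT64_MAX);
    std::unordered_set<std::string> seen(tokens.begin(), tokens.end());
    return seen.size();
}
```

Storing this number per document means it can be read back directly instead of being recomputed from the postings data.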
A pointer to a memory-mapped postings file.
It is a pointer so that its initialization can be delayed until the postings file has been created, which is necessary in some cases.
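Delayed initialization through a pointer can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `mapped_file` is a hypothetical stand-in for io::mmap_file that only records a path, `lazy_postings`, `load`, and `loaded` are made-up names, and `std::make_unique` requires C++14.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for io::mmap_file; it only records the path.
struct mapped_file
{
    explicit mapped_file(std::string p) : path(std::move(p)) {}
    std::string path;
};

class lazy_postings
{
  public:
    // Called once the postings file exists on disk.
    void load(const std::string& path)
    {
        postings_ = std::make_unique<mapped_file>(path);
    }

    bool loaded() const { return postings_ != nullptr; }

    // Returns the mapped file, or throws if load() has not been called.
    const mapped_file& postings() const
    {
        if (!postings_)
            throw std::logic_error{"postings file not loaded yet"};
        return *postings_;
    }

  private:
    // Null until load() runs, so the object can exist before the file does.
    std::unique_ptr<mapped_file> postings_;
};
```

Holding the file by pointer lets the owning object be constructed first and the mapping attached later, which a by-value member would not allow without a default-constructible file type.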