ModErn Text Analysis
META Enumerates Textual Applications
vocabulary_map_writer.h
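#ifndef META_VOCABULARY_MAP_WRITER_H_
#define META_VOCABULARY_MAP_WRITER_H_

#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

namespace meta
{
namespace index
{

/// Writes the B+-tree-like data structure used for storing the term id
/// mapping.
class vocabulary_map_writer
{
  public:
    /// Creates a writer for a tree at the given path and block_size.
    vocabulary_map_writer(const std::string& path, uint16_t block_size = 4096);

    /// Flushes the last leaf node and builds the tree's internal structure.
    ~vocabulary_map_writer();

    /// Inserts this term into the map.
    void insert(const std::string& term);

    /// An exception that can be thrown during the building of the tree.
    class vocabulary_map_writer_exception : public std::runtime_error
    {
        using std::runtime_error::runtime_error;
    };

  private:
    /// Writes null bytes to fill up the current block.
    void write_padding();

    /// Flushes a node to disk after writing the padding bytes.
    void flush();

    /// The file containing the forward mapping tree.
    std::ofstream file_;

    /// The current write position in the forward mapping tree file.
    uint64_t file_write_pos_;

    /// The file containing the reverse mapping.
    std::ofstream inverse_file_;

    /// The path to the tree file.
    std::string path_;

    /// The block size of every node in the tree, in bytes.
    uint16_t block_size_;

    /// The total number of terms inserted so far.
    uint64_t num_terms_;

    /// The remaining space in the block currently being written.
    uint16_t remaining_block_space_;

    /// Number of written nodes to be "merged" when writing the next level.
    uint64_t written_nodes_;
};
}
}
#endif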
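
The nested exception type in the listing declares no constructors of its own; the using-declaration inherits them from std::runtime_error. The following is a minimal sketch of that language feature only. The names toy_writer, toy_writer_exception, check_term, error_for, and example are hypothetical and are not part of MeTA.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the writer class. Only the shape of the nested
// exception type mirrors the listing; everything else is illustrative.
class toy_writer
{
  public:
    class toy_writer_exception : public std::runtime_error
    {
        // Inherits runtime_error's constructors. Their accessibility follows
        // the base class, so they are usable although this line sits in the
        // class's default-private section.
        using std::runtime_error::runtime_error;
    };

    // Hypothetical check: rejects an empty term by throwing.
    static void check_term(const std::string& term)
    {
        if (term.empty())
            throw toy_writer_exception("empty term");
    }
};

// Returns the message of the exception thrown for `term`, or "" if none.
inline std::string error_for(const std::string& term)
{
    try
    {
        toy_writer::check_term(term);
    }
    catch (const std::runtime_error& e)
    {
        // The nested type is caught here through its std::runtime_error base.
        return e.what();
    }
    return "";
}

// Usage: an empty term produces the message, a non-empty one does not.
inline void example()
{
    assert(error_for("") == "empty term");
    assert(error_for("apple").empty());
}
```

Callers can therefore catch either the specific nested type or std::runtime_error, depending on how narrowly they want to handle failures.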
void insert(const std::string &term)
Inserts this term into the map.
Definition: vocabulary_map_writer.cpp:33
meta::index::vocabulary_map_writer::vocabulary_map_writer_exception
An exception that can be thrown during the building of the tree.
Definition: vocabulary_map_writer.h:92
std::ofstream inverse_file_
The file containing the reverse mapping.
Definition: vocabulary_map_writer.h:119
uint64_t written_nodes_
Number of written nodes to be "merged" when writing the next level.
Definition: vocabulary_map_writer.h:134
std::string path_
The path to the tree file.
Definition: vocabulary_map_writer.h:122
void flush()
Flushes a node to disk after writing the padding bytes.
Definition: vocabulary_map_writer.cpp:73
~vocabulary_map_writer()
The destructor for a vocabulary_map_writer flushes the last leaf node and builds the internal structu...
Definition: vocabulary_map_writer.cpp:80
vocabulary_map_writer(const std::string &path, uint16_t block_size=4096)
Creates a writer for a tree at the given path and block_size.
Definition: vocabulary_map_writer.cpp:17
void write_padding()
Writes null bytes to fill up the current block.
Definition: vocabulary_map_writer.cpp:61
uint16_t block_size_
The block size of every node in the tree, in bytes.
Definition: vocabulary_map_writer.h:125
uint64_t file_write_pos_
The current write position in the forward mapping tree file.
Definition: vocabulary_map_writer.h:116
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
std::ofstream file_
The file containing the forward mapping tree.
Definition: vocabulary_map_writer.h:109
uint16_t remaining_block_space_
The remaining space in the block currently being written.
Definition: vocabulary_map_writer.h:131
uint64_t num_terms_
The total number of terms inserted so far.
Definition: vocabulary_map_writer.h:128
meta::index::vocabulary_map_writer
A class that writes the B+-tree-like data structure used for storing the term id mapping in an index...
Definition: vocabulary_map_writer.h:57
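
The members block_size_, remaining_block_space_, and write_padding() suggest fixed-size blocks that are filled with null bytes once the next entry no longer fits. The sketch below illustrates that general idea under stated assumptions; it is not MeTA's implementation or on-disk format. The class toy_block_writer, the helper write_terms, and the null-terminated record layout are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Toy writer, NOT MeTA's format: each term is stored as its bytes plus a
// terminating '\0', and no term straddles a block boundary.
class toy_block_writer
{
  public:
    toy_block_writer(std::ostream& out, std::uint16_t block_size)
        : out_(out), block_size_(block_size), remaining_(block_size)
    {
    }

    // Pads the final, partially filled block so that the output is a whole
    // number of blocks.
    ~toy_block_writer()
    {
        if (remaining_ != block_size_)
            write_padding();
    }

    void insert(const std::string& term)
    {
        const std::size_t needed = term.size() + 1; // bytes plus '\0'
        if (needed > block_size_)
            throw std::runtime_error("term does not fit in one block");
        if (needed > remaining_)
            write_padding(); // fill the current block and start a new one
        assert(needed <= remaining_);
        out_.write(term.c_str(), static_cast<std::streamsize>(needed));
        remaining_ = static_cast<std::uint16_t>(remaining_ - needed);
    }

  private:
    // Writes null bytes to fill up the current block.
    void write_padding()
    {
        for (std::uint16_t i = 0; i < remaining_; ++i)
            out_.put('\0');
        remaining_ = block_size_;
    }

    std::ostream& out_;
    std::uint16_t block_size_;
    std::uint16_t remaining_; // space left in the block being written
};

// Writes `terms` with the given block size and returns the raw bytes.
inline std::string write_terms(const std::vector<std::string>& terms,
                               std::uint16_t block_size)
{
    std::ostringstream out;
    {
        toy_block_writer writer(out, block_size);
        for (const auto& term : terms)
            writer.insert(term);
    } // the destructor pads the last block here
    return out.str();
}
```

With a block size of 8, writing "ab", "cde", and "f" fills 7 bytes of the first block, pads 1 byte, and places "f" at the start of the second block, which the destructor then pads to the end. Doing the final padding in the destructor mirrors the listing's design, where destroying the writer is what finishes the file.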