ModErn Text Analysis
META Enumerates Textual Applications
vocabulary_map.h
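```cpp
#ifndef META_VOCABULARY_MAP_H_
#define META_VOCABULARY_MAP_H_

#include "io/mmap_file.h"
#include "util/disk_vector.h"

namespace meta
{
namespace util
{
template <class>
class optional;
}
}

namespace meta
{
namespace index
{

/**
 * A read-only view of a B+-tree-like structure that stores the vocabulary
 * for an index.
 */
class vocabulary_map
{
  private:
    /// The file containing the tree.
    io::mmap_file file_;

    /// Byte positions for each term in the leaves, allowing reverse lookup
    /// of a term's string.
    util::disk_vector<uint64_t> inverse_;

    /// The size of the nodes in the tree.
    uint64_t block_size_;

    /// The ending position of the leaf nodes.
    uint64_t leaf_end_pos_;

    /// The position of the first internal node that is not the root.
    uint64_t initial_seek_pos_;

    /// Convenience wrapper for comparing the term with strings in the tree.
    int compare(const std::string& term, const char* other) const;

  public:
    /// Creates a vocabulary map reading the file in the given path with the
    /// given block size.
    vocabulary_map(const std::string& path, uint16_t block_size = 4096);

    /// Move constructs a vocabulary_map.
    vocabulary_map(vocabulary_map&&) = default;

    /// Move assigns a vocabulary_map.
    vocabulary_map& operator=(vocabulary_map&&) = default;

    /// Finds the given term in the tree, if it exists.
    util::optional<term_id> find(const std::string& term) const;

    /// Finds the term associated with the given id.
    std::string find_term(term_id t_id) const;

    /// The number of terms in the map.
    uint64_t size() const;
};
}
}

#endif
```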
class util::optional
A class for representing optional values.
Definition: vocabulary_map.h:21
uint64_t initial_seek_pos_
The position of the first internal node that is not the root.
Definition: vocabulary_map.h:65
class io::mmap_file
Memory maps a text file read-only.
Definition: mmap_file.h:24
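For readers unfamiliar with memory mapping, the following is a minimal sketch of a read-only file mapping built on the POSIX calls open, fstat, mmap, and munmap. The class readonly_map and its members are hypothetical; this is not MeTA's io::mmap_file, whose interface and error handling may differ.

```cpp
// Minimal read-only file mapping with POSIX calls. Illustrative only:
// this is not MeTA's io::mmap_file.
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

class readonly_map
{
  public:
    explicit readonly_map(const std::string& path)
    {
        fd_ = ::open(path.c_str(), O_RDONLY);
        if (fd_ < 0)
            throw std::runtime_error{"cannot open " + path};

        struct stat st;
        if (::fstat(fd_, &st) != 0)
        {
            ::close(fd_);
            throw std::runtime_error{"cannot stat " + path};
        }
        size_ = static_cast<std::size_t>(st.st_size);

        // mmap rejects zero-length mappings, so empty files stay unmapped.
        if (size_ > 0)
        {
            void* addr = ::mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd_, 0);
            if (addr == MAP_FAILED)
            {
                ::close(fd_);
                throw std::runtime_error{"cannot mmap " + path};
            }
            data_ = static_cast<const char*>(addr);
        }
    }

    // The object owns a descriptor and a mapping, so copying is disabled.
    readonly_map(const readonly_map&) = delete;
    readonly_map& operator=(const readonly_map&) = delete;

    ~readonly_map()
    {
        if (data_ != nullptr)
            ::munmap(const_cast<char*>(data_), size_);
        ::close(fd_);
    }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // Unchecked access to byte i of the file.
    char operator[](std::size_t i) const { return data_[i]; }

  private:
    int fd_ = -1;
    std::size_t size_ = 0;
    const char* data_ = nullptr;
};
```

Because the mapping is PROT_READ, any attempt to write through data() would fault, which matches the read-only use a vocabulary lookup needs.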
int compare(const std::string &term, const char *other) const
Convenience wrapper for comparing the term with strings in the tree.
Definition: vocabulary_map.cpp:78
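The comparison wrapper can be pictured as a strcmp-style three-way comparison between the search term and a key read from a node. The helpers compare_term and first_not_less are hypothetical, and the assumption that a node stores its keys as packed, sorted, null-terminated strings is made for illustration only; this header does not confirm the node layout.

```cpp
// strcmp-style comparison of a search term against a null-terminated key,
// as might be read from a node in a mapped file. Hypothetical helpers.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Negative if term sorts before other, zero if equal, positive otherwise.
int compare_term(const std::string& term, const char* other)
{
    return term.compare(other);
}

// Scans n_keys packed, sorted, null-terminated keys in buf and returns the
// index of the first key that is not less than term (n_keys if none is).
std::size_t first_not_less(const std::string& term, const char* buf,
                           std::size_t n_keys)
{
    const char* key = buf;
    for (std::size_t i = 0; i < n_keys; ++i)
    {
        if (compare_term(term, key) <= 0)
            return i;
        key += std::strlen(key) + 1; // skip the key and its terminator
    }
    return n_keys;
}
```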
util::disk_vector< uint64_t > inverse_
Byte positions for each term in the leaves to allow for reverse lookup of the string associated wit...
Definition: vocabulary_map.h:48
std::string find_term(term_id t_id) const
Finds the term associated with the given id.
Definition: vocabulary_map.cpp:83
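Reverse lookup through a table of byte offsets, the role described for inverse_ above, can be sketched as follows. A std::vector and a std::string stand in for util::disk_vector and the mapped file, term ids are assumed to be dense starting at 0, and the reverse_index class is hypothetical.

```cpp
// Reverse lookup (id -> term) through per-term byte offsets into a buffer
// of null-terminated strings. Hypothetical stand-in types.
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class reverse_index
{
  public:
    reverse_index(std::string buffer, std::vector<uint64_t> offsets)
        : buffer_{std::move(buffer)}, offsets_{std::move(offsets)}
    {
    }

    // Returns the string that starts at the byte position stored for t_id.
    std::string find_term(uint64_t t_id) const
    {
        if (t_id >= offsets_.size())
            throw std::out_of_range{"term id out of range"};
        return std::string{buffer_.data() + offsets_[t_id]};
    }

    // One offset per term, so the table length is the term count.
    uint64_t size() const { return offsets_.size(); }

  private:
    std::string buffer_;
    std::vector<uint64_t> offsets_;
};
```

With this layout, find_term is a single indexed read plus a string copy, independent of the vocabulary size.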
uint64_t size() const
The number of terms in the map.
Definition: vocabulary_map.cpp:88
class vocabulary_map
A read-only view of a B+-tree-like structure that stores the vocabulary for an index.
Definition: vocabulary_map.h:36
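To illustrate the general idea behind a B+-tree-like lookup, here is a toy in-memory version: internal nodes hold sorted separator keys and child indices, and only leaves hold (term, id) pairs. The node struct and lookup function are hypothetical and do not reflect vocabulary_map's on-disk format.

```cpp
// Toy in-memory B+-tree-like lookup. Separator keys[i] of an internal node
// is assumed to be the first key stored under child i + 1.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct node
{
    bool leaf;
    std::vector<std::string> keys;     // sorted
    std::vector<std::size_t> children; // internal only: keys.size() + 1
    std::vector<uint64_t> ids;         // leaf only: one id per key
};

// Descends from `root` to a leaf, then searches that leaf for `term`.
std::optional<uint64_t> lookup(const std::vector<node>& nodes,
                               std::size_t root, const std::string& term)
{
    std::size_t cur = root;
    while (!nodes[cur].leaf)
    {
        const auto& keys = nodes[cur].keys;
        // Child i covers [keys[i-1], keys[i]); upper_bound selects it.
        auto it = std::upper_bound(keys.begin(), keys.end(), term);
        cur = nodes[cur].children[static_cast<std::size_t>(it - keys.begin())];
    }
    const auto& leaf = nodes[cur];
    auto it = std::lower_bound(leaf.keys.begin(), leaf.keys.end(), term);
    if (it == leaf.keys.end() || *it != term)
        return std::nullopt;
    return leaf.ids[static_cast<std::size_t>(it - leaf.keys.begin())];
}
```

If separators were instead the last key of each left child, std::lower_bound would be the right choice for the descent.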
vocabulary_map & operator=(vocabulary_map &&)=default
Move assigns a vocabulary_map.
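Because vocabulary_map declares a defaulted move constructor and move assignment operator, its implicit copy constructor and copy assignment operator are defined as deleted, so the class is move-only. The hypothetical move_only_view below follows the same pattern, with a std::unique_ptr standing in for a non-copyable resource such as a mapped file.

```cpp
// Declaring the move operations suppresses the implicit copy operations.
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

class move_only_view
{
  public:
    explicit move_only_view(std::string path)
        : path_{std::make_unique<std::string>(std::move(path))}
    {
    }

    move_only_view(move_only_view&&) = default;
    move_only_view& operator=(move_only_view&&) = default;

    // A moved-from object holds a null pointer and reports itself invalid.
    bool valid() const { return path_ != nullptr; }
    const std::string& path() const { return *path_; }

  private:
    std::unique_ptr<std::string> path_;
};

static_assert(std::is_move_constructible<move_only_view>::value, "movable");
static_assert(std::is_move_assignable<move_only_view>::value, "movable");
static_assert(!std::is_copy_constructible<move_only_view>::value, "no copy");
static_assert(!std::is_copy_assignable<move_only_view>::value, "no copy");
```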
util::optional< term_id > find(const std::string &term) const
Finds the given term in the tree, if it exists.
Definition: vocabulary_map.cpp:31
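Since find returns util::optional<term_id>, callers must handle terms that are absent from the vocabulary. A caller-side sketch, with std::map standing in for the tree, std::optional for util::optional, and a hypothetical toy_term_id alias for term_id:

```cpp
// Consuming an optional-returning lookup. All names are hypothetical
// stand-ins for vocabulary_map::find and term_id.
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using toy_term_id = uint64_t;

std::optional<toy_term_id> find_id(const std::map<std::string, toy_term_id>& vocab,
                                   const std::string& term)
{
    auto it = vocab.find(term);
    if (it == vocab.end())
        return std::nullopt; // term is not in the vocabulary
    return it->second;
}

// Maps query terms to ids, dropping out-of-vocabulary terms.
std::vector<toy_term_id> to_ids(const std::map<std::string, toy_term_id>& vocab,
                                const std::vector<std::string>& query)
{
    std::vector<toy_term_id> ids;
    for (const auto& term : query)
        if (auto id = find_id(vocab, term))
            ids.push_back(*id);
    return ids;
}
```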
namespace meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
io::mmap_file file_
The file containing the tree.
Definition: vocabulary_map.h:42
vocabulary_map(const std::string &path, uint16_t block_size=4096)
Creates a vocabulary map reading the file in the given path with the given block size.
Definition: vocabulary_map.cpp:15
uint64_t block_size_
The size of the nodes in the tree.
Definition: vocabulary_map.h:53
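With fixed-size nodes, locating a node reduces to integer arithmetic. The sketch assumes, hypothetically, that nodes are stored back to back starting at byte 0; the helper names are illustrative. Note also that the constructor's uint16_t parameter means the largest block size that can be passed in is 65,535 bytes.

```cpp
// Fixed-size node arithmetic under an assumed back-to-back layout.
#include <cassert>
#include <cstdint>

// Byte offset at which node `index` begins.
uint64_t node_offset(uint64_t index, uint64_t block_size)
{
    return index * block_size;
}

// Index of the node that contains byte position `pos`.
uint64_t node_index(uint64_t pos, uint64_t block_size)
{
    assert(block_size > 0);
    return pos / block_size;
}

// Number of whole nodes in a file of `file_size` bytes.
uint64_t node_count(uint64_t file_size, uint64_t block_size)
{
    assert(block_size > 0);
    return file_size / block_size;
}
```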
uint64_t leaf_end_pos_
The ending position of the leaf nodes.
Definition: vocabulary_map.h:59
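If, as assumed here for illustration, all leaf nodes are written first and occupy the byte range [0, leaf_end_pos), then one comparison tells whether a position falls in a leaf, and the leaves can be enumerated for a sequential scan of every term. The function names and the layout are hypothetical.

```cpp
// Hypothetical layout: leaves occupy [0, leaf_end_pos), internal nodes follow.
#include <cassert>
#include <cstdint>
#include <vector>

// True if a node starting at byte `pos` is a leaf under the assumed layout.
bool is_leaf_position(uint64_t pos, uint64_t leaf_end_pos)
{
    return pos < leaf_end_pos;
}

// Starting offsets of every leaf, e.g. for a sequential scan of all terms.
std::vector<uint64_t> leaf_offsets(uint64_t leaf_end_pos, uint64_t block_size)
{
    assert(block_size > 0);
    std::vector<uint64_t> offsets;
    for (uint64_t pos = 0; pos < leaf_end_pos; pos += block_size)
        offsets.push_back(pos);
    return offsets;
}
```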