ModErn Text Analysis
META Enumerates Textual Applications
score_data.h
1 
10 #ifndef META_SCORE_DATA_H_
11 #define META_SCORE_DATA_H_
12 
13 #include "meta.h"
14 
15 namespace meta
16 {
17 
18 namespace corpus
19 {
20 class document;
21 }
22 
23 namespace index
24 {
25 class inverted_index;
26 }
27 }
28 
29 namespace meta
30 {
31 namespace index
32 {
33 
39 struct score_data
40 {
41  // general info
42 
44  inverted_index& idx;
46  double avg_dl;
48  uint64_t num_docs;
50  uint64_t total_terms;
52  const corpus::document& query;
53 
54  // term-based info
55 
57  term_id t_id;
59  uint64_t query_term_count;
61  uint64_t doc_count;
63  uint64_t corpus_term_count;
64 
65  // document-based info
66 
68  doc_id d_id;
70  uint64_t doc_term_count;
72  uint64_t doc_size;
74  uint64_t doc_unique_terms;
75 
84  score_data(inverted_index& p_idx, double p_avg_dl, uint64_t p_num_docs,
85  uint64_t p_total_terms, const corpus::document& p_query)
86  : idx(p_idx), // gcc no non-const ref init from brace init list
87  avg_dl{p_avg_dl},
88  num_docs{p_num_docs},
89  total_terms{p_total_terms},
90  query(p_query) // gcc no non-const ref init from brace init list
91  {
92  /* nothing */
93  }
94 };
95 }
96 }
97 
98 #endif
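
The constructor above fills in only the references and the corpus-wide statistics; the term-based and document-based members are left for the caller to assign. The following is a minimal sketch of that usage pattern, not code from the toolkit. `toy_index`, `toy_query`, `toy_score_data`, and `length_ratios` are hypothetical stand-ins invented for illustration, and the default member initializers on `d_id` and `doc_size` are an assumption that the listing does not show.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for inverted_index and corpus::document.
struct toy_index
{
    std::vector<std::uint64_t> doc_sizes; // length of each document
};

struct toy_query
{
    std::vector<std::uint64_t> term_ids;
};

// Same shape as score_data, reduced to a few members.
struct toy_score_data
{
    toy_index& idx;
    double avg_dl;
    std::uint64_t num_docs;
    const toy_query& query;

    // Not set by the constructor; the caller assigns these per document.
    // The zero defaults are an assumption made for this sketch.
    std::uint64_t d_id = 0;
    std::uint64_t doc_size = 0;

    // References use parentheses, the other members use braces,
    // mirroring the initializer list in the listing.
    toy_score_data(toy_index& p_idx, double p_avg_dl,
                   std::uint64_t p_num_docs, const toy_query& p_query)
        : idx(p_idx), avg_dl{p_avg_dl}, num_docs{p_num_docs}, query(p_query)
    {
    }
};

// Build the record once per query, then reuse it for every document.
inline std::vector<double> length_ratios(toy_index& idx, const toy_query& q)
{
    assert(!idx.doc_sizes.empty());

    std::uint64_t total = 0;
    for (auto len : idx.doc_sizes)
        total += len;
    double avg = static_cast<double>(total) / idx.doc_sizes.size();

    toy_score_data sd(idx, avg, idx.doc_sizes.size(), q);

    std::vector<double> out;
    for (std::uint64_t d = 0; d < sd.num_docs; ++d)
    {
        sd.d_id = d;                       // document-based fields are
        sd.doc_size = sd.idx.doc_sizes[d]; // overwritten on every iteration
        out.push_back(static_cast<double>(sd.doc_size) / sd.avg_dl);
    }
    return out;
}
```

Because the struct holds reference members, it cannot be default-constructed or copy-assigned, which fits a design where one record is created per query and mutated in place while documents are scored.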
uint64_t doc_unique_terms
number of unique terms in the doc
Definition: score_data.h:74
meta.h
Contains top-level namespace documentation for the META toolkit.
meta::index::inverted_index
The inverted_index class stores information on a corpus indexed by term_ids.
Definition: inverted_index.h:54
uint64_t num_docs
total number of documents
Definition: score_data.h:48
uint64_t doc_count
number of docs that t_id appears in
Definition: score_data.h:61
uint64_t corpus_term_count
number of times t_id appears in corpus
Definition: score_data.h:63
uint64_t total_terms
total number of terms in the index
Definition: score_data.h:50
score_data(inverted_index &p_idx, double p_avg_dl, uint64_t p_num_docs, uint64_t p_total_terms, const corpus::document &p_query)
Constructor to initialize most elements.
Definition: score_data.h:84
meta::corpus::document
Represents an indexable document.
Definition: document.h:31
double avg_dl
average document length
Definition: score_data.h:46
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
doc_id d_id
document id
Definition: score_data.h:68
inverted_index & idx
index queries are running on
Definition: score_data.h:44
uint64_t query_term_count
number of times t_id appears in the query
Definition: score_data.h:59
meta::index::score_data
A score_data object contains information needed to evaluate a ranking function.
Definition: score_data.h:39
const corpus::document & query
the current query
Definition: score_data.h:52
uint64_t doc_size
total number of terms in the doc
Definition: score_data.h:72
uint64_t doc_term_count
number of times the term appears in the current doc
Definition: score_data.h:70
term_id t_id
id of the current term
Definition: score_data.h:57
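
To illustrate how the members documented above could feed a ranking function, here is a hedged sketch of an Okapi BM25 term score computed from the same quantities. It is not the toolkit's own ranker. `score_fields` and `bm25_term_score` are hypothetical names, the struct copies only the numeric members of `score_data`, and the parameter defaults `k1`, `b`, and `k3` are conventional values chosen as assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the numeric members of score_data; the real
// struct also holds the idx and query references and the id members.
struct score_fields
{
    double avg_dl;                  // average document length
    std::uint64_t num_docs;         // total number of documents
    std::uint64_t doc_count;        // number of docs the term appears in
    std::uint64_t query_term_count; // count of the term in the query
    std::uint64_t doc_term_count;   // count of the term in the current doc
    std::uint64_t doc_size;         // total number of terms in the doc
};

// Okapi BM25 contribution of one (term, document) pair.
// k1, b and k3 are tuning parameters; the defaults are assumptions.
inline double bm25_term_score(const score_fields& sd, double k1 = 1.2,
                              double b = 0.75, double k3 = 500.0)
{
    assert(sd.avg_dl > 0.0 && sd.doc_count <= sd.num_docs);

    double n = static_cast<double>(sd.num_docs);
    double df = static_cast<double>(sd.doc_count);
    double tf = static_cast<double>(sd.doc_term_count);
    double qtf = static_cast<double>(sd.query_term_count);
    double len_ratio = static_cast<double>(sd.doc_size) / sd.avg_dl;

    // Adding 1 inside the log keeps the IDF non-negative for common terms.
    double idf = std::log(1.0 + (n - df + 0.5) / (df + 0.5));
    // Term frequency, dampened and normalized by relative document length.
    double tf_part = ((k1 + 1.0) * tf) / (k1 * (1.0 - b + b * len_ratio) + tf);
    // Query term frequency, dampened by k3.
    double qtf_part = ((k3 + 1.0) * qtf) / (k3 + qtf);

    return idf * tf_part * qtf_part;
}
```

A full document score would sum this value over the query's terms, with the caller updating the term-based members before each call. Passing every statistic in one record keeps the scoring function free of index lookups.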