ModErn Text Analysis
META Enumerates Textual Applications
Fills document objects with content line-by-line from gzip-compressed input files.
#include <gz_corpus.h>
Public Member Functions | |
gz_corpus (const std::string &file, std::string encoding) | |
bool | has_next () const override |
document | next () override |
uint64_t | size () const override |
Public Member Functions inherited from meta::corpus::corpus | |
corpus (std::string encoding) | |
Constructs a new corpus with the given encoding. | |
virtual | ~corpus ()=default |
Destructor. | |
const std::string & | encoding () const |
Private Attributes | |
doc_id | cur_id_ |
The current document we are on. | |
uint64_t | num_lines_ |
The number of lines in the file. | |
io::gzifstream | corpus_stream_ |
The stream for reading the corpus. | |
io::gzifstream | class_stream_ |
The stream to read the class labels. | |
io::gzifstream | name_stream_ |
The stream to read the document names. | |
Additional Inherited Members | |
Static Public Member Functions inherited from meta::corpus::corpus | |
static std::unique_ptr< corpus > | load (const std::string &config_file) |
Fills document objects with content line-by-line from gzip-compressed input files.
meta::corpus::gz_corpus::gz_corpus (const std::string &file, std::string encoding)
Parameters
file | The path to the compressed corpus file, where each line represents a document |
encoding | The encoding for the file |
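bool meta::corpus::gz_corpus::has_next () const [override, virtual]
Implements meta::corpus::corpus.
document meta::corpus::gz_corpus::next () [override, virtual]
Implements meta::corpus::corpus.
uint64_t meta::corpus::gz_corpus::size () const [override, virtual]
Implements meta::corpus::corpus.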
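The has_next() / next() / size() members suggest a pull-style iteration protocol over a file in which each line is one document. The sketch below illustrates that protocol only; it is not the gz_corpus implementation. The names simple_document, line_corpus_sketch, and read_all are hypothetical, and a std::istringstream stands in for io::gzifstream so the example compiles without the library or a compressed file. Counting the lines up front so that size() is known before iteration is an assumption suggested by the num_lines_ attribute.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for meta's document type: an id plus raw content.
struct simple_document {
    std::uint64_t id;
    std::string content;
};

// Hypothetical line-per-document corpus mirroring the documented interface.
// An in-memory string stream replaces the gzip stream (assumption).
class line_corpus_sketch {
public:
    explicit line_corpus_sketch(const std::string& text)
        : num_lines_{count_lines(text)}, cur_id_{0}, stream_{text} {}

    // True while at least one unread line (document) remains.
    bool has_next() const { return cur_id_ < num_lines_; }

    // Reads the next line and wraps it in a document with a sequential id.
    simple_document next() {
        assert(has_next());
        std::string line;
        std::getline(stream_, line);
        return {cur_id_++, line};
    }

    // Total number of documents, i.e. lines in the input.
    std::uint64_t size() const { return num_lines_; }

private:
    // Counts the lines once so size() is available before iterating.
    static std::uint64_t count_lines(const std::string& text) {
        std::istringstream in{text};
        std::string line;
        std::uint64_t n = 0;
        while (std::getline(in, line))
            ++n;
        return n;
    }

    std::uint64_t num_lines_;   // number of lines in the input
    std::uint64_t cur_id_;      // id of the next document to return
    std::istringstream stream_; // stand-in for the corpus stream
};

// Typical consumer loop: drain the corpus into a vector.
std::vector<simple_document> read_all(line_corpus_sketch& corpus) {
    std::vector<simple_document> docs;
    docs.reserve(corpus.size());
    while (corpus.has_next())
        docs.push_back(corpus.next());
    return docs;
}
```

With the real class, the same loop shape would presumably apply to an object built through the documented constructor gz_corpus(file, encoding), with next() returning a meta document rather than the stand-in struct.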