ModErn Text Analysis
META Enumerates Textual Applications
Provides an interface to multiple corpus input formats.
#include <corpus.h>
Classes | |
class | corpus_exception |
Basic exception for corpus interactions. | |
Public Member Functions | |
corpus (std::string encoding) | |
Constructs a new corpus with the given encoding. | |
virtual bool | has_next () const =0 |
virtual document | next ()=0 |
virtual uint64_t | size () const =0 |
virtual | ~corpus ()=default |
Destructor. | |
const std::string & | encoding () const |
Static Public Member Functions | |
static std::unique_ptr< corpus > | load (const std::string &config_file) |
Private Attributes | |
std::string | encoding_ |
The encoding this corpus uses. | |
Provides an interface to multiple corpus input formats.
meta::corpus::corpus::corpus | ( | std::string | encoding | ) |
Constructs a new corpus with the given encoding.
encoding | The encoding to interpret the text as |
virtual bool meta::corpus::corpus::has_next | ( | ) | const |
pure virtual |
Implemented in meta::corpus::line_corpus, meta::corpus::file_corpus, and meta::corpus::gz_corpus.
virtual document meta::corpus::corpus::next | ( | ) |
pure virtual |
Implemented in meta::corpus::line_corpus, meta::corpus::file_corpus, and meta::corpus::gz_corpus.
virtual uint64_t meta::corpus::corpus::size | ( | ) | const |
pure virtual |
Implemented in meta::corpus::line_corpus, meta::corpus::file_corpus, and meta::corpus::gz_corpus.
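The pure virtual members has_next(), next() and size() are what a concrete corpus has to supply. The following is a minimal sketch of that pattern, not the library's own code: the sketch namespace, the reduced document struct, the string_line_corpus class (one document per line of an in-memory string) and the count_documents helper are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the document type returned by next().
struct document {
    std::string content;
};

// Mirrors the members listed on this page; not the library's actual header.
class corpus {
  public:
    explicit corpus(std::string encoding) : encoding_{std::move(encoding)} {}
    virtual ~corpus() = default;
    virtual bool has_next() const = 0;
    virtual document next() = 0;
    virtual std::uint64_t size() const = 0;
    const std::string& encoding() const { return encoding_; }

  private:
    std::string encoding_;
};

// Hypothetical implementation: one document per line of an in-memory string.
class string_line_corpus : public corpus {
  public:
    string_line_corpus(const std::string& text, std::string encoding)
        : corpus{std::move(encoding)} {
        std::istringstream in{text};
        std::string line;
        while (std::getline(in, line))
            lines_.push_back(line);
    }

    bool has_next() const override { return cursor_ < lines_.size(); }

    document next() override {
        assert(has_next()); // callers are expected to check has_next() first
        return document{lines_[cursor_++]};
    }

    std::uint64_t size() const override { return lines_.size(); }

  private:
    std::vector<std::string> lines_;
    std::size_t cursor_ = 0;
};

// Drains a corpus through the abstract interface and counts the documents seen.
inline std::uint64_t count_documents(corpus& docs) {
    std::uint64_t seen = 0;
    while (docs.has_next()) {
        docs.next();
        ++seen;
    }
    return seen;
}

} // namespace sketch
```

Because count_documents only touches the abstract interface, the same loop would work with any derived corpus type; size() reports the total regardless of how many documents have already been consumed.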
const std::string & meta::corpus::corpus::encoding | ( | ) | const |
std::unique_ptr< corpus > meta::corpus::corpus::load | ( | const std::string & | config_file | ) |
static |
config_file | The cpptoml config file containing what type of corpus to load |
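A factory of this shape typically reads a type name from the configuration and returns the matching derived class behind a std::unique_ptr to the base. The sketch below illustrates that idea under stated assumptions: it uses a flat key/value map in place of a parsed cpptoml file, and the "corpus-type" key, its string values, the kind() helper and the reduced class definitions are all hypothetical, not taken from the library.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

namespace sketch {

// Reduced stand-in for the abstract corpus; only what the factory demo needs.
class corpus {
  public:
    virtual ~corpus() = default;
    virtual std::string kind() const = 0; // hypothetical helper for the demo
};

// Hypothetical derived types, named after two of the implementers listed above.
class line_corpus : public corpus {
  public:
    std::string kind() const override { return "line-corpus"; }
};

class file_corpus : public corpus {
  public:
    std::string kind() const override { return "file-corpus"; }
};

// Stand-in for corpus_exception, assumed here to behave like a runtime_error.
class corpus_exception : public std::runtime_error {
  public:
    using std::runtime_error::runtime_error;
};

// Assumption: a flat key/value map replaces the parsed config file.
using config = std::map<std::string, std::string>;

// Chooses the concrete corpus from the (hypothetical) "corpus-type" key.
inline std::unique_ptr<corpus> load(const config& cfg) {
    auto it = cfg.find("corpus-type");
    if (it == cfg.end())
        throw corpus_exception("corpus-type missing from configuration");
    if (it->second == "line-corpus")
        return std::make_unique<line_corpus>();
    if (it->second == "file-corpus")
        return std::make_unique<file_corpus>();
    throw corpus_exception("unknown corpus type: " + it->second);
}

} // namespace sketch
```

Returning std::unique_ptr to the base class lets callers stay unaware of which input format was selected, while ownership of the loaded corpus is transferred to them explicitly.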