ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus::corpus Class Reference (abstract)

Provides an interface to multiple corpus input formats.

#include <corpus.h>

Inherited by meta::corpus::file_corpus, meta::corpus::gz_corpus, and meta::corpus::line_corpus.

Classes

class  corpus_exception
 Basic exception for corpus interactions.
 

Public Member Functions

 corpus (std::string encoding)
 Constructs a new corpus with the given encoding.
 
virtual bool has_next () const =0
 
virtual document next ()=0
 
virtual uint64_t size () const =0
 
virtual ~corpus ()=default
 Destructor.
 
const std::string & encoding () const
 

Static Public Member Functions

static std::unique_ptr< corpus > load (const std::string &config_file)
 

Private Attributes

std::string encoding_
 The type of encoding this corpus uses.
 

Detailed Description

Provides an interface to multiple corpus input formats.

Constructor & Destructor Documentation

meta::corpus::corpus::corpus ( std::string  encoding)

Constructs a new corpus with the given encoding.

Parameters
encoding    The encoding to interpret the text as

Member Function Documentation

virtual bool meta::corpus::corpus::has_next ( ) const
pure virtual
Returns
whether there is another document in this corpus

Implemented in meta::corpus::line_corpus, meta::corpus::file_corpus, and meta::corpus::gz_corpus.

virtual document meta::corpus::corpus::next ( )
pure virtual
Returns
the next document from this corpus

Implemented in meta::corpus::line_corpus, meta::corpus::file_corpus, and meta::corpus::gz_corpus.

virtual uint64_t meta::corpus::corpus::size ( ) const
pure virtual
Returns
the number of documents in this corpus

Implemented in meta::corpus::line_corpus, meta::corpus::file_corpus, and meta::corpus::gz_corpus.
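
The three pure virtual functions above form a simple forward-iteration protocol: check has_next(), then call next(). The following is a minimal sketch of how a subclass and a consuming loop might look. It does not use the toolkit's headers: the sketch namespace, the document stand-in with a single content field, the vector_corpus class, and the total_length helper are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the toolkit's document type (assumption).
struct document {
    std::string content;
};

// Mirrors the abstract interface documented on this page.
class corpus {
  public:
    corpus(std::string encoding) : encoding_{std::move(encoding)} {}
    virtual ~corpus() = default;

    virtual bool has_next() const = 0;
    virtual document next() = 0;
    virtual std::uint64_t size() const = 0;

    const std::string& encoding() const { return encoding_; }

  private:
    std::string encoding_;
};

// Hypothetical in-memory corpus, used only to exercise the interface.
class vector_corpus : public corpus {
  public:
    vector_corpus(std::vector<std::string> texts, std::string encoding)
        : corpus{std::move(encoding)}, texts_{std::move(texts)} {}

    bool has_next() const override { return cursor_ < texts_.size(); }

    // Returns the current document and advances the cursor by one.
    document next() override { return document{texts_[cursor_++]}; }

    std::uint64_t size() const override { return texts_.size(); }

  private:
    std::vector<std::string> texts_;
    std::size_t cursor_ = 0;
};

// Drains a corpus through has_next()/next() and returns the total number
// of characters across all remaining documents.
inline std::uint64_t total_length(corpus& docs) {
    std::uint64_t total = 0;
    while (docs.has_next())
        total += docs.next().content.size();
    return total;
}

} // namespace sketch
```

In this sketch next() consumes documents, so a corpus can be traversed only once, while size() keeps reporting the total number of documents regardless of the cursor position. Whether the real subclasses behave the same way is not stated on this page.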

const std::string & meta::corpus::corpus::encoding ( ) const
Returns
the encoding for the corpus

std::unique_ptr< corpus > meta::corpus::corpus::load ( const std::string &  config_file)
static
Parameters
config_file    The cpptoml config file containing what type of corpus to load
Returns
a unique_ptr to the corpus object containing the documents
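
load() is a static factory: it picks a concrete corpus type from configuration and returns it behind the base-class pointer. The sketch below illustrates only that dispatch pattern and is not the toolkit's implementation. It skips config-file parsing entirely; the corpus_settings struct, the make_corpus function, the stub classes, the kind() member, and the type strings are all assumptions made for illustration, as is deriving corpus_exception from std::runtime_error.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Assumed shape of the exception type; the real base class is not shown here.
class corpus_exception : public std::runtime_error {
  public:
    using std::runtime_error::runtime_error;
};

// Reduced stand-in for the abstract base class.
class corpus {
  public:
    corpus(std::string encoding) : encoding_{std::move(encoding)} {}
    virtual ~corpus() = default;

    // Hypothetical member, present only so the example can observe
    // which concrete type the factory produced.
    virtual std::string kind() const = 0;

    const std::string& encoding() const { return encoding_; }

  private:
    std::string encoding_;
};

// Stubs standing in for concrete corpus formats.
class line_corpus_stub : public corpus {
  public:
    using corpus::corpus;
    std::string kind() const override { return "line-corpus"; }
};

class file_corpus_stub : public corpus {
  public:
    using corpus::corpus;
    std::string kind() const override { return "file-corpus"; }
};

// Hypothetical, already-parsed settings; a real loader would read these
// values from the config file instead.
struct corpus_settings {
    std::string type;
    std::string encoding;
};

// Chooses a concrete corpus from the settings and returns it through the
// base-class pointer; throws corpus_exception for an unrecognized type.
inline std::unique_ptr<corpus> make_corpus(const corpus_settings& settings) {
    if (settings.type == "line-corpus")
        return std::make_unique<line_corpus_stub>(settings.encoding);
    if (settings.type == "file-corpus")
        return std::make_unique<file_corpus_stub>(settings.encoding);
    throw corpus_exception{"unknown corpus type: " + settings.type};
}

} // namespace sketch
```

Returning std::unique_ptr makes ownership explicit: the caller owns the corpus, and the virtual destructor on the base class ensures the concrete object is destroyed correctly when the pointer goes out of scope.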

The documentation for this class was generated from the following files: