ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus::line_corpus Class Reference

Fills document objects with content line-by-line from an input file.

#include <line_corpus.h>

Inherits meta::corpus::corpus.

Public Member Functions

 line_corpus (const std::string &file, std::string encoding, uint64_t num_lines=0)
 
bool has_next () const override
 
document next () override
 
uint64_t size () const override
 
Public Member Functions inherited from meta::corpus::corpus
 corpus (std::string encoding)
 Constructs a new corpus with the given encoding.
 
virtual ~corpus ()=default
 Destructor.
 
const std::string & encoding () const
 

Private Attributes

doc_id cur_id_
 The current document we are on.
 
uint64_t num_lines_
 The number of lines in the file.
 
io::parser parser_
 Parser to read the corpus file.
 
std::unique_ptr< io::parser > class_parser_
 Parser to read the class labels.
 
std::unique_ptr< io::parser > name_parser_
 Parser to read the document names.
 

Additional Inherited Members

Static Public Member Functions inherited from meta::corpus::corpus
static std::unique_ptr< corpus > load (const std::string &config_file)
 

Detailed Description

Fills document objects with content line-by-line from an input file.

The tokenizer in use is responsible for correctly parsing the document content into labels and features.

Constructor & Destructor Documentation

meta::corpus::line_corpus::line_corpus ( const std::string &  file,
std::string  encoding,
uint64_t  num_lines = 0 
)
Parameters
file       The path to the corpus file, where each line represents a document
encoding   The encoding for the file
num_lines  The number of lines in the corpus file if known beforehand. If unknown, leave out this parameter and the value will be calculated in the constructor.
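
The num_lines default can be illustrated with a minimal sketch. It assumes that a value of 0 means "unknown" and that the fallback is a plain pass over the file. The helper resolve_num_lines is hypothetical and is not part of MeTA. The actual constructor may compute the count differently.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of MeTA: returns the caller-supplied line
// count when it is nonzero, otherwise counts the lines in the file.
std::uint64_t resolve_num_lines(const std::string& path,
                                std::uint64_t num_lines = 0)
{
    if (num_lines != 0)
        return num_lines; // trusted as given; the file is not opened

    std::ifstream in{path};
    if (!in)
        throw std::runtime_error{"could not open " + path};

    // One document per line, so the line count is the document count.
    std::uint64_t count = 0;
    std::string line;
    while (std::getline(in, line))
        ++count;
    return count;
}
```

Under these assumptions, passing a known count skips an extra read of the file, which is likely the reason the parameter exists for large corpora.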

Member Function Documentation

bool meta::corpus::line_corpus::has_next ( ) const
override virtual
Returns
whether there is another document in this corpus

Implements meta::corpus::corpus.

document meta::corpus::line_corpus::next ( )
override virtual
Returns
the next document from this corpus

Implements meta::corpus::corpus.

uint64_t meta::corpus::line_corpus::size ( ) const
override virtual
Returns
the number of documents in this corpus

Implements meta::corpus::corpus.
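
The three member functions above suggest a simple iteration idiom: call next() while has_next() is true. The sketch below shows that idiom on a stand-in class so that it compiles without MeTA. toy_document, toy_line_corpus, and read_all are hypothetical names. The stand-in loads every line eagerly for brevity, whereas the real class reads through an io::parser.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for meta::corpus::document: an id and the raw line.
struct toy_document
{
    std::uint64_t id;
    std::string content;
};

// Hypothetical stand-in that mirrors only has_next(), next(), and size().
// It is not the MeTA implementation.
class toy_line_corpus
{
  public:
    explicit toy_line_corpus(const std::string& file)
    {
        std::ifstream in{file};
        if (!in)
            throw std::runtime_error{"could not open " + file};
        std::string line;
        while (std::getline(in, line))
            lines_.push_back(line);
    }

    // True while at least one line has not yet been handed out.
    bool has_next() const { return cur_id_ < lines_.size(); }

    // Returns the current line as a document and advances the cursor.
    toy_document next()
    {
        assert(has_next());
        toy_document doc{cur_id_, lines_[cur_id_]};
        ++cur_id_;
        return doc;
    }

    // Total number of documents; unaffected by how many were consumed.
    std::uint64_t size() const { return lines_.size(); }

  private:
    std::uint64_t cur_id_ = 0;
    std::vector<std::string> lines_;
};

// Drains the corpus using the has_next()/next() idiom.
std::vector<toy_document> read_all(toy_line_corpus& corpus)
{
    std::vector<toy_document> docs;
    docs.reserve(corpus.size());
    while (corpus.has_next())
        docs.push_back(corpus.next());
    return docs;
}
```

In this sketch size() reports the total document count rather than the number remaining, which matches the documented return value "the number of documents in this corpus".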


The documentation for this class was generated from the following files: