ModErn Text Analysis
META Enumerates Textual Applications
meta::corpus::gz_corpus Class Reference

Fills document objects with content line-by-line from gzip-compressed input files.

#include <gz_corpus.h>

Inheritance: meta::corpus::gz_corpus inherits from meta::corpus::corpus.

Public Member Functions

 gz_corpus (const std::string &file, std::string encoding)
 
bool has_next () const override
 
document next () override
 
uint64_t size () const override
 
Public Member Functions inherited from meta::corpus::corpus
 corpus (std::string encoding)
 Constructs a new corpus with the given encoding.
 
virtual ~corpus ()=default
 Destructor.
 
const std::string & encoding () const
 

Private Attributes

doc_id cur_id_
 The current document we are on.
 
uint64_t num_lines_
 The number of lines in the file.
 
io::gzifstream corpus_stream_
 The stream for reading the corpus.
 
io::gzifstream class_stream_
 The stream to read the class labels.
 
io::gzifstream name_stream_
 The stream to read the document names.
 

Additional Inherited Members

Static Public Member Functions inherited from meta::corpus::corpus
static std::unique_ptr< corpus > load (const std::string &config_file)
 

Detailed Description

Fills document objects with content line-by-line from gzip-compressed input files.

Constructor & Destructor Documentation

meta::corpus::gz_corpus::gz_corpus (const std::string & file, std::string encoding)

Parameters
    file      The path to the compressed corpus file, where each line represents a document
    encoding  The encoding for the file

Member Function Documentation

bool meta::corpus::gz_corpus::has_next ( ) const
override, virtual
Returns
whether there is another document in this corpus

Implements meta::corpus::corpus.

document meta::corpus::gz_corpus::next ( )
override, virtual
Returns
the next document from this corpus

Implements meta::corpus::corpus.

uint64_t meta::corpus::gz_corpus::size ( ) const
override, virtual
Returns
the number of documents in this corpus

Implements meta::corpus::corpus.
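
Example (illustrative sketch)

The three member functions above form a simple iteration protocol: size() reports how many documents exist, has_next() says whether any remain, and next() returns them one at a time. The sketch below shows that protocol using only the standard library. Note that toy_document, toy_line_corpus, and drain are hypothetical names invented for this example; they are not part of MeTA. The stand-in reads from an in-memory string rather than a gzip-compressed file, and counting lines in the constructor is an assumption made for the sketch, not a statement about how gz_corpus is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a document: an id plus one line of text.
struct toy_document
{
    std::uint64_t id;
    std::string content;
};

// Hypothetical stand-in for a line-based corpus. It is NOT gz_corpus:
// it reads from an in-memory string and only mirrors the
// has_next() / next() / size() interface documented above.
class toy_line_corpus
{
  public:
    explicit toy_line_corpus(const std::string& text)
        : cur_id_{0}, num_lines_{0}, stream_{text}
    {
        // Assumption for this sketch: count lines up front so that
        // size() is known before any document is read.
        std::istringstream counter{text};
        std::string line;
        while (std::getline(counter, line))
            ++num_lines_;
    }

    // True while at least one unread line (document) remains.
    bool has_next() const
    {
        return cur_id_ < num_lines_;
    }

    // Returns the next line as a document and advances the cursor.
    toy_document next()
    {
        assert(has_next());
        std::string line;
        std::getline(stream_, line);
        return toy_document{cur_id_++, line};
    }

    // Total number of documents, i.e. lines in the input.
    std::uint64_t size() const
    {
        return num_lines_;
    }

  private:
    std::uint64_t cur_id_;      // index of the next document to return
    std::uint64_t num_lines_;   // number of lines in the input
    std::istringstream stream_; // stand-in for the compressed stream
};

// Reads every remaining document with the has_next()/next() loop.
template <class Corpus>
std::vector<std::string> drain(Corpus& corpus)
{
    std::vector<std::string> contents;
    contents.reserve(corpus.size());
    while (corpus.has_next())
        contents.push_back(corpus.next().content);
    return contents;
}
```

Because drain is a template over any type exposing these three members, the same loop shape should apply to a real corpus object, with the caveat that a real document type would expose its text through its own accessors rather than a public content field.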


The documentation for this class was generated from the following files: