ModErn Text Analysis
META Enumerates Textual Applications
Creates document objects from individual files, each representing a single document.
#include <file_corpus.h>
Public Member Functions
| | file_corpus (const std::string &prefix, const std::string &doc_list, std::string encoding) |
| bool | has_next () const override |
| document | next () override |
| uint64_t | size () const override |
Public Member Functions inherited from meta::corpus::corpus
| | corpus (std::string encoding) |
| | Constructs a new corpus with the given encoding. |
| virtual | ~corpus ()=default |
| | Destructor. |
| const std::string & | encoding () const |
Private Attributes
| uint64_t | cur_ |
| | Index of the current document. |
| std::string | prefix_ |
| | Path prefix under which all the documents are located. |
| std::vector< std::pair< std::string, class_label > > | docs_ |
| | Path and class label of each document. |
Additional Inherited Members
Static Public Member Functions inherited from meta::corpus::corpus
| static std::unique_ptr< corpus > | load (const std::string &config_file) |
meta::corpus::file_corpus::file_corpus (const std::string &prefix, const std::string &doc_list, std::string encoding)
| prefix | The path to where the files are located |
| doc_list | A file containing the path to each document in the corpus |
| encoding | The encoding of the corpus |
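The parameters above say that doc_list names a file listing each document, and docs_ is documented as holding paths together with class labels. The following is a minimal sketch of how such a list could be read into (path, label) pairs. It is not the library's implementation: the line format ("label path", or a bare "path"), the "[none]" placeholder label, the `class_label` alias, and the helper names `read_doc_list` and `read_doc_list_example` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the class_label type; the real type is not shown on this page.
using class_label = std::string;

// Reads one entry per line from `input` and returns (path, label) pairs,
// mirroring the element type documented for docs_.
// ASSUMPTION: each line is either "label path" or just "path".
std::vector<std::pair<std::string, class_label>> read_doc_list(std::istream& input)
{
    std::vector<std::pair<std::string, class_label>> docs;
    std::string line;
    while (std::getline(input, line))
    {
        if (line.empty())
            continue; // skipping blank lines is a choice made for this sketch

        const auto space = line.find(' ');
        if (space == std::string::npos)
        {
            // No label on this line: store a placeholder label (assumed).
            docs.emplace_back(line, class_label{"[none]"});
        }
        else
        {
            // Text before the first space is the label, the rest is the path.
            docs.emplace_back(line.substr(space + 1), line.substr(0, space));
        }
    }
    return docs;
}

// Small usage example: parses an in-memory list instead of a real file.
void read_doc_list_example()
{
    std::istringstream list{"spam docs/a.txt\ndocs/b.txt\n"};
    const auto docs = read_doc_list(list);
    assert(docs.size() == 2);
    assert(docs[0].first == "docs/a.txt" && docs[0].second == "spam");
    assert(docs[1].first == "docs/b.txt" && docs[1].second == "[none]");
}
```

Taking a `std::istream&` rather than a file name keeps the sketch testable without touching the filesystem; a caller working with a real list file could pass a `std::ifstream` instead.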
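bool meta::corpus::file_corpus::has_next () const
override virtual
Implements meta::corpus::corpus.
document meta::corpus::file_corpus::next ()
override virtual
Implements meta::corpus::corpus.
uint64_t meta::corpus::file_corpus::size () const
override virtual
Implements meta::corpus::corpus.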
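The has_next(), next(), and size() members listed on this page suggest a cursor-style traversal driven by the cur_ attribute. The sketch below illustrates that pattern with an in-memory stand-in. `toy_document`, `toy_file_corpus`, and `count_documents` are hypothetical names, and the behavior shown (next() advances the cursor, size() reports the total number of entries regardless of position) is an assumption rather than something this page confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the document type; the real class is not shown on this page.
struct toy_document
{
    std::string path;
    std::string label;
};

// Hypothetical class that mirrors the has_next()/next()/size() interface.
class toy_file_corpus
{
  public:
    explicit toy_file_corpus(std::vector<std::pair<std::string, std::string>> docs)
        : cur_{0}, docs_{std::move(docs)}
    {
    }

    // True while the cursor has not passed the last entry.
    bool has_next() const { return cur_ < docs_.size(); }

    // Returns the entry under the cursor, then advances the cursor.
    toy_document next()
    {
        assert(has_next()); // calling next() on an exhausted corpus is a caller error here
        const auto& entry = docs_[cur_++];
        return toy_document{entry.first, entry.second};
    }

    // Total number of entries, independent of the cursor position (assumed).
    std::uint64_t size() const { return docs_.size(); }

  private:
    std::uint64_t cur_;                                     // index of the current document
    std::vector<std::pair<std::string, std::string>> docs_; // (path, label) pairs
};

// Drains the corpus and returns how many documents were visited.
std::uint64_t count_documents(toy_file_corpus& corpus)
{
    std::uint64_t seen = 0;
    while (corpus.has_next())
    {
        corpus.next();
        ++seen;
    }
    return seen;
}
```

Because the cursor only moves forward in this sketch, a second pass over the same object would see no documents; a caller that needs to iterate again would construct a fresh corpus.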