ModErn Text Analysis
META Enumerates Textual Applications
Represents an indexable document.
#include <document.h>
Public Member Functions | |
| document (const std::string &path="[NONE]", doc_id d_id=doc_id{0}, const class_label &label=class_label{"[NONE]"}) | |
| Constructor. | |
| void | increment (const std::string &term, double amount) |
| Increment the count of the specified term. | |
| std::string | path () const |
| const class_label & | label () const |
| std::string | name () const |
| void | name (const std::string &n) |
| uint64_t | length () const |
| double | count (const std::string &term) const |
| Get the number of occurrences for a particular term. | |
| const std::unordered_map< std::string, double > & | counts () const |
| void | content (const std::string &content, const std::string &encoding="utf-8") |
| Sets the content of the document to the given string. | |
| void | encoding (const std::string &encoding) |
| Sets the encoding of the document to the given string. | |
| const std::string & | content () const |
| const std::string & | encoding () const |
| doc_id | id () const |
| bool | contains_content () const |
| void | label (class_label label) |
| Sets the label for this document. | |
Private Attributes | |
| std::string | path_ |
| Where this document is on disk. | |
| doc_id | d_id_ |
| The document id for this document. | |
| class_label | label_ |
| Which category this document would be classified into. | |
| std::string | name_ |
| The short name for this document (not the full path) | |
| size_t | length_ |
| The number of (non-unique) tokens in this document. | |
| std::unordered_map< std::string, double > | counts_ |
| Counts of how many times each token appears. | |
| util::optional< std::string > | content_ |
| What the document contains. | |
| std::string | encoding_ |
| The encoding for the content. | |
Represents an indexable document.
Internally, a document may contain either string content or a path to a file it represents on disk.
Once tokenized, a document contains a mapping of term -> frequency. This mapping is empty upon creation.
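The life cycle described above can be sketched with a small stand-in type. This is not MeTA's implementation: `toy_document`, `tokenize`, and `make_example` are hypothetical names, and whitespace splitting is assumed only to keep the example short.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for meta::corpus::document; only the members needed
// for this illustration are modelled.
struct toy_document
{
    std::string content;                            // raw text held by the document
    std::unordered_map<std::string, double> counts; // term -> frequency, empty at creation
};

// Hypothetical whitespace tokenizer (an assumption, not MeTA's tokenizer):
// fills the term -> frequency mapping from the document's content.
inline void tokenize(toy_document& doc)
{
    std::istringstream stream{doc.content};
    std::string term;
    while (stream >> term)
        doc.counts[term] += 1.0;
}

// Builds a document that holds string content but has not been tokenized yet.
inline toy_document make_example()
{
    toy_document doc;
    doc.content = "the cat saw the dog";
    return doc;
}
```

Calling `make_example()` yields a document whose `counts` map is empty. After `tokenize(doc)` the map holds one entry per distinct term.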
| meta::corpus::document::document | ( | const std::string & | path = "[NONE]", |
| doc_id | d_id = doc_id{0}, |
||
| const class_label & | label = class_label{"[NONE]"} |
||
| ) |
Constructor.
| path | The path to the document |
| d_id | The doc id to assign to this document |
| label | The optional class label to assign to this document |
| void meta::corpus::document::increment | ( | const std::string & | term, |
| double | amount | ||
| ) |
Increment the count of the specified term.
| term | The string token whose count to increment |
| amount | The amount to increment by |
| std::string meta::corpus::document::path | ( | ) | const |
| const class_label & meta::corpus::document::label | ( | ) | const |
| std::string meta::corpus::document::name | ( | ) | const |
| void meta::corpus::document::name | ( | const std::string & | n | ) |
| n | The new name for this document |
| uint64_t meta::corpus::document::length | ( | ) | const |
| double meta::corpus::document::count | ( | const std::string & | term | ) | const |
Get the number of occurrences for a particular term.
| term | The string term to look up |
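How `increment`, `count`, and `length` might fit together can be shown with a minimal sketch. The class name `term_counter` is hypothetical. Two behaviours are assumptions that this page does not state: that `length` grows by the incremented amount, and that `count` returns 0 for a term that was never incremented.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the counting part of meta::corpus::document.
class term_counter
{
  public:
    // Adds `amount` to the count of `term`, creating the entry if needed.
    // Assumption: the total token length grows by the same amount.
    void increment(const std::string& term, double amount)
    {
        counts_[term] += amount;
        length_ += amount;
    }

    // Returns the stored count, or 0 for an unseen term (assumed behaviour).
    // Uses find() so that a const lookup never inserts into the map.
    double count(const std::string& term) const
    {
        auto it = counts_.find(term);
        return it == counts_.end() ? 0.0 : it->second;
    }

    // Total number of (non-unique) tokens counted so far.
    std::uint64_t length() const
    {
        return static_cast<std::uint64_t>(length_);
    }

    // Read-only view of the whole term -> frequency mapping.
    const std::unordered_map<std::string, double>& counts() const
    {
        return counts_;
    }

  private:
    std::unordered_map<std::string, double> counts_;
    double length_ = 0.0;
};
```

Because counts are stored as `double`, fractional amounts are accepted, which suits weighted features as well as raw token counts.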
| const std::unordered_map< std::string, double > & meta::corpus::document::counts | ( | ) | const |
| void meta::corpus::document::content | ( | const std::string & | content, |
| const std::string & | encoding = "utf-8" |
||
| ) |
Sets the content of the document to the given string.
| content | The string content to assign into this document |
| encoding | The encoding of content, which defaults to utf-8 |
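The interplay of `content`, `encoding`, and `contains_content` can be sketched as below. `content_holder` is a hypothetical stand-in, and a plain `bool` flag replaces `util::optional` to keep the example dependency-free. The initial encoding of "utf-8" and the empty string returned when no content has been set are assumptions of this sketch, not documented behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the content-handling part of
// meta::corpus::document.
class content_holder
{
  public:
    // Stores the text and records its encoding (defaults to "utf-8").
    void content(const std::string& text, const std::string& enc = "utf-8")
    {
        content_ = text;
        has_content_ = true;
        encoding_ = enc;
    }

    // Changes only the encoding label; the stored text is left untouched.
    void encoding(const std::string& enc)
    {
        encoding_ = enc;
    }

    // True once content has been assigned, false for a path-only document.
    bool contains_content() const
    {
        return has_content_;
    }

    // Assumption: returns an empty string when no content has been set.
    const std::string& content() const
    {
        return content_;
    }

    const std::string& encoding() const
    {
        return encoding_;
    }

  private:
    std::string content_;
    bool has_content_ = false;       // stands in for util::optional being engaged
    std::string encoding_ = "utf-8"; // assumed initial value
};
```

A caller would typically check `contains_content()` before reading `content()`, falling back to loading the file at `path()` when it returns false.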
| void meta::corpus::document::encoding | ( | const std::string & | encoding | ) |
Sets the encoding of the document to the given string.
| encoding | The string label for the encoding |
| const std::string & meta::corpus::document::content | ( | ) | const |
| const std::string & meta::corpus::document::encoding | ( | ) | const |
| doc_id meta::corpus::document::id | ( | ) | const |
| bool meta::corpus::document::contains_content | ( | ) | const |
| void meta::corpus::document::label | ( | class_label | label | ) |
Sets the label for this document.
| label | The new label for this document |