ModErn Text Analysis
META Enumerates Textual Applications
Analyzer that operates over sequences, generating features based on a set of "observation functions".
#include <sequence_analyzer.h>
Classes | |
class | basic_collector |
Implementation-detail collector. | |
class | collector |
Interface class used for analyzing observations inside user-provided feature functions. | |
class | const_collector |
Const version of the collector. | |
class | default_collector |
Non-const version of the collector. | |
class | exception |
Public Member Functions | |
sequence_analyzer ()=default | |
Default constructor. | |
sequence_analyzer (const std::string &prefix) | |
Constructs a new sequence analyzer that loads its mappings from the folder given by prefix. | |
sequence_analyzer (const sequence_analyzer &)=default | |
Sequence analyzers may be copy constructed. | |
sequence_analyzer (sequence_analyzer &&)=default | |
Sequence analyzers may be move constructed. | |
sequence_analyzer & | operator= (const sequence_analyzer &)=default |
Sequence analyzers may be copy assigned. | |
sequence_analyzer & | operator= (sequence_analyzer &&)=default |
Sequence analyzers may be move assigned. | |
void | load (const std::string &prefix) |
Loads a sequence analyzer from a folder given by prefix. | |
void | save (const std::string &prefix) const |
Saves the sequence analyzer into the folder given by prefix. | |
void | analyze (sequence &sequence) |
Analyzes a sequence, generating new label_ids and feature_ids for unseen elements. | |
void | analyze (sequence &sequence, uint64_t idx) |
Analyzes a single point in a sequence, generating new label_ids and feature_ids for unseen elements. | |
void | analyze (sequence &sequence) const |
Analyzes a sequence, but ignores any new label_ids or feature_ids. | |
void | analyze (sequence &sequence, uint64_t idx) const |
Analyzes a single point in a sequence, but ignores any new label_ids or feature_ids. | |
feature_id | feature (const std::string &feature) |
Looks up the feature_id for the given string representation. | |
feature_id | feature (const std::string &feature) const |
Looks up the feature_id for the given string representation. | |
uint64_t | num_features () const |
label_id | label (tag_t lbl) const |
tag_t | tag (label_id lbl) const |
uint64_t | num_labels () const |
const std::string & | prefix () const |
const util::invertible_map< tag_t, label_id > & | labels () const |
template<class Function > | |
void | add_observation_function (Function &&function) |
Adds an observation function to the list of functions to be used for analyzing observations. | |
Private Member Functions | |
void | load_feature_id_mapping (const std::string &prefix) |
Loads the feature_id mapping from disk. | |
void | load_label_id_mapping (const std::string &prefix) |
Loads the label_id mapping from disk. | |
void | add_feature (observation &obs, const std::string &feature, double weight=1.0) |
Adds a feature to an observation. | |
Private Attributes | |
std::vector< std::function< void(const sequence &, uint64_t, collector &)> > | obs_fns_ |
The observation functions. | |
std::unordered_map< std::string, feature_id > | feature_id_mapping_ |
The feature_id mapping (string to id) | |
util::invertible_map< tag_t, label_id > | label_id_mapping_ |
The label_id mapping (tag_t to label_id) | |
Analyzer that operates over sequences, generating features based on a set of "observation functions".
Observation functions must have an operator() of the form:
void operator()(const sequence &, uint64_t, collector &)
They may refer only to the symbols in the sequence, not the tags. These functions should not modify the sequence directly and should instead report features through the collector interface.
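The example that originally accompanied this description is not present on this page. As a rough illustration only, the sketch below shows two observation functions with the required parameter shape. The toy_sequence and toy_collector types, and the add(feature, weight) method on the collector, are hypothetical stand-ins introduced so the sketch compiles on its own; real code would use sequence and collector from sequence_analyzer.h, whose members are not documented here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sequence: one symbol per position, no tags.
struct toy_sequence
{
    std::vector<std::string> symbols;
};

// Hypothetical stand-in for the collector: records (feature, weight) pairs.
struct toy_collector
{
    std::vector<std::pair<std::string, double>> features;

    // Assumed method name; the real collector interface may differ.
    void add(const std::string& feature, double weight)
    {
        features.emplace_back(feature, weight);
    }
};

// Observation function: emits a feature for the symbol at position t.
// It reads the sequence but never modifies it.
inline void current_word(const toy_sequence& seq, std::uint64_t t,
                         toy_collector& coll)
{
    coll.add("w[t]=" + seq.symbols[t], 1.0);
}

// Observation function: emits a feature for the previous symbol, using
// a boundary marker when t is the first position.
inline void previous_word(const toy_sequence& seq, std::uint64_t t,
                          toy_collector& coll)
{
    const std::string prev
        = (t == 0) ? std::string("<s>") : seq.symbols[t - 1];
    coll.add("w[t-1]=" + prev, 1.0);
}

// Runs both observation functions on position t and returns what the
// collector received, in call order.
inline std::vector<std::pair<std::string, double>>
observe(const toy_sequence& seq, std::uint64_t t)
{
    toy_collector coll;
    current_word(seq, t, coll);
    previous_word(seq, t, coll);
    return coll.features;
}
```

Calling observe on position 1 of the symbols "the", "dog", "barks" yields the features w[t]=dog and w[t-1]=the, each with weight 1.0.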
meta::sequence::sequence_analyzer::sequence_analyzer | ( | const std::string & | prefix | ) |
Constructs a new sequence analyzer that loads its mappings from the folder given by prefix.
prefix | The folder to load/save mappings to |
void meta::sequence::sequence_analyzer::load | ( | const std::string & | prefix | ) |
Loads a sequence analyzer from a folder given by prefix.
prefix | The folder to load the analyzer from |
void meta::sequence::sequence_analyzer::save | ( | const std::string & | prefix | ) | const |
Saves the sequence analyzer into the folder given by prefix.
prefix | The folder to save the analyzer to |
void meta::sequence::sequence_analyzer::analyze | ( | sequence & | sequence | ) |
Analyzes a sequence, generating new label_ids and feature_ids for unseen elements.
sequence | The sequence to be analyzed |
void meta::sequence::sequence_analyzer::analyze | ( | sequence & | sequence, |
uint64_t | idx | ||
) |
Analyzes a single point in a sequence, generating new label_ids and feature_ids for unseen elements.
sequence | The sequence to be analyzed |
idx | The position in the sequence to be analyzed |
void meta::sequence::sequence_analyzer::analyze | ( | sequence & | sequence | ) | const |
Analyzes a sequence, but ignores any new label_ids or feature_ids.
Used for analyzing test items, for example, so that existing models don't need to special case unseen feature ids.
sequence | The sequence to be analyzed |
void meta::sequence::sequence_analyzer::analyze | ( | sequence & | sequence, |
uint64_t | idx | ||
) | const |
Analyzes a single point in a sequence, but ignores any new label_ids or feature_ids.
Used for analyzing test items, for example, so that existing models don't need to special case unseen feature ids.
feature_id meta::sequence::sequence_analyzer::feature | ( | const std::string & | feature | ) |
Looks up the feature_id for the given string representation.
If one doesn't exist, it will assign the next feature_id to this string and remember the assignment.
feature | The string representation of the feature |
feature_id meta::sequence::sequence_analyzer::feature | ( | const std::string & | feature | ) | const |
Looks up the feature_id for the given string representation.
If one doesn't exist, it will simply assign the next feature_id to the string, but it will not remember the assignment.
feature | The string representation of the feature |
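The difference between the two feature() overloads can be sketched with a plain std::unordered_map. This is an illustrative model under stated assumptions, not the library's implementation: toy_feature_map is a hypothetical class, and numbering ids from 0 in order of first appearance is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical model of a string-to-id mapping with the two lookup
// behaviours described above. Not the MeTA implementation.
class toy_feature_map
{
  public:
    using feature_id = std::uint64_t;

    // Non-const lookup: an unseen string receives the next id and the
    // assignment is stored, so later lookups return the same id.
    feature_id feature(const std::string& name)
    {
        auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;
        const feature_id next = static_cast<feature_id>(ids_.size());
        ids_.emplace(name, next);
        return next;
    }

    // Const lookup: an unseen string is reported as the next id, but
    // nothing is stored, so every unseen string maps to the same value.
    feature_id feature(const std::string& name) const
    {
        auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;
        return static_cast<feature_id>(ids_.size());
    }

    // Number of stored assignments.
    std::uint64_t num_features() const
    {
        return static_cast<std::uint64_t>(ids_.size());
    }

  private:
    std::unordered_map<std::string, feature_id> ids_;
};
```

In this model, all unseen strings share one out-of-vocabulary id during const analysis, which is one way an existing model could avoid special-casing unseen features on test items.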
uint64_t meta::sequence::sequence_analyzer::num_features | ( | ) | const |
label_id meta::sequence::sequence_analyzer::label | ( | tag_t | lbl | ) | const |
lbl | The tag |
tag_t meta::sequence::sequence_analyzer::tag | ( | label_id | lbl | ) | const |
lbl | The label_id |
uint64_t meta::sequence::sequence_analyzer::num_labels | ( | ) | const |
const std::string& meta::sequence::sequence_analyzer::prefix | ( | ) | const |
const util::invertible_map< tag_t, label_id > & meta::sequence::sequence_analyzer::labels | ( | ) | const |
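template<class Function > void meta::sequence::sequence_analyzer::add_observation_function | ( | Function && | function | ) | inline |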
Adds an observation function to the list of functions to be used for analyzing observations.
function | The function to add |
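Since obs_fns_ is declared as a vector of std::function objects, a templated add function can accept any callable with a compatible signature. The sketch below models that pattern with hypothetical stand-in types (toy_sequence, toy_collector, toy_analyzer) and a hypothetical observe() helper; it makes no claim about how the library's own method is written.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without the library.
struct toy_sequence
{
    std::vector<std::string> symbols;
};

struct toy_collector
{
    std::vector<std::string> features;
};

class toy_analyzer
{
  public:
    // Accepts any callable (lambda, functor, function pointer) and
    // stores it type-erased inside a std::function.
    template <class Function>
    void add_observation_function(Function&& function)
    {
        obs_fns_.emplace_back(std::forward<Function>(function));
    }

    // Hypothetical helper: runs every stored function on position idx,
    // in the order the functions were added.
    std::vector<std::string> observe(const toy_sequence& seq,
                                     std::uint64_t idx) const
    {
        toy_collector coll;
        for (const auto& fn : obs_fns_)
            fn(seq, idx, coll);
        return coll.features;
    }

  private:
    std::vector<std::function<void(const toy_sequence&, std::uint64_t,
                                   toy_collector&)>>
        obs_fns_;
};
```

In this sketch the functions run in insertion order, so the order of add_observation_function calls determines the order in which features reach the collector.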
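void meta::sequence::sequence_analyzer::load_feature_id_mapping | ( | const std::string & | prefix | ) | private |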
Loads the feature_id mapping from disk.
prefix | The folder to load the mapping from |
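void meta::sequence::sequence_analyzer::load_label_id_mapping | ( | const std::string & | prefix | ) | private |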
Loads the label_id mapping from disk.
prefix | The folder to load the mapping from |
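void meta::sequence::sequence_analyzer::add_feature | ( | observation & | obs, |
const std::string & | feature, | ||
double | weight = 1.0 | ||
) | private |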
Adds a feature to an observation.
obs | The observation |
feature | The string representing the feature |
weight | The weight for the feature (default = 1.0) |