ModErn Text Analysis
META Enumerates Textual Applications
meta::sequence::sequence_analyzer Class Reference

Analyzer that operates over sequences, generating features based on a set of "observation functions".

#include <sequence_analyzer.h>

Classes

class  basic_collector
 Implementation-detail collector.
 
class  collector
 Interface class used for analyzing observations inside user-provided feature functions.
 
class  const_collector
 Const version of the collector.
 
class  default_collector
 Non-const version of the collector.
 
class  exception
 

Public Member Functions

 sequence_analyzer ()=default
 Default constructor.
 
 sequence_analyzer (const std::string &prefix)
 Constructs a new sequence analyzer that loads its mappings from the given prefix (folder).
 
 sequence_analyzer (const sequence_analyzer &)=default
 Sequence analyzers may be copy constructed.
 
 sequence_analyzer (sequence_analyzer &&)=default
 Sequence analyzers may be move constructed.
 
sequence_analyzer & operator= (const sequence_analyzer &)=default
 Sequence analyzers may be copy assigned.
 
sequence_analyzer & operator= (sequence_analyzer &&)=default
 Sequence analyzers may be move assigned.
 
void load (const std::string &prefix)
 Loads a sequence analyzer from a folder given by prefix.
 
void save (const std::string &prefix) const
 Saves the sequence analyzer into the folder given by prefix.
 
void analyze (sequence &sequence)
 Analyzes a sequence, generating new label_ids and feature_ids for unseen elements.
 
void analyze (sequence &sequence, uint64_t idx)
 Analyzes a single point in a sequence, generating new label_ids and feature_ids for unseen elements.
 
void analyze (sequence &sequence) const
 Analyzes a sequence, but ignores any new label_ids or feature_ids.
 
void analyze (sequence &sequence, uint64_t idx) const
 Analyzes a single point in a sequence, but ignores any new label_ids or feature_ids.
 
feature_id feature (const std::string &feature)
 Looks up the feature_id for the given string representation.
 
feature_id feature (const std::string &feature) const
 Looks up the feature_id for the given string representation.
 
uint64_t num_features () const
 
label_id label (tag_t lbl) const
 
tag_t tag (label_id lbl) const
 
uint64_t num_labels () const
 
const std::string & prefix () const
 
const util::invertible_map< tag_t, label_id > & labels () const
 
template<class Function >
void add_observation_function (Function &&function)
 Adds an observation function to the list of functions to be used for analyzing observations.
 

Private Member Functions

void load_feature_id_mapping (const std::string &prefix)
 Loads the feature_id mapping from disk.
 
void load_label_id_mapping (const std::string &prefix)
 Loads the label_id mapping from disk.
 
void add_feature (observation &obs, const std::string &feature, double weight=1.0)
 Adds a feature to an observation.
 

Private Attributes

std::vector< std::function< void(const sequence &, uint64_t, collector &)> > obs_fns_
 The observation functions.
 
std::unordered_map< std::string, feature_id > feature_id_mapping_
 The feature_id mapping (string to id).
 
util::invertible_map< tag_t, label_id > label_id_mapping_
 The label_id mapping (tag_t to label_id).
 

Detailed Description

Analyzer that operates over sequences, generating features based on a set of "observation functions".

Observation functions must have an operator() of the form:

void operator()(const sequence& seq, uint64_t index, collector& coll)

and may only read the symbols in the sequence, not the tags. These functions should not modify the sequence directly; instead, they report features through the collector interface. For example:

// observation function that emits the current word as a feature
auto fun = [](const sequence& seq, uint64_t t, collector& coll)
{
    std::string word = seq[t].symbol();
    coll.add("w[t]=" + word, 1);
};
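To try the same calling convention without the library, here is a minimal self-contained sketch. The types `toy_sequence` and `toy_collector` are hypothetical stand-ins, not MeTA classes: a symbol is a plain `std::string`, and the collector only records (feature string, weight) pairs instead of mapping them to feature_ids. The second function shows the bounds check a context feature needs at position 0; the `<s>` padding symbol is an illustrative choice, not something this page specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sequence: only the symbols are modeled.
struct toy_sequence
{
    std::vector<std::string> symbols;
    const std::string& operator[](uint64_t t) const { return symbols[t]; }
};

// Hypothetical stand-in for the collector: records (feature, weight) pairs.
struct toy_collector
{
    std::vector<std::pair<std::string, double>> features;
    void add(const std::string& feat, double weight)
    {
        features.emplace_back(feat, weight);
    }
};

// Observation function: the word at position t.
inline void current_word(const toy_sequence& seq, uint64_t t,
                         toy_collector& coll)
{
    coll.add("w[t]=" + seq[t], 1);
}

// Observation function: the word at position t - 1, padded at the start.
inline void previous_word(const toy_sequence& seq, uint64_t t,
                          toy_collector& coll)
{
    std::string prev = t > 0 ? seq[t - 1] : "<s>";
    coll.add("w[t-1]=" + prev, 1);
}
```

Both functions read only symbols and write only through the collector, which is the contract described above.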

Constructor & Destructor Documentation

meta::sequence::sequence_analyzer::sequence_analyzer (const std::string & prefix)

Constructs a new sequence analyzer that loads its mappings from the given prefix (folder).

Parameters
prefix: The folder to load and save mappings to

Member Function Documentation

void meta::sequence::sequence_analyzer::load (const std::string & prefix)

Loads a sequence analyzer from a folder given by prefix.

Parameters
prefix: The prefix (folder) to load the analyzer from
void meta::sequence::sequence_analyzer::save (const std::string & prefix) const

Saves the sequence analyzer into the folder given by prefix.

Parameters
prefix: The folder to save the analyzer to
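This page does not describe the on-disk format used by save() and load(). As a rough illustration of persisting a string-to-id mapping under a prefix folder, the sketch below writes one `id<TAB>feature` line per entry to `prefix + "/feature.mapping"` and reads it back. The file name, the text format, and the helpers `save_mapping` and `load_mapping` are all hypothetical; MeTA's actual files may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem> // used only to pick a scratch folder in the example
#include <fstream>
#include <string>
#include <unordered_map>

using feature_id = uint64_t; // assumption: a plain integer id
using feature_map = std::unordered_map<std::string, feature_id>;

// Hypothetical format: one "id<TAB>feature" line per entry.
// Assumes the folder named by prefix already exists.
inline bool save_mapping(const feature_map& mapping, const std::string& prefix)
{
    std::ofstream out{prefix + "/feature.mapping"};
    if (!out)
        return false;
    for (const auto& entry : mapping)
        out << entry.second << '\t' << entry.first << '\n';
    out.close(); // flush now so a later load sees the data
    return !out.fail();
}

// Reads the file written by save_mapping; stops at the first malformed line.
inline feature_map load_mapping(const std::string& prefix)
{
    feature_map mapping;
    std::ifstream in{prefix + "/feature.mapping"};
    feature_id id;
    std::string feature;
    while (in >> id && in.get() == '\t' && std::getline(in, feature))
        mapping[feature] = id;
    return mapping;
}
```

A text format keeps the sketch easy to inspect; it assumes feature strings never contain newlines.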
void meta::sequence::sequence_analyzer::analyze (sequence & sequence)

Analyzes a sequence, generating new label_ids and feature_ids for unseen elements.

Parameters
sequence: The sequence to be analyzed
void meta::sequence::sequence_analyzer::analyze (sequence & sequence, uint64_t idx)

Analyzes a single point in a sequence, generating new label_ids and feature_ids for unseen elements.

Parameters
sequence: The sequence to be analyzed
idx: The position in the sequence to be analyzed
void meta::sequence::sequence_analyzer::analyze (sequence & sequence) const

Analyzes a sequence, but ignores any new label_ids or feature_ids.

Used, for example, when analyzing test items, so that existing models do not need to special-case unseen feature ids.

Parameters
sequence: The sequence to be analyzed
void meta::sequence::sequence_analyzer::analyze (sequence & sequence, uint64_t idx) const

Analyzes a single point in a sequence, but ignores any new label_ids or feature_ids.

Used, for example, when analyzing test items, so that existing models do not need to special-case unseen feature ids.

Parameters
sequence: The sequence to be analyzed
idx: The position in the sequence to be analyzed

feature_id meta::sequence::sequence_analyzer::feature (const std::string & feature)

Looks up the feature_id for the given string representation.

If none exists yet, the next unused feature_id is assigned to this string and remembered.

Parameters
feature: The string representation of the feature
Returns
the feature_id associated with (or just assigned to) this feature
feature_id meta::sequence::sequence_analyzer::feature (const std::string & feature) const

Looks up the feature_id for the given string representation.

If none exists, the string is given the next feature_id (the "one-past-the-end" id), but the assignment is not remembered.

Parameters
feature: The string representation of the feature
Returns
the feature_id associated with this feature, or the "one-past-the-end" feature_id
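The contrast between the two feature() overloads can be sketched with a plain `std::unordered_map`, assuming (as "next feature_id" and "one-past-the-end" suggest) that ids are assigned densely in order of first appearance. `toy_feature_map` is a hypothetical class and `feature_id` is modeled as a plain `uint64_t`; this illustrates the documented behavior and is not MeTA's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using feature_id = uint64_t; // assumption: a plain integer id

class toy_feature_map
{
  public:
    // Non-const lookup: an unseen string is assigned the next id and stored.
    feature_id feature(const std::string& feat)
    {
        auto it = mapping_.find(feat);
        if (it != mapping_.end())
            return it->second;
        feature_id id = mapping_.size(); // next unused id
        mapping_.emplace(feat, id);
        return id;
    }

    // Const lookup: an unseen string gets the one-past-the-end id,
    // and nothing is stored.
    feature_id feature(const std::string& feat) const
    {
        auto it = mapping_.find(feat);
        return it != mapping_.end() ? it->second : mapping_.size();
    }

    uint64_t num_features() const { return mapping_.size(); }

  private:
    std::unordered_map<std::string, feature_id> mapping_;
};
```

Because the const overload returns an id equal to num_features(), a model trained on ids in [0, num_features()) can recognize unseen features without any extra lookup.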
uint64_t meta::sequence::sequence_analyzer::num_features ( ) const
Returns
the number of feature_ids used so far to describe observations
label_id meta::sequence::sequence_analyzer::label (tag_t lbl) const
Parameters
lbl: The tag
Returns
the label_id assigned to the given tag
tag_t meta::sequence::sequence_analyzer::tag (label_id lbl) const
Parameters
lbl: The label_id
Returns
the tag that corresponds to this label_id
uint64_t meta::sequence::sequence_analyzer::num_labels ( ) const
Returns
the number of labels used so far to describe observations
const std::string& meta::sequence::sequence_analyzer::prefix ( ) const
Returns
the prefix (folder) for this analyzer's files
const util::invertible_map< tag_t, label_id > & meta::sequence::sequence_analyzer::labels ( ) const
Returns
The invertible_map that stores the label id mapping
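label(), tag(), num_labels(), and labels() all read from a bidirectional tag/label_id mapping. A minimal sketch of such an invertible map keeps a hash map in one direction and a vector indexed by id in the other. The names `toy_label_map` and `insert` are hypothetical, and `tag_t` and `label_id` are modeled as `std::string` and `uint32_t`. This page does not say how `util::invertible_map` treats unknown keys; here `at()` simply throws `std::out_of_range`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using tag_t = std::string; // assumption: a string-like tag
using label_id = uint32_t; // assumption: a plain integer id

class toy_label_map
{
  public:
    // Returns the existing label_id for t, or assigns the next one.
    label_id insert(const tag_t& t)
    {
        auto it = tag_to_id_.find(t);
        if (it != tag_to_id_.end())
            return it->second;
        auto id = static_cast<label_id>(id_to_tag_.size());
        tag_to_id_.emplace(t, id);
        id_to_tag_.push_back(t);
        return id;
    }

    // tag -> label_id; throws std::out_of_range for an unknown tag.
    label_id label(const tag_t& t) const { return tag_to_id_.at(t); }

    // label_id -> tag; throws std::out_of_range for an unknown id.
    const tag_t& tag(label_id lbl) const { return id_to_tag_.at(lbl); }

    uint64_t num_labels() const { return id_to_tag_.size(); }

  private:
    std::unordered_map<tag_t, label_id> tag_to_id_;
    std::vector<tag_t> id_to_tag_; // index == label_id
};
```

Using a vector for the reverse direction works because ids are dense; a second hash map would be needed if they were not.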
template<class Function >
void meta::sequence::sequence_analyzer::add_observation_function (Function && function)
inline

Adds an observation function to the list of functions to be used for analyzing observations.

Parameters
function: The function to add
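The forwarding-reference template parameter, together with the `std::function` vector declared as `obs_fns_`, suggests a type-erasure pattern. The sketch below assumes that pattern: add_observation_function forwards any callable into a `std::vector<std::function<...>>`, and analyzing a position runs every stored function on it. `toy_sequence`, `toy_collector`, and `toy_analyzer` are hypothetical; unlike the real analyze(), which updates the sequence it is given, this version only fills a collector.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not MeTA types.
struct toy_sequence
{
    std::vector<std::string> symbols;
};

struct toy_collector
{
    std::vector<std::string> features;
    // The weight is accepted but ignored in this sketch.
    void add(const std::string& feat, double /*weight*/)
    {
        features.push_back(feat);
    }
};

class toy_analyzer
{
  public:
    // Accepts any callable with the observation-function signature and
    // stores it type-erased, mirroring the obs_fns_ member.
    template <class Function>
    void add_observation_function(Function&& function)
    {
        obs_fns_.emplace_back(std::forward<Function>(function));
    }

    // Runs every observation function at a single position.
    void analyze(const toy_sequence& seq, uint64_t idx,
                 toy_collector& coll) const
    {
        for (const auto& fn : obs_fns_)
            fn(seq, idx, coll);
    }

    // Runs every observation function at every position.
    void analyze(const toy_sequence& seq, toy_collector& coll) const
    {
        for (uint64_t t = 0; t < seq.symbols.size(); ++t)
            analyze(seq, t, coll);
    }

  private:
    std::vector<
        std::function<void(const toy_sequence&, uint64_t, toy_collector&)>>
        obs_fns_;
};
```

Type erasure lets lambdas, function pointers, and functor objects share one container, at the cost of an indirect call per function and position.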
void meta::sequence::sequence_analyzer::load_feature_id_mapping (const std::string & prefix)
private

Loads the feature_id mapping from disk.

Parameters
prefix: The folder to load the mapping from
void meta::sequence::sequence_analyzer::load_label_id_mapping (const std::string & prefix)
private

Loads the label_id mapping from disk.

Parameters
prefix: The folder to load the mapping from
void meta::sequence::sequence_analyzer::add_feature (observation & obs, const std::string & feature, double weight = 1.0)
private

Adds a feature to an observation.

Parameters
obs: The observation
feature: The string representing the feature
weight: The weight for the feature (default = 1.0)
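add_feature is private, but its parameters suggest its role: resolve the feature string to a feature_id and attach it, with a weight that defaults to 1.0, to the observation. In the hypothetical sketch below, `toy_observation` stores (feature_id, weight) pairs and unseen strings receive the next dense id. MeTA's actual observation type and id assignment may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using feature_id = uint64_t; // assumption: a plain integer id

// Hypothetical observation holding weighted features.
struct toy_observation
{
    std::vector<std::pair<feature_id, double>> features;
};

class toy_feature_adder
{
  public:
    // Maps the string to an id (assigning the next id if unseen) and
    // appends (id, weight) to the observation.
    void add_feature(toy_observation& obs, const std::string& feature,
                     double weight = 1.0)
    {
        // size() is evaluated before try_emplace inserts, so an unseen
        // feature receives an id equal to the old mapping size.
        auto it = mapping_.try_emplace(feature, mapping_.size()).first;
        obs.features.emplace_back(it->second, weight);
    }

  private:
    std::unordered_map<std::string, feature_id> mapping_;
};
```

Repeated features are appended again rather than merged, so the same id can appear more than once in an observation in this sketch.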

The documentation for this class was generated from the following files: