Glossary
The following definitions are specific to the Arthur platform, though in most cases apply to ML more broadly.
Container class for inferences uploaded to the Arthur platform. An inference comprises input features, prediction values, and (optionally) ground truth values and any NonInput data.
Example:
ground_truth = {
"Consumer Credit Score": 652.0
}
inference = arthur_model.get_inference(external_id)
inference.update(ground_truth)
Related terms: inference, ArthurModel
A model object sends and retrieves data pertinent to a deployed ML system. The ArthurModel
object is separate from the underlying model trained and makes predictions; it serves as a wrapper for the underlying model to access Arthur platform functionality.
An ArthurModel contains at least aname
, an InputType
and a OutputType
.
Examples:
arthur_model = connection.model(name="New_Model",
input_type=InputType.Tabular,
model_type=OutputType.Regression)
arthur_model = connection.get(model_id)
arthur_model.send_inference(...)
Arthur Model Groups are an organizational construct the Arthur platform uses to track different versions of an Arthur model. Every Arthur Model is a version of one Model Group, and a Model Group will always have at least one Arthur Model. The Model Group for an Arthur Model can only be specified during onboarding, and once the Arthur Model is saved, its group cannot be changed. If an Arthur Model is created without specifying a Model Group, a new Model Group will be created automatically with the new model as its single version.
When adding a model to a model group, the model is assigned a unique, incrementing Version Sequence Number (starting at 1) corresponding to the order in which it was added to the model group. Additionally, you can provide a Version Label to store a custom version string label along with the Version Sequence Number.
Example:
# retrieve the first version of a model
arthur_model_v1 = connection.get(model_id)
model_group = arthur_model_v1.model_group
# create the new version of the model
arthur_model_v2 = connection.model(name="Model_V2",
input_type=InputType.Tabular,
model_type=OutputType.Regression)
# add the new model to the model group
model_group.add_version(arthur_model_v2, label="2.0.1")
arthur_model_v2.save()
Related terms: Version Label, Version Sequence Number
A variable associated with a model. , be input, prediction, ground truth or ancillary information (these groupings are known as Stages in the Arthur platform). It can be categorical or continuous.
Example:
The attribute
age
is an input to the model, whereas the attributecreditworthy
is the target for the model.
Synonyms: variable, {predictor, input}, {ouput, target}, prediction.
Related terms: input, stage, prediction, ground truth
While bias is an overloaded term in stats&ML, we refer specifically to where the model's outcomes have the potential to lead to discriminatory outcomes.
Example:
This credit approval model tends to lead to biased outcomes: men are approved for loans at a rate 50% higher than women are.
Related terms: bias detection, bias mitigation, disparate impact
The detection and quantification of {ref}algorithmic bias <bias_monitoring>
in an ML system, typically as evaluated on a model's outputs (predictions) across different definitions of a sensitive attribute.
Many definitions of algorithmic bias have been proposed, including group and individual fairness definitions.
Group fairness definitions are often defined by comparing groupconditional statistics about the model's predictions. In the below definitions, the group membership feature is indicated by G, and a particular group membership value is simplified by g.
Example:
Common metrics for group fairness include Demographic Parity, Equalized Odds, and Equality of Opportunity.
Related terms: bias mitigation
Demographic Parity
A fairness metric that compares groupconditional selection rates. The quantity being compared is:
P(Y^ = 1  G = g)
There is not necessarily a normative ideal relationship between the selection rates for each group: in some situations, such as allocating resources, it may be important to minimize the disparity in selection rates across groups; in others, metrics based on groupconditional accuracy may be more relevant. However, even in the latter case, understanding groupconditional selection rates, especially when compared against the original training data, can be useful contextualization for the model and its task as a whole.
Related term: disparate impact
Equal Opportunity
A fairness metric that compares groupconditional true positive rates. The quantity being compared is:
P(Y^ = 1  Y =1, G = g)
For all groups, a true positive rate closer to 1 is better.
Equalized Odds
A fairness metric that incorporates both groupconditional true positive rates and false positive rates, equivalently, true positive and negative rates. There are a variety of implementations due to the fact that some quadrants of the confusion matrix are complements of one another); here is one possible quantity to compare across groups:
P(Y^ = 1  Y = 1, G = g) + P(Y^ = 0  Y= 0, G = g)
In this implementation, this quantity should be as close to 2 as possible for all groups.
Automated techniques to mitigate bias in a discriminatory model. Can be characterized by where the technique sits in the model lifecycle:
 PreProcessing: Techniques that analyze datasets and often modify/resample training datasets to make the learned classifier less discriminatory.
 InProcessing: Techniques for training a fairnessaware classifier (or regressor) that explicitly trades off optimizing for accuracy and maintaining fairness across sensitive groups.
 PostProcessing: Techniques that only adjust the output predictions from a discriminatory classifier without modifying the training data or the classifier.
Related terms: bias detection
A modeling task where the target variable belongs to a discrete set with two possible outcomes.
Example:
This binary classifier will predict whether or not a person is likely to default on their credit card.
Related terms: output type, classification, multilabel classification
An attribute whose value is taken from a discrete set of possibilities.
Example:
A person's blood type is a categorical attribute: it can only be A, B, AB, or O.
Synonyms: discrete attribute
Related terms: attribute, continuous, classification
An attribute whose value is taken from an ordered continuum can be bounded or unbounded.
Example:
A person's height, weight, income, IQ can all be through of as continuous attributes.
Synonyms: numeric attribute
Related terms: attribute, continuous, regression
A modeling task where the target variable belongs to a discrete set with a fixed number of possible outcomes.
Example:
This classification model will determine whether an input image is of a cat, a dog, or fish.
Related terms: output type, binary classification, multilabel classification
Data Drift
Refers to the problem arising when, after a trained model is deployed, changes in the external world lead to degradation of model performance and the model becoming stale. Detecting data drift will provide a le ing indicator of data stability and integrity.
Data drift can be quantified with respect to a specific reference set (e.g., the model's training data) or, more generally, the model's temporal shifts in a variable with respect to past time windows.
Your project can {ref}query data drift metrics through the Arthur API <data_drift>
. This section will provide an overview o the available data drift metrics in Arthur's query service.
Related terms: ou Arthur'stribution
Arthur also offers a multivariate Anomaly Score, which you can configure via the steps detailed in Enabling Enrichments.
See Anomaly Detection for an explanation of how these scores are used and Arthur Algorithms for how they're calculated.
Legal terminology originally from Fair Lending case law. This constraint is strictly harder than Dispara e Treatment and asserts that model outcomes must not be discriminatory across protected groups. That is, the outcome of a decision process should not be substantially higher (or lower) for one group of a protected class over another.
While there does not exist a single threshold for establishing the presence or absence of disparate impact, the socalled "80% rule" is commonly referenced. However, harm certain subgroups of a population differentially r, we strongly recommend against adopting this ruleofthumb, as these analyses should be grounded in usecasespecific analysis and the legal framework pertinent to a given industry.
Example:
Even though the model didn't take
gender
as input, it still results in disparate impact when we compare outcomes for males and females.
Related terms: bias, disparate treatment
Legal terminology originally from Fair Lending case law. Disparate Treatment asserts that you are not allowed to consider protected variables (e.g., race, age, gender) when approving or denying a credit card loan application.
In practical terms, a data scientist cannot include these attributes as inputs to a credit decision model.
Adherence to Disparate Treatment is not a sufficient condition for actually achieving a fair model (see proxy and bias detedefinitionstion). "Fairness through "unawareness" is not good enough.
Related terms: bias, disparate impact
Generally used to describe data or metrics added to raw data after ingestion. Arthur provides various enrichments such as Anomaly Detection and Explainability. See entity enrichments for details about using enrichments within Arthur.
An individual attribute that is an input to a model
Example:
The credit scoring model has features like
“home_value”
,“zip_code”
,“height"
.
The true label or target variable (Y) corresponds to inputs (X) for a dataset.
Examples:
pred = sklearn_model.predict_proba(X)
arthur_model.send_inference(
model_pipeline_input=X,
predicted_values={1:pred, 0: 1pred})
Related terms: prediction
Imagery data is commonly used for computer vision models.
Related terms: attribute, output type, Stage
One row of a dataset. Inference refers to passing a single input into a model and the model's prediction. Data associated with that inference might include (1) the input to the model, (2) the model's prediction and (3) the corresponding ground truth.
With respect to the Arthur platform, the term inference denotes any and all of those related components of data for a single input&prediction.
Related terms: ArthurInference, stage
A single data instance upon which a model can calculate an output prediction. The input consists of all relevant features together.
Example:
The input features for the credit scoring model consist of
“home_value”
,“zip_code”
,“height"
.
For an ArthurModel, this field declares what kind of input datatype will be flowing into the system.
Allowable values are defined in the InputType
enum:
Example:
arthur_model = connection.model(name="New_Model",
input_type=InputType.Tabular,
model_type=OutputType.Regression)
Related terms: output type, tabular data, nlp data
On the UI dashboard, you will see a model health score between 0100 for each of your models. This score averages over a 30day window of the following normalized metrics: performance, drift, and ingestion.

Performance:
 Regression: 1  Normalized MAE
 Classification: F1 Score

Drift
 1  Average Anomaly Score

Ingestion
 The variance of normalized periods between ingestion events
 The variance of normalized volume differences between ingestion events
You can also extract the health score via an API call.
Model onboarding refers to the process of defining an ArthurModel
, preparing it with the necessary reference dataset, passing it through a validation check, and saving it to the Arthur system.
Once your model is onboarded onto Arthur, you can use the Arthur system to track the model and view all its performance and analytics in your online Arthur dashboard.
Related terms: ArthurModel, reference dataset
A modeling task where each input is associated with one label from a fixed set of possible labels.
Often this is a binary classifier (the output is either 0 or 1), but the output can also have more than 2 possible labels.
Example:
This NLP model applies the most relevant tag to news articles. The model is trained on example articles which are tagged with a topic like
Congress
.
Related terms: multilabel clasification, output type,
A modeling task where each input is associated with two or more labels from a fixed set of possible labels.
Example:
This NLP model applies relevant tags to news articles. The model is trained on example articles which are tagged with multiple topics like
Politics, Elections, Congress
.
Related terms: output type, multiclass clasification
Unstructured text sequences are commonly used for Natural Language Processing models.
Related terms: attribute, output type, Stage
A noninput attribute is an attribute that an ArthurModel will track that does not actually enter the model as an input.
Common noninput attributes are protected class attributes such as age, race, or sex. By The model ending such noninput attributes to Arthur, you can track model performance based on these groups in your data to evaluate model bias and fairness.
Related terms: attribute, bias
The OutputType is for computer vision models to detect an object within an image and output a box that bounds the object.
This bounding box is used to identify where the object resides in the image.
Related terms: image
Refers to the challenge of detecting when an input (or set of inputs) is substantially different from the distribution of a larger set of reference inferences. This term commonly arises in data drift, where we want to detect if new inputs differ from the training data (and distribution thereof) for a particular model. OOD Detection is a relevant challenge for Tabular data and unstructured data such as images and sequences.
Related terms: {ref}glossary_data_drift
For an ArthurModel, this field declares what kind of output predictions will flow out of the system.
Allowable values are defined in the OutputType
enum:
Regression
 appropriate for continuousvalued targets
Multiclass
 appropriate for both binary classifiers and multiclass classifiers
Multilabel
 appropriate for multilabel classifiers
ObjectDetection
 only available for computer vision models
Example:
arthur_model = connection.model(name="New_Model",
input_type=InputType.Tabular,
output_type=OutputType.Regression)
Related terms: input type
The output prediction (y_hat) of a trained model for any input.
Related terms: ground truth
An attribute of an inference that is considered sensitive with respect to model bias. Common examples include race, age, and gender. The term "protected" comes from the Civil Rights Act of 1964.
Synonyms: sensitive attribute
An input attribute in a model (or combination thereof) is highly correlated with a protected attribute such as race, age, or gender. The presence of proxies in a dataset makes it difficult to rely only on [Disparate Treatment] as a standard for fair ML.
Example:
In most US cities, zip code is a strong proxy for race. Therefore, one must be cautious when using zip code as an input to a model.
Related terms: bias, disparate impact, disparate treatment
The dataset is used as a baseline reference for an Arthur model.
A reference dataset must include a sample of the input features a model receives.
A reference dataset can optionally include a sample of model outputs, ground truth values, and other noninput attributes as metadata.
The reference dataset for a model is used to compute drift: the distribution of input features in the reference dataset makes up the baseline against which future inferences are compared to compute anomaly scores.
Related terms: inference
A modeling task (or model) where the target variable is a continuous variable.
Example:
This regression model predicts what the stock price of $APPL will be tomorrow.
Related terms: output type
The Arthur platform uses taxonomy to delineate how attributes contribute to the model computations.
Allowable values are defined in the Stage
enum:
ModelPipelineInput
: Input to the entire model pipeline. This will most commonly be the Stage used to represent all model inputs. Will contain base input features familiar to the data scientist: categorical and continuous columns of a tabular dataset.PredictFunctionInput
: Potential alternative input source, representing direct input into the model's predi t() method. Therefore, data in the specific models have already undergone all relevant transformations, including scaling, onehot encoding, or embedding.PredictedValue
: The predictions coming out of the model.GroundTruth
: The ground truth (or target) attribute or a model. Must be onehot for classifiersGroundTruthClass
: The ground truth class for classification models, not onehot encodedNonInput
: Ancillary data that can be associated with each inference but not necessarily a direct input t the model. For example, sensitive attributes likeage
,sex
, orrace
might not be direct model inputs, but will be useful to associate with each prediction.
The data type for model inputs where the data can be thought of as a table (or spreadsheet) composed o rows and columns. Each column represents an input attribute for the model, and each row represents a separate record that composes the training data. In supervised learning, exactly one of the columns acts as the target.
Example:
This credit scoring model is trained on tabular data. The input attributes are
income
,country
, andage
and the target is FICO score.
Related terms: Attribute, output type, Stage
Token Likelihood
The token likelihood is a number between 0 and 1 that quantifies the model’s level of surprise that this token was the next predicted token of the sentence. If a token has a low likelihood (close to 0), the model is more unsure about selecting this token. While a likelihood close to 1 indicates that the model is very confident in predicting this token.
A Version Label is a string representing a custom version of your Arthur Model within its A thur Model Group. Version Labels are not required, and the platform will default to using the Version Sequence Number when not provided.
Example:
# retrieve the model group
model_group = connection.get_model_group(model_group_id)
# create the new version of the model
arthur_model_v2 = connection.model(name="Model_V2",
input_type=InputType.Tabular,
model_type=OutputType.Regression)
# add the new model to the model group
model_group.add_version(arthur_model_v2, label="2.0.1")
label = arthur_model_v2.version_label
arthur_model_v2.save()
# label == "2.0.1"
Related terms: Arthur Model, Arthur Model Group, Version Sequence Number
A Version Sequence Number is a unique, autoincrementing (starting at 1) integer assigned to Arthur Models in an A thur Model Group. This number uniquely represents an Arthur Model’s Version with the Model Group. If a Version Label is not provided, the platform will show the Version Sequence Number instead.
Example:
# retrieve the first version of a model
arthur_model_v1 = connection.get(model_id)
num = arthur_model_v1.version_sequence_num
# num == 1
# retrieve the second version of a model
model_group = arthur_model_v1.model_group
arthur_model_v2 = model_group.get_version(sequence_num=2)
num = arthur_model_v2.version_sequence_num
# num == 2
Related terms: Arthur Model, Arthur Model Group, Version Label
Updated 3 months ago