This guide is useful for teams that want to manually register all their model attributes or register a model without a reference dataset.

Setting Up To Individually Onboard Model Attributes

Setting up your model to add each attribute individually is similar to the typical starting point for Creating Arthur Model Objects. Teams must define their Arthur Model Object:

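from datetime import datetime
from arthurai.common.constants import InputType, OutputType

# `arthur` is assumed to be an already-authenticated ArthurAI client
arthur_model = arthur.model(partner_model_id=f"CreditRisk_Batch_QS-{datetime.now().strftime('%Y%m%d%H%M%S')}",
                            display_name="Credit Risk Batch",
                            input_type=InputType.Tabular,
                            output_type=OutputType.Multiclass,
                            is_batch=True)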

More information on defining this object, and on all the other steps, can be found on the Creating Arthur Model Objects page.

Set up Arthur Input Attributes

Teams looking to manually onboard models can start by manually adding all PipelineInput and NonInput attributes to their Arthur Model.

🚧
Sending Null Values for Attributes
Unless otherwise specified below for a particular attribute type, Null values are allowed for input attributes. Both Null and NaN values are allowed when onboarding through the Arthur SDK. When onboarding with the API, on the other hand, Null values are allowed but NaN values are not.
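If records are prepared in Python and then sent through the API, one way to respect this rule is to convert NaN values to None (which serializes to a JSON null) beforehand. The following is a minimal sketch, assuming the callout means that NaN values are rejected by the API; nan_to_none is a hypothetical helper and not part of the Arthur SDK.

```python
import math
from typing import Any, Dict


def nan_to_none(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with float NaN values replaced by None.

    Hypothetical helper for illustration; not part of the Arthur SDK.
    """
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


# Example record with one NaN and one value that is already null
cleaned = nan_to_none({"age": float("nan"), "income": 52000.0, "state": None})
```

Existing None values and non-float values pass through unchanged.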

Numerical Attribute

Numerical attributes are input attributes meant to track continuous numerical values. They can be manually added to a model in Arthur using the add_attribute function.

from arthurai.common.constants import Stage, ValueType

# adds a float input attribute directly to the model
arthur_model.add_attribute(
    name="Num_Attr_Name",
    value_type=ValueType.Float,
    stage=Stage.ModelPipelineInput
)

📘
Inferring Numerical Attributes as Categorical
When Arthur is inferring the model schema, Float and Integer columns are assumed to be categorical if there are fewer than 20 unique values and if Float values are all whole numbers. String and boolean columns are always assumed to be categorical for Tabular models.

## Ensure that numerical attributes are treated as numerical and not categorical
arthur_model.get_attribute("Num_Attr_Name", stage=Stage.ModelPipelineInput).categorical = False
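To anticipate which numerical columns schema inference would treat as categorical, the rule from the callout above can be approximated in plain Python. This is only an illustration of the rule as described here (fewer than 20 unique values, and all values whole numbers), not Arthur's actual implementation; looks_categorical is a hypothetical name.

```python
from typing import Iterable, Optional, Union

Number = Union[int, float]


def looks_categorical(values: Iterable[Optional[Number]], max_unique: int = 20) -> bool:
    """Approximate the inference rule described above (illustrative only).

    A column is flagged when it has fewer than `max_unique` distinct
    values and every value is a whole number.
    """
    present = [v for v in values if v is not None]
    if len(set(present)) >= max_unique:
        return False
    # float(v).is_integer() is False for NaN and for fractional values
    return all(float(v).is_integer() for v in present)


flagged = looks_categorical([0.0, 1.0, 2.0, 1.0])      # few whole-number values
not_flagged = looks_categorical([0.5, 1.0, 2.25])      # fractional values present
```

Columns flagged this way are candidates for explicitly setting categorical = False when they are meant to be continuous.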

Teams may also choose to specify the exact value type. The options are ValueType.Integer or ValueType.Float.

arthur_model.get_attribute("Num_Attr_Name", stage=Stage.ModelPipelineInput).value_type = ValueType.Integer

Categorical Attribute

Categorical Attributes are attributes that represent a finite group of values (or categories). They can be manually added to a model in Arthur using the add_attribute function.

from arthurai.common.constants import Stage, ValueType

# adds a string (categorical) input attribute directly to the model
arthur_model.add_attribute(
    name="Cat_Attr_Name",
    value_type=ValueType.String,
    stage=Stage.ModelPipelineInput
)

👍
Ensure All Possible Production Attributes Are Specified
Attributes that are set to categorical must have at least one category. In edge cases where this is not possible, the category list can be set using a single "dummy" category (e.g., ["n/a"]). While the platform will accept new categories, they will not be used in drift calculations or in segmented visualizations in the UI. It is therefore important to ensure that all potential categories are listed before onboarding.

Setting Possible Categories

Based on the callout above, teams may manually specify the potential categories.

## Set Categories to N/A
arthur_model.get_attribute("Cat_Attr_Name", stage=Stage.ModelPipelineInput).categories = ["n/a"]

## Set Categories to a List 
arthur_model.get_attribute("Cat_Attr_Name", stage=Stage.ModelPipelineInput).categories = ["n/a", "bachelors", "masters", "highschool"]
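When the categories are not known by heart, the list can be assembled from historical values before it is assigned. The sketch below is one possible approach, assuming string categories and using the "n/a" dummy category as the fallback; category_list is a hypothetical helper, not an SDK function.

```python
from typing import Any, Iterable, List, Sequence


def category_list(values: Iterable[Any], fallback: Sequence[str] = ("n/a",)) -> List[str]:
    """Collect the distinct non-null values as a sorted list of strings.

    Falls back to a single dummy category when no values are available.
    Hypothetical helper for illustration only.
    """
    seen = sorted({str(v) for v in values if v is not None})
    return seen if seen else list(fallback)


categories = category_list(["masters", "bachelors", None, "masters", "highschool"])
empty_case = category_list([None, None])
```

The resulting list can then be assigned to the attribute's categories field in the same way as the hard-coded lists above.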

Setting Attribute Labels

When teams have set up numerical encoding for their categorical variables, it can be useful to give the Arthur platform the mapping back to human-readable labels. This makes it easier for end users to interpret categorical attributes in the UI.

# labels the value 0 for the attribute 'education_level'
# to have the label 'elementary', etc.
arthur_model.set_attribute_labels(
    'education_level',
    {0 : 'elementary', 1 : 'middle', 2 : 'high', 3 : 'university'}
)

Timestamp Attribute

Timestamp Attributes are model features that represent a date/time. These are frequently found in time series models.

from arthurai.common.constants import Stage, ValueType

# adds a timestamp input attribute directly to the model
arthur_model.add_attribute(
    name="Timestamp_Attribute_Name",
    value_type=ValueType.Timestamp,
    stage=Stage.ModelPipelineInput
)

It is important to note that timestamp values sent to Arthur must be DateTime objects and must include a timezone. A common example of how to set up these transformations can be seen below:

from datetime import datetime

import pytz

## Example of Converting Timestamp Strings to DateTime Objects
## This function will need to change depending on how your time strings are formatted
def get_timestamps(x):
    # drop fractional seconds, then parse the remaining string
    new_time = x.split('.')[0]
    return datetime.strptime(new_time, '%Y-%m-%d %H:%M:%S')

df['timestamp'] = df['timestamp'].apply(get_timestamps)

## Ensure Appropriate tzinfo Timezone Added
df['timestamp'] = df['timestamp'].apply(lambda x: x.replace(tzinfo=pytz.UTC))

❗️
Null and NaN Values Are Not Allowed
For Timestamp attributes, Null and NaN values are not allowed within Arthur.
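The two requirements for Timestamp attributes (timezone-aware values, no nulls) can be combined in a single conversion step. Below is a minimal standard-library sketch that does not depend on pandas or pytz; it assumes input strings in the 'YYYY-MM-DD HH:MM:SS[.ffffff]' format and UTC times, and to_utc_timestamp is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional


def to_utc_timestamp(raw: Optional[str]) -> datetime:
    """Parse a timestamp string into a timezone-aware (UTC) datetime.

    Raises ValueError for missing values, since nulls are not allowed
    for Timestamp attributes. Hypothetical helper for illustration.
    """
    if raw is None or not raw.strip():
        raise ValueError("Timestamp attributes cannot be null or empty")
    trimmed = raw.strip().split(".")[0]  # drop fractional seconds
    naive = datetime.strptime(trimmed, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


stamp = to_utc_timestamp("2023-01-05 12:30:00.123456")
```

If the source times are not in UTC, the timezone attached here would need to match the zone in which they were recorded.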

Time Series Attribute

Time Series Attributes are model features that represent a value over time.

from arthurai.common.constants import Stage, ValueType

# adds a time series input attribute directly to the model
arthur_model.add_attribute(
    name="Time_Series_Attribute_Name",
    value_type=ValueType.TimeSeries,
    stage=Stage.ModelPipelineInput
)

It is important to note that the TimeSeries object being put into Arthur must be formatted as a list of dicts with "timestamp" and "value" keys. The timestamps must be formatted according to the restrictions on Timestamp attributes. The values must be floats.

❗️
There must be data for every timestamp on a regular time interval
Each time series attribute should be considered to have a regular time interval (e.g., 1 day or 1 week) at which the value it records is polled. The value must therefore be recorded at consistent intervals. If no data is available for a given timestamp in that interval, a data point with a Null value must still be recorded.
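The format and the regular-interval rule can be illustrated together. The sketch below builds a list of dicts with "timestamp" and "value" keys and fills missing intervals with None; it assumes timestamps are timezone-aware datetime objects, and build_time_series is a hypothetical helper rather than an SDK function.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def build_time_series(
    readings: Dict[datetime, float],
    start: datetime,
    end: datetime,
    step: timedelta,
) -> List[Dict[str, Optional[object]]]:
    """Emit one point per interval from `start` to `end` (inclusive).

    Intervals with no reading get a value of None so the series stays regular.
    """
    if step <= timedelta(0):
        raise ValueError("step must be a positive interval")
    series = []
    current = start
    while current <= end:
        value = readings.get(current)
        series.append(
            {"timestamp": current, "value": None if value is None else float(value)}
        )
        current += step
    return series


day = timedelta(days=1)
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
readings = {start: 10, start + 2 * day: 12.5}  # no reading for January 2
series = build_time_series(readings, start, start + 2 * day, day)
```

Here the second point carries a None value because nothing was recorded for that day.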

Text (NLP) Attribute

Unstructured Text Attributes refer to input text for NLP models. They can be manually added to a model in Arthur using the add_attribute function. The ArthurAttribute type of UnstructuredText is designed to be used for NLP models only.

from arthurai.common.constants import Stage, ValueType

# adds an unstructured text input attribute directly to the model
arthur_model.add_attribute(
    name="NLP_Input_Attribute_Name",
    value_type=ValueType.UnstructuredText,
    stage=Stage.ModelPipelineInput
)

Generative Text Models

Teams wanting to monitor generative text models should refer to the Generative Text Model Onboarding Guide. This provides a step-by-step walkthrough in manually onboarding those models.

Image Attribute

Image Attributes refer to the images input to computer vision models. They can be manually added to a model in Arthur using the add_image_attribute function.

arthur_model.add_image_attribute("ImageColumnName")

The "ImageColumnName" string is the name of the column, in your future reference or inference data frames, that contains the path to each image.

Unique Identifier

A Unique Identifier Attribute is used to hold values that are unique to each row within the platform. It is a String-type categorical attribute in which every value is distinct, so no category list is specified.

from arthurai.common.constants import Stage, ValueType

# adds a string input attribute that will serve as the unique identifier
arthur_model.add_attribute(
    name="Cat_Unique_Attr_Name",
    value_type=ValueType.String,
    stage=Stage.ModelPipelineInput
)

## Specify this category as unique 
arthur_model.get_attribute("Cat_Unique_Attr_Name", stage=Stage.ModelPipelineInput).unique = True

## Specify no categories 
arthur_model.get_attribute("Cat_Unique_Attr_Name", stage=Stage.ModelPipelineInput).categories = []

🚧
Do Not Onboard Your Partner Inference ID with Reference Data
While the Partner Inference ID (your internal team's inference identifier) is the most common unique identifier within Arthur, it is something that you specify (or that Arthur creates) when you send inferences to the platform. It is not specified when building out a reference dataset.

Set up Arthur Predicted/Ground Truth Attributes

After sending all input attribute information, teams can specify their model's predicted and ground truth attributes. This will depend on the model task type they are trying to onboard.

❗️
Null Values Are Not Allowed for Predicted or Ground Truth Attributes
Null values are not supported for predicted or ground truth attributes.

Classification

To add output attributes for classification tasks, teams must first specify what type of classification model they want to onboard. They can choose between:

Binary Classification with columns for each predicted attribute and ground truth value
Multi-Class Classification with columns for each predicted attribute and ground truth value
Either Binary or Multi-Class Classification with columns for each predicted attribute but only a single column for ground truth

The type of classification you choose should be based on the schema you expect for onboarding reference or inference data later.

Binary Classification

If you expect your inference schema to consist of two predictions and two ground truth columns for your binary classification task, then you should utilize the add_binary_classifier_output_attributes function. In this function, you need to provide:

Prediction to Ground Truth Mapping: Mapping of each predicted column to its corresponding ground truth column
Positive Predicted Attribute: The predicted attribute for the positive class, which is the class related to your objective. For example, if the model classifies whether an object is present in a scene, all samples where the object is predicted to be present are considered positively predicted.

# map PredictedValue attributes to their corresponding GroundTruth attributes
PRED_TO_GROUND_TRUTH_MAP = {'pred_0' : 'gt_0',
                            'pred_1' : 'gt_1'}

# add the ground truth and predicted attributes to the model
# specifying that the `pred_1` attribute is the
# positive predicted attribute, which means it corresponds to the
# probability that the binary target attribute is 1
arthur_model.add_binary_classifier_output_attributes(positive_predicted_attr='pred_1',
                                                     pred_to_ground_truth_map=PRED_TO_GROUND_TRUTH_MAP)
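To make the expected four-column schema concrete, the sketch below derives the pred_0, pred_1, gt_0, and gt_1 values for one row from a positive-class probability and a 0/1 label. It assumes the ground truth columns are one-hot encoded as 0/1 integers, which is not stated on this page; binary_columns is a hypothetical helper.

```python
from typing import Dict, Union


def binary_columns(prob_positive: float, label: int) -> Dict[str, Union[float, int]]:
    """Build the two prediction and two ground truth columns for one row.

    Assumes one-hot 0/1 ground truth columns (an assumption, see above).
    """
    if not 0.0 <= prob_positive <= 1.0:
        raise ValueError("prob_positive must be between 0 and 1")
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    return {
        "pred_0": 1.0 - prob_positive,  # probability of the negative class
        "pred_1": prob_positive,        # probability of the positive class
        "gt_0": int(label == 0),
        "gt_1": int(label == 1),
    }


row = binary_columns(prob_positive=0.8, label=1)
```

The column names match the keys and values of PRED_TO_GROUND_TRUTH_MAP in the example above.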

Multi-Class Classification

If you expect your inference schema to consist of multiple predictions and their corresponding multiple ground truth columns for your classification task, then you should utilize the add_multiclass_classifier_output_attributes function. In this function, you need to provide:

Prediction to Ground Truth Mapping: Mapping of each predicted column to its corresponding ground truth column
Positive Predicted Attribute: The predicted attribute for the positive class, which is the class related to your objective. For example, if the model classifies whether an object is present in a scene, all samples where the object is predicted to be present are considered positively predicted.

# map PredictedValue attributes to their corresponding GroundTruth attributes
PRED_TO_GROUND_TRUTH_MAP = {
    "dog": "dog_gt",
    "cat": "cat_gt",
    "horse": "horse_gt"
}

# add the ground truth and predicted attributes to the model
arthur_model.add_multiclass_classifier_output_attributes(
    pred_to_ground_truth_map = PRED_TO_GROUND_TRUTH_MAP
)
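When the source data stores the true class as a single label, the per-class ground truth columns named in the mapping can be derived from it. The following is a rough sketch under the assumption that those columns are one-hot encoded as 0/1 integers; one_hot_ground_truth is a hypothetical helper and not part of the SDK.

```python
from typing import Dict

PRED_TO_GROUND_TRUTH_MAP = {"dog": "dog_gt", "cat": "cat_gt", "horse": "horse_gt"}


def one_hot_ground_truth(true_class: str, pred_to_gt: Dict[str, str]) -> Dict[str, int]:
    """Expand a single class label into one 0/1 column per ground truth attribute."""
    if true_class not in pred_to_gt:
        raise KeyError(f"Unknown class: {true_class!r}")
    return {
        gt_column: int(pred_column == true_class)
        for pred_column, gt_column in pred_to_gt.items()
    }


gt_row = one_hot_ground_truth("cat", PRED_TO_GROUND_TRUTH_MAP)
```

Raising on unknown classes surfaces labels that were not included in the mapping before any data is sent.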

Single Column Classification

Single-column classification is very similar to the previous techniques; however, there is only a single ground truth column. For this schema, you should utilize the add_classifier_output_attributes_gtclass function. In this function, you need to provide:

Prediction to Ground Truth Mapping: Mapping of each predicted column to its corresponding ground truth value
Positive Predicted Attribute: The predicted attribute for the positive class, which is the class related to your objective. For example, if the model classifies whether an object is present in a scene, all samples where the object is predicted to be present are considered positively predicted.
Ground Truth Column: The name of the single column that holds the ground truth

# Map PredictedValue attribute to its corresponding GroundTruth attribute value.
# This tells Arthur that the `pred_value` column represents
# the probability that the ground truth column has the value 1
PRED_TO_GROUND_TRUTH_MAP = {
    "pred_value": 1
}

# Add the ground truth and predicted attributes to the model,
# specifying which attribute represents ground truth and
# which attribute represents the predicted value.
arthur_model.add_classifier_output_attributes_gtclass(
    positive_predicted_attr = 'pred_value',
    pred_to_ground_truth_class_map = PRED_TO_GROUND_TRUTH_MAP,
    ground_truth_column = 'gt_column'
)

Regression

To manually specify your regression output, teams need to specify a prediction to ground truth mapping with the following:

Predicted Value: The column that contains your numerical predicted output
Ground Truth Value: The column that contains the ground truth

from arthurai.common.constants import ValueType

# map PredictedValue attributes to their corresponding GroundTruth attributes
PRED_TO_GROUND_TRUTH_MAP = {
    "pred_value": "gt_value",
}

# add the ground truth and predicted attributes to the model
arthur_model.add_regression_output_attributes(
    pred_to_ground_truth_map = PRED_TO_GROUND_TRUTH_MAP,
    value_type = ValueType.Float
)

Object Detection

To manually specify your object detection models, teams need to specify the following:

Predicted Attribute Name: This is the column name that will store your predicted bounding boxes
Ground Truth Attribute Name: This is the name of the column with the true labeled bounding boxes
Class Labels: All potential object labels that your model detects

predicted_attribute_name = "objects_detected"
ground_truth_attribute_name = "label"
class_labels = ['cat', 'dog', 'person']

arthur_model.add_object_detection_output_attributes(
    predicted_attribute_name,
    ground_truth_attribute_name,
    class_labels)

Generative Text (LLM)

Teams wanting to monitor generative text models should refer to the Generative Text Model Onboarding Guide. This provides a step-by-step walkthrough in manually onboarding those models.

Setting Reference Data Later

Teams that have manually onboarded all of their model attributes, to ensure the schema is correct, can still include a reference dataset for drift calculations. After manually creating the model schema as described above, this is done by setting the reference dataset.

# reference dataframe of model inputs
reference_set = pd.DataFrame(....)

# produce model predictions on reference set
# in this example, the predictions are classification probabilities
preds = model.predict_proba(reference_set)

# assign the column corresponding to the positive class
# as the `pred` attribute in the reference data
reference_set["pred"] = preds[:, 1]

# set ground truth labels
reference_set["gt"] = ...

# configure the ArthurModel to use this dataframe as reference data
arthur_model.set_reference_data(data=reference_set)
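Because the schema was registered by hand, it can be worth checking that the reference data frame's columns line up with the registered attribute names before setting it. The sketch below compares two lists of names in plain Python; the attribute names shown are made up for illustration, and compare_columns is a hypothetical helper rather than an SDK function.

```python
from typing import Iterable, List, Tuple


def compare_columns(
    attribute_names: Iterable[str], reference_columns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names relative to the registered attributes."""
    expected = set(attribute_names)
    actual = set(reference_columns)
    missing = sorted(expected - actual)      # registered but absent from the data
    unexpected = sorted(actual - expected)   # present in the data but never registered
    return missing, unexpected


# Illustrative names only; in practice these would come from your own schema and data frame
registered = ["Num_Attr_Name", "Cat_Attr_Name", "pred", "gt"]
columns = ["Num_Attr_Name", "Cat_Attr_Name", "pred", "extra_col"]
missing, unexpected = compare_columns(registered, columns)
```

Both lists being empty indicates that the column names match the registered attributes exactly.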
