How do you all deal with AI model bias when it’s not super obvious?
Hey everyone,
So I’m working on a project for one of my AI ethics courses, and I’ve been diving into model monitoring and fairness stuff, which is how I ended up here. Super cool platform, btw; I’ve been reading through a bunch of the docs and forum posts, and they’ve honestly been more helpful than some of my lectures.
Anyway, I’m running into a problem and wondering how people here usually handle it. I trained a text classification model (pretty basic: scikit-learn + a small BERT model) on a dataset that’s supposedly balanced, but when I actually test it, the predictions are kinda sus. It seems to underperform on non-standard English, especially sentences from non-native speakers. It’s not blatantly wrong all the time, but you can definitely feel the skew, you know?
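For context, the setup is roughly this (heavily simplified; the model name and the toy data below are just placeholders standing in for my real dataset and config):

```python
# Rough sketch of my setup: a small BERT-style encoder for embeddings,
# then a plain scikit-learn classifier on top.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: swap in your real texts + labels here.
texts  = ["formal academic sentence one", "casual text two",
          "another formal example here", "more casual writing again"]
labels = [1, 0, 1, 0]

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # small BERT-style encoder
X = encoder.encode(texts)                           # dense sentence embeddings

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```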
I did some manual checks and it just seems like the model is more “comfortable” predicting things when the input text is standard academic English. I’m guessing it’s because a lot of the training data came from published articles or formal writing, so anything outside that style kinda throws it off. I get that this kind of bias is subtle, but it’s still real and kinda frustrating.
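To make the “manual check” concrete, it was basically just comparing average prediction confidence on two tiny hand-picked sets of sentences, roughly like this (the sentences are made-up examples, and `clf` / `encoder` are the ones from the sketch above):

```python
import numpy as np

# Two tiny hand-picked sets (invented examples, just to illustrate the check).
formal_texts = [
    "The results indicate a statistically significant improvement over the baseline.",
    "This study examines the relationship between model capacity and generalization.",
]
nonstandard_texts = [
    "result is showing big improvement compare to baseline method",
    "this study is about how model size and generalizing are related each other",
]

def mean_confidence(text_list):
    """Average top-class probability across the given texts."""
    probs = clf.predict_proba(encoder.encode(text_list))
    return np.mean(probs.max(axis=1))

print("formal      :", mean_confidence(formal_texts))
print("non-standard:", mean_confidence(nonstandard_texts))
```

If the gap between those two numbers is consistently large, that’s at least a rough signal of the style sensitivity I’m describing, even without labels.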
I’m trying to figure out how to actually measure this kind of bias without manually tagging hundreds of examples. I saw that Arthur has tools for fairness monitoring and explainability, but I’m not sure where to start when the bias isn’t tied to obvious protected attributes like gender or race. Like, how do you track bias based on language fluency or tone? Is that even possible? Or do I need to define my own proxy metrics for that?
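For the proxy-metric idea, what I was picturing is something super rough: bucket the test texts by a cheap “formality” heuristic and compute per-bucket accuracy, e.g. with fairlearn’s MetricFrame. To be clear, the heuristic below (share of long words) is something I made up, not a validated fluency measure, and the tiny `test_texts` / `y_true` / `y_pred` dummies just stand in for my held-out set:

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score

def style_bucket(text, threshold=0.25):
    """Very crude 'formality' proxy: share of words longer than 6 chars.
    Purely a placeholder heuristic, not a real fluency measure."""
    words = text.split()
    long_ratio = sum(len(w) > 6 for w in words) / max(len(words), 1)
    return "formal-ish" if long_ratio >= threshold else "informal-ish"

# In practice these would come from my held-out set; tiny dummies here.
test_texts = [
    "the methodology demonstrates considerable robustness improvements",
    "this one work good i think",
    "subsequent analysis corroborates the preliminary hypothesis",
    "model is wrong many time on my data",
]
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]

groups = [style_bucket(t) for t in test_texts]

mf = MetricFrame(
    metrics={"accuracy": accuracy_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=groups,
)
print(mf.by_group)      # accuracy per style bucket
print(mf.difference())  # gap between best and worst bucket
```

No idea if that heuristic is a defensible proxy, which is kind of the core of my question.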
Also, total side note, but this reminds me of how academic systems are often lowkey biased too. I’m doing the IB program, and the level of formal English they expect in essays is so high that it feels like you’re constantly being judged more on tone than on ideas. I was talking to a classmate who’s super smart but struggles with that polished academic register, and he said he ended up using an IB extended essay writing service just to get through the structure and formatting part. It kinda sucks that stuff like that becomes necessary just because the expectations are so rigid, even in a supposedly international curriculum.
But yeah, back to the AI side of things, I’m really curious: when you’re dealing with bias that isn’t easily tied to clear demographic data, how do you even begin to audit for it? Are there tools for detecting more “linguistic” or contextual biases? Or do I just need to get creative with custom metrics and hope for the best?
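One other idea I’ve been toying with (no clue if it’s standard practice) is a CheckList-style invariance test: take sentences the model classifies, hand-rewrite them in a less polished style, and measure how often the prediction flips. Sketch below; the pairs are invented and `clf` / `encoder` are again the ones from my setup sketch:

```python
# Paired originals vs. hand-rewritten "less polished" versions.
# Frequent prediction flips would suggest the model keys on style, not content.
pairs = [
    ("The experiment demonstrates a clear improvement in accuracy.",
     "the experiment show clear improvement in accuracy"),
    ("We observe no significant difference between the two conditions.",
     "we not observe significant difference between two condition"),
]

originals = [a for a, _ in pairs]
rewrites  = [b for _, b in pairs]

pred_orig = clf.predict(encoder.encode(originals))
pred_rew  = clf.predict(encoder.encode(rewrites))

flip_rate = sum(p != q for p, q in zip(pred_orig, pred_rew)) / len(pairs)
print(f"prediction flip rate under style rewrites: {flip_rate:.0%}")
```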
Would love to hear how others here have approached this, especially if you've dealt with NLP models that had similar issues.