'Data Ethics as a Practice' intensives

I just finished two days (1am – 9am PDT…ugh) of intensives for the course I’m taking in the second semester, ‘Data Ethics as a Practice.’ There was a lot of information to ingest in a short time, but here are five key takeaways:

  1. None of the data we use are value-neutral. Nor are the companies that create them. So we need to start with that mindset.
  2. There is no such thing as an unbiased dataset. The only way one could exist would be to capture the entirety of the universe; everything else is just a slice of it.
  3. From Professor Shannon Vallor, I learned that AI ethics professionals are not general practitioners but specialists who typically focus on two or three areas, such as:
    • Data protection and privacy policy and regulations
    • Responsible AI policy and governance
    • Technical approaches to privacy (differential privacy, federated ML; see the sketch at the end of this post)
    • Ethics reviews/ethical impact assessment/hazard analysis
    • Adversarial testing (red-teaming, jailbreaking)
    • Model evaluation and documentation + algorithmic auditing
    • ML fairness research and development
    • Explainability research and development
    • Automated content moderation, filters and model fine-tuning
    • AI safety
  4. Measuring data ethics skills is difficult: even if we can train people in AI ethics, we still face the challenge of assessing whether the training worked. Perhaps the measure is qualitative rather than quantitative.
  5. Bias, a term we use often, is always contextual. But to understand what bias means in any context, we first have to define what it means to be “the same” and what it means to be “different.”
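On that last point, here’s a toy sketch of what I mean (illustrative Python I wrote myself; the decisions and group labels are entirely made up, not from the course). The same set of yes/no decisions can look unbiased under one definition of “the same group” and skewed under another:

```python
import numpy as np

def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rates across groups.

    Whether this gap counts as 'bias' depends entirely on which
    attribute we chose to define who is 'the same' as whom.
    """
    rates = [decisions[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Eight fictional yes/no decisions.
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0])

# Grouping the same people by one attribute...
by_region = np.array(["N", "N", "N", "S", "S", "S", "S", "N"])
# ...versus another.
by_age = np.array(["<40", "<40", ">=40", ">=40", "<40", ">=40", "<40", ">=40"])

print(demographic_parity_gap(decisions, by_region))  # 0.0 (looks fair)
print(demographic_parity_gap(decisions, by_age))     # 0.5 (looks skewed)
```

Same decisions, two answers: the gap is 0.0 when we split by region and 0.5 when we split by age bracket, which is exactly why defining “the same” and “different” has to come first.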
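And circling back to the technical privacy work in takeaway 3: differential privacy sounds abstract, but the core mechanism is small. Here’s a minimal sketch of the Laplace mechanism for a private mean (again, my own illustrative Python under the assumption of bounded values, not code from the course):

```python
import numpy as np

def private_mean(values, lo, hi, epsilon):
    """Differentially private mean via the Laplace mechanism.

    Clipping each value to [lo, hi] bounds how much any one person's
    record can change the mean: at most (hi - lo) / n. That bound
    (the sensitivity) is what the noise is calibrated to.
    """
    clipped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(clipped)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Toy data: ages of six fictional survey respondents.
ages = np.array([23, 31, 44, 52, 29, 38])
print(private_mean(ages, lo=0, hi=100, epsilon=1.0))  # noisy estimate near 36.2
```

The key idea: clipping limits how much one person can move the answer, and the noise scale is that limit divided by the privacy budget epsilon, so a smaller epsilon means more noise and stronger privacy.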