Representing Data

I took this course for two reasons. First, I wanted a course more oriented around developing data visualization skills, as opposed to another course that focused on writing skills, of which I’ve had many. Second, I wanted to explore how ethics is related to data visualization. Even though I started my career as a designer, I never really explored the impact of ethics on data visualization. In short, I accepted the data I was given and then made it look good. The course itself teaches data visualization skills through the use of different technologies and applications. However, the tools are meant to be used in a particular order, which is interesting and valuable for projects in the future. In design school, we’d often start with sketches and then move directly to first-round mock-ups in Adobe Whatever. In this course, there is more of a progression of tools, each meant to be used for a specific phase of the project, which I’ve briefly described below:

  1. Sketches: This is the most basic step but arguably the most important, because you have to decide what exactly you want to visualize. One important thing I learned, or shall I say, re-learned, in the course is that data visualization does not mean you have to visualize every single column and row. Instead, you must make deliberate decisions about what to focus on. The story in the data is often a subset of the entire dataset, much like how the real story in a movie is a subset of everything that was filmed. Sketches help you decide what to focus on at the outset.
  2. RawGraphs: This was a new tool for me and one I really liked. RawGraphs is excellent for rapid data experimentation. It allows you to load a dataset (or pick one of theirs) and start experimenting with different visualizations without building and rebuilding datasets. These visualizations are pre-configured, so you simply have to assign the columns you want to focus on and hit “enter.”
  3. Tableau: Obviously, this is a juggernaut application and one that I’ve been using for years. I will also say that it’s still the best and probably the most robust app in the space (compared to Looker, Power BI, etc.). The learning curve on Tableau is steep, but it has a great online learning community built around the product with tons of free resources. Tableau is the main application we used for the group project, in which our four-person group chose a dataset and set about visualizing it. The thing about Tableau is that you can do a lot with your visualization rather quickly. However, the best visualizations always have lots of nuance and detail, and these aspects require a learning commitment.
  4. Python: This is the one tool taught in the class that isn’t “drag ’n’ drop.” Anyone who uses Python knows that the packages bring data to life. In this course, we’re focusing primarily on Seaborn, which is used for statistical data visualization, and Vega-Altair, which is used for interactive data visualizations. Of course, you still need the basic Python packages like pandas, Matplotlib, and NumPy, but most of the coursework centers on the visualization libraries. I prefer Python over Tableau for the hard-core statistical analysis work. Tableau has some of this functionality, but doesn’t offer the level of control that Python does. Plus, Python is much better for use cases such as visualizing ML model performance.

This course has also encouraged me to rethink the data visualization components of my futures project. My futures project, which focuses on shifting ethics from compliance to competency in what I call “ethics as a skillset,” is primarily a written report that includes some information tables on specific research aspects of the industry, including adoption trends, current business processes, and ethical frameworks currently in use. At first, I thought I’d approach the data visualization as a set of standard tables and whatnot. However, now I’m thinking about how to make them more interactive. Going further, I’m even considering creating the entire futures project as a Jupyter notebook, not simply because it’s a novel format but because the whole point is to get developers to integrate it into their current process, and that includes how they read and access the information itself. Remember, while the assignment itself is academic, its use is meant to be practical and part of everyday work. This is an idea I’ll speak to my advisor about in our second meeting.