University of Texas Libraries

Data Visualization

Data Preparation

Data Processing Pipeline

Steps of the data processing pipeline: data source, discover and acquire, extract, clean, transform and integrate (this stage will take 60-80% of your time), analyze, and present

 

The Clean, Transform, and Integrate stage will take 60-80% of your time. Data visualization can be a part of the analyze or present stages. 
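As a rough illustration of how the stages chain together, here is a minimal sketch in Python using only the standard library. The in-memory `SOURCE` text, the column names, and the function names are all hypothetical, chosen for the example; a real pipeline would read from files or databases and would usually involve far more cleaning.

```python
import csv
import io
from statistics import mean

# Hypothetical data source: a small CSV held in memory (one temperature is missing).
SOURCE = "site,temp_c\nA,21.5\nB,\nA,22.5\n"


def extract(text):
    """Extract: parse the CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows whose temperature cell is empty."""
    return [row for row in rows if row["temp_c"].strip()]


def transform(rows):
    """Transform: convert temperature strings to numbers."""
    return [{"site": row["site"], "temp_c": float(row["temp_c"])} for row in rows]


def analyze(rows):
    """Analyze: compute the mean temperature per site."""
    by_site = {}
    for row in rows:
        by_site.setdefault(row["site"], []).append(row["temp_c"])
    return {site: mean(values) for site, values in by_site.items()}


def present(summary):
    """Present: format the summary as plain text, one site per line."""
    return "\n".join(f"{site}: {avg:.1f} C" for site, avg in sorted(summary.items()))


report = present(analyze(transform(clean(extract(SOURCE)))))
print(report)
```

Keeping each stage as its own function makes it easier to see where the time goes and to rerun a single stage when the source data changes.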

Data Processing Tips for Spreadsheets

Cardinal Rules for Spreadsheets

  • Put all your variables in columns.
  • Don't combine multiple pieces of information in one cell.
  • Put each observation on its own row. 
  • Leave the raw data raw: never edit the original file, and do your cleaning on a copy.
  • Export the cleaned data to a text-based format like CSV.
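The rules above can be sketched in a few lines of Python. This is only an illustration under assumed data: the `site_date` column, which packs a site code and a date into one cell, and the sample rows are invented for the example. The raw rows are left untouched, and the cleaned copy is written out as CSV text.

```python
import csv
import io

# Hypothetical raw rows: the "site_date" cell combines two pieces of information.
RAW_ROWS = [
    {"site_date": "A-2020-01-15", "count": "12"},
    {"site_date": "B-2020-01-16", "count": "7"},
]


def tidy(rows):
    """Return new rows with one variable per column; the input is not modified."""
    cleaned = []
    for row in rows:
        # Split only on the first hyphen so the date keeps its own hyphens.
        site, date = row["site_date"].split("-", 1)
        cleaned.append({"site": site, "date": date, "count": int(row["count"])})
    return cleaned


def to_csv(rows):
    """Serialize the cleaned rows to CSV, a plain-text format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["site", "date", "count"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(tidy(RAW_ROWS))
print(csv_text)
```

In practice the CSV text would be written to a new file next to the raw file, never over it.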

Common Spreadsheet Issues

  • Multiple tables
  • Multiple tabs
  • Not filling in zeros
  • Using bad null values
  • Using formatting to convey information
  • Using formatting to make the data sheet look pretty
  • Placing comments or units in cells
  • More than one piece of information in a cell
  • Field name problems
  • Special characters in data
  • Inclusion of metadata in data table
  • Date formatting
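Several of these issues can be handled programmatically once the data is out of the spreadsheet. The sketch below is one possible approach, not a complete cleaner: the list of null markers, the `kg` unit, the US-style `month/day/year` input format, and the column names are all assumptions made for the example.

```python
from datetime import datetime

# Assumed set of "bad" null markers; adjust to whatever your data actually uses.
NULL_MARKERS = {"", "NA", "N/A", "-999", "null"}


def clean_field_name(name):
    """Field name problems: lowercase the name and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def clean_value(value):
    """Bad null values: map every known null marker to None."""
    text = value.strip()
    return None if text in NULL_MARKERS else text


def parse_weight(value):
    """Units in cells: strip a trailing 'kg' and return a number."""
    text = clean_value(value)
    if text is None:
        return None
    if text.endswith("kg"):
        text = text[:-2]
    return float(text.strip())


def to_iso_date(value, fmt="%m/%d/%Y"):
    """Date formatting: convert an assumed month/day/year string to ISO 8601."""
    text = clean_value(value)
    if text is None:
        return None
    return datetime.strptime(text, fmt).date().isoformat()


def clean_row(row):
    """Apply the fixes above to one row with hypothetical columns."""
    cleaned = {clean_field_name(key): value for key, value in row.items()}
    # Move the unit into the field name so the cell holds only a number.
    cleaned["weight_kg"] = parse_weight(cleaned.pop("weight"))
    cleaned["sample_date"] = to_iso_date(cleaned["sample_date"])
    return cleaned


example = clean_row({"Sample Date": "3/7/2021", "Weight": "4.5 kg"})
print(example)
```

Issues that depend on layout or formatting, such as multiple tables on one sheet or color used to convey meaning, usually have to be fixed by hand in a copy of the spreadsheet before export.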

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License.