- Python Data Science Essentials
- Alberto Boschetti Luca Massaron
- 2021-08-13 15:19:38
Data Munging
We are just getting into the action with data! In this chapter, you'll learn how to munge data. What does data munging mean?
The term mung was coined about half a century ago by students at the Massachusetts Institute of Technology (MIT). Munging means changing a piece of original data, through a series of well-specified and reversible steps, into a completely different (and hopefully more useful) one. Deep-rooted in hacker culture, munging often appears in the data science pipeline under other, almost synonymous, names, such as data wrangling or data preparation.
With that in mind, this chapter covers the following topics:
- The data science process (so that you'll know what is going on and what's next)
- Loading data from a file
- Selecting the data you need
- Cleaning up any missing or wrong data
- Adding, inserting, and deleting data
- Grouping and transforming data to obtain new and meaningful information
- Producing a dataset matrix or an array to feed into the data science pipeline
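To make the topics above a little more concrete, here is a minimal sketch of a munging sequence written with only the Python standard library. Everything in it is an assumption made for illustration: the tiny weather dataset, the column names, and the `to_float` helper are invented, and an in-memory string stands in for a file on disk. It is meant to show the shape of the steps (load, select, clean, add, group, and build a matrix), not the specific tools used later in the chapter.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data; an in-memory string stands in for a file on disk.
RAW_CSV = """city,temperature,humidity
Rome,21.5,60
Milan,,72
Rome,23.0,58
Milan,18.0,abc
Turin,17.5,65
"""


def to_float(text):
    """Return text as a float, or None when it is missing or malformed."""
    try:
        return float(text)
    except ValueError:
        return None


# 1. Load the data from a file-like object.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 2. Select the columns we need and convert the numeric ones.
selected = [
    {
        "city": row["city"],
        "temperature": to_float(row["temperature"]),
        "humidity": to_float(row["humidity"]),
    }
    for row in rows
]

# 3. Clean up: drop rows with missing or wrong values.
clean = [
    row
    for row in selected
    if row["temperature"] is not None and row["humidity"] is not None
]

# 4. Add a derived column (temperature in Fahrenheit).
for row in clean:
    row["temperature_f"] = row["temperature"] * 9 / 5 + 32

# 5. Group by city and compute the mean temperature of each group.
by_city = defaultdict(list)
for row in clean:
    by_city[row["city"]].append(row["temperature"])
mean_temperature = {city: mean(values) for city, values in by_city.items()}

# 6. Build a purely numeric matrix (a list of rows) for later processing.
matrix = [[row["temperature"], row["humidity"]] for row in clean]

print(mean_temperature)
print(matrix)
```

Both Milan rows are discarded in step 3, one for a missing temperature and one for a malformed humidity, so only three rows reach the final matrix. Dropping bad rows is just the simplest cleaning policy; replacing the offending values is an equally valid choice, depending on the problem.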