Shaking up Mental Models

Author

Kevin

Published

2021-09-14


Once you see it, you can’t unsee it.🤔

One benefit of learning a new framework is the opportunity to compare and contrast it with an established mental model.

I have been taking Kaggle's Pandas course, and one lesson involved storing row indices and column names in variables, then using those variables to extract information from the data frame.

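# `reviews` is the data frame loaded earlier in the Kaggle lesson
indices = [0, 1, 10, 100]  # row labels to keep
var = ['country', 'province', 'region_1', 'region_2']  # column names to keep
df = reviews.loc[indices, var]  # label-based selection of those rows and columns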

This seemed like a lot of work just to select certain rows and columns, but it gave me a new perspective on a challenge I was having.

I have been working with data from a Qualtrics survey that has nearly 147 columns, of which only about 117 are needed. (Long story on templated survey tools.) To pare the data frame down, I had been selecting columns by index. That works, but it is frustrating during testing because index-based selection breaks whenever the survey changes. It was also a pain to write out all those terrible column names. The Python snippet above made me think of creating a vector of column names to reference in a select statement.
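To make the fragility of index-based selection concrete, here is a minimal sketch. The `survey` data frame and its column names are invented for illustration (they are not the real Qualtrics export), and the `.after` argument to `mutate()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical survey export; the column names are made up for this example.
survey <- data.frame(
  ResponseId = c("R_1", "R_2"),
  Q1 = c(5, 3),
  Q2 = c(4, 2),
  IPAddress = c("10.0.0.1", "10.0.0.2")
)

# Name-based selection: the columns to keep live in a character vector.
keep <- c("ResponseId", "Q1", "Q2")

# Simulate a survey change: a new column appears right after ResponseId.
survey_v2 <- survey %>%
  mutate(StartDate = "2021-09-01", .after = ResponseId)

# Positions 1:3 now pick up StartDate and silently lose Q2.
names(survey_v2 %>% select(1:3))

# The name vector still returns ResponseId, Q1, Q2.
names(survey_v2 %>% select(all_of(keep)))
```

The position-based version does not fail loudly after the change; it simply returns different columns, which is why the problem tends to surface only during testing.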

Is this possible? Yes, it is, and now I seem to see it everywhere.

Below is a minimal example with the ‘mtcars’ data set.

library(dplyr)

# Select the columns you eventually want to exclude
remove <- mtcars %>%
  dplyr::select(drat, wt, qsec)

# Create a character vector with the names of those columns
remove <- names(remove)

# Within select(), use the helper all_of() with the - operator
# to drop every column named in the vector
new_mtcars <- mtcars %>%
  dplyr::select(-all_of(remove))


new_mtcars
                     mpg cyl  disp  hp vs am gear carb
Mazda RX4           21.0   6 160.0 110  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110  0  1    4    4
Datsun 710          22.8   4 108.0  93  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175  0  0    3    2
Valiant             18.1   6 225.0 105  1  0    3    1
Duster 360          14.3   8 360.0 245  0  0    3    4
Merc 240D           24.4   4 146.7  62  1  0    4    2
Merc 230            22.8   4 140.8  95  1  0    4    2
Merc 280            19.2   6 167.6 123  1  0    4    4
Merc 280C           17.8   6 167.6 123  1  0    4    4
Merc 450SE          16.4   8 275.8 180  0  0    3    3
Merc 450SL          17.3   8 275.8 180  0  0    3    3
Merc 450SLC         15.2   8 275.8 180  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205  0  0    3    4
Lincoln Continental 10.4   8 460.0 215  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230  0  0    3    4
Fiat 128            32.4   4  78.7  66  1  1    4    1
Honda Civic         30.4   4  75.7  52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65  1  1    4    1
Toyota Corona       21.5   4 120.1  97  1  0    3    1
Dodge Challenger    15.5   8 318.0 150  0  0    3    2
AMC Javelin         15.2   8 304.0 150  0  0    3    2
Camaro Z28          13.3   8 350.0 245  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175  0  0    3    2
Fiat X1-9           27.3   4  79.0  66  1  1    4    1
Porsche 914-2       26.0   4 120.3  91  0  1    5    2
Lotus Europa        30.4   4  95.1 113  1  1    5    2
Ford Pantera L      15.8   8 351.0 264  0  1    5    4
Ferrari Dino        19.7   6 145.0 175  0  1    5    6
Maserati Bora       15.0   8 301.0 335  0  1    5    8
Volvo 142E          21.4   4 121.0 109  1  1    4    2
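Because survey columns can come and go, it may help to know how the name vector behaves when one of its names is missing. The sketch below contrasts `all_of()` with its lenient sibling `any_of()`; the name "not_a_column" is a made-up entry added only to trigger the difference.

```r
library(dplyr)

# Names to drop; "not_a_column" is a hypothetical name that mtcars does not have.
remove <- c("drat", "wt", "qsec", "not_a_column")

# all_of() is strict: selection fails with an error when a name is missing.
strict <- tryCatch(
  mtcars %>% select(-all_of(remove)),
  error = function(e) "error: column not found"
)

# any_of() is lenient: it drops the names that exist and ignores the rest.
lenient <- mtcars %>% select(-any_of(remove))

names(lenient)
```

Which helper fits depends on the goal: `all_of()` flags a survey change right away, while `any_of()` keeps a script running when an expected column has been removed.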

Conclusion

Learning Python helped me shake up my mental model and bring a new pattern back to my R workflow.


About

Kevin is a nonprofit data professional operating out of Lakeland, Florida.
His expertise is helping nonprofits collect, manage, and analyze their program data.