When it comes to structured data analysis, Pandas is a great Python package to use. It’s fast, powerful and super flexible! If you’re new to Python or the Pandas library, here are a few tricks to help you get the most out of this tool.
select_dtypes
Sometimes we have large tables to analyse, with a wide range of data types across the columns. To save some time when pre-processing data in Python, you can use the following command to see how many columns of each datatype your dataframe contains.

df.dtypes.value_counts()
If, for example, your dataframe contained columns with the datatypes bool, int64, float64, object, category and timedelta64, you could then do
df.select_dtypes(include=['float64', 'int64'])
to select a sub-dataframe with only the selected datatypes. This could help you to quickly separate out chunks of your data and apply changes necessary to specific datatypes.
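To see this end to end, here’s a minimal, runnable sketch with made-up data (the column names are purely illustrative):

import pandas as pd

# Small dataframe with mixed datatypes
df = pd.DataFrame({
    'age': [25, 32, 47],            # int64
    'score': [88.5, 92.1, 79.3],    # float64
    'name': ['Ana', 'Ben', 'Cal'],  # object
})

print(df.dtypes.value_counts())     # how many columns of each datatype

# Pull out just the numeric columns
numeric_df = df.select_dtypes(include=['float64', 'int64'])
print(numeric_df.columns.tolist())  # ['age', 'score']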
map
This command allows you to substitute each value in a column and perform easy transformations. It is used in the following way, where the argument can be a function, a dictionary or a series.
df[column].map(argument)
For example, to convert animal names into numbers:
convertor = {'cat': 1, 'dog': 2, 'hamster': 3}
df['converted_c'] = df['c'].map(convertor)

Values that don’t appear in the dictionary are set to NaN, and you can add na_action='ignore' to skip existing NaN values rather than passing them to the mapping. This is another great pre-processing tool and can be used to identify trends and patterns within data sets.
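Here’s a minimal sketch with made-up data showing both behaviours:

import pandas as pd

df = pd.DataFrame({'c': ['cat', 'dog', None, 'parrot']})
convertor = {'cat': 1, 'dog': 2, 'hamster': 3}

# 'parrot' isn't in the dictionary, so it becomes NaN;
# na_action='ignore' leaves the existing missing value untouched
df['converted_c'] = df['c'].map(convertor, na_action='ignore')
print(df)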
copy
If you haven’t already heard of this command, you are missing out on an important one that can save you from easy-to-make errors when handling data. Let’s take a look at the following scenario:
df1 = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [4, 5, 6]})
df2 = df1
df2['Col1'] = df2['Col1'] + 1
print(df1)

You may think that printing df1 would give you the original dataframe; however, you’ll find that df1 has been altered. This is because df2 = df1 doesn’t create a new dataframe: it simply makes both names refer to the same object, so changes made through df2 affect df1 and vice versa. To fix this issue we can use our handy copy() command.
df2 = df1.copy()
This creates a copy of df1, allowing you to make changes to df2 without affecting your original dataframe.
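For instance, rerunning the scenario above with copy():

import pandas as pd

df1 = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [4, 5, 6]})
df2 = df1.copy()                 # df2 now has its own data

df2['Col1'] = df2['Col1'] + 1
print(df1['Col1'].tolist())      # [1, 2, 3] -- original unchanged
print(df2['Col1'].tolist())      # [2, 3, 4]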

Percentile groups
The next hack is an alternative way to classify values in a column into groups. You might currently use pandas.cut, but here is another option that can be noticeably faster.
cut_points = [np.percentile(df['c'], i) for i in [50, 80, 95]]
df['group'] = 1
for i in range(3):
    df['group'] = df['group'] + (df['c'] < cut_points[i])
This uses NumPy’s percentile function, so don’t forget to import the package. The result is a group column running from 1 (above the 95th percentile) to 4 (below the median). Because no apply function is involved, it tends to run a lot faster.
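If you want to sanity-check it against pandas.cut, here’s a sketch with made-up data; note that right=False matches the strict < comparison used in the loop:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)                # made-up data for illustration
df = pd.DataFrame({'c': rng.normal(size=1000)})

cut_points = [np.percentile(df['c'], q) for q in [50, 80, 95]]
df['group'] = 1
for i in range(3):
    df['group'] = df['group'] + (df['c'] < cut_points[i])

# Roughly equivalent pd.cut call: group 4 = bottom half, group 1 = top 5%
bins = [-np.inf] + cut_points + [np.inf]
df['group_cut'] = pd.cut(df['c'], bins=bins, labels=[4, 3, 2, 1], right=False)

print((df['group'] == df['group_cut'].astype(int)).all())  # True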

value_counts
We used this command in a slightly different way earlier to check datatypes; however, on its own it can be used to check value distributions. For example, if you’d like to check the possible values and the frequency of each individual value in a column, you can do the following:

df['col'].value_counts()
It also takes some useful arguments:
- normalize=True: if you want to check the frequency instead of counts.
- dropna=False: if you also want to include missing values in the stats.
- df['c'].value_counts().reset_index(): if you want to convert the stats table into a pandas dataframe and manipulate it.
- df['c'].value_counts().sort_index(): show the stats sorted by the distinct values in column 'c' instead of by counts.
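To see those arguments in action on a toy column:

import pandas as pd

df = pd.DataFrame({'c': ['a', 'b', 'a', None, 'a']})

print(df['c'].value_counts())                 # raw counts, NaN excluded
print(df['c'].value_counts(normalize=True))   # relative frequencies
print(df['c'].value_counts(dropna=False))     # include the missing value
print(df['c'].value_counts().sort_index())    # sorted by value, not count
stats = df['c'].value_counts().reset_index()  # a regular dataframe
print(stats)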
I hope you found these hacks useful and are able to apply them to some of the things you are currently working on. Pandas has so much to offer, so take some time to look into some of the cool things you can do with it.