OK, Python

Automate repetitive tasks in Excel,
let Python do the work for you
and simplify your life.

Get the book

OK, Python

I know Python will simplify my life, but I don't know how to start

This book is a manual for all who work with Excel and want to learn to use Python.

This means for all those who are tired of excessive mouse clicking, copy and paste operations and all that manual work they have to do every day.

Numerous examples of Python, Pandas and NumPy can be found on the Internet. What I personally miss, however, is a comprehensive set of examples which can be used in practice, examples which you can easily modify and adapt, examples which you can easily understand although you are not a programmer.

And this is the very objective of this manual. Showing how to solve everyday tasks quickly and elegantly, with just a few lines of code. And it's all thanks to the Python programming language and the Pandas and NumPy libraries.

Do not reinvent the wheel

The internet is full of long, messy and inefficient code. If you want to read and understand such examples, make yourself a cup of coffee and prepare yourself to spend a lot of time trying to understand them.

I personally think it's wrong. I think it's better (and smarter) if you can do something quickly and easily. I don't think productivity is a competition to see who can write the most complicated program.

Take a look at the following examples. Both do the same thing, but one is shorter, more readable and faster. Which one will you choose?

import openpyxl
import re

workbook = openpyxl.load_workbook('data/input.xlsx')
worksheet = workbook['Sheet1']

for row in range(1, worksheet.max_row+1):
  for cell in range(1, worksheet.max_column+1):
    value = worksheet.cell(row, cell).value
    if isinstance(value, str):
      worksheet.cell(row, cell).value = re.sub(r'A|B|C', '*', value)

workbook.save('data/output.xlsx')

import pandas as pd

df = pd.read_excel('data/input.xlsx')
df.replace('A|B|C', '*', regex=True, inplace=True)
df.to_excel('data/output.xlsx', index=False)

Most books about Python are too theoretical

Theory certainly has its place in education. Especially if you want to be a top programmer. But if you want to learn clever ways to make your job easier, a lot of uninteresting theory is useless. Practice is much more useful.

Python will save you a lot of work and time. And you literally only need a few lines of code.

For whom this book is intended

The book will most of all benefit those with at least essential knowledge in Python. If they have come across the Pandas library, it is a plus. If not, never mind. The examples will lead them in the right direction.

Practical instructions in the book, however, can also be used by those who work with Python and Pandas already. But perhaps not so efficiently as they could.

The input and output format to be used is Excel (.xlsx format). The reason is that data in Excel can be found in every company. This format is so flexible as to enable its users to be enormously creative. Sometimes, however, this creativity must be tamed, and it is exactly in this way that Python will help you.

For whom this book is not intended

This book is not suitable for absolute beginners with no previous experience in Python.

Organization of individual chapters

Every example is based on tasks which can be solved by means of Excel. The aim is to show how Python and Pandas can be used for these tasks. The result is an easily comprehensible code, mostly consisting of a few lines only, which you can easily adapt to your specific needs.

Test data

This book does not contain test data files on purpose. You can generate data in the XLSX format yourself by means of NumPy and Pandas libraries. The Excel file created is then used as the input file in the Solution section.

Creation of test data often includes use of the NumPy library. This library works efficiently with memory and is quick even at a high number of lines.

Beginners can copy and run the code in this part with an easy conscience. The more advanced can find in it alternative solutions to tasks for which they may use for loops in Python.

Solution

The given task is solved in a few lines, easily comprehensible even to beginners. Each part includes commentary for better understanding of what is going on. Thanks to the Pandas library, data processing is quick even at a large data volume.

About the author

Lubo

Lubomír "Lubo" Husar

  • Founder of Data21.io - Hire skilled remote Data Analysts and Engineers.
  • Founder of LovelyData.cz - Online courses trusted by major companies and government institutions.
  • Manager/consultant at ABN Amro, DHL, Electrabel, Sibelga, Fortis, IBM, ING Bank, O2, Volvo Cars and other companies on many projects in many countries.
  • Author of the book OK, Python - Automate repetitive tasks in Excel, let Python do the work for you and simplify your life.
  • Other ventures that would make this list much longer. ;)

Contents

  1. How to replace text and numbers
  2. Split e-mail addresses (using the at-sign)
  3. Select only numbers from the text
  4. How to find out minimum and maximum
  5. How to use IF
  6. What to do with VLOOKUP
  7. Mark duplicate values
  8. Splitting one sheet into more sheets
  9. Merging more sheets into a single sheet
  10. Generate calendar
  11. Compare sheets and find differences
  12. Rearranging columns
  13. Remove extra spaces
  14. A simple pivot table
  15. Complex pivot table
  16. Listing files in a folder
  17. Deleting empty lines
  18. Deleting rows by condition
  19. Creating HTML table
  20. Count sheets, rows and columns

What you'll get

  • Easy-to-follow guide to increase your productivity
  • PDF Book (80 pages)
  • 20 chapters
  • Ready to use Python code
  • 41 Python files in single zip file