Supercharged pandas: Encrypting Excel Files Written from DataFrames | by Ji Wei Liew | Jun, 2023


  1. Opening and Encrypting the Excel file
  2. Generating a strong password
  3. Putting it all together

Full code here.

When working with python and pandas, there are good reasons to open the Excel file, for example, visually inspecting it during testing, or formatting it before sending to stakeholders. If there is an additional need to encrypt the Excel file, then there would be 3 scenarios to handle: open-only, encrypt-only, open & encrypt. If we are neither opening nor encrypting the Excel file, then nothing needs to be done since df.to_excel() would suffice.

ExcelHelper is a class written to launch Excel (the application) and then open the workbook based on the path provided. Programmatically, this is a 2-step process. Most people never realize it because the Excel application and workbook launches together when you double-click an Excel file.

Initialize the ExcelHelper class

  • __init__(self, launch=True, encrypt=False, password=None, length=20, quit_=True) This is the initialization call for the ExcelHelper.
  • If launch equals to True, the workbook is displayed after encryption is done.
  • If encrypt equals to True, the self._encrypt() method, which will be explained later, is called.
  • password allows the user to input a preferred password, otherwise it will automatically suggest a strong password with length number of characters, where the maximum length is 255.

Open workbook

  • _open_wb(self, path, visible=False) converts the given path to an absolute path and then opens it. Converting a path to an absolute path is necessary otherwise applications dispatched by win32com.client cannot find the file. (Previously, I used a try-except block to prepend the path with the current working directory but that is unnecessarily verbose and takes a bit of time to really understand what one is trying to do.)
  • visible controls whether the application is visible to the user. Generally, it makes sense to show the application only after the encryption is completed. So if we are launching and encrypting, we should set visible=True only after self._encrypt() is done.

Encrypt Excel

  • _encrypt(self, visible=False) encrypts the Excel workbook and then displays the application by setting the self.xl.Visible attribute after the encryption is completed.
  • Setting self.xl.DisplayAlerts to True is important otherwise the launched Excel file will not give any alerts (case in point, if you press Ctrl+F and try to find some gibberish and there would be no prompt 😱; it happened for me and I was really confused!).

Execute method

  • execute(self, path, launch, encrypt, quit_) handles the 3 scenarios described above.
  • quit_ argument closes the Excel application (trailing underscore is a convention to denote that quit is a reserved keyword in python). When ExcelHelper is initiated, if launch=False, the Excel application runs in the background and the Excel file is opened. If the user now double-clicks on the Excel file, he will be prompted that it can only be open in read-only mode. It is quite difficult for non-technical user to close the file; the workaround would be to open Task manager, select the Excel program and then End Task. Therefore, there is a need to call .Quit() to terminate the Excel application. We could have just closed the workbook, but perhaps there is no need to handle this so finely for now.

Initially, I was using from cryptography.fernet import Fernet; Fernet.generate_key() to generate random passwords. While several users were pleasantly surprised by the length and randomness of the password, I did not like it very much because it is a bit too long and does not contain a variety of punctuation. I googled and found a better approach on StackOverflow. (I am always very impressed by how easily one can get very high quality answers on stackoverflow. All the tough work has already been done by all the giants, all we need to do is to search for it, copy, paste, and make small tweaks (e.g. change variable names).) The function is quite straightforward and rather self-explanatory.

import secrets
import string

def gen_password(self, length):
char = string.ascii_letters + string.digits + string.punctuation
return ''.join(secrets.choice(char) for _ in range(length))

Just as everything was going a bit too smoothly, while testing my code, I noticed that occasionally the passwords cannot be used to open the file! I was really puzzled. It took a bit of trial-and-error and I began to suspect that there could be some characters which are not suitable as passwords because this phenomenon only occurred when passwords contain 2 backslashes \.

Here’s a bit of background to allow you to appreciate the situation: I use Powershell and Notepad++, and my code prints the passwords to stdout. Next, I highlight the password printed on Powershell and then paste it when Excel prompts me to enter the password. So the issue is that is an escape character, hence the first should be ignored when I enter this as the password. It is troublesome to handle and for the purpose of passwords, I can do with one less character. Hence, all I did was slice out backslash in string.punctuation.

  def _get_password(self, length):
string_punc = string.punctuation[:23] + string.punctuation[24:]
char = string.ascii_letters + string.digits + string_punc
return ''.join(secrets.choice(char) for _ in range(length))

Since there is almost no value-add to instantiate an ExcelHelper object if you are not launching or encrypting the Excel file, one should start with if launch or encrypt:. Next, is merely passing the keyword arguments from to_excelp to ExcelHelper and returning the object and password.

def to_excelp(df, *arg, launch=True, encrypt=False, password=None, **kw):
''' Writes to Excel and opens it'''
filename, *arg = arg

if not filename.endswith(('.xlsx','.xls','.xlsm')):
filename += '.xlsx'

if os.path.isfile(filename):
name, ext = filename.rsplit('.')
filename = f'{name}_{timestr()}.{ext}'

# Default index=False
index = kw.get('index', False)
if not index:
kw['index']=False

df.to_excel(filename, *arg, **kw)
if launch or encrypt:
xl = ExcelHelper(filename, launch=launch, encrypt=encrypt, password=password)
return xl, xl.password
else:
return filename

If you are writing dataframes to several different Excel files by calling this function, I would recommend to store the results in a list of tuples. You can subsequently iterate over this list of tuples to get the path to the Excel files and their passwords. Storing the object might be useful in future, especially if you intend to add more functionalities to ExcelHelper.

l_xl_pw = []

for df in (df1, df2, df3, df4):
xl, pw = df.to_excelp(launch=False, encrypt=True, password=None)
l_xl_pw.append((xl, pw))

l_path_pass = [[xl.path, pw] for (xl, pw) in l_xl_pw]
df_path_pass = pd.DataFrame(l_path_pass, columns=['Path', 'Pw'])

# df_path_pass can also be written to Excel using .to_excelp(), how elegant! :D

ExcelHelper can also be added to other your existing scripts too.

def some_func():
df = pd.read_excel('some_file.xlsx')
# some data manipulation...
df.to_excel('some_file_modified.xlsx')

def some_func(launch=False, encrypt=True, password='5tr0ngP@ssw0rd'):
df = pd.read_excel('some_file.xlsx')
# some data manipulation...
df.to_excel('some_file_modified.xlsx')
if launch or encrypt:
xl = ExcelHelper('some_file_modified.xlsx', launch=launch, encrypt=encrypt, password=password)
return xl, xl.password

Revisiting old code written by myself is like a walk down memory lane which reveals how little I knew back then. Although I am very much embarrassed by it, I am glad to know that I have progressed since then.

“If you’re not embarrassed by your old code then you aren’t progressing as a programmer.” [Anonymous]

Writing these little classes, functions takes time but there are immense benefits for having them as it automates the mechanical and not-so-fun parts of work and allows one to focus on the important tasks. (Imagine having to think about passwords with uppercase, lowercase, numbers and punctuation and storing them in a file all the time.)



Source link

Leave a Comment