Handling Binary Files in Python
While text files are human-readable, many files you’ll encounter are binary files. These files store data in its raw, binary format (as a sequence of bytes), which is not meant to be read by humans directly but is highly efficient for computers to process.
Common examples of binary files include:
- Images: .jpg, .png, .gif
- Audio/Video: .mp3, .mp4, .wav
- Compiled Code: .exe, .dll, .pyc
- Compressed Archives: .zip, .gz
- Serialized Data: Python pickle files, database files
The Key: Binary Mode ('b')
To work with binary files, you must append a 'b' to your file mode string. This tells Python to make no assumptions about the file’s content—it will not attempt to encode or decode the data, nor will it translate newline characters. It will simply read and write raw bytes.
- 'rb': Read Binary. Opens for reading in binary mode.
- 'wb': Write Binary. Opens for writing in binary mode (erases or creates).
- 'ab': Append Binary. Opens for appending in binary mode.
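Critical Rule: Never open a binary file in text mode. Python will try to decode the bytes as text and translate newline characters, which typically raises UnicodeDecodeError when reading or silently corrupts the data when writing. Opening a text file in binary mode does not damage the file, but you get undecoded bytes back and must decode them yourself, so use text mode for text unless you specifically need the raw bytes.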
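As a rough illustration of the rule above, the sketch below writes the eight-byte PNG signature to a scratch file and then reads it back in three ways. The temporary path and variable names are made up for this example.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")

# The eight-byte signature that starts every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"
with open(path, "wb") as f:
    f.write(png_signature)

# Binary mode hands the bytes back unchanged.
with open(path, "rb") as f:
    raw = f.read()
print(raw == png_signature)  # True

# Text mode tries to decode the bytes; 0x89 cannot start a UTF-8 character.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True

# Even with an encoding that accepts every byte, text mode rewrites "\r\n"
# as "\n" on read, so the data no longer matches what is on disk.
with open(path, "r", encoding="latin-1") as f:
    text = f.read()
print(len(raw), len(text))  # 8 7
```

The last read is the quieter failure: nothing raises, but one byte has disappeared.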
Writing and Reading Raw Bytes
When you work with binary files, you are not dealing with str objects, but with bytes objects. A bytes literal in Python is created by prefixing a string literal with b, for example b'hello'.
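A few lines are enough to see how bytes objects differ from str. This is only a small sketch with illustrative variable names:

```python
greeting = b"hello"              # a bytes literal
print(type(greeting))            # <class 'bytes'>

# Indexing a bytes object yields integers in the range 0-255, not characters.
print(greeting[0])               # 104

# bytes can also be built from a list of integers.
from_ints = bytes([72, 105])
print(from_ints)                 # b'Hi'

# Converting between str and bytes always goes through an encoding.
encoded = "é".encode("utf-8")
print(encoded)                   # b'\xc3\xa9'
decoded = encoded.decode("utf-8")
print(decoded == "é")            # True
```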
Writing with 'wb'
You can write a bytes object directly to a file opened in a binary write mode.
Create a file named `data.bin` and write a sequence of raw bytes to it.
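One possible solution is sketched below. The particular byte values are arbitrary and chosen only so that the total comes to eight bytes.

```python
# Eight arbitrary byte values, written as a bytes literal with hex escapes.
data = b"\x00\x01\x02\x03\xfc\xfd\xfe\xff"

# 'wb' creates data.bin (or truncates it) and accepts only bytes-like objects.
with open("data.bin", "wb") as f:
    written = f.write(data)  # write() returns the number of bytes written

print(f"Successfully wrote {written} bytes to data.bin")
```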
Expected Output:
Successfully wrote 8 bytes to data.bin
Practical Use Case: Copying an Image
A perfect real-world example of binary file handling is making a copy of an image. You simply read all the bytes from the source image and write them to a destination file.
The Process
- Open the source image file in read binary ('rb') mode.
- Read the entire content into a bytes object.
- Open the destination file in write binary ('wb') mode.
- Write the bytes object to the new file.
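The four steps above can be sketched as a small helper. The function name copy_binary_file and the demo file names are hypothetical, and the demo copies a tiny stand-in file rather than a real image.

```python
import os
import tempfile


def copy_binary_file(source_path, destination_path):
    """Copy a file byte for byte and return the number of bytes copied."""
    # Steps 1 and 2: open the source in 'rb' mode and read it into a bytes object.
    with open(source_path, "rb") as source:
        data = source.read()
    # Steps 3 and 4: open the destination in 'wb' mode and write the bytes.
    with open(destination_path, "wb") as destination:
        destination.write(data)
    return len(data)


# Demo with a stand-in file (hypothetical names, created in a temp directory).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "original.bin")
dst = os.path.join(workdir, "copy.bin")
with open(src, "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n" + bytes(range(16)))

copied = copy_binary_file(src, dst)
print(copied)  # 24
```

Reading the whole file at once is fine for images of ordinary size. For very large files, copying in chunks (for example with shutil.copyfileobj) avoids holding everything in memory.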
Copy the Python logo from a public URL to a local file named `python_logo.png`.
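The code for this exercise is not reproduced in the text, so what follows is only a sketch using the standard library. The helper name download_binary is hypothetical, and the image URL is left as a parameter because the original address is not given here.

```python
import shutil
import urllib.request


def download_binary(url, destination_path):
    """Stream the bytes at `url` into a local file opened in 'wb' mode."""
    # The response object is a binary file-like object, so it can be copied
    # straight into the destination without any decoding.
    with urllib.request.urlopen(url) as response, open(destination_path, "wb") as destination:
        shutil.copyfileobj(response, destination)


# Example call (supply the real image address in place of logo_url):
# download_binary(logo_url, "python_logo.png")
# print("Image successfully downloaded and saved as python_logo.png")
```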
Expected Output:
Image successfully downloaded and saved as python_logo.png
After running the code above, you should see a new file named python_logo.png in your project directory. You can open it to verify that the copy is perfect!
Serialization: Storing Python Objects with pickle
What if you want to save a complex Python object, like a dictionary or a custom class instance, to a file? You can’t just .write() it. This is where serialization comes in.
Serialization (or “pickling” in Python) is the process of converting a Python object into a byte stream. Deserialization (“unpickling”) is the reverse process: converting the byte stream back into a Python object.
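The round trip can be seen without touching a file at all. The sketch below uses pickle.dumps() and pickle.loads(), the in-memory counterparts of the file-based functions; the sample dictionary is made up for illustration.

```python
import pickle

original = {"name": "Ada", "scores": [90, 85], "active": True}

# Serialization: the object becomes a bytes object.
payload = pickle.dumps(original)
print(type(payload))         # <class 'bytes'>

# Deserialization: the bytes become a new, equal object.
restored = pickle.loads(payload)
print(restored == original)  # True
print(restored is original)  # False
```

The restored dictionary is equal to the original but is a separate object, which is what you would expect when loading saved state in a later run of the program.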
The pickle module is Python’s built-in library for this task. It’s incredibly powerful for saving and loading application state.
pickle.dump(): Saving an Object
The pickle.dump(obj, file) function serializes your object obj and writes it to the file object. The file must be opened in a binary write mode ('wb').
Create a dictionary with user settings and save it to a file named `settings.pkl`.
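One way to approach this is sketched below. The keys and values in user_settings are invented for the example; any picklable object would work the same way.

```python
import pickle

# Hypothetical settings; the exact contents are an assumption for this sketch.
user_settings = {
    "theme": "dark",
    "font_size": 14,
    "recent_files": ["notes.txt", "todo.md"],
}

# pickle.dump() writes bytes, so the file must be opened in 'wb' mode.
with open("settings.pkl", "wb") as f:
    pickle.dump(user_settings, f)

print("User settings have been pickled and saved to settings.pkl")
```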
Expected Output:
User settings have been pickled and saved to settings.pkl
Security Warning: Never Unpickle Data from an Untrusted Source!
The pickle format is not secure. A malicious pickle file can be crafted to execute arbitrary code on your computer during unpickling. Only ever unpickle data that you trust. For exchanging data with untrusted sources, use secure, human-readable formats like JSON.
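As a point of comparison, here is a minimal sketch of the same kind of data going through the json module instead. The settings dictionary is again made up. Parsing JSON only ever produces plain dictionaries, lists, strings, numbers, booleans, and None, so loading it does not run code supplied by the sender.

```python
import json

settings = {"theme": "dark", "font_size": 14, "recent_files": ["notes.txt"]}

# JSON is text, so the result is a str rather than bytes.
text = json.dumps(settings)
print(type(text))            # <class 'str'>

restored = json.loads(text)
print(restored == settings)  # True
```

The trade-off is that JSON handles only these basic types, so custom class instances need to be converted to dictionaries before saving.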