Handling Binary Files in Python

While text files are human-readable, many files you’ll encounter are binary files. These files store data in its raw, binary format (as a sequence of bytes), which is not meant to be read by humans directly but is highly efficient for computers to process.

Common examples of binary files include:

  • Images: .jpg, .png, .gif
  • Audio/Video: .mp3, .mp4, .wav
  • Compiled Code: .exe, .dll, .pyc
  • Compressed Archives: .zip, .gz
  • Serialized Data: Python pickle files, database files

The Key: Binary Mode ('b')

To work with binary files, you must append a 'b' to your file mode string. This tells Python to make no assumptions about the file’s content—it will not attempt to encode or decode the data, nor will it translate newline characters. It will simply read and write raw bytes.

  • 'rb': Read Binary. Opens for reading in binary mode.
  • 'wb': Write Binary. Opens for writing in binary mode, creating the file if it does not exist and erasing its contents if it does.
  • 'ab': Append Binary. Opens for appending in binary mode.
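
The three modes can be seen side by side in a short sketch. The file name `modes_demo.bin` and the use of a temporary directory are assumptions made only to keep the example self-cleaning.

```python
import os
import tempfile

# Work in a throwaway directory; the file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.bin")

    # 'wb' creates the file, or truncates it if it already exists.
    with open(path, "wb") as f:
        f.write(b"line1\r\n")

    # 'ab' adds bytes at the end and leaves the existing content alone.
    with open(path, "ab") as f:
        f.write(b"\x00\xff")

    # 'rb' returns the bytes exactly as stored: no decoding, no newline translation.
    with open(path, "rb") as f:
        content = f.read()

print(content)  # b'line1\r\n\x00\xff'
```

Note that the `\r\n` pair comes back untouched, which is exactly what binary mode promises.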
🔥

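Critical Rule: Always open binary files in binary mode. Reading one in text mode will usually raise UnicodeDecodeError, and writing in text mode can silently corrupt the data, because Python encodes the text and may translate newline characters. Opening a text file in binary mode is not harmful, but you get undecoded bytes and must decode them yourself.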

Writing and Reading Raw Bytes

When you work with binary files, you are not dealing with str objects, but with bytes objects. A bytes literal in Python is created by prefixing a string with a b.
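
As a brief illustration of how bytes objects behave (the sample values are arbitrary), note that indexing a bytes object yields an integer between 0 and 255, while slicing yields another bytes object:

```python
data = b"PNG\x0d\x0a"        # three ASCII characters plus two escaped byte values

print(type(data))            # <class 'bytes'>
print(len(data))             # 5
print(data[0])               # 80, the integer value of the byte 'P'
print(data[:3])              # b'PNG'
print(bytes([80, 78, 71]))   # b'PNG', built from a list of integers
```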

Writing with 'wb'

You can write a bytes object directly to a file opened in a binary write mode.

Create a file named `data.bin` and write a sequence of raw bytes to it.
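
A minimal sketch of one way to do this follows. The eight byte values are arbitrary and chosen only for illustration; any bytes object would work the same way.

```python
# Eight arbitrary byte values (an assumption for this example).
raw = b"\x00\x01\x02\x03\xfc\xfd\xfe\xff"

# 'wb' creates data.bin, or overwrites it if it already exists.
with open("data.bin", "wb") as f:
    written = f.write(raw)   # write() returns the number of bytes written

print(f"Successfully wrote {written} bytes to data.bin")
```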

Expected Output:

Successfully wrote 8 bytes to data.bin

Practical Use Case: Copying an Image

A perfect real-world example of binary file handling is making a copy of an image. You simply read all the bytes from the source image and write them to a destination file.

The Process

  1. Open the source image file in read binary ('rb') mode.
  2. Read the entire content into a bytes object.
  3. Open the destination file in write binary ('wb') mode.
  4. Write the bytes object to the new file.
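
The four steps above can be sketched as a small helper. The function name `copy_binary_file` and the stand-in file used in the demonstration are hypothetical; the demonstration writes a tiny fake "image" into a temporary directory so it runs without any real picture on disk.

```python
import os
import tempfile


def copy_binary_file(src_path, dst_path):
    """Copy src_path to dst_path byte for byte; return the number of bytes copied."""
    with open(src_path, "rb") as src:      # step 1: open the source in 'rb'
        data = src.read()                  # step 2: read everything into a bytes object
    with open(dst_path, "wb") as dst:      # step 3: open the destination in 'wb'
        dst.write(data)                    # step 4: write the bytes out
    return len(data)


# Demonstration on a small stand-in file (not a real image).
with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, "source.png")
    copy = os.path.join(tmp_dir, "copy.png")
    with open(source, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n" + bytes(100))  # 8 signature bytes + 100 zero bytes

    copied = copy_binary_file(source, copy)
    with open(source, "rb") as a, open(copy, "rb") as b:
        identical = a.read() == b.read()

print(copied, identical)  # 108 True
```

Reading the whole file at once is fine for images of ordinary size. For very large files, reading in chunks or using `shutil.copyfile` from the standard library avoids holding everything in memory.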

Copy the Python logo from a public URL to a local file named `python_logo.png`.
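
One possible approach uses `urllib.request` from the standard library. This is only a sketch: the helper name `download_binary` is hypothetical, and `logo_url` in the usage comment is a placeholder for the image's address, which is not given here.

```python
import urllib.request


def download_binary(url, dest_path):
    """Fetch url and write the response body, unmodified, to dest_path.

    Returns the number of bytes saved.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()          # the response body arrives as bytes
    with open(dest_path, "wb") as out_file:
        out_file.write(data)            # written as-is, with no decoding
    return len(data)


# Usage (logo_url is a placeholder for the image's public address):
# download_binary(logo_url, "python_logo.png")
# print("Image successfully downloaded and saved as python_logo.png")
```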

Expected Output:

Image successfully downloaded and saved as python_logo.png

💡

After running the code above, you should see a new file named python_logo.png in your project directory. You can open it to verify that the copy is perfect!

Serialization: Storing Python Objects with pickle

What if you want to save a complex Python object, like a dictionary or a custom class instance, to a file? You can’t just .write() it. This is where serialization comes in.

Serialization (or “pickling” in Python) is the process of converting a Python object into a byte stream. Deserialization (“unpickling”) is the reverse process: converting the byte stream back into a Python object.
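
A quick in-memory round trip shows both directions. This sketch uses `pickle.dumps` and `pickle.loads`, which work on bytes objects rather than files; the sample dictionary is made up for illustration.

```python
import pickle

original = {"name": "example", "scores": [1, 2, 3], "active": True}

payload = pickle.dumps(original)   # pickling: Python object -> bytes
restored = pickle.loads(payload)   # unpickling: bytes -> Python object

print(type(payload))               # <class 'bytes'>
print(restored == original)        # True: same contents
print(restored is original)        # False: a new, independent object
```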

The pickle module is Python’s built-in library for this task. It’s incredibly powerful for saving and loading application state.

pickle.dump(): Saving an Object

The pickle.dump(obj, file) function serializes your object obj and writes it to the file object. The file must be opened in a binary write mode ('wb').

Create a dictionary with user settings and save it to a file named `settings.pkl`.
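
A minimal sketch of this task might look like the following. The keys and values in `user_settings` are invented for the example. The last few lines read the file back with `pickle.load` in 'rb' mode, which is safe here only because the file was just created by the same script.

```python
import pickle

# Hypothetical settings; any picklable object could be stored this way.
user_settings = {"theme": "dark", "font_size": 14, "notifications": True}

# pickle.dump needs a file opened in binary write mode.
with open("settings.pkl", "wb") as f:
    pickle.dump(user_settings, f)

print("User settings have been pickled and saved to settings.pkl")

# Loading uses binary read mode and returns an equal dictionary.
with open("settings.pkl", "rb") as f:
    loaded = pickle.load(f)
```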

Expected Output:

User settings have been pickled and saved to settings.pkl

☠️

Security Warning: Never Unpickle Data from an Untrusted Source! The pickle format is not secure. A malicious pickle file can be crafted to execute arbitrary code on your computer during unpickling. Only ever unpickle data that you trust. For exchanging data with untrusted sources, use secure, human-readable formats like JSON.