Alexander Gromnitsky's Blog

Steganography with zip archives


The elegance of CVE-2020-1464 comes from the internal structure of the Zip file format. While many other archive formats, such as Microsoft Cab, place an index of the compressed files at the beginning of the archive, zip archivers position it at the end of the file.

The reason is historical: apparently, in 1989, disk drives were so slow that it was more cost-effective to add a new blob to an existing file and append a new index to it, rather than copying chunks of the original archive to a new file.
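Because the index sits at the end, a reader has to find it by scanning backwards from the last byte for the signature of the end of central directory (EOCD) record, `PK\x05\x06`. The fixed part of that record is 22 bytes, followed by a comment of at most 65,535 bytes, which bounds the search. Below is a minimal sketch with a hypothetical `find_eocd` helper. A robust reader would also check that the record's comment length agrees with the bytes that follow it, since the signature could occur inside a comment.

```ruby
# Hypothetical helper: locate the EOCD record by scanning from the end.
EOCD_SIG = "PK\x05\x06".b   # 0x06054b50 stored little-endian
EOCD_FIXED_SIZE = 22        # EOCD length without the archive comment
MAX_COMMENT_LEN = 0xFFFF    # comment_len is a uint16

# Returns the byte offset of the EOCD record, or nil if there is none.
def find_eocd(data)
  data = data.b
  last_possible = data.bytesize - EOCD_FIXED_SIZE
  return nil if last_possible.negative?

  # Search backwards: the last signature is normally the real record.
  pos = data.rindex(EOCD_SIG, last_possible)
  # The record can't start more than 22 + 65535 bytes before the end.
  return nil if pos.nil? || pos < last_possible - MAX_COMMENT_LEN

  pos
end
```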

The CVE reminded me of an old joke about hiding a .zip file in a .jpg. When you append a .zip file to an image, image viewers generally ignore the trailing bytes, so the recipient of the jpeg may not notice any junk. However, if you know about such a "hidden" part, any ordinary unzip tool can extract it.
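Why can an unzip tool still cope with a jpeg in front of the archive? The offsets recorded inside the zip are now too small by the size of the image, but the central directory has to end exactly where the EOCD record begins, so a reader can work out the size of the prefix and shift every offset by it. A minimal sketch of that arithmetic, using a hypothetical `prepended_bytes` helper and assuming no ZIP64 records and no EOCD signature inside the archive comment:

```ruby
# Hypothetical helper: estimate how many bytes were glued in front of a zip.
# Assumes no ZIP64 records and no "PK\x05\x06" inside the archive comment.
def prepended_bytes(data)
  data = data.b
  eocd_pos = data.rindex("PK\x05\x06".b) or raise ArgumentError, "no EOCD record"
  # cd_size and cd_offset_start are the uint32s at EOCD bytes 12..19
  cd_size, cd_offset_start = data.byteslice(eocd_pos + 12, 8).unpack("VV")
  # The CD ends where the EOCD begins, so it really starts at eocd_pos - cd_size;
  # the difference from the recorded offset is the junk in front of the zip.
  (eocd_pos - cd_size) - cd_offset_start
end
```

For an untouched archive the result is 0; for `cat image.jpg archive.zip` it equals the size of the image.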

This got me thinking: can we hide a file inside a .zip? At BlackHat Europe 2010 there was a talk about steganography in popular archive formats. In one of the tricks described there, a blob carefully inserted right before the zip index becomes invisible to all common unpackers.

To verify this claim, I wrote a couple of small Ruby scripts that inject & extract a 'hidden' blob. The approach works: Windows Explorer, 7-Zip, WinRAR, bsdtar(1) and unzip(1) didn't see anything unusual, even in extreme cases like:

$ du -h foo.zip
4.1G  foo.zip

$ bsdtar ftv foo.zip
-rw-r--r--  0 1000   100         1 Aug 25 21:58 q

which would certainly look odd to an unsuspecting user: a 4-gigabyte archive that unpacks into a single 1-byte file! The opposite of a zip bomb.

A Zip index is formally termed the central directory. It consists of 2 main parts: ① central directory headers (CDHs) & ② the end of central directory (EOCD) record. A CDH contains metadata about a particular file, the EOCD contains metadata about the index itself:¹

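require 'bindata'

# End of central directory record (fixed part: 22 bytes)
class Eocd < BinData::Record
  endian :little

  uint32 :signature, asserted_value: 0x06054b50  # "PK\x05\x06"
  uint16 :disk                # number of this disk
  uint16 :disk_cd_start       # disk where the central directory starts
  uint16 :cd_entries_disk     # central directory entries on this disk
  uint16 :cd_entries_total    # central directory entries in total
  uint32 :cd_size             # size of the central directory in bytes
  uint32 :cd_offset_start     # where the central directory starts
  uint16 :comment_len         # length of the trailing archive comment
  string :comment, :read_length => :comment_len,
         onlyif: -> { comment_len.nonzero? }
end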
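For readers without the BinData gem, the same 22-byte layout can be decoded with core Ruby's `String#unpack`, where `V` is a little-endian uint32 and `v` a little-endian uint16. A sketch with a hypothetical `parse_eocd` helper that expects the record to start at the first byte of its argument:

```ruby
# Field names mirror the BinData record above.
EOCD_FIELDS = %i[signature disk disk_cd_start cd_entries_disk
                 cd_entries_total cd_size cd_offset_start comment_len].freeze

# Hypothetical helper: decode an EOCD record starting at bytes[0].
def parse_eocd(bytes)
  bytes = bytes.b
  # V = uint32 LE, v = uint16 LE; 4 + 4*2 + 2*4 + 2 = 22 bytes in total
  eocd = EOCD_FIELDS.zip(bytes.unpack("VvvvvVVv")).to_h
  raise ArgumentError, "not an EOCD record" unless eocd[:signature] == 0x06054b50

  # The variable-length comment follows the fixed 22-byte part.
  eocd[:comment] = bytes.byteslice(22, eocd[:comment_len]) || "".b
  eocd
end
```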

The thing of interest here is cd_offset_start (officially called "offset of start of central directory"²), a 4-byte value that holds the position of the first central directory header, counted in bytes from the beginning of the archive.

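Therefore, after inserting a blob right before the central directory, we need to increase cd_offset_start by the blob's size; otherwise the zip file becomes broken. Nothing else has to change: the blob comes after the last file entry, so no local file header moves and the offsets stored in the central directory headers remain valid.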
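A minimal sketch of that step. The `inject` name is hypothetical and this is not the author's zipography-inject script; it assumes a single-disk archive with no ZIP64 records, nothing prepended to it, and no EOCD signature inside the archive comment. It splices the blob in at cd_offset_start and rewrites that field in the EOCD. Since the field is only 4 bytes wide, the sketch refuses offsets past 0xFFFFFFFF.

```ruby
# Hypothetical sketch of the injection step (not zipography-inject itself).
# Assumes a single-disk archive with no ZIP64 records, no bytes prepended
# to it, and no "PK\x05\x06" sequence inside the archive comment.
def inject(zip, blob)
  zip = zip.b
  blob = blob.b
  eocd_pos = zip.rindex("PK\x05\x06".b) or raise ArgumentError, "no EOCD record"
  # cd_offset_start is the uint32 at bytes 16..19 of the EOCD record
  cd_offset = zip.byteslice(eocd_pos + 16, 4).unpack1("V")
  new_offset = cd_offset + blob.bytesize
  raise ArgumentError, "offset no longer fits in 4 bytes" if new_offset > 0xFFFF_FFFF

  # Splice the blob in right before the central directory. Local headers
  # don't move, so the offsets stored in the CDHs stay correct.
  tail = zip.byteslice(cd_offset, zip.bytesize - cd_offset)
  out = zip.byteslice(0, cd_offset) + blob + tail
  # The EOCD itself shifted by blob.bytesize; patch its cd_offset_start.
  out[eocd_pos + blob.bytesize + 16, 4] = [new_offset].pack("V")
  out
end
```

Usage would be along the lines of `File.binwrite("1.zip", inject(File.binread("orig.zip"), File.binread("blob1.png")))`.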

Just because a user has no clue about the hidden blob doesn't mean specialized tools won't notice it. Say we have an archive w/ 2 text files:

$ bsdtar ft orig.zip
The Celebrated Jumping Frog of Calaveras County.txt
What You Want.txt

We inject a .png image into it:

$ zipography-inject orig.zip blob1.png > 1.zip

bsdtar is still none the wiser:

$ bsdtar ft 1.zip
The Celebrated Jumping Frog of Calaveras County.txt
What You Want.txt

Hachoir, however, correctly recognises it as an unparsed block.
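The kind of gap Hachoir flags can also be found by hand. A rough sketch with a hypothetical `hidden_gap` helper (this is not Hachoir's implementation): walk the central directory headers, compute where the data of the last local file entry ends, and return whatever lies between that point and cd_offset_start. It assumes no ZIP64 records, no data descriptors (general-purpose flag bit 3) and nothing prepended to the archive. The returned slice doubles as a crude extractor for the hidden blob.

```ruby
# Hypothetical consistency check: returns the bytes between the end of the
# last local file entry and the start of the central directory.
# Assumes no ZIP64, no data descriptors, no bytes prepended to the archive.
def hidden_gap(zip)
  zip = zip.b
  eocd_pos = zip.rindex("PK\x05\x06".b) or raise ArgumentError, "no EOCD record"
  # EOCD bytes 10..19: cd_entries_total (v), cd_size (V), cd_offset_start (V)
  entries, _cd_size, cd_offset = zip.byteslice(eocd_pos + 10, 10).unpack("vVV")

  data_end = 0
  pos = cd_offset
  entries.times do
    # Central directory header: 46 fixed bytes, then name/extra/comment
    raise ArgumentError, "bad CDH at #{pos}" unless zip.byteslice(pos, 4).unpack1("V") == 0x02014b50
    csize = zip.byteslice(pos + 20, 4).unpack1("V")
    nlen, xlen, clen = zip.byteslice(pos + 28, 6).unpack("vvv")
    lho = zip.byteslice(pos + 42, 4).unpack1("V")
    # Local file header: 30 fixed bytes, then its own name/extra fields
    lnlen, lxlen = zip.byteslice(lho + 26, 4).unpack("vv")
    data_end = [data_end, lho + 30 + lnlen + lxlen + csize].max
    pos += 46 + nlen + xlen + clen
  end
  zip.byteslice(data_end, cd_offset - data_end)
end
```

An empty result means the local entries run straight into the central directory; anything else is unaccounted-for data.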


  1. This is a DSL from the BinData gem that provides a declarative way to read and write structured binary data in Ruby.
  2. Field names in PKWARE's spec are quite verbose.

Tags: ойті
Authors: ag