Filegen - file generator for tool testing

12 Apr 2013

One of my students is currently researching data recovery on solid state drives. Part of the testing requires that he create a large number of files with known and easily identifiable content. There are many ways of doing this and it is something I have done many times before. However every time have meant to write a script to do the work. This time round I figured I would write something to solve the program once and for all.

In this case the objective was to be able to determine the amount of recoverable data after a collection of files had been wiped. So we needed:

  1. Files of different sizes (including small enough to be resident)
  2. Unique filenames
  3. Readily identifiable unique content for each file

So I wrote filegen.py to generate the files we needed. It is a pretty simple program, in fact processing the options takes more code than generating the files but is simple to use and makes generating test files easy.

In order to address the unique filename requirement the user is able to pass a base filename that will then have a number appended to it. The file is then filled with a repeating pattern of the filename plus the size of the file (in bytes). When the string does not fit the desired file size the end of the file is padded with zeros. This way a keyword search can be used to identify how much of any given file is recoverable. Given the file size it is a simple matter to determine how many times the pattern repeats in the file.

The options are:
-s[1024]   size in bytes, can include a range if -c is also used i.e. -s 1024-1024000 -c 5 will produce 5 files between 1024 and 1024000 bytes
-c[1]   count of files to be produced
-t[evidence]    text to be included in the file this will have a file count number appended to the end of it.
-i[1]   inital number, the starting number to be appended to the string
-b[8192]   block size (how much data is written to file at once
-e[1]     step for difference between sizes in random files
-p [.\]    path for files to be written to, will be created if it does not exist
-v    version

One surprising outcome of the research was that in some cases more file content was recoverable than should have been written to the disk, but that is a story for another time.

The program is available as python code or a windows binary.



-----------------------