One of my students is currently researching data recovery on solid state drives. Part of the testing requires that he create a large number of files with known and easily identifiable content. There are many ways of doing this and it is something I have done many times before. However, every time I have meant to write a script to do the work. This time round I figured I would write something to solve the problem once and for all.
In this case the objective was to be able to determine the amount of recoverable data after a collection of files had been wiped. So we needed a large number of files, each with a unique filename, filled with known and easily identifiable content of a known size.
So I wrote filegen.py to generate the files we needed. It is a pretty simple program; in fact, processing the options takes more code than generating the files. But it is simple to use and makes generating test files easy.
In order to address the unique filename requirement, the user passes a base filename that then has a sequence number appended to it. Each file is filled with a repeating pattern of its filename plus the size of the file (in bytes). When the pattern does not divide evenly into the desired file size, the end of the file is padded with zeros. This way a keyword search can be used to identify how much of any given file is recoverable: given the file size, it is a simple matter to determine how many times the pattern should repeat in the file, and so what fraction of it was found.
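The generation scheme described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the code from filegen.py itself; the function name and the exact pattern layout (name, a space, the size, a space) are my assumptions.

```python
import os

def generate_file(path, size):
    """Fill a file with a repeating pattern of its own name plus its
    size in bytes, padding the tail with zero bytes.

    A sketch of the scheme described above -- filegen.py itself may
    format the pattern differently.
    """
    pattern = "{} {} ".format(os.path.basename(path), size).encode()
    repeats, remainder = divmod(size, len(pattern))
    with open(path, "wb") as f:
        f.write(pattern * repeats)        # the searchable content
        f.write(b"\x00" * remainder)      # pad to the exact size

generate_file("testfile001.txt", 4096)
```

After a wipe-and-recover cycle, counting occurrences of the pattern in the recovered data and multiplying by the pattern length gives an estimate of how many bytes of the file survived.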
The options control things such as the base filename, the number of files to create, and the size of each file.
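Since the option handling is the bulk of the program, here is a minimal front end in the same spirit, using Python's standard argparse module. The flag names and defaults here are hypothetical; the real filegen.py options may differ.

```python
import argparse

def parse_args(argv=None):
    """Parse command-line options for a file generator.

    All flag names and defaults are illustrative assumptions, not the
    actual filegen.py interface.
    """
    p = argparse.ArgumentParser(
        description="Generate test files with identifiable content")
    p.add_argument("-b", "--basename", default="testfile",
                   help="base filename; a sequence number is appended")
    p.add_argument("-n", "--count", type=int, default=100,
                   help="number of files to create")
    p.add_argument("-s", "--size", type=int, default=4096,
                   help="size of each file in bytes")
    p.add_argument("-d", "--directory", default=".",
                   help="directory to write the files into")
    return p.parse_args(argv)

# Example: the argument list a user might pass on the command line.
args = parse_args(["-b", "wipe_test", "-n", "10", "-s", "1024"])
print(args.basename, args.count, args.size)
```

With a parser like this, the generation loop itself reduces to a few lines calling the file-writing function once per sequence number.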
One surprising outcome of the research was that in some cases more file content was recoverable than should have been written to the disk, but that is a story for another time.