Ever tried to copy a directory/folder in Python? Ever failed? No matter. Try again!

If you haven't already read it, look at the article How to Copy a File in Python with shutil for an explanation of how to copy files with shutil.

Recursively Copying a Directory/Folder of Files in Python

In the article that was mentioned above, we saw how to copy individual files in Python. I think you'll agree that much more useful is the ability to copy entire directories into other directories.

Python's shutil module once again saves our butts and has a function called copytree for doing just what we want.

We don't need that drastic of a change, so I'll modify our code above slightly to be able to copy directories as follows:

Ok, nice! Almost the same as our function that copies single files, but it only allows to copy directories instead of files and the exceptions thrown are a bit different. Awesome.

Copying Both Files and Directories in Python

But what if we want one solid function that copies both files and directories? No problem! We can write a simple, slightly more complex function for that.

Observe:

This function will copy both files and directories. First, we put our copytree function in a try block to catch any nasty exceptions. If our exception was caused because the source directory/folder was actually a file, then we copy the file instead. Simple stuff.

A note to add is that it really isn't possible to actually copy a directory over. You need to recursively walk the directories and create the directory structure based on the old one, and copy the files in each sub-directory to their proper directories at the destination. If you wished to implement this yourself, see this article for an example of how to recursively walk directories in Python.

Ignoring Files and Directories

The function shutil.copytree takes in an argument that allows you to specify a function that returns a list of directories or files that should be ignored.
A simple example function that does this is as follows:

Alright, what does this function do? Well, we specify some sort of file or directory name as the argument ignore, which acts as the filter for names. If ignore is in names, then we add it to an ignored_names list that specifies to copytree which files or directories to skip.

How would we use this function? See our modified copy function below:

But what if we want to filter out more than just one file? Luckily, shutil has got us covered again! It's got a function called shutil.ignore_patterns that allows us to specify glob patterns to filter out files and directories. See our once again modified copy function below:

This will ignore all Python files, shell files and our own specific file when copying over the directory. ignore_patterns takes in arguments that specify the patterns to ignore and returns a function that copytree can understand, a lot like how our custom function ignore_function did it, but more robust.

That's all I have to share with you!

Peace.

About The Author

  • Malik Rumi

    Joey, nice article, but you left me confused when you said “A note to add is that it really isn’t possible to actually copy a directory over. ” Isn’t that the whole point of this article? What do you mean it isn’t possible? Isn’t that what you just did? I don’t get it…… :-(

    • http://jacksonc.com Jackson Cooper

      Hi Malik. At a file-system level, everything is flat (think of a hard drive like a piece of paper), and technically you can’t have a hierarchy of data. You simply have data all at the same level, and each file / directory has meta-data specifying how they are all linked. So for a directory, the directory itself and all files beneath it would be represented by an ID (in Unix systems, called an inode), where they all link up with each other. The OS then displays it in a hierarchical way.

      So at a technical level, you can’t simple “copy” a directory. If you “copied” a directory, you would only create a new directory object with a new inode, and that’s it. You must create the new directory, and then recursively iterate through the hierarchy, creating the same objects (files and directories). Then you need to open file descriptors for every file, read from the old object, and write to the new object.

  • Keds

    I have a small point to make on the code copying file or directory: you should not be expecting exceptions as a matter of course. Exceptions should be exceptional. It will be better to check whether the file is a directory or a file and then call the appropriate function, rather than relying on an exception.

    • http://jacksonc.com Jackson Cooper

      Hi Keds. Working with files and directories opens the possibilities of race conditions. You can check if it’s a directory / file and check for read and write permissions, but unless you lock the resource it may not be available when you access it. Say the checks are done, and all looks good, and the script goes to copy the file. But in the very small time between performing the checks and doing a read on the file, the file (or directory) has been renamed, deleted, or simply another program is writing to the file. Then your program will crash. In this case exceptions must be used, because the only reliable way to check if the program has acquired a lock on the file is after you attempted to.

  • Sachin Karur

    Works flawlessly! very well explained too. Thank you very much!