How do you organize your computer files, and how do you maintain that organization over time? Does anyone have tips or best practices?
I’ve recently started formalizing my computer file organization. For years it was best described as ad hoc and short-sighted. Even now, after trying to clean up the mess, when I look at some directories from 5 or more years ago I have a very hard time telling what separates two different versions of the same directory. I rarely left README-like files explaining what’s what, mostly because I didn’t think about it.
Here are a few things I’ve learned:
Decide on a reasonable directory structure and iterate towards a better one. I can’t anticipate how my needs would be better served by a different structure in the future, so I don’t try that hard to; I can create new directories and move things around as needed. My current home directory is roughly structured into the following directories: backups, classes, logs, misc (financial info, etc.), music, notes, projects (old projects that preceded my use of version control), reference, svn, temp (files awaiting organization, mostly because I couldn’t immediately think of an appropriate place for them), utils (local executable utilities).
Symbolic links are necessary when you think a file might fit well in two places in a hierarchy. I don’t care too much about making a consistent rule about where to put the actual file.
Version control allows you to synchronize files across different computers, share them with others, track changes, roll back to older versions (where you can know what changed based on what you wrote in the log), and encourages good habits (e.g., documenting changes in each revision). I use version control for most of my current projects, even those that do not involve programming (e.g., my notes repository is about 700 text files). I don’t think which version control system you use is that important, though some (e.g., cvs) are worse than others. I use Subversion because it’s simple.
I store papers, books, and other writings in a directory named reference. I try to keep a consistent file naming scheme: Author_Year_JournalAbbreviation.pdf. I have a text file that lists my own journal abbreviation conventions. If the file is not from a journal, I’ll use something like “chapter” or “book” as appropriate. (Other people use software like Zotero or Mendeley for this purpose. I have Zotero, but mostly use it for citation management because I find it inconvenient to use.)
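As a rough illustration (not my actual tooling), a small helper along these lines could mechanize the naming scheme; the abbreviation table and function name below are made up for the example:

```python
# Hypothetical sketch: build "Author_Year_JournalAbbreviation.pdf" file names.
# The abbreviation table is illustrative; in practice it would come from the
# text file of journal abbreviation conventions mentioned above.

JOURNAL_ABBREVIATIONS = {
    "Journal of Fluid Mechanics": "JFM",
    "Physical Review Letters": "PRL",
}

def reference_filename(author: str, year: int, journal: str) -> str:
    """Return a name like 'Smith_2010_JFM.pdf'."""
    abbrev = JOURNAL_ABBREVIATIONS.get(journal, journal.replace(" ", ""))
    return f"{author}_{year}_{abbrev}.pdf"

print(reference_filename("Smith", 2010, "Journal of Fluid Mechanics"))
# -> Smith_2010_JFM.pdf
```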
In terms of naming files, I try to think about how I’d find the file in the future and try to make it obvious if I navigate to the file or search for it. For PDFs, you often can’t search the text, so perhaps my file naming convention should include the paper title to help with searching.
README files explaining things in a directory are often very helpful, especially after returning to a project after several years. Try to anticipate what you might not remember about a project several years disconnected from it.
Synchronizing files across different computers seems to encourage me to make sure the directory structure makes at least some sense. My main motivation in cleaning things up was to make synchronizing files easier. I use rsync; another popular option is Dropbox.
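As a sketch of the kind of syncing I mean, something like the following could wrap rsync; the host and directory names are placeholders, and the flags are standard rsync options:

```python
# Minimal sketch: push a local directory to another machine with rsync.
# Host name and paths are placeholders, not my actual setup.
import subprocess

def sync_home_subdir(subdir: str, remote: str = "user@other-machine:") -> None:
    # -a preserves permissions and times, -v is verbose, -z compresses in
    # transit, --delete makes the remote copy mirror local deletions.
    subprocess.run(
        ["rsync", "-avz", "--delete", subdir + "/", remote + subdir + "/"],
        check=True,
    )

sync_home_subdir("notes")
```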
Using scripts to help maintain your files is enormously helpful. My goals are to have descriptive file names, to have correct permissions (important for security; I’ve found that files that have touched a Windows system often end up with completely wrong permissions), to minimize disk space used, and to interact well with other computers. I have a script I call “flint” (file system lint) that does the following and more (a rough sketch of a few of these checks appears after the list):
checks for duplicate files, sorting them by file size (fdupes doesn’t do that; my script is pretty crude and not yet worth sharing)
scans for Windows viruses
checks for files with bad permissions (777, can’t be written to, can’t be read, executable when it shouldn’t be, etc.)
deletes unneeded files, mostly from other filesystems (.DS_Store, Thumbs.db, Desktop.ini, .bak and .asv files where the original exists, core dumps, etc.)
checks for nondescriptive file names (e.g., New Folder, untitled, etc.)
checks for broken symbolic links
lists the largest files on my computer
lists the most common filenames on my computer
lists empty directories and empty files
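Here is the rough sketch of a few of these checks mentioned above. It is illustrative only, assumes a Unix-like system, and is not the actual script:

```python
# Illustrative sketch of a few "flint"-style checks: broken symlinks,
# world-writable files, empty files/directories, nondescriptive names.
import stat
from pathlib import Path

def flint_checks(root: str) -> None:
    for path in Path(root).rglob("*"):
        # Broken symbolic links: the link exists but its target does not.
        if path.is_symlink():
            if not path.exists():
                print(f"broken symlink: {path}")
            continue
        # Nondescriptive names that tend to accumulate.
        if path.stem.lower() in {"untitled", "new folder"}:
            print(f"nondescriptive name: {path}")
        if path.is_file():
            mode = stat.S_IMODE(path.stat().st_mode)
            # World-writable files (777 and friends) are flagged.
            if mode & stat.S_IWOTH:
                print(f"bad permissions ({oct(mode)}): {path}")
            if path.stat().st_size == 0:
                print(f"empty file: {path}")
        elif path.is_dir() and not any(path.iterdir()):
            print(f"empty directory: {path}")

flint_checks(".")
```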
I’d be very interested in any other tips, as I often find my computer file organization to be a bottleneck in my productivity.
I have a folder that I do my short term work in:
D:/stupid shit that I can’t wait to get rid of
This is set to auto-delete everything in it weekly. I had a chronic problem where small files that were useful for some minor task or another from months or years ago would clutter up everything. This was my “elegant” solution to the problem and it’s served me well for years, because it gave me an actual incentive to put my finished work in a sensible place.
Although now that I think about it, it would be a better idea for it to only delete files that haven’t been touched for a week, rather than wiping everything all at once on a Saturday.
The Linux program tmpreaper will do this. It can be made into a cron job. I’ve got mine set for 30 days.

That’s an interesting way to force yourself to organize things, or at least pay attention to them. I might try this.
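For what it’s worth, the age-based variant is only a few lines if you’d rather not depend on tmpreaper. A minimal sketch, with a placeholder scratch directory:

```python
# Sketch: delete files in a scratch directory that haven't been modified
# for MAX_AGE_DAYS. The directory path is a placeholder.
import time
from pathlib import Path

MAX_AGE_DAYS = 7
scratch = Path.home() / "temp"  # placeholder scratch directory

cutoff = time.time() - MAX_AGE_DAYS * 24 * 60 * 60
for path in scratch.rglob("*"):
    if path.is_file() and path.stat().st_mtime < cutoff:
        path.unlink()
```

Run from cron, this behaves much like tmpreaper’s age-based cleanup.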
If you’re comfortable with command-line UIs, git-annex is worth a look for creating repositories of large static files (music, photos, pdfs) you sync between several computers.
I use regular git for pretty much anything I create myself, since I get mirroring and backups from it, though it’s mostly text, not audio or video. Large files that you change a lot probably need a different backup solution. I’ve been trying out Obnam as an actual backup system. I’ve also bought an account at an off-site shell provider that provides space for backups.
Use the same naming scheme for your reference article names and the BibTeX identifiers for them, if you’re writing up some academic research.
GdMap or WinDirStat are great for getting a visualization of what’s taking space on a drive.
If your computer ever gets stolen, you probably want it to have had full-disk encryption. That way it’s only a financial loss, and probably not a digital security breach.
It constantly fascinates me that you can name the exact contents of a file pretty much unambiguously with something like a SHA256 hash of it, but I haven’t found much actual use for this yet. I keep envisioning schemes where your last-resort backup of your media archive is just a list of file names and content hashes, and if you lose your copies you can just use a cloud service to retrieve new files with those hashes. (These of course need to be files that you can reasonably assume other people will have bit-to-bit equal copies of.) Unfortunately, there don’t seem to be very robust and comprehensive hash-based search and download engines yet.
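To make the idea concrete, the manifest itself is trivial to produce; this sketch just writes "hash  filename" lines, and the retrieval side is exactly the part that doesn’t robustly exist yet. The directory name is a placeholder:

```python
# Sketch: write a manifest of "sha256-hash  path" lines for a media archive.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root: str, out: str = "manifest.txt") -> None:
    with open(out, "w") as f:
        for p in sorted(Path(root).rglob("*")):
            if p.is_file():
                f.write(f"{sha256_of(p)}  {p}\n")

write_manifest("media")  # placeholder directory name
```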
Suggest it to the folks who run The Pirate Bay.
They probably know about it already. I think the eDonkey network is pretty much what I envision. The problem is that the network needs to be very comprehensive and long-lived to be a reliable solution that can actually be expected to find someone’s copy of most of the obscure downloads you want to hang on to, and things that people try to sue into oblivion whenever they get too big have trouble being either. There’s also the matter of agreeing on the hash function to use, since hash functions come with a shelf life. A system made in the 90s that uses MD5 might be vulnerable nowadays to a bot attack substituting garbage for the known hashes via collision attacks. (eDonkey uses MD4, which seems to be about as vulnerable as MD5.)
There’s an entire field called named data networking that deals with similar ideas.
There probably are parts of the problem that are cultural instead of technical though. People aren’t in the mindset of wanting to have their media archive as a tiny hash metadata master list with the actual files treated as cached representations, so there isn’t demand and network effect potential for a widely used system accomplishing that.
Zooko did this: Tahoe-LAFS
You can safely use it for private files too; just don’t lose your pre-encryption hashes.
Great suggestions.
This is very smart, and I’ll look into changing my bibliography files appropriately.
I want to reiterate the importance of this. I’ve used full-disk encryption for years for the security advantages, and I’ve found the disadvantages to be pretty negligible. The worst problem with it that I’ve had was trying to chroot into my computer, but you just have to mount everything manually. Not a big deal once you know how to do it.
How can identical files be sorted by file size?
My wording was unclear. I sort the list of duplicate files by file size, e.g., the list might be like 17159: file1, file2; 958: file3, file4. This is useful because I have a huge number of small duplicate files and I don’t mind them too much.
Ah. Well, you’re right that it’s not easy to do that… Might want to subscribe to the bug report so you know if anyone comes up with anything useful: http://code.google.com/p/fdupes/issues/detail?id=3
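For what it’s worth, the grouping-and-sorting part is simple enough to sketch. This version hashes everything, which is slow (grouping by size first would be the obvious optimization), and it is not my actual script:

```python
# Sketch: find duplicate files by content hash and print the groups
# sorted by file size, largest first, like "17159: file1, file2".
import hashlib
from collections import defaultdict
from pathlib import Path

def duplicate_groups(root: str) -> None:
    by_hash = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            by_hash[digest].append(p)
    groups = [paths for paths in by_hash.values() if len(paths) > 1]
    # Largest duplicate sets first.
    groups.sort(key=lambda paths: paths[0].stat().st_size, reverse=True)
    for paths in groups:
        size = paths[0].stat().st_size
        print(f"{size}: " + ", ".join(str(p) for p in paths))

duplicate_groups(".")
```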
My first reflex is to exclaim that I don’t organize my files in any way, but that is incorrect: I merely lack comprehension of how my filing system works. It’s inconsistent, patchy, and arbitrary, but I do have some sort of automatic filing system that feels “right”, and when my files are not in this system my computer feels “wrong”.
Intriguing. Can you describe some of your automated filing rules? I am considering trying such a setup via fsniper.
I wouldn’t recommend duplicating my filesystem (it’s most likely less useful than most filing systems that aren’t “throw everything in one folder/on the desktop and forget about it”), but I’ll note some key features:
Files reside inside folder trees whose folders are either named clearly for what they are, or given obfuscated special words or made-up phrases (even acronyms) that have special meaning only to me in the context of that particular position in the file tree.
Different types of files have separate folders in places.
Folder trees are arranged in sets of categories, subcategories, and file types (the order of sorting is very ad hoc and arbitrary). You could have, for example, Media > Type of media > Genre of media > Creator > Work, but it could just as easily have Creator at the root of the tree.
I really suggest you just make your own system or copy someone else’s; it will more likely than not provide more utility.
Edit: just to be clear, I don’t have any sort of automated software that organizes my files for me. I am merely saying that my mind organizes the files semiconsciously, so I’m not directly “driving” when the act of organizing occurs.
The only thing that’s worked for me in the long term is making things public on the internet. This generally means putting them on my website, though code goes on GitHub and shared-editing things go in Google Docs. Everything else older than a couple of years is either gone or somewhere I can no longer find.
For the last couple of years I have used Google Drive exclusively for all new documents and am finding it works pretty well.
I use a simple folder structure which makes it a bit easier when you want to browse docs, though the search obviously works really well:
root - random docs, works in progress, other
|-- AI - AI notes, research papers, ebooks
|-- business - books, invoices, marketing notes, plans
|-- dev - used as a dumping ground for code when transferring between PCs (my master dev folder lives on the PC)
|-- study - course notes, lectures
The best part is that I can access them from home, work, or on the road (the Android app works very well), so backups and syncing are not an issue.
For files on the home PC I use a NAS which is pretty amazing, and allows access from any home PC or tablet/phone via a mapped drive. The folder structure there is:
|-- photos - all pictures, photos, videos
|-- dev - master location for all source code
|-- docs - master location for all documents older than 2 years (the rest is on Google Drive)
|-- info - lots of subfolders; any downloaded ebook, webpage, or dataset that I didn’t create
I don’t use the clients, but I am annoyed there isn’t a simple way to download all Google Docs to the computer in RTF/Word or even text format. You can do full backups, but they only work with Google Drive. I don’t think Google will go out of business any time soon, so it is not an imminent risk at this stage.
The main risk is not Google as a whole going out of business, but rather Google withdrawing support for the particular service you prefer.
Seems that tools like Google Drive take care of many issues you describe. Directory structure and symlinks are superseded by labels, version control is built-in, search is built-in and painless, synchronization is built-in, no viruses to worry about, etc.
That does sound nice. I wasn’t aware of the version control, and I’m somewhat curious how that would work. Thinking about it, I’d prefer the manual approach Subversion requires, where I can enter a message each time. After doing a few searches, I’m not sure you can even get anything similar to a commit message in Google Drive. I’ve found commit messages essential for working out what separates older versions of files from newer ones.
There are some more practical issues for me. I run Linux. There’s no official Google Drive client for Linux, and last I checked the clients that exist aren’t good. I also sometimes work at a government science lab. They don’t allow any sort of cloud file synchronization software aside from their own version of SkyDrive, which requires me to log in via a VPN (and is a total pain). No idea if SkyDrive works on Linux, anyway. They don’t seem to be aware of rsync, thankfully. :-)
Every couple of weeks, Google Drive chooses an important document to lock me out of editing. This pretty much eliminates it as a serious solution for file management for me.
What excuse do they give?