For the past couple of weeks I’ve been writing a utility to search through code quickly. I’m doing this because at work, some large dependencies got tossed in extern, making ack and grep pretty slow. At first I tried to make them faster (creating aliases to ignore certain files), but I soon gave up and started writing my own thing.
Grep is slow because it doesn’t ignore files by default. Ack is slow because it’s written in Perl. So I’m writing mine in C, using libpcre for the regex matching. So far it’s about 3x faster than ack and 10x faster than grep. With some tweaking to ignore special files like generated code, I got it even faster than that: 0.5 seconds to search the codebase. For comparison, grep took 12 seconds and ack took 4.
The GitHub repo is here. I don’t recommend anyone try it out yet. It’s not even close to done. Although I use it daily, I still need to iron out some formatting bugs, fix a couple of potential crashes, and then write docs. I bet it’ll take me a couple of weeks to sort all that stuff out.
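To make the "ignore files by default" point concrete, here is a minimal sketch of a search that never descends into ignored directories. It is written in Python for brevity and is not the tool's actual C/libpcre code. The `search_tree` function and the `IGNORED_DIRS` set are hypothetical names chosen for illustration; only the `extern` directory comes from the post.

```python
import os
import re

# Hypothetical ignore list; "extern" is the directory mentioned in the post.
IGNORED_DIRS = {"extern", ".git"}


def search_tree(root, pattern, ignored_dirs=IGNORED_DIRS):
    """Yield (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = [d for d in dirnames if d not in ignored_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it and keep searching
```

Most of the saving comes from the pruning step: a large dependency tree that is skipped at the directory level costs nothing, whereas a plain recursive grep reads every file in it.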
(Before anyone replies: Yes, I do know about ctags and git-grep. Ctags requires rebuilding an index after changing any files, and git-grep doesn’t work on non-git repositories. Also git-grep can’t ignore files committed in the repo, such as everything in extern.)
Have you considered existing solutions, such as Krugle, or IntelliJ and Eclipse’s built-in tools or plugins (assuming you’re coding in Java or Python or whatever else Eclipse supports)? If so, what were their major deficiencies, and how is your solution better? I’m asking because, well, I’m a selfish bastard who doesn’t feel like implementing his own code search engine, so I might as well use yours :-)
I’ve tried Eclipse’s search before, and it’s way too slow for my needs. Also, the Eclipse UI has a lot of annoyances since it’s not a native OS X application. It doesn’t obey my keyboard map, for example.
I haven’t seen grepcode before, but it looks like it builds an index. That’s a non-starter for me, since code often changes and I don’t want to wait for an index to get rebuilt before searching. If the tool silently rebuilds the index in the background, it’s even worse. Then I don’t know if the search results are correct or not.
“If the tool silently rebuilds the index in the background, it’s even worse. Then I don’t know if the search results are correct or not.”
It doesn’t have to mean that. It could respond to your search by walking the directory structure checking last-modification times, comparing them against its index, and updating anything that’s been modified.
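A rough sketch of that idea, in Python: walk the tree, compare each file's last-modification time against what the index recorded, and re-read only the files that differ. The `refresh_index` name and the index layout (path mapped to a tuple of mtime and lines) are assumptions made for this example, not how any particular tool stores its index.

```python
import os


def refresh_index(root, index):
    """Bring index (path -> (mtime_ns, lines)) up to date with the files under root.

    Returns the list of paths that had to be re-read.
    """
    seen = set()
    updated = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime_ns
            except OSError:
                continue  # file vanished between listing and stat
            cached = index.get(path)
            if cached is not None and cached[0] == mtime:
                seen.add(path)
                continue  # unchanged since it was indexed
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    lines = handle.read().splitlines()
            except OSError:
                continue  # unreadable: leave it out of the index
            index[path] = (mtime, lines)
            seen.add(path)
            updated.append(path)
    # Drop entries for files that were deleted or became unreadable.
    for path in list(index):
        if path not in seen:
            del index[path]
    return updated
```

Calling `refresh_index(root, index)` right before each search means results reflect the files on disk at that moment, at the cost of one `stat` call per file. Relying on modification times alone is itself an assumption: a file rewritten with its timestamp preserved would be missed.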
I see, that makes sense, but I think that you might be better off with a hybrid approach: build an index first, and do real-time search on all files that have been changed, and thus haven’t been [re-]indexed yet. I’m not sure if any of the existing systems do that, but it’s worth checking out. Of course, if your codebase is relatively small, performance won’t be much of a problem...
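Here is a minimal sketch of what such a hybrid could look like, in Python. The index records each file's modification time and its set of word tokens. Files that are unchanged and whose token set lacks the search word are skipped without being read, and anything changed or never indexed is scanned directly. All names (`build_index`, `hybrid_search`, the index layout) are hypothetical, and the sketch only handles whole-word queries rather than full regexes.

```python
import os
import re

TOKEN = re.compile(r"\w+")


def _files(root):
    """Yield every file path under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def _read_lines(path):
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return handle.read().splitlines()


def build_index(root):
    """Map each file path to (mtime_ns, set of word tokens in the file)."""
    index = {}
    for path in _files(root):
        tokens = set(TOKEN.findall("\n".join(_read_lines(path))))
        index[path] = (os.stat(path).st_mtime_ns, tokens)
    return index


def hybrid_search(root, index, word):
    """Yield (path, line_number, line) for lines containing word as a whole token."""
    for path in _files(root):
        entry = index.get(path)
        fresh = entry is not None and entry[0] == os.stat(path).st_mtime_ns
        if fresh and word not in entry[1]:
            continue  # the index shows the word is absent; no need to read the file
        # Either the index says the word is present, or the file changed
        # since indexing and has to be scanned in real time.
        for number, line in enumerate(_read_lines(path), start=1):
            if word in TOKEN.findall(line):
                yield path, number, line
```

The index is never trusted for a file whose modification time has moved, so a stale index can make the search slower but should not make it wrong, as long as modification times are reliable.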
An update for those who are curious: Ag is now the 11th most-starred C repository on GitHub. It’s more popular than memcached or Arduino. It will soon surpass XBMC to become #10. People freakin’ love it.