Grep is the ubiquitous command-line tool for finding lines in files that match a pattern. Invented by computer science luminary Ken Thompson in November 1974, it was originally developed for the Unix operating system but is available today, in one form or another, on almost every system. Grep has long been the de facto standard for programmers everywhere who need to find things in files. However, as time and technology have advanced, both the size and the number of files have grown rapidly. A good example is the source code of the Linux kernel, which consisted of 170,000 lines of code at version 1.0 in 1994 and is now over 22 million lines as of version 4.8.
As you can imagine, the grep found on most systems shows its age on modern multicore processors. It does all of its work in a single thread, so performance suffers over large file sets even on today's powerful machines.
You can split up a grep in a variety of ways to gain some potential concurrency for your search. I say “potential” because many factors are involved, including the shape of your search tree, your hardware, and I/O bottlenecks. Here is one way:
% find . -type f -print0 | xargs -0 -n [files_per_process] -P [number_of_processes] grep -H [pattern]
This gets the job done, but surely we can be more productive than this in our everyday development. Why not have a tool that is fast, concurrent, and tailored to the needs of developers? As it turns out, we do!
In the Perl world we have Andy Lester’s excellent ack, which runs on any Perl (including, of course, ActivePerl). A great benefit for developers is that it ignores core dumps, binaries, backup files, and version-control files. It also uses Perl’s regular expressions, which have long been among the best of any language. Typical usage:
% ack [pattern]
But I suggest you try the following:
% ack --thppt
Ack, why didn’t you specify a directory to search? To keep you maximally productive, ack searches recursively from the current working directory by default, which is perfect for searching your code trees. However, ack has a performance weakness: it still isn’t concurrent out of the box (though you could parallelize it with xargs, as in the grep example above).
This brings us to Geoff Greer’s feature-rich, ack-inspired The Silver Searcher (great name!). This fantastic tool claims to be an order of magnitude faster than ack, thanks to its implementation in C, its use of pthreads for concurrency, and many other improvements. Instead of “ack”, we can now type the clever “ag”, the chemical symbol for silver:
% ag [pattern]
It also ignores files you normally don’t want to search in your codebase, honors the patterns in .gitignore and .hgignore files, and even lets you specify your own .ignore file.
Lastly, we’ve been doing a lot of work with the Go language here at ActiveState. We are working toward our upcoming ActiveGo™ distribution, and I would be remiss if I didn’t mention another fantastic grep-like tool, written in Go, whose performance is on par with ag.
I discovered it when Dave Cheney gave a shoutout to Monochromegane’s The Platinum Searcher on episode 16 of the stellar Go Time podcast. It is written in Go and makes fantastic use of Go’s built-in concurrency support to achieve its solid performance. Like ack and ag, it ignores repository files you don’t want to search through. Imitation is clearly the sincerest form of flattery. Installation is a snap: grab the binary for Mac, Windows, or Linux, put it on your PATH, and you’ve got another great grep-like tool. Basic usage is the same as The Silver Searcher, but with platinum’s chemical symbol instead:
% pt [pattern]
If you’re still working with your system-installed grep, I highly recommend moving to a higher-performance, developer-oriented tool such as The Silver Searcher or The Platinum Searcher!
Having opened this article with Ken Thompson and his invention of grep, we’ve come full circle: a mere 35 years later, Ken Thompson co-created Go! Happy searching!