Linux Forensics: Pattern Matching with Grep and Related Tools

Posted in Uncategorized on June 2, 2011 by mattseanbachman

I’m trying to piece together a resume from the scattered work I’ve done over the years, and thought I’d post this here after digging it out of my docs folder.

Linux Forensics: Pattern Matching with Grep and Related Tools

Pattern matching is locating a given sequence within a pool of information. Everyone who has used Google knows in essence what this is, and knows the importance of refining search terms to weed out unnecessary information from the vast amount available on the Internet. The analogy applies to forensic investigations involving digital evidence, where it is likewise desirable to avoid the clutter of unwanted information. Pattern matching offers two benefits: increased productivity and a greater likelihood of finding the desired information. What follows is a synopsis of regular expressions and an exploration of their importance and efficacy toward those ends, applied with tools common to most GNU/Linux systems. Ancillary topics include network forensic tools and scripting, the latter of which seeks to reproduce, with the tools discussed, functions analogous to those of competing forensic software.

Introducing Pattern Matching:

A term intertwined with pattern matching is ‘regular expressions’. The two are synonyms in essence: the former denotes the action of locating a desired occurrence in a larger data set, and the latter denotes the language by which this is often accomplished.I II Another, potentially inaccurate, synonym for pattern matching and regular expressions is “grep.” The word “grep” goes back to Unix line editors such as ed, in which the command g/re/p globally searched for the pattern “re” and printed the matching lines to standard output.III Fast-forward to the present day, and grep is used less in such a specific context; it now means approximately to find a given pattern. Specifically, grep is one of many programs that use regular expressions (the language of pattern matching). It is also often used as a verb to connote this action. This paper will make liberal use of the word in the spirit of grep’s use in colloquial English.

Over time, regular expressions diversified into a multitude of different flavors, about a dozen of which are reasonably popular today. Some are as followsIV:
Perl
GNU BRE
GNU ERE
POSIX BRE
POSIX ERE
Java
.NET
Python
Ruby
XML
XPath

It is important to note that since these were developed somewhat independently, one should not assume that regular expressions written for one tool will work with another, unless the tool explicitly states the standard being used. For instance, FTK and EnCase use syntax similar to Perl. Without such knowledge, one may assume a pattern written for one (grep with BRE syntax) would apply to the other, and evidence may be passed over because of such an error.V Though set standards for regular expressions exist,VI VII deviations from a given standard, whether to incorporate aspects of other standards or to add functionality, may be present. Their absence should not be assumed unless software providers explicitly declare that a given tool conforms strictly to a set standard.

A simple example of such a difference between regular expression standards is the pattern [a-z]\{3\} under the Perl and POSIX BRE engines. The POSIX BRE engine treats \{3\} as an interval and would match a string like “abc”, while the Perl engine treats \{ and \} as literal braces and would match something like “b{3}”. This is one of many differences between the available engines. Because of this, it can be helpful, at least initially, to focus primarily on one style of regular expressions, adjusting them when necessary, rather than attempting to explore the nuances of each in turn.VIII
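To see the difference concretely, the sketch below uses Python's re module as a stand-in for a Perl-style engine. This is an assumption made for illustration, since Python's syntax is Perl-like rather than identical to Perl's. The BRE-style pattern text accepts “b{3}” but not “abc”, while the Perl/ERE spelling [a-z]{3} does the reverse.

```python
import re

# Python's re stands in for a Perl-style engine here (an assumption: its
# syntax is Perl-like, not identical). In such engines "\{" is a literal brace.
BRE_STYLE = r"[a-z]\{3\}"   # POSIX BRE reads this as "three lowercase letters"
PERL_STYLE = r"[a-z]{3}"    # the same intent written in Perl/ERE syntax

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

for sample in ("abc", "b{3}"):
    # prints: abc False True, then: b{3} True False
    print(sample, matches(BRE_STYLE, sample), matches(PERL_STYLE, sample))
```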

Perl-style syntax has several advantages. First, it allows searching for non-printable characters.IX Second, support for Perl regex is widespread, probably more so than for any other regex engine. The GNU grep utility discussed in a later section has a -P switch signifying Perl syntax for the regular expression, which saves the frustration of dealing with an entirely new syntax. Third, a transition from Perl syntax to POSIX BRE is both less likely to be necessary and perhaps easier than the opposite. Most of the tools explored in a later section of this paper also support the Perl syntax. To make this paper easier to understand, non-Perl syntax will be avoided when possible.X XI

Keeping this in mind, consider for a moment the regular expression syntax of the most popular engine at the moment, Perl.XII Perl is a scripting language similar to PHP, which is most commonly tied to server-side scripting, dynamic web page generation, and a close relationship with MySQL.XIII PHP’s preg functions use Perl-compatible syntax. On many websites, data is entered by the customer and sent to the server. If this data is not in the appropriate form when it reaches the server, PHP can examine and alter it via three sets of regex functions: the preg group (Perl-compatible, built on the PCRE library), the ereg group (POSIX), and the mb_ereg group (multibyte).XIV Of these, only the preg group will be discussed,XV and it is not necessary to know either scripting language to comprehend its regex capabilities.

The function call preg_match(‘/cat/’, $string) searches for the pattern “cat” within the string $string. The single quotes enclose the regular expression, and the forward slashes act as delimiters. preg_match() returns 1 when the pattern is found and 0 when it is not, so echoing the result of a successful match prints “1” to the terminal.

A slightly more complex expression might be cat|dog, where the expression matches either the phrase “cat” or “dog.” This is a very useful feature called “alternation,” the use of which will be shown later for searching for a number of different patterns at once.
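A rough Python analogue of the preg_match() call above, extended to the cat|dog alternation, might look like the following. The helper name preg_match_like and the sample strings are hypothetical. Like preg_match(), the helper returns 1 on a match and 0 otherwise. The real function can also return false on error, which this sketch ignores.

```python
import re

def preg_match_like(pattern: str, subject: str) -> int:
    """Return 1 if pattern occurs anywhere in subject, else 0 (like preg_match)."""
    return 1 if re.search(pattern, subject) else 0

print(preg_match_like(r"cat", "the cat sat"))       # 1
print(preg_match_like(r"cat|dog", "a dog barked"))  # 1 (alternation)
print(preg_match_like(r"cat|dog", "a bird sang"))   # 0
```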

Applied uses of regular expressions:

In many books on regular expressions, the example expressions grow steadily more complex as the book progresses toward its conclusion. This is an example of a complex expression:XVI

(?=\d)^(?:(?!(?:10\D(?:0?[5-9]|1[0-4])\D(?:1582))|(?:0?9\D(?:0?[3-9]|1[0-3])\D(?:1752)))((?:0?[13578]|1[02])|(?:0?[469]|11)(?!\/31)(?!-31)(?!\.31)|(?:0?2(?=.?(?:(?:29.(?!000[04]|(?:(?:1[^0-6]|[2468][^048]|[3579][^26])00))(?:(?:(?:\d\d)(?:[02468][048]|[13579][26])(?!\x20BC))|(?:00(?:42|3[0369]|2[147]|1[258]|09)\x20BC))))))|(?:0?2(?=.(?:(?:\d\D)|(?:[01]\d)|(?:2[0-8])))))([-.\/])(0?[1-9]|[12]\d|3[01])\2(?!0000)((?=(?:00(?:4[0-5]|[0-3]?\d)\x20BC)|(?:\d{4}(?!\x20BC)))\d{4}(?:\x20BC)?)(?:$|(?=\x20\d)\x20))?((?:(?:0?[1-9]|1[012])(?::[0-5]\d){0,2}(?:\x20[aApP][mM]))|(?:[01]\d|2[0-3])(?::[0-5]\d){1,2})?$

The expression captures dates, times, and datetimes, including leap years. While this is a very comprehensive pattern and an excellent intellectual exercise, the most useful regular expressions may be much less complex.XVII Additionally, the more complex the pattern, the more likely it is to fail, both because of user error and because of the restrictiveness of the search pattern. It is therefore more useful to start with simple patterns and refine them toward more restrictive ones than vice versa.

Knowing how to tweak regular expressions is more valuable than having a seemingly infallible set of regular expressions to fall back on. Despite FTK’s advanced features for matching synonyms and fuzzy spelling, there are instances in which these fail and custom-made patterns are necessary.

What follows are examples of composed regular expressions and the application of several expressions in a forensic context.XVIII This paper also branches out to include specific instances of the use of regular expressions, and pertinent information surrounding the use of grep, in Linux-based forensic investigations.XIX There are far fewer example expressions here than could have been included, since the number of potentially relevant expressions is limited only by the imagination. Many were withheld out of a desire to keep the discussion of regular expressions in particular instances reasonably terse. Entire books have been written on the subject that may better acquaint readers with other pertinent expressions. The references for this paper serve as excellent guides to regular expressions and to accompanying topics, such as procuring forensic images with Linux, for any issue readers feel is covered in insufficient detail.XX

Introducing Grep:

The grep tool’s usefulness comes from its ability to sift through data sets to match a pattern, making it well suited for forensic work.XXI Two common (not necessarily forensic) uses are as follows:
CODE :

ps -e | grep "ge"

This prints all processes (ps lists processes to standard output) that have “ge” in the process name.XXII
CODE :

cat /var/log/messages | grep "fail"

Here cat prints the file /var/log/messages to standard output. That output is redirected with the ‘|’ (pipe) to become the standard input of the grep program, which prints the lines matching the pattern “fail.”
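The filtering step of such a pipeline can be sketched in Python to show what grep does with its standard input. It reads line by line and passes through only the lines that contain a match. The function name grep_lines and the sample log text are hypothetical.

```python
import io
import re
from typing import Iterable, Iterator

def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines containing a match for pattern, as grep does with stdin."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")

# Hypothetical log text standing in for the output of cat /var/log/messages.
log = io.StringIO("session opened\nauth fail for bob\nsession closed\n")
print(list(grep_lines("fail", log)))  # ['auth fail for bob']
```

In a real pipeline, sys.stdin could be passed in as the lines argument.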

Grep can be a capable tool in an examiner’s toolkit, especially if live analysis is desired on a Linux system. Since grep is very likely already present, it may as well be used.XXIII Exploring the implications of live analysis is beyond the scope of this paper. Note, however, that running grep on a machine where it already exists would likely alter the system less than introducing new programs would.XXIV

Going back to the examples of grep’s usage above, the pipe operator is used frequently. The pipe symbol tells the shell to take the standard output of the first command and use it as the standard input of the second. Knowledge of standard streams (file descriptors) is required to understand the full implications of this, and most of the requisite understanding can be gathered from online sources.XXV

Concerning file descriptors, grep’s output is easily redirected to a file for later review.XXVI When examining a case, it is frequently better to write the output to a file. This is easily done, as shown:
CODE :

grep "greed" ./* > file 2> err

The ‘>’ symbol redirects standard output to a file for subsequent examination. The ‘2>’ directs error messages (e.g. “Warning: recursive directory loop”) to a different file. If you do not care about the errors at all, redirect stream 2 to /dev/null (2> /dev/null). Many errors are helpful in discerning why a particular search is not working as expected. As illustrated, however, it is possible to separate error messages from ordinary output, both of which are written to the terminal by default.

Another terminal trick is as follows:
CODE :

grep "greed" ./* &

The trailing ‘&’ runs the search as a background job, so the user is returned to a command prompt immediately (pressing Enter may be needed to redraw it). This makes it possible to run multiple searches at the same time; combining this with redirection to a file is recommended. Typing “fg” will bring the background job to the foreground once again. This assumes the use of Bash; for other shells, consult the documentation for similar functionality.

Of the three major forms of grep (grep, egrep, and grep -P), the last can and should be used most frequently, for several reasons. First, grep by default uses POSIX BRE syntax, which differs significantly from grep -P in that special characters such as ?, +, {, | and ( must be escaped to take on their special meaning. Using -P therefore gives more cross-compatibility between regular expressions composed on the Linux command line and tools such as FTK. Next, neither grep nor egrep supports specifying characters by hexadecimal escapes such as \x20 (a space), which is how non-printable bytes are usually searched for. Lastly, Perl syntax provides alternation in the same form as egrep does, while avoiding the cross-compatibility issues.

Building expressions:

The following illustrates some simple searches with grep using patterns that may be forensically pertinent. It may be helpful to experiment with expressions rather than simply read about them. In EnCase, you may use the keyword tester (available in the keywords tab when you create a new keyword).XXVII The following examples are formatted for the GNU grep utility bundled with many Linux distributions, which can be downloaded at no cost from many websites.XXVIII For the most part, these examples can even be run from a live distribution (a bootable CD/DVD). The Bash shell is assumed.XXIX

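The following grep lists the files in the current directory that contain the JPEG signature (-l prints only the names of matching files). Setting LC_ALL=C makes grep treat the data as raw bytes rather than UTF-8 text, so that an escape such as \xFF matches the byte 0xFF:
CODE :

LC_ALL=C grep -lP "^\xFF\xD8\xFF" ./*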

The -P switch tells grep to use Perl syntax. It is followed by the pattern of hexadecimal characters (notably using the anchor ‘^’) and then by the search path, which is all files in the current working directory. Because grep is line-oriented, ‘^’ anchors to the start of any line rather than to the start of the file, so a match may also occur after any newline byte inside a file. Forensic software that categorizes files by signature values does something to this effect, also via pattern matching.
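Because of that line-oriented behavior, a signature check that looks only at the first bytes of each file is stricter than the grep above. The following Python sketch reads just the first three bytes of each file, without recursing, like ./*. The helper names and sample files are hypothetical. It accepts a file that begins with FF D8 FF but rejects one where those bytes merely follow a newline, which is a case the grep would report.

```python
import tempfile
from pathlib import Path
from typing import List

JPEG_MAGIC = b"\xFF\xD8\xFF"  # the signature from the grep example

def starts_with_jpeg_magic(path: Path) -> bool:
    """Check only the first bytes of the file, not the start of every line."""
    with open(path, "rb") as fh:
        return fh.read(len(JPEG_MAGIC)) == JPEG_MAGIC

def jpeg_files(directory: Path) -> List[str]:
    """Names of regular files directly inside directory that begin like a JPEG."""
    return sorted(p.name for p in directory.iterdir()
                  if p.is_file() and starts_with_jpeg_magic(p))

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "photo.bin").write_bytes(JPEG_MAGIC + b"\xE0 rest of image")
    # Signature after a newline: grep's ^ would match here, this check does not.
    (d / "notes.txt").write_bytes(b"plain text\n" + JPEG_MAGIC)
    print(jpeg_files(d))  # ['photo.bin']
```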

Exif metadata may provide interesting and possibly crucial data in a forensic investigation, and it serves as a good example of something easily located with regular expressions. Typical attributes present in Exif metadata include the camera make and model, date and time information, camera settings, and a picture thumbnail (often used for display on the camera screen).

Some new, high-end camera models actually incorporate a feature called geolocation, which tags photos with information about the locality of the picture.

A JPEG carrying Exif metadata is typically distinguishable from other JPEGs by the ASCII text “Exif” shortly after the file header.XXX A regular expression can therefore be constructed to determine which JPEG files may contain Exif metadata and which don’t (LC_ALL=C makes grep operate on raw bytes, and -l lists only the names of matching files):XXXI
CODE :

LC_ALL=C grep -lP "^.{6,30}Exif" ./*

At the time of writing, FTK and EnCase did not offer the capability to sort images on this basis.XXXII
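The same test can be expressed directly on bytes. This avoids grep's caveat that ‘.’ does not match a newline byte, which matters because the two length bytes before “Exif” could happen to be 0x0A. The sketch below mirrors ^.{6,30}Exif by requiring “Exif” to start between offsets 6 and 30. The helper names and sample headers are hypothetical. The headers are modeled on a JPEG whose APP1 segment carries Exif data and one whose APP0 segment carries JFIF.

```python
def may_have_exif(header: bytes) -> bool:
    """Mirror ^.{6,30}Exif: 'Exif' must begin between byte offsets 6 and 30."""
    # find() searches header[6:34], so a hit starts no later than offset 30.
    return header.find(b"Exif", 6, 34) != -1

def file_may_have_exif(path: str) -> bool:
    """Read only the first 34 bytes of the file and apply the same test."""
    with open(path, "rb") as fh:
        return may_have_exif(fh.read(34))

# Hypothetical headers: SOI, APP1 marker, length, "Exif" vs SOI, APP0, "JFIF".
exif_jpeg = b"\xFF\xD8\xFF\xE1\x00\x10Exif\x00\x00"
jfif_jpeg = b"\xFF\xD8\xFF\xE0\x00\x10JFIF\x00\x01"
print(may_have_exif(exif_jpeg), may_have_exif(jfif_jpeg))  # True False
```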

If a series of such files is found pertinent to a given crime or circumstance, details such as the camera make and model recorded in the Exif data may give investigators cause to search for and seize digital equipment not specified or justified in the initial search warrant.XXXIII

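The following expression matches a large number of email addresses.XXXIV Its character classes list only uppercase letters, so it is meant to be applied case-insensitively (for example, with grep -iP):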

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
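The case-insensitivity requirement is easy to overlook. In the Python sketch below, which uses hypothetical sample text, the pattern finds both addresses when compiled with re.IGNORECASE and finds nothing when run case-sensitively. Note also that {2,4} rejects longer top-level domains, so the pattern trades completeness for simplicity.

```python
import re

# The pattern from the text. Its classes list only uppercase letters,
# so it is compiled case-insensitively (the equivalent of grep -i).
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

sample = "Contact alice.smith@example.com or BOB@Example.ORG today."
print(EMAIL.findall(sample))              # ['alice.smith@example.com', 'BOB@Example.ORG']
print(re.findall(EMAIL.pattern, sample))  # [] without the flag
```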

The following pattern matches IPv4 addresses while limiting each octet to the range 0-255:XXXV

\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
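Because the octet alternatives sit in capturing groups, a tool that reports groups (such as Python's re.findall) returns the four octets rather than the whole address. The sketch below builds the same pattern from one octet expression and uses search() to get the full match. The helper name first_ip and the sample lines are hypothetical.

```python
import re

# The octet alternation from the text, repeated four times between literal dots.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
STRICT_IP = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")

def first_ip(text: str):
    """Return the first address the strict pattern accepts, or None."""
    m = STRICT_IP.search(text)  # search(), since findall() would return the groups
    return m.group(0) if m else None

for line in ("gateway 192.168.1.1 up", "bogus 400.600.800.900", "mask 255.255.255.0"):
    print(line, "->", first_ip(line))
```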

A looser alternative, which does not limit each octet to the decimal range 0-255, is the following:

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

This matches IP addresses but lacks the added benefit of weeding out impossible addresses such as 400.600.800.900. Also, the ‘\b’ word boundaries do not help when an apparent IP address is embedded in a longer dotted string, such as 123.456.78.9.123.456.78. Because a period also counts as a word boundary, the pattern will still match inside it. One solution is to require a character that is neither a digit nor a period on each side:
CODE :

grep -rP "[^\d\.]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[^\d\.]" ./*
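The bracketed pattern has a side effect worth knowing: it consumes the surrounding characters, so an address at the very start or end of a line is missed. The Python sketch below compares the three approaches. The lookaround variant is an alternative not taken from the original text. It checks the neighbouring characters without consuming them.

```python
import re

LOOSE = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
# The variant from the grep command: a non-digit, non-period character on each side.
BRACKETED = r"[^\d.]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[^\d.]"
# Hypothetical lookaround variant: same rule, but the neighbours are only
# checked, not consumed, so addresses at the start or end of a line still count.
LOOKAROUND = r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])"

embedded = "123.456.78.9.123.456.78"
line_start = "10.0.0.1 connected"
for name, pat in (("loose", LOOSE), ("bracketed", BRACKETED), ("lookaround", LOOKAROUND)):
    print(name, re.findall(pat, embedded), re.findall(pat, line_start))
```

Since grep -P is built on PCRE, which supports lookbehind, the lookaround pattern should also work there, e.g. grep -rP '(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])' ./*.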

The following grep search uses the /dev/ entry and treats the entire device (in my case, a partition on a USB disk) as if it were a single file. Because the raw device includes unallocated space as well as the contents of files, this can be used to comb through deleted files or file slack:
CODE :

sudo grep -abP "hiddendata\!" /dev/sdb3

The -a switch makes grep treat the binary device as text, and the -b switch prints a byte offset. The offset is very useful here, since a whole partition has to be sorted through.

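The following grep search was somewhat problematic, and at first appeared to be a bug in the -P switch, which is labeled “experimental” in the man page of grep.XXXVI The more likely cause is Bash history expansion. In an interactive shell, an unescaped ‘!’ inside double quotes is expanded, so ‘!beans’ is treated as a reference to an earlier command rather than passed to grep. It serves as an example of the caution needed when testing expressions.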

grep -P "baked(?!beans)" ./wordlist

To solve this issue, any of the following worked:XXXVII
CODE :

grep -P "baked" ./wordlist | grep -v "bakedbeans"
grep -P 'baked(?!beans)' ./wordlist
x='!beans'; grep -P "baked(?${x})" ./wordlist
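Outside the shell there is no history expansion, so the lookahead can be tested directly. The Python sketch below uses a hypothetical word list. It also shows that the lookahead and the two-stage grep -v pipeline are not quite equivalent: grep -v drops an entire line containing “bakedbeans” even when another “baked” on that line would qualify.

```python
import re

words = ["baked", "bakedbeans", "bakedpotato", "half-baked", "bakedbeans then baked"]

# baked(?!beans): "baked" not immediately followed by "beans".
lookahead = [w for w in words if re.search(r"baked(?!beans)", w)]
# The two-stage pipeline: keep lines with "baked", then drop any containing "bakedbeans".
two_stage = [w for w in words if "baked" in w and "bakedbeans" not in w]

print(lookahead)  # ['baked', 'bakedpotato', 'half-baked', 'bakedbeans then baked']
print(two_stage)  # ['baked', 'bakedpotato', 'half-baked']
```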

Keyword searches:

Regular expressions in any given case need to be flexibly adapted to fit the needs of the investigation at hand.XXXVIII An example keyword search might be as follows:XXXIX
CODE :

grep -Pr "(torrent)|(h33t)|(tpb)|(thepiratebay)|(demonoid)|(mininova)|(waffles)|(what\.cd)" ./

This might be an example of a search conducted on an individual suspected of software piracy. The search terms, separated by alternations (the pipe ‘|’ symbol), are common keywords pertaining to torrents. Torrent file sharing is not illegal in and of itself, but it is a commonly abused avenue for sharing illicit warez. The keywords can and should be adjusted depending on the circumstances of the case.

This sort of pattern deserves the most attention by far, as it is both powerful and flexible. The basic idea is to separate the desired patterns with alternations so that a match on any of them will be reported. There is no practical limit to the number of terms that may be searched for. To speed up the development of searches that use a large number of keywords, here is the source of a small PHP script designed to be run from the command line:
CODE :

$argv[3]”;
else
echo “\n”.$g_query;
?>

Where the basic syntax is “[scriptname.php] [inputfile] [searchlocation] [outputfile]”.

Consider this example:
CODE :

php myscript.php input / outputfile

This would run the script ‘myscript.php’, using ‘input’ as the input file, searching through the directory ‘/’, and using ‘outputfile’ as the output file for redirection. The script prints the following command:
CODE :

grep -Pr "torrent|h33t|tpb|thepiratebay|demonoid|mininova|waffles|what\.cd" / > outputfile

For the input file, simply make a comma-separated list of the keywords. This sort of script is simple and not perfect, but it reduces the workload for large or frequently used keyword searches. It should be mentioned that grep without the -P switch can read newline-separated patterns from a file specified with the -f switch. At the time of writing, however, this did not work together with the Perl syntax (-P switch), hence the PHP script.XL This was tested on sets of input keywords as large as 1411 different alternations. Regarding speed, a search with some ten alternations took 0.0086250 seconds per alternation, and a search with 1411 alternations took 0.004123317 seconds per alternation.XLI While speed is not a primary concern of this paper, these preliminary benchmarks suggest that grep handles large numbers of alternations efficiently.XLII The script could also append to a log file of grep expressions.

The script is not ideal, and much could be added and changed.XLIII More special characters could be escaped in the same way that the ‘.’ character already is.
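Because the PHP source above is incomplete, here is a separate Python sketch of the same idea; it is not the original script. It reads comma-separated keywords, escapes regex metacharacters with re.escape (not just ‘.’), joins them with ‘|’, and prints the grep command. The function names are hypothetical. One deliberate difference is that shlex.quote wraps the pattern in single quotes, which keeps Bash from applying history expansion to any ‘!’ in a keyword. re.escape is designed for Python, but it produces backslash escapes of punctuation that PCRE should also accept.

```python
import re
import shlex

def build_pattern(csv_text: str) -> str:
    """Join comma-separated keywords into one alternation, escaping metacharacters."""
    terms = [t.strip() for t in csv_text.split(",") if t.strip()]
    return "|".join(re.escape(t) for t in terms)

def build_grep_command(csv_text: str, search_path: str, output_file: str) -> str:
    """Return a grep -Pr command line; shlex.quote single-quotes the pattern."""
    pattern = build_pattern(csv_text)
    return (f"grep -Pr {shlex.quote(pattern)} "
            f"{shlex.quote(search_path)} > {shlex.quote(output_file)}")

keywords = "torrent,h33t,tpb,thepiratebay,demonoid,mininova,waffles,what.cd\n"
print(build_grep_command(keywords, "/", "outputfile"))
# grep -Pr 'torrent|h33t|tpb|thepiratebay|demonoid|mininova|waffles|what\.cd' / > outputfile
```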

Grep and Packet Sniffing:

Grep with redirection (recall the discussion of standard streams) is useful in several ways. The first, already mentioned, is that ‘>’ can be appended to a grep command to write the output to a file. The ‘>>’ operator appends the output of a later search to the end of an earlier one.

Another viable use of grep is to combine packet sniffing with a grep of the captured data. The command tcpdump is often installed by default on Linux systems, so no additional software is typically required. It lets a user with elevated privileges (typically) put an interface into promiscuous mode, capturing all of the traffic it sees rather than only the traffic destined for the host.

Detailed information on tcpdump can be found in its man page. Here is an example that will sniff full packets, payload included, and write the data to a file (called “data”):
CODE :

sudo tcpdump -vvv -s0 -w data

The ‘-vvv’ switch controls the level of verbosity, ‘-s0’ sets the snapshot length so that whole packets are captured, and ‘-w’ writes the raw packets to the named file. On a typical DSL line running at 1.5 Mbps, even a very short session of sniffing (a few seconds) can often capture many thousands of packets. After dumping an adequate amount of traffic, Ctrl-C will stop the sniffing and return you to the prompt. Grep can then be used to search the captured data, as follows:
CODE :

grep -aP "how.to.kill" ./data

The -a switch tells grep to treat the file as text and print the matching lines. Data captured this way is frequently detected as binary, in which case grep will not print matching lines by default. The pattern might be used where a person is suspected of plotting a crime (likely murder or an analogous crime in this case). This is quite rudimentary and should be used only as an example; a real case should account for permutations and synonyms of the search keywords.
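Binary captures can also be searched without treating them as text, since Python's bytes patterns match raw bytes directly. The sketch below reports the byte offset of each match, similar to combining grep's -b and -o switches. The capture fragment is a hypothetical stand-in for a real tcpdump file, which would be read with open(path, "rb").read().

```python
import re
from typing import List, Tuple

def find_in_capture(data: bytes, pattern: bytes) -> List[Tuple[int, bytes]]:
    """Return (byte offset, matched bytes) for every match, like grep -abo."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, data)]

# Hypothetical fragment: five binary bytes followed by an HTTP request line.
capture = b"\x00\x1b\x02\x9f\xffGET /search?q=how+to+kill+time HTTP/1.1\r\n"
print(find_in_capture(capture, rb"how.to.kill"))  # [(19, b'how+to+kill')]
```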

The method has significant limitations. The primary one is that tcpdump merely dumps data; it has no built-in functionality to decode the payload. Examining tcpdump’s output will reveal data passed from source to destination and vice versa, regardless of whether that data is in human-readable form.

The most desired traffic will probably be web traffic, which is often left essentially out in the open for easy sniffing, sometimes even with somewhat sensitive information being passed.XLIV The HTTP response headers reveal whether traffic from a given server can easily be read with tcpdump. Take the following two examples:

Encoded data:

HTTP/1.1 200 OK..Cache-Control: private, max-age=0..Date: Fri, 19 Feb 2010 05:12:16 GMT..Expires: -1..Content-Type: text/html; charset=UTF-8..Set-Cookie: SS=Q0=bmlnZw; path=/search..Server: gws..Transfer-Encoding: chunked..Content-Encoding: gzip

This was generated by a client performing a Google search. Google gzips its responses (note the “Content-Encoding: gzip” header), so grepping for plaintext keywords in the payload of these HTTP packets will be fruitless. Presumably this is done to save bandwidth. Contrast this with the following header:

Non-encoded:

HTTP/1.1 200 OK..Date: Fri, 19 Feb 2010 05:11:39 GMT..Server: Apache/2.2.10 (Fedora)..Last-Modified: Thu, 18 Feb 2010 14:12:47 GMT..ETag: “4c07c-213-47fe08fb135c0”..Accept-Ranges: bytes..Content-Length: 531..Connection: close..Content-Type: text/html;

This is an example of traffic from a site that does not use gzip encoding; tcpdump would suffice for such a site.

It is important to note that gzip encoding is not encryption. The payload is merely compressed, and tcpdump simply lacks the capability of dumping traffic in any form other than that in which it was passed along the wire.
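The difference between compressed and plain payloads can be sketched in Python with the standard gzip module. The helper below, whose name is hypothetical, decompresses a body only when its Content-Encoding is gzip and then searches it. It assumes the body has already been extracted and reassembled. In particular, it ignores chunked transfer encoding, which also appears in the Google header above and would have to be undone first.

```python
import gzip
import re

def body_matches(body: bytes, content_encoding: str, pattern: bytes) -> bool:
    """Search an HTTP body, gunzipping it first when Content-Encoding is gzip."""
    if content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    return re.search(pattern, body) is not None

page = b"<html>how to bake bread</html>"
on_the_wire = gzip.compress(page)  # roughly what tcpdump records for a gzip body
print(body_matches(on_the_wire, "gzip", rb"bake.bread"))  # True
print(body_matches(page, "identity", rb"bake.bread"))     # True
```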
If decoding traffic is necessary, tshark, the command-line counterpart to Wireshark, is a viable alternative. The following form of the command dumps fully decoded packets to the file “data2”:XLV
CODE :

sudo tshark -V -s0 > data2

Searches would then be performed against the ‘data2’ file. For users familiar with grep, this can be significantly more effective than using Wireshark to accomplish the same thing.

To reiterate the point concerning gzip encoding, tcpdump suffices when circumstances do not require decoding the full contents of packets. When full packets are required, for example to reconstruct what a suspect was presented with on a given page, tshark is a much better choice.XLVI Tshark is also preferable to tcpdump for grepping network traffic for the reasons given above. Neither tool can read encrypted traffic, but tshark is able to decompress encoded traffic, allowing the use of grep.

In instances where web traffic is desired, the desired output will often be located in a section labeled “Line-based text data: text/html,” so grep is not strictly necessary. A quick grep search with the -b switch may still help locate which section of the file deserves examination. Another way to cull data is to specify a capture filter, such as -f "port 80".XLVII

The specific instances in which network forensics may come into play are left as an open question. Since warrants are often served for crimes committed long before, it is likely that an investigator would not need to sniff data off the wire at all. It is a useful tool regardless, if not for the average investigator, then for systems and network administrators.XLVIII

The Find command:

The find program can be used to search for specific types of files. The following searches for SQLite files (as identified with their common extension):
CODE :

find /home/ -name "*.sqlite"

SQLite files can often contain forensically pertinent information. Notably, Firefox stores a treasure trove of information in SQLite databases, including downloads, form history, bookmarks and browsing history. By default this is stored under the .mozilla folder in the user’s home directory. The leading dot signifies that the folder is hidden: it will not show up in ls output unless the -a switch is applied when looking at a home directory through a live shell.XLIX

The following is a more complex example of find’s capabilities.
CODE :

find /home/ -type f -mtime -1 -name "*.exe"

In turn, the switches restrict the output to regular files (-type f) modified (the ‘m’ in ‘mtime’) within the last day (-1) whose names end in “.exe”. On a side note, files with .exe extensions are a rarity on Linux filesystems and may even be a cause for suspicion in some instances. They are, however, becoming more common with the popularity of Wine.L
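For comparison, a rough Python equivalent of this find command is sketched below. It interprets -mtime -1 as "modified within the last 24 hours" and, like find, does not follow symbolic links when checking -type f. The function name and test files are hypothetical.

```python
import fnmatch
import os
import tempfile
import time
from typing import List

def find_recent(root: str, glob: str, max_age_days: float = 1.0) -> List[str]:
    """Approximate: find ROOT -type f -mtime -1 -name GLOB."""
    cutoff = time.time() - max_age_days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if (fnmatch.fnmatchcase(name, glob)          # -name is case-sensitive
                    and not os.path.islink(path)          # -type f skips symlinks
                    and os.path.isfile(path)
                    and os.path.getmtime(path) > cutoff):
                hits.append(path)
    return sorted(hits)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("setup.exe", "old.exe", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    three_days_ago = time.time() - 3 * 86400
    os.utime(os.path.join(tmp, "old.exe"), (three_days_ago, three_days_ago))
    print([os.path.basename(p) for p in find_recent(tmp, "*.exe")])  # ['setup.exe']
```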

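Here is another advanced form of find. Because the PNG signature itself contains newline bytes (0x0A), the grep part uses -z, which makes grep split its input on NUL bytes instead of newlines. LC_ALL=C makes the hex escapes match raw bytes; it is set before find so that grep inherits it:
CODE :

LC_ALL=C find . -name "*.png" -exec grep -lPaz "^\x89\x50\x4E\x47\x0D\x0A\x1A\x0A" {} \;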

This time find is working on finding files in the current working directory with the apparent extension of “.png” and grep is testing to see if the files have .png file signatures.LI LII
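The same check can also be turned around to flag disguised files, meaning names that end in .png but do not begin with the PNG signature. The Python sketch below uses a hypothetical helper and sample files. It reads the first eight bytes directly, so the newline bytes in the signature need no special handling.

```python
import os
import tempfile
from typing import List

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 89 50 4E 47 0D 0A 1A 0A

def png_mismatches(root: str) -> List[str]:
    """Files named *.png (recursively) whose first 8 bytes are not the PNG signature."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".png"):  # case-sensitive, like find -name
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if fh.read(len(PNG_MAGIC)) != PNG_MAGIC:
                    bad.append(path)
    return sorted(bad)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "real.png"), "wb") as fh:
        fh.write(PNG_MAGIC + b"\x00\x00\x00\x0dIHDR")
    with open(os.path.join(tmp, "renamed.png"), "wb") as fh:
        fh.write(b"\xFF\xD8\xFF\xE0")  # a JPEG header under a .png name
    print([os.path.basename(p) for p in png_mismatches(tmp)])  # ['renamed.png']
```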

One of the main advantages of find is the ease of searching through metadata such as file names.LIII The following command finds files with an apparent extension of .jpg in the /home directory:
CODE :

find /home/ -name "*.jpg"

This search is recursive by default. Notice that -name takes a shell glob pattern, not a regular expression: ‘*’ matches any sequence of characters, and ‘.’ is taken literally rather than as a regex token for any character.

Find can select files by owner or by group:
CODE :

find ./ -user root
find ./ -group root

Print results with a stipulation of time (in this case, ‘-mmin -30’ matches anything modified less than thirty minutes ago):
CODE :

find /var -mmin -30

Find files whose permissions are exactly 007 (this does not match 657, for instance)LIV:
CODE :

find ./ -xdev -type f -perm 007

This finds files that are readable, writable and executable by others (“world”), regardless of the other bits:LV
CODE :

find ./ -xdev -type f -perm -007

Finally, find works well with xargs:
CODE :

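find /home/toor -name "*.txt" -print0 | xargs -0 grep -i "john doe" 2> /dev/null

The ‘2>’ directs stream 2 (stderr) to /dev/null.LVI The -print0 and -0 options separate file names with NUL characters, so names containing spaces reach grep intact.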

Database and Directory Service Text searches:

The two examples that follow search through a directory service (OpenLDAP) and a MySQL database. These are two specific examples out of an almost infinite number of circumstances that dictate investigating certain things. For example, a case involving suspected child pornography would have a definite emphasis on multimedia-based searches. Investigating a cracker would involve keywords surrounding that subculture, and an investigation into piracy would involve searches tailored for it. These examples serve as a guide for treating cases with unique circumstances: the patterns come second to knowing the ins and outs of how these services function.

Forensics of this sort is broadly classified as “database forensics,” and the term deserves significant study to appreciate fully what it entails. Books have been written on this topic, rightfully so. This paper covers merely the tip of the iceberg of what may be said about database forensics; those wishing for more may consult the cited references.

Grep and find together can uncover a significant amount about a database or directory service. This becomes increasingly helpful with an increase in the amount of data. MySQL will be discussed first.

MySQL has several storage engines, and the engine in use determines whether the following searches will work at all; different formats require different searches. The only format considered here is the MyISAM storage engine with its .MYD and .MYI files, chosen purely as a suitable example of finding data related to a MySQL database. Other engines and file types may include the following, depending on the circumstances: .MRG (MERGE), .ibd (indexes and data for InnoDB), .CSM and .CSV (comma-separated), and .ARZ and .ARM (ARCHIVE).LVII

A simple command such as the following will suffice to track down most locations of pertinence to finding database files:
CODE :

sudo find / -name "*.MYD"

It may be necessary to run this as a privileged user. The locations that hold MySQL database files are frequently owned by the “mysql” user and “mysql” group, and as such are not accessible to non-privileged users. If this concept seems foreign, look up information on file versus directory permissions.LVIII Alternatively, one could alter the permissions of files and directories recursively:
CODE :

chmod -R o+r,g+r,a+r ./dir

This would not be ideal, as it alters the permissions of the evidence itself and would skew any later find searches based on file or directory permissions. It is better to run all searches on restricted directories as a super-user.

There are three main types of files related to a single table in a database: .frm, .MYD, and .MYI. Typically these files are prefixed with the name of the table, such as table1.frm. The main value (to the human eye) of .frm files is that they list the column names of a table. .MYI files are indexes and do not allow for ease of grepping data therein (mostly non-ASCII characters). .MYD files are the main table files; they contain all the data held in a given table.

MySQL tables are frequently built using batch-mode scripts. These are files of SQL statements, written by an administrator, that the mysql program reads in as if they were typed at its prompt. There is no definitive trait of their filename or extension, though .sql would be worth trying. One possible avenue for tracking these scripts down is to search for data likely to be present in such a file:
CODE :

grep -Pri "CREATE (TABLE|DATABASE)" /home/

If table backups are desired, something to the effect of this would suffice:
CODE :

grep -Pr "\-\- MySQL dump" /home/

This captures the typical output of the utility mysqldump, a common tool to dump a batch script for the backup of a database.
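Both searches can be sketched in Python to show the patterns on their own. Outside the shell, the leading dashes need no escaping. In the grep command, the backslashes keep grep from reading "-- MySQL dump" as an option. The function name and sample texts are hypothetical.

```python
import re
from typing import Set

CREATE_RE = re.compile(r"CREATE (TABLE|DATABASE)", re.IGNORECASE)  # like grep -i
DUMP_RE = re.compile(r"-- MySQL dump")  # header line written by mysqldump

def classify_sql_text(text: str) -> Set[str]:
    """Tag text that looks like a MySQL batch script and/or a mysqldump backup."""
    tags = set()
    if CREATE_RE.search(text):
        tags.add("create-statement")
    if DUMP_RE.search(text):
        tags.add("mysqldump")
    return tags

script = "create database shop;\nCREATE TABLE users (id INT);\n"
dump = "-- MySQL dump (hypothetical header)\nCREATE TABLE orders (id INT);\n"
print(sorted(classify_sql_text(script)), sorted(classify_sql_text(dump)))
```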

These aforementioned searches basically allow for a determination of whether or not a database exists, and if this is so, recovering perhaps some of the data. More complex applications might be recovering log files of transactions to recover and/or reverse altered/deleted fields.LIX This is beyond the scope of this paper as simple search tools cannot provide the sort of functionality by which to do this.

Directory Services:

Directory services are not the usual suspects for a forensic investigation, but given their sparse mention in the literature of the craft, it is useful to discuss such here to serve as an example of pattern matching for an unusual target.LX The directory service employed herein is openLDAP with slapd; other services differ but overall the commonalities should outweigh these.LXI

Assuming nothing is known about a directory service beyond the fact that it exists (perhaps not even that) on seized server media, the most fruitful first search would likely be to use find to locate any files with an extension of .ldif. LDAP Data Interchange Format (LDIF) files are commonly used to load new entries into a directory with a tool such as ldapadd. With OpenLDAP, the configuration files are typically held (in Debian) under /etc/ldap/. The databases themselves are stored in a binary format elsewhere, under /var/lib/ldap. Files and logs stored in these locations are about as readable as a binary executable: binary data interlaced with strings of ASCII text.

Grep makes short work of locating specific entries once these files are discovered. Barring prohibitive file or directory permissions, locating a known entry is no more difficult than using a keyword as the pattern, as in the following:LXII

grep "ou=xyz,dc=site,dc=com" ./input.ldif

A grep for the distinguished name typically works, as distinguished names are often written in plain text in files associated with LDAP services:
CODE :

grep -r "dc=site,dc=com" /var/lib/

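The following commands copy matching files to a particular destination. The first copies every file under /var/lib/ldap/ that contains a particular pattern. The second copies every .ldif file found under /etc/.

In both commands, -Z (grep) and -print0 (find) pass file names to xargs -0 separated by null bytes, so names containing spaces survive the pipe. The -r option to xargs skips cp entirely when nothing matched. The destination directories must already exist.LXIII
CODE :

sudo grep -lrZ "dc=home,dc=com" /var/lib/ldap/ | xargs -0 -r sudo cp -t /home/user/Desktop/

sudo find /etc/ -name "*.ldif" -print0 | xargs -0 -r sudo cp -t /home/user/Desktop/ldiffiles/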

OpenLDAP and LDAP (and MySQL, for that matter) are not commonly employed by the average user, and little documentation is available on directory service forensics. Until directory services are more commonly encountered in forensic investigations, directory service forensics is mostly a novelty. These principles are, however, applicable to other, more viable, forensic applications.

Other Instances of Grep:

Though the word grep primarily denotes the program, it has also come to connote the general action of finding information; grep is as much a verb as it is a noun. Many related programs have adopted names that combine this word with a prefix and that adhere to the original program's spirit.

A few noteworthy programs are as follows:LXIV

ngrep: network grep, searches network trafficLXV
sgrep: searches for structured patterns using region expressions
pcregrep: grep that uses PCRELXVI
ext3grep: grep-like program designed to assist recovering data from EXT3 filesystems
agrep: this program stands for “approximate-grep,” and allows for a number of errors in the search pattern (fuzzy spelling)
beagle: provides indexing featuresLXVII
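The idea behind agrep's error-tolerant search can be illustrated with the textbook dynamic-programming method for approximate substring matching, often attributed to Sellers. It reports every position where the pattern matches with at most k insertions, deletions, or substitutions.

This is a sketch of the concept, not agrep's own implementation. The function name approx_find is hypothetical.

```python
def approx_find(pattern, text, max_errors):
    """Return 1-based end positions in text where pattern matches with <= max_errors edits.

    prev[i] / cur[i] hold the smallest edit distance between pattern[:i] and some
    substring of text ending at the previous / current text position.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # against an empty text prefix, pattern[:i] costs i edits
    ends = []
    for j, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] stays 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # character missing from the text
        if cur[m] <= max_errors:
            ends.append(j)
        prev = cur
    return ends
```

For example, approx_find("murder", "the murdr weapon", 1) reports the position just after "murdr", while an exact search (max_errors=0) finds nothing.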

Foremost: Carving and Sorting

Until this point, grep has been used to sort through files allocated on a disk; deleted or otherwise unallocated files have been neglected. Foremost is a file carving tool. It automatically exports files of numerous sorts from a dd disk image into another folder for easy viewing, identifying them by their file signatures and separating them by type. Foremost's invocation in its simplest form is seen in the following:
CODE :

foremost -i image.dd -o image.dd.folder

After processing has completed, changing directory into the image.dd.folder will show folders separating files by file type. If the standard signatures are insufficient for a particular sort of file, additional ones may be added in the /etc/foremost.conf file. Help is displayed in typical command-line fashion, with the -h switch.LXVIII
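Header/footer carving, the basic technique behind tools such as foremost, can be sketched in a few lines of Python. This example is an illustration, not foremost's actual code. It scans raw bytes for the JPEG start marker (FF D8 FF), cuts at the next end-of-image marker (FF D9), and discards candidates over a size cap.

The function name and the 10 MB default cap are assumptions. A JPEG containing an embedded thumbnail would be cut short at the thumbnail's own end marker.

```python
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(image_bytes, max_size=10 * 1024 * 1024):
    """Return byte strings running from a JPEG header to the next JPEG footer."""
    carved = []
    start = image_bytes.find(JPEG_HEADER)
    while start != -1:
        end = image_bytes.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # no footer left: nothing more can be carved
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            carved.append(image_bytes[start:end])
            next_search = end
        else:
            next_search = start + 1  # too large: probably a false header, keep scanning
        start = image_bytes.find(JPEG_HEADER, next_search)
    return carved
```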

Simple Forensics Scripts:

Scripting common searches into an executable file provides an easy method for quickly processing media in a controllable fashion. In Bash on Linux (and, of course, other shells), typed commands can be strung together and run like a program. Each line is essentially equivalent to a line typed at the terminal.

The following code is an example that accomplishes some basic forensics tasks.LXIX
CODE :

#!/bin/bash

echo "Example forensic script. Lists .png and .jpg files whose file signatures match their extensions, runs a keyword search, and lists recently modified files. Location to be searched passed as command argument 1. Press Enter to continue."

read -r pause

echo "working"

# Use the C locale so grep -P treats \x89, \xFF etc. as raw bytes, not Unicode code points.
export LC_ALL=C

# find files with a .png extension and see if they contain a png file signature.
# (note: ^ anchors at the start of any line, not only at the first byte of the file)
find "$1" -iname "*.png" -exec grep -Pl '^\x89\x50\x4e\x47' {} + > ./picslist
# do the same to apparent jpg files. Append matches to the file picslist
find "$1" \( -iname "*.jpg" -o -iname "*.jpeg" \) -exec grep -Pl '^\xFF\xD8\xFF' {} + >> ./picslist

# grep for patterns in the location specified by $1 (command argument 1), output results to a file.
grep -Pr '1337.haX0Rz|where.to.dump.a.body|murder' "$1" > keyword_results

# find files modified within 10 days and write results to a file.
find "$1" -mtime -10 > modified_file_list

echo "complete"
# process results for display in browser via php script "sort.php"
php sort.php > report.html
firefox report.html
# optionally, remove temporary files
#rm picslist modified_file_list keyword_results
#rm report.html

The accompanying sort.php file:
CODE :

<?php

// picslist has a list of all picture paths; modified_file_list has recently modified files.
$handle = fopen("./picslist", 'r');
$handle2 = fopen("./modified_file_list", 'r');

echo "<html><body>\n";
// process each path and print a link to the picture.
// Assigning inside the loop condition reads each line once; calling fgets()
// in both the condition and the body would skip every other line.
while (($data = fgets($handle)) !== false) {
    $path = htmlspecialchars(trim($data));
    echo "<p>Path: <a href=\"$path\">$path</a></p>\n";
}
fclose($handle);

echo "<h2>Modified file listing:</h2>\n";
// this code does as above but with links to each of the modified files
while (($data2 = fgets($handle2)) !== false) {
    $path = htmlspecialchars(trim($data2));
    echo "<p>Path: <a href=\"$path\">$path</a></p>\n";
}
fclose($handle2);

echo "</body></html>\n";

?>
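The signature checks in the Bash script above rely on grep -P with a ^ anchor. That anchor matches at the start of any line rather than only at the first byte of the file, so a file with the signature right after a newline byte would also be listed. A stricter check reads only the leading bytes.

The Python sketch below does this. The SIGNATURES table and the function name are illustrative assumptions covering only the two formats from the script. It returns False for a file whose extension does not match its contents (for instance, a renamed file) and None for extensions it does not know.

```python
import os

# Leading bytes (file signatures) for the formats checked by the script above.
SIGNATURES = {
    ".png": b"\x89PNG",        # 89 50 4E 47
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
}


def signature_matches(path):
    """True/False if the file's first bytes match its extension; None if unknown."""
    ext = os.path.splitext(path)[1].lower()
    expected = SIGNATURES.get(ext)
    if expected is None:
        return None
    with open(path, "rb") as fh:
        return fh.read(len(expected)) == expected  # only offset 0 is examined
```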

The power of scripting comes from automating anything that would typically be done by hand otherwise. Another easily automated task might be commands to make a forensic image:
CODE :

#!/bin/bash

# "-" tells split to read standard input; -d gives numeric suffixes (image.00, image.01, ...)
dd if="$1" | split -d -b 700m - image.

cat image.* > "$2"

This takes a specified device, images it in 700 MB chunks (unnecessary, but helpful for burning to discs), and then concatenates the chunks into a single full image.LXX
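split -d names the chunks with fixed-width numeric suffixes (image.00, image.01, ...). The shell glob image.* therefore expands them in the right order, so hashing the concatenated chunks should give the same value as hashing the full image.

The Python sketch below mimics that check on an in-memory stream. The function names are hypothetical, and MD5 is used only because the paper's scripts use md5sum.

```python
import hashlib


def split_stream(stream, chunk_size):
    """Yield successive chunks of at most chunk_size bytes, like split -b."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def md5_of_chunks(chunks):
    """MD5 of the chunks taken in order, as `cat image.* | md5sum` would compute it."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()
```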

The following script continues the former. It makes an image, hashes the result for verification, mounts the resultant image read-only to a folder, runs a grep search on it, and then carves it with foremost:

CODE :
#!/bin/bash

# "-" tells split to read standard input
dd if="$1" | split -d -b 700m - image.

cat image.* > "$2"

#dd if="$1" of="$2"
cat image.* | md5sum > "$2.split.md5"
md5sum "$2" > "$2.md5"

mkdir -p mounted

sudo mount -o ro,loop "$2" ./mounted

grep -Prl "warez|piracy|torrents?" ./mounted/ > "$2.grep.result"

foremost -i "$2" -o "$2.output"

nautilus "$2.output"

The $1 and $2 signify command arguments. First, make the script file executable with chmod (e.g. chmod +x script.sh). Typing something to the effect of ./script.sh argument1 argument2 then runs the file “script.sh” in the current directory, with “argument1” and “argument2” passed to $1 and $2 in the script, respectively.

This creates a raw dd image comparable to FTK Imager's raw image; the two resultant images can be verified to be the same by comparing their hashes. The image is mounted and then grep is used to search through the mounted image. Foremost runs after the grep search, and its output folder is opened via nautilus for easy viewing. Expansions and revisions upon this can and should be added per case requirements. Should circumstances necessitate compression, this can be accomplished with the likes of gzip, bzip2, tar, or similar utilities. Using gzip is as simple as “gzip [image name],” after which the compressed image will be named [image name].gz.LXXI
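Compression should not alter the evidence, and that can be confirmed by hashing the decompressed data. A minimal Python sketch using the standard gzip and hashlib modules follows. Unlike the gzip command, which replaces the original file, it keeps the uncompressed image and writes [image name].gz alongside it. The function names are hypothetical.

```python
import gzip
import hashlib
import shutil

BLOCK = 1 << 20  # read in 1 MiB pieces so large images are never loaded whole


def md5_stream(fh):
    """MD5 hex digest of everything readable from an open binary file object."""
    h = hashlib.md5()
    for block in iter(lambda: fh.read(BLOCK), b""):
        h.update(block)
    return h.hexdigest()


def gzip_and_verify(path):
    """Write path + '.gz', keep the original, and check the round trip by hash."""
    with open(path, "rb") as fh:
        before = md5_stream(fh)
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    with gzip.open(gz_path, "rb") as fh:
        after = md5_stream(fh)  # hash of the decompressed data
    return gz_path, before == after
```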

Linux as a forensics platform:

Hopefully by this point it has been shown that many aspects of forensic investigation can be done with a no-cost operating system, including imaging, file carving and exportation, keyword searching, and sorting by file type.LXXII Many versions of Linux can be had for no monetary cost. The freedom to tweak and adjust aspects as needed is of significant benefit, especially in forensic investigations involving unique circumstances.LXXIII Proprietary firms that make and distribute forensic software are swayed principally by monetary concerns. They can conceivably leave investigators out to pasture if the investigators' needs do not match the firms' goals. The power and control afforded by open-source tools allows for modifications and advancements beyond the concerns of closed-source software.LXXIV Brian Carrier has also argued that open-source tools more effectively meet the criteria for the admissibility of forensic evidence under the “Daubert test.”LXXV

Linux, however, may have a higher barrier to entry than Windows. It must therefore be determined whether the costs of a Windows system (and the accompanying Windows forensic tools) are balanced or outweighed by the benefits of using Linux. This barrier is solely a per-user one, arising from the preponderance of investigators who primarily deal with Windows.LXXVI

One of the criticisms of Linux involves mounting drives as read-only. At face value, this can be accomplished easily with something such as the following:
CODE :

mount -oro /dev/sdb3 /media/imaged/

However, a process called journal recovery on certain file systems, such as Ext3/4, may change the evidence even when the file system is mounted read-only. Suggested remedies include the ‘noload’ option (for Ext3/4, as in mount -o ro,noload) and mounting an image via ‘loop’. Given the ease with which one may neglect to include these, and the ever-present concern that some unforeseen circumstance might cause the kernel to write to the drive, it is prudent to use a hardware write-blocker.LXXVII LXXVIII
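Whether a file system actually ended up mounted read-only can be checked in /proc/mounts. Each line there lists the device, mount point, file system type, and comma-separated options. The Python sketch below parses text in that format; on a live system the text would come from reading /proc/mounts.

Some caveats apply:

- the function names are hypothetical;
- mount points containing spaces, which appear escaped as \040, are not handled;
- a read-only mount alone still does not rule out the journal replay described above.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text, or None."""
    target = mount_point.rstrip("/") or "/"  # /proc/mounts omits trailing slashes
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == target:
            options = fields[3].split(",")  # keep the last entry: later mounts shadow earlier
    return options


def is_read_only(mounts_text, mount_point):
    """True only if mount_point is listed and its options include 'ro'."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts
```

Typical use would be reading the file first, e.g. opening /proc/mounts, passing its contents and "/media/imaged/" to is_read_only, and refusing to proceed if the result is False.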

Another issue involves auto-mounted devices, such as USB drives. When these are plugged into most Linux systems, they are typically mounted without asking the user. Doing this with evidentiary media is poor forensic practice in most circumstances. As mentioned, the best option by far is to use a hardware write-blocker, but disabling the processes that automount should work as well.LXXIX LXXX

An issue specific to grep is its lack of support for UTF-16 and UTF-32 text. This obstacle will grow in proportion to how frequently such text is encountered in investigations.LXXXI
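Until grep handles these encodings, one workaround is to encode the keyword the same way the evidence does and search the raw bytes. This Python sketch searches for both the ASCII/UTF-8 form of a keyword and its little-endian UTF-16 form, the byte order Windows typically uses. The function names are hypothetical, and ignore_case applies only to ASCII letters.

```python
import re


def keyword_pattern(keyword, ignore_case=False):
    """Compile a bytes regex matching keyword as UTF-8 (ASCII) or UTF-16LE bytes."""
    variants = [keyword.encode("utf-8"), keyword.encode("utf-16-le")]
    flags = re.IGNORECASE if ignore_case else 0  # ASCII-only folding for bytes patterns
    return re.compile(b"|".join(re.escape(v) for v in variants), flags)


def find_keyword(data, keyword, ignore_case=False):
    """Return (offset, matched_bytes) for each occurrence of keyword in raw data."""
    pattern = keyword_pattern(keyword, ignore_case)
    return [(m.start(), m.group()) for m in pattern.finditer(data)]
```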

There are some other criticisms of using Linux for investigations; for example, Linux can't see the last sector on a device with an odd number of sectors.LXXXII But probably the most salient criticism of using Linux as the primary forensic platform is the higher barrier to entry: one must learn a good deal of commands and how to navigate a console instead of GUI-based tools. This is no longer fully convincing with tools such as Autopsy coming onto the market; though Autopsy lacks the flair of EnCase and FTK, it does many of the same things.LXXXIII Any investigator desiring a transition from Windows to Linux forensics tools needs to explore the drawbacks fully, bearing in mind that any different operating system will present different problems.

Despite these criticisms, the benefits of Linux abound. The first is a greater familiarity with a different tool set. Linux is especially prevalent on high-end systems: four-fifths of the world's supercomputers run LinuxLXXXIV, and live forensic analysis on one of these would be the worst possible time to acquaint oneself with the basics of grep. At least a basic knowledge of Linux lends a greater degree of competency with other non-Windows operating systems, including Mac OS X, Solaris, and others.LXXXV LXXXVI LXXXVII

It would be remiss to neglect the costs of Linux versus proprietary alternatives. Forensic investigations may be conducted with relative ease with a license for either EnCase or FTK. There is also demand from forensic investigators, many of whom would otherwise be estranged from the field in the absence of such a productLXXXVIII. Due primarily to these two factors, a premium has been (perhaps rightfully) charged for the use of these products in the form of hefty licensing fees. This paper serves as only a sliver of the material needed to match the intricacies of competing products. Still, if a community effort were to materialize around forensically oriented concerns, it is quite conceivable that EnCase and FTK would have a competitor selling software at an extremely attractive price.LXXXIX Need has brought forth such software as GIMP (a free alternative to Photoshop), OpenOffice (an alternative to Microsoft Office), and thousands of others; analogous forensic software is less a fantasy than a probable future.XC

Lastly, it has to be asked whether the field of forensics benefits from a pair of relatively monopolistic businesses. Two competitors are enough to ensure healthy competition, with each improving its product against the other. Even so, any enthusiastic programmer wishing to contribute to the effort is denied the opportunity by the very nature of proprietary code. The present arrangement primarily benefits the producers, not the users, of forensic software.

Ultimately, the decision over which of these two competitors is better is left to the reader's discretion. Hopefully, GNU/Linux will in the future become more of a competitor to Windows as a platform for computer forensic investigation. Regardless of whether Linux gains significant market share in forensic software, additional options will increase the pressure to optimize software with additional features that benefit end-users.

Grep command glossary:XCI XCII

grep : program that prints lines matching a pattern. Equivalent to grep -G, for basic regular expressions (i.e. BRE)
egrep : ‘extended grep’, equivalent to grep -E
grep -P : grep using Perl syntax. Most uses of grep in this paper use grep -P.
-r : recursively search through folders.
-i : case insensitivity
-f : obtain patterns from a specified file (one per line)
-v : select non-matching lines (rarely used in this paper)
-c : print a count of matching lines (not of individual occurrences) for each file
-a : treat all files as text. Use this to find data that may be hidden in binary files
-l : print only the name of each file containing a match; stops reading a file after its first match.
-m [#]: stop reading a file after a certain number of matches
-n : Prints out the line number that matches the pattern
-A [#]: print # lines after a match
-B [#]: print # lines before a match
--exclude-dir=[DIRPATH]: exclude a directory. Useful for avoiding recursive loops.
-w : print all lines containing pattern as a word (the pattern ‘eye’ would match ‘eye’ but not ‘eyelid’)

Notes on Regex Symbols and Glossary: XCIII

Global: this term refers to an option by which multiple matches can be found in a given string/file. The tools mentioned in this guide are global by default. The opposite of this would stop after the first match.

Case sensitivity: determines whether a pattern such as ‘google’ matches the data “gOOgle” or “GOOGLE”. With grep, the -i switch enables case insensitivity, in which case the aforementioned examples would match.

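Extended: This is a somewhat ambiguous term. It can refer to ERE (extended regular expressions), as in POSIX ERE. More generally, it can refer to a mode, such as Perl's /x flag, that ignores unescaped white space in the pattern itself so that the pattern can be laid out and commented more readably.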

Dotall: this determines whether the wildcard ‘.’ will match newlines or not.

Multiline: most often pertinent in the scripting languages’ utilization of regular expressions, this determines the functionality of the anchors ^ and $, whether they are matched only by the start of the string and its end, or whether newlines will cause said anchors to match the start and end of each respective line.
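Because grep examines its input one line at a time, these two flags matter mostly in scripting languages. The short Python illustration below uses Python's names for them, re.MULTILINE and re.DOTALL; other engines spell them differently, e.g. /m and /s in Perl.

```python
import re

text = "first line\nsecond line\nthird"

# Default: ^ anchors only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
# MULTILINE: ^ also anchors just after every newline.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Default: '.' does not match a newline, so this pattern cannot span lines.
no_dotall = re.search(r"first.*third", text)
# DOTALL: '.' matches newlines too, so the match runs across all three lines.
with_dotall = re.search(r"first.*third", text, re.DOTALL)
```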

Character classes:

. : matches any character (except a newline, unless dotall is enabled)
\w : matches any word character
\W : negation of \w
\d : matches any digit
\D : negation of \d
\s : matches a whitespace character
\S : negation of \s, any non-whitespace character

Character sets:
[\WxZ] : square brackets act as an OR statement, in which anything inside may occur for a single character. In said example, either \W, x, or Z may be matched. May also be a range, such as [a-z], or a set of ranges, [a-z0-4]
[^abc] : matches a character that is not a, b, or c.

Special characters:

\t : tab
\r : carriage return
\n : new line/line break
\xAB : hex character (e.g. \x20 for a space, \x0A for a new line)

Characters which typically need to be escaped for literal match:

\, ., +, *, ?, ^, $, [, ], |, {, }, /, ', #, (, )

Anchors:

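\b : matches a word boundary: the zero-width position between a word character and a non-word character, or at the start or end of the string. In practice this usually falls at white space before and after words.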
\B : negation of \b
^ : matches the start of a string*
$ : matches the end of a string*

*: precisely what this entails is discussed in the body of this paper (see also Multiline, above)
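The difference between a plain pattern and one bounded by \b can be seen in a short Python example. Bounding by \b is close to, though not necessarily identical with, what grep -w does. Python's re treats \b much as PCRE does for ASCII text.

```python
import re

text = "eye eyelid red-eye EYE"

anywhere = re.findall(r"eye", text)                          # also hits inside 'eyelid'
whole_word = re.findall(r"\beye\b", text)                    # '-' is a non-word character
whole_word_ci = re.findall(r"\beye\b", text, re.IGNORECASE)  # like adding -i to grep -w
```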

Lookaround:

abc(?=afas): Lookahead. This matches “abc” only when it is immediately followed by “afas.” “afas” is not included in the result.

abc(?!afas): Negated lookahead. E.g. if afas is directly after abc, discard the result. XCIV

(?<=afas)abc: Lookbehind. Does the same as lookahead but looks before a given pattern. This would match the “abc” in “afasabc”. The lookaround pattern is not included.

(?<!afas)abc: Negated lookbehind. If 'afas' precedes 'abc,' discard the result.
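The four lookaround forms can be compared on a single test string in Python, whose re module supports the same (?=...), (?!...), (?<=...), and (?<!...) syntax. Each list below holds the starting offsets of the “abc” matches. In every case only “abc” itself is part of the match.

```python
import re

s = "abcafas abcxyz afasabc xyzabc"  # 'abc' occurs at offsets 0, 8, 19, and 26

ahead = [m.start() for m in re.finditer(r"abc(?=afas)", s)]        # followed by afas
neg_ahead = [m.start() for m in re.finditer(r"abc(?!afas)", s)]    # not followed by afas
behind = [m.start() for m in re.finditer(r"(?<=afas)abc", s)]      # preceded by afas
neg_behind = [m.start() for m in re.finditer(r"(?<!afas)abc", s)]  # not preceded by afas
matched_text = re.search(r"abc(?=afas)", s).group()                # lookahead text excluded
```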

Quantifiers:

? : makes the preceding token optional (zero or one occurrence). Works on any token.
* : matches zero or more of the preceding token.
*?: matches zero or more. "Lazy" match, matches as few characters as possible
+ : Matches 1 or more of preceding token. Greedy, will match as much as possible.
+? : Matches one or more. Lazy form of +, matches as few characters as possible.
{3} : match preceding token exactly three times.
{10,12} : match preceding token 10-12 times.
{3,7}? : Match preceding token 3-7 times. Lazy match, will match as little as possible.
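Greedy versus lazy matching is easiest to see on text with repeated delimiters. In this Python example, the greedy <.+> swallows everything up to the last >, while the lazy <.+?> stops at the first one. The brace quantifiers behave as described above; Python, like PCRE, writes a range as {m,n}.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)              # one match: from the first '<' to the last '>'
lazy = re.findall(r"<.+?>", html)               # each tag separately
exactly_three = re.findall(r"\d{3}", "12 345 6789")
lazy_range = re.findall(r"\d{2,3}?", "12345")   # prefers 2 digits over 3
```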

Grouping:

(cat) : groups tokens together in a capture group.
(?:cat): groups tokens together, no capture group.

Capturing groups are a way of storing matched substrings that can be referenced later. These are mostly useful for scripting (e.g. with sed and other applications)—less so for searching a hard drive.XCV

Alternation:

| : the 'pipe' character. Allows for alternation between patterns. cat|dog matches 'cat' or 'dog' literally. To apply this within a larger expression, parentheses may be used to delimit the alternatives. To match 'catog' or 'cadog', the pattern ca(t|d)og would suffice.
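A short Python example ties grouping, capturing, and alternation together. When a pattern contains exactly one group, Python's re.findall returns the captured group rather than the whole match, which makes the difference between (t|d) and (?:t|d) visible. The doubled-word search shows a captured group reused through the backreference \1, a feature GNU grep also supports.

```python
import re

words = "catog cadog cabog cat"
whole = re.findall(r"ca(?:t|d)og", words)    # non-capturing: findall returns whole matches
captured = re.findall(r"ca(t|d)og", words)   # one capturing group: findall returns just it

m = re.search(r"(ou|cn)=([^,]+)", "dn: ou=xyz,dc=site,dc=com")
attribute, value = m.group(1), m.group(2)    # the two captured substrings

# \1 refers back to whatever group 1 captured, which finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "the the cat sat sat down")
```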

Endnotes/Citations:

I http://en.wikipedia.org/wiki/Pattern_matching

II http://en.wikipedia.org/wiki/Regular_expressions

III http://www.catb.org/~esr/jargon/html/G/grep.html

IV http://www.regular-expressions.info/refflavors.html

V Regular expressions composed for one tool should never be carried over to another without significant testing. Taking an EnCase regular expression keyword search of any significant complexity and using it with grep's POSIX BRE syntax would be disastrous. There may not even be a warning, and special characters would likely be taken literally. Evidence loss would be a likely consequence.

VI http://perldoc.perl.org/perlre.html

VII The PCRE manual, available via 'man pcre'

VIII Another simple example of the various forms of regex is seen regarding delimitation. Expressions are often shown as “[abc]{3}”, [abc]{3}, and “/[abc]{3}/”, each of which may be correct or incorrect given the specific tool in use, even among a given standard (i.e. Perl regex). Because this paper principally deals with a few programs, many of these nuances are ameliorated, but their presence deserves mention.

IX This is not intended to imply that it is the only regex format to do so.

X Forensic article on Perl syntax in forensics: http://blogs.sans.org/computer-forensics/2009/04/17/forensics-and-perl-fu/

XI Any notable exceptions, such as grep without the -P switch, use patterns that should be comparable with Perl, PCRE, and other formats.

XII http://www.regular-expressions.info/tutorial.html

XIII PHP has a number of functions that allow access and use of MySQL. PHP and MySQL do not by necessity need to be used together, and the use of one does not imply the other.

XIV http://www.regular-expressions.info/php.html

XV PHP is discussed in lieu of Perl solely out of authorial preference; PHP uses PCRE, which is designed to mimic Perl syntax anyway.

XVI http://regexlib.com/DisplayPatterns.aspx?cattabindex=4&categoryId=5

XVII Most searches will probably be for keywords. Even assuming alternation, these are relatively simple to construct.

XVIII There will obviously be far fewer example regular expressions than could have been incorporated into such a paper, as the number of potentially relevant expressions is limited only by the imagination. Further examples were withheld for the sake of a reasonably terse discussion of regular expressions in particular instances. Books have been written on the subject that may better acquaint readers with other pertinent expressions.

XIX Henceforth I will mostly refer to GNU/Linux as Linux solely to conserve space and due to habit. See Free as in Freedom for a better understanding of this distinction.

XX Readers unfamiliar with grep and/or regex should see the glossary of terms and the synopsis of the grep manual at their representative sections.

XXI Grep doesn't search through free space/slack space unless you specify the /dev entry. Doing this is admittedly messy. For a tool that helps with this see: http://www.sleuthkit.org/autopsy/help/grep_lim.html

XXII This search would be useless except on a live machine or for testing purposes. To stick to a consistent format, this will assume that searches are being done on a live system, as opposed to an imaged system.

XXIII Live analysis is beyond this paper's scope and should not be attempted without a full understanding of the risks involved.

XXIV Note that it would alter it somewhat. For instance, a history entry would be added to the .bash_history file of a live machine for each typed line in the Bash shell

XXV http://learnlinux.tsf.org.za/courses/build/shell-scripting/ch01s04.html is one such helpful source

XXVI http://en.wikipedia.org/wiki/File_descriptor

XXVII If one cannot afford EnCase, similar testing may be done with FTK or online at http://www.gskinner.com/RegExr/ . Also mentionable is JGSoft products, especially Regex Buddy, which is extremely useful for developing regular expressions.

XXVIII See http://www.ubuntu.com/GetUbuntu/download for one very popular distribution.

XXIX If these examples do not work, consider trying a downloadable live cd of Ubuntu Linux, on which these have been thoroughly tested. I am not familiar with the differences between Bash and other shells in depth.

XXX This is derived from experience and is not necessarily mandatory.

XXXI Because I was unable to find a clear answer online concerning the precise location of EXIF metadata, the broad range of 6-30 characters preceding its occurrence should suffice.

XXXII This feature was added to FTK 3.0 when “expand compound files” was checked in the preprocessing selection. I am still unaware of any such feature for EnCase.

XXXIII The reason for this is that Exif metadata can be used to track down specific information tagged about the picture, such as the make and model of the camera. Frequently, these details are listed in readable format upon dumping the contents of an Exif-tagged image; utilities can parse out the less visible aspects, such as geolocation and timestamps.

XXXIV http://www.regular-expressions.info/email.html

XXXV http://www.regular-expressions.info/examples.html

XXXVI Fully, what the man page says is as follows: “-P, --perl-regexp experimental and grep -P may warn of unimplemented features.”

XXXVII I'm quite sure this is a bug. Perhaps one reason for labeling -P as “experimental” in the man pages.

XXXVIII A regular expression for credit cards would have little or nothing to do with crimes such as media piracy. Additionally, any investigator doing explicit searches for material not related to the warrant that justified a given seizure of assets may end up jeopardizing the admissibility of any evidence found therein. So if it has not already become plainly evident why serious investigators should know how to construct at least basic regular expression searches, perhaps it is now: searching for things not dictated by a search warrant endangers evidence admissibility.

XXXIX The quotes are actually optional for this expression.

XL This script is most helpful over repeated investigations. It can save a good deal of typing, depending upon the number of keywords, and since keywords are saved to a file, they're reusable.

XLI This was done on a small test file, and would likely change a good deal reflecting this variable. Again, this example is purely illustrative.

XLII The alternations used to get to such large amounts (i.e. 1400+) were repeated eventually. The best test would be to use all unique alternations, as grep may somehow parse out repeated (identical) alternations, but I don't see this reflected anywhere in the documentation.

XLIII If of value, my script can be used with or without attribution, and altered by anyone in any way

XLIV This continues to be a serious problem for innocent end-users, as well as potentially a huge boon for investigators, though the former is much more likely to realize it than the latter on account of a general lack of specific interest and procedures in network forensics, while sniffing is realized to the fullest by criminal elements hoping to find low-hanging fruit, network traffic transmitted in clear-text.

XLV An intriguing issue with this is that long lines of source code are often “[truncated]” under the section “Line-based text data:”. Such lines are expected in pages that do not separate lines frequently, but rather mash them together so as to partially obfuscate the source. This issue does not appear to be easily resolvable, and should be considered in cases where full-fidelity source code is desired in sniffing.

XLVI Wireshark is also excellent, and is perhaps even easier to deal with for users without an intimate knowledge of the console; it is much more popular, probably due primarily to the GUI. The primary reason tshark is discussed at the expense of Wireshark is that in many circumstances a GUI will not be available, as GUIs are often eschewed on servers for their unreliability. Tshark will then be the only alternative, as Wireshark requires a GUI to operate. Installing additional programs on a live system is almost universally unacceptable in the typical forensic context. Assuming that it is permissible, wireshark may be installed with the command “sudo apt-get install wireshark” (on Ubuntu and Ubuntu-based systems) or “yum install wireshark” (on Red Hat-based systems). These commands should resolve any dependencies as well. If in doubt, install Wireshark on a non-subject system and sniff traffic via a hub.

XLVII Capture filters are undoubtedly one of the most important features of a sniffer. The example presented captures all traffic, but as such, the resultant file can quickly reach huge proportions. Capture filters help to disregard non-necessary data from being written to the output file.

XLVIII In-house forensics experts may frequently be called upon to gather evidence. The following details one such instance where they may be needed: http://blogs.sans.org/computer-forensics/2009/05/07/deconstructing-a-webserver-attack/

XLIX Hidden folders and other small facts are thrown in throughout this paper; though extraneous to the paper's primary focus, these things become crucial to the minority of readers I imagine could be ignorant of them.

L Wine is a software program that allows one to run Windows programs on Linux.

LI http://content.hccfl.edu/pollock/unix/findcmd.htm gives helpful information concerning the find command

LII The file signature for PNG images is taken from here: http://www.garykessler.net/library/file_sigs.html

LIII Commonly referred to as metadata, or data of data. A simple example of this would be to hash a file and change the file's name and verify it once again. The data is unchanged, and the hash remains the same. The metadata has been changed.

LIV Involves the use of -xdev, useful for not crossing mount points. This is beyond the scope of this paper; more on it is available here: http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1271729811024+28353475&threadId=1285730

LV A better explanation of this: http://linux.about.com/od/commands/a/blcmdl1_findx.htm

LVI Much more could be said about this. Do a man on xargs to start. There are numerous resources available online as well.

LVII http://www.cbtnuggets.com/webapp/product?id=508 Taken from “Managing Tables and Indexes Part 1”

LVIII http://www.dartmouth.edu/~rc/help/faq/permissions.html

LIX https://www.blackhat.com/presentations/bh-usa-07/Fowler/Presentation/bh-usa-07-fowler.pdf

Fowler also wrote a book entitled SQL Server Forensic Analysis that deals with database forensics in depth

LX The typical directory service can be thought of as a white pages; specifically, directory services are databases of sorts optimized for reading over writing. Many applications of directory services involve user accounts and the information associated with them. Directory services commonly use LDAP. LDAP is also associated with single sign-on capabilities, which allow a user to access disparate parts of a set of systems (e.g. different areas that require authentication) without having to authenticate to each in turn.

LXI More specifically, the commonalities of LDAP syntax should carry over into other programs. File locations and other aspects of differing programs will be completely different.

LXII This exact search is exemplary only. Barring a very big ldif file, this search wouldn't be an effective use of one's time.

LXIII http://www.cyberciti.biz/faq/linux-unix-bsd-xargs-construct-argument-lists-utility/ Good introduction to the xargs command.

LXIV These were taken from http://packages.ubuntu.com/search?keywords=grep and are likely not a comprehensive list.

LXV This paper has demonstrated features roughly equivalent to this program.

LXVI Given that the -P switch is built into grep, the use of this tool was strategically neglected from the larger portion of this paper by reason of the greater popularity of the standard grep program.

LXVII http://beagle-project.org/Main_Page

LXVIII More information concerning foremost can be found via man foremost or at http://foremost.sourceforge.net/

LXIX As this is a forensics paper first, and a software development paper second, I fully expect my code to be unoptimized.

LXX I got the splitting of images in part from this article: http://www.forensicfocus.com/linux-dd-basics

LXXI http://www.cyberciti.biz/faq/howto-compress-expand-gz-files/

Python alarm clock

Posted in Uncategorized on May 31, 2011 by mattseanbachman

A Simple Python Alarm Clock

I saw this guy’s alarm clock and thought I might make some adjustments to have it suit my needs; ported to 3 to boot.

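import subprocess
import time

i_hour = int(input("Hour?\n"))
i_minute = int(input("Minute?\n"))
ap = input("AM or PM?\n").strip().upper()
# Convert 12-hour input to the 24-hour clock used by time.localtime():
# 12 AM is hour 0, 12 PM stays 12, and other PM hours add 12.
if ap == "PM" and i_hour != 12:
    i_hour += 12
elif ap == "AM" and i_hour == 12:
    i_hour = 0

# Poll every 30 seconds; once the target minute arrives, start VLC and stop.
while True:
    now = time.localtime()
    if now.tm_hour == i_hour and now.tm_min == i_minute:
        subprocess.Popen(["vlc", "./v.mp3"])
        break
    time.sleep(30)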

Might start using git for this stuff; I’m doing it a lot, have more projects to come. Arg, now I have to learn git.

Thiel on Education

Posted in Uncategorized on May 26, 2011 by mattseanbachman

Else, to-morrow a stranger will say with masterly good sense precisely what we have thought and felt all the time, and we shall be forced to take with shame our own opinion from another.

I came across some colorful commentary by Thiel of Paypal that substantiated that which I believed all along. The article.

In reading it, I realized that I wasn't alone in these sentiments. A quick search for more of his thoughts on the matter brought up this article from the National Review, in which he discusses the subject at length during an interview:

SHAFFER: I understand you think we’re in a big higher-education bubble.

THIEL: Yes. Education is a bubble in a classic sense. To call something a bubble, it must be overpriced and there must be an intense belief in it. Housing was a classic bubble, as were tech stocks in the ’90s, because they were both very overvalued, but there was an incredibly widespread belief that almost could not be questioned — you had to own a house in 2005, and you had to be in an equity-market index fund in 1999.

Probably the only candidate left for a bubble — at least in the developed world (maybe emerging markets are a bubble) — is education. It’s basically extremely overpriced. People are not getting their money’s worth, objectively, when you do the math. And at the same time it is something that is incredibly intensively believed; there’s this sort of psycho-social component to people taking on these enormous debts when they go to college simply because that’s what everybody’s doing.

It is, to my mind, in some ways worse than the housing bubble. There are a few things that make it worse. One is that when people make a mistake in taking on an education loan, they’re legally much more difficult to get out of than housing loans. With housing, typically they’re non-recourse — you can just walk out of the house. With education, they’re recourse, and they typically survive bankruptcy. If you borrowed money and went to a college where the education didn’t create any value, that is potentially a really big mistake.

There have been a lot of critiques of the finance industry’s having possibly foisted subprime mortgages on unknowing buyers, and a lot of those kinds of arguments are even more powerful when used against college administrators who are probably in some ways engaged in equally misleading advertising. Like housing was, college is advertised as an investment for the future. But in most cases it’s really just consumption, where college is just a four-year party, in the same way that buying a large house with a really big swimming pool, etc., is probably not an investment decision but a consumption decision. It was something about combining the investment decision and the consumption decision that made the housing thing so tricky to get a handle on — and I think that’s also true of the college bubble.

One important difference between the housing bubble and the education bubble is that there was sort of a class aspect to the housing bubble: upper-middle-class people in the U.S. tend to be invested in equities, and middle-class people tend to be invested in housing, so there was a way in which the housing bubble was a way of making fun of the middle class for various sophisticated elites that ran all the way through the housing bubble. It was sort of like, “Look at those dumb people and beatniks in suburban America who are doing this crazy housing thing.” So even though it was a crazy bubble, there was at least a kind of counter-narrative; you had a bit of a dissenting narrative. Education is an upper-middle-class thing, and so something that is not questioned by elites at all, and that’s why the education market is more likely to be distorted.

You know, we’ve looked at the math on this, and I estimate that 70 to 80 percent of the colleges in the U.S. are not generating a positive return on investment. Even at the top universities, it may be positive in some sense — but the counterfactual question is, how well would their students have done had they not gone to college? Are they really just selecting for talented people who would have done well anyway? Or are you actually educating them? That’s the kind of question that isn’t analyzed very carefully. My suspicion is that they’re just good at identifying talented people rather than adding value. So there are a lot of things about it that are very strange.

The Great Recession of 2008 to the present is helping to bring the education bubble to a head. When parents have invested enormous amounts of money in their kids’ education, to find their kids coming back to live with them — well, that was not what they bargained for. So the crazy bubble in education is at a point where it is very close to unraveling.

In early 2009, there was a question of why the stimulus money was not going to infrastructure, and a very large amount was going to subsidizing college loans and encouraging people to go back to school. The argument was that we get a higher return on human capital than on infrastructure. While that’s certainly possible, and I agree that human capital is extremely important, I think we’re not actually measuring the return we’re getting on the human capital. It is, in fact, considered in some ways inappropriate to even ask the question of what the return is. We are given bromides to the effect of, “Well, you know college education is good, but it’s good precisely because it doesn’t teach you anything specific; you become a more well-rounded person, a better citizen, you learn how to learn.” There tends to be an evasion of specificity of what exactly it is that is learned. And so these human-capital intuitions may be very far off in a lot of colleges.

When I have more time, I’ll write up why I agree with him, drawing on my own experiences; for now, I’ll keep this blog mostly descriptive and try to avoid editorializing. *Fist-bumps Thiel*

Rupert Murdoch Nails it

Posted in Uncategorized on May 25, 2011 by mattseanbachman

Murdoch nails it on education:

Our schools remain the last holdout from the Digital Revolution

[…]colossal failure of imagination, much worse, an abdication of our responsibility to our children[…]

Forensic Linguistics

Posted in Uncategorized on May 24, 2011 by mattseanbachman

A few hours ago I noticed that some of the hits on one of my posts came from searches for “mohammed lego”. This got me thinking, more generally, about whether one could do a bit of people tracking using dialect. In this case, the marker would be the spelling of the word “mohammed”: can it signify where in the world a given person comes from?

Obviously, this is a pretty deep topic, and I don’t suppose I could do half a decent job on it without a significant amount of study in fields I’m wholly ignorant of. That said, I can at least try. (This is my requisite admission of humility.)

Taking Muhammed as the example, the spelling of the name can be roughly tied to geographic regions. See, for instance, this canonical source:

The name is also transliterated as Mohammad (primarily in Iran, Afghanistan and Pakistan), Muhammad (in India and Bangladesh), Muhammed (Arab World, primarily in North Africa), Mohamed and Mohamad (Arab World), Muhammad (Arab World), Muhammed, Muhamed (Bosnia and Herzegovina), Muhammed, Muhamed, Muhammet, or Muhamet (Turkey and Albania).

So, what we have is…well wait, that didn’t help at all. I did find something that spoke on this name; apparently this is the most common spelling in Britain.
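
Here’s a rough Python sketch of why the list doesn’t narrow things down, using only the spellings and regions from the quoted passage. The names TRANSLITERATIONS, REGIONS_BY_SPELLING and candidate_regions, and the regular expression itself, are my own hypothetical choices for illustration. It inverts the list so each spelling points at every region that uses it, then pulls name variants out of a piece of text such as a search query.

```python
import re
from collections import defaultdict

# Spelling -> region pairs transcribed from the passage quoted above.
TRANSLITERATIONS = [
    ("Mohammad", "Iran, Afghanistan and Pakistan"),
    ("Muhammad", "India and Bangladesh"),
    ("Muhammed", "Arab World (primarily North Africa)"),
    ("Mohamed", "Arab World"),
    ("Mohamad", "Arab World"),
    ("Muhammad", "Arab World"),
    ("Muhammed", "Bosnia and Herzegovina"),
    ("Muhamed", "Bosnia and Herzegovina"),
    ("Muhammed", "Turkey and Albania"),
    ("Muhamed", "Turkey and Albania"),
    ("Muhammet", "Turkey and Albania"),
    ("Muhamet", "Turkey and Albania"),
]

# Invert the list: each (lowercased) spelling maps to every region using it.
REGIONS_BY_SPELLING = defaultdict(set)
for spelling, region in TRANSLITERATIONS:
    REGIONS_BY_SPELLING[spelling.lower()].add(region)

# One pattern covering every variant above, plus "Mohammed" as searched for.
VARIANT_RE = re.compile(r"\bM[ou]hamm?[ae][dt]\b", re.IGNORECASE)


def candidate_regions(text):
    """Map each name variant found in `text` to the regions tied to it."""
    found = {}
    for match in VARIANT_RE.finditer(text):
        spelling = match.group(0).lower()
        # .get() avoids inserting unknown spellings into the defaultdict
        found[spelling] = REGIONS_BY_SPELLING.get(spelling, set())
    return found
```

Running it on “mohammed lego” finds the spelling but no region at all, since “Mohammed” isn’t in the quoted list, while “Muhammed” comes back tied to three different regions. That overlap is exactly the problem.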

As I read further, I found that this falls into a field called “forensic linguistics”. This really blows my mind. As I don’t feel qualified to speak on it, I’ll throw down some neat resources; hopefully readers can check them out at their leisure.

Forensic Linguistics on Wikipedia

BBC Article, references the text in the first video.

Linux Forensics: Pattern Matching with Grep and Related Tools

Posted in Uncategorized on October 31, 2010 by mattseanbachman

Here’s a paper from a while back (patterns_neut). It’s of no use merely sitting on my hard drive collecting e-dust.

I hope someone benefits from this. I might try to post it to HTS as well (maybe).

Political attack ads

Posted in Uncategorized on October 30, 2010 by mattseanbachman

My new job doesn’t allow me many of the luxuries I’ve grown accustomed to. One of them, the subject of this post, is researching candidates before voting.

This upcoming election led me to a very disturbing realization: I was planning on going into the booth with little to no knowledge of the candidates or their positions.

Now for the shocking part: many, if not most, people with similar full-time jobs must face the same problem! Imagine that: people voting literally as they are told, straight from the dictates of their television, which is itself controlled by whoever has the most money.

Contemplate that and see if it scares you as much as it did me.