Friday, September 26, 2014

Back to basics - NTFS Data Runs

This is not really basic; it is advanced knowledge from a technical point of view.  But since it is a published process explained in great detail, it has become basic knowledge.  Those in non-scientific fields are not used to calculating and verifying steps and procedures, and that basic premise moves the field of digital forensics into the educational definition of the STEM fields.  STEM stands for Science, Technology, Engineering, and Math.

This post will discuss the complex process of data storage in the New Technology File System ( NTFS ), specifically the $80 attribute's lesser understood structure: its data runs.


This image is from the book "Guide to Computer Forensics and Investigations" ( September 28, 2009 ) by Bill Nelson, Amelia Phillips, and Christopher Steuart.


Thus, based on the image above, the data run can be extracted and analyzed for the actual data cluster locations.


If you want to create the same analysis and documentation of the data clusters, here is the actual string of the data runs: 32B1078C8C0022630795ED32BC063C360122350302FA210B6CFE229E01E904

The example above describes 6,830 clusters for the file, with both positive and negative offsets to the cluster runs.  It does not get much more complex than this one; if you understand this example, you understand how NTFS saves non-resident files.  If you are into programming, I would suggest you do this analysis by hand or with a simple application, as I did here with Excel, before attempting to write a program in a lower level programming language.
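If you would rather script the decoding than build it in Excel, here is a minimal C++ sketch of a data run parser; the DataRun structure and function names are my own for illustration, and the run list is assumed to be well formed ( a real tool would bounds-check against the attribute length ).  The low nibble of each run's header byte gives the size of the little-endian cluster count field, the high nibble gives the size of the signed, little-endian offset field, and each offset is relative to the previous run's starting cluster.

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

struct DataRun {
    uint64_t clusterCount;  // number of clusters in this run
    int64_t  offset;        // signed LCN offset, relative to the previous run
    uint64_t startLcn;      // absolute starting logical cluster number
};

// read one byte ( two hex digits ) from the printable run-list string
static uint8_t hexByte(const string& s, size_t pos) {
    return static_cast<uint8_t>(stoul(s.substr(pos, 2), nullptr, 16));
}

vector<DataRun> parseRuns(const string& hex) {
    vector<DataRun> runs;
    int64_t lcn = 0;
    size_t  i   = 0;
    while (i + 2 <= hex.size()) {
        uint8_t header = hexByte(hex, i);
        if (header == 0x00) break;            // 0x00 terminates the run list
        int lenBytes = header & 0x0F;         // low nibble: cluster count field size
        int offBytes = (header >> 4) & 0x0F;  // high nibble: offset field size
        i += 2;

        uint64_t count = 0;                   // little-endian cluster count
        for (int b = 0; b < lenBytes; b++, i += 2)
            count |= static_cast<uint64_t>(hexByte(hex, i)) << (8 * b);

        int64_t off = 0;                      // little-endian signed offset
        for (int b = 0; b < offBytes; b++, i += 2)
            off |= static_cast<int64_t>(hexByte(hex, i)) << (8 * b);
        if (offBytes && offBytes < 8 && ((off >> (8 * offBytes - 1)) & 1))
            off -= static_cast<int64_t>(1) << (8 * offBytes);  // sign-extend

        lcn += off;
        runs.push_back({ count, off, static_cast<uint64_t>(lcn) });
    }
    return runs;
}

int main() {
    string raw = "32B1078C8C0022630795ED32BC063C360122350302FA210B6CFE229E01E904";
    uint64_t total = 0;
    for (const DataRun& r : parseRuns(raw)) {
        cout << "start LCN " << r.startLcn << ", " << r.clusterCount
             << " clusters ( offset " << r.offset << " )" << endl;
        total += r.clusterCount;
    }
    cout << "total clusters: " << total << endl;   // prints 6830
    return 0;
}

Running it against the string above prints the six runs and a total of 6,830 clusters, matching the manual analysis.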

Good luck practicing and getting better at understanding technology at a deeper level.

Sunday, September 21, 2014

Back to basics - Create Your Own Evidence

One of the most important skills one can have in forensics is the ability to create controlled evidence where all aspects of the evidence are known, in order to test the reliability of tools and methodologies.  In this case, I wanted to explore a few options in EnCase and create a test image that can help test its keyword search capabilities.

You can watch my video on the details and you can also request the final evidence file.  http://youtu.be/iP9UzHG19Gw
If you need to request the evidence file, then I have failed to get my point across: you need to be able to create baseline evidence yourself in order to test any tool that you might come across.

The evidence is based on Central Daylight Time and the NTFS file system.

D:\>dir /t:c          ( creation times )
 Volume in drive D is NTFS_1024
 Volume Serial Number is F807-E907

09/21/2014  03:06 AM            10,241 file_c.txt
09/21/2014  03:08 AM             2,049 file_d.txt
09/21/2014  03:06 AM             2,049 file_e.txt
09/21/2014  03:06 AM             2,049 file_f.txt
               4 File(s)         16,388 bytes
               0 Dir(s)      53,191,680 bytes free

D:\>dir /t:a          ( last access times )
09/21/2014  03:06 AM            10,241 file_c.txt
09/21/2014  03:08 AM             2,049 file_d.txt
09/21/2014  03:06 AM             2,049 file_e.txt
09/21/2014  03:06 AM             2,049 file_f.txt

D:\>dir /t:w          ( last written times )
09/21/2014  03:04 AM            10,241 file_c.txt
09/21/2014  03:05 AM             2,049 file_d.txt
09/21/2014  03:05 AM             2,049 file_e.txt
09/21/2014  03:05 AM             2,049 file_f.txt

MFT record location in sectors, and the two data run sector locations, for the file called file_a.txt, where the keyword "keyword2" spans the two data run locations and the file is deleted.  EnCase allows the file to be undeleted before it is searched for keywords, so this file will be crucial to test that capability.
42722   -   MFT record
280     -   file_a.txt
41230   -   sector ( cluster 20615; one 1,024-byte cluster spans two 512-byte sectors )

The VBR will need to be corrupted in order to write directly to the raw device, so the first 7 bytes will be zeroed out and restored after we are done with the evidence drive creation. ( Thanks to Chuck Black for researching and finding this simple trick )
EB 52 90 4E 54 46 53  - VBR
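For illustration only, here is a minimal C++ sketch of that trick, assuming a disposable test volume you are allowed to overwrite; the \\.\D: path is illustrative, raw-volume I/O on Windows must be done in whole sectors ( hence reading and rewriting the full 512-byte sector ), and a mounted volume may additionally need to be locked or dismounted before the write succeeds.

#include <fstream>

using namespace std;

int main() {
    fstream dev("\\\\.\\D:", ios::in | ios::out | ios::binary);
    char sector[512];
    dev.read(sector, 512);            // sector 0 starts EB 52 90 'N' 'T' 'F' 'S'

    char saved[7];
    for (int i = 0; i < 7; i++) { saved[i] = sector[i]; sector[i] = 0; }
    dev.seekp(0);
    dev.write(sector, 512);           // zero the first 7 bytes on disk
    dev.flush();

    // ... create the evidence by writing directly to the raw device ...

    for (int i = 0; i < 7; i++) sector[i] = saved[i];
    dev.seekp(0);
    dev.write(sector, 512);           // restore the original VBR bytes
    return 0;
}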

Details of keyword locations and offset values.
....akeyword2a....  file_a.txt 230b3
^keyword1-1^              23123
....keyword2....          1424042   second data run RAM slack
....keyword2....          1424305   second data run drive slack
...keyword1-1...          1424372   UNICODE  second data run drive slack
....keyword2zzzz          14243FD   split between last cluster and next unused cluster
...aaaakey                   25bfd   first half of split keyword, data run 1
word2aaaaa...             1421C00  second half of split keyword, data run 2

keyword2ccc...      file_c.txt 25c00
^keyword1-1^   25c91
^keyword1-1^ UNICODE 25D14
...keyword2...  RAM slack  28473
...keyword2...  Drive slack 28724

dddkeyword2ddd...      file_d.txt 28AC5
^keyword1-1^   28fb3
..keyword1-1.. UNICODE RAM slack 29063
...keyword2...  RAM slack  290D5
...keyword2...  Drive slack 292B4

bbbkeyword2bbb...      file_b.txt deleted 294c5
...keyword2...  RAM slack  2B0A3
...keyword2...  Drive slack 2B2C4

split between file_b.txt and file_f.txt
....keyword2fffff            2B3FD
ffffkeyword2ffff             2B454

...eeekeyword2eee...      file_e.txt 14D20F3
...keyword2...  RAM slack  14d28a4
...keyword2...  Drive slack 14d2ac4
...a2V5d29yZDI=...     drive slack Base64 encoded 14D2B34

...keyword2... file_f.txt MFT record  14DCA43

....keyword2... unused MFT record 14DDB04

...zzzzkeyword2zzzz...  unallocated space 3692683

Drive size IEC vs. SI

What is the big deal?  The size of the drive is reported by the forensic tool and I just need to bookmark it or document it.  Forensic tools are tested and vetted in courts, so I don't need to worry about them.  Right?  The answer has not been that simple since 1998.  That year, the International Electrotechnical Commission ( IEC ) decided to resolve the long-standing conflict over orders of magnitude like kilo or mega, which represent Base-10 prefixes, being used for Base-2 quantities.  Thus, a 1000 m run can be referred to as 1 km, while a 1024-byte memory block is referred to as 1 KiB, one kibibyte.

The calculation does not change; only the unit of measure reflects the binary nature of the order of magnitude.


There is not much focus on this change and many experts might not even know about it, but it is annoying when the tools we use do not conform to this changed standard.  As long as we can refer to the byte value, there is no problem, since only the prefix needs to be examined for the correct spelling.

I have seen hard drive manufacturers following this new standard for years now, while software vendors are lagging behind.

http://www.seagate.com/www-content/product-content/nas-fam/nas-hdd/en-us/docs/100724684.pdf
i.e.
7814037168 sectors * 512 bytes = 4,000,787,030,016 bytes / 1,000,000,000,000 = 4 TB

So, what do we see in forensic tools, in operating systems, and in generic tools?  Well, it depends.

AccessData FTK Imager 3.1.3 calculates the drive sizes for easy and quick reference.  We can also easily find the drive sector counts in this tool.

Physicaldrive0 Sector Count = 103,824    = 53,157,888 bytes
Physicaldrive1 Sector Count = 18,874,368 = 9,663,676,416 bytes
Physicaldrive2 Sector Count = 20,480     = 10,485,760 bytes
Physicaldrive3 Sector Count = 208,896    = 106,954,752 bytes
Physicaldrive4 Sector Count = 31,457,280 = 16,106,127,360 bytes

Reference calculations:
Physicaldrive0 Size = 50.69 MiB  or  53.15 MB
Physicaldrive1 Size = 9 GiB      or  9.66 GB
Physicaldrive2 Size = 10 MiB     or  10.48 MB
Physicaldrive3 Size = 102 MiB    or  106.95 MB
Physicaldrive4 Size = 15 GiB     or  16.1 GB


Sample calculation based on PhysicalDrive4: 31,457,280 total sectors = 16,106,127,360 bytes

International Electrotechnical Commission ( IEC )      International System of Units ( SI / Metric )
kibibyte   KiB   15,728,640                             kilobyte   kB   16,106,127
mebibyte   MiB   15,360                                 megabyte   MB   16,106.127
gibibyte   GiB   15                                     gigabyte   GB   16.106127
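The table above can be reproduced with a few lines of C++; this minimal sketch simply divides the same byte total by the Base-2 ( IEC ) and Base-10 ( SI ) unit sizes, matching the reference calculations.

#include <cstdint>
#include <iostream>

using namespace std;

int main() {
    const uint64_t sectors = 31457280;       // PhysicalDrive4
    const uint64_t bytes   = sectors * 512;  // 16,106,127,360 bytes

    cout.setf(ios::fixed);
    cout.precision(6);
    cout << "KiB: " << bytes / 1024.0       << "  kB: " << bytes / 1e3 << endl;
    cout << "MiB: " << bytes / 1048576.0    << "  MB: " << bytes / 1e6 << endl;
    cout << "GiB: " << bytes / 1073741824.0 << "  GB: " << bytes / 1e9 << endl;
    return 0;
}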

I'm not really sure where FTK Imager got some of the values for its physical size: drive 1 seems to be in GiB, drive 2 is a mystery number, drive 3 seems to be in MB, and drive 4 in GB.

Encase_forensic_imager_7.06 also shows the cluster count and the drive sizes in an easy format.  It also lists the sizes in a Base-2 format while using the Base-10 units of measure, but it is more consistent than FTK Imager.


Windows Management Instrumentation Command-line (WMIC) shows the physical devices, but the size and total sector values it reports are not the true physical size values.

Windows shows the physical sizes, but they are not even close to the actual size of the devices; however, we know from the MBR master partition table calculations that partition size calculations are based on Base-2 conversions.



Example value calculated from the MBR of Disk 1, first partition entry.
00 20 03 00 ( little-endian ) = 0x00032000 = 204,800 sectors in the partition; thus the number of bytes in the partition is 204800 * 512 = 104,857,600 bytes / ( 1024 * 1024 ) = 100 MiB
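For reference, here is a small C++ sketch of the little-endian decoding behind that calculation; the byte values are the ones quoted from the partition entry above.

#include <cstdint>
#include <iostream>

using namespace std;

int main() {
    const uint8_t raw[4] = { 0x00, 0x20, 0x03, 0x00 };  // bytes as stored on disk
    uint32_t sectors = 0;
    for (int i = 0; i < 4; ++i)
        sectors |= static_cast<uint32_t>(raw[i]) << (8 * i);  // little-endian

    uint64_t bytes = static_cast<uint64_t>(sectors) * 512;
    cout << sectors << " sectors = " << bytes << " bytes = "
         << bytes / (1024.0 * 1024.0) << " MiB" << endl;      // prints 100 MiB
    return 0;
}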

So, Microsoft is using the wrong units of measure to display storage device size information.  The Disk 2 and Disk 3 size values are way off from either of the calculated values, but the values that are right are calculated by the Base-2 conversion method, so the units of measure should be MiB and GiB, not MB and GB.

Linux, on the other hand, is using the Base-10 conversion with the correct units of measure, MB and GB.

/dev/hda size was an anomaly and I was not able to find a suitable explanation for why the value was off, but it might have had something to do with a virtual IDE hard drive.  I have verified the existence of sector 104,447 using dcfldd and xxd  ( dcfldd if=/dev/hda bs=512 skip=104447 | xxd ).  Even though all other tools showed only 103,824 sectors on the drive, I did locate sector 104,447, which implies at least 104,448 sectors.


/dev/sda -> 18874368 sectors, consistent with the other Windows tools, and the capacity is correctly calculated in MB as 9663 MB ( 9.66 GB ).

/dev/sdb -> 20480 sectors, consistent with the other Windows tools, and the capacity is correctly calculated in MB as 10 MB.

/dev/sdc -> 208896 sectors, consistent with the other Windows tools, and the capacity is correctly calculated in MB as 106 MB.

/dev/sdd -> 31457280 sectors, consistent with the other Windows tools, and the capacity is correctly calculated in GB as 16.1 GB.

So, my conclusion is that Windows-based software vendors have not made the adjustment in the last 16 years to label their storage device sizes properly.  The most surprising are the forensic tool vendors, who do not see the need to label properly or show the proper capacity of the drives.  As long as the size is referred to in bytes, the values are correct, so it might be necessary to start referring to evidence size in bytes to avoid confusion.

Thursday, September 18, 2014

Back to basics - Code Analysis

This post was triggered by the great blog post explaining code analysis in gdb.

http://erenyagdiran.github.io/I-was-just-asked-to-crack-a-program-Part-1/

Code analysis is not part of a forensic technician's required skill set, and even some digital forensic analysts will never need to know how to trace code in a debugger.  In some cases, an investigator might be lucky enough to have a case with code simple enough to quickly see a pattern and judge whether interpreting the code might help the investigation before even "messing" with it.

The following simple code might be worthwhile to examine quickly to see what the password is, so we could use that information somewhere else in the case.  We know this code uses XOR ( ^ ) to check the password, and it even has code commented out that decodes the password for us.  So, we either need to know the characteristics of XOR and decode it ourselves, or compile the code ourselves to see the solution.

With XOR, a clear text XORed with a key results in ciphertext, but if we have the ciphertext and the key, we can XOR them to arrive at the clear text password.
Clear text       10010101                                   Ciphertext      10110000
Key                00100101                                   Key                00100101
Ciphertext     10110000                                Clear text        10010101


Also, with XOR, if we do not know the key but are able to monitor the ciphertext, and entering a clear text results in a ciphertext of all zeroes, then the clear text entered is the key itself.

Guessed password 10010101011
Unknown key         10010101011
Ciphertext               00000000000

Thus, the guessed password is the key we are looking for, and every ciphertext that we find on this system can be easily decrypted using the discovered key.
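As a quick illustration of this property, here is a small C++ sketch; the ciphertext bytes are what the program below produces for the clear text "zzzzzzzz" under its key, and XORing the known clear text with that ciphertext recovers the key.

#include <iostream>
#include <string>

using namespace std;

int main() {
    string clear  = "zzzzzzzz";                          // known/guessed clear text
    string cipher = "\x1b\x18\x19\x1e\x1f\x1c\x1d\x12";  // observed ciphertext
    string key;
    for (size_t i = 0; i < clear.size(); i++)
        key += static_cast<char>(clear[i] ^ cipher[i]);  // clear ^ cipher = key
    cout << "recovered key: " << key << endl;            // prints abcdefgh
    return 0;
}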

I've written a simple program to practice this process; see if you can decode my password by hand, or see if you know how to compile C++ code to let the program do it for you.  Thus, a basic understanding of encryption and basic knowledge of compiling code might be required in this field and in degree plans for those interested in digital forensics.  Of course, you might like this type of investigation and want to learn much more about programming; in that case, you might need to pursue computer science at a higher institution in order to take your skills to the next level.

#include <iostream>
#include <string>
#include <cstring>   // strcmp

using namespace std;

int main() {
    string password;
    string key  = "abcdefgh";                  // XOR key, one byte per character
    string pass = "\x1b\r\xf\x10\x4\bV\\";     // stored, already-encoded password

    cout << "Please enter your password: ";
    cin >> password;                           // assumes no more than 8 characters

    // encode the entered password in place: clear text XOR key
    for (int index = 0; index < password.length(); index++)
        password[index] = password[index] ^ key[index];

    cout << "encoded: " << password << endl;

    // decode password ( uncomment to reverse the XOR and print the clear text )
    //for (int index = 0; index < password.length(); index++)
    //    password[index] = password[index] ^ key[index];
    //cout << "decoded: " << password << endl;

    // compare the encoded input to the stored encoded password
    if (strcmp(&password[0], &pass[0]) == 0)
        cout << "You got the password" << endl;
    else
        cout << "Incorrect password was entered!!!" << endl;

    return 0;
}

Can you write a flow chart for this code and a methodology for the decoding approach?

Sunday, September 14, 2014

Back to basics - Operator Precedence

Why do we need to test forensic tools when the programmers compiled the code without any errors?  Logical errors and flawed algorithm implementations cannot be detected by compiling code; they can only be found by continuous testing with the right input, while the output is monitored for the correct values.  We need to avoid garbage-in, garbage-out conditions for reliable tool testing.  One of the implementation issues that can be detected by testing is operator precedence.

In this presentation, I wanted to talk about the order of operations, which is ignored in many cases.  Order of operations is used by systems to evaluate the value of an expression by parsing the expression according to the operator precedence defined for the given system.

Analyzing code requires not just pattern recognition of specific code, but also the recognition of logical errors that might have been exploited.

In this chart, I give an example of the flow of operator evaluation, but the accompanying video will give a more in-depth explanation.  http://youtu.be/7EQ5YZOU7tw

You can practice operator precedence on the command line by setting variables
with arithmetic operations:

C:\>set /a test=(9*9)*4/(9*(5*5*5)-(14-6))
0
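The same expression can be verified in C++; this short sketch mirrors the cmd.exe arithmetic above and also shows how integer division, one precedence-related implementation detail, produces the 0 result.

#include <iostream>

using namespace std;

int main() {
    int numerator   = (9 * 9) * 4;                  // 324
    int denominator = 9 * (5 * 5 * 5) - (14 - 6);   // 1125 - 8 = 1117
    cout << numerator / denominator << endl;        // integer division: prints 0
    cout << static_cast<double>(numerator) / denominator << endl;  // ~0.29
    return 0;
}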


Saturday, September 6, 2014

Back to Basics - FAT File/Folder Structure

Have you ever wondered how the File Allocation Table ( FAT ) file system maintains its structure?  Many forensic books and certification exams discuss the structure of the file system, but I have yet to see a discussion of how the file system links the directory structure together.  In this post, I wanted to examine and model the links between files and folders.

Many books discuss the concept that we can navigate the file system by running cd . or cd .. to change directory to the current directory or to the parent of the current directory.  The . and .. entries turn out to be very important in understanding how FAT maintains the directory structure.

Each directory maintains its own Directory Entry ( DE ) list in a unique cluster, where the root DE is considered to be cluster 0.  Cluster 1 is never referenced.  Referring to the FAT table, we know that the FAT16 signature is F8FF, followed by another FFFF that refers to the DE.  Thus, F8FF is cluster reference 0, while the FFFF following F8FF should be the reference to cluster 1.  The first usable cluster for files is therefore cluster 2.

I have created a test case on a thumb drive using the following structure:

D:\file1.txt
D:\folder1
         ->file2.txt
         ->folder1-1
                 ->file3.txt

I have traced the file system structures to their starting and ending sector numbers to find a pattern that led me to understand how the files are stored.


The chart of sector numbers was used to develop a model of the file structure on the storage device.


The model can be verified by examining the actual structure of the DEs to establish the links between the DE entries.


A simplified view of the relevant cluster number designations shows a repeating pattern: each folder points to itself by referring to the cluster number where its DE resides ( the . entry holding the DE entries for its files ), while the .. entry refers to the parent's DE cluster.
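For readers who want to verify the model in code, here is a hedged C++ sketch of the 32-byte FAT short ( 8.3 ) directory entry layout, per the published FAT specification; the firstCluster field is what the . and .. entries use to point at their own and their parent's DE clusters.

#include <cstdint>

#pragma pack(push, 1)
struct FatDirEntry {
    char     name[8];       // "."  is 0x2E padded with spaces, ".." is 0x2E 0x2E
    char     ext[3];
    uint8_t  attributes;    // 0x10 marks a directory
    uint8_t  reserved[10];  // creation/access stamps, high cluster word ( FAT32 )
    uint16_t writeTime;
    uint16_t writeDate;
    uint16_t firstCluster;  // cluster of the target's DE list; 0 means the root
    uint32_t fileSize;      // 0 for directories
};
#pragma pack(pop)

static_assert(sizeof(FatDirEntry) == 32, "FAT directory entries are 32 bytes");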


In some cases, we can examine the actual data structures on disk to reveal patterns that can be used to understand how technology works.  The steps, documentation, and methodology are all crucial skills for any beginning forensic examiner or analyst, while forensic technicians would not have to know technology at this level.  Only education and hard work can develop a forensic analyst with a higher level of understanding of data structures, while training alone will never develop forensic technicians into professionals capable of these types of skills.  I hope this type of document will help even technicians understand that there is more to learn about technology than pushing buttons and reading output from unvalidated tools.


Tuesday, September 2, 2014

The True Scientific Model of Digital Forensic Analysis

The formal model of digital forensic analysis can be summarized in a single application methodology, since users interact with applications ( an operating system is a special purpose application that manages basic resources ( I/O, interrupts ), processes, memory, rights, and file systems ).

Many people talk about and write books on what digital forensics is, but most cover only forensic technician skills.  Forensic technicians are trained Personal Computer ( PC ) technicians with skills for the most recent technology, mainly in order to acquire and retrieve digital data.  In many cases, the technology is so new and the techniques for retrieving data are so unknown to a sector of technicians that it is considered "voodoo forensics".  ( i.e. chip-off )

In the end, digital forensic analysis is true detective work, where acquisition becomes the sub-process that supports the actual investigation.  Forensic technicians can be trained to focus on risk management in order to maintain evidence integrity, but getting the data off new devices violates many forensic science rules.  ( i.e. uploading client software to phones in order to acquire physical data )

Digital forensic analysis should result in showing human involvement, through the use of application(s), in committing an act that is unlawful or against policy, in such a way that the resulting relevant evidence can be presented in court proceedings.

Scientists rely on facts, numbers, and logic.  Technicians rely on tools, methodologies, and skills.  Courts require relevant, scientific, and admissible evidence.  Digital forensics can grow into a science if scientists focus on analysis in a scientific manner, rather than technicians trying to present their cutting edge skills as science.  The ultimate goal is to find a human connection to the digital data, not to treat digital data extraction as the "Holy Grail".  A phlebotomist is not a doctor; he/she is a trained technician with tools, methodologies, and skills to draw blood, but in the end the doctor will use that acquired specimen to draw conclusions and to gather numbers that reveal a larger problem than just an individual being sick.  The phlebotomist just draws blood, seeing discrete individuals.  Thus, data acquisition has nothing to do with the analysis of data, nor does the technician need to be scientifically educated, only trained in how to extract data with various methodologies.

So, the scientific analysis result shows data states at the time in question ( stored, transactional, transmitted ).  The data itself can be generated by the user, an application, or the operating system, where user generated data is considered hearsay and is thus the weakest evidence.  User generated data must be supported and validated by business records ( application and/or operating system generated data ).  The data content also needs to be considered in order to lead the investigator toward the truth of finding intent, for example if unregulated encoding or encryption of data is located on the storage device(s).  Many user activities result in data being deleted or hidden from view by "normal means".  All these activities and modifications of data can be traced back to the ability and motivation of the human involved, which can run as deep as cultural influence.  Since these activities do not necessarily mean illegal activities, the scope of the investigation needs to be considered in order to answer the who, what, when, and how questions, or to determine the need to extend the scope of the investigation ( scope creep ).  Since science does not guarantee undisputed evidence, but merely offers scientifically proven facts based on the knowledge at the time in question, it is the investigator's duty to find relevant evidence that is unbiased in nature ( inculpatory vs. exculpatory ).

Digital forensics is not a business process driven by monetary gain, but the location of the truth.  Those who believe that looking only for inculpatory evidence is what digital forensics is about should not be considered forensic analysts, but merely businessmen.  Digital forensic analysis is also not merely the location of digital data; the location of digital data is done by technicians.  Courts require the evidence to be scientifically produced, and the scientific method does not exist for partial methodologies, but for the location of the truth.


There are not many people who can simplify the definition of science, but this chart does it.
http://undsci.berkeley.edu/article/scienceflowchart

What is Science?

I have also been in conversations, and have read a lot of discussion, about having digital forensics accepted as a field of science, but no one was specific about where they would fit it into science.  So, I created a comprehensive chart of the science field as a reference, so anyone bringing up the subject again can also point to the section into which they think digital forensics should be inserted.

Inman & Rudin defined the forensic processes:

Identification: - determination of physical-chemical composition (i.e., illicit drugs)
Classification: - determination of class, type (i.e., hair, fibers, blood type, DNA)
Individualization: - determination of unique identity of source (i.e., fingerprints) by means of class characteristics with known frequency in the relevant population and individual characteristics (also called typica)
Association: - determination of contact between two objects (i.e., fibers, glass)
Reconstruction: - determination of facts of the case: nature and place of events in time and space (i.e., murder, explosion)

Digital forensic association:

Identification: - determination of physical location, number, and relevance of storage devices (i.e., CD/DVD, USB, SATA, PATA, SCSI, IEEE 1394)
Classification: - determination of class, type (i.e., operating system, file system, volatility)
Individualization: - determination of unique identity of source (i.e., userID-to-human mapping, serial number, IMSI/ICCID) by means of class characteristics with known frequency in the relevant population and individual characteristics (also called typica)
Association: - determination of contact between two objects (i.e., date/time, browsing history, tool usage, link files)
Reconstruction: - determination of facts of the case: nature and place of events in time and space (i.e., keyword search, create user, create file, install/uninstall application)

It leans more toward a cognitive or behavioral science field, like psychology, than toward a branch of formal science.  The question will remain open for a long time, since mostly non-scientists are focusing on this issue at this time.


( Note: Let me know about any additions or modifications that you think would be appropriate in order to have a complete and accurate chart. )