Monday, July 2, 2012

Digital Forensics is not just HOW but WHY



This paper focuses on the proper understanding of Digital Forensics from an educational point of view and explains the problem with relying on certification-based skill validation to establish "expert" status, since formal education in this field did not really exist until a few years ago. Because Digital Forensics is a discipline of Forensic Science when it comes to properly identifying relevant evidence, strong education should be required. The term forensics, by definition, deals with presenting evidence to a forum, so forensics does not automatically mean the discipline of forensic science, and it does not require the deep technical knowledge that Digital Forensics does. Forensics is the presentation of artifacts located by a digital forensic expert who is properly educated in technology and in the scientific method used to evaluate the reliability of those artifacts.

Methodology-Based Discussion


The practice of digital forensics is based on three basic premises:

- Risk management (due diligence to find, handle, and protect evidence)
- Pattern recognition (patterns can lead to conclusions and speedier processing of the data in question)
- Process control (following the laws, regulations, and policies that control how data needs to be examined)
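
Pattern recognition often starts at the byte level. As a minimal sketch, assuming a raw image already loaded into memory, the Python below scans for a few well-known file headers ("magic numbers"). The SIGNATURES table and the find_signatures function are illustrative names, not part of any forensic tool.

```python
# Illustrative signature table: well-known file headers ("magic numbers").
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}


def find_signatures(data: bytes) -> list:
    """Return sorted (offset, type) pairs for every known header found in data."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)


blob = b"junk" + b"%PDF-1.4" + bytes(10) + b"\xff\xd8\xff\xe0"
print(find_signatures(blob))  # [(4, 'pdf'), (22, 'jpeg')]
```

A header match is only a lead: these short byte patterns can occur by chance, so each hit still has to be validated, for example by parsing the structure that should follow it.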

Digital forensics is the process of acquiring, analyzing, and presenting relevant and admissible digital data from different data states in a forensically sound manner suitable for litigation support. Digital evidence is found at many types of crime scenes, and it must be preserved and protected because it is easily altered; an examiner should also never assume that it has been destroyed [4]. Tools and procedures generally follow the Daubert standard, and the analysis process is based on the scientific method. Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), established the basis for the admissibility of expert witness testimony in federal legal proceedings. Texas is one of the states that accept Daubert as a rule of admissibility [2]. The scientific method [3] establishes procedures that aid in determining the events that occurred and in testing hypotheses when analyzing digital evidence. The key aspects of digital forensics are reliability, repeatability, and verifiable results.

 “A digital investigation is a process to answer questions about digital states and events.”[1]
 “A digital forensic investigation is a special case of a digital investigation where the procedures and techniques that are used will allow the results to be entered into a court of law.  The digital investigation process involves formulating and testing hypotheses about the state of a computer.”[1]
“Digital evidence is data that supports or refutes a hypothesis that was formulated during the investigation.  This is a general notion of evidence and may include data that may not be court admissible because it was not properly or legally acquired.”[1]

References:
1. Brian D. Carrier, “Basic Digital Forensic Investigation Concepts”, June 07, 2006, http://www.digital-evidence.org/di_basics.html, Accessed: January 20, 2009
2. “ADMISSIBILITY OF SCIENTIFIC EVIDENCE UNDER DAUBERT”, http://faculty.ncwc.edu/mstevens/425/lecture02.htm, Accessed: January 20, 2009
3. Kenneth Lafferty Hess Family Charitable Foundation, “Steps of the Scientific Method”, http://www.sciencebuddies.org/science-fair-projects/project_scientific_method.shtml, Accessed: January 20, 2009
4. U.S. Department of Justice, “Digital Evidence Field Guide Version 1.1”, http://www.rcfl.gov/downloads/documents/FieldGuide_sc.pdf, Accessed: January 20, 2009


In forensic science, of which computer forensics is one discipline, the focus must be on the scientific method and on locating information relevant to the case, using tools and techniques that result in forensically sound reporting of the facts. What are those facts, and where do they come from? Do you look only at the contents of the data, or do you also consider the data states, the way the data was stored, and the types of data you encounter? Would text accessed from the suspect's My Documents folder, saved as plain text by the user, be more important than text located in the 4th sector of the hard drive, Base64 encoded, in an area that no known user tool can access? I would consider a different intent in the second case. The cube below explores all the possibilities when we analyze data, so we can present the findings based on the context and on the circumstances in which we found that data.
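
To make the second case concrete, here is a minimal sketch, assuming a raw (dd-style) image held in memory and 512-byte sectors, of pulling one sector by its logical block address and testing whether its payload is strict Base64. The "4th sector" counted from one is LBA 3. The function names are hypothetical, and the test is a heuristic: ordinary words such as "abcd" also happen to be valid Base64.

```python
import base64
import binascii
from typing import Optional

SECTOR_SIZE = 512  # assumption: 512-byte sectors; many newer drives use 4096


def read_sector(image: bytes, lba: int, sector_size: int = SECTOR_SIZE) -> bytes:
    """Return the raw bytes of one sector (0-based LBA) from a raw dd-style image."""
    start = lba * sector_size
    return image[start:start + sector_size]


def try_base64(sector: bytes) -> Optional[bytes]:
    """Decode the sector's payload as strict Base64; None if it is not valid Base64."""
    payload = sector.rstrip(b"\x00").strip()
    if not payload:
        return None
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error:
        return None


# Synthetic image: three empty sectors, then an encoded note in the 4th sector (LBA 3).
note = base64.b64encode(b"meet at dock 7").ljust(SECTOR_SIZE, b"\x00")
image = bytes(3 * SECTOR_SIZE) + note + bytes(SECTOR_SIZE)
print(try_base64(read_sector(image, 3)))  # b'meet at dock 7'
```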

The following methodology cube will help you analyze your case with all the major technical aspects in mind.


Business Aspects and Keeping the Industry Viable

Besides the technical aspects, business aspects need to be considered when we talk about Digital Forensics and the future of the industry. It would be nice not to pay for surgery if the patient did not make it out of surgery. Allocated, visible data and hidden, encrypted data present different challenges and require different amounts of time. Recovering data has its own challenges. Data that has been wiped only once can present just as much of a challenge as data that has been overwritten many times. Modern storage devices are manufactured with more precise technology than older drives, and newer drives also store data in a denser configuration. As the technology matures, data recovery becomes infeasible in some cases. The time, effort, and cost associated with data recovery need to be examined before the proper tools are selected. In digital forensics, most of the time we only need to resort to software utilities to do the job, but in some cases a custom controller board or, in extreme cases, a scanning tunneling microscope (STM) is required to recover data. In other cases, we need to remove the chip containing the data and extract the raw data from it, or use JTAG on the controller board to extract the raw information. Even when every effort is made to recover data, the result might be only a partial structure of the lost data, or bits and bytes with too much missing information to make sense of. In some cases, we get lucky and recover the whole data set and its structure using software tools in a cost-effective, low-effort, and timely manner.
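
Part of estimating that time and cost is a quick look at what the blocks of an image actually contain. The sketch below is a rough triage heuristic, not a recovery method. It assumes a raw image in memory and 512-byte blocks, and its labels are hypothetical. It separates zero-filled blocks, blocks holding one repeated byte value, and everything else. A single pass of random data, or encrypted content, will still be labeled "data", which is exactly why a single wipe can be as hard to assess as many.

```python
from collections import Counter


def classify_block(block: bytes) -> str:
    """Rough label for one block of a raw image (a triage heuristic, not proof)."""
    if block == bytes(len(block)):
        return "zero-filled"
    if len(set(block)) == 1:
        return "uniform-fill"  # a single repeated byte value, e.g. a fixed 0xFF fill
    return "data"  # could still be a random-data wipe or encrypted content


def triage_summary(image: bytes, block_size: int = 512) -> Counter:
    """Count block labels across the image to estimate how much may be recoverable."""
    counts = Counter()
    for offset in range(0, len(image), block_size):
        counts[classify_block(image[offset:offset + block_size])] += 1
    return counts


sample = bytes(512) + b"\xff" * 512 + b"invoice #42".ljust(512, b"\x00")
print(triage_summary(sample))
```
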
The following methodology can be used to point out these challenges and the likely end results to clients, in order to dispel the “CSI effect”, where customers expect full recovery of everything in 20 minutes (including commercial breaks). Recovery might be possible, but not feasible within a time frame in which the data can still serve as relevant and admissible evidence. Again, certifications do not prepare professionals to consider or deal with such data recovery challenges. I have yet to see a certification that requires the candidate to recover a partially overwritten image and reconstruct it in a viewable form, the way forensic tools do by shading the unrecoverable area of the image black. Therefore, certifications, even in combination, cannot be considered a measure of a person's expertise. We have to point back to methodologies and education as the best measure of the skills required to evaluate the completeness and relevance of artifacts.
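
The idea of shading unrecoverable areas black can be shown on the simplest possible format. This is a sketch under a strong assumption: the image is uncompressed 8-bit grayscale stored row by row, so a pixel value of 0 renders as black. Real formats such as JPEG need far more work, because a damaged compressed stream does not map byte-for-byte onto pixels. The function name and the fragment layout (pixel offset mapped to recovered bytes) are hypothetical.

```python
from typing import Dict, List, Tuple


def rebuild_grayscale(width: int, height: int,
                      fragments: Dict[int, bytes]) -> Tuple[bytearray, List[int]]:
    """Rebuild an uncompressed 8-bit grayscale raster from recovered fragments.

    fragments maps a pixel offset to the bytes recovered at that offset.
    Pixels never covered by a fragment stay 0, which renders as black.
    """
    total = width * height
    pixels = bytearray(total)      # 0 = black: every pixel starts as "unrecoverable"
    recovered = bytearray(total)   # 1 marks a pixel that was actually recovered
    for offset, chunk in fragments.items():
        if offset < 0 or offset >= total:
            continue  # ignore fragments that fall outside the raster
        end = min(offset + len(chunk), total)
        pixels[offset:end] = chunk[:end - offset]
        recovered[offset:end] = b"\x01" * (end - offset)
    missing = [i for i in range(total) if not recovered[i]]
    return pixels, missing


pixels, missing = rebuild_grayscale(4, 2, {0: b"\x10\x20\x30\x40", 6: b"\x99\x99\x99"})
print(list(pixels), missing)  # [16, 32, 48, 64, 0, 0, 153, 153] [4, 5]
```

Returning the list of missing pixels alongside the raster keeps the report honest: a genuinely black pixel and an unrecovered one look identical on screen.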

Besides understanding the basic premises and technical aspects of relevant evidence analysis, appropriate analysis cannot be done without proper education in Computer Science or Information Technology. Many people rely on certifications that only provide immediate training on certain aspects of software usage, but do not provide the education needed to understand why software behaves a certain way or how to test the software's error rate. Certifications do not focus on failure detection or on verification of located data. Such training cannot be recognized as skill justification for a field that is a discipline of forensic science. Certifications also do not address the time and cost associated with data analysis or the value of triage in reducing them. Most professionals in this field take pride in their ability to triage and analyze a case faster than others, but certifications do not validate these important aspects of casework.
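
Testing a tool's error rate does not have to be elaborate. One common approach is a known-answer test: plant a known set of files in a test image, run the tool, and compare what it reports with what it should have reported. The sketch below assumes such a test set exists; the file names and function names are hypothetical, and any reported item outside the test population is simply ignored.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ErrorRates:
    false_positive_rate: float  # share of known non-matches the tool reported anyway
    false_negative_rate: float  # share of known matches the tool missed


def tool_error_rates(reported: Set[str], ground_truth: Set[str],
                     population: Set[str]) -> ErrorRates:
    """Compare a tool's findings against a known-answer test set."""
    negatives = population - ground_truth
    false_pos = len(reported & negatives)      # reported, but should not have been
    false_neg = len(ground_truth - reported)   # should have been reported, but was not
    fpr = false_pos / len(negatives) if negatives else 0.0
    fnr = false_neg / len(ground_truth) if ground_truth else 0.0
    return ErrorRates(fpr, fnr)


population = {f"file{i}" for i in range(10)}   # every file planted in the test image
truth = {"file0", "file1", "file2", "file3"}   # files the tool should flag
reported = {"file0", "file1", "file2", "file5"}
print(tool_error_rates(reported, truth, population))
```

Here the tool misses one of four known matches and flags one of six known non-matches, which is the kind of measured figure a Daubert challenge asks about.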

The stages of professional development are:
1. Awareness - introduces the WHAT concept without technical knowledge or skill
2. Training - introduces the HOW concept, where proper software/hardware usage is established without understanding the implementation of the tools or the validation of findings
3. Education - establishes the WHY, building on awareness and training, including verification and a detailed understanding of the technology "behind" the software/hardware tools for a thorough understanding of the artifacts those tools produce

In the following methodology, awareness is referred to as tools, since awareness of tools can help us locate data. Tools still require training to be used properly, but it is education that gives value to those findings.


Digital Forensic Analysis Flowchart

The basic flowchart of a thorough digital forensic analysis starts with scope identification. Analysis is performed only on a forensic duplicate of the evidence created with reliable tools. Every hypothesis is verified with experiments and observations. Conclusions are not drawn until the findings are verified by multiple tools or, preferably, by hand in a hex editor.
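
Verifying by hand means looking at the raw bytes yourself rather than trusting a tool's parsed view. As a minimal sketch, the Python below renders bytes in the offset / hex / ASCII layout that hex editors use, and checks that a finding a tool reported (here a hypothetical executable header, "MZ", at offset 16 of a synthetic image) really is at that offset. Both function names are illustrative.

```python
def hexdump(data: bytes, base_offset: int = 0, width: int = 16) -> str:
    """Render bytes as offset / hex / ASCII columns, like a hex editor view."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base_offset + i:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


def bytes_match(image: bytes, offset: int, expected: bytes) -> bool:
    """Check a tool-reported finding: are the expected bytes really at that offset?"""
    return image[offset:offset + len(expected)] == expected


image = b"\x00" * 16 + b"MZ\x90\x00PE\x00\x00"
print(hexdump(image[16:24], base_offset=16))
print(bytes_match(image, 16, b"MZ"))  # True
```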


Transfer of Evidence or a DNA of an Event

Let's start with a reference to a theory that has been used in forensic science: Locard's exchange principle. Edmond Locard (1877-1966) was the founder and director of the Institute of Criminalistics at the University of Lyons in France. Locard believed that whenever a criminal came into contact with his environment, a cross-transference of evidence occurred. He believed that "every criminal can be connected to a crime by dust particles carried from the scene." (Saferstein, Richard, Criminalistics, Seventh Ed., 2001)
Here we'll focus on evidence or data location as a process-based discovery in which we have to triage the event in question. In any digital system, humans interact with an operating system by using applications, and in turn the operating system interacts with the hardware. Thus, relevant evidence transfer must take place at each of these interaction points.

In physical forensics, relevant evidence can connect a person to a crime scene through anything from blood, semen, saliva, and hair to paint, explosives, drugs, impressions, and chemicals. In digital device interaction, or even in network communication, the basic premise is that wherever we go (browse or launch an application), we carry some evidence with us and leave some behind. We cannot interact with digital devices without a transfer of evidence occurring.

The main transfer points in local systems are:
- UA - User to Application (e.g., the user starts the IE browser)
- AOS - Application to Operating System (e.g., the IE browser stores recently typed URLs in the registry)
- UOS - User to Operating System (e.g., the user interrupts the boot process to load kernel drivers for a SCSI drive)
- OSH - Operating System to Hardware (e.g., the OS saves a file to the physical drive or temporarily stores data in physical memory)
- UH - User to Hardware (e.g., the user changes the hard drive jumper or sets the thumb drive switch to read-only)
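
One way to apply these transfer points during an examination is to tag every finding with the point that produced it, so gaps become visible. For example, application-to-OS artifacts with no user-to-application artifacts raise the question of whether a user was involved at all. The sketch below is a hypothetical bookkeeping structure, not a feature of any existing tool; only the five abbreviations come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class TransferPoint(Enum):
    UA = "User to Application"
    AOS = "Application to Operating System"
    UOS = "User to Operating System"
    OSH = "Operating System to Hardware"
    UH = "User to Hardware"


@dataclass
class Artifact:
    description: str
    transfer_point: TransferPoint


def group_by_transfer_point(artifacts: List[Artifact]) -> Dict[TransferPoint, List[str]]:
    """Bucket each finding under the interaction point that produced it."""
    groups = {tp: [] for tp in TransferPoint}
    for artifact in artifacts:
        groups[artifact.transfer_point].append(artifact.description)
    return groups


def uncovered(groups: Dict[TransferPoint, List[str]]) -> Set[TransferPoint]:
    """Transfer points with no artifacts yet: places to look next, or gaps to explain."""
    return {tp for tp, items in groups.items() if not items}


findings = [
    Artifact("recently typed URL stored in the registry", TransferPoint.AOS),
    Artifact("file fragment in physical memory capture", TransferPoint.OSH),
]
groups = group_by_transfer_point(findings)
print(sorted(tp.name for tp in uncovered(groups)))  # ['UA', 'UH', 'UOS']
```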


Wiki page for methodology development: http://dfmethodologies.wikispaces.com/Digital+Forensics
On this wiki, you can also contribute improvements to existing methodologies and help develop new ones.

Digital Forensics’ core idea is not to change digital evidence, regardless of the storage format being examined, by following a forensic triage that uses forensically sound tools. The triage is a hierarchical process in which the first step is to acquire the data with a bit-by-bit copy, so that the analysis can be done on the copy instead of the original storage device. Before analysis, the integrity of the data must be maintained by protecting both the acquired and the original data. The forensic triage of any type of data stored on any operating system must be performed with tools accepted in a court of law. The basic characteristic of a forensically sound tool is that it is testable, so that the relevant scientific community can review its operation and determine its expected error rate. Thus, tools must be made available for testing and peer review before they can be used to perform any stage of the forensic triage. In most cases, in lieu of extended testing, tools from different sources with similar capabilities can be used to validate the accuracy of the findings. Most IT practitioners overlook the importance of the final stage of the investigation: PRESENTATION is just as important as the other stages. Writing the report and presenting the findings of the case in an easy-to-understand manner are key to helping decision makers understand the relevance of the located facts. Presentation is an indirect part of digital forensics that is based on the located relevant evidence; thus, we can only refer to it as forensics.
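
The acquisition and integrity steps can be sketched with cryptographic hashes: hash the stream while making the bit-by-bit copy, record the values, and re-hash the working copy before each analysis session. This is a minimal sketch, assuming the source is readable as a regular file. A real device would need appropriate privileges and a hardware write blocker, and handling of unreadable sectors is omitted. MD5 and SHA-1 are used only because both are commonly recorded in acquisition notes; SHA-256 could be substituted. The function names are hypothetical.

```python
import hashlib
from typing import Dict

CHUNK = 1024 * 1024  # read 1 MiB at a time so large images need not fit in memory


def hash_file(path: str) -> Dict[str, str]:
    """Compute MD5 and SHA-1 of a file in one pass."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def acquire(source_path: str, dest_path: str) -> Dict[str, str]:
    """Bit-for-bit copy of source to dest, hashing the stream as it is read."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        for chunk in iter(lambda: src.read(CHUNK), b""):
            dst.write(chunk)
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def verify_copy(dest_path: str, acquisition_hashes: Dict[str, str]) -> bool:
    """Re-hash the working copy; any change since acquisition breaks the match."""
    return hash_file(dest_path) == acquisition_hashes
```

Re-computing the same hashes with a second, independently developed tool is one practical way to follow the advice above about validating results with tools from different sources.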

Note: eDiscovery and always-on devices such as cell phones and PDAs, along with the need for real-time analysis of volatile data, make protection and validation a challenge if they are questioned against the traditional methodology.


Note: In the methodology above, storage formats list the major branches or operating system types, since one of the functions of an operating system is to manage the file system; therefore, the operating system's implementation controls how data is actually stored on secondary storage.
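
Because each operating system family leaves its own file system structures on disk, the storage format can often be recognized from a few well-known signatures. The sketch below assumes it is given the bytes of a single volume (a partition image, not a whole disk, which would first require reading the partition table). It checks only the NTFS OEM ID, the FAT type label, the ext2/3/4 superblock magic, and the HFS+/HFSX volume header signature. The FAT label is informational and can be missing or wrong, so a real tool would also validate the boot sector's parameter fields. The function name is hypothetical.

```python
def identify_filesystem(volume: bytes) -> str:
    """Guess a volume's file system from well-known on-disk signatures.

    Assumes `volume` starts at the first byte of a partition (not of a whole disk).
    """
    if volume[3:11] == b"NTFS    ":                  # OEM ID in the NTFS boot sector
        return "NTFS"
    if volume[82:90] == b"FAT32   ":                 # FAT32 file system type label
        return "FAT32"
    if volume[54:62] in (b"FAT12   ", b"FAT16   "):  # FAT12/16 file system type label
        return volume[54:59].decode("ascii")
    if volume[1080:1082] == b"\x53\xef":             # ext2/3/4 superblock magic 0xEF53
        return "ext2/3/4"
    if volume[1024:1026] in (b"H+", b"HX"):          # HFS+ / HFSX volume header signature
        return "HFS+/HFSX"
    return "unknown"


fake_ntfs = bytearray(2048)
fake_ntfs[3:11] = b"NTFS    "
print(identify_filesystem(bytes(fake_ntfs)))  # NTFS
```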

Understanding the concepts discussed above aids the development of specific methodologies, for example in application research. Education provides an understanding of what happens as a result of user interaction, how to identify those changes, and why we might misinterpret data if we do not look at all aspects of user interaction. A user can create files on the desktop; the operating system records that action in the file system, and applications can record it in their logs. Applications and operating systems can record changes in many different ways, but on Windows the most common place these days is one of the registry hives and/or their log files.
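
Registry evidence is largely about time: each registry key carries a last-written timestamp stored as a Windows FILETIME, a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC). As a minimal sketch, assuming the raw 8 bytes have already been located inside the hive (parsing the hive file format itself is not shown), the conversion looks like this. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME integer to a timezone-aware UTC datetime."""
    # Integer division drops the sub-microsecond remainder (datetime's resolution limit).
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def filetime_from_bytes(raw: bytes) -> datetime:
    """Decode the 8-byte little-endian FILETIME as it is stored on disk."""
    return filetime_to_datetime(int.from_bytes(raw[:8], "little"))


print(filetime_to_datetime(116444736000000000))  # 1970-01-01 00:00:00+00:00
```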

In the following methodology, you can see the focus on the registry and on understanding that registry changes might be updated at different operating system states. In this methodology, you can distinguish user, application, and operating system changes. This way, you can establish whether there was any user interaction at all, or whether an application made changes without user interaction. For example, a user might schedule a task to run later, but the actual event will take place without that user being present at that time. This methodology is hard for automatic tools to cross-reference, and those tools analyze it only partially; only an educated examiner can identify flaws in, or complete, the partial analysis of the software tools.
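
A simple way to start separating user activity from automated changes is to merge events from several sources into one timeline and flag the ones that fall outside any interactive logon session. The sketch below assumes the events and session windows have already been extracted; the Event structure, source labels, and sample events are all hypothetical. A flagged event is a hypothesis to test, not a conclusion: as noted above, a user may have scheduled the task that made the change.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    when: datetime
    source: str        # hypothetical labels, e.g. "registry", "file system", "app log"
    description: str


def build_timeline(*sources: Iterable[Event]) -> List[Event]:
    """Merge events from several artifact sources into one chronological list."""
    return sorted((e for src in sources for e in src), key=lambda e: e.when)


def outside_sessions(events: List[Event],
                     sessions: List[Tuple[datetime, datetime]]) -> List[Event]:
    """Events that fall outside every interactive logon session (start, end)."""
    return [e for e in events
            if not any(start <= e.when <= end for start, end in sessions)]


registry = [Event(datetime(2012, 7, 2, 23, 30), "registry", "registry value modified")]
files = [Event(datetime(2012, 7, 2, 10, 5), "file system", "report.docx created on desktop")]
sessions = [(datetime(2012, 7, 2, 9, 0), datetime(2012, 7, 2, 17, 0))]

timeline = build_timeline(registry, files)
for event in outside_sessions(timeline, sessions):
    print(event.when, event.source, event.description)
```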

Conclusion

This paper explored clarifying the terminology and developing a methodology-based explanation of aspects of Digital Forensics. Methodologies like the ones presented in this paper can be used to develop more educated professionals in this industry and will help improve tool development. I hope to receive feedback on these methodologies and to develop others with the digital forensic community's help.