Monday, December 22, 2014

Hash and test

One of the most basic concept we learn in digital forensics is to ensure our evidence is not changed after acquisition is hashing.  Hashing helps verify the integrity of the data and helps reduce the dataset by identifying known good files.  Hashes can also identify known "bad" data or partial hashes can identify data that are close enough to investigate further for relevance.  Of course, hashes are also used to store passwords for authentication.  There are many algorithms available, but each algorithm must work exactly the same in software implementations.

When using libraries and third party implementations, you still need to test and validate if the implementation works are designed and implemented properly.

The following is an implementation using third party library:

using System;
using XCrypt;
//http://www.codeproject.com/Articles/483490/XCrypt-Encryption-and-decryption-class-wrapper
//Click to download source "Download source code"
//Click on Project -> Add Reference -> navigate to where you have extracted XCrypt.dll

namespace hashMD5
{
    class Program
    {
        static void Main(string[] args)
        {
            XCryptEngine encrypt = new XCryptEngine();
            encrypt.InitializeEngine(XCryptEngine.AlgorithmType.MD5);
            Console.WriteLine("Enter string to hash:");
            string inText = Console.ReadLine();
            string hashText = encrypt.Encrypt(inText);
            Console.WriteLine("Input: {0}\r\nHash: {1}", inText, hashText);
            byte[] temp=GetBytes(hashText);  //for debugging to see each byte value
            Console.ReadLine();

        }
        static byte[] GetBytes(string str)
        {
            byte[] bytes = new byte[str.Length * sizeof(char)];
            System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
            return bytes;
        }
    }
}

Running the code results in the following output.

Enter string to hash:
Richland College
Input: Richland College
Hash: zlC4yZP3XqYqqboh5Lv4IA== 

The output looks strange and more like Base64 than MD5.  We can place break points in the code and monitor for the actual byte values to see the results to see if it is even close to the actual solution.


We can see the hash values are 122, 0 , 108, 0 ...
Now, let see another program implementation of MD5:
using System;
using System.Collections.Generic;
using System.Text;
using System.Security.Cryptography;

namespace anotherHashMD5SHA1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Enter an message: ");
            string message = Console.ReadLine();
            System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
            MD5 md5 = new MD5CryptoServiceProvider();
            SHA1 sha1 = new SHA1CryptoServiceProvider();
            byte[] messageBytes = encoding.GetBytes(message);
            byte[] hashmessage = md5.ComputeHash(messageBytes);
            string stringMD5 = ByteToString(hashmessage);
            hashmessage = sha1.ComputeHash(hashmessage);
            string stringSHA1 = ByteToString(hashmessage);
            Console.WriteLine("MD5: {0}\r\nSHA-1: {1}", stringMD5, stringSHA1);
//Console.WriteLine("MD5: {0}\r\nSHA-1: {1}",System.Text.Encoding.Default.GetString(hashmessage), stringSHA1);
            Console.ReadLine();

        }
        public static string ByteToString(byte[] buff)
        {
            string sbinary = "";
            for (int i=0; i < buff.Length; i++)
            {
                sbinary += buff[i].ToString("X2");
            }
            return (sbinary);
        }
    }
}
And the output of this code is as follows,
Enter an message:
Richland College
MD5: CE50B8C993F75EA62AA9BA21E4BBF820
SHA-1: B3A6FC316A94949871594C633C8977D28C70E8B7
So, we also need to see what the resulting byte values are for the hash value in order to see if we just have different encoding of the same byte values displayed and the results are really the same or not.
No, we do not have the same byte values, this one gives us 206, 80, 184, 201, ..., so witch one do we trust and use in our code?
You can use a few IT tools to see what the results of those tools will be.  I recommend HashOnClick.
http://www.2brightsparks.com/onclick/hoc.html
You can create a simple text file, in this case, I used the same text like I used with tool, "Richland College".  
CE50B8C993F75EA62AA9BA21E4BBF820 testfile.txt
The results show the same value as the second code sample, so the second code sample should be implemented.
So, as you can see, there are many implementations of the same algorithm and programmers should use libraries and code from others as much as possible to increase productivity and reduce development time, but only responsible code selection can lead to meaningful and more secure code.  Maybe secure coding should have a prerequisite of knowing IT tools and understanding what we expect tools to do before we try to implement code by compiling and "crossing fingers".

No comments:

Post a Comment