Pt 1 - Chaos Based Encryption applied to stegoanalysis

Pt 1 - Chaos Based Encryption applied to stegoanalysis

Before diving deeper into the implementation and analysis of image fiel types, I would like to acknowledge Lazaros Moysis and his YouTube series on Chaos-Based Encryption. Youtube Channel: https://www.youtube.com/@lazarosmoysis5095

Each section of this article will feature one of his videos, along with a detailed explanation of its content. The goal is to provide a more detailed breakdown of the topics he covers, as well as possible scripts or implementations he might be using.

Understanding Image Functionality

How it is represented

To start understanding how an image is represented, the knowledge of a NxM matrix is needed.

$$ A = \begin{bmatrix} a_{ij} \end{bmatrix} $$ $$ \quad a_{ij} \in \mathbb{Z}, ; a_{ij} \in [0, 255], ; i = 1, \dots, N, ; j = 1, \dots, M $$ Every integer in the interval [0,255] can be represented usinng 8 bits, following the fundamentals of number theory:

$$ a_{ij} = a_{ij}^1 {{2^0}} + a_{ij}^2 {{2^1}} + a_{ij}^3 {{2^3}} + a_{ij}^4 {{2^4}} + a_{ij}^5 {{2^5}} + a_{ij}^6 {{2^6}} + a_{ij}^7 {{2^7}} + a_{ij}^8 {{2^8}} $$

Therefore, image creation involves manipulating a matrix, processing each pixel bit by bit. The following is an example:

With a 4 pixel Image, it will follow the process:

import numpy as np

imageData = np.array([[0, 85], [170, 255]], dtype=np.uint8)

def extractBits(image):
    height, width = image.shape
    bitLayers = np.zeros((8, height, width), dtype=np.uint8)
    
    for bit in range(8):
        bitLayers[bit] = (image >> bit) & 1
    
    return bitLayers

bitLayers = extractBits(imageData)

for bit in range(8):
    print(f"Layer {bit + 1}:\n{bitLayers[bit]}", end='\n\n')

Output:

Layer 1:
[[0 1]
 [0 1]]

Layer 2:
[[0 0]
 [1 1]]

Layer 3:
[[0 1]
 [0 1]]

Layer 4:
[[0 0]
 [1 1]]

Layer 5:
[[0 1]
 [0 1]]

Layer 6:
[[0 0]
 [1 1]]

Layer 7:
[[0 1]
 [0 1]]

Layer 8:
[[0 0]
 [1 1]]

For greater clarity, each layer can be compared to the positional values of bits in a binary representation.

>>> print(bin(0))
0b0
>>> print(bin(85))
0b01010101

>>> print(bin(170))
0b10101010
>>> print(bin(255))
0b11111111

PD: In the layers the first number goes to the back, and the last to the front, so the number 0babcdefgh in layers will be hgfedcba

Representation of Authentic Images

And all of this applies only to a black-and-white image. When working with full RGB color, things become more complex due to the increased amount of data. As a fun fact, when compressing an image, what you are doing exactly is to reconstruct the image from levels 5-8, 6-8, usually, obtaining the last 3 bits.

Python Script: RGB Process

To apply in RGB images replace the 8 bits to 24 bits, diving each 8 block bits for Red Green and Blue colors.

import numpy as np

imageData = np.array(
	[
	[  # R    G    B
		[85, 170, 255], 
		[255, 255, 255]
	],
	[
		[170, 85, 255],
		[0, 0, 0]
	]
	],
dtype=np.uint8)

def extractBitsRGB(image):
    height, width, channels = image.shape
    bitLayers = np.zeros((8, height, width, channels), dtype=np.uint8)
    for bit in range(8):
        for channel in range(3):
            bitLayers[bit, :, :, channel] = (image[:, :, channel] >> bit) & 1
    return bitLayers

bitLayers = extractBitsRGB(imageData)

# Mostrar los resultados
for bit in range(8):
    print(f"Red:\n{bitLayers[bit][:, :, 0]}",end='\n\n')
	print(f"Green:\n{bitLayers[bit][:, :, 1]}",end='\n\n')
    print(f"Blue:\n{bitLayers[bit][:, :, 2]}",end='\n\n')

Once understood, the basics, we can proceed with the Chaos-Based-Image Encryption.


Chaos Based Image Encryption

Why to choose Chaos-Based Encryption?

Chaos-Based Encryption (CBE) are very sensitive with their seed, with a minimun change in the seed can cause a total diferent result, this can be very useful, to create dynamic keys or encrypted patterns with a low entropy. Making harder to decrypt with linear and diferencials attacks based on the correlations between input and output Mainly the CBE is used in multimedia content such as images, videos, audio and text, with a simple matemathical operations more than AES or RSA, can be run in a low cost hardware.

Furthermore in the Differential Cryptanalysis on Chaotic Based Image Encryption Scheme article from Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, 43000 Kajang, Selangor, Malaysia mentioned in the NPCR and UACI cryptoanalysis tests says:

The rapid development of computer network technology allows widespread transmission of multimedia data such as images and videos over insecure communication channels.> Therefore, a fast and secure image encryption method is deemed important to protect the multimedia data from being accessed by the unauthorized users. The traditional data encryption methods such as Data Encryption Standard (DES), Advanced Encryption Standard (AES) and International Data Encryption Algorithm (IDEA) are no longer suitable for image encryption due to the bulk data capacity and high data capacity. To overcome this drawback, different image encryption methods were proposed based on DNA [1,2], hash function [3], Substitution-box (S-box) [4] and chaotic maps [5–13].

On the other hand, it is true that traditionals methods like AES, RSA and ECC are well documented and standardized by organizations such us NIST. CBE being short of documentation complicating the implementation of such encryption. Furthermore in systems where is used the 32 bits may cause rounding errors (reducing security), if you configure it in a wrong way…

There can be specific attacks for these chaotic algorithms, such us: https://rac.es/ficheros/doc/01213.pdf or the mentioned before: Differential Attacks in CBE

def fun(x9,r,n):
	x = np.zeros(n) 
	x[0] = x0 
	for i in range(1, n): 
		x[i] = r * x[i-1] * (1 - x[i-1]) 
	return x

Differences Compared to Standard Cryptography

Something to keep in mind is the different technicalities between a chaotic system and a cryptographic algorithm.

  graph TD
  subgraph Crypto_Algo [Cryptographic algorithms]
    CA1["Cryptographic algorithms"]
    CA2["Phase space: finite set of integers"]
    CA3["Rounds"]
    CA4["Key"]
    CA5["Diffusion"]
    CA6["Security and performance"]
    CA1 --> CA2 --> CA3 --> CA4 --> CA5 --> CA6
  end

  subgraph Chaotic_Sys [Chaotic systems]
    CS1["Chaotic systems"]
    CS2["Phase space: (sub)set of real numbers"]
    CS3["Iterations"]
    CS4["Parameters"]
    CS5["Sensitivity to initial conditions"]
    CS6["Unpredictability"]
    CS1 --> CS2 --> CS3 --> CS4 --> CS5 --> CS6
  end

Security Testing and Image Analysis

Below is a detailed explanation of several techniques used to assess the security of a chaotic system

Histogram Analysis

There are two mainly encryption methods with CBE:

  • Permutation = Shuffling the pixels of the image
  • Substitution = Changing the values of the pixel images

By reading this, you can think…

Is pixel permutation enough?

This is a very clear and brief question about the comparation between shuffling and encryption, and the fast answer is: no. Buy why is it not enought? Well, permutation only changes the positions of pixels, meaning that their values still the same. Furthermore permutation is inherently deterministic and reversible.

Reading and investigating I found the following conclusion:

$$ \forall a, b \in S,, \pi(a) = \pi(b) \Rightarrow a = b $$ $$ \forall y \in S,, \exists x \in S ; \pi(x) = y $$

A permutation is a bijective mapping, a one-to-one and onto function, from a set to itself. So the set of all permutations of n elemnts forms the symmetric group S. Since permutatinos are bijections, we can conclude they are invertible, being:

$$ \pi^{-1}(\pi(x)) = x $$ So the reversibility of permutations come from the permutations invertibles bijections from the group theory.

The group theory is the studies of algebraic structures know as groups. And says that a group is a set of elements and an operation that meets 4 other conditions:

  1. $\text{It is closed if a,b} \in G \text{, then }ab \in G.$
  2. $\text{It has an identity, if there is an } e \in G \text{ such that for all }a \in G : ea = ae = a.$
  3. $\text{Every elemnt has an inverse for all } a \in G \text{ there is a } b \in G \text{ suich that } ab = e.$
  4. $\text{Its operation is associative for all }a,b,c \in G : (ab)c =a(bc).$

If you want to learn more about group theory watch: https://www.youtube.com/watch?v=IRZK3Dq0OZM.

To conclude this section it is demostrated that permutation is a symmetric group and it can be inverted, being more insecure than encryption.


For checking, one part of the security of the encryption we can use the Histogram, which decipts the distributions of greyscale values. Images depicting information will have a non-uniform histogram.

The histogram of the original image and the permutated/shuffled iamge are the same while the substituted image are diferent, usually with a smaller variance…

If you are treating with non-gray images, such us RBG formats, you will see a histogram for each color, here an example of original image in RGB and substituted/encrypted image in RGB:

Pixel Correlation Analysis

Another method for testing the security is the Correlation, measuring similarity between adjacent pixels.

Adjacent Pixel refers to the pixel points in an image that are positioned next to each other. The correlation between adjacent pixels is used to evaluate the encryption effect, with original images showing strong correlations and encrypted images displaying little to no correlation among neighboring pixel points.

It can be used to asses the encryption effect, there are 4 main formulas to calculares this correlation between neigboring pixels:

E(x) - Expected value (Mean)

$$ E(x) = \frac{1}{N} \sum_{i=1}^{N} x_i $$ This formula represents the average value of the random variable x, with this information we are able to know the average color and brightness of the pixel, and can help us to detect if there are posibilities of a LSB (Least Significant Bits) stego method. E(x) is used in statistical prediction models and in some classification algorithms to normalize additional data for multimedia file types.

It gives us an idea of the central value of the data. In images, if the average is high, the image is generally bright, else the image is dark. To detect the diference we need to follow this simple formula:

$$ \Delta E = E(x_{\text{modified}}) - E(x_{\text{original}}) $$

Script in Sage (Python Interpreter)

Manual

Now we are going to recreate the formula in sage, for this we need to understand what is x and N:

  • x represents the intensity of value of each pixel
  • N is the amount of pixels in the image (width × height)
def E(x):
    N = len(x)
    return sum(x) / N
OpenVC with Numpy

Anyways we have public modules with this imported… Using OpenVC and Numpy we can do the same function:

import cv2
import numpy as np

image = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
meanValue = np.mean(image)

print(meanValue)

Type 1 | Spatial Autocorrelation Example

Okay, perfect for the moment, but how we can use it, only having the suspicious file we can calculate the diference we mention before: $$ \Delta E = E(x_{\text{modificado}}) - E(x_{\text{original}}) $$ I will use the implementation of OpenVC and Numpy, but you can try to use the manual one:

import cv2
import numpy as np

image = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

diffX = np.abs(image[:, 1:] - image[:, :-1])
meanX = np.mean(diffX)

diffY = np.abs(image[1:, :] - image[:-1, :])
meanY = np.mean(diffY)

print(f"Variantion axis X: {meanX}")
print(f"Variantion axis Y: {meanY}")

Output:

Variantion axis X: 96.24803571428572
Variantion axis Y: 90.30474206349206
Comparation of result

To check if it is a suspicious variation we need to compare with normal images, once we have a few examples we run this code, if Z-Score is between [-2, 2] we can consider a normal output, otherwise the image could be manipulated

import numpy as np

# Normal variances in images
variancesX = [95, 98, 100, 96, 92, 94, 99, 97, 101, 93]
variancesY = [90, 89, 88, 92, 94, 91, 93, 87, 95, 90]

meanX, stdX = np.mean(variancesX), np.std(variancesX)
meanY, stdY = np.mean(variancesY), np.std(variancesY)

varX = 96.24803571428572
varY = 90.30474206349206

ZscoreX = (varX - meanX) / stdX
ZscoreY = (varY - meanY) / stdY

print(f"Z-score X: {ZscoreX}")
print(f"Z-score Y: {ZscoreY}")

Type 2 | Histogram Analysis

This method is simpler than the Spatial Autocorrelation but is less accurate, we create an Histogram taking the values of the pixeles (0-255) and with matplot we search for any peak/beak in the image…

import matplotlib.pyplot as plt
import cv2

image = cv2.imread("mia.jpg", cv2.IMREAD_GRAYSCALE)
hist = cv2.calcHist([image], [0], None, [256], [0,256])

plt.plot(hist)
plt.title("Histogram Intensity")
plt.xlabel("Intensity value (0-255)")
plt.ylabel("Frequency")
plt.show()

Here are the results of two examples, honestly I don’t use and I don’t recomend anyone this method, but is good to know about it:

Type 3 | Local Average Analysis

import cv2
import numpy as np
import matplotlib.pyplot as plt
  
image = cv2.imread("uno.jpg", cv2.IMREAD_ANYCOLOR)
if len(image.shape) == 3:
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

block_size = 8 
h, w = image.shape
means = []


for i in range(0, h, block_size):
    for j in range(0, w, block_size):
        block = image[i:i+block_size, j:j+block_size]
        means.append(np.mean(block))

plt.hist(means, bins=50, color='blue', alpha=0.7)
plt.title("Distribución de Medias por Bloques")
plt.xlabel("Media del Bloque")
plt.ylabel("Frecuencia")
plt.show()

We can detect the bright of each range of pixels with their frequency…

But anyways, the exhaustive analysis come in the followings formulas.

D(x) - Variance

$$ D(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))^2 $$

Full formula (Integration of Expected value):

$$ D(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - (\frac{1}{N} \sum_{i=1}^{N} x_i,))^2 $$ The variance trys to explain how far the values of a variable are from their expected mean. In terms of images, it shows us the intensity values of each pixel and its vary. This variance is mainly calculated in this part of the formula, inside the summation:

$$ (x_i - E(x))^2 $$ Being x the value of the pixel and E(x) the central value of the image. It is squared to avoid negative numbers, in case it may occur.

How to interpret the results (Theory)

It may be a high variance or a low one, here a poor definition and why it may happen.

High Variance:

When the values are widely dispersed from the average, can means a possible manipulation in the pixels, but can be a false flag, considering the following cases:

  1. High detailed image
  2. High color contrast
Low Variance:

Usually in monochrome images where the pixels are closely to the average value, meaning there is no alteration in the pixels, it also can occurs in:

  1. Blurred image
  2. Natural images, which follows a pattern

Script in Sage

Manual
from ExpectedValue import E

def D(x):
	N = len(x)
	
	def squareDIF():
		for xi in x:
			yield (xi - E(x))**2

	return sum(squareDIF()) / N

For non-experience python programmers, the funcion squareDIF() using yield will return an array, known as *args, with the values of the summation, using this to avoid a oneliner such us:

return sum((xi - E(x))** 2 for xi in x) / N

But as yield can cause a call overhead, we will use list comprehensions, for more optimization:

import cv2

def E(x):
    return sum(x) / len(x)

def D(x):
    N = len(x)
    return sum((xi -  E(x)) ** 2 for xi in x) / N 

def readIM(file):
    img = cv2.imread(file, cv2.IMREAD_ANYCOLOR) 
    if img is None:
        raise ValueError("File not found")
    
    return img.flatten().tolist()

data = readIM('uno.jpg')
print("Variance:", D(data))

Using this example we will obtain the following output:

Variance: 4225.138798415301
OpenVC with Numpy
import numpy as np
import cv2

image = cv2.imread("uno.jpg", cv2.IMREAD_ANYCOLOR)
variance = np.var(image)

print("Variance:", variance)

As output we will get:

Variance: 4225.138798416118

Interpret results (Practice)

As we can see both codes give us the same output with an only difference of: $8.17 \times 10^{-13}$. As we are here to learning, we are going to compare how much time spend each program.

With the manual process:

Variance: 4225.138798415301
Execution Time: 0.142820 seconds

With the modules process:

Variance: 4225.138798416118
Execution Time: 0.008577 seconds

We can see, of course, the computer process of the numpy library is much faster than the manual one… so it is not effective as the Expected Value (mean).

I want to remember that with this information we are not able to get a 100% acurrate rate of knowing is an image suffers from LBS but can helps us to understand future advanced stego tools.

cov(x,y) - Covariance (Population Covariance)

$$ \operatorname{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))(y_i - E(y)) $$

Full formula (Integration of Expected value):

$$ \operatorname{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - (\frac{1}{N} \sum_{i=1}^{N} x_i))(y_i - (\frac{1}{N} \sum_{i=1}^{N} y_i)) $$

Covariance is the next level of Variance formula, where measures the relationship between two variables, usually horizontal and vertical pixels, showing how the vary together. There can be 3 types of results applying this formula:

  1. Positive Covariance - Direct relationship
  2. Negative Covariance - Inverse relationship
  3. Near Zero Covariance - Null relationship

This is a long and almost infinite content with a lot of diferent ways of calculating, but we are mainly focussed in the saga of videos, but here two usefulls reports to read:

When we talk about the covarianze of images, we focus on the pattern of change of two variables simultaneosly. This helps us to understand the differencies between satellite images in unlike bands, or the R and G channel of an RGB image…

To be honest, this a inmense and almost infinite world but as I would like this study to come to light someday, I will not go much deeper into these topic… since it is only an uncertain study to verify whether or not there is considered steganography, and we are not in the third video of twelve… (If you want to go deeper please contact me and we can study and understand it better)

Script in Sage

from ExpectedValue import E

def cov(x,y):
    N = len(x)
    r = 0
    
    for i in range(N):
        r += (x[i] - E(x)) * (y[i] - E(y))

    return (1 / N) * r

cov(x,y) - Sampling covariance

This is also a covariance but used for a fraction of the image, the formula we have seen is if the input data, the image, is the raw one, without modifications or trimming. If we have change the sample size the Sampling covariance will be more exactly that the Population. The formula varies in the way we write the denominator, we have seen: N but for this new type of covariance we will subtract one.

$$ \operatorname{cov}(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - E(x))(y_i - E(y)) $$

Both, the Latex formula and the Sage script, are almost the same.

from ExpectedValue import E

def cov(x,y):
    N = len(x)
    r = 0
    
    for i in range(N):
        r += (x[i] - E(x)) * (y[i] - E(y))

    return (1 / (N - 1)) * r

γ(x,y) - Pearson correlation coefficient

$$ \gamma(x, y) = \frac{\operatorname{cov}(x, y)}{\sqrt{D(x)} \sqrt{D(y)}} $$

Full formula (Integration of Variance + Expected value + Covariance):

$$ \gamma(x, y) = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_i - (\frac{1}{N} \sum_{i=1}^{N} x_i))(y_i - (\frac{1}{N} \sum_{i=1}^{N} y_i))}{\sqrt{(\frac{1}{N} \sum_{i=1}^{N} (x_i - (\frac{1}{N} \sum_{i=1}^{N} x_i))^2)} \sqrt{(\frac{1}{N} \sum_{i=1}^{N} (x_i - (\frac{1}{N} \sum_{i=1}^{N} y_i))^2)}} $$

Also this formula can be spotted in google as: $$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} $$

Pearson’s correlation coefficient is a statistical measure to quantify the linear relationship between two associated pixels. Used for the detection of possible modifications in an image caused by the insertion of additional information.

In the formula, in the numberer, the function cov(x,y) is a population covariance without normalization; and in the denominator will be the standard deviations without normalizing. But if you want to take a sample or a small fraction of the image you will need to normalizate both (but at the end of the day… there will be no changes, because the denominator where the normalization is located will be simplified), numberer and denominator, the numberer was mention before but the variance formula will result in:

$$ D(x) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - E(x))^2 $$ By writting down the “Sampling Pearson Correlation Coefficient” this big and complex formula will appear.

$$ \gamma(x, y) = \frac{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - (\frac{1}{N-1} \sum_{i=1}^{N} x_i))(y_i - (\frac{1}{N-1} \sum_{i=1}^{N} y_i))}{\sqrt{(\frac{1}{N-1} \sum_{i=1}^{N} (x_i - (\frac{1}{N} \sum_{i=1}^{N} x_i))^2)} \sqrt{(\frac{1}{N-1} \sum_{i=1}^{N} (x_i - (\frac{1}{N} \sum_{i=1}^{N} y_i))^2)}} $$ Before simplifying the formula I want to mention that the standarization/normalization is the action of subtracting one to the amount of pixels in the (piece of the) image. If you have paid attention you will see that the expected value is not normalized, but why the rest of them are and this one is not?

The expected value (mean) is calculated by dividing the sum of the values by N (or by N if it is the population), and is not “normalized” in the same way. This is because in the Pearson Correlation or in Sampling Pearson Correlation the denominators will simplify, so they need to be in an equal status to be able to work correctly.

After that, we continue with the simplyfied formula. Starting with a variable change will be performed, instead of calling the sumatory of the expected value we are going to name it: \bar{x} and \bar{y}

$$ \bar{x} = E(x) = \frac{1}{N} \sum_{i=1}^{N} x_i $$ $$ \bar{y} = E(y) = \frac{1}{N} \sum_{i=1}^{N} y_{i} $$ Then canceling all denominator, that’s why you need to have Covariance and Variance in the same type, or normalizated/sampled or not, you will found the exact formula found in Google. $$ \gamma(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 , \sum (y_i - \bar{y})^2}} $$ If we name properly the formula we will have $$ \gamma(x, y) = \frac{\operatorname{cov}(x, y)}{\sqrt{D(x)} \sqrt{D(y)}} $$

Script in Sage

from ExpectedValue import E
from Variance import D
from Covariance import cov

def r(x,y):
	numberer = cov(x,y)
	denominator = D(x) * D(y)

	return numberer / denominator

Interpretting results

The outcome of this formula is inside the range of -1 and 1, depending on the proximity of each value: {-1, 0, 1}: $$ \gamma(x,y) \approx 1: \text{The relation between the pixels is strong and in principle does not suffer from manipulation} $$ $$ 1 > \gamma(x,y) \ge 0.1: \text{High probability that the image have been manipulated. If } \gamma = 0 \text{ there are not correlation between pixels from } x \text{ and } y. $$ $$ \gamma < 0: \text{The pixels are inversely proportional, also high probability of being manipulated.} $$ We have finish the Pixel Correlation Analysis, this is more focussed on how images work, but can help us in the detection of steganography, but I repeat my-self again, IT IS NOT ACCURATE.

Testing for Security

As all we have seen, we can study the correlation of pixels without a problem, having the comparation of 3 images:

  1. Plaintext
  2. Shuffled
  3. Encrypted

With their correlation:

Entropy Analysis

Let us begin with the precise definition of entropy:

Computting entropy:

In computing, entropy is the randomness collected by an operating system or application for use in cryptography or other uses that require random data. This randomness is often collected from hardware sources, either pre-existing ones such as mouse movements or specially provided randomness generators.

Information theory entropy

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable

The conclusion is that the enyropy is the amount of informatino and unpredictability/uncertainty present in an image. The highest theoretical value of the entropy is 8, the higher entropy denotes high unpredictability/uncertainity pixel correlation, being the higher, the safer. To continue a detailed explanation of the formula, with a main source of the survey of Computational entropy of Harvard.

Entropy formulas (Unidimensional)

$$ E = - \sum_{2i=0}^{2^L - 1} p_i \log_2 p_i $$

It mentions the measures of the entropy when X is a discrete random variable, the same thing that we are studying:

$$ H(X) = \mathbb{E}_{x \leftarrow X} \left[ \log_2 \left( \frac{1}{\Pr[X = x]} \right) \right] $$ At a glance, it seems to be a completed difference formula, but it is the same formula, but the implementation of some variable change are necesary, the expectation can be written as a sum over all possible values of x (in X).

$$ \mathbb{E_{x \leftarrow X}} = \sum_{ x \in \mathcal{X} } Pr[X = x] $$

In a correct notation. which will be not used, is writed as,

$$ \mathbb{E}|X| = \sum_{x \in \mathcal{X}} x \times Pr[X = x] $$

Pr[X = x] is the probability of the molecular proposition X has the value of the atomic proposition x.


After the variable chage, it will be a result of:

$$ H(X) = \sum_{x \in \mathcal{X}} {\Pr[X = x]} \left[ \log_2 \left( \frac{1}{\Pr[X = x]} \right) \right] $$

For convenience all possible values of x will be named as p, and the possible value of x equals i

$$ p_{i} = {\Pr[X = i]} $$

$$ H(X) = \sum_{i} {p_{i}} \left[ \log_2 \left( \frac{1}{p_{i}} \right) \right] $$

By properties of logarithms can be:

$$ H(X) = \sum_{i} {p_{i}} \log_2 \left( \frac{1}{p_{i}} \right) = \sum_{i}{p_{1}} \log_2 \left( {p_{i}} \right)^{-1} = -\sum_{i}p_{i} \log_{2} p_{i} $$ Finally, considering the value of X are: i = 0,1,...,2L-1, where L is the number of possible binary values, with two possibles results, (1 and 0), the number of combinations could be calculated as: 2^L.

Demostrating the correlation of both formulas. $$ E = - \sum_{2i=0}^{2^L - 1} p_i \log_2 p_i $$ Now that the relationship is mentioned the following formulas can be understood. But before, the same formula scripped in Sagemath.

Sage Script
from math import log

def Entropy(L:list,E:float=0.0) -> float:
	for p in L:
	    if p > 0:
		    E -= p*log2(p)

	return E

min-entropy | Minimun

Formula

$$ H_{\infty}(X) = min_{x}\left[ \log_{2}\left( \frac{1}{Pr[X=x]} \right) \right] = \log_{2}\left( \frac{1}{max_{x} Pr[X=x]} \right) = - \log_{2} (max_{x} Pr[X=x]) $$

Sage Script
from math import log2

def H_Inf(X:list) -> float:
	Pmax = max(X)
	return -log2(Pmax)


max-entropy | Maximun

Also know as the Entropy of Hartley, uses the set-builder notation which obtains only the non-negativa values from the source predicate.

Formula

$$ H_{0}(X) = \log |{x : Pr[X=x] > 0}| $$

Sage Script
from math import log2

def H_0(X:list) -> float:
	x=0.0
	for xi in X:
		if xi > 0:
			x+=1

	return log2(x) if x != 0 else 0.0

Unimensional review

H(X) measures the average number of bits of randomness in X, while H_{infty}(X) and H_{0}(X) are worst-case lower and upper bounds on H(X). Indeed, we have:

$$ H_{\infty}(X) \le H(x) \le H_{0}(X) $$ $$ \text{with equalitity if and only if X is uniform on } {x : Pr[X=x] > 0} \text{ ; that is, X is a flat distribution} $$

But… all of this only for an unidimensional array, in terms of images, only the horizontal axis, so now, the full image, a bidimensional Shannon & Hartley Entropy will be shown. Before entering dellate, it is possible to flatten the image:

>>> import numpy as np
>>> image = np.random.random((3,3))
>>> print(image)
[[0.10002498 0.83730832 0.26422208]
 [0.42196167 0.34551549 0.68976181]
 [0.08653794 0.23869748 0.69132533]]
>>> flattenImage = image.flatten()
>>> print(flattenImage)
[0.10002498 0.83730832 0.26422208 0.42196167 0.34551549 0.68976181
 0.08653794 0.23869748 0.69132533]

If the array needs to be flattened and they not come from numpy the following function can be used:

flattenImage = [pixel for row in p for pixel in row]

The results will be the same, a very good option to avoid complications. Also, this is only for greyscale images, for RGB scale, it is necesary to split the channels.

def splitI(image):
	Rc = []
	Gc = []
    Bc = []
    
    for row in image:
        Rr = []
        Gr = []
        B_row = []
        
        for pixel in row:
            R, G, B = pixel
            R_row.append(R)
            G_row.append(G)
            B_row.append(B)
        R_channel.append(R_row)
        G_channel.append(G_row)
        B_channel.append(B_row)
    
    return R_channel, G_channel, B_channel

imageRGB = [[(),(),(),()],[(),(),(),()],[(),(),(),()]]
R,G,B = splitI(image_color)

Bidimensional Entropy Formulas

The main formula does not change in its mathematical principles; only the way we implement the code varies.

from math import log

def Entropy(L:list,E:float=0.0) -> float:
	for row in L:
		for p in row:
		    if p > 0:
			    E -= p*log2(p)

	 return E

Bidimensional min-entropy

Formula

$$ H_{\infty}(I) = min_{(x,y)}\left[ \log_{2}\left( \frac{1}{Pr[I(x,y)]} \right) \right] = \log_{2}\left( \frac{1}{max_{(x,y)} Pr[I(x,y)]} \right) = - \log_{2} (max_{(x,y)} Pr[I(x,y)]) $$

Sage Script

To simplify the problem, the use of numpy will be applied.

import numpy as np
from math import log2

def H_inf(I:list) -> float:
	pMax = np.max(I)

	return -log2(pMax)

Remember that to use np.max() requires an np.array.

Bidimensional max-entropy

Formula

$$ H_{0}(I) = \log |{(x,y) : I(x,y) > 0}| $$

Sage Script
import numpy as np
from math import log2

def H_0(I:list) -> float:
	c = np.count_nonzero(I >0)
	return log2(c) if c != 0 else 0.0

As the same as min-entropy it is required a numy array, if not use: flattenImage = [pixel for row in p for pixel in row]

Testing for Security

While using this scripts, setting the value of L as (8 bytes) the maximun value will be 8, like the following examples:

Maybe the image you are using could not be the same highest theoretical value, thats why the max-entropy function was explained. Before the survey is continued, I recommend take a look to the following pdf I have already mentioned, it is very good resource for learning Entropy:

Local Entropy

If you noticed, the Entroy has a little problem, the data “Plaintext” and “Shuffled”, have the same entropy level, causing a confusion…. That’s why local entropy is created. Consider a collection of non-overlapping blocks in a image, compute the entropy of each block and calculate the avegrage entropy of these entropies, to obtain the local entropy. You can consider N number of blocks with MxM size pixels.

$$ \lim_{ N \to \infty } = \frac{1}{N} \sum_{i=1}^{N} H(B_{i}) = H(X) $$

Furthermore, by the CLT (Central Limit Theorem), the estimation error usually decays as follows:

$$ |H(X) - \hat{H}_{N}| \propto \frac{1}{\sqrt{ N }} $$

Here a brief comparation:

Python Script for Blocks

import numpy as np
import cv2
import random
from typing import List

def nonOverlappingBlocks(image: np.ndarray, blockSize: int, numberBlocks: int) -> List[np.ndarray]:

    height, width = image.shape[:2]
    blocks = []
    usedPositions = set()
    
    if blockSize > min(height, width):
        raise ValueError(f"{blockSize} > (greater than) {height}x{width} (Image shape)")
    
    maxAttempts = 100
    attempts = 0
    
    while len(blocks) < numberBlocks and attempts < maxAttempts:
        x = random.randint(0, width - blockSize)
        y = random.randint(0, height - blockSize)
        
        overlaps = False
        for (used_x, used_y, used_size) in usedPositions:
            if not (x + blockSize <= used_x or x >= used_x + used_size or
                    y + blockSize <= used_y or y >= used_y + used_size):
                overlaps = True
                break
        
        if not overlaps:
            block = image[y:y+blockSize, x:x+blockSize]
            blocks.append(block)
            usedPositions.add((x, y, blockSize))
        else:
            attempts += 1
    
    if len(blocks) < numberBlocks:
        print(f"Maximun non-overlapped blocks: {len(usedPositions)}")
    
    return blocks

Use example:

if __name__ == "__main__":
    imagePath = "image.png"
    image = cv2.imread(imagePath)
    
    if image is None:
        raise FileNotFoundError(f"File not found: {imagePath}")
    
    
    blockSize = 32  
    numberBlocks = 10 
    
    blocks = nonOverlappingBlocks(image, blockSize, numberBlocks)
    
    for block in blocks:
        print(block)

NPCR & UACI Cryptanalysis

As It is mentioned in one of the first sections of this post, Chaos Based Encryption is typically prone to suffer cryptographic differentials attacks such as:

  • Differential and Chosen-Plaintext Attacks (CPA)
  • Boomerang y Rectangle Attacks
  • Differential-Linear Cryptanalysis
  • NPCR/UACI: Chosen-Plaintext Differential in encrypted images
  • Differential Known-Plaintext Attacks in Chaotic PRBS

In this section the NPCR/UACI attacks will be spoken, anyways If it is precise a further explanation check the following PDF: https://arxiv.org/pdf/1910.11679, due you are reading a brief explanation of some important part of CBE, in the article mentioned you will find some conventional attacks e.g., chosen plaintext attack, known plaintext attack, and arguing the vulnerable encryption scheme for embedded systems based on continuous third-order hyperbolic sine chaotic system proposed by Z. Lin et al. in A novel fast image encryption algorithm for embedded systems.. Using:

$$ \ddot{x} + 0.75\dot{x} + x + 1.2 \times 10^{-6} \cdot x \sinh\left(\frac{\dot{x}}{0.026}\right) = 0 $$

Both tests are selected to measure the resistance of a proposed cryptosystem against the differential attacks introduced by Eli Biham and Adi Shamir in differential cryptanalysis of DES-like cryptosystems. Journal of CRYPTOLOGY, 4(1):3–72, 1991.

Number of Pixels Change Rate - NPCR

NPCR measures the percentage of pixels variations between two encrypted images when their corresponding cleartext images differ with only one pixel, with the purpose of evaluates the sensitive of the encryption system to small changes in the plain image. The higher the NPCR, the more noticeable the difference is proportionally, meaning the CBE systems works correctly.

Knowing this we can go into the formula:

$$ NPCR = \frac{\sum_{i,j}D(i,j)}{M \times N} \times 100% $$ Where:

  • $D(i,j) = \begin{cases} 1 & \text{if } C_1(i,j) \neq C_2(i,j) \\ 0 & \text{if } C_1(i,j) = C_2(i,j) \end{cases} \text{ ; also }D(i,j) \text{ is the binary difference indicator and not the variance function of Pixel Correlation Analysis}$
  • $C_{1} \text{ and } C_{2} \text{ are the two encrypted images}$
  • $M \times N \text{ is the image dimensions}$

It is suposed to return a ~>99.5% or higher value to being a normal Chaos Based Encryption, we can know with the following Sage Script:

def NPCR(img1,img2):
	M = len(img1)
	N = len(img1[0])

	if len(img2) != M or any(len(row) != N for row in img2):
		raise ValueError("Images dimensions do not coincide")

	variation = 0
	for x in range(M):
		for y in range(N):
			if img1[x][y] != img2[x][y]:
				variation+=1
				
	return (variation / M*N) * 100

It is also possible to apply the Pillow module:

from PIL import image
import numpy as np

def NPCR(imgPath1, imgPath2):
	img1 = Image.open(image_path1).convert('L')
    img2 = Image.open(image_path2).convert('L')

	arr1 = np.array(img1)
	arr2 = np.array(img2)

	if arr1.shape != arr2.shape:
		raise ValueError("Images dimensions do not coincide")

	M, N = arr1.shape
	D = arr1 != arr2
	variation = np.sum(D)

	return (variation / M*N) * 100

Unified Average Changing Intensity - UACI

UACI is designed to test the number of mean intesities modified between two encrypted images (commonly used in interferograms), when the difference between the plaintext image is subtle the optimal value of UACI is ~33.46%, in other words UACI quantifies how much the variation of the pixel, which reflects the algorithm’s abilioty to spread differences uniformly across the image.

  • Author anotation: While doing this part of the post I started thinking what was the relationship of the UACI tests with the Chaos Based Image, If you are reading this I am going to refresh you the meaning of a chaotic system, being the brutal variantion of encryption with a small differ in the seed/key/source of the system/algorithm. This tests, also for NPCR, evaluate the sensitive of the cypher at differentials attacks.*

$$ UACI = \frac{1}{M \times N} \sum_{i,j}\frac{|C_{1}(i,j)-C_{2}(i,j)|}{255} \times 100% $$ Where:

  • $C_{1} \text{ and } C_{2} \text{ are the two encrypted images}$
  • $M \times N \text{ is the image dimensions}$

As we mention earlier the perfect percent for this function is 33,46%, that indicates that the cryptographic algorithm and the chaos system is secure. To calculate the UACI from two cipher images use the following code:

def UACI(img1, img2):
	M = len(img1)
	N = len(img1[0])

	if len(img2) != M or any(len(row) != N for row in img2):
		raise ValueError("Images dimensions do not coincide")

	variation:float = 0
	for x in range(M):
		for y in range(N):
			diff = abs(img1[x][y] - img2[x][y])
			variation += diff / 255

	return (variation / M*N) * 100

Hamming Distance - HD

To conclude this anti-plain-text sensitivity attack sention I want to talk about the not mention Hamming Distance function from Mousa Farajallah in Chaos-based crypto and joint crypto-compression systems for images and videos.

$$ HD(C_{1}, C_{2}) = \frac{1}{|Ib|} \sum^{|Ib|}{K-1} (C{1}(K) \oplus C_{2}(K)) $$ Where:

  • $|Ib| = L \times C \times P \times 8$, is the size of thge image in bits
  • $L$ is the number of rows in the image, also called $lines$
  • $C$ is the number of columns in the image
  • $P$ is the number of color planes channels

The optimun HD value is 50%. A good bloc cipher should produce an HD close to 50%. “Probability of bit changes, which means that a one bit difference in plain-image will make every bit of the corresponding cipher-image change with a probability of half” mentions Xingyuan Wang, Dapeng Luan, and Xuemei Bao. in Cryptanalysis of an image en-cryption algorithm using chebyshev generator. Digital Signal Processing, 25:244–247, 2014

Therefore, the plain-text sensitivity attack would become a useless attacking method.

To tests it use:

def HD(C1, C2):
	L = len(C1)
	C = len(C1[0])

	if isinstance(C1[0][0], int):
        P = 1
    else:
        P = len(C1[0][0])

	totaBits = L * C * P * 8
	variance = 0

	for x in range(L):
		for y in range(C):
			if P == 1:
				px1 = C1[x][y]
				px2 = C2[x][y]
				xored = px1 ^ px2
				variance += bin(xor_result).count('1')
			else:
				for z in range(P):
					px1 = C1[x][y][k]
					px2 = C2[x][y][k]
					xored = px1 ^ px2
					variance += bin(xor_result).count('1')
	return (variance / totalBits) * 100

Creating Secure Chaos Based Systems

The next step is yours. In this paper we have seen what is a Chaotic System, what are they used for, some validity checking for being secure, but how to create one? What properties needs a algorithm to be consider a system? In this section we will get in-depth with some complex (at least for me) mathematical concepts.

Lyapunov exponent

First of all, we should mention Lyapunov exponent, which measures the rate at which an infinitesimally small distance between two initially close states grows over time, in other words, it measures exponential divergence.

To understand better the exponent think in two small balls, in the top of a hill, both are nearly the same distance apart, if the hill is smooth, flat, without any rocks you could calculate normaly the trajectory and predict that both will stop in the same space (being the initial space insignificant). This is a common dinamic system.

Only, in real life there are rocks, craters, around all the hill, making detours to the balls. Now, its harder to think that each ball will be nearly when they stop, this is how chaotic system works.

Formula

Thus, the Lyapunov exponent is formed as:

$$ F^{t}(x_{0} + \varepsilon) - F^{t}(x_{0}) \approx \varepsilon e^{\lambda t} $$

Being $\lambda$ the Lyapunov exponent. The left side is the distance between two initially close states after lost steps, and the right side is the assumption that the distance grows exponentially with time. $\lambda$ is measured by a long period time ideally $t \to \infty$. Also, $\epsilon$ are the rocks, craters with a minimun value $\epsilon « 1$ or $\epsilon = 10^{-6}$.

If $\lambda < 0$, the smalls distances are similar not consider a chaotic system. To clear the exponent follow:

$$ e^{\lambda t} \approx \frac{F^{t}(x_{0} + \varepsilon) - F^{t}(x_{0})}{\epsilon} $$

Being $\delta(t) = F^{t}(x_{0} + \varepsilon) - F^{t}(x_{0})$

$$ \lambda = \frac{1}{t} \ln ( \frac{\delta(t)}{\epsilon}) $$

Lastly, the limits:

$$ \lambda = \lim_{ t \to \infty ,\epsilon \to 0 } \frac{1}{t} \ln ( \frac{\delta(t)}{\epsilon}) $$

Once this is understood, let’s take the following POSSIBLE chaotic system: $f(x) = k \times x \times (x-1)$ and the linear map: $g(x) = x \times k$, where $k \text{ is cte}$ in both cases.

To calculate the exponent, use the following sage function:

import numpy as np

def lyapunov(mapFunction, x0, eps, idx, discard=100):
    x=x0
    xPert = x0 + eps
    lyap = 0.0

    for i in range(idx):
        x = mapFunction(x)
        xPert =  mapFunction(xPert)

        delta = abs(xPert - x)
        xPert = x + eps * (xPert - x) / delta 

        if i >= discard:
            lyap += np.log(delta / eps)
    ei =  idx - discard
    return (lyap / ei)

def f(k):
    return lambda x: k * x * (1 - x)

def g(k):
    return lambda x: k * x

Taking the following parameters:

  • $\epsilon$ | eps = $10^{-8}$ | 1e-8
  • x0 = 0.5
  • idx = 5000

We obtain:

  • $f(2) \approx -17.3998 $
  • $f(4) \approx 1.3863$
  • $g(0.5) \approx -0.6931$

To consider, the higher value for the iterations and the lower value for $\epsilon$ (initial pertubations) better success hit in $\lambda$. And remember there can be false positives, so the more tests your function pass, the more likely you are to get it right.

Famous Examples

Also to conclude my first paper I want to mention some famous maps:

Logistic Map

$$ x_{n+1} = r x_n (1 - x_n) $$ $\quad x_n \in (0, 1),\ r \in (3.57, 4]$

Henon Map

$$ x_{n+1} = 1 - a x_n^2 + y_n \\ y_{n+1} = b x_n $$ $\quad a = 1.4,\ b = 0.3$

Lorenz System

$$ \begin{cases} \dot{x} = \sigma(y - x) \\ \dot{y} = x(\rho - z) - y \\ \dot{z} = x y - \beta z \end{cases} $$ $\quad \text{ with } \sigma = 10,\ \rho = 28,\ \beta = \frac{8}{3}$

Tent Map

$$ x_{n+1} = \begin{cases} \mu x_n & \text{if } x_n < \frac{1}{2} \\ \mu (1 - x_n) & \text{if } x_n \geq \frac{1}{2} \end{cases} $$ $\quad \mu \in (1, 2]$

Ikeda Map

$$ \begin{cases} x_{n+1} = 1 + u (x_n \cos t_n - y_n \sin t_n) \\ y_{n+1} = u (x_n \sin t_n + y_n \cos t_n) \end{cases} $$ $\quad t_n = 0.4 - \dfrac{6}{1 + x_n^2 + y_n^2}$

Part 2?

This was only an introduction to the Chaos Based Encryption and theory, maybe in a future there will be a second part of this post, explaining more tests and checkers of chaotic systems, applying specific attacks into CBE and more!

For the moment I recommend you check a little of:

External References

[1] https://www.sciencedirect.com/science/article/abs/pii/S002002551200521X

[2] https://www.mdpi.com/2079-9292/10/12/1392

[3] https://www.mdpi.com/2073-8994/15/3/726

[4] https://www.mdpi.com/2073-8994/15/7/1311

[5] https://www.mdpi.com/2079-9292/11/19/3156

[6] https://arxiv.org/abs/nlin/0611017

[7] https://journals.mmupress.com/index.php/jiwe/article/download/1194/684/10872

[8] https://www.iieta.org/journals/isi/paper/10.18280/isi.250507

[9] https://hal.science/tel-01179610/document

[10] https://www.mdpi.com/2073-8994/15/12/2138

[11] https://www.sciencedirect.com/topics/computer-science/adjacent-pixel

[12] https://www.researchgate.net/publication/266333755_EMBEDDING_INFORMATION_IN_DCT_COEFFICIENTS_BASED_ON_AVERAGE_COVARIANCE/fulltext/54aea2820cf29661a3d39ee4/EMBEDDING-INFORMATION-IN-DCT-COEFFICIENTS-BASED-ON-AVERAGE-COVARIANCE.pdf

[13] https://hal.science/hal-02165866v1/document

[14] https://salil.seas.harvard.edu/sites/g/files/omnuum4266/files/salil/files/acm-publishedversion.pd

[15] https://link.springer.com/article/10.1007/s11042-018-6824-5

[16] https://www.sciencedirect.com/science/article/abs/pii/S1007570413005030?via%3Dihub

[17] https://www.qeios.com/read/K65HZS

[18] https://medium.com/@rusabhshah17/chaos-based-image-encryption-algorithm-9f4870805a1d

[19] https://www.researchgate.net/publication/340674341_An_Image_Encryption_Algorithm_Based_on_Random_Walk_and_Hyperchaotic_Systems

[20] https://espanol.libretexts.org/Matematicas/Computacion_Cientifica_Simulaciones_y_Modelado/Libro%3A_Introducci%C3%B3n_al_Modelado_y_An%C3%A1lisis_de_Sistemas_Complejos_(Sayama)/09%3A_Caos/9.03%3A_Exponente_de_Lyapunov