I. Introduction
This benchmark compares Confederation AI's Artificial Brain model with OpenAI's Reinforcement Learning approach in terms of learning ability, learning speed, performance, and energy consumption. The comparison was carried out on the Frozen Lake task in the Gymnasium environment, using the Q-Learning algorithm on the Reinforcement Learning side. The reported results cover learning ability, learning speed, and the amount of energy consumed.
II. Methodology
Q-Learning
It is a type of reinforcement learning algorithm in which the agent learns by interacting with its environment; in this task, the agent receives a reward, and therefore a learning signal, only from successful episodes.
Frozen Lake Q-Learning reference: https://towardsdatascience.com/q-learning-for-beginners-2837b777741
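As a reference, here is a minimal sketch of the standard Q-Learning update rule in Python with NumPy; the alpha and gamma defaults follow the experimental settings in Section V, and the function name is illustrative only, not part of either model's code.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One temporal-difference update of the Q-table (standard Q-Learning rule)."""
    best_next = np.max(Q[next_state])  # value of the best action in the next state
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
    return Q
```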
Adaptive Learning
It is a learning method in which the agent interacts with its environment and learns from all of its experiences, both successful and unsuccessful, and in which the learning process is adjusted automatically according to the agent's needs and performance.
Training Time (min)
It is the total time it takes to reach the desired success rate in training.
Success Point
Number of Successful Task Endings / Number of Attempts
Success Rate (%)
Success Point * 100
Learning Speed (success/min)
Success Point / Training Time (min), i.e. the fraction of successful episodes gained per minute of training
Performance
The success a model achieves relative to the number of episodes it needed defines its performance: Success Rate (%) / Number of Episodes
Amount of Energy Consumed (Wh)
It is the total amount of energy consumed for the model to reach the target success rate, measured with a socket-type (plug-in) wattmeter.
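To make the metric definitions above concrete, the following sketch expresses them as small Python functions and evaluates them with the values reported later in Section VI; the function names are illustrative, not part of either model's API.

```python
def success_point(successes: int, attempts: int) -> float:
    return successes / attempts

def success_rate(point: float) -> float:
    return point * 100.0

def learning_speed(point: float, training_minutes: float) -> float:
    # successful episodes gained per minute of training
    return point / training_minutes

def performance(rate_percent: float, episodes: int) -> float:
    # success rate divided by the number of episodes used
    return rate_percent / episodes

# Values reported in Section VI (Q-Learning vs. Artificial Brain)
print(learning_speed(0.794, 16.5), learning_speed(0.82, 2.5))  # ~0.05 vs ~0.33
print(performance(79.4, 500), performance(82.0, 50))           # 0.1588 vs 1.64
```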
III. Environment
Let us briefly describe the task solved in this study. Frozen Lake is a simple environment consisting of 16 tiles, in which the agent must move from the starting tile toward a goal tile.
The tiles can be a safe frozen lake or a hole to trap the agent.
The agent has 4 possible actions: ◀️ LEFT, 🔽 DOWN, ▶️ RIGHT and 🔼 UP.
The agent must learn to avoid holes so that it can reach the goal with the least number of actions.
As mentioned above, the environment is a 4x4 map consisting of 16 tiles. The map contains frozen tiles and holes between the starting point and the goal.
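For reference, a minimal sketch of how this environment is created with Gymnasium; the deterministic setting (is_slippery=False) is an assumption here, chosen to match the single learned path visible in the Q-tables in the Results section.

```python
import gymnasium as gym

# Default 4x4 Frozen Lake map: S = start, F = frozen tile, H = hole, G = goal
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

state, info = env.reset(seed=0)
print(env.observation_space.n)  # 16 states (one per tile)
print(env.action_space.n)       # 4 actions: LEFT, DOWN, RIGHT, UP
```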
IV. Q-Table
There are 16 tiles in Frozen Lake, which means our agent can be in 16 different positions, called states. In each state there are 4 possible actions: ◀️ LEFT, 🔽 DOWN, ▶️ RIGHT and 🔼 UP. Learning to play Frozen Lake means learning which action to choose in each state.
With Reinforcement Learning, knowing which action is best in a given state requires assigning a quality value to each action. There are 16 states and 4 actions, so 16 x 4 = 64 values must be calculated. A convenient way to represent this is a table known as a Q-table, whose rows list each state s and whose columns list each action a. In this Q-table, each cell contains the value Q(s, a), the quality of action a in state s (1 if it is the best possible action, 0 if it is truly bad). When the agent is in a particular state s, it only needs to check this table and take the action with the highest value.
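A minimal sketch of this Q-table as a NumPy array, together with the greedy lookup described above; the variable and function names are illustrative.

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))  # Q(s, a) for every state-action pair, initially all zero

def best_action(Q, state):
    """Return the action with the highest quality value in the given state."""
    return int(np.argmax(Q[state]))
```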
With the Artificial Brain, in order to know which action is best in a particular situation, a "consciousness of going to the goal" must form from past experience in every situation. For this, it is enough to find the goal once and learn its location. This is normally represented with neurons; however, to make the comparison easier to follow, the neuron connections and states are also decoded and shown as a Q-table.
States | ◀️ LEFT | 🔽 DOWN | ▶️ RIGHT | 🔼 UP |
0 | Q(0, ◀️) | Q(0, 🔽) | Q(0, ▶️) | Q(0, 🔼) |
1 | Q(1, ◀️) | Q(1, 🔽) | Q(1, ▶️) | Q(1, 🔼) |
2 | Q(2, ◀️) | Q(2, 🔽) | Q(2, ▶️) | Q(2, 🔼) |
3 | Q(3, ◀️) | Q(3, 🔽) | Q(3, ▶️) | Q(3, 🔼) |
… | … | … | … | … |
14 | Q(14, ◀️) | Q(14, 🔽) | Q(14, ▶️) | Q(14, 🔼) |
15 | Q(15, ◀️) | Q(15, 🔽) | Q(15, ▶️) | Q(15, 🔼) |
V. Experimental Settings
Artificial Brain by Confederation AI | Q-Learning by OpenAI |
Target Success Rate: ~80% | Target Success Rate: ~80% |
Episodes: 50 | Episodes: 500 |
- | Epsilon: 0.1 (amount of randomness in action selection) |
- | Epsilon Decay: 0.001 (fixed amount epsilon is reduced by each episode) |
Permanence Increment: 0.1 (learning rate) | Alpha: 0.5 (learning rate) |
Permanence Decrement: 0.05 (forgetting rate) | Gamma: 0.9 (discount factor) |
The most important difference here is that the agent trained with the Reinforcement Learning model cannot learn where the holes are during the task and keeps falling in from the same direction over and over again. This delays the agent in reaching the goal and significantly slows down learning.
The agent working with the Artificial Brain model, on the other hand, learns every situation on its own in real time and models every event in its environment. After it falls into a hole from a given direction once, a "consciousness of not falling into the hole" forms, and the agent avoids that hole in subsequent attempts.
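For the Q-Learning column, the settings above correspond to a training loop along the following lines. This is a hedged sketch based on the referenced tutorial, not the exact code used in the experiment; is_slippery=False is again an assumption. The Artificial Brain model has no public reference implementation, so only the Q-Learning side is sketched.

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma = 0.5, 0.9               # learning rate and discount factor (Section V)
epsilon, epsilon_decay = 0.1, 0.001   # exploration rate and its fixed per-episode decrement

for episode in range(500):            # 500 episodes for Q-Learning (Section V)
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, or while the row is still all zeros
        if np.random.random() < epsilon or np.max(Q[state]) == 0:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Temporal-difference update toward the best value of the next state
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
    epsilon = max(epsilon - epsilon_decay, 0.0)
```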
VI. Results
Training Time
To reach the targeted success rate (~80%), the agent needed a total of 16 minutes and 30 seconds with Q-Learning, while the Artificial Brain completed the task in 2 minutes and 30 seconds. These results show that training with the Artificial Brain is approximately 6.6 times faster than training with Q-Learning, and they illustrate how efficient real-time adaptive learning is compared to offline training.
Success Rate
At the start of the experiment, the Q-Learning algorithm was allotted 500 episodes to reach an 80% success rate. At the end of the experiment, Q-Learning reached a success rate of 79.4% with those 500 episodes.
At the start of the experiment, the Artificial Brain model was allotted 50 episodes to reach an 80% success rate. At the end of the experiment, the Artificial Brain model achieved a success rate of 82.4% with only 50 episodes.
These results show that the Artificial Brain model needs 10 times fewer training episodes than the Q-Learning algorithm, and this is probably close to the minimum amount of training required to learn this task.
Learning Speed
Looking at the success achieved over the training period: Q-Learning reached a 79.4% success rate in 16 minutes and 30 seconds, giving a learning speed of 0.794 / 16.5 ≈ 0.05; in other words, about 0.05 successful attempts per minute.
The Artificial Brain reached an 82% success rate in 2 minutes and 30 seconds, giving a learning speed of 0.82 / 2.5 ≈ 0.33; in other words, about 0.33 successful attempts per minute.
These results show that Artificial Brain is a learning model that is 6.6 times faster than Q-Learning.
Performance
Q-Learning reached a success score of 79.4 with 500 episodes, giving a performance score of 79.4 / 500 = 0.1588.
The Artificial Brain reached a success score of 82 with 50 episodes, giving a performance score of 82 / 50 = 1.64.
These results show that the Artificial Brain performs roughly 10 times better than Q-Learning.
Amount of Energy Consumed
To approach the target success rate, Q-Learning consumed a total of 5.355 Wh of energy over its training run; according to the wattmeter measurements, its instantaneous power draw was 10 watts.
To approach the target success rate, the Artificial Brain consumed a total of 1.125 Wh of energy over its training run; according to the wattmeter measurements, its instantaneous power draw was 35 watts.
These results show that, in terms of instantaneous power draw while the models are running, Q-Learning is 3.5 times more efficient than the Artificial Brain. When the total amount of energy consumed is compared, however, the Artificial Brain is about 4.8 times more energy efficient than Q-Learning.
The reason is that Q-Learning needs less processing power at any given moment because it does not model everything around it, but it consumes more energy in total because it requires far more episodes, and therefore far more operations, to reach the target success rate.
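A quick worked check of the two ratios quoted above, using the figures reported in this section:

```python
instant_qlearning_w, instant_brain_w = 10, 35   # instantaneous power draw (watts)
total_qlearning, total_brain = 5.355, 1.125     # total energy consumed over training

print(instant_brain_w / instant_qlearning_w)    # 3.5   -> Q-Learning draws less power while running
print(total_qlearning / total_brain)            # ~4.76 -> Artificial Brain uses ~4.8x less energy in total
```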
Results Summary
Metric | Artificial Brain | Q-Learning |
Learning Type | Adaptive / Real-time | Reinforcement / Offline |
Target Success Rate | ~80% | ~80% |
Episode Number | 50 | 500 |
Training Time (min) | 2 min 30 sec | 16 min 30 sec |
Success Rate (%) | 82.0 | 79.4 |
Learning Speed (success/min) | 0.33 | 0.05 |
Performance (Success Rate / Episode Number) | 1.64 | 0.1588 |
Instantaneous Power Consumption (Watt) | 35 | 10 |
Total Energy Consumed (Wh) | 1.125 | 5.355 |
Q-Table for Artificial Brain
States | ◀️ Left | 🔽 Down | ▶️ Right | 🔼 Up |
0 | 0. | 0.24179673 | 0.91986334 | 0. |
1 | 0.38154391 | 0. | 1.08633569 | 0. |
2 | 0. | 1.23258914 | 0. | 0. |
3 | 0. | 0. | 0. | 0. |
4 | 0. | 0.2932113 | 0. | 0.77129754 |
5 | 0. | 0.24179673 | 0. | 0. |
6 | 0. | 0. | 1.40008783 | 0. |
7 | 0. | 1.59451693 | 0. | 0. |
8 | 0. | 0. | 0.35028902 | 0.4578993 |
9 | 0.09886137 | 0.43757931 | 0. | 0. |
10 | 0. | 0. | 0.35109401 | 0. |
11 | 0. | 1.81670508 | 0. | 0. |
12 | 0. | 0. | 0. | 0. |
13 | 0. | 0. | 0.77552346 | 0.11977052 |
14 | 0.30450094 | 0. | 1.05627957 | 0. |
15 | 0. | 0.0 | 0. | 0. |
Q-Table for Q-Learning
States | ◀️ Left | 🔽 Down | ▶️ Right | 🔼 Up |
0 | 0. | 0.59049 | 0. | 0. |
1 | 0. | 0. | 0. | 0. |
2 | 0. | 0. | 0. | 0. |
3 | 0. | 0. | 0. | 0. |
4 | 0. | 0.6561 | 0. | 0. |
5 | 0. | 0. | 0. | 0. |
6 | 0. | 0. | 0. | 0. |
7 | 0. | 0. | 0. | 0. |
8 | 0. | 0. | 0.729 | 0. |
9 | 0. | 0.81 | 0. | 0. |
10 | 0. | 0. | 0. | 0. |
11 | 0. | 0. | 0. | 0. |
12 | 0. | 0. | 0. | 0. |
13 | 0. | 0. | 0.9 | 0. |
14 | 0. | 0. | 1. | 0. |
15 | 0. | 0. | 0. | 0. |
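Reading a policy out of either table only requires taking the argmax of each row. The sketch below assumes the table has been loaded into a (16, 4) NumPy array named q_table, which is an illustrative name. Applied to the Q-Learning table above, the non-zero rows trace the single learned path 0 → 4 → 8 → 9 → 13 → 14 → goal, and the values along that path are successive powers of the discount factor gamma = 0.9.

```python
import numpy as np

ACTIONS = ["LEFT", "DOWN", "RIGHT", "UP"]

def greedy_policy(q_table):
    """Map each visited state to its best-known action, skipping all-zero rows."""
    return {state: ACTIONS[int(np.argmax(row))]
            for state, row in enumerate(q_table) if np.max(row) > 0}

# For the Q-Learning table above this yields:
# {0: 'DOWN', 4: 'DOWN', 8: 'RIGHT', 9: 'DOWN', 13: 'RIGHT', 14: 'RIGHT'}
```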
VII. Discussion
It is important to note that this comparison has some limitations. First, testing was conducted in only one environment; testing in different environments would allow the performance of the models to be evaluated more thoroughly. Second, testing the models with different hyperparameters would also be useful for showing that the Artificial Brain is the better model. On the other hand, even a comparison in a single environment is sufficient to demonstrate the adaptive learning ability and the gains that adaptive learning provides.
VIII. Future Studies
In future studies, optimization work will be carried out on the Artificial Brain model. The optimized Artificial Brain is intended to outperform Reinforcement Learning in instantaneous power consumption as well.
In the next stage, the comparison between the Artificial Brain and Reinforcement Learning will be carried out in a robot simulation environment with ROS2. The aim is to obtain similar results in this environment, which is more complex than Frozen Lake.