
Introduction to Reinforcement Learning – When Machines Learn Like Humans

  • Coding
  • Deep Learning
  • Python
09. January 2018 · Sebastian Heinz, CEO

Most machine learning algorithms used in practice today belong to the class of supervised learning. In supervised learning, the model is presented ex post with an already known target variable $y$, which is to be predicted as accurately as possible from various influencing factors $X$ in the data by means of a function $f$. The function $f$ abstractly represents the respective machine learning model and provides the mapping between the model's inputs and outputs.

[Figure: the function $f$ maps the input data $X$ to the target variable $y$]

The target variables learned by such ML models are therefore typically static and change only over historical time, which is already known ex post. Over time, new data points can be collected and incorporated into the learned mapping $f$ between $X$ and $y$ through retraining.

Based on the predictions of a model, further actions—which are usually still human-controlled—are triggered. These actions often represent the actual decisions that lead to business-relevant impacts and implications. As a result, business decisions based on ML models are, in many cases, still not fully machine-controlled or only partially automated. Only in very few areas today are fully autonomous models applied, capable of making independent decisions and monitoring or adjusting their behavior in a model-based manner.

On the path toward self-learning, autonomous algorithms, which are essential for artificial intelligence, models from supervised learning contribute only to a limited extent. This is primarily because the 1:1 relationship between inputs and outputs that such models learn is insufficient for more complex scenarios or decision-making environments. For example, most machine learning models struggle to learn multiple target variables simultaneously or to capture a sequence of target variables and actions. Additionally, the relationship between influencing factors and target variables can change abruptly depending on the environment or on previous predictions and decisions, which would require continuous retraining of the models under changing conditions.

To enable machine learning models to be applied in environments where they can independently learn action-reaction events, learning methods are required that account for the dynamic nature of a changing environment. A well-known example of the successful application of such algorithms is Google’s AI AlphaGo, which defeated the world’s best human Go player. AlphaGo would have been impossible to develop using classical supervised learning methods since, due to the infinite number of moves and scenarios, no model would have been capable of mapping the complexity of action-reaction relationships as a simple input-output function. Instead, methods are needed that can autonomously respond to new environmental conditions, anticipate possible future actions, and incorporate them into current decision-making. The class of learning methods on which systems like AlphaGo are based is called reinforcement learning.

What is Reinforcement Learning?

Reinforcement Learning (RL), alongside Supervised Learning and Unsupervised Learning, represents the third major category of machine learning methods. RL is a method inspired by the natural learning behavior of humans. Human learning, especially in early stages, often occurs through simple exploration of the environment. In a learning problem, our actions are defined within a certain action space. Through trial and error, we observe and evaluate the effects of different actions on our environment. In response to our actions, we receive feedback from our environment, abstractly represented in the form of rewards or punishments. However, the concept of reward or punishment should rarely be understood monetarily. In many cases, rewards come in the form of social acceptance, praise from others, personal well-being, or a sense of achievement. Often, there is also a temporal delay between an action and its reward. Humans typically aim not just to maximize immediate rewards, but rather to maximize their total expected reward over time through their actions.

An example: When learning to play the guitar, our action space consists of plucking the strings and pressing down on the fretboard. Through initially random exploration of these actions, we receive feedback from the environment in the form of sounds produced by the guitar. We are rewarded when the notes sound clear and punished when our neighbors bang on the ceiling with a broom. Our goal is to maximize the total expected reward in the form of correctly played notes and chords over our relevant time horizon. This does not mean that we stop learning as soon as we can play a single chord correctly; instead, it shows in continuous practice and in the constant pursuit of new rewards and successes over time. The time needed to explore possible actions and environmental rewards can, of course, be shortened by introducing an external trainer. This example is highly simplified, but it conveys the core principle of Reinforcement Learning well.

Formally, Reinforcement Learning consists of five key components: (1) the agent, (2) the environment, (3) the state, (4) the action, and (5) the reward. The general process can be described as follows: The agent performs an action ($a_t$) from the available action space ($A$) in a given environment at a specific state ($s_t$), which leads to a response from the environment in the form of a reward ($r_t$).

[Figure: the Reinforcement Learning cycle, in which the agent performs an action in the environment and receives the new state and a reward in return]

The environment's reaction to the agent's action then influences the agent's choice of action in the next state $s_{t+1}$. Over thousands, hundreds of thousands, or even millions of iterations, the agent can approximate the relationship between its actions and the expected future reward in each state, allowing it to behave optimally.
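
To make this loop concrete, the following minimal Python sketch mimics the interaction between agent and environment. The Environment class and its step method are purely hypothetical placeholders, not taken from any library; they only illustrate how state, action, and reward are passed back and forth.

# Minimal sketch of the agent-environment loop (hypothetical environment)
import numpy as np

class Environment:
    """Toy environment: returns the next state and a reward for a given action."""
    def __init__(self, n_states, n_actions):
        self.n_states, self.n_actions = n_states, n_actions

    def step(self, state, action):
        # Placeholder dynamics: random next state, reward only in the last state
        next_state = np.random.randint(self.n_states)
        reward = 1.0 if next_state == self.n_states - 1 else 0.0
        return next_state, reward

env = Environment(n_states=5, n_actions=5)
state = np.random.randint(env.n_states)           # initial state s_t
for t in range(10):
    action = np.random.randint(env.n_actions)     # agent chooses an action a_t
    next_state, reward = env.step(state, action)  # environment responds with r_t and s_(t+1)
    state = next_state                            # the response influences the next decision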

The agent constantly faces a dilemma between exploiting its previously acquired experience and exploring new strategies to increase its reward. This is known as the "exploration-exploitation dilemma." The approximation of rewards can be model-free, meaning it occurs purely through exploration of the environment, or it can be achieved using machine learning models that attempt to approximate the value of an action. The latter approach is particularly useful when the state and/or action space is high-dimensional.

Q-Learning

To train Reinforcement Learning systems, a commonly used method is Q-Learning. The name comes from the so-called Q-function $Q(s,a)$, which describes the expected (discounted) total reward of taking action $a$ in state $s$. The Q-values are stored in the Q-matrix $Q$, whose dimensionality is defined by the number of possible states and actions. During training, the agent attempts to approximate the Q-values of the Q-matrix through exploration, so that they can later serve as a decision rule. The reward matrix $R$ contains, analogously to $Q$, the reward the agent receives for each state-action pair.

The approximation of Q-values works in its simplest form as follows: The agent starts in a randomly initialized state $s_t$. The agent then randomly selects an action $a_t$ from $A$, observes the corresponding reward $r_t$, and the next state $s_{t+1}$. The update rule of the Q-matrix is defined as follows:

$$Q(s_t,a_t)=(1-\alpha)Q(s_t, a_t)+\alpha(r_t+\gamma \max Q(s_{t+1},a))$$

The Q-value in state $s_t$ for executing action $a_t$ is a function of the already learned Q-value (first part of the equation) and the reward in the current state, plus the discounted maximum Q-value of all possible actions $a$ in the next state $s_{t+1}$.

The parameter $\alpha$, found in the first part of the equation, is called the learning rate and controls the extent to which newly observed information influences the agent’s decision to take a specific action.

The parameter $\gamma$ is known as the discount factor, which manages the trade-off between short-term and future rewards in the agent’s decision-making process. Smaller $\gamma$ values make the agent favor immediate rewards, while larger $\gamma$ values encourage the agent to prioritize long-term rewards in decision-making.
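
As a compact sketch (separate from the full training function implemented later in this post), the update rule translates into a single line of Python; the function and argument names here are our own:

# One Q-Learning update step for the state-action pair (s, a)
import numpy as np

def q_update(Q, R, s, a, s_next, alpha, gamma):
    # Old value weighted by (1 - alpha); new information (reward plus discounted
    # maximum Q-value of the next state) weighted by alpha
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (R[s, a] + gamma * np.max(Q[s_next, :]))
    return Q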

In model-based Q-Learning environments, exploration of the environment does not occur purely at random. Instead, the Q-values of the Q-matrix are approximated using machine learning models, typically neural networks and deep learning models. During training, model-based RL systems usually still include a random component: with probability $\epsilon$, the agent executes a random action instead of the currently best one. This approach is called $\epsilon$-greedy and prevents the agent from always selecting the same actions during exploration.

After the learning phase, the agent selects in each state the action with the highest Q-value, $\arg\max_a Q(s_t, a)$. This allows the agent to move from state to state, always choosing the action that maximizes the approximated reward.
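
A minimal sketch of this selection logic might look as follows; the helper function and its arguments are our own and assume that a Q-matrix and the set of valid actions per state are available. With epsilon greater than zero it performs $\epsilon$-greedy exploration, with epsilon set to zero it reduces to the pure exploitation used after training:

# epsilon-greedy action selection (epsilon = 0 yields pure exploitation)
import numpy as np

def select_action(Q, state, valid_actions, epsilon=0.1):
    if np.random.rand() < epsilon:
        return int(np.random.choice(valid_actions))    # explore: random action
    q_values = Q[state, valid_actions]
    return int(valid_actions[np.argmax(q_values)])     # exploit: highest Q-value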

Q-Learning is particularly useful when the number of possible states and actions remains manageable. Otherwise, due to combinatorial complexity, solving the problem with pure exploration mechanisms becomes difficult. For this reason, in high-dimensional state and action spaces, the approximation of Q-values is often performed using model-based approaches.

Another challenge in applying Q-Learning arises when rewards are temporally distant from the agent’s current state and action space. If no rewards exist in nearby states, the agent requires a long exploration phase before it can propagate rewards from distant future states back to its current decision-making process.

Minimal Example for Q-Learning

The new autonomous vacuum robot "Dusty3000" from the company DUSTWORX is designed to navigate unknown apartments fully autonomously. The device uses a Reinforcement Learning approach to determine which rooms in an apartment accumulate dust and lint. A virtual test apartment, where the robot is calibrated, has the following layout:

[Figure: floor plan of the virtual test apartment with five rooms, numbered 0 to 4; room 0 is the hallway, room 4 the living room containing the dust]

The virtual test apartment consists of five rooms, but in this test scenario, dust is only present in the living room. If the robot finds its way to the dust, it receives a reward of $r = 1$; otherwise, no reward is given ($r = 0$). Rooms that the robot cannot reach from its current position are assigned $r = -1$ in the reward matrix. In matrix notation, the rewards and the possible actions per room are represented as follows:

$$R=\begin{pmatrix} -1 & 0 & 0 & -1 & 1 \\ 0 & -1 & -1 & -1 & -1 \\ 0 & -1 & -1 & -1 & 1 \\ -1 & -1 & -1 & -1 & 1 \\ 0 & -1 & 0 & 0 & 1 \end{pmatrix}$$

Let’s assume the vacuum robot starts its exploration randomly in the hallway (room 0). From its current state ($s = 0$), the robot has three possible actions: 1, 2, or 4. Actions 0 and 3 are not possible because these rooms cannot be reached directly from the hallway. The agent receives no reward if it chooses action $a = 1$, moving from the hallway to room 1 (bathroom). The same applies to action $a = 2$ (moving into the bedroom). However, if the robot chooses action $a = 4$ and moves into the living room, it finds the dust and receives a reward of $r = 1$. For an external observer, the optimal action seems obvious, but our robot does not know its environment and randomly selects from the available actions.

The robot begins its random exploration of the apartment. The learning rate is set to $\alpha = 1$, and the discount factor is set to $\gamma = 0.95$. Starting in the hallway, it selects action $a = 2$ randomly from the available actions. As a result, the robot moves into the bedroom. In the simplified case where $\alpha = 1.0$, the Q-value for moving from the hallway to the bedroom is defined as follows:

$$Q(0,2)=(1-\alpha)Q(0,2)+\alpha(r_t+\gamma \max Q(s_{t+1},a))$$
$$=(1-1)\cdot 0+1\cdot(0+0.95\max[Q(2,0),Q(2,4)])$$
$$=0+1\cdot(0+0.95\max[0,1])=0.95$$

Here, the importance of the discount factor becomes evident: If the discount factor were 0, the possibility of moving from the bedroom to the living room would not be considered when initially moving from the hallway to the bedroom. This would result in a Q-value of $Q(0,2) = 0$. However, since the discount factor in our example is greater than 0, the potential future reward is also incorporated into $Q(0,2)$.

Upon arriving in the bedroom, the robot again has two possible actions: It can either move back to the hallway ($a = 0$) or continue moving into the living room ($a = 4$). By chance, the robot chooses the living room, resulting in the following Q-value:

$$Q(2,4)=(1-\alpha)Q(2,4)+\alpha(r_t+\gamma \max Q(s_{t+1},a))$$
$$=(1-1)\cdot 0+1\cdot(1+0.95\max[Q(4,0),Q(4,3)])$$
$$=0+1\cdot(1+0.95\max[0,0])=1$$

The entire process from exploration to receiving a reward is called an episode. Since the robot has now reached its goal, the episode ends. During this iteration, the agent was able to calculate two Q-values, which are stored in the Q-matrix—essentially serving as the robot’s memory.

[Figure: Q-matrix after the first episode, with $Q(0,2)=0.95$ and $Q(2,4)=1$; all other entries remain $0$]

As training progresses, the values in the Q-matrix are updated step by step by the algorithm. After training, the robot simply selects the action with the highest Q-value in each room. Within an episode, the exploration of the environment ends as soon as the agent receives a reward.

Implementation in Python

To illustrate the example above, the following Python code implements the algorithm. First, a function is created that executes the described algorithm based on the reward matrix $R$, the learning rate $\alpha$, the discount factor $\gamma$, and the number of episodes.

# Imports
import numpy as np

# Function
def q_learning(R, gamma, alpha, episodes):

    """ Function for Q-Learning """
    
    # Number of rows and columns of the R matrix
    n, p = R.shape

    # Create the Q-matrix (initialized with zeros)
    Q = np.zeros(shape=[n, p])
    
    # Loop over episodes
    for i in range(episodes):

        # Random starting point of the robot
        state = np.random.randint(0, n, 1)

        # Iteration
        for j in range(100):

            # Possible rewards in the current state
            rewards = R[state]

            # Possible moves of the robot in the current state
            possible_moves = np.where(rewards[0] > -1)[0]
            
            # Random move of the robot
            next_state = np.random.choice(possible_moves, 1)

            # Update the Q-values
            Q[state, next_state] = (1 - alpha) * Q[state, next_state] + alpha * (
                R[state, next_state] + gamma * np.max(Q[next_state, :]))

            # End the episode when the goal is reached
            if R[state, next_state] == 1:
                break

            # Move the robot to the selected room (next state)
            state = next_state

    # Return the Q-matrix
    return Q

First, the number of states $n$ and the number of actions $p$ are determined; both are derived from the shape of the reward matrix $R$. Next, a matrix $Q$ initialized with zeros is created to store the Q-values during the learning process. Inside the loop over the specified number of episodes, the agent starts at a randomly selected position. The algorithm described above is then executed for at most 100 iterations per episode: from the set of possible actions (possible_moves) in the current state (state), a random action is selected and stored in next_state. The Q-values are then updated according to the formula described above, using both the current Q-value Q[state, next_state] and the reward R[state, next_state]. After the update, it is checked whether the agent has received a reward. If so, the inner loop terminates and the simulation proceeds to the next episode; otherwise, the agent moves into the selected room and continues exploring. After all episodes are completed, the final Q-matrix is returned.

A practical application of the above function is shown in the code box below. First, the reward matrix is defined analogously to the example described earlier and then passed to the q_learning function along with the remaining parameters. The learning rate is set to $\alpha=0.8$, the discount factor to $\gamma=0.95$, and a total of 1000 episodes are simulated.

# Number of rooms
rooms = 5

# Reward matrix
R = np.zeros(shape=[rooms, rooms])
R[0, :] = [-1,  0,  0, -1,  1]
R[1, :] = [ 0, -1, -1, -1, -1]
R[2, :] = [ 0, -1, -1, -1,  1]
R[3, :] = [-1, -1, -1, -1,  1]
R[4, :] = [ 0, -1,  0,  0,  1]

# Q-Learning
Q = q_learning(R=R, gamma=0.95, alpha=0.8, episodes=1000)

# Normalize and display the final Q-matrix
np.round(Q / np.max(Q), 4)
array([[ 0.    ,  0.9025,  0.95  ,  0.    ,  1.    ],
       [ 0.95  ,  0.    ,  0.    ,  0.    ,  0.    ],
       [ 0.95  ,  0.    ,  0.    ,  0.    ,  1.    ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  1.    ],
       [ 0.95  ,  0.    ,  0.95  ,  0.95  ,  1.    ]])

The learned actions of the robot can now be read directly from the final Q-matrix by identifying the maximum Q-value per state (row). Example: If the robot is in the hallway (first row of the matrix), the maximum Q-value lies in the last column, which corresponds to a move into the living room. If the agent is in the bathroom (second row), it first moves into the hallway; from there, the maximum Q-value again points to the living room.
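
This read-off can also be done programmatically. The small helper below is our own addition; it assumes the Q and R matrices from the code above and masks out invalid moves (reward of -1) before taking the row-wise maximum:

# Derive the greedy policy (best action per room) from the learned Q-matrix
import numpy as np

def greedy_policy(Q, R):
    policy = {}
    for state in range(Q.shape[0]):
        valid_actions = np.where(R[state] > -1)[0]      # rooms reachable from this state
        best_action = valid_actions[np.argmax(Q[state, valid_actions])]
        policy[state] = int(best_action)
    return policy

print(greedy_policy(Q, R))   # e.g. state 0 (hallway) should map to 4 (living room)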

A more complex example

Our Dusty3000 has already proven itself in the simple simulation environment. Now, we want to determine how the robot can navigate more complex apartments. For this purpose, the apartment has been expanded by two additional rooms, and the dust has been moved from the living room to the children's room:

[Figure: floor plan of the extended test apartment with seven rooms, numbered 0 to 6; the dust is now located in the children's room]

Compared to the previous example, the children's room is significantly harder to reach and can only be accessed via longer paths through several intermediate rooms. As a result, the robot will have a harder time correctly estimating the Q-values. The reward matrix is adjusted accordingly:

$$R=\begin{pmatrix} -1 & 0 & 0 & -1 & 0 & -1 & -1 \\ 0 & -1 & -1 & -1 & -1 & -1 & -1 \\ 0 & -1 & -1 & -1 & 0 & 1 & -1 \\ -1 & -1 & -1 & -1 & 0 & -1 & -1 \\ 0 & -1 & 0 & 0 & -1 & -1 & 0 \\ -1 & -1 & 0 & -1 & -1 & -1 & 0 \\ -1 & -1 & -1 & -1 & 0 & 0 & -1 \end{pmatrix}$$

Analogous to the previous example, the learning rate is set to $\alpha=0.8$ and the discount factor to $\gamma=0.95$; again, 1000 episodes are simulated.

# Number of rooms
rooms = 7

# Reward matrix
R = np.zeros(shape=[rooms, rooms])
R[0, :] = [-1,  0,  0, -1,  0, -1, -1]
R[1, :] = [ 0, -1, -1, -1, -1, -1, -1]
R[2, :] = [ 0, -1, -1, -1,  0,  1, -1]
R[3, :] = [-1, -1, -1, -1,  0, -1, -1]
R[4, :] = [ 0, -1,  0,  0, -1, -1,  0]
R[5, :] = [-1, -1,  0, -1, -1, -1,  0]
R[6, :] = [-1, -1, -1, -1,  0,  0, -1]

# Q-Learning!
Q = q_learning(R=R, gamma=0.95, alpha=0.8, episodes=1000)

# Normalize
np.round(Q / np.max(Q), 4)
array([[ 0.    ,  0.8571,  0.9499,  0.    ,  0.9023,  0.    ,  0.    ],
       [ 0.9023,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  0.    ],
       [ 0.9023,  0.    ,  0.    ,  0.    ,  0.9023,  1.    ,  0.    ],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.9023,  0.    ,  0.    ],
       [ 0.9023,  0.    ,  0.9497,  0.857 ,  0.    ,  0.    ,  0.857 ],
       [ 0.    ,  0.    ,  0.9499,  0.    ,  0.    ,  0.    ,  0.8571],
       [ 0.    ,  0.    ,  0.    ,  0.    ,  0.9023,  0.9023,  0.    ]])

The resulting Q-matrix is more complex than in the previous example. Starting from the hallway (first row), the agent moves into room 2 (the bedroom) and from there into the children's room, where the reward awaits. Starting from the kitchen (fourth row), the robot first moves into the living room, then into the bedroom, and finally into the children's room.
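
Such a path can also be traced automatically by repeatedly taking the action with the highest Q-value until the reward is collected. The helper below is a sketch of our own, assuming the Q and R matrices of this second example:

# Follow the greedy path from a start room until the reward is reached
import numpy as np

def trace_path(Q, R, start, max_steps=10):
    path, state = [start], start
    for _ in range(max_steps):
        valid_actions = np.where(R[state] > -1)[0]
        action = int(valid_actions[np.argmax(Q[state, valid_actions])])
        path.append(action)
        if R[state, action] == 1:     # reward reached, episode over
            break
        state = action
    return path

print(trace_path(Q, R, start=3))   # expected: [3, 4, 2, 5], i.e. kitchen -> living room -> bedroom -> children's room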

Conclusion and outlook

With Reinforcement Learning and Q-Learning, it is possible to develop algorithms and systems that autonomously learn and execute actions in both deterministic and stochastic environments without knowing them explicitly. The agent always attempts to maximize the reward generated by the environment in response to its actions, and the discount factor controls whether it focuses on short-term or long-term rewards. The application areas for such agents are diverse and exciting: Some time ago, Google published a paper in which an agent was trained using model-based Reinforcement Learning to play various Atari computer games. In a next stage of development, the RL system developed there, DQN (Deep Q-Network), was applied to the significantly more complex strategy-game classic StarCraft. Unlike the example shown here, the optimal action was not derived from a tabular Q-matrix but from Deep Learning models that approximate the Q-values in $s_{t+1}$ based on the pixels on the screen. We will explore this application in the second part of the Reinforcement Learning series and demonstrate how neural networks and Deep Learning can be used as Q-approximators. This opens up significantly more complex and realistic applications, as the number of states can become arbitrarily large.
