Logging and Monitoring: What You Should Log and Why

TL;DR

Five log categories form the minimum: authentication, access to critical data, administrative changes, network traffic, and application errors.
Logs must be collected centrally: logs that exist only on individual systems are of little use during a security incident because attackers can delete or encrypt them.
For SMEs, open-source solutions like Wazuh or Graylog are a realistic alternative to expensive enterprise SIEMs.
Alert rules must be specific and action-triggering: Who gets notified, what needs to be done, and within what timeframe.
Logging is subject to GDPR — you need a legal basis, defined retention periods, and must automate deletion.

Why Logging and Monitoring Are Not Optional Extras

A mid-market company with 120 employees suffered a ransomware attack on a Friday evening. By Monday morning, it was clear that the attackers had been in the network for three weeks. They had moved laterally, exfiltrated data, and finally triggered the encryption. The IT department wanted to trace how the attack unfolded, which systems were affected, and what data had been exfiltrated. The answer to all these questions was the same: We don't know. The logs were stored locally on each individual server, and the servers were encrypted. Even the few logs that were still accessible only went back 48 hours because the Windows Event Log was still running with its small default size limit, so older events had already been overwritten.

The BSI regularly reports that missing logging is one of the biggest obstacles when investigating cyber incidents. You know something happened, but you can't reconstruct what exactly, when, or through which path.

Logging is the foundation for incident response, forensics, and regulatory reporting. Monitoring goes a step further and tries to detect incidents before they escalate.

ISO 27001 explicitly requires in Annex A.8.15 (Logging) and A.8.16 (Monitoring activities) that security-relevant events are logged, that logs are protected, and that they are regularly reviewed. NIS2 further tightens these requirements and demands in Article 21 the ability to detect and respond to security incidents — which is simply impossible without functioning logging and monitoring.

What You Should Log: The Five Log Categories

The answer "everything" sounds tempting but leads to a dead end: too much noise, exploding storage costs. The better answer is organized around five categories that together form a solid foundation.

Category 1: Authentication

Every login attempt in your environment should be logged — both successful and failed. This includes:

Logins to Windows domains (Active Directory)
Logins to VPN and remote desktop services
Logins to web applications (ERP, CRM, webmail, cloud services)
Logins to network devices (firewalls, switches, WLAN controllers)
Password changes and password resets
MFA events (successful and rejected second factors)

Authentication logs are the first line of defense. Brute force shows up through mass failed attempts, password spraying through scattered failures across many accounts, and compromised accounts through logins at unusual times or locations.

On a Windows Domain Controller, the relevant Event IDs are:

Event ID	Meaning
4624	Successful logon
4625	Failed logon
4648	Logon with explicit credentials (RunAs)
4771	Kerberos pre-authentication failed
4776	NTLM authentication (success/failure)

Category 2: Access to Critical Data and Systems

Not every file access needs to be logged — that would be neither practical nor useful on a file server with millions of files. But access to particularly sensitive data and systems absolutely should be.

This includes:

Access to personnel files, salary data, and executive documents
Access to financial data and accounting systems
Access to customer databases with personal data
Access to technical documentation and trade secrets
Access to backup systems and their configuration
Access to Active Directory (LDAP queries, especially for privileged objects)

The technology for this in Windows environments is Object Access Auditing, which you enable via GPO and configure on specific folders. For highly sensitive data, monitoring both read and write access is recommended; for less critical areas, write access and deletions are sufficient.

At the database level, most systems offer their own audit functions (SQL Server Audit, pgAudit, MySQL Enterprise Audit). Enable them at least for DDL commands, access to sensitive tables, and schema changes.

Category 3: Administrative Changes

Every change to the configuration of your IT environment must be traceable. This is not just a security requirement but also operationally useful — because if something stops working after a change, you need to know what was changed.

Log at minimum:

User and group management in Active Directory (creation, modification, deletion, group memberships)
GPO changes (creation, editing, linking, deletion)
Firewall rule changes
DNS changes
DHCP configuration changes
Changes to backup jobs and schedules
Software installations on servers
Permission changes on file shares
Configuration changes to network devices

In Active Directory, the relevant Event IDs are:

Event ID	Meaning
4720	User account created
4722	User account enabled
4725	User account disabled
4726	User account deleted
4728/4732/4756	Member added to security group
4729/4733/4757	Member removed from security group
4738	User account changed
5136/5137/5141	Directory service object created/changed/deleted

Category 4: Network Traffic

You don't need to log every single network frame — that would be simply impossible in a corporate network. But you need logs at two levels: at the firewall level and at the DNS level.

Firewall logs show you what traffic was allowed and what was blocked. Particularly relevant are outbound connections to unknown or unusual destinations (a server suddenly sending data to an IP address in a foreign country is a classic exfiltration indicator) as well as inbound connection attempts on unexpected ports.

DNS logs are an underrated goldmine for security analysis. Malware frequently communicates with its command-and-control servers via DNS. If you log DNS queries, you can trace which systems communicated with which domains after an incident. A workstation querying hundreds of different subdomains of an unknown domain within minutes is very likely compromised.
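The subdomain pattern described above can be checked with a few lines of Python. This is a minimal sketch, not a production detector: the tuple layout of the DNS log records, the function names, and the thresholds (100 distinct names per 5-minute window) are assumptions for illustration, and the parent domain is derived naively from the last two labels.

```python
from collections import defaultdict


def parent_domain(fqdn: str) -> str:
    """Return the last two labels of a name (naive; ignores suffixes like co.uk)."""
    labels = fqdn.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def clients_with_subdomain_bursts(queries, window_seconds=300, min_unique=100):
    """Flag (client, parent domain) pairs with many distinct names in one window.

    queries: iterable of (unix_timestamp, client_ip, queried_name) tuples.
    The record layout is hypothetical; adapt it to your DNS log format.
    """
    seen = defaultdict(set)
    for timestamp, client, name in queries:
        # Fixed time buckets keep the sketch short; a burst that straddles
        # a bucket boundary can be missed.
        bucket = int(timestamp) // window_seconds
        seen[(client, parent_domain(name), bucket)].add(name.lower())
    return {
        (client, domain)
        for (client, domain, _bucket), names in seen.items()
        if len(names) >= min_unique
    }
```

Feeding it the parsed DNS log of a day returns the client and domain pairs worth a closer look. Legitimate services such as content delivery networks can also produce many subdomains, so expect to maintain an allowlist.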

DHCP logs help you map IP addresses to computer names and thus to users. VPN logs show who connected from outside, when, and for how long.

Category 5: Application Errors and System Events

The last category covers events that don't seem directly security-relevant but often are: application crashes, service outages, resource exhaustion. Many attack techniques leave traces in the form of application errors. A failed buffer overflow exploit can crash the targeted service, and code injected by malware can make processes unstable.

Relevant Windows Event IDs in this category:

Event ID	Source	Meaning
7036	Service Control Manager	Service started/stopped
7045	Service Control Manager	New service installed (often a malware indicator)
1102	Security	Audit log cleared (almost always an attack)
4688	Security	New process created (with command line, if enabled)
4697	Security	Service installed on the system

Event ID 1102 deserves special attention: If someone clears the Security Event Log, this is virtually never a legitimate action in a production environment. It is almost always an attempt to cover tracks. An alert on this Event ID is a must.

Where to Store: Central Log Management

Logs that only reside locally on individual systems have two fundamental problems: They're gone after a compromise, and during an incident spanning multiple systems, decentralized analysis is far too slow.

The solution is a central log management system: a dedicated server to which all systems send their logs and which keeps them searchable.

SIEM Alternatives for SMEs

The acronym SIEM (Security Information and Event Management) represents the gold standard of log management: systems that correlate logs from various sources, apply rules, and trigger alerts. Enterprise SIEMs like Splunk, QRadar, or Microsoft Sentinel are powerful, but for a company with 100 employees they are often out of reach in terms of both budget and staffing. License costs quickly reach five figures per year, and operation requires dedicated expertise.

The good news: There are alternatives that are realistic for SMEs.

Wazuh is an open-source platform that combines endpoint monitoring, log analysis, and SIEM functionality. Wazuh collects logs via agents on Windows and Linux systems, stores them centrally, and offers predefined rules for detecting common attack patterns. For 100 endpoints, you need a server with 8 GB RAM and 500 GB storage — manageable hardware requirements.

Graylog Open is particularly well suited if you primarily want to collect, search, and analyze logs without needing the full SIEM feature set. The interface is more intuitive than Wazuh's, and setup is less complex.

Windows Event Forwarding (WEF) is the lowest-barrier option and already included in Windows. You configure a Windows server as a collector, and all other systems forward their relevant events to it. WEF has limited search and alerting capabilities but works well as an entry-level solution.

Elastic Stack (ELK) with Elasticsearch, Logstash, and Kibana is powerful and flexible but requires more technical know-how for setup and operation.

Recommendation for getting started: Start with Windows Event Forwarding to centrally collect the most important Windows events. When you find you need more analysis and alerting capabilities, migrate to Wazuh or Graylog. The transition is manageable because these systems can ingest both Windows event logs and syslog, so the sources you have already identified can be reconnected rather than redefined.

Architecture and Security

Regardless of the chosen solution, some fundamental principles apply to the architecture:

Network segmentation. The log server belongs in its own network segment or at least in a segment that isn't directly reachable from the same network as the workstations. An attacker who compromises a workstation shouldn't be able to trivially access the log server.

Write access, no deletion. Systems that send logs should only be able to write to the log server, not read or delete. This prevents a compromised system from manipulating its own logs on the central server.

Separate authentication. The log server ideally should not be joined to the Active Directory domain. If AD is compromised, the log server remains intact and the logs stay available for forensics.

Encrypted transmission. Logs contain sensitive information (usernames, IP addresses, sometimes resource paths). Transmission from the source system to the log server must be encrypted (TLS for Syslog, HTTPS for API-based solutions).
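To illustrate what "TLS for Syslog" means on the wire, the following sketch formats a message in the RFC 5424 layout, frames it with the octet counting used by syslog over TLS (RFC 5425), and sends it over a verified TLS connection using Python's standard library. The function names, host name, port, and CA file are assumptions; in practice an agent or a forwarder such as rsyslog normally takes care of this.

```python
import socket
import ssl
from datetime import datetime, timezone


def format_syslog(hostname, app, message, facility=4, severity=6, timestamp=None):
    """Build an RFC 5424 line; facility 4 (auth) and severity 6 (info) give PRI 38."""
    pri = facility * 8 + severity
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    # PROCID, MSGID and STRUCTURED-DATA are left empty ("-") in this sketch.
    return f"<{pri}>1 {timestamp} {hostname} {app} - - - {message}"


def frame_message(line: str) -> bytes:
    """Prefix the UTF-8 payload with its byte length (octet counting)."""
    payload = line.encode("utf-8")
    return str(len(payload)).encode("ascii") + b" " + payload


def send_over_tls(server, port, framed: bytes, ca_file):
    """Send one framed message; the server certificate is verified against ca_file."""
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((server, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=server) as tls:
            tls.sendall(framed)
```

A call could look like `send_over_tls("logs.example.internal", 6514, frame_message(format_syslog("dc01", "audit", "test")), "ca.pem")`, where the host name and CA file are placeholders and 6514 is the port registered for syslog over TLS.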

Defining Retention Periods

How long do you need to keep logs? The answer depends on three factors: regulatory requirements, practical necessity, and storage costs.

Regulatory Requirements

ISO 27001 doesn't specify a fixed retention period but requires that the retention duration is defined and documented and that logs remain available long enough to investigate security incidents.

NIS2 requires the ability to analyze incidents. Since the reporting obligation demands a detailed final report within one month, logs must be available at least long enough to conduct a complete incident analysis. In practice, that means at least 90 days, preferably six months.

GDPR limits retention duration: Logs containing personal data may only be stored as long as necessary for the processing purpose (principle of storage limitation, Article 5(1)(e)).

BSI IT-Grundschutz recommends a retention period of at least 90 days in OPS.1.1.5.

Commercial retention may become relevant for certain logs (such as access to financial systems), where the tax retention period of ten years may apply if the logs are classified as business-relevant records.

Recommended Retention Periods

For a mid-market company, I recommend the following tiered approach:

Log Category	Online Retention (searchable)	Archive Retention (compressed)
Authentication (AD, VPN, MFA)	6 months	12 months
Access to critical data	6 months	12 months
Administrative changes	12 months	24 months
Firewall and DNS logs	3 months	6 months
Application errors and system events	3 months	6 months

Online retention means the logs are directly searchable on the log server. Archive retention means that older logs are stored compressed on archive storage and can be restored when needed. This tiering keeps storage costs manageable without compromising analytical capability.

Automatic deletion. Define an automatic deletion deadline for each log category and implement it technically. Logs stored beyond the defined retention period are not just a storage problem but also a data protection problem.
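The tiered retention table can be turned into a small decision function that a nightly cleanup job might call. This is a sketch under stated assumptions: the category keys and the function name are made up for illustration, months are approximated as 30 days, and the archive period is counted from the event date (total age), which the table above does not specify.

```python
from datetime import date

# (online_days, archive_days) per category, taken from the table above.
# Assumption: one month = 30 days, both periods measured from the event date.
RETENTION_DAYS = {
    "authentication": (180, 360),
    "critical_data_access": (180, 360),
    "administrative_changes": (360, 720),
    "firewall_dns": (90, 180),
    "application_system": (90, 180),
}


def retention_stage(category: str, event_date: date, today: date) -> str:
    """Return 'online', 'archive' or 'delete' for a log of the given age."""
    online_days, archive_days = RETENTION_DAYS[category]
    age_days = (today - event_date).days
    if age_days <= online_days:
        return "online"
    if age_days <= archive_days:
        return "archive"
    return "delete"
```

A cleanup job would iterate over stored log batches, move those in the "archive" stage to compressed storage, and remove those in the "delete" stage. Most log platforms offer built-in retention settings that achieve the same result and are usually preferable to a custom script.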

Defining Alerts: From Logs to Actions

Collecting logs is only half the battle. The real value emerges when you derive alerts from logs — automated notifications that draw your attention to suspicious activity.

The most common mistake is defining too many alerts. If your IT team receives 200 alerts daily, 195 of which are false positives, all of them will be ignored after a short time. This phenomenon (alert fatigue) is more dangerous than having no alerts at all.

Starter Alerts

Begin with a manageable number of alerts that are highly likely to indicate real problems:

Alert 1: Mass failed login attempts. Threshold: More than 20 failed login attempts within 5 minutes on a single account (brute force) or more than 5 failed login attempts across more than 10 different accounts within 10 minutes (password spraying). Notification: IT security officer via email and messenger. Response: Lock account (for brute force), identify and investigate source IP.

Alert 2: Admin account logon outside business hours. Threshold: Any successful logon of a Tier 0 or Tier 1 admin account between 10:00 PM and 6:00 AM or on weekends. Notification: IT management via SMS or messenger. Response: Immediately verify whether the logon is legitimate.

Alert 3: Member added to a privileged group. Threshold: Any addition of a member to Domain Admins, Enterprise Admins, Schema Admins, Administrators, or Backup Operators. Notification: IT management and CISO via email. Response: Verify that an approved change request exists.

Alert 4: Audit log cleared. Threshold: Any occurrence of Event ID 1102 (Security Log was cleared). Notification: CISO via messenger, immediately. Response: Immediate investigation, as this almost always indicates an attack.

Alert 5: New service installed on a server. Threshold: Any occurrence of Event ID 7045 on a server system. Notification: IT security officer via email. Response: Verify whether the installation was planned and authorized. Malware frequently installs itself as a Windows service.

Alert 6: Unusually high data transfer. Threshold: Outbound network traffic from a single system exceeding three times its usual daily average. Notification: IT security officer via email. Response: Investigate for data exfiltration or compromised system.
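As an illustration of how a threshold rule such as Alert 1 works internally, here is a minimal Python sketch using sliding time windows. The event tuples and function names are hypothetical. The spraying rule is interpreted as "failures against more than 10 distinct accounts within 10 minutes", which is an assumption about the threshold wording above. A SIEM rule engine would normally express this declaratively.

```python
from collections import defaultdict, deque


def brute_force_accounts(failures, max_failures=20, window_seconds=300):
    """Return accounts with more than max_failures failed logons in one window.

    failures: iterable of (unix_timestamp, account) tuples, sorted by time,
    for example extracted from Event ID 4625.
    """
    recent = defaultdict(deque)
    flagged = set()
    for timestamp, account in failures:
        window = recent[account]
        window.append(timestamp)
        # Drop attempts that have fallen out of the time window.
        while timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_failures:
            flagged.add(account)
    return flagged


def password_spraying_detected(failures, max_accounts=10, window_seconds=600):
    """Return True if failures hit more than max_accounts accounts in one window."""
    recent = deque()
    for timestamp, account in failures:
        recent.append((timestamp, account))
        while timestamp - recent[0][0] > window_seconds:
            recent.popleft()
        if len({name for _, name in recent}) > max_accounts:
            return True
    return False
```

Testing the rule with simulated failed logons before going live shows quickly whether the thresholds fit your environment or need adjusting.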

Structuring Alert Setup

Every alert needs a clear definition: Name and description, data source, threshold and time window, recipients, notification channel, expected response, escalation if no response, and estimated false-positive rate.
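One way to keep these definitions consistent is to store them as structured records and check them for gaps. The sketch below models the fields listed above as a Python dataclass. The class and function names are made up for illustration, and the escalation and false-positive values in the example are hypothetical placeholders, since this article does not define them for Alert 4.

```python
from dataclasses import dataclass, fields


@dataclass
class AlertDefinition:
    name: str
    description: str
    data_source: str
    threshold: str
    time_window: str
    recipients: list
    channel: str
    expected_response: str
    escalation: str
    estimated_false_positive_rate: str


def missing_fields(alert: AlertDefinition) -> list:
    """Return the names of all fields that are empty and still need documenting."""
    return [f.name for f in fields(alert) if not getattr(alert, f.name)]


audit_log_cleared = AlertDefinition(
    name="Audit log cleared",
    description="The Security Event Log was cleared",
    data_source="Windows Security log, Event ID 1102",
    threshold="Any occurrence",
    time_window="Immediate",
    recipients=["CISO"],
    channel="Messenger",
    expected_response="Immediate investigation",
    escalation="",  # hypothetical gap: not yet defined
    estimated_false_positive_rate="low",  # hypothetical estimate
)
```

Here `missing_fields(audit_log_cleared)` reports that the escalation path is still undefined, which is exactly the kind of gap that tends to surface during an audit.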

Document these definitions in your ISMS. During an audit, you'll be asked not only whether you have alerts but also whether they are documented and whether you can demonstrate that alerts are actually being handled.

Windows Event IDs You Should Know

If you work in a Windows-dominated environment, you'll inevitably deal with Windows Event IDs. The following compilation contains the Event IDs most relevant for security monitoring, grouped by use case.

Logons and Authentication

Event ID	Meaning	Monitoring Note
4624	Successful logon	Watch for Logon Type 10 (RemoteInteractive) from unusual source IPs
4625	Failed logon	Mass failures = brute force or password spraying
4648	Logon with explicit credentials	Often an indicator of lateral movement (RunAs, PsExec)
4672	Special privileges assigned	Shows admin logons; correlate with expected admin systems
4768	Kerberos TGT requested	AS-REP Roasting detection for accounts without pre-auth
4769	Kerberos Service Ticket requested	Encryption Type 0x17 = RC4 = possible Kerberoasting
4771	Kerberos pre-auth failed	Password spraying via Kerberos
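The monitoring note for Event ID 4769 can be expressed as a simple filter. The sketch below assumes the events have already been exported as dictionaries that carry the 4769 fields ServiceName and TicketEncryptionType. The function name and the exclusion of computer accounts and krbtgt are assumptions intended to reduce noise, not part of the table above.

```python
RC4_HMAC = 0x17  # encryption type that the table associates with Kerberoasting


def possible_kerberoasting(events):
    """Return 4769 events that requested an RC4-encrypted service ticket."""
    hits = []
    for event in events:
        if event.get("EventID") != 4769:
            continue
        service = event.get("ServiceName", "")
        # Assumption: tickets for computer accounts and krbtgt are ignored.
        if service.endswith("$") or service.lower() == "krbtgt":
            continue
        try:
            encryption_type = int(str(event.get("TicketEncryptionType", "")), 16)
        except ValueError:
            continue  # field missing or not a hex value
        if encryption_type == RC4_HMAC:
            hits.append(event)
    return hits
```

In environments where older systems still negotiate RC4 legitimately, this filter will produce hits that need a baseline before it is useful as an alert.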

Account and Group Management

Event ID	Meaning	Monitoring Note
4720	Account created	Investigate unplanned account creation
4722/4725	Account enabled/disabled	Disabled accounts being re-enabled are suspicious
4724	Password reset	Admin password resets outside the helpdesk process
4728/4732/4756	Member added to group	Alert on privileged groups
4738	Account changed	Changes to admin accounts
4740	Account locked out	Mass lockouts = attack

System Security

Event ID	Meaning	Monitoring Note
1102	Security log cleared	Almost always an attack; respond immediately
4688	New process created	With command line (enable Process Command Line Auditing)
4697	Service installed	Investigate unknown services on servers
7045	New service registered	Like 4697, different source
5136	Directory service object changed	Monitor GPO and AD changes
5145	Network share access	Monitor access to administrative shares (C$, ADMIN$)

Process Command Line Auditing

A particularly valuable feature that is not enabled in many environments: Process Command Line Auditing. If you enable this feature via GPO (Computer Configuration > Administrative Templates > System > Audit Process Creation > Include command line in process creation events), the full command line is logged with every new process (Event ID 4688).

This makes an enormous difference: Instead of "powershell.exe was started," you see "powershell.exe -enc SQBuAHYAbwBrAGUALQBXAGUAYg...," which immediately shows you that a Base64-encoded command was executed — a classic attack pattern.
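Once the command line is logged, the encoded part can be decoded for analysis. PowerShell expects the argument of -EncodedCommand to be Base64 over UTF-16LE text, so a short Python sketch is enough. The function name is an assumption, and the simple whitespace tokenization will not handle every quoting variant seen in real command lines.

```python
import base64


def decode_encoded_command(command_line: str):
    """Return the decoded PowerShell command, or None if none is found."""
    tokens = command_line.split()
    for flag, value in zip(tokens, tokens[1:]):
        if not flag.startswith(("-", "/")):
            continue
        name = flag.lstrip("-/").lower()
        # PowerShell accepts abbreviations such as -e, -enc or -ec.
        if name and (name == "ec" or "encodedcommand".startswith(name)):
            try:
                raw = base64.b64decode(value, validate=True)
                return raw.decode("utf-16-le")
            except ValueError:
                return None  # not valid Base64 or not UTF-16LE text
    return None
```

Running this over the command lines of Event ID 4688 turns opaque Base64 strings into readable commands that an analyst can assess immediately.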

Data Protection in Logging

Logs contain personal data. Usernames, IP addresses, access timestamps, visited resources: all of this can be attributed to an identifiable person. This means log processing is subject to GDPR, and you need a solid legal basis.

Legal Basis

The standard legal basis for security-related logging is Article 6(1)(f) GDPR (legitimate interest). Your legitimate interest is ensuring IT security, detecting and investigating security incidents, and fulfilling legal obligations (NIS2, ISO 27001). This interest generally outweighs the interest of data subjects in not being monitored, provided the processing is proportionate.

Proportionate means: You only log what is necessary for the security purpose, don't store longer than needed, restrict access, and inform the data subjects.

Information Obligation

You must inform employees that security-relevant events are being logged, and you must uphold their rights as data subjects. This information belongs in the data protection notice for employees, which is typically provided together with the employment contract or a works agreement.

The information must include: what data is logged, for what purpose, on what legal basis, how long it is retained, who has access, and what rights the data subjects have (access, deletion, complaint to the supervisory authority).

Works Council

In companies with a works council, the logging concept is subject to co-determination under Section 87(1) No. 6 BetrVG (technical facilities suitable for monitoring). You should involve the works council early and ideally conclude a works agreement on IT monitoring that governs the purpose, scope, and access restrictions.

Without a works agreement, you risk the entire logging system being legally challengeable under employment law and any findings being inadmissible.

Access Restrictions

Define an authorization concept for the log management system:

Read access to all logs: Only the CISO and IT security officer
Read access to technical logs: IT administrators (for troubleshooting)
Write access/administration: Only the log system administrator
No access: HR department, executive management (unless a specific, documented reason exists)

The last point is important: Logs must not be used for general employee surveillance. If management wants to know who sat at which computer when, that is not a security reason but surveillance — and without a concrete suspicion and works agreement, it is not permissible.
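The authorization concept above can be written down as a default-deny permission matrix, which makes it easy to review and to test. This is an illustrative sketch only: the role and action names are assumptions, and a real log platform would enforce this through its own role model.

```python
# Role-to-permission matrix derived from the list above (names are hypothetical).
PERMISSIONS = {
    "ciso": {"read_all"},
    "it_security_officer": {"read_all"},
    "it_administrator": {"read_technical"},
    "log_system_administrator": {"administer"},
    "hr": set(),
    "executive_management": set(),
}


def is_allowed(role: str, action: str) -> bool:
    """Default deny: unknown roles and unlisted actions are rejected."""
    granted = PERMISSIONS.get(role, set())
    # Read access to all logs includes the technical logs.
    if action == "read_technical" and "read_all" in granted:
        return True
    return action in granted
```

Exceptions for HR or management, where a specific and documented reason exists, are deliberately not modeled here. They should be granted case by case and recorded, not built into the standing matrix.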

Audit Trail for NIS2

NIS2 introduces tightened accountability requirements for affected companies. Article 21 requires measures for handling security incidents, and Article 23 requires reporting significant security incidents within defined reporting deadlines: an early warning within 24 hours, an initial assessment within 72 hours, and a final report within one month.
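The deadline arithmetic can be sketched in a few lines, for example to pre-fill an incident ticket. Two assumptions are made here and should be checked against the applicable national law: the clock starts when you become aware of the incident, and the one-month period for the final report is counted from the 72-hour deadline. The function names are made up for illustration.

```python
import calendar
from datetime import datetime, timedelta


def add_one_month(moment: datetime) -> datetime:
    """Same day next month, clamped to the last day of shorter months."""
    year = moment.year + moment.month // 12
    month = moment.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def reporting_deadlines(aware_at: datetime) -> dict:
    """Latest times for early warning, initial assessment and final report."""
    initial_assessment = aware_at + timedelta(hours=72)
    return {
        "early_warning": aware_at + timedelta(hours=24),
        "initial_assessment": initial_assessment,
        # Assumption: one month counted from the 72-hour deadline.
        "final_report": add_one_month(initial_assessment),
    }
```

For an incident detected on a Friday evening, the early warning is due on Saturday evening, which is one reason why alert recipients and escalation paths need to work outside business hours.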

To meet these deadlines, you need a complete audit trail. You must be able to demonstrate when the incident began, how it was detected, which systems and data were affected, what measures were taken, and whether reporting obligations were met.

Without central logging, it becomes nearly impossible to submit a well-founded early warning within 24 hours. You don't know what happened, how long it has been happening, or what's affected.

When the competent supervisory authority reviews your NIS2 compliance, they will examine your logging concept: What is being logged, where, for how long, who has access, how are alerts evaluated? All of this must be documented and demonstrable.

Logging Concept as an ISMS Document

Create a logging concept as a standalone ISMS document with the following sections: scope, log categories with data sources and Event IDs, central log management (architecture and security), retention periods per category, alert rules with thresholds and escalation, access control, data protection (legal basis, information obligation, works agreement), and review cycle.

This document is your evidence for auditors and supervisory authorities, demonstrating that the logging strategy is a deliberately designed element of your ISMS.

In ISMS Lite, the logging concept can be set up as a policy document with an approval workflow, and retention periods per log category can be systematically managed. The tool covers all ISMS modules for 500 euros per year without seat licenses.

From Theory to Practice: Where to Start?

If you haven't been doing systematic logging, the volume of requirements in this article may feel overwhelming. The good news: You don't need to implement everything at once. Start with the basics and build incrementally.

Phase 1: Lay the Foundations (Weeks 1-4)

Enable the advanced audit policies on the Domain Controllers via GPO
Increase the Security Event Log size on all servers (at least 1 GB)
Set up Windows Event Forwarding to centrally collect the most critical events
Enable Process Command Line Auditing
Document what you are logging and why

Phase 2: Centralization (Months 2-3)

Decide on a log management solution (Wazuh, Graylog, or ELK)
Install the solution and connect the first sources (Domain Controller, firewall, VPN)
Define retention periods and configure automatic deletion
Gradually connect additional log sources (file server, database server, DNS)

Phase 3: Alerting (Months 3-4)

Define the six starter alerts from the "Defining Alerts" section
Test each alert with a targeted simulation (trigger failed logons, add a test account to Domain Admins)
Document the alert rules in the ISMS
Establish a process for reviewing triggered alerts daily or weekly

Phase 4: Optimization (Ongoing)

Evaluate false positives and adjust thresholds
Add further alert rules based on new threats or incidents
Conduct regular log reviews (at least monthly)
Review and update the logging concept as part of the annual ISMS review

Logging and monitoring are not projects that are ever "finished." They grow with the threat landscape and your IT environment. One afternoon for the GPO configuration, one day for Windows Event Forwarding, and you already have a foundation that will prove invaluable during a security incident.