Prompt templates for IT service runbooks

Here are seven prompt templates for IT service runbooks. Adapt each one to the specific IT service or issue you’re addressing:
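The bracketed placeholders in the titles, such as [Service Name] or [Alert Type], are the parts you adapt. If you keep the templates as text files, a minimal Python sketch like the one below can fill them in. The function name fill_template is illustrative. It assumes placeholders are always written in square brackets, and it raises an error instead of silently leaving a placeholder unfilled.

```python
import re

# Matches bracketed placeholders such as "[Service Name]" or "[System/Service Name]".
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [Placeholder] with values["Placeholder"].

    Raises KeyError if the template contains a placeholder with no value,
    so a half-filled runbook is never produced silently.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder [{name}]")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    title = "Title: Disaster Recovery Plan: [Service/System Name]"
    # "Payroll Database" is a made-up example value.
    print(fill_template(title, {"Service/System Name": "Payroll Database"}))
```

Parenthetical examples such as "(e.g., file servers, databases)" are left untouched because the pattern only matches square brackets.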

Prompt:

Title: IT Service Incident Response: [Service Name] Failure
Scope:
- Define the scope of the service affected (e.g., application, network, database).
Problem Description:
- Describe the issue in detail (e.g., users unable to log in, website down, network congestion).
Affected Users/Systems:
- Identify which users or systems are impacted (e.g., all employees, specific department, external customers).
Immediate Actions to Take:
- Step-by-step instructions for isolating the issue (e.g., checking system logs, verifying service status).
Root Cause Investigation:
- Instructions on how to investigate the root cause of the issue (e.g., reviewing error messages, checking system metrics).
Resolution Steps:
- Provide a detailed set of steps for resolving the issue (e.g., restarting services, applying patches, restoring backups).
Post-Incident Review:
- What to check after resolution (e.g., verify system stability, monitor for recurrence).
Preventive Actions:
- Recommendations to prevent the issue from happening again (e.g., monitoring improvements, configuration changes).
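The "verifying service status" step under Immediate Actions often starts with a basic reachability check. The following is a minimal Python sketch, assuming the service listens on a known TCP port. The function name is_port_reachable and the 3-second default timeout are illustrative choices, not part of any particular monitoring tool.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that something accepts connections on the port;
    it does not prove the application behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and DNS resolution failures.
        return False
```

For example, is_port_reachable("app01.example.internal", 443) checks a hypothetical host on the HTTPS port. A True result narrows the problem to the application layer, and a False result points toward the host, network, or service process.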

Prompt:

Title: System Maintenance for [System/Service Name]
Scope:
- Define the scope of the maintenance (e.g., server upgrades, software patching).
Preparation:
- What preparations need to be made before starting the maintenance (e.g., notify users, ensure backups are available).
Maintenance Tasks:
- Step-by-step instructions on the maintenance tasks (e.g., apply security patches, upgrade hardware).
Expected Downtime:
- State how long the system will be unavailable, if at all (e.g., 30 minutes, 1 hour).
Rollback Plan:
- Define how to revert changes if something goes wrong (e.g., restore from backup, roll back patches).
Post-Maintenance Checks:
- What to verify once maintenance is completed (e.g., system performance, availability checks).
Sign-Off:
- Identify who must approve completion of the maintenance and confirm that the system is stable.

Prompt:

Title: System Monitoring Alert Response: [Alert Type]
Scope:
- Define the scope of the alert (e.g., CPU usage exceeds 90%, disk space running low).
Alert Details:
- Specifics of the alert (e.g., high memory usage, critical server failure).
Immediate Actions:
- Step-by-step actions to take when an alert is triggered (e.g., check system logs, run diagnostics).
Investigation:
- Guidance on how to investigate the root cause (e.g., checking logs, identifying trends).
Resolution:
- Steps to resolve the issue (e.g., restart the service, optimize resource usage).
Post-Alert Follow-up:
- What to monitor after resolution to ensure the issue is fully resolved (e.g., check metrics over the next 24 hours).
Documentation & Reporting:
- How to document the issue, actions taken, and resolution for future reference (e.g., create an incident report).
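Threshold alerts like "disk space running low" can be expressed as simple checks. The Python sketch below computes disk usage for the filesystem containing a path and decides whether to alert. The 90% default is borrowed from the CPU example in Scope and is only an assumption, so choose a threshold that suits your environment. resolved_over_window reflects the Post-Alert Follow-up idea of confirming that a metric stayed below the threshold across a window such as 24 hours. How those samples are collected is left to your monitoring system.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def should_alert(percent_used: float, threshold: float = 90.0) -> bool:
    """True when usage meets or exceeds the alert threshold."""
    return percent_used >= threshold


def resolved_over_window(samples: list[float], threshold: float = 90.0) -> bool:
    """True if every sample from the follow-up window stayed below the threshold.

    An empty window returns False: no data is not evidence of resolution.
    """
    return bool(samples) and all(s < threshold for s in samples)
```

Treating an empty sample list as "not resolved" is a deliberate choice, so that a broken metrics pipeline does not close an alert by accident.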

Prompt:

Title: Disaster Recovery Plan: [Service/System Name]
Scope:
- Define the systems and services covered by the disaster recovery plan (e.g., critical database servers, file storage systems).
Disaster Trigger:
- List the conditions that trigger the disaster recovery plan (e.g., total system failure, data corruption).
Initial Response:
- Steps to take immediately after the disaster is detected (e.g., assess the scope of the disaster, alert stakeholders).
Recovery Strategy:
- Detailed steps for recovery (e.g., restoring from backups, setting up failover systems).
System Validation:
- What to check after recovery to ensure the system is operational (e.g., verify data integrity, system performance).
Communication Plan:
- How to communicate the recovery status to stakeholders (e.g., email updates, status page updates).
Post-Recovery Actions:
- Steps to review the event and reduce the risk of recurrence (e.g., conduct root cause analysis, improve system monitoring).

Prompt:

Title: Security Incident Response: [Incident Type]
Scope:
- Define the security incident (e.g., data breach, ransomware attack, phishing attempt).
Immediate Containment:
- First steps to take to contain the security threat (e.g., disconnect affected systems from the network, disable compromised accounts).
Investigation:
- How to investigate the extent of the security breach (e.g., check logs, identify entry points).
Mitigation & Eradication:
- Actions to remove the threat and mitigate further risks (e.g., patch vulnerabilities, remove malware).
Recovery:
- Steps for system recovery (e.g., restore data from backups, rebuild affected systems).
Post-Incident Review:
- Analyze the incident to improve future responses (e.g., update incident response procedures, enhance security policies).
Communication:
- How to communicate with stakeholders during and after the incident (e.g., informing affected users, public statements).

Prompt:

Title: Backup and Restore Procedures: [System/Service Name]
Scope:
- Define the systems and services covered by the backup process (e.g., file servers, databases).
Backup Schedule:
- Outline the backup schedule (e.g., daily, weekly, monthly).
Backup Verification:
- Instructions on how to verify the integrity of backups (e.g., test restores, checksum verification).
Restore Process:
- Detailed instructions for restoring data (e.g., from cloud storage, on-prem backup system).
Testing Restoration:
- How to test the restored system to ensure data integrity and functionality (e.g., spot-check files, run application tests).
Post-Restore Actions:
- Steps to perform after a successful restore (e.g., notify users, re-enable affected services).
Documentation:
- How to document the backup and restore process, including any issues encountered.
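For Backup Verification, checksum comparison is easy to script. This minimal Python sketch assumes a SHA-256 digest was recorded when the backup was written. The function names are hypothetical. A matching checksum only shows that the archive is unchanged since the digest was taken. It does not replace a test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in 1 MiB chunks so large backup archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_sha256: str) -> bool:
    """Compare the archive's current digest with the one recorded at backup time."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Storing the expected digest separately from the backup itself (for example, in the backup log described under Documentation) keeps a corrupted or tampered archive from also carrying a matching checksum.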

Prompt:

Title: Patch Management: [System/Service Name]
Scope:
- Define the scope of patch management (e.g., security patches, software updates).
Patch Assessment:
- How to assess patches for relevance and urgency (e.g., security-critical patches, functionality updates).
Testing Patches:
- Steps to test patches in a non-production environment (e.g., deploy in staging, run functional tests).
Deployment:
- Detailed deployment steps for applying patches to production systems (e.g., use deployment tools, manual installation).
Post-Deployment Validation:
- Steps to validate the success of the patch deployment (e.g., check logs, verify service availability).
Rollback Plan:
- Instructions on how to roll back patches if necessary (e.g., revert changes, restore system from backup).
Documentation & Reporting:
- How to document applied patches and any issues encountered (e.g., record patch IDs, issue details).
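Patch Assessment can be partly automated once each patch is tagged for relevance and urgency. The Python sketch below uses a hypothetical Patch record. Its fields (patch_id, security_critical, applies_to_us) are assumptions that stand in for whatever your vendor advisories or inventory tool provide. The sketch filters out irrelevant patches and puts security-critical ones first.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    patch_id: str            # hypothetical identifier, e.g. a vendor advisory number
    security_critical: bool  # True for security fixes, False for functionality updates
    applies_to_us: bool      # relevance: is the affected component in our environment?


def triage(patches: list[Patch]) -> list[Patch]:
    """Drop irrelevant patches, then order security-critical ones first.

    sorted() is stable, so patches of equal urgency keep their original order.
    """
    relevant = [p for p in patches if p.applies_to_us]
    # False sorts before True, so `not security_critical` puts security fixes first.
    return sorted(relevant, key=lambda p: not p.security_critical)
```

The ordered patch_id values can then serve as the list of patch IDs to record under Documentation & Reporting.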

These templates give IT teams a consistent structure for handling common IT tasks and incidents, so that runbooks for different services cover the same steps, from first response through documentation.
