The Palos Publishing Company


Prompt templates for IT service runbooks

Here are several prompt templates for IT service runbooks. Adapt each one to the specific service or issue you’re addressing:

1. General IT Service Incident Response

Prompt:

  • Title: IT Service Incident Response: [Service Name] Failure

  • Scope:

    • Define the scope of the service affected (e.g., application, network, database).

  • Problem Description:

    • Describe the issue in detail (e.g., users unable to log in, website down, network congestion).

  • Affected Users/Systems:

    • Identify which users or systems are impacted (e.g., all employees, specific department, external customers).

  • Immediate Actions to Take:

    • Step-by-step instructions for isolating the issue (e.g., checking system logs, verifying service status).

  • Root Cause Investigation:

    • Instructions on how to investigate the root cause of the issue (e.g., reviewing error messages, checking system metrics).

  • Resolution Steps:

    • Provide a detailed set of steps for resolving the issue (e.g., restarting services, applying patches, restoring backups).

  • Post-Incident Review:

    • What to check after resolution (e.g., verify system stability, monitor for recurrence).

  • Preventive Actions:

    • Recommendations to prevent the issue from happening again (e.g., monitoring improvements, configuration changes).
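
If you keep these templates in a script or internal tool, the placeholders can be filled in programmatically. The following is a minimal Python sketch, not part of any particular tool: `render_prompt` and `INCIDENT_SECTIONS` are hypothetical names, and the section list simply mirrors the incident-response template above. Sections without an answer are rendered as `TODO` so gaps stay visible during review.

```python
# Hypothetical helper that renders the incident-response template as plain text.
# Section names mirror the template; "[Service Name]" is the placeholder replaced.
INCIDENT_SECTIONS = [
    "Scope",
    "Problem Description",
    "Affected Users/Systems",
    "Immediate Actions to Take",
    "Root Cause Investigation",
    "Resolution Steps",
    "Post-Incident Review",
    "Preventive Actions",
]


def render_prompt(title_template: str, service: str, answers: dict[str, str]) -> str:
    """Build a runbook prompt; sections without an answer are marked TODO."""
    lines = [f"Title: {title_template.replace('[Service Name]', service)}"]
    for section in INCIDENT_SECTIONS:
        # Unfilled sections stay visible so reviewers notice the gap.
        lines.append(f"{section}: {answers.get(section, 'TODO')}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample service name and answers, for illustration only.
    print(render_prompt(
        "IT Service Incident Response: [Service Name] Failure",
        "Payroll API",
        {"Scope": "Application tier only",
         "Problem Description": "Users receive HTTP 500 errors on login"},
    ))
```

The same function works for the other templates if you swap in their section lists.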


2. System Maintenance Runbook

Prompt:

  • Title: System Maintenance for [System/Service Name]

  • Scope:

    • Define the scope of the maintenance (e.g., server upgrades, software patching).

  • Preparation:

    • Preparations to complete before starting maintenance (e.g., notify users, ensure backups are available).

  • Maintenance Tasks:

    • Step-by-step instructions on the maintenance tasks (e.g., apply security patches, upgrade hardware).

  • Expected Downtime:

    • Estimated duration of the outage, if any (e.g., 30 minutes, 1 hour).

  • Rollback Plan:

    • Define how to revert changes if something goes wrong (e.g., restore from backup, roll back patches).

  • Post-Maintenance Checks:

    • What to verify once maintenance is completed (e.g., system performance, availability checks).

  • Sign-Off:

    • Who must approve completion of the maintenance and confirm that the system is stable.
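
The Preparation and Expected Downtime steps lend themselves to a simple go/no-go check. This sketch assumes you can look up the timestamp of the latest backup and that your team defines a maximum acceptable backup age; `preflight` and `maintenance_end` are hypothetical helpers, not part of any existing tool.

```python
from datetime import datetime, timedelta, timezone


def preflight(last_backup: datetime, now: datetime,
              max_backup_age: timedelta, users_notified: bool) -> list[str]:
    """Return blocking issues; an empty list means maintenance may start."""
    issues = []
    if not users_notified:
        issues.append("users have not been notified")
    # Assumes both timestamps are timezone-aware (or both naive).
    if now - last_backup > max_backup_age:
        issues.append(f"latest backup is older than {max_backup_age}")
    return issues


def maintenance_end(start: datetime, expected_downtime: timedelta) -> datetime:
    """Planned end of the window, e.g. for the user notification."""
    return start + expected_downtime


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_backup = now - timedelta(hours=30)  # sample value
    print(preflight(last_backup, now, timedelta(hours=24), users_notified=True))
    print("Window ends:", maintenance_end(now, timedelta(minutes=30)).isoformat())
```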


3. System Monitoring & Alerting Response

Prompt:

  • Title: System Monitoring Alert Response: [Alert Type]

  • Scope:

    • Define the alert condition this runbook covers (e.g., CPU usage exceeds 90%, disk space running low).

  • Alert Details:

    • Specifics of the alert (e.g., high memory usage, critical server failure).

  • Immediate Actions:

    • Step-by-step actions to take when an alert is triggered (e.g., check system logs, run diagnostics).

  • Investigation:

    • Guidance on how to investigate the root cause (e.g., checking logs, identifying trends).

  • Resolution:

    • Steps to resolve the issue (e.g., restart the service, optimize resource usage).

  • Post-Alert Follow-up:

    • What to monitor after resolution to ensure the issue is fully resolved (e.g., check metrics over the next 24 hours).

  • Documentation & Reporting:

    • How to document the issue, actions taken, and resolution for future reference (e.g., create an incident report).
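
Alert conditions like those in the Scope example can be expressed as data and evaluated in one place. In this sketch the 90% CPU threshold comes from the template's example, while the 10% free-disk threshold and the metric names are assumptions. `disk_free_percent` uses Python's standard `shutil.disk_usage`; collecting CPU usage is left to your monitoring agent.

```python
import shutil

# Hypothetical metric names and thresholds: 90% CPU is the template's own
# example; the 10% free-disk limit is an assumption for illustration.
THRESHOLDS: dict[str, tuple[str, float]] = {
    "cpu_percent": ("above", 90.0),
    "disk_free_percent": ("below", 10.0),
}


def disk_free_percent(path: str = "/") -> float:
    """Free space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def triggered_alerts(metrics: dict[str, float],
                     thresholds: dict[str, tuple[str, float]] = THRESHOLDS) -> list[str]:
    """Return a human-readable line for every threshold that is breached."""
    alerts = []
    for name, (direction, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric that stopped reporting deserves attention too.
            alerts.append(f"{name}: no data")
        elif direction == "above" and value > limit:
            alerts.append(f"{name}: {value} > {limit}")
        elif direction == "below" and value < limit:
            alerts.append(f"{name}: {value} < {limit}")
    return alerts


if __name__ == "__main__":
    # CPU would normally come from a monitoring agent; 95.0 is a sample value.
    sample = {"cpu_percent": 95.0, "disk_free_percent": disk_free_percent()}
    for alert in triggered_alerts(sample):
        print(alert)
```

Treating a missing metric as an alert is a design choice: silence from a collector is often the first sign of a failing host.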


4. Disaster Recovery Runbook

Prompt:

  • Title: Disaster Recovery Plan: [Service/System Name]

  • Scope:

    • Define the systems and services covered by the disaster recovery plan (e.g., critical database servers, file storage systems).

  • Disaster Trigger:

    • List the conditions that trigger the disaster recovery plan (e.g., total system failure, data corruption).

  • Initial Response:

    • Steps to take immediately after the disaster is detected (e.g., assess the scope of the disaster, alert stakeholders).

  • Recovery Strategy:

    • Detailed steps for recovery (e.g., restoring from backups, setting up failover systems).

  • System Validation:

    • What to check after recovery to ensure the system is operational (e.g., verify data integrity, system performance).

  • Communication Plan:

    • How to communicate the recovery status to stakeholders (e.g., email updates, status page updates).

  • Post-Recovery Actions:

    • Steps to review the event and reduce the risk of a recurrence (e.g., conduct root cause analysis, improve system monitoring).
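
For the System Validation step, a small runner can execute a set of named checks and report every result, so one failing or crashing check does not hide the others. The check functions here are hypothetical placeholders; in practice they would probe your restored database, storage, or application endpoints.

```python
from typing import Callable


def run_validation(checks: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Run every check and return {name: 'pass' | 'fail' | 'error: ...'}."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "pass" if check() else "fail"
        except Exception as exc:  # a crashing check must not hide the others
            results[name] = f"error: {exc}"
    return results


def recovery_validated(results: dict[str, str]) -> bool:
    """True only if at least one check ran and every check passed."""
    return bool(results) and all(status == "pass" for status in results.values())


if __name__ == "__main__":
    # Placeholder checks; replace with real probes of the restored system.
    checks = {
        "database reachable": lambda: True,
        "row count matches pre-incident snapshot": lambda: False,
    }
    results = run_validation(checks)
    for name, status in results.items():
        print(f"{name}: {status}")
    print("Recovery validated:", recovery_validated(results))
```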


5. Security Incident Response

Prompt:

  • Title: Security Incident Response: [Incident Type]

  • Scope:

    • Define the type of security incident (e.g., data breach, ransomware attack, phishing attempt).

  • Immediate Containment:

    • First steps to take to contain the security threat (e.g., disconnect affected systems from the network, disable compromised accounts).

  • Investigation:

    • How to investigate the extent of the security breach (e.g., check logs, identify entry points).

  • Mitigation & Eradication:

    • Actions to remove the threat and mitigate further risks (e.g., patch vulnerabilities, remove malware).

  • Recovery:

    • Steps for system recovery (e.g., restore data from backups, rebuild affected systems).

  • Post-Incident Review:

    • Analyze the incident to improve future responses (e.g., update incident response procedures, enhance security policies).

  • Communication:

    • How to communicate with stakeholders during and after the incident (e.g., informing affected users, public statements).
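
Keeping a timestamped record of actions during the incident makes the Post-Incident Review and Communication steps easier. The sketch below is a hypothetical `IncidentTimeline` class that tags each action with one of the template's phases and renders them in chronological order; it assumes timezone-aware timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Phase names taken from the security incident template.
PHASES = (
    "Immediate Containment",
    "Investigation",
    "Mitigation & Eradication",
    "Recovery",
    "Post-Incident Review",
    "Communication",
)


@dataclass
class IncidentTimeline:
    """Hypothetical append-only log of actions taken during an incident."""
    incident_type: str
    entries: list[tuple[datetime, str, str]] = field(default_factory=list)

    def record(self, phase: str, action: str,
               when: Optional[datetime] = None) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        # Default to "now" in UTC; pass timezone-aware times when back-filling.
        self.entries.append((when or datetime.now(timezone.utc), phase, action))

    def report(self) -> str:
        """Render entries in chronological order for the review."""
        lines = [f"Security Incident Response: {self.incident_type}"]
        for when, phase, action in sorted(self.entries):
            lines.append(f"{when.isoformat()} [{phase}] {action}")
        return "\n".join(lines)


if __name__ == "__main__":
    timeline = IncidentTimeline("Phishing attempt")
    timeline.record("Immediate Containment", "Disabled compromised account")
    timeline.record("Investigation", "Reviewed mail gateway logs")
    print(timeline.report())
```

Sorting at report time means entries written up after the fact still appear in the right order.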


6. Backup and Restore Runbook

Prompt:

  • Title: Backup and Restore Procedures: [System/Service Name]

  • Scope:

    • Define the systems and services covered by the backup process (e.g., file servers, databases).

  • Backup Schedule:

    • Outline the backup schedule (e.g., daily, weekly, monthly).

  • Backup Verification:

    • Instructions on how to verify the integrity of backups (e.g., test restores, checksum verification).

  • Restore Process:

    • Detailed instructions for restoring data (e.g., from cloud storage or an on-premises backup system).

  • Testing Restoration:

    • How to test the restored system to ensure data integrity and functionality (e.g., spot-check files, run application tests).

  • Post-Restore Actions:

    • Steps to perform after a successful restore (e.g., notify users, re-enable affected services).

  • Documentation:

    • How to document the backup and restore process, including any issues encountered.
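
Checksum verification, listed under Backup Verification, compares a hash of the backup file against a value recorded when the backup was taken. This sketch uses SHA-256 from Python's standard `hashlib`; where the expected checksum is stored (for example, a manifest written by the backup job) is an assumption, and `sha256_of` and `verify_backup` are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Streaming keeps memory use flat even for very large backup archives.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the recorded checksum."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A matching checksum shows the file was not altered or truncated since the checksum was recorded; it does not prove the backup is restorable, which is why the template also calls for test restores.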


7. Patch Management Runbook

Prompt:

  • Title: Patch Management: [System/Service Name]

  • Scope:

    • Define the scope of patch management (e.g., security patches, software updates).

  • Patch Assessment:

    • How to assess patches for relevance and urgency (e.g., security-critical patches, functionality updates).

  • Testing Patches:

    • Steps to test patches in a non-production environment (e.g., deploy in staging, run functional tests).

  • Deployment:

    • Detailed steps for applying patches to production systems (e.g., using automated deployment tools or manual installation).

  • Post-Deployment Validation:

    • Steps to validate the success of the patch deployment (e.g., check logs, verify service availability).

  • Rollback Plan:

    • Instructions on how to roll back patches if necessary (e.g., revert the changes, restore the system from backup).

  • Documentation & Reporting:

    • How to document applied patches and any issues encountered (e.g., record patch IDs, issue details).
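
The Deployment and Rollback Plan steps can be combined into one loop: apply patches one at a time, validate after each, and roll back everything applied so far, in reverse order, if validation fails. `apply`, `validate`, and `rollback` are hypothetical callables you would wire to your own deployment tooling, and the sketch assumes a failed `apply` leaves no partial changes behind.

```python
from typing import Callable, Sequence


def deploy_with_rollback(patch_ids: Sequence[str],
                         apply: Callable[[str], None],
                         validate: Callable[[], bool],
                         rollback: Callable[[str], None]) -> list[str]:
    """Apply patches one at a time; on failure, roll back in reverse order.

    Returns the IDs that remain applied: all of them on success, none on failure.
    """
    applied: list[str] = []
    for patch_id in patch_ids:
        try:
            apply(patch_id)
            applied.append(patch_id)
            ok = validate()
        except Exception:
            ok = False  # treat a crash in apply or validate as a failure
        if not ok:
            # Undo in reverse so later patches come off before earlier ones.
            for done in reversed(applied):
                rollback(done)
            return []
    return applied


if __name__ == "__main__":
    # Dry run with stand-in callables that only print.
    remaining = deploy_with_rollback(
        ["patch-1", "patch-2"],
        apply=lambda p: print("applying", p),
        validate=lambda: True,
        rollback=lambda p: print("rolling back", p),
    )
    print("applied:", remaining)
```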

These templates give IT teams a structured starting point, helping them handle common IT tasks and incidents consistently and efficiently.
