Platform Reliability Engineer
Company: Hearst Communications, Inc.
Location: Dallas
Posted on: January 26, 2025
Job Description:
Homecare Homebase, a subsidiary of Hearst Corporation, is a
market leader in healthcare software development providing mobile
cloud-based solutions for clinical, operational, and financial
improvement of homecare and hospice agencies throughout the United
States. Our software enables real-time solutions for wireless
information exchange and communication between office staff, field
staff, and physicians.

Our success is fueled by our talented
technology teams that are driven by their passion to make a
difference in patient care. Our employees work in a culture that is
guided by values of caring, action, respect, excellence, and smile
(a positive attitude). If you want to work in a role where your
skills have a direct influence on patient care, Homecare Homebase
is the next step in your career. We are hiring technologists who
want to make a difference.

Reliability Engineer

Reliability Engineering (RE) combines software and systems
engineering to build
and run large-scale, distributed, fault-tolerant systems. RE
ensures that HCHB's critical externally visible services have
reliability, uptime appropriate to users' needs, and a fast rate of
improvement. Additionally, RE will keep an ever-watchful eye on our
systems' capacity and performance. Much of our engineering and
software development focuses on optimizing existing systems and
infrastructure and eliminating work through automation.

On the RE
team, you'll have the opportunity to manage the complex challenges
of scale unique to HCHB, while using your expertise in coding,
complexity analysis, troubleshooting, and large-scale modern system
design.

RESPONSIBILITIES
- Practice sustainable incident response and blameless
postmortems.
- Operationalize services, including system testing,
instrumentation, monitoring, capacity model development, training,
and transition to operations teams.
- Write engineering-level documentation and develop operational
excellence standard operating procedures and run books with a bias
towards automation.
- Maintain services once they are live by measuring and
monitoring availability, latency, and overall system health.
- Perform platform engineering and automation to maintain the scale
and reliability of systems.
- Manage deployments of major releases.

MINIMUM QUALIFICATIONS
- Bachelor's degree in Computer Science, Engineering, Math, or a
related field required (equivalent experience considered).
- 3+ years' experience in a 24x7 production enterprise-class
environment as an SRE or comparable role.
- 1+ year Kubernetes administration/support in a production
environment.
- 1+ year Azure or comparable cloud PaaS, IaaS, and resource
administration/support in a production environment.
- Strong written and verbal interpersonal skills.
- Excellent problem-solving and analytical skills, with attention
to detail and a drive to bring issues to resolution.
- Experience solving problems via automation using orchestration
platforms such as JAMS, Ansible, Azure Automation, and ServiceNow
Flows.
- Proficient with data tier languages: TSQL and GraphQL.
- Proficient with the following monitoring solutions (multiple
preferred): Splunk, Prometheus/Grafana, Application Insights, Azure
Monitor, and Microsoft SCOM.

PREFERRED QUALIFICATIONS
- Academic coursework in Algorithms, Data Structures, Distributed
Systems, Machine Learning, and Information Security.
- 3+ years Windows and Linux administration/support in a
production environment.
- Proficient with networking and troubleshooting (e.g.,
addressing, routing, DNS, load balancing, mesh networking).
- Ability to debug and optimize infrastructure as code pipelines
using Ansible, Terraform, and Azure ARM.
- Proficient with ITSM/ITIL practices such as service management,
change management, incident management, and problem
management.
- Experience designing and developing software oriented towards
systems or network automation.
- Proficient with administration, automation, and orchestration
of large-scale Windows and Linux environments using configuration
management solutions such as DSC and Ansible.
- Experience operating in large SQL databases with complex
business logic.
- Experience with Healthcare industry HIPAA regulations (similar
regulated industry experience considered, e.g., PCI, SOX).
- Experience working in an Agile and/or SAFe
environment.

CERTIFICATION / TRAINING
- Candidates with relevant certifications are preferred,
including but not limited to the following:
- ITIL Foundations
- Configuration: RHCE-Ansible
- Kubernetes: CKA, KCSP
- Linux: RHCE, CompTIA Linux+, GCUX, LPI
- Microsoft: Azure Administrator, Azure DevOps Engineer, MCSE

This position does not provide sponsorship. All applicants should be
either US citizens or permanent residents eligible to work in the US
without immigration restrictions.

#LI-CC1 #LI-Hybrid

Job Info
- Job Identification: 2023641
- Job Category: Technology
- Posting Date: 01/22/2025, 04:36 PM
- Job Schedule: Full time
Keywords: Hearst Communications, Inc., North Richland Hills, Platform Reliability Engineer, Engineering, Dallas, Texas