Laying the GroundWork for Better Systems Monitoring

Founded in 1998, EZ Prints is an Atlanta-based provider of digital image fulfillment technology for retailers, portals, ISPs, digital content owners and professional photo services. About 500 online and offline retailers in the U.S. and Europe, including six of the top 10 online photo sites, use EZ Prints’ technology platform to offer services that allow consumers and businesses to personalize digital content.

Companies and individuals do this by printing photos, creating apparel, and customizing products from mugs to clocks with images such as corporate logos or family pictures. Clients upload digital images, which are saved in the EZ Prints system. These saved images are tied to the customer's order, which then moves to the processing facility where the items are manufactured, packaged and shipped. An order could range from a single calendar to thousands of mouse pads.

Avoiding Black Friday Blackouts

Beyond the year-round activity, Christmas, Mother's Day and Father's Day are the busiest times of the year for EZ Prints. We offer partners Web hosting, image storage, e-commerce and manufacturing, processing each transaction from order to fulfillment. To keep partners on top of their orders, every step from order issuance to fulfillment must be monitored and managed so the business runs smoothly, without outages or sites going down completely.

To track and monitor each product at every step, we've built a complicated but comprehensive IT system of more than 120 servers split between two locations. These servers run a 90 percent Windows environment and a 10 percent mix of open source technology, including Linux, VMware, Apache and BIND DNS. To add a further layer of complexity, all of our partner-facing solutions are Windows-based, running on Windows Server 2003, SQL Server 2005 and IIS.

Although the EZ Prints IT system is primarily Windows-based, it is still a mixed environment, and Microsoft solutions are often incompatible with the Linux portion of our network, which makes them unsuitable for our needs at a very basic level. In addition, because EZ Prints' sales and production channels are unusual, our software engineers found that most proprietary solutions lacked the flexibility and customization options we needed to put custom hooks into the code base, debug it and tune it to fit our needs.

More Detail, Fewer Problems

Early in the company's growth, we used a simple monitoring tool. It told EZ Prints system administrators whether servers were up or down, but offered little that could help predict potential events. The system was adequate when installed, but as the site scaled, traffic increased and the network grew more complex, our needs outstripped its capabilities. Although downtime was minimal, there was no way to play offense when traffic spikes threatened to overwhelm the site.

There was also little detail about what was happening at the application level. It was all too evident when the network as a whole went down, but there was no way to tell when parts of the application layer were failing or under stress unless the issue escalated into a network problem.

EZ Prints needed to go on the offensive against anything that could cause downtime. As an e-commerce site, every instance of a customer being unable to complete a sales transaction represented a direct loss of profit for the company. As a white-label service provider, we needed to deliver efficient, reliable and timely service at every step of order processing, manufacturing and shipping, so that our partners could maintain the standards of professional excellence their customers expect.

To act proactively, keep the site up and running, and meet the expectations of our clients and partners, we needed more detail about every part of the network. The EZ Prints system has complex requirements for running application checks, and before selecting GroundWork Open Source, the IT department reviewed Big Brother, SiteScope and HP OpenView, and debated doing a blanket Nagios install.

Because of the wide variety of applications and partner APIs, EZ Prints needed a system with very flexible integration options. The ability for our in-house developers to build monitoring "hooks" into our applications was critical. The system also needed to support a wide variety of platforms (Windows/IIS, VMware, Linux/Apache) and devices.

Deep Data Diving with GroundWork

GroundWork met our feature requirements and provided the flexibility to hook into our applications, all of which are custom code written in-house. Its level of granularity, a major consideration, was higher than that of competing products: an individual order can be broken down into more than 20 separate steps and monitored at each one. EZ Prints now knows whether every part of the application layer is working, basic data that was not previously available, and GroundWork also shows in detail how the application is performing, down to how long each individual step takes.
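GroundWork Monitor builds on Nagios, whose plugins report results through an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a single output line, with optional performance data after a "|". The article does not show EZ Prints' actual hooks. What follows is a minimal sketch of how a per-step timing check could be written in that plugin style. It assumes a hypothetical hook that already has each step's duration in seconds; the step names, thresholds and function names are all illustrative.

```python
"""Nagios-style check for per-step order processing times (illustrative sketch)."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_order_steps(step_seconds, warn, crit):
    """Return (exit_code, output_line) for a dict of step name -> duration in seconds.

    step_seconds, warn and crit are hypothetical inputs; a real hook would
    collect the timings from the application itself.
    """
    if not step_seconds:
        return UNKNOWN, "UNKNOWN - no step timings reported"

    status = OK
    slow = []
    for name, seconds in step_seconds.items():
        if seconds >= crit:
            status = CRITICAL
            slow.append(name)
        elif seconds >= warn:
            # Never downgrade an existing CRITICAL to WARNING.
            status = max(status, WARNING)
            slow.append(name)

    summary = f"{len(step_seconds)} step(s) checked"
    if slow:
        summary += ", slow: " + ", ".join(slow)

    # Performance data: 'label'=value[UOM];warn;crit, one entry per step,
    # so the monitoring server can graph how long each step takes over time.
    perfdata = " ".join(
        f"'{name}'={seconds:.3f}s;{warn};{crit}" for name, seconds in step_seconds.items()
    )
    return status, f"{STATUS_NAMES[status]} - {summary} | {perfdata}"


def main(step_seconds, warn, crit):
    """Plugin entry point: print the status line and exit with the Nagios code."""
    code, line = check_order_steps(step_seconds, warn, crit)
    print(line)
    raise SystemExit(code)
```

For example, `check_order_steps({"upload_image": 0.8, "render_proof": 2.6, "submit_to_plant": 0.4}, warn=2.0, crit=5.0)` returns exit code 1 and a line beginning `WARNING - 3 step(s) checked, slow: render_proof`. Reporting each step as separate performance data is what lets a monitor show not just that an order pipeline is up, but which step is slowing down.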

This information finally allows EZ Prints to go on the offensive and manage its resources to optimize IT operations.

More to Come

EZ Prints went from a reputation for responding after the fact to being proactive and predictive. Instead of IT learning about an issue from customer care complaints, issues are now resolved before customers even notice them.

The end result is nothing short of incredible. Uptime has gone from the low 90s percent a year and a half ago to 99.999 percent in the fourth quarter. Other factors contributed, but GroundWork gave us the intelligence and insight in the six months before Q4 to attack potential problem areas.
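To put these uptime figures in perspective, an availability percentage can be converted into a downtime budget. The short calculation below is a worked illustration, not measured EZ Prints downtime. It assumes a 92-day quarter and uses 90 percent as a stand-in for "the low 90s"; the function name is hypothetical.

```python
"""Downtime budget implied by an availability percentage (worked arithmetic)."""

MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_pct, period_days):
    """Minutes of allowed downtime over period_days at the given availability."""
    return (100.0 - availability_pct) / 100.0 * period_days * MINUTES_PER_DAY


# Assumed 92-day quarter at 99.999 percent availability.
quarter = downtime_minutes(99.999, 92)
# Illustrative "low 90s" figure: 90 percent over a 365-day year, in days.
year_at_90 = downtime_minutes(90.0, 365) / MINUTES_PER_DAY

print(f"99.999% over a quarter: {quarter:.2f} minutes")
print(f"90% over a year: {year_at_90:.1f} days")
```

Five nines over a quarter allows roughly 1.3 minutes of downtime, while 90 percent over a year allows about 36.5 days, which gives a sense of how large the improvement is.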

We are only beginning to tap GroundWork's potential to deepen our knowledge and understanding of the EZ Prints system. With the ability to write additional custom hooks and a granular view of the system, EZ Prints plans to build more application hooks into its IT operations. That should give us a more comprehensive view of the overall IT system and network, and ease growing pains as EZ Prints continues to scale.

Dante Martinez is a network engineer with EZ Prints.
