Kim Byeong-guk, a server infrastructure engineer of 18 years, has contributed to NAVER’s small and big achievements by building a sturdy foundation for NAVER’s fundamental technologies. NAVER’s services ran smoothly even during emergency situations, such as when the fire broke out at the data center in 2022. Also, NAVER was able to nimbly build a hyperscale super-computing cluster when AI technologies started emerging. NAVER owes this all to server infrastructure technology that had already been long accumulated within the company. “Server engineers are like doctors that take care of the lives of technologies.” NAVER’s services and technologies are kept safe thanks to our server engineers and their commitment to keeping our servers secure.
Hello, my name is Kim Byeong-guk of the Server Infrastructure Engineering team at NAVER CLOUD. I’ve worked as a server engineer for 18 years, 16 of which I have spent at NAVER. I joined NAVER in 2008 – when NAVER was showing explosive growth. NAVER was starting to gain brand awareness with the TV ad starring actor Jeon Ji-hyun wearing the hat with wings, as NAVER was not yet a familiar name to the public. But then soon, I witnessed NAVER disrupt the IT industry with services like search, Knowledge iN, and Café, and I thought, “NAVER has market potential.” I applied to NAVER looking forward to growing as an engineer.
I also worked with servers at my previous job, but after joining NAVER I was given technical tasks such as technology research on top of the server management I had always done, which let me dive deep into the technical side of my job. I think I was able to grow as an engineer as the scope of my work widened. Now, I am responsible for architecting infrastructure systems and optimizing their performance so that our services can run stably. I am especially focused on creating a company-wide standard for x86 servers and solving the various issues that arise from operating them.
*x86 server: A type of server used throughout the IT industry and known for its versatility. It runs on CPUs manufactured by Intel and AMD, and typically runs an operating system such as Linux or Windows.
As servers are the foundation for technologies, server infrastructure needs to be secured first before any new technology can be attempted. To that end, server engineers design, build, operate, and maintain servers. We design architectures that optimize server performance. When incidents occur, we diagnose and analyze the problem and carry out follow-up measures for recovery. We also keep our servers running efficiently through consistent updates. Ultimately, our goal is to make sure that our servers are running stably.
“New technologies, whatever they may be, require a sturdy server infrastructure. I believe it is the result of our long-accumulated hard work.”
From internal technologies to facilities, our data center is built with NAVER’s in-house technology. I have not yet seen any IDCs (Internet Data Centers) that were built better than GAK Chuncheon or GAK Sejong. People may think that they do it better overseas, but I’ve visited leading data centers overseas, and I’m quite certain that we are doing a much better job.
NAVER operates an extensive number of servers: 100,000 units in GAK Chuncheon and 600,000 in GAK Sejong, so 700,000 units in total. Because of the sheer number of servers, finding an efficient maintenance method is crucial. To that end, our team oversees the overall automation work needed for operating server hardware infrastructure, and we continue to deliberate on and optimize how we can better monitor and maintain the hundreds of thousands of servers that we have. Moreover, depending on the purpose of these servers and the form in which they will be used, be it for web services, DB, cloud, etc., their architecture design can vary greatly.
Therefore, one of our team’s important missions is to set a company-wide standard architecture so that each server can be built with the optimized specs required for the nature of the NAVER service platform it will be used for.
Server technology evolves every year, so we continuously research and validate emerging technologies ahead of time. Our team is responsible for the preparatory work necessary for quickly building a stable system. We are proud to say that we have redefined the necessary technologies to fit NAVER’s environment and standards, and we have maximized their performance so that we can conduct optimized maintenance and management of thousands of systems regardless of their location, in Korea or overseas.
Moreover, our data center brings together server engineering technologies and the infrastructure facility technologies that make up the physical environment of a building or a facility. We have eco-friendly architectural designs with energy conservation in mind, and at GAK Sejong, there are server-managing robots and self-driving buses that help our staff get from one place to another within the huge data center. It is a privilege and a source of pride to be working as a server engineer here with all these resources at hand.
There was a big fire at the Pangyo IDC in 2022. We were aware of it as soon as the fire broke out and began preparing response measures, but the fire ended up spreading beyond everyone’s expectations. Towards the end, the situation got so urgent that we ran out of power and had to shut down all the servers. I remember us pulling all-nighters in turns until the recovery was complete. Even amidst all that, thankfully, NAVER’s services ran without any problems. We were able to recover our respective areas without panic because our services were designed to keep operating even if something happened at one of our IDCs, and because our management systems were systematic and complete.
We have continuously deliberated on how to guarantee stability, for example by dividing our IDCs into several major hubs and keeping our data redundant. NAVER has made consistent and significant investments in infrastructure, and I think all the effort NAVER has put in so far came together at that moment. It was a moment where our long-accumulated effort shone.
It would have to be the AI project I’m currently working on. It has not been long since people started talking about AI on an everyday basis and it became apparent that companies would go obsolete if they did not embark on AI projects, but NAVER had noticed the importance of AI very early on. That is how we were able to quickly build and support the server side of the project when we finally decided to showcase HyperCLOVA X to the market. We quickly proceeded with this project with the thought in mind that if we missed this window, we might lose our lead in the market.
Creating the LLM (Large Language Model) that forms the basis of HyperCLOVA X also started from that project. As AI became more advanced, the technological capacity of servers also became very important. Through this project, we built GPU (Graphics Processing Unit) infrastructure on a massive scale that we had never experienced before, and also built a supercomputing cluster for machine learning. During that process, we accumulated a lot of know-how on AI. Put simply, creating a supercomputer means building a system where tens or hundreds of unit systems work organically as one, compared to the one- or two-unit systems we use in our daily lives. Because supercomputers are composed of the highest-performing resources optimized for machine learning, the computing power of each component is impressive as well. Just as a task can be done much faster when more people work on it, when we weave high-performing systems into one and train a model on them, we can reduce the time it takes to complete that task. The task is completed faster and the quality of the end result is higher because the task is divided up. There are limits to scaling up individual resources, so we spread them out horizontally and connect them over a large network so that they work as one.
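As a rough illustration of the idea of dividing one task across many systems, here is a minimal Python sketch. It is not NAVER's code: the function names (`split_into_shards`, `process_shard`, `scale_out`) and the toy workload are assumptions made for this example, and threads on a single machine stand in for separate nodes. It shows how work is split and recombined, not the speedup a real cluster would deliver.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(items, n_workers):
    """Divide items into n_workers contiguous shards of near-equal size."""
    base, extra = divmod(len(items), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        # The first `extra` shards take one additional item each.
        size = base + (1 if i < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards


def process_shard(shard):
    """Hypothetical stand-in for the work one node does on its share."""
    return sum(x * x for x in shard)


def scale_out(items, n_workers):
    """Process every shard concurrently, then combine the partial results."""
    shards = split_into_shards(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_shard, shards))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000))
    # The combined result matches what a single worker would compute alone.
    print(scale_out(data, n_workers=8) == process_shard(data))
```

The design point is that each worker only sees its own shard, so adding workers shrinks each share of the work; the cost is the extra step of combining partial results over the network.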
I think in the realm of technology, everything is connected. Whatever the new technology may be, it starts with server infrastructure, and it needs to be connected to the front-end to become one before it is released. Once this infrastructure is reliable and sturdy, AI and its intelligence can advance. So even after the successful launch of HyperCLOVA X, NAVER was an appealing player in the AI market as we had been continuously investing our funds and efforts in AI and its infrastructure. Now, NAVER has independent AI capabilities like Sovereign AI, and it is in a position where it can boldly make statements on AI sovereignty, which goes to show that NAVER is doing very well in this area.
*Sovereign AI: A public cloud computing infrastructure in which a cloud service is supported by building a data center in a certain country or a region for strengthened sovereignty and security
NAVER offers an incredibly wide range of search services along with comprehensive search on the main page, such as image, people, Knowledge iN, shopping, dictionary, Café, and Blogs to name a few, and of course there are many traffic patterns that can be detected across them.
Of course, the news traffic peaks when there is a social issue. After a year, you see a pattern – you can expect what kinds of societal events might happen at a certain period in a year. We observe these patterns from afar, and figure out what kind of support will be needed for which services. When school breaks start, traffic for Knowledge iN peaks, and during national holidays, we can expect more people using the Maps services, and since the holidays are longer this year, we will have even more users using Maps, so we increase the capacity for that service. We prepare for these events in advance.
A lot of what we do is preventing incidents from happening. The best-case scenario would be to be well-prepared so that incidents do not occur. That’s why we keep a close eye on the search trends, to make the necessary preparations on the server end so that services can operate seamlessly and accurately.
Moreover, as we need to increase the number of servers along with an increase in usage, it is a small joy for server engineers to be able to witness and support the growth of our services from the back.
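The seasonal preparation described above, where capacity is raised ahead of an expected peak, can be sketched roughly as follows. This is an illustrative calculation only: the function names, the requests-per-second figures, the seasonal multiplier, and the 25% headroom are all hypothetical values chosen for the example, not NAVER's actual planning method or numbers.

```python
import math


def servers_needed(peak_rps, rps_per_server, headroom=0.25):
    """Servers required to serve peak_rps while keeping a spare-capacity margin."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    return math.ceil(peak_rps * (1 + headroom) / rps_per_server)


def plan_extra_capacity(current_servers, baseline_peak_rps,
                        seasonal_multiplier, rps_per_server, headroom=0.25):
    """Additional servers to provision before an expected seasonal peak."""
    expected_peak = baseline_peak_rps * seasonal_multiplier
    required = servers_needed(expected_peak, rps_per_server, headroom)
    # Never return a negative number: surplus capacity needs no action here.
    return max(0, required - current_servers)


if __name__ == "__main__":
    # Hypothetical holiday scenario: traffic expected at 1.5x the usual peak.
    extra = plan_extra_capacity(
        current_servers=30,
        baseline_peak_rps=10_000,
        seasonal_multiplier=1.5,
        rps_per_server=500,
    )
    print(f"Provision {extra} additional servers before the holiday")
```

In practice the multiplier would come from the patterns observed in previous years, and the headroom reflects how much spare capacity a team wants to hold against surprises.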
“I go about my job with the mindset that I am a doctor, responsible for the well-being of NAVER’s technologies and services.”
I think having a steadfast mindset is also an important competency of a server engineer. Server operation, as a part of infrastructure, does not receive a lot of attention unless a problem arises. We put a lot of work into building infrastructure, including making our systems redundant to prevent issues, but this work usually goes unnoticed. Honestly, it is sometimes regrettable that it is more difficult for us to receive credit for the end results of a project than it is for people in other sectors. However, in the end, NAVER is able to stably deliver its technology and services to millions of users thanks to the infrastructure we built. We take immense pride in that, which helps me stay motivated and committed to my work.
You run into real-time issues when operating servers. Because we operate servers on a large scale, the scope and the depth of the issues we must prepare for also scale up. That is why I think “timeliness” is crucial for an infrastructure engineer. As it is our mission to make sure that the services used by millions run stably, it is essential that we apply the necessary technology where it is needed and respond nimbly to problems that arise. Problems with servers are inevitable when operating infrastructure. What is important is to find the exact cause of the problem, and then to take the necessary measures so that it does not recur.
One could say that our job is similar to that of a doctor. We make the diagnosis and find the cause of the illness, treat it with the necessary measures within the critical window known as the “golden time,” and constantly check in to see if the treatments are working and whether the wound has healed. “See you back in two weeks,” as doctors would say. This cycle of applying treatment and checking progress is similar to the process of operating server infrastructure. I’d like to take it one step further and be a reliable server engineer, like a trustworthy doctor who can confidently say “You are all better now” to patients who ask.
Once servers are down, services become useless. I go about my job with a heavy sense of responsibility that what I do makes all of NAVER’s services and technology thrive in the best possible way.
Published Feb. 2025