A Digital Future for the Hmong Language

Neng Vang, Gilly Guo, Hooi Ling Soh, Annie Xie, Choua Vang and Kong Hanh
The UMN HMong Corpus Showcase

Thursday, May 1, 2025, 5 - 7 p.m.

Liberal Arts Engagement Hub

Join us for a showcase of the UMN HMong Corpus! 

Event Information & Registration

For generations, Hmong people have preserved their history, language, and culture through the stories and memories shared by families and friends. As technology evolves, so too do the ways we communicate and learn. This year, the Liberal Arts Engagement Hub has been a vehicle for the UMN HMong Corpus Project, a movement to create a digital library about the Hmong people, culture, and community, with an emphasis on the Hmong language. Members of the project, including project director Hooi Ling Soh, a linguistics professor at the University of Minnesota, aim to encourage linguistic research on Hmong and to support language teaching and learning. The database will also be a resource for people of Hmong and non-Hmong heritage alike.

The UMN HMong Corpus Project

The most important aspect of the UMN HMong Corpus Project’s database is its focus on the Hmong language; it will allow users to see how the language is used in various contexts. Soh explains that another aspect of the database will be “a bilingual dictionary where users can search for English translations of Hmong words and phrases and vice versa.”

The corpus will also offer a way for the Hmong community to connect and grow closer, allowing members to share stories and social media posts and preserving them for the public.

Soh shares that the corpus “will contain materials of different discourse types and formality, including immigrant stories, government web pages, social media posts, and academic texts,” while “a subset of the Hmong texts will have word-for-word translations, selected grammatical information, and sentence translations in English.”

Why the capital M?

The HMong language is often spelt as "Hmong," "Hmoob," or "Moob." The Corpus uses "HMong" to refer to the language to be inclusive of the HMong dialects spoken in the U.S.

Creating with Community

Students play a vital role in the project: identifying potential content, “obtaining permission from content creators/copyright holders to include the selected materials, and scanning and digitizing Hmong texts.” Each new post, medium, and story allows the group to create translations that let people access local voices in both Hmong and English.

Similarly, community members will connect with individuals who may be interested in being a part of the corpus creation. “We hope that once the corpus is ready, community members can help publicize the existence of this resource,” shares Soh. 

The UMN HMong Corpus Project is supported by the Hmong Cultural Center (HCC), with Dr. Mark E. Pfeifer, HCC’s Director of Programs, serving as a project consultant. He connects project members with resources from the Hmong Resource Center Library at HCC, providing access to approximately 200 books in the Hmong language. Additionally, the project benefits from the support and collaboration of various departments within CLA.

The Goal of the Project

Project members hope to use technology to make the corpus easily accessible, with the ability to quickly process the data being added. 

“The corpus will have a search function that allows users to discover how Hmong words are used in context,” explains Soh.

Eventually, the corpus will be more than just a tool for looking up Hmong words and how they are used in context: it will also give users full-text access to a subset of the collections it has gathered. Both of these features will be useful for researchers and language learners.

How You Can Help

The HMong Corpus Project is open to anyone interested in adding to its database. Soh shares, “Various Hmong authors, publishers, and content creators have supported the project by giving the UMN HMong Corpus Project team permission to include selected materials in the corpus.” If you have content to share, please contact project director Hooi Ling Soh: [email protected].

The UMN HMong Corpus project has benefited tremendously from the Liberal Arts Engagement Hub residency program and the support of the Hmong Cultural Center. I am grateful to the project members, Neng Vang, Gilly Guo, Choua Vang, Eric Frank, Annie Xie, Kong Hanh, Mai Al-Khatib, and Sissy Lee, for their commitments and generosity. I would also like to acknowledge the support and assistance from CLA Research and Graduate Program, Liberal Arts Technologies and Innovation Services, CLA Language Center, University of Minnesota Libraries, Office of the General Counsel, Department of American Indian Studies, and the Institute of Linguistics.

Hooi Ling Soh

The Liberal Arts Engagement Hub

The HMong Corpus Project is one of eight Hub Residencies for the 2024-2025 academic year. The Liberal Arts Engagement Hub seeks to facilitate reciprocal and trusting partnerships between humanistic scholars in the arts, humanities, and social sciences and the community to respond to important social challenges. 

This story was written by Peyton Thao, an undergraduate student in CLA.
