The UMN HMong Corpus Project: Building a Free Searchable Web-based Annotated Corpus of the HMong Language

The UMN HMong Corpus project is building an open-source, web-based resource in the form of a searchable, annotated HMong language corpus (database) to support linguistic research and the teaching and learning of the HMong language. The corpus will support efforts to describe and analyze the HMong language and will provide materials that language teachers and learners can use to strengthen HMong language education. Because HMong is a low-resource language with little data available for machine learning, the corpus will also provide data that can be used to develop AI and other technology resources for HMong.

During the residency period, the UMN HMong Corpus project team will begin a new partnership with the Hmong Cultural Center (HCC). Dr. Mark Pfeifer from the HCC will serve as a project consultant and will connect project members with resources from the Hmong Resource Center Library housed in the HCC, which holds approximately 200 books printed in the HMong language.

Project Partners:

  • Dr. Hooi Ling Soh (Professor, Institute of Linguistics), Project Director
  • Dr. Mark E. Pfeifer (Director of Programs and Development, Hmong Cultural Center), Project Consultant
  • Neng Vang (Individual Community Partner/2024 graduate of Linguistics MA program, Institute of Linguistics), Project Member
  • Choua Vang (Individual Community Partner), Project Member
  • Gilly Guo (2023 graduate of Linguistics and Computer Science undergraduate programs, UMN), Project Member
  • Eric Frank (Linguistics MA student, Institute of Linguistics), Project Member
  • Annie Xie (2024 graduate of the Boston University Linguistics MA program), Project Member
  • Kong Hanh (Linguistics undergraduate student, Institute of Linguistics), Project Member
  • Sissy Lee (Individual Community Partner/2024 graduate of Linguistics undergraduate program, Institute of Linguistics), Project Member
  • Dr. Mai Al-Khatib (2023 graduate of Cognitive Science Ph.D. program, Center for Cognitive Sciences), Project Member