Hongzhi Chen [陈宏智]

About Me

I'm a senior R.D. on the ByteDance Infrastructure Team, where I work on graph-related distributed storage, computing, and training systems. I obtained my Ph.D. degree in 2020 from the Department of Computer Science and Engineering, Chinese University of Hong Kong (CUHK), supervised by Prof. James Cheng. Before that, I received my B.E. degree in 2015 from an Honor Class, ranked 1st, in the Department of Computer Science and Technology, Huazhong University of Science and Technology (HUST). During my undergraduate studies, I worked as a research intern in the Software Analytics Group, Microsoft Research Asia, for one year [2014.6 - 2015.5] and received the "Stars of Tomorrow" honor. In 2017, I visited the NetDB Lab, University of Pennsylvania, supervised by Prof. Boon Thau Loo.

My general research interests cover the broad area of distributed systems and databases, with special emphasis on distributed graph systems and distributed machine learning/deep learning systems. My current work focuses on distributed Graph Neural Network systems, graph databases, and RDMA-based OLAP/OLTP systems over graphs.

Hongzhi's Curriculum Vitae.

github google scholar linkedin facebook

Email: chenhongzhi [at] bytedance [dot] com

Mail Address: 2nd Floor, Building #1, Zhonghang Square, Haidian, Beijing, China, ByteDance Inc.

Experience

Publications

  • G-Tran: Making Distributed Graph Transactions Fast.
    Hongzhi Chen, Changji Li, Chenguang Zheng, Chenghuan Huang, Juncheng Fang, James Cheng, Jian Zhang.
    arXiv preprint arXiv:2105.04449, 2021.

  • High Performance Distributed OLAP on Property Graphs with Grasper.
    Hongzhi Chen, Bowen Wu, Shiyuan Deng, Chenghuan Huang, Changji Li, Yichao Li, James Cheng.
    In Proceedings of the 2020 ACM International Conference on Management of Data. (SIGMOD'20)
    [Paper] [Code]

  • Measuring and Improving the Use of Graph Information in Graph Neural Networks.
    Yifan Hou, Jian Zhang, James Cheng, Kaili Ma, Richard T. B. Ma, Hongzhi Chen, Ming-Chang Yang.
    In Proceedings of the 2020 International Conference on Learning Representations. (ICLR'20)
    [Paper] [Code]

  • Grasper: A High Performance Distributed System for OLAP on Property Graphs.
    Hongzhi Chen, Changji Li, Juncheng Fang, Chenghuan Huang, James Cheng, Jian Zhang, Yifan Hou, Xiao Yan.
    In Proceedings of the 2019 ACM Symposium on Cloud Computing. (SoCC'19)
    [Paper] [Code] [Slides in SoCC'2019]

  • A Representation Learning Framework for Property Graphs.
    Yifan Hou, Hongzhi Chen, Changji Li, James Cheng, Ming-Chang Yang.
    In Proceedings of the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. (SIGKDD'19)
    [Paper] [Code]

  • Large Scale Graph Mining with G-Miner.
    Hongzhi Chen, Xiaoxi Wang, Chenghuan Huang, Juncheng Fang, Yifan Hou, Changji Li, James Cheng.
    In Proceedings of the 2019 ACM International Conference on Management of Data. (SIGMOD'19)
    [Paper] [Code]

  • Optimizing Declarative Graph Queries at Large Scale.
    Qizhen Zhang, Akash Acharya, Hongzhi Chen, Simran Arora, Ang Chen, Vincent Liu, Boon Thau Loo.
    In Proceedings of the 2019 ACM International Conference on Management of Data. (SIGMOD'19)
    [Paper]

  • Scalable De Novo Genome Assembly Using a Pregel-Like Graph-Parallel System.
    Guimu Guo, Hongzhi Chen, Da Yan, James Cheng, Jake Chen, Zechen Chong.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019. (TCBB'19)
    [Paper] [Code]

  • Lightweight Fault Tolerance in Pregel-Like Systems.
    Da Yan, James Cheng, Hongzhi Chen, Cheng Long, Purushotham Bangalore.
    In Proceedings of the 48th International Conference on Parallel Processing. (ICPP'19)
    [Paper] [Code]

  • G-Miner: An Efficient Task-Oriented Graph Mining System.
    Hongzhi Chen, Miao Liu, Yunjian Zhao, Xiao Yan, Da Yan, James Cheng.
    In Proceedings of the 2018 European Conference on Computer Systems. (EuroSys'18)
    [Project] [Paper] [Code] [Slides in EuroSys'2018]

  • Scalable De Novo Genome Assembly Using Pregel.
    Da Yan, Hongzhi Chen, Zhenkun Cai, James Cheng, Bin Shao.
    In Proceedings of the 34th IEEE International Conference on Data Engineering, 2018. (ICDE'18)
    [Paper] [Full Version] [Code]

  • GraphD: Distributed Vertex-Centric Graph Processing Beyond the Memory Limit.
    Da Yan, Yuzhen Huang, Miao Liu, Hongzhi Chen, James Cheng, Huanhuan Wu, Chengcui Zhang.
    IEEE Transactions on Parallel and Distributed Systems, 2018. (TPDS'18)
    [Project] [Paper] [Code]

  • Norm-Ranging LSH for Maximum Inner Product Search.
    Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng.
    In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems. (NIPS'18)
    [Paper] [Supp]

  • Architectural Implications on the Performance and Cost of Graph Analytics Systems.
    Qizhen Zhang, Hongzhi Chen, Da Yan, James Cheng, Boon Thau Loo, Purushotham Bangalore.
    In Proceedings of the 2017 ACM Symposium on Cloud Computing. (SoCC'17)
    [Paper]

  • G-thinker: Big Graph Mining Made Easier and Faster.
    Da Yan, Hongzhi Chen, James Cheng, M. Tamer Ozsu, Qizhen Zhang, John C. S. Lui.
    arXiv preprint arXiv:1709.03110, 2017.
    [Project] [Paper] [Code]

Projects

    My work at CUHK, BigGraph Team.

  • Grasper: An RDMA-enabled, high-performance OLAP system over property graphs with good scalability. We propose a novel execution engine, called Expert Model, to provide tailored optimizations with adaptive parallelism control for query operators. Grasper achieves orders-of-magnitude performance improvements over existing systems (e.g., Titan, JanusGraph, OrientDB, Neo4j).
  • PGE: A representation learning framework for property graph embedding. The key idea of PGE is a three-step framework that incorporates both topology and property information into Graph Neural Networks to produce better node embeddings.
  • G-Miner: A distributed graph processing system aimed at general graph mining problems, which involve intensive local computation inside a subgraph. It adopts a graph-centric paradigm to overcome the challenges arising from the properties of mining problems. We model the processing of each subgraph as a task and design a task-based pipeline to improve the parallelism between computation and communication. We also propose a dynamic task stealing mechanism and an efficient cache strategy to further speed up task processing.
  • G-thinker: Real applications, such as graph matching and community detection, often require computation-intensive graph analytics that cannot be expressed efficiently as vertex-centric algorithms. G-thinker adopts a novel subgraph-centric unified framework that is natural for subgraph finding.
  • PPA-assembler: A scalable toolkit for de novo genome assembly based on Pregel. It provides a set of key operations in genome assembly, which are implemented by practical Pregel algorithms (PPAs) with strong performance guarantees.
  • FPMS: A novel and general distributed framework to mine frequent patterns, including frequent itemsets/sequences/graphs.
  • RANGE-LSH: An LSH scheme for maximum inner product search that significantly outperforms SIMPLE-LSH and is robust to the shape of the 2-norm distribution and to different partitioning methods.
  • GraphD: A system that offers out-of-core support for processing very big graphs on a small cluster of commodity PCs, with performance comparable to state-of-the-art distributed in-memory graph systems.
  • LWCP: A fault tolerance mechanism for Pregel-like systems that is tens of times faster than conventional checkpointing mechanisms.
  • Pregel+: An open-source Pregel implementation with optimizations to reduce communication cost and eliminate skewness in communication. [WWW'15, PVLDB'14, PVLDB'15]

    My work at the University of Pennsylvania, Dept. of Computer and Information Science.

  • GraphRex: An efficient framework for graph processing on datacenter infrastructure. The key technical contribution of GraphRex is the identification and optimization of a set of global operators whose efficient implementation is crucial to the good performance of large, datacenter-based graph analysis.

    My work at MSRA, Software Analytics Group.

  • Service-Intelligence: A distributed log mining tool based on Microsoft Cosmos, aiming to do Text Clustering and Anomaly Detection on large streaming data. [ICSE'16]
  • iDice: An efficient algorithm for Emerging Issues Finding on multi-dimensional data. [ICSE'16]
  • Service-Insider: A data mining algorithm package in Excel for Frequent Pattern Mining, Event Clustering, Association Rule Mining, Anomaly Detection, and Multi-Dimension Change Detection.
  • In4: A distributed OLAP system based on the Actor model, supporting online data analysis and lightweight data mining. [SIGKDD'18]

Awards and Honors

  • SoCC 2019 Travel Award, 2019.11
  • SIGMOD 2019 Travel Award, 2019.5
  • EuroSys 2018 Travel Award, 2018.4
  • CUHK Postgraduate Studentship, 2016 - 2020
  • The original winner of Hong Kong PhD Fellowship, 2015.6
  • "Stars of Tomorrow" at Microsoft Research Asia (awarded to only 15% of research interns), 2015.6
  • Outstanding Graduates, HUST, 2015.6
  • CCF (China Computer Federation) National Top-100 Outstanding Undergraduates, 2014.10
  • Academic Excellence Scholarship, HUST, 2014.9
  • Merit Undergraduate, HUST, 2014.9
  • National Undergraduate Scholarship, HUST, 2013.9
  • Merit Undergraduate, HUST, 2013.9
  • Most Outstanding Undergraduate, HUST, 2012.9
  • Academic Excellence Scholarship, HUST, 2012.9

Teaching

  • Teaching Assistant (CUHK)
    • [Spring 2020] Advanced Topics in Database Systems (CSCI5120)
    • [Spring 2018] Hands-on Introduction to C++ (CSCI1020)
    • [Fall 2017] Problem Solving By Programming (ENGG1110)
    • [Spring 2017] Problem Solving By Programming (ENGG1110)
    • [Fall 2016] Problem Solving By Programming (ENGG1110)

Professional Activities

  • Participation & Talks
    • ACM SIGMOD/PODS International Conference on Management of Data, held as a virtual online meeting, 2020
    • ACM Symposium on Cloud Computing, Santa Cruz, California, U.S.A., 2019
    • ACM SIGMOD/PODS International Conference on Management of Data, Amsterdam, Netherlands, 2019
    • European Conference on Computer Systems, Porto, Portugal, 2018
    • China National Computer Congress, Zhengzhou, China, 2015
  • External Reviewer
    • 2020: SIGMOD
    • 2019: SIGMOD
    • 2018: VLDB, ICDE
    • 2017: VLDB, ICDE, CCGRID, BigData
    • 2016: VLDB, KDD, SoCC, ICDM, DASFAA, BigData, APWeb