Hongzhi Chen [陈宏智]
About Me
I am now working on a stealth startup. I was a Staff Software Engineer in the Infra Department at ByteDance, where I worked on distributed Graph Database, Graph AI and Graph Processing systems.
My research interests cover the broad area of distributed systems and databases, with special emphasis on Graph Database, Graph Neural Network system and Relational Deep Learning.
I obtained my Ph.D. degree from the Department of Computer Science and Engineering at CUHK, supervised by Prof. James Cheng. I received my B.E. degree, ranked 1st in an Honor Class, from Qingming College, Department of Computer Science and Technology at HUST. During my undergraduate studies, I worked for one year as a research intern at Microsoft Research Asia and received the "Stars of Tomorrow" honor. During my Ph.D., I visited the NetDB Lab in the Department of Computer and Information Science (CIS) at the University of Pennsylvania, supervised by Prof. Boon Thau Loo.
Hongzhi's Curriculum Vitae (updated in 2022)
Email: chzyaobaiwei [at] gmail [dot] com
Experience
- 08/2022 - present: Staff Software Engineer at ByteDance, Infra Department
- 07/2020 - 08/2022: Senior Software Engineer at ByteDance, Infra Department
- 02/2019 - 08/2019: Research Intern at HUAWEI, 2012 Lab, Parallel and Distributed Computing Laboratory, MindSpore Team.
- 05/2017 - 08/2017: Visiting Scholar at NetDB Lab, University of Pennsylvania, Supervisor: Prof. Boon Thau Loo.
- 09/2015 - 07/2019: Research Assistant at HDL Lab, The Chinese University of Hong Kong, Supervisor: Prof. James Cheng.
- 06/2014 - 05/2015: Research Intern at Software Analytics Group, Microsoft Research Asia, Supervisors: Qingwei Lin (Lead Researcher) and Dr. Jianguang Lou (Principal Researcher).
Publications
- G-Tran: A High Performance Distributed Graph Database with a Decentralized Architecture.
  Hongzhi Chen, Changji Li, Chenguang Zheng, Chenghuan Huang, Juncheng Fang, James Cheng, Jian Zhang.
  In Proceedings of the 48th International Conference on Very Large Data Bases. (VLDB'22)
- ByteGraph: A High-Performance Distributed Graph Database in ByteDance.
  Changji Li, Hongzhi Chen*, Shuai Zhang, Yingqian Hu, Chao Chen, Zhenjie Zhang, Meng Li, Xiangcheng Li, Dongqing Han, Xiaohui Chen, Xudong Wang, Huiming Zhu, Xuwei Fu, Tingwei Wu, Hongfei Tan, Hengtian Ding, Mengjing Liu, Kangcheng Wang, Ting Ye, Lei Li, Xin Li, Yu Wang, Chenguang Zheng, Hao Yang, James Cheng.
  In Proceedings of the 48th International Conference on Very Large Data Bases. (VLDB'22)
- Colorful h-star Core Decomposition.
  Sen Gao, Ronghua Li, Hongchao Qin, Hongzhi Chen, Ye Yuan, Guoren Wang.
  In Proceedings of the 38th IEEE International Conference on Data Engineering. (ICDE'22)
- Fast Maximal Clique Enumeration on Uncertain Graphs: A Pivot-based Approach.
  Qiangqiang Dai, Ronghua Li, Meihao Liao, Hongzhi Chen, Guoren Wang.
  In Proceedings of the 42nd ACM International Conference on Management of Data. (SIGMOD'22)
- Lightning Fast and Space Efficient k-clique Counting.
  Xiaowei Ye, Ronghua Li, Qiangqiang Dai, Hongzhi Chen, Guoren Wang.
  In Proceedings of the 31st International World Wide Web Conferences. (WWW'22)
- ByteGNN: Efficient Graph Neural Network Training at Large Scale.
  Chenguang Zheng, Hongzhi Chen*, Yuxuan Cheng, Zhezheng Song, Yifan Wu, Changji Li, James Cheng, Hao Yang, Shuai Zhang.
  In Proceedings of the 48th International Conference on Very Large Data Bases. (VLDB'22)
- BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing.
  Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, Chuanxiong Guo.
  In Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation. (NSDI'23)
- G-Tran: Making Distributed Graph Transactions Fast.
  Hongzhi Chen, Changji Li, Chenguang Zheng, Chenghuan Huang, Juncheng Fang, James Cheng, Jian Zhang.
  arXiv preprint arXiv:2105.04449, 2021.
- High Performance Distributed OLAP on Property Graphs with Grasper.
  Hongzhi Chen, Bowen Wu, Shiyuan Deng, Chenghuan Huang, Changji Li, Yichao Li, James Cheng.
  In Proceedings of the 2020 ACM International Conference on Management of Data. (SIGMOD'20)
- Measuring and Improving the Use of Graph Information in Graph Neural Networks.
  Yifan Hou, Jian Zhang, James Cheng, Kaili Ma, Richard T. B. Ma, Hongzhi Chen, Ming-Chang Yang.
  In Proceedings of the 2020 International Conference on Learning Representations. (ICLR'20)
- Grasper: A High Performance Distributed System for OLAP on Property Graphs.
  Hongzhi Chen, Changji Li, Juncheng Fang, Chenghuan Huang, James Cheng, Jian Zhang, Yifan Hou, Xiao Yan.
  In Proceedings of the 2019 ACM Symposium on Cloud Computing. (SoCC'19)
- A Representation Learning Framework for Property Graphs.
  Yifan Hou, Hongzhi Chen, Changji Li, James Cheng, Ming-Chang Yang.
  In Proceedings of the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. (SIGKDD'19)
- Large Scale Graph Mining with G-Miner.
  Hongzhi Chen, Xiaoxi Wang, Chenghuan Huang, Juncheng Fang, Yifan Hou, Changji Li, James Cheng.
  In Proceedings of the 2019 ACM International Conference on Management of Data. (SIGMOD'19)
- Optimizing Declarative Graph Queries at Large Scale.
  Qizhen Zhang, Akash Acharya, Hongzhi Chen, Simran Arora, Ang Chen, Vincent Liu and Boon Thau Loo.
  In Proceedings of the 2019 ACM International Conference on Management of Data. (SIGMOD'19)
- Scalable De Novo Genome Assembly Using a Pregel-Like Graph-Parallel System.
  Guimu Guo, Hongzhi Chen, Da Yan, James Cheng, Jake Chen, Zechen Chong.
  IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019. (TCBB'19)
- Lightweight Fault Tolerance in Pregel-Like Systems.
  Da Yan, James Cheng, Hongzhi Chen, Cheng Long, Purushotham Bangalore.
  In Proceedings of the 48th International Conference on Parallel Processing. (ICPP'19)
- G-Miner: An Efficient Task-Oriented Graph Mining System.
  Hongzhi Chen, Miao Liu, Yunjian Zhao, Xiao Yan, Da Yan, James Cheng.
  In Proceedings of the 2018 European Conference on Computer Systems. (EuroSys'18)
- Scalable De Novo Genome Assembly Using Pregel.
  Da Yan, Hongzhi Chen, Zhenkun Cai, James Cheng, Bin Shao.
  In Proceedings of the 34th IEEE International Conference on Data Engineering, 2018. (ICDE'18)
- GraphD: Distributed Vertex-Centric Graph Processing Beyond the Memory Limit.
  Da Yan, Yuzhen Huang, Miao Liu, Hongzhi Chen, James Cheng, Huanhuan Wu, Chengcui Zhang.
  IEEE Transactions on Parallel and Distributed Systems, 2018. (TPDS'18)
- Norm-Ranging LSH for Maximum Inner Product Search.
  Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, and James Cheng.
  In Proceedings of the 31st Annual Conference on Neural Information Processing Systems. (NIPS'18)
- Architectural Implications on the Performance and Cost of Graph Analytics Systems.
  Qizhen Zhang, Hongzhi Chen, Da Yan, James Cheng, Boon Thau Loo, Purushotham Bangalore.
  In Proceedings of the 2017 ACM Symposium on Cloud Computing. (SoCC'17)
- G-thinker: Big Graph Mining Made Easier and Faster.
  Da Yan, Hongzhi Chen, James Cheng, M. Tamer Ozsu, Qizhen Zhang, John C. S. Lui.
  arXiv preprint arXiv:1709.03110, 2017.
Projects
- G-Tran: An RDMA-enabled distributed in-memory graph database with serializable and snapshot isolation support. We propose a graph-native data store to achieve good data locality and fast data access for transactional updates and queries. G-Tran adopts a decentralized architecture that leverages RDMA to process distributed transactions with the massively parallel processing (MPP) model.
- Grasper: An RDMA-enabled high performance OLAP system over property graphs with good scalability. We propose a novel execution engine, called Expert Model, to provide tailored optimizations with adaptive parallelism control for query operators. Grasper achieves orders-of-magnitude performance improvements over existing systems (e.g., Titan, JanusGraph, OrientDB, Neo4j).
- PGE: A representation learning framework for property graph embedding. The key idea of PGE is a three-step framework that incorporates both topology and property information into Graph Neural Networks to produce better node embeddings.
- G-Miner: A distributed graph processing system aimed at general graph mining problems, which involve intensive local computation inside a subgraph. It adopts a graph-centric paradigm to overcome the challenges arising from the properties of mining problems. We model the processing of each subgraph as a task and design a task-based pipeline to overlap computation with communication. A dynamic task stealing mechanism and an efficient cache strategy are also proposed to further speed up task processing.
- G-thinker: Real applications, such as graph matching and community detection, often require computation-intensive graph analytics, which cannot be expressed as vertex-centric algorithms for efficient execution. G-thinker adopts a novel subgraph-centric unified framework that is natural for subgraph finding.
- PPA-assembler: A scalable toolkit for de novo genome assembly based on Pregel. It provides a set of key operations in genome assembly, which are implemented by practical Pregel algorithms (PPAs) with strong performance guarantees.
- FPMS: A novel and general distributed framework to mine frequent patterns, including frequent itemsets/sequences/graphs.
- RANGE-LSH: An LSH-based method for maximum inner product search that significantly outperforms SIMPLE-LSH and is robust to the shape of the 2-norm distribution and to different partitioning methods.
- GraphD: A system that offers out-of-core support for processing very big graphs in a small cluster of commodity PCs, with performance comparable to state-of-the-art distributed in-memory graph systems.
- LWCP: A fault tolerance mechanism for Pregel-like systems with performance tens of times faster than conventional checkpointing mechanisms.
- Pregel+: An open-source Pregel implementation with optimizations to reduce communication cost and eliminate skewness in communication. [WWW'15, PVLDB'14, PVLDB'15]
- GraphRex: An efficient framework for graph processing on datacenter infrastructure. The key technical contribution of GraphRex is the identification and optimization of a set of global operators whose efficient implementation is crucial to the good performance of large, datacenter-based graph analysis.
- Service-Intelligence: A distributed log mining tool based on Microsoft Cosmos, aiming to do Text Clustering and Anomaly Detection on large streaming data. [ICSE'16]
- iDice: An efficient algorithm for Emerging Issues Finding on multi-dimensional data. [ICSE'16]
- Service-Insider: A data mining algorithm package in EXCEL for Frequent Pattern Mining, Event Clustering, Association Rule Mining, Anomaly Detection, Multi-Dimension Change Detection.
- In4: A distributed OLAP system based on the Actor model, supporting online data analysis and lightweight data mining. [SIGKDD'18]
My work at CUHK, BigGraph Team.
My work at the University of Pennsylvania, Dept of Computer and Information Science.
My work at MSRA, Software Analytics Group.
Awards and Honors
- SoCC 2019 Travel Award, 2019.11
- SIGMOD 2019 Travel Award, 2019.5
- EuroSys 2018 Travel Award, 2018.4
- CUHK Postgraduate Studentship, 2016 - 2020
- The original winner of Hong Kong PhD Fellowship, 2015.6
- "Stars of Tomorrow" at Microsoft Research Asia (awarded to only 15% of research interns), 2015.6
- Outstanding Graduates, HUST, 2015.6
- CCF (China Computer Federation) National Top-100 Outstanding Undergraduates, 2014.10
- Academic Excellence Scholarship, HUST, 2014.9
- Merit Undergraduate, HUST, 2014.9
- National Undergraduate Scholarship, HUST, 2013.9
- Merit Undergraduate, HUST, 2013.9
- Most Outstanding Undergraduate, HUST, 2012.9
- Academic Excellence Scholarship, HUST, 2012.9
Teaching
- Teaching Assistant (CUHK)
- [Spring 2020] Advanced Topics in Database Systems (CSCI5120)
- [Spring 2018] Hands-on Introduction to C++ (CSCI1020)
- [Fall 2017] Problem Solving By Programming (ENGG1110)
- [Spring 2017] Problem Solving By Programming (ENGG1110)
- [Fall 2016] Problem Solving By Programming (ENGG1110)
Professional Activities
- Participation & Talks
- ACM SIGMOD/PODS International Conference on Management of Data, held as a virtual online meeting, 2020
- ACM Symposium on Cloud Computing, Santa Cruz, California, U.S.A., 2019
- ACM SIGMOD/PODS International Conference on Management of Data, Amsterdam, Netherlands, 2019
- European Conference on Computer Systems, Porto, Portugal, 2018
- China National Computer Congress, Zhengzhou, China, 2015
- External Reviewer
- 2020: SIGMOD
- 2019: SIGMOD
- 2018: VLDB, ICDE
- 2017: VLDB, ICDE, CCGRID, BigData
- 2016: VLDB, KDD, SOCC, ICDM, DASFAA, BigData, APWeb