Research and Teaching Interests
- Data mining and analysis (including data analytics, data science & business intelligence solutions)
- Big data, databases (including image databases), data management, and data warehousing
- Data visualization and visual analytics
- Health informatics and electronic health
- Web technology and services, as well as social computing & social network analysis
Data mining refers to the search for previously unknown patterns and relationships that might be embedded in stored data. Most existing data mining algorithms treat the mining process as an impenetrable black box, in which users cannot express their focus through user-specified constraints. As a result, these unconstrained mining algorithms can yield numerous patterns that do not make sense (e.g., “customers who buy diapers also buy beer”) or that are not interesting to users. To this end, we are developing a human-centered exploratory mining algorithm that (i) enables human analysts/users to impose constraints to focus the search, and (ii) avoids irrelevant and time-consuming computation. Such an algorithm achieves an effective division of labor: the computer carries out the mechanical aspect of the work (e.g., the counting and searching), and the human performs the intelligent aspect of the work (e.g., the abstract thinking and observation).
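As a rough illustration of how a user-specified constraint can focus the search, the following minimal sketch runs a level-wise (Apriori-style) frequent-itemset search and discards any candidate that violates the constraint before it is counted. It is not the algorithm described above; the function name `mine_constrained`, the price table, and the budget constraint are all hypothetical, and the sketch assumes the constraint is anti-monotone (once an itemset violates it, so does every superset).

```python
from itertools import combinations


def mine_constrained(transactions, min_support, constraint):
    """Level-wise frequent-itemset search pruned by a user constraint.

    Assumption: `constraint` is anti-monotone, so an itemset that fails it
    can be dropped without ever counting it or any of its supersets.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    # Level 1: single items that already satisfy the user's constraint.
    candidates = [frozenset([i]) for i in items if constraint(frozenset([i]))]
    result = {}
    while candidates:
        # Mechanical part of the work: count the support of each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only those
        # that satisfy the constraint and whose k-subsets are all frequent.
        next_level = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != len(a) + 1 or not constraint(union):
                continue
            if all(frozenset(s) in frequent for s in combinations(union, len(a))):
                next_level.add(union)
        candidates = list(next_level)
    return result


# Hypothetical example: the analyst only cares about cheap baskets.
prices = {"bread": 2, "milk": 3, "tv": 500}
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "tv"},
    {"bread", "tv"},
    {"milk", "tv"},
]
cheap = lambda itemset: sum(prices[i] for i in itemset) <= 10
patterns = mine_constrained(transactions, min_support=2, constraint=cheap)
```

In this example, every itemset containing `tv` is rejected by the constraint before any counting takes place, which is the kind of irrelevant computation the constraint is meant to avoid.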
Data mining is inherently an iterative and exploratory process. Hence, we not only allow users to impose constraints on the mining process, but also allow them to change these constraints dynamically in the middle of the computation. To build a practical environment for this human-centered exploratory mining algorithm, we are developing techniques to support dynamic mining. To enhance the performance of our dynamic mining algorithm, we have proposed the following novel structures: (i) the segment support map to facilitate scalable mining, and (ii) the OSSM to optimize frequency counting.
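The text above does not define the segment support map, so the following is only a sketch under a stated assumption: that such a map stores, for each segment of the transaction database, the support of every single item, and that the support of an itemset is bounded from above by summing, over the segments, the smallest single-item count among its members. The function names `build_segment_support_map` and `support_upper_bound`, and the equal-size contiguous segmentation, are hypothetical choices made for illustration.

```python
def build_segment_support_map(transactions, num_segments):
    """Count single-item support separately in each contiguous segment.

    Assumption: segments are equal-size contiguous runs of transactions.
    """
    if not transactions:
        return []
    size = -(-len(transactions) // num_segments)  # ceiling division
    ssm = []
    for start in range(0, len(transactions), size):
        counts = {}
        for t in transactions[start:start + size]:
            for item in set(t):
                counts[item] = counts.get(item, 0) + 1
        ssm.append(counts)
    return ssm


def support_upper_bound(ssm, itemset):
    """Upper bound on an itemset's support without scanning the data.

    Within one segment, an itemset cannot occur more often than its
    rarest member item, so the per-segment minima are summed.
    """
    return sum(min(seg.get(i, 0) for i in itemset) for seg in ssm)


# Hypothetical data: {"a", "b"} actually occurs in two transactions.
transactions = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}]
one_segment = build_segment_support_map(transactions, 1)
two_segments = build_segment_support_map(transactions, 2)
loose = support_upper_bound(one_segment, {"a", "b"})   # 3
tight = support_upper_bound(two_segments, {"a", "b"})  # 2
```

A candidate whose upper bound already falls below the minimum support threshold can be discarded without counting it. In this example, splitting the data into two segments tightens the bound from 3 to 2.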
With respect to my research interest in image databases, the motivation is as follows. As the number of on-line digital images has increased rapidly, efficient and effective image retrieval has become necessary. Many existing image database systems support whole-image queries, which require users to specify the contents of the whole images to be retrieved. However, users may only remember or care about some, but not all, portions of the images (i.e., subimages) they have seen before. Techniques for handling subimage queries of arbitrary size are therefore in demand. Unfortunately, few image database management systems can handle these subimage queries, and among those that can deal with subimage queries of arbitrary size, multiscale similarity matching is rarely used. To this end, we developed techniques based on multiscale similarity matching to handle subimage queries of arbitrary size, and applied these techniques to large image databases.
- Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk. A Tree-Based Approach for Frequent Pattern Mining from Uncertain Data. In Proceedings of 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-08), Takashi Washio et al. (Editors), pages 653-661. Osaka, Japan, May 2008.
- Carson Kai-Sang Leung, Quamrul I. Khan, and Tariqul Hoque. CanTree: A Tree Structure for Efficient Incremental Mining of Frequent Patterns. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM-05), Jiawei Han, Benjamin W. Wah, Vijay Raghavan, Xindong Wu, and Rajeev Rastogi (Editors), pages 274-281. Houston, TX, USA, November 2005.
An extension appears in Knowledge and Information Systems: An International Journal (KAIS), Volume 11, Issue 3, pages 287-312, April 2007.
- Carson Kai-Sang Leung and Wookey Lee. Efficient Update of Data Warehouse Views with Generalised Referential Integrity Differential Files. In Proceedings of the 23rd British National Conference on Databases (BNCOD 23), David Bell and Jun Hong (Editors), pages 199-211. Belfast, Northern Ireland, UK, July 2006.
- Laks V.S. Lakshmanan, Carson Kai-Sang Leung, and Raymond T. Ng. Efficient Dynamic Mining of Constrained Frequent Sets. ACM Transactions on Database Systems (TODS), Volume 28, Issue 4, pages 337-389, December 2003.
- Carson Kai-Sang Leung, Raymond T. Ng, and Heikki Mannila. OSSM: A Segmentation Approach to Optimize Frequency Counting. In Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), Rakesh Agrawal, Klaus Dittrich, and Anne H.H. Ngu (Editors), pages 583-592. San Jose, CA, USA, February/March 2002.
- Carson Kai-Sang Leung. Data Mining in SQL. Research Report, IBM Centre for Advanced Studies (Toronto) & The University of British Columbia, August 2000.
- Carson Kai-Sang Leung. Evaluation of Data Mining Opportunities at Workers’ Compensation Board. Research Report, Workers’ Compensation Board of British Columbia & The University of British Columbia, November 1998.
- Kai-Sang Leung and Raymond Ng. Multiscale Similarity Matching for Subimage Queries of Arbitrary Size. In Visual Database Systems 4, Yannis Ioannidis and Wolfgang Klas (Editors), Chapter 21. London, UK: Chapman & Hall, 1998.
The University of British Columbia: www.ubc.ca
UBC Department of Computer Science: www.cs.ubc.ca
UBC Database Systems Laboratory: www.cs.ubc.ca/nest/dbsl/dbsl.html
Personal Web Page at UBC: www.cs.ubc.ca/~kleung