We specialize in the design, implementation and integration of
algorithms and software systems in computer vision, machine learning,
signal processing and large-scale data processing. We are also experienced
in system optimization: identifying and removing bottlenecks in space, time and accuracy.
We help our
customers rapidly evaluate and adopt the latest developments in these
fields by
customizing open-source software or by reimplementing
algorithms from scratch.
We speak English, Python and C++, and
deliver our software as modular code packages, portable executables,
Docker containers and AWS services.
When we introduce a new software technology to our customers,
we also help train their existing engineers or new recruits
and work closely with them, so that when our job is done there are
people to carry on development and maintenance.
We provide CPT/OPT training opportunities to students who want to
pursue a career in software engineering or data science, or to
apply the latest deep-learning technologies to their field of study.
We now offer a one-day, hands-on beginner TensorFlow training program that
covers image annotation and basic model training (see codebase).
We have advised multiple startup companies on the design of technology and product roadmaps, the design of system architectures, and the selection of platforms and toolchains. We also identify and interview candidates for our customers and help them build their engineering teams.
We maintain close relationships with academia and are involved in leading research in machine learning and its applications. Our current academic clients and collaborators include universities, particularly historically black colleges and universities.
As both GPUs per system and TFLOPs per GPU grow rapidly, efficiently preprocessing and streaming training data to keep the GPUs busy is becoming an increasingly challenging problem. We developed PicPac, a C++ library that efficiently manages and streams massive amounts of training data. PicPac fully utilizes the high IOPS of SSD/NVMe storage to support out-of-core random shuffling and stratified sampling, and implements a plug-in framework for data transformation and augmentation to support various training tasks. PicPac's Python API is easy to use and is compatible with TensorFlow, PyTorch, MXNet and Caffe.
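To illustrate what shuffled, stratified sampling means in a training stream, here is a minimal in-memory sketch in plain Python. It is not PicPac's API: the function name `stratified_stream` and its parameters are hypothetical, and a real out-of-core implementation would read records from disk by index rather than hold everything in memory.

```python
import random
from collections import defaultdict


def stratified_stream(labels, batch_size, seed=0):
    """Yield endless batches of record indices, balanced across classes.

    Hypothetical helper for illustration only. `labels[i]` is the class
    of record i; the caller would load record i from storage on demand.
    """
    rng = random.Random(seed)

    # Group record indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    classes = sorted(by_class)
    for c in classes:
        rng.shuffle(by_class[c])  # random order within each class

    cursors = {c: 0 for c in classes}
    turn = 0
    while True:
        batch = []
        for _ in range(batch_size):
            # Visit classes round-robin so rare classes are not starved.
            c = classes[turn % len(classes)]
            turn += 1
            pool = by_class[c]
            if cursors[c] == len(pool):
                # This class is exhausted: reshuffle and start over.
                rng.shuffle(pool)
                cursors[c] = 0
            batch.append(pool[cursors[c]])
            cursors[c] += 1
        yield batch


# Usage: 90 records of class 0 and 10 of class 1, drawn evenly.
labels = [0] * 90 + [1] * 10
stream = stratified_stream(labels, batch_size=4)
first_batch = next(stream)
print([labels[i] for i in first_batch])
```

Because only indices are shuffled, the records themselves can stay on fast storage and be fetched in random order, which is the access pattern that high-IOPS SSD/NVMe devices handle well.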
We are experienced in deep learning with DICOM medical images, both 2D and 3D. We have developed deep-learning models to detect and segment lung cancer, breast cancer, multiple myeloma and other lesions. Our solutions are based on PicPac and have ranked high in multiple competitions. See our demo of carotid artery plaque segmentation and 3D reconstruction.
We developed KGraph,
one of today's fastest libraries for approximate nearest neighbor search (benchmark), and Donkey, a NoSQL feature vector database and toolkit
for developing nearest neighbor search engines. Donkey supports KGraph and Locality Sensitive Hashing for indexing and provides an HTTP/RESTful API.
Leveraging KGraph, Donkey and
the latest deep-learning models for feature extraction, we have helped a
client in the UK implement a
content-based image search engine that indexes tens of millions of images on
a single server.
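For readers unfamiliar with Locality Sensitive Hashing, the following is a minimal sketch of one common variant, random-hyperplane hashing for cosine similarity, in plain Python. It is not the KGraph or Donkey implementation, and all function names here are hypothetical. It assumes non-zero feature vectors and uses a single hash table, whereas practical systems typically use several tables to improve recall.

```python
import math
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(1 if dot(plane, vec) >= 0.0 else 0 for plane in planes)


def build_index(vectors, planes):
    """Map each signature to the list of vector ids that share it."""
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, planes), []).append(i)
    return buckets


def cosine(a, b):
    # Assumes neither vector is all zeros.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def search(query, vectors, planes, buckets, k=3):
    """Rank only the candidates in the query's bucket by exact cosine."""
    candidates = buckets.get(signature(query, planes), [])
    ranked = sorted(candidates,
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:k]


# Usage with toy 3-dimensional "feature vectors".
vectors = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.25], [-1.0, 0.3, -0.2], [0.0, 1.0, 0.0]]
planes = make_hyperplanes(dim=3, n_bits=4)
buckets = build_index(vectors, planes)
print(search([1.0, 0.05, 0.2], vectors, planes, buckets))
```

Vectors pointing in similar directions tend to fall on the same side of most hyperplanes, so they share a bucket and the search only scores a small fraction of the index. The result is approximate: a true neighbor that lands in a different bucket is missed.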
A2Genomics is our cloud platform for high-throughput sequencing data analysis and pattern discovery. Our pipeline efficiently processes massive NGS datasets, runs multiple algorithms including PCA, SVD, DESeq, k-means, SOM and WGCNA, and generates publication-quality visualizations.
We have helped a leading Chinese internet radio app with 70+ million users design and implement a recommendation system that mines user behavior and makes personalized recommendations online.
We have helped a client in China develop audio fingerprinting algorithms and implement a system that indexes millions of hours of radio broadcast audio covering 100+ cities. The system provides an online search-by-example service and automatically discovers repeated audio clips in order to monitor new advertisements.
We have been training deep-learning models for our customers since
2015, and the techniques we use have gone through many iterations:
from Caffe to TensorFlow and PyTorch, and from FCN to U-Net and Mask R-CNN.
Most of our tasks involve semantic segmentation of various data, e.g.
ECG signals, CT/MRI volumes and video clips.
The following video demonstrates our semantic segmentation
capability.