Hi! Based on feedback from students in Fall 2021, we are making some large structural changes to the course. This is still in development, so we'll try to update things here soon, but:
- More low-level guidance/exercises on specific techniques: ✅
- More discussion of pre- and non-neural techniques (implementation and where to use)
- Specific tracks for more hardware (e.g. robot car) vs theoretical projects
- Increased lecture and discussion time in lieu of in-class project time: ✅
If you have any questions, please don't hesitate to ask. We'll hopefully have more to share soon.
On-Device Machine Learning is a project-based course covering how to build, train, and deploy models that can run on low-power devices (e.g. smartphones, refrigerators, and mobile robots). The course will cover advanced topics in distillation, quantization, weight imprinting, power calculation, and more. Each week we will discuss a new research paper and area in this space on one day, and hold a lab working group on the second. Students will be provided with low-power compute hardware (e.g. SBCs and inference accelerators) in addition to sensors (e.g. microphones, cameras, and robotics) for their course project. The project will involve three components for building low-power multimodal models:
(1) inference
(2) performing training/updates for interactive ML, and
(3) maximizing power efficiency.
The more that can be performed on device, the more privacy-preserving and mobile the solution is.
For each stage of the course project, the final model produced will have an mAh "budget" equivalent to one full charge of a smartphone battery (~4 Ah: roughly 2 hrs on a Jetson Nano, 7 hrs on a Raspberry Pi, or 26 hrs on a RPi Zero W).
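For intuition, those runtimes are just the budget divided by a rough current draw for each board. A minimal sketch of the arithmetic follows; the per-device draw figures are assumptions for illustration, not measured values.

```python
# Rough runtime estimates for a ~4 Ah energy budget.
# The current-draw numbers are assumptions for illustration, not measurements.
BUDGET_AH = 4.0  # approximately one full smartphone battery charge

assumed_draw_amps = {
    "Jetson Nano": 2.0,    # ~2 A at 5 V under load (assumed)
    "Raspberry Pi": 0.57,  # moderate load (assumed)
    "RPi Zero W": 0.15,    # light load (assumed)
}

for board, amps in assumed_draw_amps.items():
    print(f"{board}: ~{BUDGET_AH / amps:.1f} h on a {BUDGET_AH:.0f} Ah budget")
```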
- Time & Place: 10:10am - 11:30am Tu/Th in POS 151
- GitHub template for assignments: https://github.com/strubell/11-767
- Assignments will be submitted and graded using Canvas
Example Industry Motivation "... if the coffee maker with voice recognition was in use for four years, the speech recognition cost for chewing on data back in the Mr Coffee datacenter would wipe out the entire revenue stream from that coffee maker, but that same function, if implemented on a device specifically tuned for this very precise job, could be done for under $1 and would not affect the purchase price significantly. " -- Source
Instructors

Yonatan Bisk
Instructor
Emma Strubell
Instructor
Policies and Grading
Grading breakdown coming soon...
Submission Policies:
- Submit a link/PDF to Canvas.
- Lab reports are individual while projects will be written up and submitted as a group.
- All deadlines are midnight EST (determined by Canvas submission time).
- Late days: Every team has a budget of 5 late days to be used throughout the semester. Late days are calculated automatically from submission time; once the budget is exhausted, 2% (absolute) is deducted from the maximum grade for each additional late day.
In the event a student tests positive for COVID-19, they will be invited to attend discussion virtually and will be expected to participate as usual. This includes participation points for raising their hands with questions/answers and submission of lab notebooks. Note that students who attend class while exhibiting symptoms will be told to leave and join virtually, for the protection of all others present.
Accommodations for Students with Disabilities: If you have a disability and have an accommodations letter from the Disability Resources office, we encourage you to discuss your accommodations and needs with the instructors as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.
Note to students
Take care of yourself! As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating, and/or lack of motivation. This is normal, and all of us benefit from support during times of struggle. There are many helpful resources available on campus, and an important part of a healthy life is learning how to ask for help. Asking for support sooner rather than later is almost always helpful. CMU services are available free to students, and treatment does work. You can learn more about confidential mental health services available on campus through Counseling and Psychological Services (CaPS). Support is always available (24/7) at: 412-268-2922.
Take care of your classmates and instructors! In this class, every individual will and must be treated with respect. The ways we are diverse are many and are fundamental to building and maintaining an equitable and an inclusive campus community. These include but are not limited to: race, color, national origin, caste, sex, disability (visible or invisible), age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status, or genetic information.
Research shows that greater diversity across individuals leads to greater creativity in the group. We at CMU work to promote diversity, equity and inclusion not only because it is necessary for excellence and innovation, but because it is just. Therefore, while we are imperfect, we ask you all to fully commit to work, both inside and outside of our classrooms to increase our commitment to build and sustain a campus community that embraces these core values. It is the responsibility of each of us to create a safer and more inclusive environment. Incidents of bias or discrimination, whether intentional or unintentional in their occurrence, contribute to creating an unwelcoming environment for individuals and groups at the university. If you experience or observe unfair or hostile treatment on the basis of identity, we encourage you to speak out for justice and offer support in the moment and/or share your experience using the following resources:
- Center for Student Diversity and Inclusion: csdi@andrew.cmu.edu, (412) 268-2150
- Report-It online anonymous reporting platform (user name: tartans; password: plaid); (877) 700-7050
Projects, Hardware, and Resources
The course will be primarily centered on a few multimodal tasks/platforms to facilitate cross-team collaboration and technical assistance. If your team wants to use custom hardware or sensors not listed here, that's fine, but please reach out so we can discuss it and think through the implications. Every team will also be provided with one of the Single Board Computers (SBCs) listed below.
Example Projects
Input | Output | Task |
---|---|---|
Speech | Text | Real-time Machine Translation |
Images | Text | Object Detection or ASL Finger Spelling |
Images | Robot Arm | Learning from Demonstration |
Speech + Images | Robot Car | Vision-Language Navigation |
Single Board Computers
SBC | RAM | Notes |
---|---|---|
Raspberry Pi 4 | 8GB | ~2 A draw on a moderately powerful processor |
Jetson Nano | 2GB, 4GB | 128-core NVIDIA Maxwell GPU |
- Can we use other platforms? Yes! But let's talk about the pros and cons together.
- What about custom sensors and hardware? Same answer :)
- Will the course pay for a Coral/Pi Zero/...? Yes, but let's chat first.
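As a starting point for working with whichever board you receive, here is a minimal latency-benchmarking sketch in the spirit of Lab 1. It assumes PyTorch and torchvision are installed on the device; MobileNetV2 and the iteration counts are arbitrary placeholders, not course requirements.

```python
# Minimal on-device latency benchmark (assumes PyTorch + torchvision are installed).
import time
import torch
from torchvision import models

model = models.mobilenet_v2().eval()  # small CNN; weights don't matter for timing
x = torch.randn(1, 3, 224, 224)       # one fake image

with torch.no_grad():
    for _ in range(5):                # warm-up passes
        model(x)
    n = 20
    start = time.perf_counter()
    for _ in range(n):
        model(x)
    elapsed = time.perf_counter() - start

print(f"mean latency: {1000 * elapsed / n:.1f} ms / image")
```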
Resources
- TinyML by Warden and Situnayake 2019
- Getting Started with AI on Jetson Nano
- PyTorch Mobile
- ONNX Mobile
- Example DexArm Rotary Code
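To make the PyTorch Mobile and ONNX links above concrete, here is a minimal export sketch. It assumes torch and torchvision are installed; the model and output file names are arbitrary choices for illustration.

```python
# Two common export paths for on-device inference (a sketch, not a full recipe).
import torch
from torchvision import models
from torch.utils.mobile_optimizer import optimize_for_mobile

model = models.mobilenet_v2().eval()
example = torch.randn(1, 3, 224, 224)

# PyTorch Mobile: trace to TorchScript, optimize, and save for the lite interpreter.
traced = torch.jit.trace(model, example)
optimize_for_mobile(traced)._save_for_lite_interpreter("mobilenet_v2.ptl")

# ONNX: export a graph that ONNX Runtime (including its mobile builds) can load.
torch.onnx.export(model, example, "mobilenet_v2.onnx", opset_version=12)
```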
Course Schedule
A basic course schedule is presented below. Papers will be chosen jointly by the instructors and the student presenting.
Tuesday | Thursday |
---|---|
Aug 30: Course structure & Background | Sept 1: Practical Problems, Theory vs Practice |
Sept 6: Machine Learning and Optimization | Sept 8: Lab 1: Benchmarking Simple Models |
Sept 13: NLP and Computer Vision | Sept 15: Edge Hardware and Robotics |
Sept 20: Efficiency Benchmarking | Sept 22: Paper discussions |
Sept 27: Compression I: Quantization and Pruning | Sept 29: Lab 2: Quantization |
Oct 4: Compression II: Pruning (cont.) and Distillation | Oct 6: Lab 3: Pruning |
Oct 11: Neural Architecture Search | Oct 13: Paper discussion |
Oct 18: Fall Break | Oct 20: Fall Break |
Oct 25: Midterm project presentations | Oct 27: Midterm project presentations |
Nov 1: Architecture-specific tricks I: CNNs | Nov 3: Paper discussions |
Nov 8: Architecture-specific tricks II: Transformers | Nov 10: Lab 4: A new hardware platform |
Nov 15: Coming Soon... | Nov 17: Paper discussions |
Nov 22: Efficient Training | Nov 24: No class: Thanksgiving |
Nov 29: Carbon (and water and minerals) and the Future | Dec 1: Lab 5: Benchmarking and Carbon |
Dec 6: Final Presentations | Dec 8: Final Presentations |
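Several of the sessions above are hands-on labs. As a small preview of the quantization lab, here is a sketch of post-training dynamic quantization in PyTorch; the toy model is an assumption for illustration, not the lab's actual model.

```python
# Post-training dynamic quantization sketch (toy model; illustration only).
import os
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Swap Linear layers for int8 dynamically-quantized equivalents.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def size_mb(m, path="tmp.pt"):
    """Serialize the state dict and report its size in MB."""
    torch.save(m.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.remove(path)
    return size

print(f"fp32: {size_mb(model):.2f} MB -> int8: {size_mb(quantized):.2f} MB")
```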
Example Readings
- Green AI
- Energy and Policy Considerations for Deep Learning in NLP
- Early Fusion for Goal Directed Robotic Vision
- Knowledge Transfer for Efficient On-device False Trigger Mitigation
- Distilling the Knowledge in a Neural Network
- DistilBERT
- TernaryBERT: Distillation-aware Ultra-low Bit BERT
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
- Distilling Large Language Models into Tiny and Effective Students using pQRNN
- SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
- FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable NAS
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- High-Performance Large-Scale Image Recognition Without Normalization
- Show Your Work: Improved Reporting of Experimental Results
- Showing Your Work Doesn’t Always Work
- Training Deep Neural Networks with 8-bit Floating Point Numbers
- HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
- Bayesian Bits: Unifying Quantization and Pruning
- Scalable Methods for 8-bit Training of Neural Networks
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
- VisualBERT: A Simple and Performant Baseline for Vision and Language
- Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Fast Convolutional Nets With fbfft: A GPU Performance Evaluation
- Efficient softmax approximation for GPUs
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
- Do Transformer Modifications Transfer Across Implementations and Applications?
- A Primer in BERTology: What we know about how BERT works
- Efficient Transformers: A Survey
- Improving Low Compute Language Modeling with In-Domain Embedding Initialisation
- Consistent Accelerated Inference via Confident Adaptive Transformers
- PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
- Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
- Are Sixteen Heads Really Better Than One?
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
- I-BERT: Integer-only BERT Quantization
- Natural Language Processing with Small Feed-Forward Networks
- Recognizing People in Photos Through Private On-Device Machine Learning
- HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
- Once-for-All: Train One Network and Specialize it for Efficient Deployment
- The Hardware Lottery
- Streaming End-to-end Speech Recognition For Mobile Devices
- Pre-Training Transformers as Energy-Based Cloze Models
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- MLPerf Training Benchmark
- The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain
- ZeRO-Offload: Democratizing Billion-Scale Model Training
- MiniVLM: A Smaller and Faster Vision-Language Model
- Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
- Dataset Distillation
- XOR-Net: An Efficient Computation Pipeline for Binary Neural Network Inference on Edge Devices
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
- Evaluating the Energy Efficiency of Deep Convolutional Neural Networks on CPUs and GPUs
- A Study of Non-autoregressive Model for Sequence Generation