Dr. Hamidreza Kasaei

My research interests lie at the intersection of machine learning, robotics, and machine vision, particularly in the areas of open-ended learning, 3D object perception, grasp affordance detection, and object manipulation. My goal is to achieve a breakthrough in robotics by enabling robots to learn incrementally from past experiences and to interact safely with human users. I have been investigating active perception, where robots use their mobility and manipulation capabilities not only to gain useful perceptual information but also to predict the next-best-view for improving object detection and manipulation. I have evaluated my work on different robotic platforms, including the PR2, robotic arms (UR5e, Franka, Kinova), and humanoid robots. I lead the Interactive Robot Learning Laboratory (IRL-Lab), which focuses on lifelong interactive robot learning in the areas of 3D object perception, grasp affordance, and object manipulation. Visit my webpage and the IRL-Lab page to learn more about my research.

I developed the "Cognitive Robotics" course from scratch in 2019 and have been improving its content ever since. The course covers a diverse set of topics addressing the most critical aspects of building a cognitive robotic system. We recently wrote a survey paper on the state of lifelong learning in service robots, published in the Journal of Intelligent & Robotic Systems (available online here). The survey concisely covers all the topics of the Cognitive Robotics course, to help students remember the material and review it quickly. I will try to share my knowledge with you throughout the course!

Teaching Assistants

This year, the course has four TAs to help you during the practical sessions: Andrei, Vlad, Andreea, and Klemen. We would be happy to assist you to the best of our abilities, so don't hesitate to ask if you have any questions.

Learning Objectives

After successful completion of this course, students will be able to:

Explain the meaning of different concepts often used in the fields of 3D object perception, grasping, and human-robot interaction, and their application in robotics.

Explain the main theories of open-ended learning and cognitive robotics.

Exploit deep transfer learning algorithms for open-ended object category learning and recognition.

Implement and experiment with several methods for object grasping.

Create a tight coupling between object perception and manipulation, and perform experiments using real Kinect data and a simulated Panda robotic arm.

Gain hands-on experience working on a research project. Students are also expected to gain the following research skills: analyzing the literature related to a particular topic, critiquing papers, and presenting research ideas.


Prior knowledge of basic linear algebra is recommended but not required. For programming throughout the course, we mainly use C++ and Python with ROS Melodic. For your final project, you are free to choose MATLAB, Python, or C++ as your coding language.

Class Format

A cognitive robot should process very different types of information at varying time scales. Two modes of processing, generally labelled System 1 and System 2, are commonly accepted in cognitive psychology. The operations of System 1 (i.e., perception and action) are typically fast, automatic, reactive, and intuitive. The operations of System 2 (i.e., semantic) are slow, deliberative, and analytic.

This year, the theme of the course builds on two important topics: "Towards Lifelong Assistive Robotics: A Tight Coupling between Object Perception and Manipulation" and "Simultaneous Multi-View Object Grasping and Recognition in Open-Ended Domains". The course is a combination of lectures, reading sessions, and robotic lab sessions. The lectures discuss the fundamental topics required to develop a cognitive robotic system, focusing mainly on the distinctive characteristics of System 1. During the reading sessions, students present and discuss recent contributions in the fields of object perception and manipulation. See the detailed contents, coursework, and grading policies below.


Topics include Introduction to Cognitive Robotics, 3D Object Perception, Object Grasping and Manipulation, Planning, Human Robot Interaction, Open-Ended Learning, Deep Transfer Learning, Evaluations, and Application to Assistive Robots.

Essay Assignment and Reading Sessions (15%)

An essay assignment (two pages, IEEE conference format) is designed to ensure that all students read important papers on 3D object perception, affordance detection, and manipulation. All students will get a chance to present their essay.

Robotic Lab Sessions (35%) + Final Project (50%)

Two practical assignments have been devised to provide hands-on experience with the fundamental theories (35%). During the practical sessions, we will use the RACE framework to gain more insight into different algorithms. Then, students will work together on a final project (50%).
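The component weights above, combined with the pass conditions stated under Coursework and Grading Policies (an average of at least 5.5 and at least 5.0 on each component), can be sketched as a small calculation. This is only an illustration, not the official grading script: it assumes scores on a 0-10 scale and that the 5.5 threshold applies to the weighted average, and the component names are hypothetical.

```python
# Assumptions: scores are on a 0-10 scale and the 5.5 threshold applies
# to the weighted average. Component names are hypothetical labels.
WEIGHTS = {"essay": 0.15, "practicals": 0.35, "final_project": 0.50}
MIN_AVERAGE = 5.5    # minimum (assumed weighted) average
MIN_COMPONENT = 5.0  # minimum score required on every component


def final_grade(scores: dict[str, float]) -> tuple[float, bool]:
    """Return the weighted average and whether both pass conditions hold."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    average = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    passed = average >= MIN_AVERAGE and all(
        scores[name] >= MIN_COMPONENT for name in WEIGHTS
    )
    return average, passed
```

For example, scores of 7.0 (essay), 6.0 (practicals), and 8.0 (final project) give a weighted average of about 7.15, which passes; an essay score of 4.5 fails the per-component rule even when the weighted average is high.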

List of Topics

The preliminary schedule is provided below and is subject to change.

Session Description Date
1 Introduction to Cognitive Robotics 20/9
2 3D Object Detection and Representation 27/9
3 Open-Ended Object Category Learning and Recognition + Evaluation of Object Recognition 4/10
4 Reading Day 7/10
5 Object Grasping and Manipulation I 11/10
6 Reading Day 14/10
7 Object Grasping and Manipulation II 18/10
8 Reading Day 21/10
9 Reading Day 28/10
10 Task/Path/Trajectory Planning and Memory Management 1/11
11 Reading Day 4/11

Coursework and Grading Policies

Throughout the course, students will work partly individually and partly in groups of two on a related research project that they present at the end of the course. Grading for this course is based on an essay assignment, practical assignments, and a final project. Students need an average score of at least 5.5 and a score of at least 5.0 on each component:

1- Essay assignment (15%):

An essay assignment (two pages, IEEE conference format) is designed to ensure that all students read important papers on 3D object perception, affordance detection, and manipulation. All students will get a chance to present their essay during the reading days. Each group will present the content of one or more relevant papers. Presenters must submit the essay for their reading assignment two days before the day of the class. For example, if you are going to present your work on Wednesday, you need to submit your essay by midnight on Monday.

The grading policy for this assignment is based on a Rubric chart containing the following criteria (available on Nestor):

  • Content/Delivery/Creativity/Organization/Ability to answer questions/Length of presentation/Overall quality

Possible topics for the essay assignment include, but are not limited to, the following:

Topics Suggested Papers
Object Perception
- Selecting CNN Features for Online Learning of 3D Objects
- Multi-View Convolutional Neural Networks
- Unseen Object Instance Segmentation for Robotic Environments
- Learning to See before Learning to Act: Visual Pre-training for Manipulation
- Simultaneous Multi-View Object Grasping and Recognition in Open-Ended Domains
- Improving Unimodal Object Recognition with Multimodal Contrastive Learning
- Learning 3D Dynamic Scene Representations for Robot Manipulation
Continual Deep Learning
- A survey on Deep Learning Advances on Different 3D Data Representations
- Replay in Deep Learning: Current Approaches and Missing Biological Elements
- Continual Lifelong Learning with Neural Networks: A Review
- A continual learning survey: Defying forgetting in classification tasks
- Lifelong Learning with Dynamically Expandable Networks
- 3D_DEN: Open-ended 3D Object Recognition using Dynamically Expandable Networks
- OrthographicNet: A deep learning approach for 3D object recognition in open-ended domains
Human-Robot Interaction
- Reactive Human-to-Robot Handovers of Arbitrary Objects
- Objects Affordance Recognition for Natural Language Human-Robot Interaction
- Few-Shot Visual Grounding for Natural Human-Robot Interaction
- OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
- SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds
- Object affordance based multimodal fusion for natural Human-Robot interaction
- Semantic Scene Manipulation Based on 3D Spatial Object Relations and Language Instructions
- Learning One-Shot Imitation from Humans without Humans
Object Affordance Detection
- Learning to Grasp 3D Objects using Deep Residual U-Nets
- Affordance detection of tool parts from geometric features
- Robotics Dexterous Grasping: The Methods Based on Point Cloud and Deep Learning (review)
- AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection
- Object affordance detection with relationship-aware network
- One-Shot Object Affordance Detection in the Wild
- Robotic pick-and-place of novel objects in clutter with multiaffordance grasping and cross-domain image matching
Shape Completion for Grasping
- Shape Completion Enabled Robotic Grasping*
- Beyond Top-Grasps Through Scene Completion
- Multi-Modal Geometric Learning for Grasping and Manipulation
- RGB-D Local Implicit Function for Depth Completion of Transparent Objects
- 3D Shape Estimation of Transparent Objects for Manipulation
- Task-driven Perception and Manipulation for Constrained Placement with No Shape Prior
- Learning 3D Shape Completion Under Weak Supervision
- Diverse Plausible Shape Completions from Ambiguous Depth Images
Object Grasping and Manipulation
- MVGrasp: Real-Time Multi-View 3D Object Grasping in Highly Cluttered Environments
- Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach
- Volumetric Grasping Network: Real-time 6 DOF Grasp Detection in Clutter
- High precision grasp pose detection in dense clutter
- 6-DOF Grasping for Target-driven Object Manipulation in Clutter
- Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics
- PointNet++ Grasping: Learning An End-to-end Spatial Grasp Generation Algorithm from Sparse Point Clouds
- Challenges and Outlook in Robotic Manipulation of Deformable Objects
- Self-Imitation Learning by Planning
- Fast geometry-based computation of grasping points on three-dimensional point clouds
Kinesthetic Teaching / Learning from Demonstrations
- Learning to Grasp Familiar Objects using Object View Recognition and Template Matching
- Learning of grasp selection based on shape-templates
- Learning robots to grasp by demonstration
- Virtual Kinesthetic Teaching for Bimanual Telemanipulation
- Learning Dexterous Grasping with Object-Centric Visual Affordances
- Robot Grasping in Clutter: Using a Hierarchy of Supervisors for Learning from Demonstrations
- Grasping in the Wild: Learning 6DoF Closed-Loop Grasping From Low-Cost Demonstrations
- Learning Interactively to Resolve Ambiguity in Reference Frame Selection
- Recent Advances in Robot Learning from Demonstration
- A Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations
Task-Informed/Oriented Grasping
- Learning task-oriented grasping for tool manipulation from simulated self-supervision
- KETO: Learning Keypoint Representations for Tool Manipulation
- OmniHang: Learning to Hang Arbitrary Objects
- Learning Task-Oriented Grasping From Human Activity Datasets
- Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping
- Category-based task specific grasping
- Part-based Grasp Planning for Familiar Objects
Dual-Arm Collaborative Manipulation
- Autonomous Dual-Arm Manipulation of Familiar Objects
- Autonomous Bimanual Functional Regrasping of Novel Object Class Instances
- Grasping Bulky Objects with Two Anthropomorphic Hands
- Learning a Decentralized Multi-arm Motion Planner
- Learning and Generalisation of Primitives Skills Towards Robust Dual-arm Manipulation
- Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from Images
- Learning Geometric Reasoning and Control for Long-Horizon Tasks from Visual Input
- A Novel Pipeline for Bi-manual Handover Task
- Vision-Based Robotic Pushing and Grasping for Stone Sample Collection under Computing Resource Constraints
- Dual arm manipulation - A survey
- Interactive, Collaborative Robots: Challenges and Opportunities
- Fully Autonomous Picking with a Dual-Arm Platform for Intralogistics
Pile Segmentation and Manipulation
- Perceiving, Learning, and Exploiting Object Affordances for Autonomous Pile Manipulation
- Interactive singulation of objects from a pile
- Learning to Singulate Objects using a Push Proposal Network
- Model-Based Grasping of Unknown Objects from a Random Pile
- Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
- Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
- A Learning-based Robotic Bin-picking with Flexibly Customizable Grasping Conditions
- Probabilistic segmentation and targeted exploration of objects in cluttered environments
Sim2Real Transfer
- Closing the Simulation-to-Reality Gap for Deep Robotic Learning
- Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks
- Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?
- Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control
- Using Synthetic Data and Deep Networks to Recognize Primitive Shapes for Object Grasping
- Transferring Grasp Configurations using Active Learning and Local Replanning
6D Pose Estimation
- Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd
- Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks
- Recovering 6D Object Pose: A Review and Multi-modal Analysis
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation
- Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge
- Multi-view 6D object pose estimation and camera motion planning using RGBD images
Visual Servoing
- Survey on Visual Servoing for Manipulation
- Vision-driven Compliant Manipulation for Reliable, High-Precision Assembly Tasks
- End-to-End Training of Deep Visuomotor Policies
- Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection
- Dynamic Grasping with Reachability and Motion Awareness
Assistive Robotics
- Robots for Humanity: Using Assistive Robotics to Empower People with Disabilities
- Adaptive Robot-Assisted Feeding: An Online Learning Framework for Acquiring Previously-Unseen Food Items
- Is More Autonomy Always Better? Exploring Preferences of Users with Mobility Impairments in Robot-assisted Feeding
- Robot-Assisted Feeding: Generalizing Skewering Strategies across Food Items on a Plate
- Transfer depends on Acquisition: Analyzing Manipulation Strategies for Robotic Feeding
- Towards Robotic Feeding: Role of Haptics in Fork-based Food Manipulation
- Sensing Shear Forces During Food Manipulation: Resolving the Trade-Off Between Range and Sensitivity
...

2- Practical assignments and report (35%):

Two practical assignments have been devised to provide hands-on experience with the fundamental theories. During the practical sessions, we will use the RACE framework, Gazebo, RViz, MoveIt, and standard datasets to gain more insight into different algorithms. In particular, students will work on a “clear_table” scenario using the RACE framework, which consists of two projects:

  • Project1: Open-Ended Learning Approaches for 3D Object Recognition.
  • Project2: Coupling between Object Perception and Manipulation: Learning to Grasp Objects in Highly Cluttered Environments

For both projects, students are provided with code stubs and are expected to complete them (see the details below). During the first and second assignments, students should think about their final project: in particular, what they want to develop and, if possible, which comparisons they want to make.

*Note: At the end of each practical assignment, a report (up to four pages, IEEE conference format, including all figures, tables, and references) has to be submitted. These practical assignments prepare students for the final course project. Please read the detailed description of each project below.

*Extra credit: We will evaluate your object recognition system using an open-ended evaluation protocol. We will add 0.5 points to the final score of the student who achieves the highest performance, 0.35 points to the student in second place, and 0.20 points to the student in third place. We will compute the performance of your algorithm ourselves (code that does not run will be disqualified from the contest). This reward is designed to encourage you to experiment with different algorithms and hyperparameter settings to obtain the best performance.
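A minimal sketch of how the contest bonuses could be assigned, assuming each student's result is a single performance score where higher is better and a disqualified (non-running) submission is recorded as None. The course page does not say how ties are broken; this sketch simply keeps the input order for equal scores, and the student names in the example are hypothetical.

```python
from typing import Optional

# Bonus points added to the final score for ranks 1, 2 and 3.
BONUS_BY_RANK = (0.50, 0.35, 0.20)


def contest_bonuses(performance: dict[str, Optional[float]]) -> dict[str, float]:
    """Map each of the (at most) three best students to their bonus points.

    performance: student -> score (higher is better), or None if the
    submitted code did not run and was therefore disqualified.
    """
    valid = {s: score for s, score in performance.items() if score is not None}
    # sorted() is stable even with reverse=True, so tied scores keep their
    # input order (an assumed tie rule, not part of the course policy).
    ranked = sorted(valid, key=lambda s: valid[s], reverse=True)
    # zip stops at the shorter sequence, so fewer than three valid entries is fine.
    return dict(zip(ranked, BONUS_BY_RANK))
```

For example, with scores ben 0.90, anna 0.81, dan 0.75, eva 0.60, and a disqualified cara, the bonuses go to ben (0.50), anna (0.35), and dan (0.20).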

The grading policy for this assignment is based on a Rubric chart containing the following criteria (will be available on Nestor):

  • Content/Delivery/Creativity and presentation/Organization/Neatness and attractiveness/Overall quality

Set up the RACE framework and its dependencies

This section shows how easy it is to get started with the RACE framework. It is not meant to be a deep dive into the ROS functionality or the code surrounding object perception and learning; it only refers students to the installation steps required for each project. If you are interested in studying the framework in depth, including both (1) a discussion of the theory and (2) hands-on implementations, check out my Ph.D. thesis.

[Project1] Open-Ended Learning Approaches for 3D Object Recognition

Human beings learn to recognize object categories ceaselessly over time. This ability to refine and extend knowledge from accumulated experiences facilitates adaptation to new environments. In this assignment, students will learn how to create a lifelong object recognition system that can learn about 3D object categories in an open-ended fashion [see the example].
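The key property of an open-ended learner is that the set of categories is never fixed: a new category can be taught at any time, and a wrong answer can be corrected by teaching the right label. A minimal sketch of this idea, using a simple instance-based nearest-neighbour memory, is shown below. It is not the RACE implementation; it assumes each object view has already been encoded as a fixed-length feature vector (for example a shape descriptor or a CNN feature), and the class and method names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Optional


class OpenEndedNearestNeighbour:
    """Instance-based learner whose set of categories can grow at any time.

    Hypothetical sketch: each object view is assumed to be encoded as a
    fixed-length feature vector before it reaches this class.
    """

    def __init__(self) -> None:
        # category name -> stored feature vectors (one per taught object view)
        self.memory: dict[str, list[list[float]]] = defaultdict(list)

    def teach(self, category: str, features: list[float]) -> None:
        """Store a labelled view; an unseen label creates a new category.

        The same call is used to correct a wrong prediction.
        """
        self.memory[category].append(list(features))

    def recognize(self, features: list[float]) -> Optional[str]:
        """Return the category of the closest stored view, or None if empty."""
        best_category, best_distance = None, math.inf
        for category, views in self.memory.items():
            for view in views:
                distance = math.dist(view, features)  # Euclidean distance
                if distance < best_distance:
                    best_category, best_distance = category, distance
        return best_category

    @property
    def categories(self) -> list[str]:
        return sorted(self.memory)
```

Usage: after `teach("mug", [0.9, 0.1, 0.0])` and `teach("bowl", [0.1, 0.8, 0.1])`, the query `[0.8, 0.2, 0.0]` is recognized as `"mug"`; teaching `"plate"` later adds a third category without retraining anything else. A real system would use better descriptors and a more robust similarity measure.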

[Project2] Coupling between Object Perception and Manipulation

In this assignment, students will learn about the concept of visual grasping and manipulation. In particular, we developed a simulation environment in PyBullet, where a UR5e robot perceives the environment using an RGB-D camera (e.g., Kinect). Students should program the robot to grasp and manipulate objects in different situations.
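In PyBullet, `getCameraImage` returns, besides the RGB image, a depth buffer whose values lie in [0, 1] and are not distances in metres. Before grasping, a depth pixel usually has to be turned into a 3D point. A minimal sketch of that conversion is given below, assuming the projection matrix was built with `computeProjectionMatrixFOV` (vertical field of view in degrees, plus near and far planes) and a pinhole camera with the principal point at the image centre. The function names are hypothetical and are not part of the provided framework.

```python
import math


def depth_buffer_to_metric(depth_buffer: float, near: float, far: float) -> float:
    """Convert a non-linear depth-buffer value in [0, 1] to metric depth.

    A value of 0 maps to the near plane and 1 maps to the far plane.
    """
    return far * near / (far - (far - near) * depth_buffer)


def pixel_to_camera_point(u: float, v: float, depth: float,
                          width: int, height: int,
                          fov_deg: float) -> tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into the camera frame.

    Assumes a pinhole camera with vertical field of view fov_deg, square
    pixels, and the principal point at the image centre. Axes follow the
    OpenCV convention: x to the right, y down, z along the optical axis.
    """
    focal = (height / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    return x, y, depth
```

The resulting point is expressed in the camera frame; to use it for grasping it still has to be transformed into the robot's base frame with the camera pose, which depends on how the simulation places the camera.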

3- Final project and report (50%):

For the final project, students will work in groups of four on either a specific research project (up to 25% of the material can be borrowed from the previous assignments) or an in-depth literature survey (~20-30 relevant papers, organized by different features, identifying gaps in the state of the art). Each group will then write a report with the structure of a scientific conference paper (6+n pages, IEEE conference format). The report material (including text, figures, tables, acknowledgments, etc.) must fit into 6 pages, while there is no page limit for the appendix and references sections (n pages). Reports exceeding this (6+n) page limit will not be evaluated. The last two sessions are dedicated to the final project presentations. Each group has to send a short description of the project (up to 2 paragraphs) to Hamidreza Kasaei to get feedback on the proposed project. This document should state:

  • Title of project, student names and student numbers,
  • Explain the goal of the to-be-developed system,
  • Explain how you will compare your approach with other state-of-the-art approaches, if possible.

Detailed information about the content of the report and the deadline will be posted later. A short list of possible projects is given below; you may also want to check out my list of projects and publications. You should select a topic from the list below and develop/test your idea using the provided PyBullet framework, or come up with your own project idea (the topic should be relevant to the content of the course). If you prefer to work on a C++-based framework, please contact Hamidreza Kasaei (course coordinator).

  • Active Object Recognition
  • Shape Completion for Grasping
  • Pile Segmentation and Manipulation
  • Affordance Detection
  • 6D Pose Estimation
  • Active Segmentation of Cluttered Scenes
  • Grasping and Manipulation
  • Dual-Arm Manipulation
  • Task-Informed/Oriented Grasping
  • Human-Robot Interaction
  • Deep Transfer Learning
  • Eye-in-Hand and Eye-to-Hand System
  • Open-Ended Learning
  • RL for Manipulation
  • Path Planning in Manipulation

The grading policy for this assignment is based on a Rubric chart containing the following criteria (will be available on Nestor):

  • Content/Delivery/Creativity and presentation/Organization/Neatness and attractiveness/Overall quality


  • Late submissions lose 25% of the initial mark if submitted one day after the deadline, 50% if submitted two days after, and 100% if submitted three days after; a submission three days after the deadline therefore receives 0 marks. You may submit your first practical assignment up to two days late with no penalty; this exception applies ONLY to your first practical assignment.
  • All the page limits exclude references.
  • Feel free to collaborate on solving the problem but write your code individually. In particular, do not copy code from other students.
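The late-submission rule can be sketched as a small function. This is an illustration under stated assumptions, not an official tool: lateness is counted in whole days, and for the first practical assignment the sketch assumes that submissions up to two days late are not penalized while later ones follow the normal schedule (so three days late still gives 0 marks). The policy text does not spell out that last case.

```python
# Fraction of the initial mark lost per whole day late; 3 or more days -> 100%.
PENALTY_BY_DAYS_LATE = {0: 0.0, 1: 0.25, 2: 0.50}
GRACE_DAYS_FIRST_PRACTICAL = 2


def late_mark(initial_mark: float, days_late: int,
              first_practical: bool = False) -> float:
    """Return the mark after applying the late-submission penalty.

    Assumption: for the first practical assignment, up to two days late is
    free; beyond that, the normal penalty schedule applies unchanged.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if first_practical and days_late <= GRACE_DAYS_FIRST_PRACTICAL:
        return initial_mark
    penalty = PENALTY_BY_DAYS_LATE.get(days_late, 1.0)
    return initial_mark * (1.0 - penalty)
```

For example, an initial mark of 8.0 becomes 6.0 one day late, 4.0 two days late, and 0.0 three days late; for the first practical assignment it stays 8.0 up to two days late.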