Cognitive Robotics

Cognitive robots are expected to be more autonomous and to work efficiently in human-centric environments. For these robots, open-ended learning for object perception and grasping is a challenging task due to the high demand for accurate and real-time response under changing and unpredictable environmental conditions. In this course, “open-ended” implies that the set of object categories to be learned is not known in advance, and that the training instances are extracted from the robot's online experiences and become available gradually over time, rather than being completely available at the beginning of the learning process. This way, the robot adapts its perception and grasping skills over time to different environments.
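To make the notion of "open-ended" concrete, here is a minimal sketch of a learner whose set of categories is not fixed in advance. It is only an illustration under simplifying assumptions: the class name `OpenEndedLearner`, the methods `teach` and `predict`, and the nearest-class-mean rule are hypothetical choices made for this example and are not the method or framework used in the course.

```python
import math
from collections import defaultdict


class OpenEndedLearner:
    """Nearest-class-mean classifier whose set of categories can grow at any time."""

    def __init__(self):
        self._sums = {}                    # category -> running sum of feature vectors
        self._counts = defaultdict(int)    # category -> number of instances seen

    def teach(self, category, feature):
        """Add one labelled instance; an unknown category is created on the fly."""
        if category not in self._sums:
            self._sums[category] = [0.0] * len(feature)
        self._sums[category] = [s + x for s, x in zip(self._sums[category], feature)]
        self._counts[category] += 1

    def predict(self, feature):
        """Return the category with the closest mean, or None if nothing was taught."""
        best, best_dist = None, math.inf
        for category, total in self._sums.items():
            mean = [s / self._counts[category] for s in total]
            dist = math.dist(mean, feature)
            if dist < best_dist:
                best, best_dist = category, dist
        return best


learner = OpenEndedLearner()
learner.teach("mug", [1.0, 0.0])
learner.teach("bottle", [0.0, 1.0])
print(learner.predict([0.9, 0.2]))   # closest to the "mug" mean
learner.teach("spoon", [5.0, 5.0])   # a new category becomes available later
print(learner.predict([4.5, 5.2]))   # closest to the "spoon" mean
```

The point of the sketch is that instances arrive one at a time and a new category can be introduced at any moment without retraining from scratch.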

This course covers a diverse set of topics that focus on addressing the most critical aspects of building a cognitive robotic system. Students will practice essential research skills, including gaining hands-on experience, debating, reviewing and critiquing papers, and presenting ideas effectively.

*Remark: The course unit prepares students to do their graduation project if they choose to do it in robotics.

Coordinator

Dr. Hamidreza Kasaei

My research interest lies at the intersection of machine learning, robotics, and machine vision, particularly in the areas of open-ended learning, 3D object perception, grasp affordance detection, and object manipulation. My goal is to achieve a breakthrough in robotics by enabling robots to learn from past experiences incrementally and to interact safely with human users. I have been investigating active perception, where robots use their mobility and manipulation capabilities not only to gain useful perceptual information but also to predict the next-best-view for improving the performance of object detection and manipulation. I have evaluated my work on different robotic platforms, including PR2, robotic arms (UR5e, Franka, Kinova), and humanoid robots. I lead the Interactive Robot Learning Laboratory (IRL-Lab), which focuses on Lifelong Interactive Robot Learning in the areas of 3D Object Perception, Grasp Affordance, and Object Manipulation. Visit my webpage and the IRL-Lab page if you want to know more about my research.

I developed the "Cognitive Robotics" course from scratch in 2019 and have been improving its content constantly ever since. This course covers a diverse set of topics that focus on addressing the most critical aspects of building a cognitive robotic system. We recently wrote a survey paper about the state of lifelong learning in service robots that has been published in the Journal of Intelligent & Robotic Systems (available online here). This survey paper covers all the topics of the cognitive robotics course concisely, which makes it useful for review and quick revision. I will try to share my knowledge with you throughout the course!

Teaching Assistants

This year, the course has three TAs to help you during the practical sessions: Andreea, Vlad, and Georgios. We will be happy to assist you to the best of our abilities, so don't hesitate to ask if you have any questions.

Andreea Toca
a.toca@student.rug.nl

Vlad Iftime
v.c.iftime@student.rug.nl

Georgios Tziafas
g.t.tziafas@rug.nl

Learning Objectives

After successful completion of this course, students will be able to:

Explain the meaning of different concepts often used in the fields of 3D object perception, grasping, and human-robot interaction, and their application in robotics.

Explain the main theories of open-ended learning and cognitive robotics.

Exploit deep transfer learning algorithms for open-ended object category learning and recognition.

Implement and experiment with several methods for object grasping.

Create a tight coupling between object perception and manipulation and perform experiments using real Kinect data and a simulated Panda robotic arm.

Gain hands-on experience working on a research project. It is expected that students will also gain the following research skills: analyzing literature related to a particular topic, critiquing papers, and presenting research ideas.

Prerequisites

Prior knowledge of basic linear algebra is recommended, but not required. For programming throughout the course, we mainly use C++/Python-based ROS Melodic. For your final project, you are free to choose MATLAB, Python, or C++ as your coding language.

Class Format

A cognitive robot should process very different types of information on varying time scales. Two different modes of processing, generally labelled System 1 and System 2, are commonly accepted in cognitive psychology. The operations of System 1 (i.e., perception and action) are typically fast, automatic, reactive, and intuitive. The operations of System 2 (i.e., semantic) are slow, deliberative, and analytic. We recently wrote a survey paper about the State of Lifelong Learning in Service Robots. It covers all the topics of the cognitive robotics course concisely, which makes it useful for review and quick revision.

The course is a combination of lectures, reading sessions and robotic lab sessions. The lectures discuss the fundamentals of topics required to develop a cognitive robotic system mainly with the distinctive characteristics of System 1. During the reading sessions, students present and discuss recent contributions in the fields of object perception and manipulation. See detailed contents, coursework and grading policies below.

Lectures

Topics include Introduction to Cognitive Robotics, 3D Object Perception, Object Grasping and Manipulation, Planning, Human Robot Interaction, Open-Ended Learning, Deep Transfer Learning, Evaluations, and Application to Assistive Robots.

Essay Assignment and Reading Sessions (15%)

Robotic Lab Sessions (35%) + Final Project (50%)

Two practical assignments have been devised to provide hands-on experience with the fundamental theories (35%). During the practical sessions, we will use the RACE framework to get more insight into different algorithms. Then, students will work together on a final project (50%).

List of Topics

The preliminary schedule is provided below and is subject to change.

Session	Description	Date
1	Introduction to Cognitive Robotics	20/9
2	Object Grasping and Manipulation I	11/10
3	Object Grasping and Manipulation II	18/10
4	Reading Day	7/10
5	3D Object Detection and Representation	27/9
6	Reading Day	14/10
7	Open-Ended Object Category Learning and Recognition	4/10
8	Reading Day	21/10
9	Natural Human-Robot Interaction	4/10
10	Reading Day	28/10
11	Task/Path/Trajectory Planning	1/11

Coursework and Grading Policies

Throughout the course, students will work partly individually and partly in groups of two on a related research project that they present at the end of the course. The grading policy for this course is based on an essay assignment, practical work, and a final project. Students need to score an average of at least 6.0 and at least 5.5 on each component:

1- Essay assignment (15%):

An essay assignment (i.e., two pages in IEEE conference format) has been designed to ensure that all students read important papers on 3D object perception, affordance detection, and manipulation. All students will get a chance to present their essay in class during the reading days. Each group will present the content of a (set of) relevant paper(s). The presenters need to send the essay for their reading assignment two days before the day of the class. As an example, if you are going to present your work on Wednesday, you need to submit your essay by midnight on Monday.

The grading policy for this assignment is based on a Rubric chart containing the following criteria (available on Nestor):

Content/Delivery/Creativity/Organization/Ability to answer questions/Length of presentation/Overall quality

Possible topics for the essay assignment include, but are not limited to, the following papers. Note that you can use the Connected Papers website to find papers related to a specific one.

Topics	Suggested Papers
Language-guided Grasping	- Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter - Learning 6-DoF Object Poses to Grasp Category-level Objects by Language Instructions - Open-World Object Manipulation using Pre-trained Vision-Language Models
LLMs in Robotics	- Code as Policies: Language Model Programs for Embodied Control - Inner Monologue: Embodied Reasoning through Planning with Language Models - Language to Rewards for Robotic Skill Synthesis - Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition - ProgPrompt: Generating Situated Robot Task Plans using Large Language Models - VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models - Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model - Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners - TidyBot: Personalized Robot Assistance with Large Language Models
Human Robot Interaction	- Reactive Human-to-Robot Handovers of Arbitrary Objects - Objects Affordance Recognition for Natural Language Human-Robot Interaction - Few-Shot Visual Grounding for Natural Human-Robot Interaction - OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields - SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds - Object affordance based multimodal fusion for natural Human-Robot interaction - Semantic Scene Manipulation Based on 3D Spatial Object Relations and Language Instructions - Learning One-Shot Imitation from Humans without Humans - Human-to-Robot Imitation in the Wild
Object Affordance Detection	- Learning to Grasp 3D Objects using Deep Residual U-Nets - Affordance detection of tool parts from geometric features - Robotics Dexterous Grasping: The Methods Based on Point Cloud and Deep Learning (review) - AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection - Object affordance detection with relationship-aware network - One-Shot Object Affordance Detection in the Wild - Robotic pick-and-place of novel objects in clutter with multiaffordance grasping and cross-domain image matching
Shape Completion for Grasping	- Shape Completion Enabled Robotic Grasping* - Beyond Top-Grasps Through Scene Completion - Multi-Modal Geometric Learning for Grasping and Manipulation - RGB-D Local Implicit Function for Depth Completion of Transparent Objects - 3D Shape Estimation of Transparent Objects for Manipulation - Task-driven Perception and Manipulation for Constrained Placement with No Shape Prior - Learning 3D Shape Completion Under Weak Supervision - Diverse Plausible Shape Completions from Ambiguous Depth Images
Object Grasping and Manipulation	- Grasp-Anything: Large-scale Grasp Dataset from Foundation Models - Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes - 6-DOF GraspNet: Variational Grasp Generation for Object Manipulation - MVGrasp: Real-Time Multi-View 3D Object Grasping in Highly Cluttered Environments - Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach - Volumetric Grasping Network: Real-time 6 DOF Grasp Detection in Clutter - High precision grasp pose detection in dense clutter - 6-DOF Grasping for Target-driven Object Manipulation in Clutter - Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics - PointNet++ Grasping: Learning An End-to-end Spatial Grasp Generation Algorithm from Sparse Point Clouds - Challenges and Outlook in Robotic Manipulation of Deformable Objects - Self-Imitation Learning by Planning - Fast geometry-based computation of grasping points on three-dimensional point clouds - SafePicking: Learning Safe Object Extraction via Object-Level Mapping
Kinesthetic Teaching / Learning from Demonstrations	- Learning to Grasp Familiar Objects using Object View Recognition and Template Matching - Learning of grasp selection based on shape-templates - Learning robots to grasp by demonstration - Virtual Kinesthetic Teaching for Bimanual Telemanipulation - Learning Dexterous Grasping with Object-Centric Visual Affordances - Robot Grasping in Clutter: Using a Hierarchy of Supervisors for Learning from Demonstrations - Grasping in the Wild: Learning 6DoF Closed-Loop Grasping From Low-Cost Demonstrations - Learning Interactively to Resolve Ambiguity in Reference Frame Selection - Recent Advances in Robot Learning from Demonstration - A Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations
Task-Informed/Oriented Grasping	- Learning task-oriented grasping for tool manipulation from simulated self-supervision - KETO: Learning Keypoint Representations for Tool Manipulation - OmniHang: Learning to Hang Arbitrary Objects - Learning Task-Oriented Grasping From Human Activity Datasets - Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping - Category-based task specific grasping - Part-based Grasp Planning for Familiar Objects
Dual-Arm Collaborative Manipulation	- Autonomous Dual-Arm Manipulation of Familiar Objects - Autonomous Bimanual Functional Regrasping of Novel Object Class Instances - Grasping Bulky Objects with Two Anthropomorphic Hands - Learning a Decentralized Multi-arm Motion Planner - Learning and Generalisation of Primitives Skills Towards Robust Dual-arm Manipulation - Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from Images - Learning Geometric Reasoning and Control for Long-Horizon Tasks from Visual Input - A Novel Pipeline for Bi-manual Handover Task - Vision-Based Robotic Pushing and Grasping for Stone Sample Collection under Computing Resource Constraints - Dual arm manipulation - A survey - Interactive, Collaborative Robots: Challenges and Opportunities - Fully Autonomous Picking with a Dual-Arm Platform for Intralogistics
Pile Segmentation and Manipulation	- Perceiving, Learning, and Exploiting Object Affordances for Autonomous Pile Manipulation - Interactive singulation of objects from a pile - Learning to Singulate Objects using a Push Proposal Network - Model-Based Grasping of Unknown Objects from a Random Pile - Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning - Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching - A Learning-based Robotic Bin-picking with Flexibly Customizable Grasping Conditions - Probabilistic segmentation and targeted exploration of objects in cluttered environments
Sim2Real Transfer	- Closing the Simulation-to-Reality Gap for Deep Robotic Learning - Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks - Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance? - Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control - Using Synthetic Data and Deep Networks to Recognize Primitive Shapes for Object Grasping - Transferring Grasp Configurations using Active Learning and Local Replanning
Visual Servoing	- Survey on Visual Servoing for Manipulation - Vision-driven Compliant Manipulation for Reliable, High-Precision Assembly Tasks - End-to-End Training of Deep Visuomotor Policies - Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection - Dynamic Grasping with Reachability and Motion Awareness
Object Perception	- Selecting CNN Features for Online Learning of 3D Objects - Multi-View Convolutional Neural Networks - Unseen Object Instance Segmentation for Robotic Environments - Learning to See before Learning to Act: Visual Pre-training for Manipulation - Simultaneous Multi-View Object Grasping and Recognition in Open-Ended Domains - Improving Unimodal Object Recognition with Multimodal Contrastive Learning - Learning 3D Dynamic Scene Representations for Robot Manipulation - Big Transfer (BiT): General Visual Representation Learning - Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier
6D pose estimation	- Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd - Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks - Recovering 6D Object Pose: A Review and Multi-modal Analysis - DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation - Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge - Multi-view 6D object pose estimation and camera motion planning using RGBD images
Continual Deep Learning	- A survey on Deep Learning Advances on Different 3D Data Representations - Replay in Deep Learning: Current Approaches and Missing Biological Elements - Continual Lifelong Learning with Neural Networks: A Review - A continual learning survey: Defying forgetting in classification tasks - Lifelong Learning with Dynamically Expandable Networks - 3D_DEN: Open-ended 3D Object Recognition using Dynamically Expandable Networks - OrthographicNet: A deep learning approach for 3D object recognition in open-ended domains
Assistive Robotics	- Robots for Humanity Using Assistive Robotics to Empower People with Disabilities - Adaptive Robot-Assisted Feeding: An Online Learning Framework for Acquiring Previously-Unseen Food Items - Is More Autonomy Always Better? Exploring Preferences of Users with Mobility Impairments in Robot-assisted Feeding - Robot-Assisted Feeding: Generalizing Skewering Strategies across Food Items on a Plate - Transfer depends on Acquisition: Analyzing Manipulation Strategies for Robotic Feeding - Towards Robotic Feeding: Role of Haptics in Fork-based Food Manipulation - Sensing Shear Forces During Food Manipulation: Resolving the Trade-Off Between Range and Sensitivity
...	...

2- Practical assignment and report (35%):

A practical assignment has been devised to provide hands-on experience with the fundamental theories. During the practical sessions, we will use the PyBullet simulation environment, a Universal Robot (UR5e), and standard datasets to get more insight into different algorithms. In particular, students will work on the project "Coupling between Object Perception and Manipulation: Learning to Grasp Objects in Highly Cluttered Environments".

Students are provided with a code stub and are expected to complete it (see the details below). During this assignment, students are encouraged to think about their final project, with a particular emphasis on how they intend to improve the existing framework for different tasks. If possible, they should also consider the comparisons they wish to incorporate.

Object grasping in isolated scenarios

The main challenge in object grasping in isolated scenarios is the variability and diversity of object shapes, sizes, and orientations, which requires a robotic system to adapt and generalize its grasping strategies effectively. Additionally, dealing with novel or previously unseen objects poses a significant challenge in such scenarios, as the system must make intelligent grasp decisions without prior knowledge of the object's properties.

Object grasping in packed scenarios

In object grasping in packed scenarios, the primary challenge lies in selecting the best grasp configuration to grasp objects from a cluttered and densely packed environment. Additionally, ensuring that the robot can execute precise and collision-free grasps while navigating through tight spaces between objects adds complexity to the task. These challenges make object grasping in packed scenarios a particularly intricate and demanding problem for robotics.

Object grasping in pile scenarios

In object grasping within pile scenarios, the primary challenge is dealing with varying numbers of objects, shapes, sizes, and orientations. The robot must autonomously select individual objects from the pile for grasping while also considering potential occlusions and ensuring stable and collision-free grasps. Moreover, managing the uncertainty associated with predicted grasp pose further complicates the grasping process in pile scenarios.
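The selection problem described for packed and pile scenarios can be sketched as a filter-then-rank step. This is a simplified illustration and not the code stub provided for the assignment: the `GraspCandidate` fields, the `select_grasp` function, and the `min_quality` threshold (a crude stand-in for handling uncertainty in the predicted grasp pose) are all hypothetical names and assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical container for one predicted grasp."""
    name: str
    quality: float       # predicted success score, assumed to lie in [0, 1]
    in_collision: bool   # True if the gripper would hit a neighbouring object


def select_grasp(candidates: Sequence[GraspCandidate],
                 min_quality: float = 0.5) -> Optional[GraspCandidate]:
    """Return the best collision-free grasp, or None if none is confident enough."""
    feasible = [c for c in candidates
                if not c.in_collision and c.quality >= min_quality]
    return max(feasible, key=lambda c: c.quality, default=None)


candidates = [
    GraspCandidate("top", 0.92, True),    # best score, but blocked by a neighbour
    GraspCandidate("side", 0.81, False),
    GraspCandidate("edge", 0.40, False),  # collision-free, but too uncertain
]
best = select_grasp(candidates)
print(best.name if best else "no feasible grasp")  # prints "side"
```

In a real system the collision flag and the quality score would come from a collision checker and a grasp prediction network; here they are given directly to keep the example self-contained.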

2-1: Assignment description & Code

The initial code and detailed instructions for the assignment can be found by following these links:

Instruction (PDF)

GitHub

2-2: Report & Grading Policy

At the end of the practical assignment a report (i.e., up to four pages IEEE conference format), including all the figures, tables, and references, has to be delivered. These practical assignments prepare students to do the final course project. Please read the detailed description of each project below.

The grading policy for this assignment is based on a Rubric chart containing the following criteria (will be available on BrightSpace):

Content/Delivery/Creativity and presentation/Organization/Neatness and attractiveness/Overall quality

3- Final project and report (50%):

For the final project, students will work in groups of four on either a specific research project (up to 25% of the material can be borrowed from the previous assignments) or an in-depth literature survey (~20-30 relevant papers, organized by different features, identifying gaps in the state of the art). Finally, each group will write a report with the structure of a scientific conference paper (i.e., 6+n pages in IEEE conference format). The report material (including text, figures, tables, acknowledgment, etc.) must fit into 6 pages, while there is no page limit for the appendix/references (n pages) sections. Reports exceeding the (6+n) page limit will not be evaluated. The last two sessions are dedicated to the final project presentations. Each group has to send a short description of the project (up to 2 paragraphs) to Hamidreza Kasaei (cognitiverobotic@gmail.com) to get feedback about the proposed project. This document should state:

The title of the project, student names, and student numbers,
The goal of the to-be-developed system,
How you will compare your approach with other state-of-the-art approaches, if possible.

Detailed information about the content of the report and the deadline will be posted later. Here is a short list of possible projects. You may want to check out my list of projects and publications (see https://www.ai.rug.nl/irl-lab/publications.html). You should select a topic from the list below and develop/test your idea using the provided PyBullet framework, or come up with another project idea on your own (the topic should be relevant to the content of the course). If you prefer to work on a C++-based framework, please contact Hamidreza Kasaei (course coordinator).

Active Object Recognition	Shape Completion for Grasping	Pile Segmentation and Manipulation
Affordance Detection	6D pose estimation	Active Segmentation of Cluttered Scenes
Grasping and Manipulation	Dual-Arm Manipulation	Task-Informed/Oriented Grasping
Human-Robot Interaction	Deep Transfer Learning	Eye-in-Hand and Eye-to-Hand System
Open-Ended learning	RL for Manipulation	Path Planning in Manipulation

The grading policy for this assignment is based on a Rubric chart containing the following criteria (will be available on Nestor):

Content/Delivery/Creativity and presentation/Organization/Neatness and attractiveness/Overall quality
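The weighting of the three components (15%, 35%, 50%) and the pass rule stated at the start of this section (an average of at least 6.0 and at least 5.5 on each component) can be sketched as a short calculation. The function and dictionary names below are hypothetical, and the sketch assumes that no intermediate rounding is applied, which the course description does not specify.

```python
# Component weights in percent, taken from the course description.
WEIGHTS = {"essay": 15, "practical": 35, "project": 50}
MIN_COMPONENT = 5.5   # minimum mark required on each component
MIN_AVERAGE = 6.0     # minimum weighted average required to pass


def final_grade(marks):
    """Return (weighted average, passed) for a dict of component marks."""
    average = sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100
    passed = (average >= MIN_AVERAGE
              and all(marks[name] >= MIN_COMPONENT for name in WEIGHTS))
    return average, passed


# A student with a strong project but a weak essay.
average, passed = final_grade({"essay": 5.0, "practical": 8.0, "project": 8.0})
print(f"average = {average:.2f}, passed = {passed}")
```

In this example the weighted average is above 6.0, but the student still does not pass because the essay mark is below 5.5.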

Remarks

Late submissions lose 25%, 50%, and 100% of the initial mark on the first, second, and third day after the deadline, respectively, so a submission made on the third day after the deadline receives 0 marks. You may submit your first practical assignment late (up to two days) with no penalty; this applies ONLY to your first practical assignment.
All the page limits exclude references.
Feel free to collaborate on solving the problem but write your code individually. In particular, do not copy code from other students.
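The late-submission rule in the first remark can be written as a small calculation. This is one possible reading rather than an official policy implementation: the function names are hypothetical, a partial day is assumed to count as a full day, submissions later than the third day are assumed to lose the full mark as well, and a first practical assignment submitted after its two-day grace period is assumed to fall back to the normal schedule.

```python
import math


def late_penalty(days_late, first_practical=False):
    """Fraction of the initial mark lost for a submission made `days_late` days late."""
    days = math.ceil(days_late)   # assumption: a started day counts as a full day
    if days <= 0:
        return 0.0
    if first_practical and days <= 2:
        return 0.0                # grace period stated for the first practical only
    schedule = {1: 0.25, 2: 0.50}
    return schedule.get(days, 1.0)   # assumption: third day or later loses everything


def mark_after_penalty(initial_mark, days_late, first_practical=False):
    """Apply the late penalty to the mark the submission would otherwise receive."""
    return initial_mark * (1.0 - late_penalty(days_late, first_practical))


for days in range(4):
    print(days, mark_after_penalty(8.0, days))
```

For an initial mark of 8.0, the loop shows the mark dropping to 6.0 after one day, 4.0 after two days, and 0.0 on the third day.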