Kiro Building Note 6 (Problem Formulation)

Full Self Page Turning (Non-rigid Object Manipulation)

Problem: Automatically turn pages of a physical book from start to end without human intervention.

Current Hardware Structure

  1. Overhead Camera
    1. Objective: Observe the entire page-turning process, detect current page and arm positions.
    2. Behavior: Continuously records video (640x480 resolution) at 20fps.
  2. LED
    1. Objective: Provide consistent lighting for OCR and state recognition.
    2. Behavior: Turns on or off based on the ambient brightness.
  3. Lift Arm
    1. Objective: Lift only the next page.
    2. Behavior: Rotates around the -y axis, from position vector (1, 0, 0) to (0, 0, 1), anchoring on the book spine.
  4. Turn Arm
    1. Objective: Turn the lifted page across the book.
    2. Behavior: Rotates around the z axis, from position vector (1, 0, 0) to (-1, 0, 0).
  5. Hold Arm
    1. Objective: Prevent previous or turned pages from turning back.
    2. Behavior: Rotates around the -y axis, from position vector (0, 0, 1) to (-1, 0, 0), anchoring on the book spine.
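
The arm trajectories above can be sanity-checked numerically. The following is a minimal sketch, assuming a right-handed coordinate frame, right-hand-rule rotations, and the shortest sweep between the stated endpoints (90 degrees for the lift and hold arms, 180 degrees for the turn arm). The sweep angles and the names `rotate`, `ARM_SWEEPS`, and `sweep_matches` are illustrative assumptions, not part of the hardware specification.

```python
import math

def rotate(v, axis, angle_deg):
    """Rotate vector v about a unit-length axis by angle_deg (Rodrigues' formula)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    kx, ky, kz = axis
    vx, vy, vz = v
    # Cross product k x v and dot product k . v
    cx, cy, cz = ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx
    d = kx * vx + ky * vy + kz * vz
    return (
        vx * c + cx * s + kx * d * (1 - c),
        vy * c + cy * s + ky * d * (1 - c),
        vz * c + cz * s + kz * d * (1 - c),
    )

# arm name -> (rotation axis, start vector, end vector, assumed sweep in degrees)
ARM_SWEEPS = {
    "lift": ((0, -1, 0), (1, 0, 0), (0, 0, 1), 90),
    "turn": ((0, 0, 1), (1, 0, 0), (-1, 0, 0), 180),
    "hold": ((0, -1, 0), (0, 0, 1), (-1, 0, 0), 90),
}

def sweep_matches(name, tol=1e-9):
    """Return True if rotating the start vector by the assumed sweep reaches the end vector."""
    axis, start, end, angle = ARM_SWEEPS[name]
    got = rotate(start, axis, angle)
    return all(abs(g - e) <= tol for g, e in zip(got, end))
```

Calling `sweep_matches("lift")`, `sweep_matches("turn")`, and `sweep_matches("hold")` confirms that each stated axis and endpoint pair is mutually consistent under these assumptions.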

System Definition

Current

  • Observation: Overhead Camera video stream (640x480)
  • State:
    • Normal (4):
      • idle: The system is waiting before turning a page.
      • lifted: A single page is successfully lifted and is attached to the lift arm.
      • turned: The page is being turned from right to left by the turn arm.
      • ready_to_hold: The page is ready to be held by the hold arm.
    • Abnormal (6): critical❗ (require_human_intervention), warning ⚠️ (need_to_monitor), check 🔄 (auto_retry)
      • page_stuck_on_lift (critical❗): The page is strongly adhered to the lift arm and cannot be released. A human must manually detach the page to continue.
      • arm_misaligned (critical❗): The arm's movement deviates from the intended trajectory. A human must correct the mechanical alignment.
      • lack_of_adhesion (critical❗): The arm failed to lift the page after multiple attempts due to insufficient adhesion. A human must replace or adjust the gripping mechanism (e.g., arm tip or suction).
      • multi_page_lifted (check 🔄): More than one page was lifted.
      • page_slipped_during_turn (check 🔄): The page slipped away from the turn arm during motion.
      • page_slipped_during_hold (check 🔄): The page was successfully turned but failed to stay held by the hold arm.
  • Action: Control angle values for 3 arms (lift, turn, hold) or Request Human Intervention
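
The state vocabulary listed in this section can be written down as a small data structure. This is a minimal sketch, assuming Python enums are an acceptable encoding. The class names `Severity` and `PageState` and the helpers `is_normal` and `needs_human` are hypothetical. The state names and severity levels are taken from the list in this section.

```python
from enum import Enum

class Severity(Enum):
    CRITICAL = "require_human_intervention"
    WARNING = "need_to_monitor"   # defined in the note, not yet assigned to any state
    CHECK = "auto_retry"

class PageState(Enum):
    # Normal states
    IDLE = "idle"
    LIFTED = "lifted"
    TURNED = "turned"
    READY_TO_HOLD = "ready_to_hold"
    # Abnormal states
    PAGE_STUCK_ON_LIFT = "page_stuck_on_lift"
    ARM_MISALIGNED = "arm_misaligned"
    LACK_OF_ADHESION = "lack_of_adhesion"
    MULTI_PAGE_LIFTED = "multi_page_lifted"
    PAGE_SLIPPED_DURING_TURN = "page_slipped_during_turn"
    PAGE_SLIPPED_DURING_HOLD = "page_slipped_during_hold"

# Severity of each abnormal state, as listed in the note
ABNORMAL_SEVERITY = {
    PageState.PAGE_STUCK_ON_LIFT: Severity.CRITICAL,
    PageState.ARM_MISALIGNED: Severity.CRITICAL,
    PageState.LACK_OF_ADHESION: Severity.CRITICAL,
    PageState.MULTI_PAGE_LIFTED: Severity.CHECK,
    PageState.PAGE_SLIPPED_DURING_TURN: Severity.CHECK,
    PageState.PAGE_SLIPPED_DURING_HOLD: Severity.CHECK,
}

def is_normal(state: PageState) -> bool:
    """A state is normal if it has no severity attached."""
    return state not in ABNORMAL_SEVERITY

def needs_human(state: PageState) -> bool:
    """Only critical states require human intervention."""
    return ABNORMAL_SEVERITY.get(state) is Severity.CRITICAL
```

Keeping the severity in a separate mapping means a state can be reclassified (for example from critical to check) without touching the state names a classifier is trained on.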

Future

  • Observation: Add secondary camera (front view), tactile sensor, IR distance sensor, etc.
  • State: More fine-grained
  • Action: Fully autonomous - DO NOT request human intervention

Feasible Solution

  1. Observation → State → Action
    1. Observation → State (Perception)
      1. Algorithm (Software 1.0): Edge Detection, Contour Matching, Motion Tracking, etc.
      2. ML (Software 2.0): Page State Classification
      3. Algorithm + ML (Software 1.0 + 2.0): Heuristics + ML
      4. Large Model (Software 3.0)
    2. State → Action (Policy)
      1. Algorithm: Rule-based actuator control depending on the state
  2. Observation → Action (End-to-end)
    1. Vision-based direct control with imitation learning or reinforcement learning
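
The two decompositions above expose the same outer interface: a function from a camera frame to an action. The sketch below illustrates this, assuming a frame is a grid of pixel values and an action is a mapping from arm name to target angle. The type aliases, `two_stage`, and both stub functions are hypothetical placeholders, not real perception or control code.

```python
from typing import Callable, Dict, List

Frame = List[List[int]]     # hypothetical: grayscale image as rows of pixel values
State = str                 # e.g. "idle", "lifted"
Action = Dict[str, float]   # hypothetical: arm name -> target angle in degrees

def two_stage(
    perceive: Callable[[Frame], State],
    policy: Callable[[State], Action],
) -> Callable[[Frame], Action]:
    """Compose perception and policy into a single frame-to-action controller."""
    def controller(frame: Frame) -> Action:
        return policy(perceive(frame))
    return controller

def stub_perceive(frame: Frame) -> State:
    # Placeholder: a real implementation would classify the frame.
    return "idle"

def stub_policy(state: State) -> Action:
    # Placeholder rule: from idle, command the lift arm to an assumed 90 degrees.
    return {"lift": 90.0} if state == "idle" else {}

controller = two_stage(stub_perceive, stub_policy)
```

Because an end-to-end model would also be a `Callable[[Frame], Action]`, it could later replace `controller` without changing the code that drives the arms.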

v1.0 Solution

Autonomy is more important than speed or cost.

Architecture

Observation → State: Use Large Vision Model

State → Action: Algorithmic Control
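
The State → Action step can be sketched as a lookup table plus a retry counter. This is a minimal illustration, not the project's controller: the command strings, the `MAX_RETRIES` value, the choice of which arm to re-run for each retryable state, and the mapping of `turned` to a continued turn are all assumptions. The state names come from the System Definition.

```python
MAX_RETRIES = 3  # assumption: the note does not specify a retry limit

# Critical states always escalate to a human.
CRITICAL = {"page_stuck_on_lift", "arm_misaligned", "lack_of_adhesion"}

# Retryable (check) states -> assumed command to repeat
RETRY_COMMAND = {
    "multi_page_lifted": "lift",
    "page_slipped_during_turn": "turn",
    "page_slipped_during_hold": "hold",
}

# Normal states -> assumed next command in the page-turning cycle
NEXT_COMMAND = {
    "idle": "lift",
    "lifted": "turn",
    "turned": "turn",          # assumption: keep sweeping until ready_to_hold
    "ready_to_hold": "hold",
}

class RuleBasedPolicy:
    """Maps a recognized state to a command string.

    Simplification: counts consecutive retryable failures,
    and any normal state resets the counter.
    """

    def __init__(self, max_retries: int = MAX_RETRIES):
        self.max_retries = max_retries
        self.retries = 0

    def act(self, state: str) -> str:
        if state in CRITICAL:
            return "request_human"
        if state in RETRY_COMMAND:
            self.retries += 1
            if self.retries > self.max_retries:
                return "request_human"
            return RETRY_COMMAND[state]
        if state not in NEXT_COMMAND:
            raise ValueError(f"unknown state: {state}")
        self.retries = 0
        return NEXT_COMMAND[state]
```

For example, with `max_retries=2`, a policy that sees `multi_page_lifted` three times in a row retries the lift twice and then returns `"request_human"`.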

Plan after v1.0

  1. Observation → State
    1. Replace large model with lighter ML classification model for effective, real-time classification
    2. Curate custom dataset from Kiro’s teleoperation videos
  2. Observation → Action (End-to-end)
    1. Imitation Learning or Reinforcement Learning