DSCC 201/401
Tools and Infrastructure for Data Science
February 15, 2021
Using BlueHive
Logging on to BlueHive
• You must be connected to the UR_Connected wireless network or using the VPN client (see http://tech.rochester.edu/remote-access-vpn-tutorials/ for help with the VPN)
• Detailed instructions and links on how to connect are at:
http://info.circ.rochester.edu
• 2 Ways to Connect:
• Log in with FastX for a Graphical User Interface (GUI):
https://bluehive.circ.rochester.edu
• Connect through Terminal (Mac) or PuTTY (Windows) (No GUI)
Running a Program on BlueHive
• Running jobs interactively with FastX (GUI)
• Default FastX partition
• Custom resources
• Running jobs interactively with a terminal session (no GUI)
• Login node
• Compute node
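BlueHive Resource Allocation
[Diagram: ssh opens a terminal session on bluehive; a web browser connects to the FastX server. The default FastX session runs on a FastX node (1 core, 2 GB). An "interactive" request from either path goes through the Slurm server to a compute node with the requested resources.]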
FastX
• Graphical connection to https://bluehive.circ.rochester.edu
• Log in with NetID username and password
• DUO authentication
• Default Session
• 1 CPU
• 2 GB RAM
• “Unlimited” time
• Interactive Session
• User selects resources
• Up to 12 hours
Terminal Session
• Connect to bluehive.circ.rochester.edu with a terminal application (e.g. Terminal on Mac OS X or PuTTY on Windows)
• Log in with NetID username and password
• DUO authentication
• You land on a shared login node (do not run calculations on this node)
• Interactive Session
• User selects resources
interactive -p interactive -t 1:00:00 -c 1 --mem-per-cpu=2GB
• Up to 12 hours
• Note: Interactive session with a terminal is also available from FastX
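Because calculations should not run on the shared login node, it can help to check where a shell is running before starting work. The following is a minimal sketch, assuming that Slurm exports the SLURM_JOB_ID environment variable inside an allocated session (usual Slurm behavior, but not confirmed here for the BlueHive interactive command). The function name where_am_i is hypothetical.

```shell
#!/bin/bash
# Report whether this shell appears to be running inside a Slurm job.
# Assumption: Slurm sets SLURM_JOB_ID inside an allocated session.
where_am_i() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "compute node (Slurm job ${SLURM_JOB_ID})"
    else
        echo "no Slurm job detected (possibly the login node)"
    fi
}

where_am_i
```

If the second message appears, request resources with the interactive command before running anything heavy.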
info.circ.rochester.edu
Introduction to Linux and Bash
What is Linux?
• The Unix operating system was developed at Bell Labs (AT&T) starting in 1969 and was rewritten in the C programming language in the early 1970s. The close pairing with C made software easy to develop and made the system portable across hardware.
• Unix grew to provide a multiuser, shared environment with user applications, libraries, etc.
• Its philosophy follows “everything is a file” as much as possible.
• Linux was first released in 1991 by Linus Torvalds.
• Linux is a Unix-like kernel: it follows the design of Unix but was written independently.
• A kernel is software that controls a computer's hardware components and the data transfers between them. All input/output actions (system and user) are handled by the kernel.
• The kernel handles common tasks such as running processes on CPUs; managing data I/O to RAM, storage, network interfaces; and program control and interrupts.
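The “everything is a file” idea and the role of the kernel can be seen from any shell. This is a small sketch, assuming a Linux system where the kernel exposes its state under /proc (the check is skipped on systems without it). The variable names are illustrative.

```shell
#!/bin/bash
# Ask the running kernel for its name and release.
kernel_name=$(uname -s)
kernel_release=$(uname -r)
echo "Kernel: ${kernel_name} ${kernel_release}"

# On Linux, kernel state is exposed as ordinary readable files under /proc.
if [ -r /proc/meminfo ]; then
    head -n 1 /proc/meminfo    # first line reports total memory (MemTotal)
fi
```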
Linux Kernel
[Diagram: on a server, user space sits on top of the Linux kernel, which sits between it and the hardware: CPU, RAM, storage, and network.]
What Does the User Space Look Like?
• A shell exists in user space and can provide an interface to the kernel
• The shell allows users to access programs and data
• A shell is usually provided by the operating system software (e.g. Bourne shell and C shell)
• We will examine Bash (Bourne Again Shell)
• A shell is sometimes referred to more generally as a Command Line Interface (CLI), in contrast with a Graphical User Interface (GUI)
• CLI advantages:
• Excellent for sophisticated actions on files
• Easy to automate
• CLI disadvantages:
• Steep learning curve
• Can easily destroy files (and system) if not careful
Linux File System Hierarchy Tree Structure
Common Directory Spaces on BlueHive
• /home – Home directories; Every user has a home directory (25 GB quota); Backed up nightly
• /scratch – Scratch directories; Every user has a scratch directory (200 GB quota); Not backed up!
• /public – Directories to share data with other users (easily)
• /software – Location of all user application software on BlueHive
• /archive – Archival storage for research groups and users
Navigating the Tree – Linux File Commands
• Listing files: ls
• Help on a command: man
• Print working directory: pwd
• Change directory: cd
• Remove a file: rm
• Copy a file: cp
• Make a directory: mkdir
• Remove a directory: rmdir
• Shortcuts: .. (parent directory), . (current directory), ~ (home directory), and - (previous directory, as in cd -)
• BlueHive: /home, /scratch, and /public
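The commands above can be tried together in a short session. This is a sketch that works in a throwaway directory created with mktemp, so no real files are touched. The directory and file names (project, notes.txt, backup.txt) are made up for illustration.

```shell
#!/bin/bash
workdir=$(mktemp -d)         # throwaway directory so no real files are touched
cd "$workdir"
pwd                          # print the working directory

mkdir project                # make a directory
cd project                   # change into it
echo "hello" > notes.txt     # create a small file
cp notes.txt backup.txt      # copy a file
ls                           # lists backup.txt and notes.txt

rm notes.txt                 # remove a file
cd ..                        # .. is the parent directory
cd -                         # - returns to the previous directory (project)
remaining=$(ls)              # only backup.txt is left
rm backup.txt
cd ..
rmdir project                # rmdir removes only empty directories
echo ~                       # ~ expands to the home directory
cd /
rmdir "$workdir"             # clean up
```

For details on any of these commands, man followed by the command name (for example, man cp) shows its manual page.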
Working with Files
• Wildcards: *, ?
• Viewing files: cat, more, less
• Beginning and end of files: head, tail
• Counting: wc
• Searching for text and files: grep, find
• Pipes and redirection: |, >
• Sorting: sort
• Useful commands: date, who, history
• File editors: nano, vi, vim, and emacs
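A short session shows how several of these commands combine. This is a sketch using a made-up sample file (fruit.txt) in a temporary directory. The file contents and variable names are illustrative only.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

# Create a small sample file with output redirection (>).
printf 'pear\napple\nfig\napple pie\nbanana\n' > fruit.txt
cp fruit.txt fruit_copy.txt

ls *.txt                     # * matches any run of characters
ls frui?.txt                 # ? matches exactly one character
find . -name 'fruit*'        # search for files by name
head -n 2 fruit.txt          # first two lines
tail -n 1 fruit.txt          # last line

line_count=$(wc -l < fruit.txt)           # count lines
apple_count=$(grep -c apple fruit.txt)    # count lines containing "apple"

# A pipe (|) feeds one command's output to the next; > saves the result.
grep apple fruit.txt | sort > apples_sorted.txt
first_sorted=$(head -n 1 apples_sorted.txt)

cd / && rm -r "$workdir"     # clean up
```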
Linux Process Control and Limits
• What is currently running?: ps
• What is running in the background?: jobs
• Interrupting a command: ^C (Ctrl-C)
• Current processes running on system: top
• Current disk space and limits: quota
• Current space used in directory: du
• Current memory usage: free
• Information on CPUs: cat /proc/cpuinfo
• Standard output redirection: >
• Standard error redirection: 2>
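The difference between standard output and standard error is easiest to see with a command that produces both. This is a sketch in a temporary directory. The file names are made up, and missing.txt intentionally does not exist.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"
echo "data" > present.txt

# One command, two streams: the listing goes to standard output and the
# complaint about the missing file goes to standard error.
ls present.txt missing.txt > out.txt 2> err.txt

listed=$(cat out.txt)              # the normal output: present.txt
err_size=$(wc -c < err.txt)        # the error message landed here instead

sleep 30 &                         # start a command in the background
bg_pid=$!                          # remember its process ID
jobs                               # list this shell's background jobs
kill "$bg_pid"                     # stop it (^C only interrupts the foreground)
wait "$bg_pid" 2>/dev/null         # collect the finished job quietly

du -sh .                           # space used by the current directory
cd / && rm -r "$workdir"           # clean up
```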
Remotely Copying Files
• Secure copy local file to remote with scp
scp localfile netid@bluehive.circ.rochester.edu:
• Secure copy remote file to local file with scp
scp netid@bluehive.circ.rochester.edu:remotefile .