Frequently Asked Questions
Accounts
How do I get an account?
See the CCI projects article for information on creating a project and associated accounts.
All forms for new accounts should be sent only to cci-support[at]rpi.edu.
How do I change my password?
Send an email with the subject "Password Reset" to cci-support[at]rpi.edu.
NOTE: You must change your password before you can log into the landing pads.
What does the username format mean?
CCI usernames have the format PROJuser, where 'PROJ' is the project name and 'user' is your individual user principal.
A person involved in multiple projects is given a different username for each project, but the principal remains the same.
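As a minimal illustration of the naming scheme, the Python sketch below composes usernames from a project name and a principal. The second project name, ABCD, is hypothetical and is used only to show that the principal stays fixed while the username changes per project.

```python
def cci_username(project: str, principal: str) -> str:
    """Compose a username in the PROJuser format: project name, then user principal."""
    return f"{project}{principal}"

# The same principal ("user") in two projects; "ABCD" is a hypothetical project name.
for project in ("PROJ", "ABCD"):
    print(cci_username(project, "user"))  # PROJuser, then ABCDuser
```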
Usage
How do I access the systems?
Use ssh to connect to a landing pad. Then, from the landing pad, connect to a cluster front-end such as dcsfen01 to access the DCS Supercomputer (AiMOS).
Please check our List of Available Systems for more information.
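The two hops can also be combined into one command with OpenSSH's -J (ProxyJump) option. The Python sketch below only builds that command line; it assumes the landing pads permit this kind of forwarding, which this FAQ does not state. If they do not, log in to a landing pad first and run ssh dcsfen01 from there. PROJuser stands for your own username.

```python
import shlex

LANDING_PAD = "blp01.ccni.rpi.edu"  # any of blp01-blp04
FRONT_END = "dcsfen01"              # DCS Supercomputer (AiMOS) front-end

def two_hop_ssh_command(username: str, landing_pad: str = LANDING_PAD,
                        front_end: str = FRONT_END) -> list[str]:
    """Build an ssh command that reaches `front_end` by jumping through `landing_pad`.

    With -J, ssh first logs in to the landing pad and then opens the onward
    connection from there, so the front-end name is resolved on the landing pad.
    """
    return ["ssh", "-J", f"{username}@{landing_pad}", f"{username}@{front_end}"]

cmd = two_hop_ssh_command("PROJuser")  # replace PROJuser with your username
print(shlex.join(cmd))  # ssh -J PROJuser@blp01.ccni.rpi.edu PROJuser@dcsfen01
```

Passing cmd to subprocess.run() would open the session; authentication prompts appear as they would for a normal ssh login.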
Why can't I connect to a landing pad? / Why do I receive this error when connecting to a landing pad: Operation timed out?
You must use two-factor authentication and connect to one of the landing pads, blp01 through blp04.
The following landing pads are available for two-factor SSH connections:
blp01.ccni.rpi.edu
blp02.ccni.rpi.edu
blp03.ccni.rpi.edu
blp04.ccni.rpi.edu
(More information is available in the landing pads article.)
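When a connection times out, it can help to check whether the SSH port on each landing pad accepts TCP connections from where you are. The sketch below assumes SSH listens on the standard port 22, which this FAQ does not state. A successful check only shows that the network path is open; it says nothing about two-factor authentication.

```python
import socket

# The four landing pads listed above.
LANDING_PADS = [f"blp0{i}.ccni.rpi.edu" for i in range(1, 5)]

def port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # timeouts, refused connections and DNS failures are all OSErrors
        return False

def check_landing_pads(port: int = 22) -> dict[str, bool]:
    """Probe every landing pad and print whether its SSH port answered."""
    results = {pad: port_open(pad, port) for pad in LANDING_PADS}
    for pad, ok in results.items():
        print(f"{pad}: {'reachable' if ok else 'no answer (timed out or refused)'}")
    return results
```

Call check_landing_pads() from the machine you are trying to connect from.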
Why can't I download XYZ from the Internet?
General outbound access to other resources or sites on the Internet is prohibited. Some common remote repositories are available via a proxy.
If you still cannot download your files, send an email to cci-support@rpi.edu with a brief explanation of what you would like to have allowed.
How do I get my data into or out of CCI?
Use scp to transfer data to the landing pads. The GPFS filesystem is mounted on the landing pads in the same way as on the front-end and compute nodes, so files transferred to a landing pad are also available on all cluster nodes.
See this page for large file transfers.
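The sketch below shows the shape of an scp upload to a landing pad as a Python argument list. The username, file name, and choice of blp01 are placeholders; an empty remote path after the colon tells scp to use your home directory on the remote side. Swapping the source and destination arguments copies data back out.

```python
import shlex

def scp_upload_command(local_path: str, username: str, remote_path: str = "",
                       landing_pad: str = "blp01.ccni.rpi.edu",
                       recursive: bool = False) -> list[str]:
    """Build an scp command that copies `local_path` to a landing pad.

    An empty `remote_path` means the home directory on the remote side.
    """
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")  # needed when copying a directory
    cmd += [local_path, f"{username}@{landing_pad}:{remote_path}"]
    return cmd

# Hypothetical file and username:
print(shlex.join(scp_upload_command("results.tar.gz", "PROJuser")))
# scp results.tar.gz PROJuser@blp01.ccni.rpi.edu:
```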
File System
How do I check my GPFS quota usage?
Running df -h . in a directory displays usage based on the project quota.
All projects/users have 10G home directories. Home directory space is shared among all users in a project, and this quota cannot be changed.
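Python's shutil.disk_usage() reads the same statvfs information that df uses, so the sketch below should report roughly the same numbers as df -h . when run in a project directory. The formatting helper only approximates df's -h output; it is not an exact match.

```python
import shutil

def human(nbytes: float) -> str:
    """Format a byte count with 1024-based units, similar in spirit to `df -h`."""
    for unit in ("B", "K", "M", "G", "T"):
        if nbytes < 1024:
            return f"{nbytes:.1f}{unit}"
        nbytes /= 1024
    return f"{nbytes:.1f}P"

def quota_usage(path: str = ".") -> str:
    """Summarize size, used and available space for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    percent = 100 * usage.used / usage.total if usage.total else 0.0
    return (f"Size {human(usage.total)}  Used {human(usage.used)}  "
            f"Avail {human(usage.free)}  Use% {percent:.0f}%")

print(quota_usage("."))  # run from inside the directory you want to check
```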
How do I increase my quota?
A request to increase the barn space quota must be sent to support by the project PI/sponsor or, for a non-RPI project, by the organization manager. The request should explain why the quota-free scratch space is insufficient for your needs.
Scheduling jobs / Slurm
What is the time limit for running jobs?
The default wall time limit is six hours. Some clusters support running longer jobs in certain circumstances; see our Available QoSs page for more information.
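Slurm time limits are written as hours:minutes:seconds, or days-hours:minutes:seconds for longer limits. The small helper below, a sketch rather than anything CCI provides, converts a duration in seconds into that form, for example the six-hour default as it would appear on an #SBATCH --time line.

```python
def slurm_time(seconds: int) -> str:
    """Format a duration as a Slurm time limit: HH:MM:SS, or D-HH:MM:SS from one day up."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    if days:
        return f"{days}-{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

DEFAULT_LIMIT = slurm_time(6 * 3600)  # the six-hour default wall time
print(f"#SBATCH --time={DEFAULT_LIMIT}")  # #SBATCH --time=06:00:00
```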
There are free nodes; why isn't my job running? / Why are my jobs waiting in the queue for a long time?
Jobs are automatically prioritized based on a number of parameters such as job size, project usage, project classification, and time in queue.
The queue is not a simple 'first in, first out' queue. Jobs may be inserted into the middle of the queue based on their initial priority, and may even move toward the end of the queue if a project's usage increases significantly while the job is pending. A job begins once it reaches the front of the queue.
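To make the idea of a priority-ordered queue concrete, the sketch below computes a toy priority as a weighted sum of factors between 0 and 1, in the style of Slurm's multifactor priority plugin, using the classic fair-share formula 2^(-usage/shares). The weights, the seven-day age cap, and the mapping of project classification onto a QOS-like factor are all hypothetical; the real configuration on CCI systems may differ.

```python
from dataclasses import dataclass

# Hypothetical weights and age cap; real values are site configuration.
WEIGHTS = {"age": 1000, "fairshare": 10000, "size": 500, "qos": 2000}
MAX_AGE_HOURS = 7 * 24  # the age factor stops growing after this long

@dataclass
class PendingJob:
    hours_waiting: float
    project_usage: float   # project's normalized recent usage, 0..1
    project_shares: float  # project's normalized share of the machine, > 0
    size_fraction: float   # requested nodes / total nodes, 0..1
    qos_factor: float      # stand-in for project classification, 0..1

def priority(job: PendingJob) -> float:
    """Toy weighted-sum priority in the style of Slurm's multifactor plugin."""
    age = min(job.hours_waiting / MAX_AGE_HOURS, 1.0)           # grows while pending
    fairshare = 2 ** (-job.project_usage / job.project_shares)  # falls as usage rises
    return (WEIGHTS["age"] * age
            + WEIGHTS["fairshare"] * fairshare
            + WEIGHTS["size"] * job.size_fraction
            + WEIGHTS["qos"] * job.qos_factor)

base = dict(project_usage=0.2, project_shares=0.25, size_fraction=0.1, qos_factor=0.5)
fresh = PendingJob(hours_waiting=0, **base)
aged = PendingJob(hours_waiting=48, **base)
busy = PendingJob(hours_waiting=48, **{**base, "project_usage": 0.6})
print(priority(fresh) < priority(aged))  # True: waiting raises priority
print(priority(busy) < priority(aged))   # True: heavier project usage lowers it
```

The second comparison mirrors the behavior described above: a pending job can slip back in the queue when its project's usage grows, even though its own wait time keeps increasing.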
When a large job that requires many nodes is in the queue, the scheduler holds nodes free as other jobs complete, in anticipation of the large job. Smaller, shorter jobs will fill in these nodes (backfill) if they have priority and do not delay the large job's expected start time. The scheduler may also hold nodes free in anticipation of a maintenance outage.
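Backfill can be sketched with the simplified EASY-backfill rule: a lower-priority job may start now if it fits on the idle nodes and either finishes before the large job's expected start (its shadow time) or uses only nodes the large job will not need. This is an illustration under those assumptions; Slurm's backfill scheduler is more elaborate. It does suggest why a realistic, shorter time limit can help a small job start sooner.

```python
def can_backfill(nodes: int, walltime_h: float, free_nodes: int,
                 shadow_time_h: float, extra_nodes: int) -> bool:
    """Simplified EASY-backfill test for one lower-priority job.

    free_nodes:    nodes idle right now
    shadow_time_h: hours until the reserved large job is expected to start
    extra_nodes:   idle nodes the large job will NOT need when it starts
    """
    if nodes > free_nodes:
        return False                # does not fit right now
    if walltime_h <= shadow_time_h:
        return True                 # finishes before the large job starts
    return nodes <= extra_nodes     # runs past that point, but only on spare nodes

# 8 nodes are idle; a large job is expected to start in 3 hours and will
# need 6 of them, leaving 2 spare.
print(can_backfill(4, 2.0, 8, 3.0, 2))  # True: short enough
print(can_backfill(4, 5.0, 8, 3.0, 2))  # False: would delay the large job
print(can_backfill(2, 5.0, 8, 3.0, 2))  # True: fits on the spare nodes
```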
It is also possible that nodes are held by a special reservation for maintenance or upgrade reasons.
My job has a different state than normal/isn't starting or running/has the wrong parameters. Should I cancel and resubmit it?
It is generally better not to. A job's accumulated wait time is factored into its priority, so the longer it waits, the more likely it is to run. If you cancel and resubmit, this advantage is lost. If there is a real system issue, removing the job also makes it harder to determine what the problem is.
It is possible to modify a job's properties after submission; see the scontrol update command for more information.
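As a sketch, the helper below builds an scontrol update command line for a job. The job ID is hypothetical. Slurm normally lets users lower a job's TimeLimit, while only administrators can raise it; check the scontrol man page for the fields you are allowed to change.

```python
import shlex
import subprocess

def scontrol_update_command(job_id: int, **fields: str) -> list[str]:
    """Build `scontrol update JobId=<id> Field=Value ...` as an argument list."""
    return ["scontrol", "update", f"JobId={job_id}",
            *(f"{key}={value}" for key, value in fields.items())]

def update_job(job_id: int, **fields: str) -> None:
    """Apply the update on a cluster front-end; raises if scontrol exits with an error."""
    subprocess.run(scontrol_update_command(job_id, **fields), check=True)

# Example: shorten the time limit of pending job 123456 (a hypothetical job ID).
print(shlex.join(scontrol_update_command(123456, TimeLimit="04:00:00")))
# scontrol update JobId=123456 TimeLimit=04:00:00
```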
Why is Slurm giving me this error: error: Unable to allocate resources: Invalid account or account/partition combination specified?
Your account is not authorized to submit to this partition. Your account may be authorized to use another partition, or it may not be authorized to use that cluster/system at all.
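To see which accounts and partitions you can use, you can query Slurm's accounting database with sacctmgr. The sketch below assumes sacctmgr is available to regular users on the front-end and uses its --noheader and --parsable2 options, which print one |-separated line per association. An empty partition field is shown as (any), on the assumption that such an association is not limited to a single partition. The cluster, account, and partition names in the example are hypothetical.

```python
import subprocess

def parse_associations(text: str) -> list[tuple[str, str, str]]:
    """Parse `sacctmgr -nP ... format=Cluster,Account,Partition` output.

    Each non-empty line looks like "cluster|account|partition"; an empty
    partition field is reported as "(any)".
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cluster, account, partition = line.split("|")
        rows.append((cluster, account, partition or "(any)"))
    return rows

def my_associations(user: str) -> list[tuple[str, str, str]]:
    """Run sacctmgr for `user` and return its (cluster, account, partition) rows."""
    result = subprocess.run(
        ["sacctmgr", "--noheader", "--parsable2", "show", "associations",
         "where", f"user={user}", "format=Cluster,Account,Partition"],
        capture_output=True, text=True, check=True)
    return parse_associations(result.stdout)

# Parsing sample output with hypothetical names (no cluster access needed):
sample = "dcs|proj|\ndcs|proj|dcs-gpu\n"
for cluster, account, partition in parse_associations(sample):
    print(f"cluster={cluster} account={account} partition={partition}")
```

Once you have a valid pair, pass it to sbatch with --account and --partition.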
Software
Why isn't library or tool foo installed on bar?
The systems have a base set of common libraries and tools installed for their native architecture and operating system. A reasonable effort is made to provide the libraries and tools required by most users, but there will always be some library or tool that someone needs that we do not provide. In these cases we can offer advice on how to obtain or set up the package, but we cannot install it for you, either globally or locally. This applies to both free/open-source and commercial software.
What MPI implementations are available? How do I use them?
There are several different implementations of MPI available on the clusters including OpenMPI and MVAPICH2. We use modules to simplify the process of making these libraries available to users.