
Temporarily suspend a process

At times, you may find it necessary to temporarily suspend a process and then resume its execution at a later time. The following two commands will suspend the process with PID 945 and then resume it, respectively:

# kill -STOP 945
# kill -CONT 945
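
To watch the effect of the two signals, here is a minimal sketch that runs them against a throwaway `sleep` job instead of PID 945. The helper name `proc_state` and the `sleep 300` target are made up for illustration; the sketch assumes a Linux system with `/proc` mounted.

```shell
#!/bin/sh
# proc_state PID: print the one-letter state from /proc/PID/status
# ("T" means stopped, "S" means sleeping, "R" means running).
proc_state() {
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
            return
        fi
    done < "/proc/$1/status"
}

# Throwaway background job to experiment on (stand-in for PID 945).
sleep 300 &
pid=$!

# Suspend it. SIGSTOP cannot be caught or ignored by the target.
kill -STOP "$pid"
sleep 1
state_stopped=$(proc_state "$pid")

# Resume it.
kill -CONT "$pid"
sleep 1
state_running=$(proc_state "$pid")

echo "after STOP: $state_stopped"
echo "after CONT: $state_running"

# Clean up the test job.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

You do not need to be root to signal your own processes; the `# ` prompt above only matters when the target belongs to another user.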


14 comments:

Stu said...

SWEET! I needed that... I'm throwing your blog into GReader.

Anonymous said...

This is perfect. There's a memory leak of some kind in FF that freezes it every time I leave the computer alone for a while... when I come back, I have to spend 10 minutes signing in and getting a terminal up just to kill it (by which time it's using over 1 GB of RAM). This should help tremendously. Thanks!

Anonymous said...

Thanks, it's taken me too long to learn this capability. Didn't think to check out 'kill'.

Eric Njogu said...

To get around getting the PID, one can use

pkill -stop [process-name]
pkill -cont [process-name]
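
Since pkill matches by pattern, it can signal more processes than intended, so it may be worth previewing the match with pgrep first. The following is only a sketch: it assumes the procps versions of `pgrep`, `pkill` and `ps`, and the `sleep 314` target is a made-up stand-in for the real process name.

```shell
#!/bin/sh
# Throwaway target with a distinctive command line (hypothetical example).
sleep 314 &
pid=$!
sleep 1   # give the child time to exec before matching on its command line

# Preview which PIDs the pattern selects before signalling anything.
# -f matches against the full command line, -x requires an exact match.
pgrep -f -x 'sleep 314'

# Suspend every process whose command line is exactly "sleep 314".
pkill -STOP -f -x 'sleep 314'
sleep 1
stat_stopped=$(ps -o stat= -p "$pid")   # starts with "T" while stopped

# Resume them.
pkill -CONT -f -x 'sleep 314'
sleep 1
stat_resumed=$(ps -o stat= -p "$pid")

echo "after STOP: $stat_stopped"
echo "after CONT: $stat_resumed"

# Clean up the test job.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Without `-x`, a pattern such as `fire` would also match any other process whose name contains that string.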

Joshua McGee said...

I'm embarrassed that I have Linux certification and didn't know this. You were first match on Google with my keywords. Thanks for the help.

Anonymous said...

But what if you want to suspend a thread rather than a process? I got the thread ID using ps -eLf, but trying kill -STOP with that ID doesn't work.

Anonymous said...

Would you be allowed to do such a thing? Being part of the same process, threads likely have interdependencies (shared resources, etc.), and doing this could often cause some sort of deadlock. (Thread A is suspended by the user while holding a resource; thread B blocks because it tries to access that resource, but will never acquire it.)

Anonymous said...

How do I retrieve data from a suspended process? Please help.

Anonymous said...

How can I do what I want? It does not work. Let me explain, so the problem is clear.

What I do:
1: I have more than one PC.
2: One PC acts as an NFS server, so its files and folders are visible (mounted) on the rest of the PCs.
3: I run a bash script on each PC, including the one that acts as the NFS server. This script works perfectly.
4: The script does some quick things, then calls a binary (one I cannot compile myself, like a commercial console app) that does intensive calculations. It takes more than one hour, sometimes more than four hours, sometimes a full day.
5: I can make the script pause wherever I want, so it is no problem to suspend it and resume the next day, week or month: batch processing with a breakpoint after each step.

The problem is that one of those steps (the binary console app it calls) takes a long time to finish: many hours, sometimes a full day or more.

Sometimes I must leave and I want to hibernate the PCs, but that does not work. They hibernate fine and Linux resumes fine, but the binary process fails without reporting any error; it just ends immediately!

How do I suspend / hibernate? In what I hope is the best way: by telling Linux to hibernate.

The NFS server is the last one I hibernate; otherwise some of the running binary console apps would fail.

I also tried not hibernating the PC acting as the NFS server; that did not work either.

The problem, step by step:
-Power on the PC acting as the NFS server
-Power on another PC
-Mount the NFS filesystem on that other PC
-Run the binary console app (it takes many hours to finish) in a console
-Before it ends, tell Linux to hibernate that other PC, not the PC acting as the NFS server
-Power on that other PC again
-After Linux resumes, the console appears, but the binary process exits immediately without reporting any error
-If I check access to the NFS filesystem, I can read from and write to it

Why does hibernating Linux not let the binary continue? It does not use any Internet connection; it only reads from and writes to the filesystem.

The binary console app only reads a file, does a lot of calculations while reading it, and writes the results to another file. The calculation takes hours, sometimes days.

If I hibernate Linux while the process is running with its files on the local filesystem only, and then resume, it does not fail.

If I hibernate Linux while the process is running with its files on the remote NFS filesystem, and then resume, it fails.

So it must be related to the NFS use. But doesn't Linux make that transparent to applications?

I mean:
-I mount a remote NFS filesystem manually
-I hibernate and resume Linux
-I check the mounted filesystem and it works perfectly; I can read and write on it

I can also do this:
-I mount a remote NFS filesystem manually
-I hibernate Linux
-I hibernate Linux on the NFS server
-I resume Linux on the NFS server
-I give it time to come up
-I resume the Linux machine that accesses the remote NFS filesystem
-I check the mounted filesystem and it works perfectly; I can read and write on it

As expected, Linux mounts filesystems transparently. Hibernate and resume should let any application that does not use the network continue. The server is the last one hibernated and the first one resumed; the rest of the PCs are resumed after giving the server enough time to finish its resume.

But this binary console app fails if Linux is resumed while it was reading from and/or writing to a mount point holding a remote NFS filesystem.

...continue...

Anonymous said...

...continuation...

I hope I have made it clear:
-SERVER acts as an NFS filesystem server (it does not matter whether it is hibernated or not)
-CLIENT is where I run the binary console app
-Power on SERVER
-Power on CLIENT
-Mount on CLIENT an NFS path that resides on SERVER, to access SERVER's files as if they were local
-Run a binary console app that reads from a file, processes it with a lot of calculations, and writes the results to a different file (whether local or on the NFS path does not matter)
-If I hibernate CLIENT and resume it, the binary console app exits without finishing its work and without reporting any error

So I wanted to test a way to, let's say, pause the process before hibernating and "unpause" it after Linux has fully resumed. That does not work either.

I tested this:
-pause the binary console app process
-check that the NFS mounted filesystem works -> OK
-hibernate Linux
-resume Linux
-check process status -> paused -> OK
-recheck that the NFS mounted filesystem works -> OK
-unpause the binary console app process
And surprise: it ends immediately, with its work unfinished, and without reporting any error at all.

I have also tested this:
-pause the binary console app process
-check NFS mounted filesystem -> works -> OK
-unplug the RJ45 cable
-check NFS mounted filesystem -> fails -> OK
-replug the RJ45 cable
-check NFS mounted filesystem -> fails -> OK
-recheck NFS mounted filesystem after a while -> works -> OK
-unpause the binary console app process
And surprise: it continues perfectly.

So I must suppose that hibernating / resuming Linux is doing something wrong.

What else can I test?

Of course, thanks for kill -STOP, which let me run all these tests.

I am going mad with this ugly non-interactive binary console app. I have no source code and I must use it.

Why must I use it? Simple: it is the fastest one I have found and it is free of charge for any use. I tried more than ten alternative console apps; none does the work as well as this one, and all the rest take ten times longer. Where it takes one day, the others take, in the best case, more than a week.

That makes me think that maybe the app is doing something at a very low level, accessing the filesystem at a low level, so Linux does not handle it correctly when hibernating / resuming.

But what can I do?

So far, the best workaround is to estimate the time each step (each call to the binary console app) needs and to stop if there is not enough time left before I must go out, so I can power down without needing to hibernate. But sometimes a telephone call makes me leave before it finishes, so I must kill it and all the work done is lost.

Can I check anything else?

Is there any other command to suspend a process?

By the way, how can I get all of a process's handles, I mean all its open files, etc., so I can check after resuming from hibernation that everything is as it was before, just to rule out hibernate problems?

Thanks
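
On the question of listing a process's open files: on Linux each open descriptor appears as a symlink under /proc/PID/fd, and `lsof -p PID` prints a similar list. Below is a rough sketch of taking a snapshot before and after and comparing the two. The `snapshot_fds` helper and the `sleep 300` target are invented for illustration, and the hibernate step is only marked by a comment, not performed.

```shell
#!/bin/sh
# snapshot_fds PID: print one "fd -> target" line per open descriptor,
# read from the symlinks under /proc/PID/fd.
snapshot_fds() {
    for fd in "/proc/$1/fd"/*; do
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# Demo target: a background job with a known file open on stdout.
tmp=$(mktemp)
sleep 300 > "$tmp" &
pid=$!
sleep 1

before=$(snapshot_fds "$pid")
# ... kill -STOP, hibernate, resume would happen here ...
after=$(snapshot_fds "$pid")

echo "$before"
if [ "$before" = "$after" ]; then
    echo "descriptors unchanged"
else
    echo "descriptors differ"
fi

# Clean up the demo job and its temporary file.
kill "$pid"
wait "$pid" 2>/dev/null || true
rm -f "$tmp"
```

Identical snapshots only show that the same paths are still open; they do not prove that the NFS file handles behind them are still valid after a resume.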

Anonymous said...

Maybe the solution could be this one... I must try it:

CryoPID - A Process Freezer for Linux
http://cryopid.berlios.de/

On that page it says:
"CryoPID allows you to capture the state of a running process in Linux and save it to a file. This file can then be used to resume the process later on, either after a reboot or even on another machine."

It would also be great if it made it possible to change PCs, so I could use a faster one when it is otherwise idle.

As I say, I must check whether that works, but it could be an alternative: with it there is no need to hibernate, just a normal power off. So if hibernation is the cause, it would be a perfect workaround.

It may also solve something else I was trying: migrating the binary console app to a faster CPU when that machine finishes its own work.

Imagine more than four PCs, all in use at the same time, all of different speeds. When the fastest one finishes, imagine you could transfer the process from the slowest one to the fast one, which is now idle. It would finish sooner.

As I said, it must be tested; that is the reason I am posting all these comments.

Sometimes kill -STOP is not a solution, because the process could fail. Sometimes it is much better to use a cannon than a gun to kill a fly, other times not.

I will test it if I have time. Anyway, if someone wants to test it, please comment with the result.

Anonymous said...

Thanks a lot, pal :)

abhi said...

Related to "But what if you want to suspend a thread rather than a process? Got the thread ID using ps -eLf, but trying kill -STOP with that ID doesn't work."


Yes, this can be done. I got one application for Solaris.

SomeDaySomeSay said...

Good day! I have seen that the RSS feed of this domain is working without any mistakes. Did you configure all the properties yourself, or did you just leave the original settings of this widget?
