PROBLEM WITH JOB SUPPLY FROM CERN


Message boards : News : PROBLEM WITH JOB SUPPLY FROM CERN

Ben Segal
Project administrator
Project developer
Project tester
Project scientist
Joined: 1 Nov 10
Posts: 722

BOINC stats
Credit: 264
RAC: 0
Message 3198 - Posted: 11 Aug 2011, 16:54:42 UTC
Last modified: 11 Aug 2011, 18:51:43 UTC

WE HAVE A PROBLEM. WE BELIEVE SOME "ROGUE" SYSTEMS ARE SUCKING OUR QUEUES DRY AND THUS OVERLOADING THE SUPPLY OF JOBS TO EVERYONE ELSE.

Please be patient - we will try and fix this as soon as possible.

Until this problem is understood, we have disabled the creation of new jobs and users for the moment. As soon as the system has recovered, the flow of work units will be restored.


LHC@home 2.0 team

Crystal Pellet
Volunteer moderator
Joined: 5 Aug 11
Posts: 1307

BOINC stats
Credit: 1,120,039
RAC: 1,295
Message 3205 - Posted: 11 Aug 2011, 17:47:10 UTC - in response to Message 3198.

Hi Ben,

Would it help if we suspended the T4T task in BOINC Manager until we get your "go, go, go" signal, given that the VM has no real work to do at the moment (it is running idle)?

Regards,
CP

Daniel Lombraña González
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Nov 10
Posts: 1661

BOINC stats
Credit: 566
RAC: 0
Message 3209 - Posted: 11 Aug 2011, 18:33:50 UTC
Last modified: 11 Aug 2011, 18:49:03 UTC

Dear all,

As we have been having problems over the last few hours, we have disabled the creation of new jobs and users for the moment. As soon as the system has recovered, the flow of work units will be restored.

Regards,

Daniel

Yinette
Joined: 11 Aug 11
Posts: 2

BOINC stats
Credit: 6,254
RAC: 0
Message 3226 - Posted: 11 Aug 2011, 21:34:11 UTC
Last modified: 11 Aug 2011, 21:35:51 UTC

Hello

I joined yesterday and have been having some issues getting jobs and sending them back. I do hope I'm not causing the problem.

I've suspended my VM and BOINC task for now. Hopefully the issue can be resolved.

Abdullah Afzal
Joined: 11 Aug 11
Posts: 2

BOINC stats
Credit: 2,474
RAC: 0
Message 3230 - Posted: 11 Aug 2011, 21:47:38 UTC
Last modified: 11 Aug 2011, 21:48:08 UTC

Hi
I have just joined this project after reading about it on Google News. I am excited and can't wait for the service to start again. I hope it gets resolved soon.

Hadron42
Joined: 11 Aug 11
Posts: 2

BOINC stats
Credit: 113,285
RAC: 0
Message 3233 - Posted: 11 Aug 2011, 22:23:27 UTC

Could it be that some people have set a large number of days' work to cache?

Ageless
Joined: 1 Aug 11
Posts: 176

BOINC stats
Credit: 9,258
RAC: 0
Message 3234 - Posted: 11 Aug 2011, 22:29:59 UTC - in response to Message 3233.

Could it be that some people have set a large number of days' work to cache?

That doesn't matter, as the BOINC cache preferences are ignored on this project. Every host gets 1 task, which is 24 hours' worth of work through VirtualBox.
____________
Jord

-BOINC FAQ Service


Real is just a matter of perception.

Sunny129
Joined: 5 Aug 11
Posts: 169

BOINC stats
Credit: 1,476,997
RAC: 1,301
Message 3235 - Posted: 11 Aug 2011, 22:32:28 UTC - in response to Message 3233.
Last modified: 11 Aug 2011, 22:33:04 UTC

Could it be that some people have set a large number of days' work to cache?

That doesn't matter, because the project is currently set (on the server side) to issue no more than one [BOINC] task at a time to any given host. Even if you change your cache to several days, you'll notice that there is never more than a single T4T@H task running at any given moment (even on hosts with multi-core CPUs), and there is never another T4T@H task waiting in the queue. You can't fetch a new [BOINC] task until the current one finishes.


...looks like you beat me to it, Ageless :-)
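The rule Ageless and Sunny129 describe can be pictured as a tiny server-side check. This is a hypothetical Python model written only for illustration, not the project's scheduler code: the names `WorkRequest`, `tasks_to_send` and `MAX_IN_PROGRESS_PER_HOST` are made up, and the limit of one task per host is taken from the posts above.

```python
from dataclasses import dataclass

# Hypothetical limit, taken from the posts: at most one T4T task per host.
MAX_IN_PROGRESS_PER_HOST = 1


@dataclass
class WorkRequest:
    host_id: int
    requested_cache_days: float  # client's "days of work" preference (ignored)
    tasks_in_progress: int       # T4T tasks the host is still running
    ncpus: int = 1               # also ignored: multi-core hosts get one task too


def tasks_to_send(req: WorkRequest) -> int:
    """Return how many new BOINC tasks this host may receive.

    Neither the cache preference nor the CPU count plays any part; one
    task already keeps the VM busy for about 24 hours.
    """
    return max(0, MAX_IN_PROGRESS_PER_HOST - req.tasks_in_progress)
```

A host asking for 10 days of work with nothing in progress gets exactly one task; the same host asking again while that task runs gets none.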

jujube
Joined: 5 Aug 11
Posts: 1414

BOINC stats
Credit: 701,572
RAC: 0
Message 3236 - Posted: 11 Aug 2011, 22:40:24 UTC - in response to Message 3233.


All ur tsks are belong to uz. No retrn by dedlyne. Free Julienne Masange. Lulz.


Joking, of course.


Could it be that some people have set a large number of days' work to cache?


T4T sends only 1 task at a time. You can't get another task until you finish the first one. Anyway it's not the BOINC tasks they're running out of, it's the CERN jobs downloaded by BOINC_VM. BOINC tasks and CERN jobs... two very different things.
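To make jujube's distinction concrete, here is a toy Python model of the two levels, assuming a single shared CERN job queue that every volunteer VM pulls from. `JobQueue`, `run_vm` and the job counts are hypothetical; the real job-distribution system is not described in this thread.

```python
from collections import deque
from typing import List, Optional


class JobQueue:
    """Hypothetical shared queue of CERN jobs that every volunteer VM pulls from."""

    def __init__(self, n_jobs: int) -> None:
        self.jobs = deque(range(n_jobs))  # job ids 0 .. n_jobs-1

    def get_job(self) -> Optional[int]:
        """Hand out the next job id, or None once the queue is dry."""
        return self.jobs.popleft() if self.jobs else None


def run_vm(queue: JobQueue, jobs_wanted: int) -> List[int]:
    """One VM, living inside one BOINC task, fetching up to jobs_wanted CERN jobs.

    A healthy VM asks for a new job only after finishing the previous one;
    a broken VM that fails each job instantly can ask for many in a row.
    """
    fetched: List[int] = []
    for _ in range(jobs_wanted):
        job = queue.get_job()
        if job is None:  # nothing left for anyone
            break
        fetched.append(job)
    return fetched
```

With `queue = JobQueue(100)`, a misbehaving VM that calls `run_vm(queue, 95)` leaves only 5 jobs for the next host, even though each host only ever holds one BOINC task. That is the "sucking our queues dry" effect from the first post.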

Ageless
Joined: 1 Aug 11
Posts: 176

BOINC stats
Credit: 9,258
RAC: 0
Message 3237 - Posted: 11 Aug 2011, 22:47:28 UTC - in response to Message 3236.

Anyway it's not the BOINC tasks they're running out of, it's the CERN jobs downloaded by BOINC_VM. BOINC tasks and CERN jobs... two very different things.

I was checking through some tasks this afternoon to find a comparison for a task Bill & Patsy ran, but the 50 tasks listed below and the 50 listed above the one he pointed out were all computational errors. There's a lot of trashing going on around here.

There are also a lot of people new to BOINC and VirtualBox who don't (yet) know how to shut these programs down correctly, or who find that VBox keeps running after they close BOINC and then end the process manually in Task Manager. That gives their task an immediate computational error on the next BOINC restart, or a BOINC_VM error that they don't know how to handle.

The BBC article may have been a bit too much.

Yinette
Joined: 11 Aug 11
Posts: 2

BOINC stats
Credit: 6,254
RAC: 0
Message 3239 - Posted: 11 Aug 2011, 22:55:16 UTC - in response to Message 3237.
Last modified: 11 Aug 2011, 23:26:33 UTC

I had to trash one job (I reset the VM as instructed) because of the want_getJob issue. I wonder if the two known problems are related.

Anyway, waiting patiently here until the issue can be resolved :)

jujube
Joined: 5 Aug 11
Posts: 1414

BOINC stats
Credit: 701,572
RAC: 0
Message 3243 - Posted: 11 Aug 2011, 23:26:24 UTC - in response to Message 3237.

Anyway it's not the BOINC tasks they're running out of, it's the CERN jobs downloaded by BOINC_VM. BOINC tasks and CERN jobs... two very different things.

I was checking through some tasks this afternoon to find a comparison for a task Bill & Patsy ran, but the 50 tasks listed below and the 50 listed above the one he pointed out were all computational errors. There's a lot of trashing going on around here.

There are also a lot of people new to BOINC and VirtualBox who don't (yet) know how to shut these programs down correctly, or who find that VBox keeps running after they close BOINC and then end the process manually in Task Manager. That gives their task an immediate computational error on the next BOINC restart, or a BOINC_VM error that they don't know how to handle.

The BBC article may have been a bit too much.


This wouldn't be the first project to be a victim of its own popularity.

I've noticed all the trashing going on here too, at the BOINC task level. They need to turn on/configure the option in the BOINC server that limits how many BOINC tasks a host can receive in a day. A limit of 20 seems reasonable. Whatever the number, it shouldn't be left unlimited, as it seems to be now. They need a similar limit at the CERN job level too.
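A per-host daily cap of the kind jujube suggests could look like the sketch below. It is a hypothetical Python illustration, not BOINC's built-in quota mechanism (the real server option and its semantics should be checked in the BOINC project configuration documentation); `DailyLimiter` is a made-up name and 20 is simply the number proposed above. The same class could be reused at the CERN job level with a different cap.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Tuple


class DailyLimiter:
    """Hypothetical per-host, per-day cap on tasks (or jobs) handed out."""

    def __init__(self, per_day: int) -> None:
        self.per_day = per_day
        # (host_id, day) -> number of items already handed out that day.
        # A long-running server would also discard entries for past days.
        self.counts: DefaultDict[Tuple[int, date], int] = defaultdict(int)

    def allow(self, host_id: int, day: date) -> bool:
        """Count and allow one more item for host_id on day, unless the cap is reached."""
        key = (host_id, day)
        if self.counts[key] >= self.per_day:
            return False
        self.counts[key] += 1
        return True
```

With `limiter = DailyLimiter(per_day=20)`, the 21st call to `limiter.allow(42, date(2011, 8, 11))` returns `False`, while the first call for the same host on the next day returns `True` again.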

The Dreamer
Joined: 2 Aug 11
Posts: 17

BOINC stats
Credit: 372,539
RAC: 261
Message 3244 - Posted: 11 Aug 2011, 23:36:22 UTC

When I first started running this project, tasks were just ending in compute errors on Windows.

The first time the VM runs, it pops up a bunch of dialogs warning you about things like the guest not being optimal, the mouse not being properly captured, or the VM being about to grab the keyboard, and each one needs you to check the box to stop it appearing again.

If you don't, the VM doesn't respond, so the wrapper aborts, returns the unit in error, and asks for another one.

The problem is that I only run BOINC when my Windows computer is idle, so I didn't know that I needed to interact with BOINC_VM at first, or that I needed to turn off the idle-only setting just long enough to go through the dialogs before the VM would run correctly.

But after a day of lots of failed units, I suspended things until a co-worker reported that he had successfully completed a unit on his Windows box. So I resumed work fetch, thinking that there had just been some bad early units.

It wasn't until later that I caught a glimpse of the VM with a dialog open, after my screensaver stopped and before BOINC had unloaded everything.

But before I caught on to that, I had tried running this project on my Linux box, where I leave BOINC running all the time. That box is normally headless; I only start gdm by hand when I need it for something. I have a cron job that checks every hour whether BOINC is running and starts it if necessary. (On Unix/Linux I only run the command-line version, with remote GUI enabled so I can manage things from Windows.)
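The hourly check The Dreamer describes might look like this in Python. It is a sketch under assumptions: the process name `boinc`, the use of `pgrep -x`, and the `boinc --daemon` start command are guesses about a typical Linux install, not details from the post, and cron would be set to call `ensure_running()` once an hour.

```python
import subprocess
from typing import Callable

# ASSUMPTIONS: adjust the process name and start command to your install.
PROCESS_NAME = "boinc"
START_COMMAND = ["boinc", "--daemon"]


def boinc_is_running(name: str = PROCESS_NAME) -> bool:
    # pgrep -x exits with status 0 when a process with exactly this name exists.
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def start_boinc() -> None:
    # Launch the client in the background and return immediately.
    subprocess.Popen(START_COMMAND)


def ensure_running(
    is_running: Callable[[], bool] = boinc_is_running,
    start: Callable[[], object] = start_boinc,
) -> bool:
    """Start the client if it is not running; return True if a start was attempted."""
    if is_running():
        return False
    start()
    return True
```

Passing the check and start actions in as arguments keeps the logic testable without touching the real client.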

So I was going through units pretty fast on Linux until I figured out that the VM wanted a display.

I'm sure that as more and more people join, they are going to run into these issues at the beginning as well, and some may keep sucking jobs down, thinking that since things are in beta, they will start working on their own eventually.

I know I certainly thought that when I first joined. And if I hadn't happened to see the VM 'stuck' that one time, I wouldn't have known to babysit the first run.

Once I knew this, adding two more hosts to this project went a lot more smoothly.

The Dreamer.

Sunny129
Joined: 5 Aug 11
Posts: 169

BOINC stats
Credit: 1,476,997
RAC: 1,301
Message 3245 - Posted: 11 Aug 2011, 23:52:43 UTC - in response to Message 3243.

This wouldn't be the first project to be a victim of its own popularity.

I've noticed all the trashing going on here too, at the BOINC task level. They need to turn on/configure the option in the BOINC server that limits how many BOINC tasks a host can receive in a day. A limit of 20 seems reasonable. Whatever the number, it shouldn't be left unlimited, as it seems to be now. They need a similar limit at the CERN job level too.

I agree. I have to admit, I'm guilty of trashing a few BOINC tasks myself over the past few days. I've been troubleshooting one of my hosts, trying to get BOINC, VirtualBox, and the wrapper to play nicely together. But now that I better understand the various functions of VirtualBox and how to use them (thanks to Ageless), I have better manual control over VirtualBox and the things that can cause the corresponding BOINC task to error out. I agree that 20 tasks per day should be more than enough... I can't imagine needing to trash more than a task or two per day, whether intentionally or accidentally. And once I resolve my bug, there should be no wasting of tasks.

Ageless
Joined: 1 Aug 11
Posts: 176

BOINC stats
Credit: 9,258
RAC: 0
Message 3246 - Posted: 12 Aug 2011, 0:06:40 UTC - in response to Message 3245.

(thanks to Ageless)

Thank those who asked me back all three times I left this project. ;-)

I agree that 20 tasks per day should be more than enough... I can't imagine needing to trash more than a task or two per day, whether intentionally or accidentally. And once I resolve my bug, there should be no wasting of tasks.

The thing is, though, it isn't 20 tasks per day indefinitely. That's only on the first day. As soon as you start trashing tasks, each one that ends in error reduces your quota by one, so you quickly end up at 1 task per core per host per day.

The only way to climb back up from a ration of 1 task per day is to return correct work, which can be pretty daunting on a beta project. This isn't a production project; it's a beta project, meaning that there WILL be problems.

Luckily, the tasks run for 24 hours when they actually do run. But when you are at 1 task per day and this is the only project you run, it can be pretty frustrating to have a computer that is essentially doing nothing during that time. (But then, BOINC wasn't really made for one project only. ;-))
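The quota behaviour Ageless describes can be modelled roughly as follows. This is a simplified, hypothetical Python sketch of the rule as stated in the post (start at 20 per core, minus one per errored task, never below 1 per core). How fast valid results restore the quota is not stated in the thread, so the `+1` per valid result below is an assumption, and none of this is BOINC's actual server code.

```python
class HostQuota:
    """Simplified, hypothetical model of the adaptive daily quota described above."""

    def __init__(self, max_quota: int = 20, ncpus: int = 1) -> None:
        self.max_quota = max_quota  # per-core tasks per day on the first day
        self.ncpus = ncpus
        self.quota = max_quota      # current per-core quota

    def on_error(self) -> None:
        # Each task that ends in error lowers the quota by one, down to a floor of 1.
        self.quota = max(1, self.quota - 1)

    def on_valid(self) -> None:
        # ASSUMPTION: each valid result restores one unit; the real recovery rule may differ.
        self.quota = min(self.max_quota, self.quota + 1)

    def tasks_per_day(self) -> int:
        return self.quota * self.ncpus
```

Starting from `HostQuota()`, 19 errors are enough to reach the floor of 1 task per day, which matches the "pretty quickly" in the post; a four-core host (`HostQuota(ncpus=4)`) bottoms out at 4 tasks per day.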

Magic
Joined: 1 Mar 11
Posts: 455

BOINC stats
Credit: 3,992,315
RAC: 3,280
Message 3247 - Posted: 12 Aug 2011, 0:23:03 UTC


I was hoping this would be back up and running faster than it is.

I usually have my 4 hosts running here 24/7, and now I have one down and the other 3 due to send in their tasks within the next few hours.

I guess they can run all cores on Einstein until we are up and running here again.

I had mine all set up so that tasks reported at times when I can check them, since I have already spent hundreds of hours here at 4 am.

It would be nice if this were running again within the next 5 hours.


jujube
Joined: 5 Aug 11
Posts: 1414

BOINC stats
Credit: 701,572
RAC: 0
Message 3249 - Posted: 12 Aug 2011, 1:38:22 UTC - in response to Message 3246.

I agree that 20 tasks per day should be more than enough... I can't imagine needing to trash more than a task or two per day, whether intentionally or accidentally. And once I resolve my bug, there should be no wasting of tasks.

The thing is, though, it isn't 20 tasks per day indefinitely. That's only on the first day. As soon as you start trashing tasks, each one that ends in error reduces your quota by one, so you quickly end up at 1 task per core per host per day.

I thought it could be configured to not decrease the allotment.

Daniel Lombraña González
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Nov 10
Posts: 1661

BOINC stats
Credit: 566
RAC: 0
Message 3277 - Posted: 12 Aug 2011, 7:46:19 UTC

Dear all,

We are still working on this issue, so please be patient. As soon as we have recovered the system, we'll let you know.

Regards,

The team!

Daniel
Joined: 9 Aug 11
Posts: 2

BOINC stats
Credit: 217,113
RAC: 0
Message 3298 - Posted: 12 Aug 2011, 8:49:13 UTC

G'day,

I hope it wasn't me. I only joined 2 days ago. I set BOINC to use 10 GB of hard disk space and you guys sent me 1.23 GB of data? However, I still only have 1 task in BOINC.

Not to brag, but my system's pretty awesome, and you're getting access to some serious hardware.

But I'm a team player, so keep the messages concise, with instructions for us non-science types, and I will do what I can from this end.

Daniel Lombraña González
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Nov 10
Posts: 1661

BOINC stats
Credit: 566
RAC: 0
Message 3336 - Posted: 12 Aug 2011, 13:03:15 UTC - in response to Message 3298.

Dear Daniel,

Welcome to the project!! As you mentioned, we ask you to "lend us" 9 GB of your hard disk, but sometimes we will not use all of that capacity. This is because some experiments require less space, while others will require all of it: the full 9 GB. So don't worry; from time to time you will see your hard disk filled with more data ;)

Regards,

Daniel
