On the other system, every one was an instant Computation error, and they were set to run with Rosetta Mini v3.78 windows_intelx86. There was one Task using Rosetta Mini v3.78 windows_x86_64 that errored out as well. One Task had no problem and is crunching away using minirosetta_3.78_windows_x86_64.exe.

Those hit errors within a few seconds of starting.

Work gets done, errors are reduced (if not eliminated), the Manager will figure out how much work you can and can't do and stop getting too much, and it won't require frequent intervention on your part.

Don't abort the Tasks; if they miss the deadline, the project will resend them to another host. Set the number of threads used for crunching equal to or less than the number that can be supported with the RAM your systems have, allowing for 1.3GB per task (unless you wish to add more RAM, which also solves the problem), and in your computing preferences allow BOINC to make use of the RAM you have (set it for 90% or higher). Make sure your checkpointing request is set for 60 seconds. Set a small cache, 1 day or less, additional days.

"Again, selective nuking of tasks can get the CPUs busy again."

Simple.

On top of that, some of the machines wind up wasting time because of large batches of tasks with large memory requirements that cause the "Waiting for memory" status on some tasks.

#1 Freedom = (Meaningful - Constrained) Choice != (Beer^3 | Speech)

There were a couple of teams in the lab that are probably doing Covid stuff now (but I'm retired, so I have no idea). In addition, if I were still supporting researchers, I would not recommend that they rely on data processed on Rosetta, because such problems make the entire thing dubious. I'm sure I could even shop around for other projects that are also working on Covid. The reason I switched to Rosetta was that the projects I used to support were not well managed.
I've currently earned over 12 million points, which is supposed to indicate a moderate contribution, but I'm thinking about moving along. I'm fairly confident that BOINC has the capabilities to assess and manage memory, but it seems they are not being used by the Baker Lab people.

Again, selective nuking of tasks can get the CPUs busy again, but I'm NOT supposed to be spending time managing memory problems because the people running it can't figure it out. Plus it's wasting bandwidth at the project end when they send data that is just discarded. The obvious workaround (though it's tedious) is to manually abort the tasks that can't be completed, but that causes problems because the flow of tasks has become sporadic again. Exacerbated by more checkpoint problems, too.

Actually writing from the machine that has the most problems dealing with the deadlines, but even some of my bigger machines clearly have more queued tasks than they can possibly complete within the short deadlines. Is it accomplishing anything if my contributions are just discarded? And discarded for the sake of deadlines that seem quite arbitrary, even silly. Trying and even eager to be of help, but.

All these short deadline units are troublesome.
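The RAM advice earlier in the thread (allow about 1.3GB per task and let BOINC use 90% of memory) boils down to a small calculation. Here is a rough sketch of it in Python. The function name and its parameters are made up for illustration and are not part of BOINC; the 1.3GB and 90% figures are simply the ones quoted in the thread.

```python
import math


def max_concurrent_tasks(total_ram_gb, ram_fraction=0.90, gb_per_task=1.3,
                         cpu_threads=None):
    """Estimate how many tasks fit in memory at once.

    Hypothetical helper, not a BOINC API. Defaults come from the thread:
    about 1.3GB per task, BOINC allowed to use 90% of RAM.
    """
    usable_gb = total_ram_gb * ram_fraction
    fits_in_ram = math.floor(usable_gb / gb_per_task)
    if cpu_threads is not None:
        # Never run more tasks than there are threads to run them on.
        fits_in_ram = min(fits_in_ram, cpu_threads)
    return max(fits_in_ram, 0)


if __name__ == "__main__":
    for ram in (4, 8, 16):
        print(f"{ram} GB RAM -> {max_concurrent_tasks(ram)} tasks")
```

If the number that comes out is lower than your thread count, that is roughly the value to use for the "use at most N% of the CPUs" preference, or it is the point at which adding RAM starts to pay off.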
/projects/_rosetta/minirosetta_3.78_x86_64-pc-linux-gnu -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers -in:file:native 00001.pdb -corrections::beta_nov16 -silent_gz 1 -frag9 00001.200.9mers -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 15 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip CF_monomer_28_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 600 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 2362735

ERROR: Option matching -corrections:beta_nov16 not found in command line top-level context

:: BOINC :: boinc_init()

BOINC:: Setting up shared resources.

I notice the scheduler is down now so maybe they are removing them from the queue. I ended up aborting the few that hadn't committed suicide. Other work units seem fine, just not these ones. Got 56 CF_monomer/Rosetta Mini work units that all failed with an instant "Error while computing".
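The command line above asks for `-cpu_run_time 28800`, i.e. a target of 8 hours per task. That makes it easy to estimate how many queued tasks a machine could finish before a deadline, which is the complaint raised elsewhere in the thread. This is only a back-of-the-envelope sketch: it assumes every task runs for the full target time, and the deadline length and hours of crunching per day are placeholder inputs, not values taken from the project.

```python
def tasks_finishable(threads, deadline_days, cpu_run_time_s=28800,
                     crunch_hours_per_day=24.0):
    """Rough upper bound on tasks completed before the deadline.

    Hypothetical helper. Assumes each task uses the whole cpu_run_time
    target (28800 s in the command line above) and that every thread
    crunches crunch_hours_per_day hours each day.
    """
    available_s = deadline_days * crunch_hours_per_day * 3600
    per_thread = int(available_s // cpu_run_time_s)  # whole tasks only
    return threads * per_thread


if __name__ == "__main__":
    # Example inputs only: a 4-thread machine and an assumed 3-day deadline.
    print(tasks_finishable(4, 3))
    print(tasks_finishable(4, 3, crunch_hours_per_day=12))
```

If the number of tasks sitting in the queue is well above this estimate, a smaller cache setting is the usual way to stop the client from fetching more than it can return in time.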