[INFO] fetching crate rusty-llm-jury 0.1.0...
[INFO] testing rusty-llm-jury-0.1.0 against try#33835004928d3bf65db4d4712e1330766263b0bd for pr-155739-1
[INFO] extracting crate rusty-llm-jury 0.1.0 into /workspace/builds/worker-1-tc2/source
[INFO] started tweaking crates.io crate rusty-llm-jury 0.1.0
[INFO] removed 0 missing tests
[INFO] finished tweaking crates.io crate rusty-llm-jury 0.1.0
[INFO] tweaked toml for crates.io crate rusty-llm-jury 0.1.0 written to /workspace/builds/worker-1-tc2/source/Cargo.toml
[INFO] validating manifest of crates.io crate rusty-llm-jury 0.1.0 on toolchain 33835004928d3bf65db4d4712e1330766263b0bd
[INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+33835004928d3bf65db4d4712e1330766263b0bd" "metadata" "--manifest-path" "Cargo.toml" "--no-deps", kill_on_drop: false }`
[INFO] crate crates.io crate rusty-llm-jury 0.1.0 already has a lockfile, it will not be regenerated
[INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+33835004928d3bf65db4d4712e1330766263b0bd" "fetch" "--manifest-path" "Cargo.toml", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+33835004928d3bf65db4d4712e1330766263b0bd" "metadata" "--no-deps" "--format-version=1", kill_on_drop: false }`
[INFO] [stdout] 42b0539ac68d8ad4b98abcf88e8cf89cead375a70287793c760502d9064d120b
[INFO] running `Command { std: "docker" "start" "-a" "42b0539ac68d8ad4b98abcf88e8cf89cead375a70287793c760502d9064d120b", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "inspect" "42b0539ac68d8ad4b98abcf88e8cf89cead375a70287793c760502d9064d120b", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "42b0539ac68d8ad4b98abcf88e8cf89cead375a70287793c760502d9064d120b", kill_on_drop: false }`
[INFO] [stdout] 42b0539ac68d8ad4b98abcf88e8cf89cead375a70287793c760502d9064d120b
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+33835004928d3bf65db4d4712e1330766263b0bd" "build" "--frozen" "--message-format=json", kill_on_drop: false }`
[INFO] [stdout] e258e77b5f39cde9b800ffd81a14e66cea8687b4cc7fb32a3536380ef77684ea
[INFO] running `Command { std: "docker" "start" "-a" "e258e77b5f39cde9b800ffd81a14e66cea8687b4cc7fb32a3536380ef77684ea", kill_on_drop: false }`
[INFO] [stderr]    Compiling proc-macro2 v1.0.95
[INFO] [stderr]    Compiling unicode-ident v1.0.18
[INFO] [stderr]    Compiling autocfg v1.4.0
[INFO] [stderr]    Compiling libc v0.2.172
[INFO] [stderr]    Compiling zerocopy v0.8.25
[INFO] [stderr]    Compiling serde v1.0.219
[INFO] [stderr]    Compiling cfg-if v1.0.0
[INFO] [stderr]    Compiling utf8parse v0.2.2
[INFO] [stderr]    Compiling anstyle-query v1.1.2
[INFO] [stderr]    Compiling is_terminal_polyfill v1.70.1
[INFO] [stderr]    Compiling anstyle v1.0.10
[INFO] [stderr]    Compiling colorchoice v1.0.3
[INFO] [stderr]    Compiling memchr v2.7.4
[INFO] [stderr]    Compiling ryu v1.0.20
[INFO] [stderr]    Compiling strsim v0.11.1
[INFO] [stderr]    Compiling rawpointer v0.2.1
[INFO] [stderr]    Compiling serde_json v1.0.140
[INFO] [stderr]    Compiling anstyle-parse v0.2.6
[INFO] [stderr]    Compiling itoa v1.0.15
[INFO] [stderr]    Compiling thiserror v1.0.69
[INFO] [stderr]    Compiling heck v0.5.0
[INFO] [stderr]    Compiling clap_lex v0.7.4
[INFO] [stderr]    Compiling anyhow v1.0.98
[INFO] [stderr]    Compiling anstream v0.6.18
[INFO] [stderr]    Compiling num-traits v0.2.19
[INFO] [stderr]    Compiling matrixmultiply v0.3.10
[INFO] [stderr]    Compiling csv-core v0.1.12
[INFO] [stderr]    Compiling clap_builder v4.5.39
[INFO] [stderr]    Compiling quote v1.0.40
[INFO] [stderr]    Compiling syn v2.0.101
[INFO] [stderr]    Compiling getrandom v0.2.16
[INFO] [stderr]    Compiling rand_core v0.6.4
[INFO] [stderr]    Compiling num-complex v0.4.6
[INFO] [stderr]    Compiling num-integer v0.1.46
[INFO] [stderr]    Compiling ndarray v0.15.6
[INFO] [stderr]    Compiling ppv-lite86 v0.2.21
[INFO] [stderr]    Compiling rand_chacha v0.3.1
[INFO] [stderr]    Compiling rand v0.8.5
[INFO] [stderr]    Compiling serde_derive v1.0.219
[INFO] [stderr]    Compiling clap_derive v4.5.32
[INFO] [stderr]    Compiling thiserror-impl v1.0.69
[INFO] [stderr]    Compiling clap v4.5.39
[INFO] [stderr]    Compiling csv v1.3.1
[INFO] [stderr]    Compiling rusty-llm-jury v0.1.0 (/opt/rustwide/workdir)
[INFO] [stderr]     Finished `dev` profile [unoptimized + debuginfo] target(s) in 21.53s
[INFO] running `Command { std: "docker" "inspect" "e258e77b5f39cde9b800ffd81a14e66cea8687b4cc7fb32a3536380ef77684ea", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "e258e77b5f39cde9b800ffd81a14e66cea8687b4cc7fb32a3536380ef77684ea", kill_on_drop: false }`
[INFO] [stdout] e258e77b5f39cde9b800ffd81a14e66cea8687b4cc7fb32a3536380ef77684ea
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+33835004928d3bf65db4d4712e1330766263b0bd" "test" "--frozen" "--no-run" "--message-format=json", kill_on_drop: false }`
[INFO] [stdout] 7ffee0d933afbe53fb3cee79c8b50eef77e4504814aba9d2a6e867ca3197ae72
[INFO] running `Command { std: "docker" "start" "-a" "7ffee0d933afbe53fb3cee79c8b50eef77e4504814aba9d2a6e867ca3197ae72", kill_on_drop: false }`
[INFO] [stderr]    Compiling getrandom v0.3.3
[INFO] [stderr]    Compiling rustix v1.0.7
[INFO] [stderr]    Compiling bitflags v2.9.1
[INFO] [stderr]    Compiling linux-raw-sys v0.9.4
[INFO] [stderr]    Compiling fastrand v2.3.0
[INFO] [stderr]    Compiling once_cell v1.21.3
[INFO] [stderr]    Compiling approx v0.5.1
[INFO] [stderr]    Compiling tempfile v3.20.0
[INFO] [stderr]    Compiling rusty-llm-jury v0.1.0 (/opt/rustwide/workdir)
[INFO] [stderr]     Finished `test` profile [unoptimized + debuginfo] target(s) in 5.39s
[INFO] running `Command { std: "docker" "inspect" "7ffee0d933afbe53fb3cee79c8b50eef77e4504814aba9d2a6e867ca3197ae72", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "7ffee0d933afbe53fb3cee79c8b50eef77e4504814aba9d2a6e867ca3197ae72", kill_on_drop: false }`
[INFO] [stdout] 7ffee0d933afbe53fb3cee79c8b50eef77e4504814aba9d2a6e867ca3197ae72
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+33835004928d3bf65db4d4712e1330766263b0bd" "test" "--frozen", kill_on_drop: false }`
[INFO] [stdout] 9a82cc3c2d9392db35128af528330324a4e70855a1489c4753e652c177100168
[INFO] running `Command { std: "docker" "start" "-a" "9a82cc3c2d9392db35128af528330324a4e70855a1489c4753e652c177100168", kill_on_drop: false }`
[INFO] [stderr]     Finished `test` profile [unoptimized + debuginfo] target(s) in 0.10s
[INFO] [stderr]      Running unittests src/lib.rs (/opt/rustwide/target/debug/deps/llmjury-3fffb1f64d493512)
[INFO] [stdout] 
[INFO] [stdout] running 44 tests
[INFO] [stdout] test bias_correction::tests::test_input_validation_empty_arrays ... ok
[INFO] [stdout] test bias_correction::tests::test_input_validation_invalid_confidence_level ... ok
[INFO] [stdout] test bias_correction::tests::test_input_validation_non_binary ... ok
[INFO] [stdout] test bias_correction::tests::test_input_validation_mismatched_lengths ... ok
[INFO] [stdout] test bias_correction::tests::test_judge_accuracy_too_low ... ok
[INFO] [stdout] test bias_correction::tests::test_judge_metrics_perfect_judge ... ok
[INFO] [stdout] test bias_correction::tests::test_judge_metrics_random_judge ... ok
[INFO] [stdout] test bias_correction::tests::test_estimate_success_rate_basic ... ok
[INFO] [stdout] test bias_correction::tests::test_no_negative_examples ... ok
[INFO] [stdout] test bias_correction::tests::test_estimate_success_rate_perfect_judge ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_perfect_accuracy ... ok
[INFO] [stdout] test cli::tests::test_synth_experiment_args_create_config ... ok
[INFO] [stdout] test cli::tests::test_estimate_args_load_data_from_strings ... ok
[INFO] [stdout] test bias_correction::tests::test_no_positive_examples ... ok
[INFO] [stdout] test cli::tests::test_estimate_args_validation ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_invalid_scenario ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_input_validation ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_reproducibility ... ok
[INFO] [stdout] test utils::tests::test_format_percentage ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_zero_accuracy ... ok
[INFO] [stdout] test cli::tests::test_estimate_args_load_data_from_files ... ok
[INFO] [stdout] test synthetic::tests::test_generate_unlabeled_data_basic ... ok
[INFO] [stdout] test synthetic::tests::test_generate_unlabeled_data_extreme_pass_rates ... ok
[INFO] [stdout] test synthetic::tests::test_generate_unlabeled_data_input_validation ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv ... ok
[INFO] [stdout] test utils::tests::test_parse_binary_string_empty ... ok
[INFO] [stdout] test utils::tests::test_parse_binary_string_invalid ... ok
[INFO] [stdout] test utils::tests::test_parse_binary_string_valid ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_nonexistent_file ... ok
[INFO] [stdout] test bias_correction::tests::test_different_confidence_levels ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_with_empty_lines ... ok
[INFO] [stdout] test utils::tests::test_parse_range ... ok
[INFO] [stdout] test utils::tests::test_validate_probability ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_all_scenarios ... ok
[INFO] [stdout] test tests::test_version_is_set ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_different_scenarios_differ ... ok
[INFO] [stdout] test synthetic::tests::test_scenario_accuracy_properties ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_reproducibility ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_with_header ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_invalid_data ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_basic ... ok
[INFO] [stdout] test utils::tests::test_format_float ... ok
[INFO] [stdout] test synthetic::tests::test_run_sensitivity_experiment_tnr ... ok
[INFO] [stdout] test synthetic::tests::test_run_sensitivity_experiment_tpr ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 44 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.07s
[INFO] [stdout] 
[INFO] [stderr]      Running unittests src/main.rs (/opt/rustwide/target/debug/deps/llm_jury-85a5095319dd66aa)
[INFO] [stdout] 
[INFO] [stdout] running 0 tests
[INFO] [stderr]      Running tests/cli_tests.rs (/opt/rustwide/target/debug/deps/cli_tests-1420e0093a9b275f)
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
[INFO] [stdout] 
[INFO] [stdout] 
[INFO] [stdout] running 6 tests
[INFO] [stdout] test test_cli_error_handling ... ok
[INFO] [stdout] test test_cli_version ... ok
[INFO] [stdout] test test_cli_help ... ok
[INFO] [stdout] test test_cli_estimate_basic ... ok
[INFO] [stdout] test test_cli_synth_experiment ... ok
[INFO] [stdout] test test_cli_estimate_with_files ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.20s
[INFO] [stdout] 
[INFO] [stderr]      Running tests/integration_test.rs (/opt/rustwide/target/debug/deps/integration_test-63adfd31b6b76f4e)
[INFO] [stdout] 
[INFO] [stdout] running 11 tests
[INFO] [stdout] test test_performance_benchmark ... ignored
[INFO] [stdout] test test_error_handling ... ok
[INFO] [stdout] test test_boundary_conditions ... ok
[INFO] [stdout] test test_reproducibility ... ok
[INFO] [stdout] test test_utility_functions ... ok
[INFO] [stdout] test test_csv_file_loading ... ok
[INFO] [stdout] test test_confidence_intervals ... ok
[INFO] [stdout] test test_judge_metrics ... ok
[INFO] [stdout] test test_example_scenarios ... ok
[INFO] [stdout] test test_complete_workflow ... ok
[INFO] [stdout] test test_large_dataset ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 10 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.49s
[INFO] [stdout] 
[INFO] [stderr]    Doc-tests llmjury
[INFO] [stdout] 
[INFO] [stdout] running 7 tests
[INFO] [stdout] test src/utils.rs - utils::load_binary_from_csv (line 50) - compile ... ok
[INFO] [stdout] test src/bias_correction.rs - bias_correction::estimate_success_rate (line 124) ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::generate_test_data (line 104) ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::create_example_dataset (line 385) ... ok
[INFO] [stdout] test src/utils.rs - utils::parse_binary_string (line 11) ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::run_sensitivity_experiment (line 268) ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::generate_unlabeled_data (line 178) ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.89s
[INFO] [stdout] 
[INFO] running `Command { std: "docker" "inspect" "9a82cc3c2d9392db35128af528330324a4e70855a1489c4753e652c177100168", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "9a82cc3c2d9392db35128af528330324a4e70855a1489c4753e652c177100168", kill_on_drop: false }`
[INFO] [stdout] 9a82cc3c2d9392db35128af528330324a4e70855a1489c4753e652c177100168
