[INFO] fetching crate rusty-llm-jury 0.1.0...
[INFO] testing rusty-llm-jury-0.1.0 against master#c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38 for pr-146098-7
[INFO] extracting crate rusty-llm-jury 0.1.0 into /workspace/builds/worker-4-tc1/source
[INFO] started tweaking crates.io crate rusty-llm-jury 0.1.0
[INFO] removed 0 missing tests
[INFO] finished tweaking crates.io crate rusty-llm-jury 0.1.0
[INFO] tweaked toml for crates.io crate rusty-llm-jury 0.1.0 written to /workspace/builds/worker-4-tc1/source/Cargo.toml
[INFO] validating manifest of crates.io crate rusty-llm-jury 0.1.0 on toolchain c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38
[INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38" "metadata" "--manifest-path" "Cargo.toml" "--no-deps", kill_on_drop: false }`
[INFO] crate crates.io crate rusty-llm-jury 0.1.0 already has a lockfile, it will not be regenerated
[INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38" "fetch" "--manifest-path" "Cargo.toml", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:4848fb76d95f26979359cc7e45710b1dbc8f3acb7aeedee7c460d7702230f228" "/opt/rustwide/cargo-home/bin/cargo" "+c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38" "metadata" "--no-deps" "--format-version=1", kill_on_drop: false }`
[INFO] [stdout] 2f0ea60e88d751cfc5c536d41b5cb4bf64292c125ae3cbb629f28086bdd24fd2
[INFO] running `Command { std: "docker" "start" "-a" "2f0ea60e88d751cfc5c536d41b5cb4bf64292c125ae3cbb629f28086bdd24fd2", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "inspect" "2f0ea60e88d751cfc5c536d41b5cb4bf64292c125ae3cbb629f28086bdd24fd2", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "2f0ea60e88d751cfc5c536d41b5cb4bf64292c125ae3cbb629f28086bdd24fd2", kill_on_drop: false }`
[INFO] [stdout] 2f0ea60e88d751cfc5c536d41b5cb4bf64292c125ae3cbb629f28086bdd24fd2
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:4848fb76d95f26979359cc7e45710b1dbc8f3acb7aeedee7c460d7702230f228" "/opt/rustwide/cargo-home/bin/cargo" "+c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38" "build" "--frozen" "--message-format=json", kill_on_drop: false }`
[INFO] [stdout] baec76d70cb821b5d1d48dd48e4ac72765bf26fc0e7aee67df84556150c2b185
[INFO] running `Command { std: "docker" "start" "-a" "baec76d70cb821b5d1d48dd48e4ac72765bf26fc0e7aee67df84556150c2b185", kill_on_drop: false }`
[INFO] [stderr]    Compiling libc v0.2.172
[INFO] [stderr]    Compiling zerocopy v0.8.25
[INFO] [stderr]    Compiling matrixmultiply v0.3.10
[INFO] [stderr]    Compiling csv-core v0.1.12
[INFO] [stderr]    Compiling syn v2.0.101
[INFO] [stderr]    Compiling clap_builder v4.5.39
[INFO] [stderr]    Compiling num-complex v0.4.6
[INFO] [stderr]    Compiling num-integer v0.1.46
[INFO] [stderr]    Compiling ndarray v0.15.6
[INFO] [stderr]    Compiling getrandom v0.2.16
[INFO] [stderr]    Compiling rand_core v0.6.4
[INFO] [stderr]    Compiling ppv-lite86 v0.2.21
[INFO] [stderr]    Compiling rand_chacha v0.3.1
[INFO] [stderr]    Compiling rand v0.8.5
[INFO] [stderr]    Compiling serde_derive v1.0.219
[INFO] [stderr]    Compiling thiserror-impl v1.0.69
[INFO] [stderr]    Compiling clap_derive v4.5.32
[INFO] [stderr]    Compiling thiserror v1.0.69
[INFO] [stderr]    Compiling clap v4.5.39
[INFO] [stderr]    Compiling serde v1.0.219
[INFO] [stderr]    Compiling serde_json v1.0.140
[INFO] [stderr]    Compiling csv v1.3.1
[INFO] [stderr]    Compiling rusty-llm-jury v0.1.0 (/opt/rustwide/workdir)
[INFO] [stderr]     Finished `dev` profile [unoptimized + debuginfo] target(s) in 14.60s
[INFO] running `Command { std: "docker" "inspect" "baec76d70cb821b5d1d48dd48e4ac72765bf26fc0e7aee67df84556150c2b185", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "baec76d70cb821b5d1d48dd48e4ac72765bf26fc0e7aee67df84556150c2b185", kill_on_drop: false }`
[INFO] [stdout] baec76d70cb821b5d1d48dd48e4ac72765bf26fc0e7aee67df84556150c2b185
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:4848fb76d95f26979359cc7e45710b1dbc8f3acb7aeedee7c460d7702230f228" "/opt/rustwide/cargo-home/bin/cargo" "+c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38" "test" "--frozen" "--no-run" "--message-format=json", kill_on_drop: false }`
[INFO] [stdout] 756a89b693e5c3631398a04efb54c898967b2de25803df3c228065db6da73867
[INFO] running `Command { std: "docker" "start" "-a" "756a89b693e5c3631398a04efb54c898967b2de25803df3c228065db6da73867", kill_on_drop: false }`
[INFO] [stderr]    Compiling rustix v1.0.7
[INFO] [stderr]    Compiling linux-raw-sys v0.9.4
[INFO] [stderr]    Compiling bitflags v2.9.1
[INFO] [stderr]    Compiling getrandom v0.3.3
[INFO] [stderr]    Compiling approx v0.5.1
[INFO] [stderr]    Compiling tempfile v3.20.0
[INFO] [stderr]    Compiling rusty-llm-jury v0.1.0 (/opt/rustwide/workdir)
[INFO] [stderr]     Finished `test` profile [unoptimized + debuginfo] target(s) in 5.49s
[INFO] running `Command { std: "docker" "inspect" "756a89b693e5c3631398a04efb54c898967b2de25803df3c228065db6da73867", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "756a89b693e5c3631398a04efb54c898967b2de25803df3c228065db6da73867", kill_on_drop: false }`
[INFO] [stdout] 756a89b693e5c3631398a04efb54c898967b2de25803df3c228065db6da73867
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-4-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:4848fb76d95f26979359cc7e45710b1dbc8f3acb7aeedee7c460d7702230f228" "/opt/rustwide/cargo-home/bin/cargo" "+c90bcb9571b7aab0d8beaa2ce8a998ffaf079d38" "test" "--frozen", kill_on_drop: false }`
[INFO] [stdout] dbcc18dc61ced8f028ebe2df2f7c3bc3c851917a360abc6c5a804f234a8743ab
[INFO] running `Command { std: "docker" "start" "-a" "dbcc18dc61ced8f028ebe2df2f7c3bc3c851917a360abc6c5a804f234a8743ab", kill_on_drop: false }`
[INFO] [stderr]     Finished `test` profile [unoptimized + debuginfo] target(s) in 0.11s
[INFO] [stderr]      Running unittests src/lib.rs (/opt/rustwide/target/debug/deps/llmjury-f17142e3d6651ccb)
[INFO] [stdout] 
[INFO] [stdout] running 44 tests
[INFO] [stdout] test bias_correction::tests::test_input_validation_empty_arrays ... ok
[INFO] [stdout] test bias_correction::tests::test_estimate_success_rate_basic ... ok
[INFO] [stdout] test bias_correction::tests::test_different_confidence_levels ... ok
[INFO] [stdout] test bias_correction::tests::test_input_validation_invalid_confidence_level ... ok
[INFO] [stdout] test bias_correction::tests::test_input_validation_non_binary ... ok
[INFO] [stdout] test bias_correction::tests::test_judge_accuracy_too_low ... ok
[INFO] [stdout] test bias_correction::tests::test_judge_metrics_perfect_judge ... ok
[INFO] [stdout] test bias_correction::tests::test_judge_metrics_random_judge ... ok
[INFO] [stdout] test bias_correction::tests::test_no_negative_examples ... ok
[INFO] [stdout] test bias_correction::tests::test_no_positive_examples ... ok
[INFO] [stdout] test cli::tests::test_estimate_args_validation ... ok
[INFO] [stdout] test bias_correction::tests::test_estimate_success_rate_perfect_judge ... ok
[INFO] [stdout] test cli::tests::test_synth_experiment_args_create_config ... ok
[INFO] [stdout] test cli::tests::test_estimate_args_load_data_from_strings ... ok
[INFO] [stdout] test bias_correction::tests::test_input_validation_mismatched_lengths ... ok
[INFO] [stdout] test cli::tests::test_estimate_args_load_data_from_files ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_perfect_accuracy ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_invalid_scenario ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_input_validation ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_zero_accuracy ... ok
[INFO] [stdout] test synthetic::tests::test_generate_unlabeled_data_extreme_pass_rates ... ok
[INFO] [stdout] test synthetic::tests::test_generate_unlabeled_data_input_validation ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_reproducibility ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_basic ... ok
[INFO] [stdout] test synthetic::tests::test_generate_test_data_reproducibility ... ok
[INFO] [stdout] test synthetic::tests::test_generate_unlabeled_data_basic ... ok
[INFO] [stdout] test tests::test_version_is_set ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_different_scenarios_differ ... ok
[INFO] [stdout] test utils::tests::test_format_float ... ok
[INFO] [stdout] test utils::tests::test_format_percentage ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_invalid_data ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_nonexistent_file ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_with_empty_lines ... ok
[INFO] [stdout] test utils::tests::test_parse_binary_string_empty ... ok
[INFO] [stdout] test utils::tests::test_parse_binary_string_valid ... ok
[INFO] [stdout] test utils::tests::test_load_binary_from_csv_with_header ... ok
[INFO] [stdout] test utils::tests::test_parse_range ... ok
[INFO] [stdout] test utils::tests::test_validate_probability ... ok
[INFO] [stdout] test utils::tests::test_parse_binary_string_invalid ... ok
[INFO] [stdout] test synthetic::tests::test_create_example_dataset_all_scenarios ... ok
[INFO] [stdout] test synthetic::tests::test_scenario_accuracy_properties ... ok
[INFO] [stdout] test synthetic::tests::test_run_sensitivity_experiment_tpr ... ok
[INFO] [stdout] test synthetic::tests::test_run_sensitivity_experiment_tnr ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 44 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.06s
[INFO] [stdout] 
[INFO] [stderr]      Running unittests src/main.rs (/opt/rustwide/target/debug/deps/llm_jury-8de16dc533609b04)
[INFO] [stdout] 
[INFO] [stdout] running 0 tests
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
[INFO] [stdout] 
[INFO] [stderr]      Running tests/cli_tests.rs (/opt/rustwide/target/debug/deps/cli_tests-760f0cd34cfc3f5b)
[INFO] [stdout] 
[INFO] [stdout] running 6 tests
[INFO] [stdout] test test_cli_help ... ok
[INFO] [stdout] test test_cli_estimate_with_files ... ok
[INFO] [stdout] test test_cli_estimate_basic ... ok
[INFO] [stdout] test test_cli_version ... ok
[INFO] [stdout] test test_cli_synth_experiment ... ok
[INFO] [stdout] test test_cli_error_handling ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.32s
[INFO] [stdout] 
[INFO] [stderr]      Running tests/integration_test.rs (/opt/rustwide/target/debug/deps/integration_test-5d214d06b60fd0a0)
[INFO] [stdout] 
[INFO] [stdout] running 11 tests
[INFO] [stdout] test test_boundary_conditions ... ok
[INFO] [stdout] test test_confidence_intervals ... ok
[INFO] [stdout] test test_performance_benchmark ... ignored
[INFO] [stdout] test test_csv_file_loading ... ok
[INFO] [stdout] test test_reproducibility ... ok
[INFO] [stdout] test test_judge_metrics ... ok
[INFO] [stdout] test test_complete_workflow ... ok
[INFO] [stdout] test test_utility_functions ... ok
[INFO] [stdout] test test_example_scenarios ... ok
[INFO] [stdout] test test_error_handling ... ok
[INFO] [stdout] test test_large_dataset ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 10 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.41s
[INFO] [stdout] 
[INFO] [stderr]    Doc-tests llmjury
[INFO] [stdout] 
[INFO] [stdout] running 7 tests
[INFO] [stdout] test src/utils.rs - utils::load_binary_from_csv (line 50) - compile ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::generate_test_data (line 104) ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::generate_unlabeled_data (line 178) ... ok
[INFO] [stdout] test src/bias_correction.rs - bias_correction::estimate_success_rate (line 124) ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::create_example_dataset (line 385) ... ok
[INFO] [stdout] test src/utils.rs - utils::parse_binary_string (line 11) ... ok
[INFO] [stdout] test src/synthetic.rs - synthetic::run_sensitivity_experiment (line 268) ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.97s
[INFO] [stdout] 
[INFO] running `Command { std: "docker" "inspect" "dbcc18dc61ced8f028ebe2df2f7c3bc3c851917a360abc6c5a804f234a8743ab", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "dbcc18dc61ced8f028ebe2df2f7c3bc3c851917a360abc6c5a804f234a8743ab", kill_on_drop: false }`
[INFO] [stdout] dbcc18dc61ced8f028ebe2df2f7c3bc3c851917a360abc6c5a804f234a8743ab
