[INFO] fetching crate recallbench 0.4.0... [INFO] testing recallbench-0.4.0 against try#dec9417b8611e34e787a3e4c37686b5131f9e5c5 for pr-154210-2 [INFO] extracting crate recallbench 0.4.0 into /workspace/builds/worker-0-tc2/source [INFO] started tweaking crates.io crate recallbench 0.4.0 [INFO] finished tweaking crates.io crate recallbench 0.4.0 [INFO] tweaked toml for crates.io crate recallbench 0.4.0 written to /workspace/builds/worker-0-tc2/source/Cargo.toml [INFO] validating manifest of crates.io crate recallbench 0.4.0 on toolchain dec9417b8611e34e787a3e4c37686b5131f9e5c5 [INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+dec9417b8611e34e787a3e4c37686b5131f9e5c5" "metadata" "--manifest-path" "Cargo.toml" "--no-deps", kill_on_drop: false }` [INFO] crate crates.io crate recallbench 0.4.0 already has a lockfile, it will not be regenerated [INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+dec9417b8611e34e787a3e4c37686b5131f9e5c5" "fetch" "--manifest-path" "Cargo.toml", kill_on_drop: false }` [INFO] [stderr] Blocking waiting for file lock on package cache [INFO] [stderr] Blocking waiting for file lock on package cache [INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+dec9417b8611e34e787a3e4c37686b5131f9e5c5" "metadata" "--no-deps" "--format-version=1", kill_on_drop: false }` [INFO] [stdout] 0f55fd3056b35031a9f55c477ce6d5c7991cb55d0e34de3d8ad5c2702493b531 [INFO] running `Command { std: "docker" "start" "-a" "0f55fd3056b35031a9f55c477ce6d5c7991cb55d0e34de3d8ad5c2702493b531", kill_on_drop: false }` [INFO] running `Command { std: "docker" "inspect" "0f55fd3056b35031a9f55c477ce6d5c7991cb55d0e34de3d8ad5c2702493b531", kill_on_drop: false }` [INFO] running `Command { std: "docker" "rm" "-f" "0f55fd3056b35031a9f55c477ce6d5c7991cb55d0e34de3d8ad5c2702493b531", kill_on_drop: false }` [INFO] [stdout] 0f55fd3056b35031a9f55c477ce6d5c7991cb55d0e34de3d8ad5c2702493b531 [INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+dec9417b8611e34e787a3e4c37686b5131f9e5c5" "build" "--frozen" "--message-format=json", kill_on_drop: false }` [INFO] [stdout] 3b67f4df38a9baafd4783d5a8fa368e086d2562fe5d3b889009b867f5fcb22f4 [INFO] running `Command { std: "docker" "start" "-a" "3b67f4df38a9baafd4783d5a8fa368e086d2562fe5d3b889009b867f5fcb22f4", kill_on_drop: false }` [INFO] [stderr] Compiling libc v0.2.183 [INFO] [stderr] Compiling smallvec v1.15.1 [INFO] [stderr] Compiling serde_core v1.0.228 [INFO] [stderr] Compiling slab v0.4.12 [INFO] [stderr] Compiling syn v2.0.117 [INFO] [stderr] Compiling cc v1.2.57 [INFO] [stderr] Compiling pkg-config v0.3.32 [INFO] [stderr] Compiling http-body v1.0.1 [INFO] [stderr] Compiling litemap v0.8.1 [INFO] [stderr] Compiling writeable v0.6.2 [INFO] [stderr] Compiling icu_normalizer_data v2.1.1 [INFO] [stderr] Compiling icu_properties_data v2.1.2 [INFO] [stderr] Compiling crossbeam-utils v0.8.21 [INFO] [stderr] Compiling httpdate v1.0.3 [INFO] [stderr] Compiling foreign-types-shared v0.1.1 [INFO] [stderr] Compiling try-lock v0.2.5 [INFO] [stderr] Compiling openssl v0.10.76 [INFO] [stderr] Compiling atomic-waker v1.1.2 [INFO] [stderr] Compiling zerocopy v0.8.47 [INFO] [stderr] Compiling getrandom v0.3.4 [INFO] [stderr] Compiling want v0.3.1 [INFO] [stderr] Compiling foreign-types v0.3.2 [INFO] [stderr] Compiling http-body-util v0.1.3 [INFO] [stderr] Compiling sync_wrapper v1.0.2 [INFO] [stderr] Compiling futures-channel v0.3.32 [INFO] [stderr] Compiling rustix v1.1.4 [INFO] [stderr] Compiling native-tls v0.2.18 [INFO] [stderr] Compiling mime v0.3.17 [INFO] [stderr] Compiling unicase v2.9.0 [INFO] [stderr] Compiling num-traits v0.2.19 [INFO] [stderr] Compiling form_urlencoded v1.2.2 [INFO] [stderr] Compiling serde_json v1.0.149 [INFO] [stderr] Compiling ipnet v2.12.0 [INFO] [stderr] Compiling portable-atomic v1.13.1 [INFO] [stderr] Compiling openssl-probe v0.2.1 [INFO] [stderr] Compiling anstyle-parse v1.0.0 [INFO] [stderr] Compiling mime_guess v2.0.5 [INFO] [stderr] Compiling adler2 v2.0.1 [INFO] [stderr] Compiling colorchoice v1.0.5 [INFO] [stderr] Compiling litrs v1.0.0 [INFO] [stderr] Compiling anstyle v1.0.14 [INFO] [stderr] Compiling simd-adler32 v0.3.8 [INFO] [stderr] Compiling unicode-width v0.2.2 [INFO] [stderr] Compiling anstream v1.0.0 [INFO] [stderr] Compiling miniz_oxide v0.8.9 [INFO] [stderr] Compiling rust-embed-utils v8.11.0 [INFO] [stderr] Compiling is-docker v0.2.0 [INFO] [stderr] Compiling raw-cpuid v11.6.0 [INFO] [stderr] Compiling iri-string v0.7.10 [INFO] [stderr] Compiling option-ext v0.2.0 [INFO] [stderr] Compiling winnow v0.7.15 [INFO] [stderr] Compiling clap_lex v1.1.0 [INFO] [stderr] Compiling http-range-header v0.4.2 [INFO] [stderr] Compiling zeroize v1.8.2 [INFO] [stderr] Compiling document-features v0.2.12 [INFO] [stderr] Compiling openssl-sys v0.9.112 [INFO] [stderr] Compiling parking_lot_core v0.9.12 [INFO] [stderr] Compiling errno v0.3.14 [INFO] [stderr] Compiling socket2 v0.6.3 [INFO] [stderr] Compiling signal-hook-registry v1.4.8 [INFO] [stderr] Compiling mio v1.1.1 [INFO] [stderr] Compiling rustls-pki-types v1.14.0 [INFO] [stderr] Compiling parking_lot v0.12.5 [INFO] [stderr] Compiling rand_core v0.9.5 [INFO] [stderr] Compiling dashmap v6.1.0 [INFO] [stderr] Compiling console v0.15.11 [INFO] [stderr] Compiling dirs-sys v0.5.0 [INFO] [stderr] Compiling quanta v0.12.6 [INFO] [stderr] Compiling clap_builder v4.6.0 [INFO] [stderr] Compiling flate2 v1.1.9 [INFO] [stderr] Compiling nom v7.1.3 [INFO] [stderr] Compiling is-wsl v0.4.0 [INFO] [stderr] Compiling crossbeam-channel v0.5.15 [INFO] [stderr] Compiling csv-core v0.1.13 [INFO] [stderr] Compiling spinning_top v0.3.0 [INFO] [stderr] Compiling encoding_rs v0.8.35 [INFO] [stderr] Compiling number_prefix v0.4.0 [INFO] [stderr] Compiling unicode-segmentation v1.12.0 [INFO] [stderr] Compiling byteorder v1.5.0 [INFO] [stderr] Compiling nonzero_ext v0.3.0 [INFO] [stderr] Compiling futures-timer v3.0.3 [INFO] [stderr] Compiling env_home v0.1.0 [INFO] [stderr] Compiling web-time v1.1.0 [INFO] [stderr] Compiling no-std-compat v0.4.1 [INFO] [stderr] Compiling base64 v0.21.7 [INFO] [stderr] Compiling matchit v0.8.4 [INFO] [stderr] Compiling serde_path_to_error v0.1.20 [INFO] [stderr] Compiling pathdiff v0.2.3 [INFO] [stderr] Compiling open v5.3.3 [INFO] [stderr] Compiling csv v1.4.0 [INFO] [stderr] Compiling indicatif v0.17.11 [INFO] [stderr] Compiling dirs v6.0.0 [INFO] [stderr] Compiling crossterm v0.29.0 [INFO] [stderr] Compiling which v7.0.3 [INFO] [stderr] Compiling comfy-table v7.2.2 [INFO] [stderr] Compiling synstructure v0.13.2 [INFO] [stderr] Compiling hdrhistogram v7.5.4 [INFO] [stderr] Compiling zerovec-derive v0.11.2 [INFO] [stderr] Compiling tokio-macros v2.6.1 [INFO] [stderr] Compiling displaydoc v0.2.5 [INFO] [stderr] Compiling tracing-attributes v0.1.31 [INFO] [stderr] Compiling serde_derive v1.0.228 [INFO] [stderr] Compiling futures-macro v0.3.32 [INFO] [stderr] Compiling openssl-macros v0.1.1 [INFO] [stderr] Compiling clap_derive v4.6.0 [INFO] [stderr] Compiling rust-embed-impl v8.11.0 [INFO] [stderr] Compiling zerofrom-derive v0.1.6 [INFO] [stderr] Compiling yoke-derive v0.8.1 [INFO] [stderr] Compiling async-trait v0.1.89 [INFO] [stderr] Compiling rust-embed v8.11.0 [INFO] [stderr] Compiling tokio v1.50.0 [INFO] [stderr] Compiling futures-util v0.3.32 [INFO] [stderr] Compiling ppv-lite86 v0.2.21 [INFO] [stderr] Compiling tracing v0.1.44 [INFO] [stderr] Compiling rand_chacha v0.9.0 [INFO] [stderr] Compiling zerofrom v0.1.6 [INFO] [stderr] Compiling axum-core v0.5.6 [INFO] [stderr] Compiling tracing-subscriber v0.3.23 [INFO] [stderr] Compiling yoke v0.8.1 [INFO] [stderr] Compiling rand v0.9.2 [INFO] [stderr] Compiling zerovec v0.11.5 [INFO] [stderr] Compiling zerotrie v0.2.3 [INFO] [stderr] Compiling tinystr v0.8.2 [INFO] [stderr] Compiling potential_utf v0.1.4 [INFO] [stderr] Compiling icu_collections v2.1.1 [INFO] [stderr] Compiling clap v4.6.0 [INFO] [stderr] Compiling icu_locale_core v2.1.1 [INFO] [stderr] Compiling serde v1.0.228 [INFO] [stderr] Compiling icu_provider v2.1.1 [INFO] [stderr] Compiling icu_normalizer v2.1.1 [INFO] [stderr] Compiling icu_properties v2.1.2 [INFO] [stderr] Compiling governor v0.8.1 [INFO] [stderr] Compiling serde_spanned v0.6.9 [INFO] [stderr] Compiling toml_datetime v0.6.11 [INFO] [stderr] Compiling serde_urlencoded v0.7.1 [INFO] [stderr] Compiling chrono v0.4.44 [INFO] [stderr] Compiling toml_edit v0.22.27 [INFO] [stderr] Compiling idna_adapter v1.2.1 [INFO] [stderr] Compiling idna v1.1.0 [INFO] [stderr] Compiling url v2.5.8 [INFO] [stderr] Compiling toml v0.8.23 [INFO] [stderr] Compiling tokio-util v0.7.18 [INFO] [stderr] Compiling tower v0.5.3 [INFO] [stderr] Compiling tokio-native-tls v0.3.1 [INFO] [stderr] Compiling h2 v0.4.13 [INFO] [stderr] Compiling tower-http v0.6.8 [INFO] [stderr] Compiling hyper v1.8.1 [INFO] [stderr] Compiling hyper-util v0.1.20 [INFO] [stderr] Compiling hyper-tls v0.6.0 [INFO] [stderr] Compiling axum v0.8.8 [INFO] [stderr] Compiling reqwest v0.12.28 [INFO] [stderr] Compiling recallbench v0.4.0 (/opt/rustwide/workdir) [INFO] [stderr] Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 26s [INFO] running `Command { std: "docker" "inspect" "3b67f4df38a9baafd4783d5a8fa368e086d2562fe5d3b889009b867f5fcb22f4", kill_on_drop: false }` [INFO] running `Command { std: "docker" "rm" "-f" "3b67f4df38a9baafd4783d5a8fa368e086d2562fe5d3b889009b867f5fcb22f4", kill_on_drop: false }` [INFO] [stdout] 3b67f4df38a9baafd4783d5a8fa368e086d2562fe5d3b889009b867f5fcb22f4 [INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+dec9417b8611e34e787a3e4c37686b5131f9e5c5" "test" "--frozen" "--no-run" "--message-format=json", kill_on_drop: false }` [INFO] [stdout] fefd736ff993c51cee9a084965c2f606dbc6f1e68adcfa8bf9653d9cce23b55d [INFO] running `Command { std: "docker" "start" "-a" "fefd736ff993c51cee9a084965c2f606dbc6f1e68adcfa8bf9653d9cce23b55d", kill_on_drop: false }` [INFO] [stderr] Compiling rustix v1.1.4 [INFO] [stderr] Compiling getrandom v0.4.2 [INFO] [stderr] Compiling crossterm v0.29.0 [INFO] [stderr] Compiling which v7.0.3 [INFO] [stderr] Compiling tempfile v3.27.0 [INFO] [stderr] Compiling comfy-table v7.2.2 [INFO] [stderr] Compiling recallbench v0.4.0 (/opt/rustwide/workdir) [INFO] [stderr] Finished `test` profile [unoptimized + debuginfo] target(s) in 29.24s [INFO] running `Command { std: "docker" "inspect" "fefd736ff993c51cee9a084965c2f606dbc6f1e68adcfa8bf9653d9cce23b55d", kill_on_drop: false }` [INFO] running `Command { std: "docker" "rm" "-f" "fefd736ff993c51cee9a084965c2f606dbc6f1e68adcfa8bf9653d9cce23b55d", kill_on_drop: false }` [INFO] [stdout] fefd736ff993c51cee9a084965c2f606dbc6f1e68adcfa8bf9653d9cce23b55d [INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-0-tc2/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=forbid" "-e" "RUSTDOCFLAGS=--cap-lints=forbid" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+dec9417b8611e34e787a3e4c37686b5131f9e5c5" "test" "--frozen", kill_on_drop: false }` [INFO] [stdout] 6898f8d389f428ac42121e3819ba6738ad232ff82d1d1734d4c130d114656eef [INFO] running `Command { std: "docker" "start" "-a" "6898f8d389f428ac42121e3819ba6738ad232ff82d1d1734d4c130d114656eef", kill_on_drop: false }` [INFO] [stderr] Finished `test` profile [unoptimized + debuginfo] target(s) in 0.42s [INFO] [stderr] Running unittests src/lib.rs (/opt/rustwide/target/debug/deps/recallbench-e47af41e76ba9146) [INFO] [stdout] [INFO] [stdout] running 127 tests [INFO] [stdout] test config::tests::load_nonexistent_file ... ok [INFO] [stdout] test config::tests::parse_empty_toml ... ok [INFO] [stdout] test config::tests::default_config ... ok [INFO] [stdout] test config::tests::parse_toml ... ok [INFO] [stdout] test datasets::convomem::tests::abstention_category ... ok [INFO] [stdout] test checkpoint::tests::load_nonexistent ... ok [INFO] [stdout] test config::tests::env_overrides ... ok [INFO] [stdout] test datasets::convomem::tests::parse_convomem_format ... ok [INFO] [stdout] test datasets::custom::tests::parse_valid_custom ... ok [INFO] [stdout] test datasets::custom::tests::validate_empty_dataset ... ok [INFO] [stdout] test datasets::custom::tests::validate_missing_fields ... ok [INFO] [stdout] test datasets::download::tests::cache_dir_is_valid ... ok [INFO] [stdout] test datasets::halumem::tests::parse_halumem_jsonl ... ok [INFO] [stdout] test datasets::download::tests::is_cached_nonexistent ... ok [INFO] [stdout] test datasets::longmemeval::tests::dataset_trait ... ok [INFO] [stdout] test datasets::longmemeval::tests::parse_sample_json ... ok [INFO] [stdout] test datasets::mab::tests::split_names ... ok [INFO] [stdout] test datasets::longmemeval::tests::stats ... ok [INFO] [stdout] test datasets::longmemeval::tests::question_types_detected ... ok [INFO] [stdout] test datasets::membench::tests::parse_membench_format ... ok [INFO] [stdout] test datasets::tests::registry_list_sorted ... ok [INFO] [stdout] test datasets::longmemeval::tests::type_distribution ... ok [INFO] [stdout] test datasets::tests::registry_has_all_datasets ... ok [INFO] [stdout] test datasets::locomo::tests::category_mapping ... ok [INFO] [stdout] test datasets::longmemeval::tests::abstention_detection ... ok [INFO] [stdout] test datasets::longmemeval::tests::variant_parsing ... ok [INFO] [stdout] test datasets::longmemeval::tests::answer_formats ... ok [INFO] [stdout] test datasets::mab::tests::parse_json_format ... ok [INFO] [stdout] test errors::tests::display_judge_error ... ok [INFO] [stdout] test datasets::tests::registry_unknown_returns_none ... ok [INFO] [stdout] test errors::tests::display_config_error ... ok [INFO] [stdout] test errors::tests::display_dataset_error ... ok [INFO] [stdout] test datasets::longmemeval::tests::sessions_parsed ... ok [INFO] [stdout] test errors::tests::display_llm_error ... ok [INFO] [stdout] test errors::tests::display_system_error ... ok [INFO] [stdout] test errors::tests::error_is_send_sync ... ok [INFO] [stdout] test errors::tests::from_io_error ... ok [INFO] [stdout] test judge::calibration::tests::calibration_with_mock ... ok [INFO] [stdout] test judge::calibration::tests::parse_calibration_json ... ok [INFO] [stdout] test judge::dual::tests::ambiguous_with_tiebreaker ... ok [INFO] [stdout] test judge::dual::tests::ambiguous_without_tiebreaker ... ok [INFO] [stdout] test judge::dual::tests::clear_no ... ok [INFO] [stdout] test judge::dual::tests::clear_yes ... ok [INFO] [stdout] test judge::prompts::tests::abstention_prompt ... ok [INFO] [stdout] test judge::prompts::tests::default_prompt ... ok [INFO] [stdout] test checkpoint::tests::checkpoint_roundtrip ... ok [INFO] [stdout] test judge::prompts::tests::multi_session_prompt ... ok [INFO] [stdout] test judge::prompts::tests::preference_prompt ... ok [INFO] [stdout] test judge::prompts::tests::prompt_includes_all_parts ... ok [INFO] [stdout] test judge::prompts::tests::temporal_prompt ... ok [INFO] [stdout] test judge::tests::judge_ambiguous_defaults_false ... ok [INFO] [stdout] test judge::tests::judge_answer_no ... ok [INFO] [stdout] test judge::prompts::tests::knowledge_update_prompt ... ok [INFO] [stdout] test judge::tests::judge_answer_yes ... ok [INFO] [stdout] test judge::tests::parse_contains_no ... ok [INFO] [stdout] test judge::tests::parse_contains_yes ... ok [INFO] [stdout] test judge::tests::parse_no_variants ... ok [INFO] [stdout] test judge::tests::parse_yes_variants ... ok [INFO] [stdout] test judge::tests::parse_ambiguous ... ok [INFO] [stdout] test llm::cli::tests::chatgpt_args ... ok [INFO] [stdout] test llm::cli::tests::claude_args ... ok [INFO] [stdout] test llm::cli::tests::cli_client_name ... ok [INFO] [stdout] test llm::codex::tests::client_name ... ok [INFO] [stdout] test llm::tests::registry_defaults ... ok [INFO] [stdout] test llm::tests::resolve_chatgpt_models ... ok [INFO] [stdout] test llm::rate_limit::tests::rate_limited_client_works ... ok [INFO] [stdout] test llm::tests::resolve_claude_models ... ok [INFO] [stdout] test llm::tests::resolve_codex ... ok [INFO] [stdout] test llm::tests::resolve_gemini_models ... ok [INFO] [stdout] test llm::tests::resolve_unknown_defaults_to_claude ... ok [INFO] [stdout] test longevity::tests::generate_questions_correct_count ... ok [INFO] [stdout] test longevity::tests::generate_sessions_correct_count ... ok [INFO] [stdout] test longevity::tests::render_table_works ... ok [INFO] [stdout] test metrics::accuracy::tests::abstention_accuracy ... ok [INFO] [stdout] test metrics::accuracy::tests::all_correct ... ok [INFO] [stdout] test metrics::accuracy::tests::empty_results ... ok [INFO] [stdout] test metrics::accuracy::tests::mixed_results ... ok [INFO] [stdout] test metrics::accuracy::tests::no_abstention_questions ... ok [INFO] [stdout] test metrics::cost::tests::compute_costs ... ok [INFO] [stdout] test metrics::cost::tests::one_million_tokens ... ok [INFO] [stdout] test metrics::cost::tests::empty_results ... ok [INFO] [stdout] test metrics::latency::tests::empty_results ... ok [INFO] [stdout] test report::json::tests::build_report ... ok [INFO] [stdout] test report::markdown::tests::markdown_output ... ok [INFO] [stdout] test metrics::latency::tests::single_result ... ok [INFO] [stdout] test report::csv::tests::csv_output ... ok [INFO] [stdout] test report::failure::tests::mixed_failures ... ok [INFO] [stdout] test report::failure::tests::no_failures ... ok [INFO] [stdout] test report::tests::parse_formats ... ok [INFO] [stdout] test runner::tests::generation_prompt_with_date ... ok [INFO] [stdout] test report::table::tests::render_accuracy ... ok [INFO] [stdout] test report::table::tests::format_numbers ... ok [INFO] [stdout] test resume::tests::append_and_load ... ok [INFO] [stdout] test resume::tests::load_nonexistent ... ok [INFO] [stdout] test runner::tests::evaluate_with_echo_and_mock ... ok [INFO] [stdout] test runner::tests::generation_prompt_without_date ... ok [INFO] [stdout] test sampling::tests::at_least_one_per_type ... ok [INFO] [stdout] test sampling::tests::deterministic_with_same_seed ... ok [INFO] [stdout] test systems::echo::tests::echo_ingest_and_retrieve ... ok [INFO] [stdout] test sampling::tests::different_seed_different_selection ... ok [INFO] [stdout] test sampling::tests::maintains_proportions ... ok [INFO] [stdout] test sampling::tests::returns_all_if_subset_larger ... ok [INFO] [stdout] test sampling::tests::returns_empty_for_empty ... ok [INFO] [stdout] test types::tests::benchmark_question_defaults ... ok [INFO] [stdout] test systems::echo::tests::echo_name_and_version ... ok [INFO] [stdout] test systems::echo::tests::echo_reset ... ok [INFO] [stdout] test systems::echo::tests::echo_respects_token_budget ... ok [INFO] [stdout] test systems::subprocess::tests::parse_config ... ok [INFO] [stdout] test types::tests::benchmark_question_roundtrip ... ok [INFO] [stdout] test types::tests::conversation_session_no_date ... ok [INFO] [stdout] test types::tests::eval_result_roundtrip ... ok [INFO] [stdout] test types::tests::conversation_session_roundtrip ... ok [INFO] [stdout] test types::tests::ingest_stats_default ... ok [INFO] [stdout] test resume::tests::handles_malformed_lines ... ok [INFO] [stdout] test types::tests::turn_roundtrip ... ok [INFO] [stdout] test types::tests::eval_result_jsonl_format ... ok [INFO] [stdout] test types::tests::retrieval_result_roundtrip ... ok [INFO] [stdout] test metrics::latency::tests::multiple_results ... ok [INFO] [stdout] test llm::anthropic::tests::client_name ... ok [INFO] [stdout] test llm::compatible::tests::client_direct_construction ... ok [INFO] [stdout] test llm::openai::tests::client_name ... ok [INFO] [stdout] test llm::gemini::tests::client_name ... ok [INFO] [stdout] test llm::compatible::tests::base_url_trailing_slash_stripped ... ok [INFO] [stdout] test llm::gemini::tests::model_resolution ... ok [INFO] [stdout] test systems::http::tests::parse_config ... ok [INFO] [stdout] test llm::compatible::tests::client_from_config ... ok [INFO] [stdout] test llm::anthropic::tests::model_resolution ... ok [INFO] [stdout] [INFO] [stdout] test result: ok. 127 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.70s [INFO] [stdout] [INFO] [stderr] Running unittests src/main.rs (/opt/rustwide/target/debug/deps/recallbench-e71f413e4fba0699) [INFO] [stdout] [INFO] [stdout] running 127 tests [INFO] [stdout] test config::tests::default_config ... ok [INFO] [stdout] test datasets::custom::tests::parse_valid_custom ... ok [INFO] [stdout] test config::tests::load_nonexistent_file ... ok [INFO] [stdout] test config::tests::parse_toml ... ok [INFO] [stdout] test checkpoint::tests::load_nonexistent ... ok [INFO] [stdout] test datasets::convomem::tests::parse_convomem_format ... ok [INFO] [stdout] test config::tests::parse_empty_toml ... ok [INFO] [stdout] test datasets::convomem::tests::abstention_category ... ok [INFO] [stdout] test datasets::custom::tests::validate_missing_fields ... ok [INFO] [stdout] test datasets::download::tests::cache_dir_is_valid ... ok [INFO] [stdout] test config::tests::env_overrides ... ok [INFO] [stdout] test datasets::custom::tests::validate_empty_dataset ... ok [INFO] [stdout] test datasets::longmemeval::tests::abstention_detection ... ok [INFO] [stdout] test datasets::longmemeval::tests::question_types_detected ... ok [INFO] [stdout] test datasets::halumem::tests::parse_halumem_jsonl ... ok [INFO] [stdout] test datasets::longmemeval::tests::answer_formats ... ok [INFO] [stdout] test checkpoint::tests::checkpoint_roundtrip ... ok [INFO] [stdout] test datasets::membench::tests::parse_membench_format ... ok [INFO] [stdout] test errors::tests::display_config_error ... ok [INFO] [stdout] test datasets::longmemeval::tests::parse_sample_json ... ok [INFO] [stdout] test errors::tests::display_judge_error ... ok [INFO] [stdout] test errors::tests::display_system_error ... ok [INFO] [stdout] test errors::tests::display_llm_error ... ok [INFO] [stdout] test errors::tests::error_is_send_sync ... ok [INFO] [stdout] test errors::tests::from_io_error ... ok [INFO] [stdout] test datasets::longmemeval::tests::sessions_parsed ... ok [INFO] [stdout] test judge::calibration::tests::calibration_with_mock ... ok [INFO] [stdout] test judge::calibration::tests::parse_calibration_json ... ok [INFO] [stdout] test datasets::longmemeval::tests::variant_parsing ... ok [INFO] [stdout] test judge::dual::tests::ambiguous_with_tiebreaker ... ok [INFO] [stdout] test datasets::mab::tests::parse_json_format ... ok [INFO] [stdout] test judge::dual::tests::ambiguous_without_tiebreaker ... ok [INFO] [stdout] test judge::dual::tests::clear_no ... ok [INFO] [stdout] test judge::dual::tests::clear_yes ... ok [INFO] [stdout] test judge::prompts::tests::abstention_prompt ... ok [INFO] [stdout] test judge::prompts::tests::default_prompt ... ok [INFO] [stdout] test judge::prompts::tests::knowledge_update_prompt ... ok [INFO] [stdout] test datasets::longmemeval::tests::dataset_trait ... ok [INFO] [stdout] test judge::prompts::tests::temporal_prompt ... ok [INFO] [stdout] test judge::prompts::tests::multi_session_prompt ... ok [INFO] [stdout] test judge::tests::judge_ambiguous_defaults_false ... ok [INFO] [stdout] test judge::prompts::tests::preference_prompt ... ok [INFO] [stdout] test judge::prompts::tests::prompt_includes_all_parts ... ok [INFO] [stdout] test judge::tests::judge_answer_no ... ok [INFO] [stdout] test judge::tests::parse_contains_no ... ok [INFO] [stdout] test judge::tests::parse_contains_yes ... ok [INFO] [stdout] test judge::tests::judge_answer_yes ... ok [INFO] [stdout] test judge::tests::parse_no_variants ... ok [INFO] [stdout] test judge::tests::parse_yes_variants ... ok [INFO] [stdout] test judge::tests::parse_ambiguous ... ok [INFO] [stdout] test llm::cli::tests::claude_args ... ok [INFO] [stdout] test llm::cli::tests::cli_client_name ... ok [INFO] [stdout] test datasets::longmemeval::tests::type_distribution ... ok [INFO] [stdout] test datasets::mab::tests::split_names ... ok [INFO] [stdout] test datasets::download::tests::is_cached_nonexistent ... ok [INFO] [stdout] test llm::cli::tests::chatgpt_args ... ok [INFO] [stdout] test llm::codex::tests::client_name ... ok [INFO] [stdout] test llm::tests::registry_defaults ... ok [INFO] [stdout] test llm::tests::resolve_chatgpt_models ... ok [INFO] [stdout] test llm::tests::resolve_claude_models ... ok [INFO] [stdout] test llm::tests::resolve_codex ... ok [INFO] [stdout] test llm::tests::resolve_gemini_models ... ok [INFO] [stdout] test llm::rate_limit::tests::rate_limited_client_works ... ok [INFO] [stdout] test llm::tests::resolve_unknown_defaults_to_claude ... ok [INFO] [stdout] test longevity::tests::generate_questions_correct_count ... ok [INFO] [stdout] test longevity::tests::render_table_works ... ok [INFO] [stdout] test metrics::accuracy::tests::abstention_accuracy ... ok [INFO] [stdout] test datasets::tests::registry_has_all_datasets ... ok [INFO] [stdout] test datasets::tests::registry_list_sorted ... ok [INFO] [stdout] test datasets::tests::registry_unknown_returns_none ... ok [INFO] [stdout] test errors::tests::display_dataset_error ... ok [INFO] [stdout] test datasets::locomo::tests::category_mapping ... ok [INFO] [stdout] test longevity::tests::generate_sessions_correct_count ... ok [INFO] [stdout] test datasets::longmemeval::tests::stats ... ok [INFO] [stdout] test metrics::cost::tests::one_million_tokens ... ok [INFO] [stdout] test metrics::latency::tests::multiple_results ... ok [INFO] [stdout] test metrics::latency::tests::single_result ... ok [INFO] [stdout] test report::csv::tests::csv_output ... ok [INFO] [stdout] test report::failure::tests::mixed_failures ... ok [INFO] [stdout] test report::failure::tests::no_failures ... ok [INFO] [stdout] test report::json::tests::build_report ... ok [INFO] [stdout] test report::markdown::tests::markdown_output ... ok [INFO] [stdout] test report::table::tests::format_numbers ... ok [INFO] [stdout] test metrics::latency::tests::empty_results ... ok [INFO] [stdout] test report::table::tests::render_accuracy ... ok [INFO] [stdout] test report::tests::parse_formats ... ok [INFO] [stdout] test resume::tests::append_and_load ... ok [INFO] [stdout] test resume::tests::load_nonexistent ... ok [INFO] [stdout] test runner::tests::evaluate_with_echo_and_mock ... ok [INFO] [stdout] test resume::tests::handles_malformed_lines ... ok [INFO] [stdout] test runner::tests::generation_prompt_with_date ... ok [INFO] [stdout] test runner::tests::generation_prompt_without_date ... ok [INFO] [stdout] test sampling::tests::at_least_one_per_type ... ok [INFO] [stdout] test sampling::tests::deterministic_with_same_seed ... ok [INFO] [stdout] test metrics::cost::tests::empty_results ... ok [INFO] [stdout] test metrics::accuracy::tests::no_abstention_questions ... ok [INFO] [stdout] test metrics::accuracy::tests::empty_results ... ok [INFO] [stdout] test metrics::accuracy::tests::all_correct ... ok [INFO] [stdout] test metrics::accuracy::tests::mixed_results ... ok [INFO] [stdout] test systems::echo::tests::echo_reset ... ok [INFO] [stdout] test systems::echo::tests::echo_respects_token_budget ... ok [INFO] [stdout] test metrics::cost::tests::compute_costs ... ok [INFO] [stdout] test systems::subprocess::tests::parse_config ... ok [INFO] [stdout] test sampling::tests::maintains_proportions ... ok [INFO] [stdout] test systems::echo::tests::echo_name_and_version ... ok [INFO] [stdout] test types::tests::benchmark_question_defaults ... ok [INFO] [stdout] test types::tests::conversation_session_roundtrip ... ok [INFO] [stdout] test types::tests::eval_result_jsonl_format ... ok [INFO] [stdout] test types::tests::eval_result_roundtrip ... ok [INFO] [stdout] test sampling::tests::different_seed_different_selection ... ok [INFO] [stdout] test types::tests::ingest_stats_default ... ok [INFO] [stdout] test types::tests::retrieval_result_roundtrip ... ok [INFO] [stdout] test types::tests::turn_roundtrip ... ok [INFO] [stdout] test types::tests::benchmark_question_roundtrip ... ok [INFO] [stdout] test types::tests::conversation_session_no_date ... ok [INFO] [stdout] test sampling::tests::returns_all_if_subset_larger ... ok [INFO] [stdout] test systems::echo::tests::echo_ingest_and_retrieve ... ok [INFO] [stdout] test sampling::tests::returns_empty_for_empty ... ok [INFO] [stdout] test llm::gemini::tests::model_resolution ... ok [INFO] [stdout] test llm::compatible::tests::base_url_trailing_slash_stripped ... ok [INFO] [stdout] test llm::compatible::tests::client_from_config ... ok [INFO] [stdout] test llm::gemini::tests::client_name ... ok [INFO] [stdout] test llm::compatible::tests::client_direct_construction ... ok [INFO] [stdout] test systems::http::tests::parse_config ... ok [INFO] [stdout] test llm::openai::tests::client_name ... ok [INFO] [stdout] test llm::anthropic::tests::client_name ... ok [INFO] [stdout] test llm::anthropic::tests::model_resolution ... ok [INFO] [stdout] [INFO] [stdout] test result: ok. 127 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.09s [INFO] [stdout] [INFO] [stderr] Doc-tests recallbench [INFO] [stdout] [INFO] [stdout] running 1 test [INFO] [stdout] test src/lib.rs - (line 10) ... ignored [INFO] [stdout] [INFO] [stdout] test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s [INFO] [stdout] [INFO] [stdout] all doctests ran in 0.35s; merged doctests compilation took 0.35s [INFO] running `Command { std: "docker" "inspect" "6898f8d389f428ac42121e3819ba6738ad232ff82d1d1734d4c130d114656eef", kill_on_drop: false }` [INFO] running `Command { std: "docker" "rm" "-f" "6898f8d389f428ac42121e3819ba6738ad232ff82d1d1734d4c130d114656eef", kill_on_drop: false }` [INFO] [stdout] 6898f8d389f428ac42121e3819ba6738ad232ff82d1d1734d4c130d114656eef