Danish legislation prescribes surveillance of footpad dermatitis (FPD) at slaughter as an indicator of on-farm broiler welfare. The 3-point scale currently used was originally developed in Sweden to score feet from conventional broilers, but the extent and causes of misclassification have not been investigated in either conventional or organic broilers. Hence, we investigated the performance of the official Danish FPD scoring system in conventional and organic broilers by assessing agreement between official scores from the slaughterhouse and subsequent scoring of the same feet by a reference method. We also investigated the effect of making an incision in the footpad during scoring. In total, 902 conventional and 897 organic broiler feet (∼100 per flock from 18 flocks) were collected at a large Danish slaughterhouse from the official FPD surveillance system. Laboratory scoring, based on predefined criteria for visual and invasive examination of the feet derived from the official system, was compared with the official scores assigned at slaughter. Footpad lesions were primarily chronic and spanned a wide range of severity. Organic and conventional feet differed markedly in color, shape, and degree of papillary hypertrophy and hyperkeratosis. Agreement between official and reference foot scores was low in both conventional (0.31) and organic (0.05) broilers, with disagreement concerning mainly score 2 lesions. Variation in agreement between official and reference flock scores suggested a non-systematic bias, which might be attributable to differences among official raters. The very low agreement for feet from organic broilers indicates that these were more difficult to score than feet from conventional broilers. This might be due to a mismatch between lesion characteristics and scoring criteria, or to the lesions being less severe. Visual examination alone detected 3 out of 4 score 2 lesions identified by the reference method.
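The agreement values above (0.31 and 0.05) are reported without the statistic being named in this section. Assuming they are kappa-type coefficients, which are commonly used for agreement between two raters on an ordinal scale, the minimal Python sketch below shows how Cohen's kappa (optionally with linear weights) could be computed for two sets of 0/1/2 foot scores. The function name `cohen_kappa`, the `weighted` flag, and the example score lists are hypothetical illustrations, not the study's data or its actual analysis.

```python
def cohen_kappa(rater_a, rater_b, categories=(0, 1, 2), weighted=False):
    """Cohen's kappa for two raters scoring the same items.

    Hypothetical helper for illustration. With weighted=True, linear
    disagreement weights |i - j| / (k - 1) are used, so a 0-vs-2
    disagreement counts twice as much as a 0-vs-1 disagreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(rater_a)
    idx = {c: i for i, c in enumerate(categories)}

    # k x k contingency table of joint counts (rows: rater A, columns: rater B)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[idx[a]][idx[b]] += 1

    # Marginal counts for each rater
    row = [sum(counts[i]) for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    # Observed and chance-expected disagreement (as proportions)
    observed = sum(weight(i, j) * counts[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(weight(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 0:
        raise ValueError("kappa undefined: expected disagreement is zero")
    return 1 - observed / expected


# Hypothetical scores for 10 feet (invented for illustration only)
official = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
reference = [0, 0, 1, 2, 2, 2, 0, 2, 2, 1]

kappa = cohen_kappa(official, reference)
kappa_w = cohen_kappa(official, reference, weighted=True)
print(round(kappa, 3), round(kappa_w, 3))
```

In this invented example the raters agree on 7 of 10 feet, but kappa is lower than 0.7 because part of that agreement is expected by chance from the raters' score distributions. The weighted variant penalizes 1-vs-2 disagreements less than 0-vs-2 disagreements, which may suit an ordinal lesion scale; whether the study used weighting is not stated here.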
This study indicates that a large proportion of severe FPD lesions go undetected in the official Danish scoring system. The results further suggest that the complexity and impracticality of the scoring criteria impede uniform scoring across raters and production systems.