bad result detected #114

Closed
DamonsJ opened this issue Jan 18, 2022 · 3 comments
Labels
bug Something isn't working

@DamonsJ

DamonsJ commented Jan 18, 2022

I got a bad result using layout-parser.
Here is the image I used:
[Image: 1]

Here is the Python code I ran:

import cv2
import layoutparser as lp

image = cv2.imread("1.png")
# Convert the image from BGR (cv2 default loading style) to RGB
image = image[..., ::-1]
origin_image = image.copy()

# Load the deep layout model from the layoutparser API.
# For all the supported models, please check the Model Zoo page:
# https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
model = lp.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
                                 label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

# Detect the layout of the input image
layout = model.detect(image)
# print("layout : ", layout)

# Draw each block type in its own color.
# drawRectangleInImage is the reporter's own helper; its definition is not shown.
text_blocks = lp.Layout([b for b in layout if b.type == 'Text'])
drawRectangleInImage(origin_image, text_blocks, (36, 255, 12))

titles_blocks = lp.Layout([b for b in layout if b.type == 'Title'])
drawRectangleInImage(origin_image, titles_blocks, (76, 155, 175))

figure_blocks = lp.Layout([b for b in layout if b.type == 'Figure'])
drawRectangleInImage(origin_image, figure_blocks, (122, 96, 216))

lists_blocks = lp.Layout([b for b in layout if b.type == 'List'])
drawRectangleInImage(origin_image, lists_blocks, (176, 155, 175))

tables_blocks = lp.Layout([b for b in layout if b.type == 'Table'])
drawRectangleInImage(origin_image, tables_blocks, (76, 255, 75))

# Note: origin_image is in RGB order here, while cv2.imshow interprets channels as BGR
cv2.imshow('image', origin_image)
cv2.waitKey()
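
The snippet calls `drawRectangleInImage`, which is not defined in the report and is not part of layoutparser. A minimal sketch of what such a helper might look like follows, assuming each block exposes a `coordinates` tuple `(x_1, y_1, x_2, y_2)` as layoutparser's `TextBlock` does. The `block_corners` helper and the `thickness` parameter are hypothetical and only for illustration.

```python
def block_corners(block):
    """Return the top-left and bottom-right corners of a block as integer pixel tuples."""
    x_1, y_1, x_2, y_2 = block.coordinates
    return (int(x_1), int(y_1)), (int(x_2), int(y_2))


def drawRectangleInImage(image, blocks, color, thickness=2):
    """Hypothetical stand-in for the reporter's helper: outline every block on `image` in place."""
    import cv2  # imported lazily so block_corners can be used without OpenCV

    for block in blocks:
        top_left, bottom_right = block_corners(block)
        # cv2.rectangle needs integer corner points and a contiguous image array
        cv2.rectangle(image, top_left, bottom_right, color, thickness)
    return image
```

layoutparser also ships its own visualization function, `lp.draw_box`, which may be simpler if a custom color per type is not required.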

Here is the result:

[Image: Screenshot 2022-01-18 11 45 06]

By the way, some warnings were generated:

/usr/local/lib/python3.9/site-packages/detectron2/structures/image_list.py:99: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
max_size = (max_size + (stride - 1)) // stride * stride
/usr/local/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
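
Both messages are `UserWarning`s raised inside detectron2 and PyTorch about upcoming API changes, and they appear to be unrelated to the detection quality. If they get in the way, a sketch like the following silences only those two using the standard `warnings` module. The function name `silence_known_warnings` is hypothetical, and the patterns are copied from the output above; the first one also tolerates a `__floordiv__` spelling in case the underscores were lost when the text was rendered.

```python
import warnings

# Message prefixes taken from the two warnings quoted in this report.
KNOWN_NOISE = (
    r"(__)?floordiv(__)? is deprecated",
    r"torch\.meshgrid: in an upcoming release",
)


def silence_known_warnings():
    """Ignore only the two known UserWarnings; every other warning is still shown."""
    for pattern in KNOWN_NOISE:
        # `message` is a regex matched against the start of the warning text
        warnings.filterwarnings("ignore", message=pattern, category=UserWarning)
```

Call `silence_known_warnings()` once before `model.detect(image)`.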

@lolipopshock
Member

Thank you for reporting this. It can be easily resolved by reconfiguring the model's hyperparameters; one example is: https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L140 .
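
As a rough illustration of what reconfiguring the hyperparameters can look like: `extra_config` takes a flat list of alternating Detectron2 config keys and values. The sketch below builds that list from a dict and passes it to `lp.Detectron2LayoutModel`. The threshold values are placeholders to tune on your own pages, not the settings from the linked file, and `build_extra_config` and `load_publaynet_model` are hypothetical helper names.

```python
PUBLAYNET_LABELS = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}


def build_extra_config(overrides):
    """Flatten {config_key: value} into the [key, value, key, value, ...] list form."""
    flat = []
    for key, value in overrides.items():
        flat.extend([key, value])
    return flat


def load_publaynet_model(score_thresh=0.5, nms_thresh=0.4):
    """Load the PubLayNet Mask R-CNN model with custom thresholds (placeholder values)."""
    import layoutparser as lp  # lazy import: only needed when a model is actually loaded

    extra_config = build_extra_config({
        # Minimum confidence for a detection to be kept
        "MODEL.ROI_HEADS.SCORE_THRESH_TEST": score_thresh,
        # IoU above which overlapping boxes are suppressed
        "MODEL.ROI_HEADS.NMS_THRESH_TEST": nms_thresh,
    })
    return lp.Detectron2LayoutModel(
        "lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config",
        extra_config=extra_config,
        label_map=PUBLAYNET_LABELS,
    )
```

Lowering `SCORE_THRESH_TEST` keeps more low-confidence regions, while lowering `NMS_THRESH_TEST` removes more overlapping boxes.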

@DamonsJ
Author

DamonsJ commented Jan 19, 2022

> Thank you for reporting this -- it can be easily resolved by reconfiguring the models hyperparameters, and one example is: https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L140 .

Hi, thanks very much for replying.
I just want to recognize text, figures, and tables in published documents.
How should I adjust the parameters?
When I use the extra config from https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L140 , I can recognize text and figures, but math equations are not recognized.

Thanks!

@lolipopshock
Member

There's a separate model (Layout-Parser/platform#20) that can be used for detecting equation regions. Also see the code here: https://github.com/allenai/VILA/blob/96cafe591ae6ee8a70f941a52dd37bbe0a60b243/datasets/s2-vl-utils/vision_model_loader.py#L150
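
One way to use the two models together is to run both on the same page and concatenate their detections. The sketch below assumes the equation model is published in the layoutparser model zoo as `lp://MFD/faster_rcnn_R_50_FPN_3x/config` with label map `{1: "Equation"}`; please check the model zoo page before relying on that path. `combine_detections` and `detect_with_equations` are hypothetical names.

```python
def combine_detections(general_blocks, equation_blocks, wanted=("Text", "Figure", "Table")):
    """Keep the wanted types from the general model and append all equation blocks."""
    kept = [b for b in general_blocks if b.type in wanted]
    return kept + list(equation_blocks)


def detect_with_equations(image, general_model):
    """Run a general layout model plus an equation detector on one RGB image."""
    import layoutparser as lp  # lazy import: only needed when models are actually run

    # Assumed model-zoo path and label map for the math formula detection model
    equation_model = lp.Detectron2LayoutModel(
        "lp://MFD/faster_rcnn_R_50_FPN_3x/config",
        label_map={1: "Equation"},
    )
    blocks = combine_detections(general_model.detect(image), equation_model.detect(image))
    return lp.Layout(blocks)
```

Equation boxes can overlap with text boxes from the general model, so some deduplication may still be needed afterwards.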
