OpenCV Cat Face Detection

After trying the built-in Haar cascade for cat face detection (you can get the cascade XML from GitHub), I decided to try to understand how to build my own cascade for each of my cats.  This is a general guide to the commands needed to go about it.  I’m running OpenCV in an Ubuntu Docker container on an i7, so I do all of this remotely and let the containerized version do the heavy lifting.

First, I started with my cat Thor because he has a high-contrast face.  thor_train

I used 10k negative samples from an online tutorial.

opencv_createsamples -img donutface.jpg -bg bg.txt -info info/info.lst -pngoutput info -maxxangle 0.5 -maxyangle 0.5 -maxzangle 0.5 -num 10000
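The -bg bg.txt argument above points at a background description file, which is a plain text file with one negative image path per line. A minimal sketch of generating it, assuming the negatives sit in a single folder (the folder name `neg` and the helper name `write_bg_file` are my own placeholders, not from the tutorial):

```python
import os

# Extensions treated as images; adjust to whatever the negative set uses.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def write_bg_file(neg_dir, bg_path):
    """Write one image path per line to bg_path and return how many were listed."""
    names = sorted(
        name for name in os.listdir(neg_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    with open(bg_path, "w") as out:
        for name in names:
            out.write(os.path.join(neg_dir, name) + "\n")
    return len(names)
```

Calling `write_bg_file("neg", "bg.txt")` from the working directory would list every image in `neg`, sorted by file name.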

 

This generates a bunch of positive samples: Thor's face, randomly rotated, superimposed on the negative images.  They look like this.

1366_0020_0044_0044_0044

The image is tiny, but you can just see his face transformed and superimposed left of center.

Next, pack the generated samples into a vector file:

opencv_createsamples -info info/info.lst -num 1950 -w 20 -h 20 -vec positives.vec


 


And finally, train the cascade:

opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 1800 -numNeg 900 -numStages 10 -w 20 -h 20
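The gap between -num 1950 in the vector file and -numPos 1800 here matters, because each stage can consume a few extra positives. A rule of thumb commonly cited in the OpenCV community (an outside assumption, not something from this post) is that the vector file should hold at least numPos + (numStages - 1) * (1 - minHitRate) * numPos samples, plus any that get rejected as background. A quick sketch of that arithmetic, assuming the default minHitRate of 0.995:

```python
import math


def min_vec_samples(num_pos, num_stages, min_hit_rate=0.995, rejected=0):
    """Rule-of-thumb lower bound on the number of samples in the .vec file."""
    extra = (num_stages - 1) * (1.0 - min_hit_rate) * num_pos
    # round() guards against float noise before taking the ceiling
    return num_pos + math.ceil(round(extra, 6)) + rejected


needed = min_vec_samples(num_pos=1800, num_stages=10)
print(needed, needed <= 1950)
```

By this estimate, 1950 samples leave a small margin for the 10-stage run above.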

 

Then I just whipped up a little boilerplate Python to use OpenCV locally to find him. It doesn't find him here:

neg arnold

But it does find him in some random pics with his face showing. Curiously, it didn't find him in an almost identical version of one of the pictures.  I can loosen the rigidity of the match, but then it finds multiple false positives.
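For reference, the detection boilerplate can look roughly like the sketch below. This is not the exact script I ran: the cascade path `data/cascade.xml` assumes the -data data directory from the training command, and the function names and parameter defaults are placeholders. In `detectMultiScale`, lowering `minNeighbors` is what loosens the match, at the cost of more false positives.

```python
def detect_faces(image_path, cascade_path="data/cascade.xml",
                 scale_factor=1.1, min_neighbors=5):
    """Return a list of (x, y, w, h) boxes found by the trained cascade."""
    import cv2  # imported here so largest_box() below works without OpenCV

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError("could not load cascade: " + cascade_path)

    image = cv2.imread(image_path)
    if image is None:
        raise IOError("could not read image: " + image_path)

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(
        gray,
        scaleFactor=scale_factor,     # image pyramid step between scales
        minNeighbors=min_neighbors,   # lower = looser match, more false positives
        minSize=(20, 20),             # matches the 20x20 training window
    )
    return [tuple(int(v) for v in box) for box in boxes]


def largest_box(boxes):
    """Pick the box with the biggest area, or None when nothing was found."""
    return max(boxes, key=lambda b: b[2] * b[3]) if boxes else None
```

Keeping only the largest box is one crude way to live with a looser `min_neighbors` when there is a single cat in the frame.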


Preprocessing US Data Screens with OpenCV

After the previous (partial) success getting Tesseract to read some ultrasound (US) data, I looked into preprocessing and did some fundamentals-level reading on the image preprocessing required to optimize OCR.  The recommendations mostly tend toward OpenCV, but there are other options, including my favorite batch manipulator, ImageMagick.

I’ve read that Otsu’s thresholding method is pretty good at preprocessing so I’m starting with the basics on OpenCV.
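To make the idea concrete, here is a small pure-Python sketch of what Otsu's method does: it tries every threshold from 0 to 255 and keeps the one that maximizes the variance between the dark and light pixel groups. In OpenCV the same thing is available through `cv2.threshold` with the `cv2.THRESH_OTSU` flag; the function names below are my own and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

On a cleanly bimodal image (dark text, light background) the chosen threshold lands between the two peaks; watermarks and gradients blur those peaks, which is part of why they cause trouble.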

So with some basic preprocessing and binarization, we end up with the following.  asshat.png

This is problematic for a few reasons:

  1. I’m not using the original US images.  I’ll run one of those through below.
  2. The watermark interferes.
  3. The text is “skinny”, which is a common occurrence in thresholding.  Any OCR or computer vision will not like missing pixels.

On the upside, I am moving much closer to my target of black text on white background which is what all computer vision likes.  Nice, crisp gradients.  I’m using new test data, and I can’t remember what kind of US it came from but I think the carotid is from GE.

The OB/GYN is more difficult.  We’ll see how it does.

Original image

 

Threshold Preprocessing

Data | Description |
Gravda 0 Para 0) = ABSOSsEctopic: «=D Height 77cm = Weight 77k wp [07/03/2018
Indication PELVIC PAIN

2D Mode

Uterus
Endometrium

24.8mm Volume 13.3.ce
23.2 mm Volume 10.1 cc

Rt Follicle
Lt Follicle

[ cm Worksheet | Send Report[ Return |

Hard to know what to make of it since it missed a lot of text.  I think there is some substantial erosion of the text occurring during binarization and thresholding.

Extracting Text from Journal Articles: Postoperative Imaging of Sarcomas

doi.org/10.2214/AJR.18.19954

I’m mixing it up a little bit with this post.  Yesterday I was able to successfully extract the text using PyPDF2, but the text was jumbled.  It cleaned up okay using the .replace() function, but the text was still out of order.

After looking at GhostScript, I figured I might give my go-to PAID pdf app a shot and it didn’t fail me.  I used NitroPro to export the text and it was largely unperturbed from the original format.

Clipboard.jpg

There are still a few formatting issues which yesterday’s python can easily clean up.  No need for imported modules.

You may ask why go through the trouble of extracting the text?  I often summarize my learning points on this site so I can easily reference them and include tables to add to my reference page.  If you copy from the web page, you end up with this text appended.

Read More: https://www.ajronline.org/doi/full/10.2214/AJR.18.19954?action=autoLogin&sso_token=39F…0D2334

Essentially, you would need to delete the JavaScript that adds that link and reload the page.  This isn’t really practical just to read one article.
So with a little bit of Python, it goes to:
Capture
Still not perfect, but pretty darn good for copypasta into a summary.
Script below:

with open('C:\\Users\\g\\Downloads\\ajr.txt', 'r', encoding='utf-8') as file_obj:
    in_text = file_obj.read()

# str.replace returns a new string, so the result has to be assigned back
in_text = in_text.replace('\n', ' ')   # join lines
in_text = in_text.replace('- ', '')    # rejoin words hyphenated at line breaks

with open('C:\\Users\\g\\Downloads\\ajr_clean.txt', 'w', encoding='utf-8') as outfile:
    outfile.write(in_text)
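A slightly more conservative variant of the cleanup above, as a sketch: it only rejoins a word when the hyphen sits right at a line break, so in-line hyphens like "end-to-end" survive. The function name is a placeholder, and it will still wrongly join a genuine compound that happens to break at its hyphen.

```python
import re


def clean_extracted_text(text):
    """Rejoin line-break hyphenation, then collapse newlines into spaces."""
    # "postoperative-\nly" -> "postoperatively"
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # every remaining line break becomes a single space
    text = re.sub(r"\s*\n\s*", " ", text)
    # squeeze runs of spaces left over from the export
    return re.sub(r" {2,}", " ", text).strip()


print(clean_extracted_text("postoperative-\nly, after which\nthe end-to-end repair"))
```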

  • Multiple studies in the literature support a correlation between local recurrence of soft tissue sarcoma and high-risk factors such as intermediate- or high-grade tumor, tumor larger than 5 cm, deep location, multifocally positive surgical margins, and absence of wide resection [1, 12–14]. Mortality from soft tissue sarcoma has been associated with local recurrence, tumor larger than 10 cm, deep location, high grade, and positive surgical margins
  • While CT scan is often preferred due to its greater sensitivity in detecting small lung nodules, it is unknown whether this provides benefit over CXR alone. Both modalities are considered highly appropriate for this purpose by the American College of Radiology (ACR) [74].
    • This is questionably accurate.  CT is recommended by the ACR.  While CXR is appropriate and would be considered reasonable, CT is preferred.
    • Although chest radiographs were historically used [6], unenhanced chest CT has become the recommended modality. Surveillance intervals vary from 3 to 6 months in the first several years to annually up to 10 years (Tables 1 and 2).
  • A retrospective review performed in the United Kingdom found that CXR alone detected two-thirds of pulmonary metastases in patients with soft tissue sarcoma; when compared with CT as the “gold standard,” the sensitivity, specificity, positive predictive value, and negative predictive value of CXR were 60.8, 99.6, 93.3, and 96.7 percent, respectively [29]. The use of CXR only to stage the lungs would have missed one-third of all patients with lung metastases, but because of the infrequency of lung metastases overall (96 of 1170 patients), the initial staging would have been inaccurate in only 3.1 percent of cases.
    • Commentary:  There is a CME question on this one and I believe the question is either vague or their answer is just wrong.  First, it isn’t in the article.  A review of primary literature sources showed a rate far lower than 1/3 and the summary above
  • Radiation-induced sarcomas have different histologic composition than the patients’ original treated tumors; within the field of treatment, therefore, MRI characteristics widely differ [14]. High-grade undifferentiated pleomorphic sarcoma is the most common postradiation sarcoma of the soft tissues, representing two-thirds of radiation-induced sarcoma. Extraskeletal osteosarcoma and fibrosarcoma follow, representing 13% and 11% of cases [14]. Conversely, osteosarcoma is by far the most common radiation-induced malignancy affecting bone, accounting for approximately 60% of cases [25]. Undifferentiated pleomorphic sarcoma is a distant second, accounting for approximately 20% of cases [14].

Postoperative Imaging of the Ankle

Achilles Tendon Repair

  • In acute injury with functionally limiting partial-thickness tear or complete tear with less than 3 cm tendon gap, a direct end-to-end anastomotic repair is preferable [9].
    • In this repair the proximal and distal tendon stumps are mobilized and directly anastomosed
  • MRI or ultrasound imaging performed within the first 2 months after surgery may show a residual tendon gap at the site of anastomosis related to postsurgical granulation. This gap should fill in with T2-intermediate fibrous material by 14 weeks [12]. Heterogeneous intersubstance T2 signal intensity persists as long as 12 months postoperatively, after which the tendon assumes a rounded morphologic appearance as much as 4–6 times the diameter of the contralateral unaffected tendon [13, 14] (Fig. 1).

Code to extract text from the PDF.  WordPress screwed up the formatting, as usual.

import PyPDF2

# Uses the legacy PyPDF2 API (PdfFileReader, numPages, getPage, extractText).
# PyPDF2 3.x renamed these to PdfReader, len(reader.pages), reader.pages[i], extract_text().
with open('article.pdf', 'rb') as file_obj, open('outfile.txt', 'wb') as outfile:
    pdfReader = PyPDF2.PdfFileReader(file_obj)  # creating a pdf reader object
    pagecount = pdfReader.numPages
    # range(pagecount) visits every page, including the last
    for pagenbr in range(pagecount):
        pagetxt = pdfReader.getPage(pagenbr).extractText().replace('\n', ' ')
        # Note: this strips every hyphen, not only end-of-line hyphenation
        pagetxt = pagetxt.replace('-', '').encode('utf-8')
        outfile.write(pagetxt)

Micropython: Setting Static IP

I’ve heard a lot about MicroPython speeding up the dev cycle, since there is no compile, upload, and observe loop to see whether something works.  I found the documentation frustratingly incomplete, which is surprising for Python.  Even more surprising, the tutorials for getting Arduino up and running are more comprehensive than those for MicroPython.  I figure it comes down to the size of the communities, and each MicroPython implementation has its own quirks, whereas Arduino has made a concerted effort to standardize across platforms.

So, here is the boot.py file you need to establish your connection with a static IP for webREPL usage.

You should use a locally hosted webREPL copy so you can edit the HTML to make the default address the chip you’re working with.

# This file is executed on every boot (including wake-boot from deepsleep)
#import esp
#esp.osdebug(None)
import gc
import webrepl
webrepl.start()
gc.collect()

ssid = "XXXXXX"
pwd = "XXXXXX"
staticIP ='XXXXXX'
subnet = 'XXXXXX'
gateway = 'XXXXXX'
dns = 'XXXXXX'

def do_connect():
    import network
    sta_if = network.WLAN(network.STA_IF)
    sta_if.active(True)

    # Must be passed as a tuple, hence the double parens
    sta_if.ifconfig((staticIP, subnet, gateway, dns))

    sta_if.connect(ssid,pwd)
    print('network config:', sta_if.ifconfig())

do_connect()
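One thing the boot.py above does not do is wait for the connection to come up before moving on. A small sketch of a polling helper that could be called after `sta_if.connect(...)`; the name `wait_for_wifi` and the timeout values are my own assumptions, and it relies only on the `isconnected()` method of the WLAN object:

```python
import time


def wait_for_wifi(sta_if, timeout_s=10, poll_s=0.5):
    """Poll sta_if.isconnected() until it is True or timeout_s has passed."""
    attempts = int(timeout_s / poll_s)
    for _ in range(attempts):
        if sta_if.isconnected():
            return True
        time.sleep(poll_s)
    # one last check after the final sleep
    return sta_if.isconnected()
```

Inside `do_connect()` this would go right after `sta_if.connect(ssid, pwd)`, for example `if not wait_for_wifi(sta_if): print('wifi connect timed out')`.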

 

Tasmota DST/STD timezone setting

These commands should be entered at the console to configure time.

Set the timezone to 99 so that the TimeDst/TimeStd rules below are used


10:55:47 CMD: timezone 99
10:55:47 MQT: stat/sonoff/relay4/RESULT = {"Timezone":99}

Set DST rules (the offset is in minutes, so -240 is UTC-4)


17:58:38 CMD: timedst 1, 2, 3, 1, 2, -240
17:58:38 MQT: stat/sonoff/relay4/RESULT = {"TimeDst":{"Hemisphere":1,"Week":2,"Month":3,"Day":1,"Hour":2,"Offset":-240"}}

Set standard time rules (-300 minutes is UTC-5)


10:00:25 CMD: timestd 1, 1, 11, 1, 2, -300
10:00:25 MQT: stat/sonoff/relay4/RESULT = {"TimeStd":{"Hemisphere":1,"Week":1,"Month":11,"Day":1,"Hour":2,"Offset":-300"}}
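If you have several devices to configure, the two rule commands above can be generated rather than typed. A minimal sketch, with field names borrowed from the keys in the RESULT JSON; the `TimeRule` class itself is hypothetical and only formats the console command text:

```python
from dataclasses import dataclass


@dataclass
class TimeRule:
    hemisphere: int
    week: int
    month: int
    day: int
    hour: int
    offset_minutes: int

    def command(self, name):
        """Format the rule as a console command, e.g. 'timedst 1, 2, ...'."""
        fields = (self.hemisphere, self.week, self.month,
                  self.day, self.hour, self.offset_minutes)
        return name + " " + ", ".join(str(f) for f in fields)

    def utc_offset_hours(self):
        """Offset from UTC in hours (the console takes minutes)."""
        return self.offset_minutes / 60


# Same values as the console session above
dst = TimeRule(1, 2, 3, 1, 2, -240)
std = TimeRule(1, 1, 11, 1, 2, -300)
print(dst.command("timedst"))
print(std.command("timestd"))
```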

Latitude query


10:02:20 CMD: latitude
10:02:20 MQT: stat/sonoff/relay4/RESULT = {"Latitude":"48.858360"}

Latitude Set


10:02:27 CMD: latitude 36.146729
10:02:27 MQT: stat/sonoff/relay4/RESULT = {"Latitude":"36.146729"}

Longitude query


10:02:34 CMD: longitude
10:02:34 MQT: stat/sonoff/relay4/RESULT = {"Longitude":"2.294442"}

Longitude Set


10:02:56 CMD: longitude -79.801646
10:02:56 MQT: stat/sonoff/relay4/RESULT = {"Longitude":"-79.801646"}